Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

Estimation of escape probabilities for PPM based on universal source coding theory

Åberg, Jan LU ; Shtarkov, Yu M. and Smeets, B. J.M. LU (1997) 1997 IEEE International Symposium on Information Theory, ISIT 1997 In IEEE International Symposium on Information Theory - Proceedings p.65-65
Abstract

Some of the best compression ratios for text compression are provided by the PPM (prediction by partial matching) class of algorithms. These algorithms are based on arithmetic coding using a fixed-depth Markov chain model of the source, i.e., the subsequence of symbols generated in any state s of the source is assumed to be the output of a memoryless subsource w=w(s). One of the most crucial steps in any PPM algorithm is the choice of the escape probability, i.e., the probability of a previously unseen symbol appearing in a given state. Let A={1,...,M} be the source alphabet, x/sup k/=x/ix/ i...x/sub k//spl isin/A/sup k/ be a sequence of symbols generated by a subsource w with unknown parameters, and m=m(x/sup k/)... (More)

Some of the best compression ratios for text compression are provided by the PPM (prediction by partial matching) class of algorithms. These algorithms are based on arithmetic coding using a fixed-depth Markov chain model of the source, i.e., the subsequence of symbols generated in any state s of the source is assumed to be the output of a memoryless subsource w=w(s). One of the most crucial steps in any PPM algorithm is the choice of the escape probability, i.e., the probability of a previously unseen symbol appearing in a given state. Let A={1,...,M} be the source alphabet, x/sup k/=x/ix/ i...x/sub k//spl isin/A/sup k/ be a sequence of symbols generated by a subsource w with unknown parameters, and m=m(x/sup k/) be the number of different symbols in x/sup k/. In most incarnations of PPM, the expression for the escape probability used for coding is of the form /spl thetav/(E|x/sup k/)=k(k,m)/k+m/spl alpha/+k(k,m), where /spl alpha/ is a parameter. The encoding of the "escape symbol" E is followed by a description of the new symbol, which is independent of the choice of escape probability and is not discussed. In almost all PPM schemes, the expression for the escape probability has been chosen heuristically. Following the method of Shtarkov et al. (see Probl. of Information Transmission, vol.31, no.2, p.20-35, 1995) we use a universal coding approach.

(Less)
Please use this url to cite or link to this publication:
author
; and
organization
publishing date
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
Proceedings 1997 IEEE International Symposium on Information Theory
series title
IEEE International Symposium on Information Theory - Proceedings
article number
612980
pages
1 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
1997 IEEE International Symposium on Information Theory, ISIT 1997
conference location
Ulm, Germany
conference dates
1997-06-29 - 1997-07-04
external identifiers
  • scopus:0030718271
ISSN
2157-8095
ISBN
0780339568
9780780339569
DOI
10.1109/ISIT.1997.612980
language
English
LU publication?
yes
id
159a00a8-053c-4727-be1c-c0100d66dc26
date added to LUP
2021-11-05 02:36:28
date last changed
2022-02-02 01:10:57
@inproceedings{159a00a8-053c-4727-be1c-c0100d66dc26,
  abstract     = {{<p>Some of the best compression ratios for text compression are provided by the PPM (prediction by partial matching) class of algorithms. These algorithms are based on arithmetic coding using a fixed-depth Markov chain model of the source, i.e., the subsequence of symbols generated in any state s of the source is assumed to be the output of a memoryless subsource w=w(s). One of the most crucial steps in any PPM algorithm is the choice of the escape probability, i.e., the probability of a previously unseen symbol appearing in a given state. Let A={1,...,M} be the source alphabet, x/sup k/=x/<sub>i</sub>x/ <sub>i</sub>...x/sub k//spl isin/A/sup k/ be a sequence of symbols generated by a subsource w with unknown parameters, and m=m(x/sup k/) be the number of different symbols in x/sup k/. In most incarnations of PPM, the expression for the escape probability used for coding is of the form /spl thetav/(E|x/sup k/)=k(k,m)/k+m/spl alpha/+k(k,m), where /spl alpha/ is a parameter. The encoding of the "escape symbol" E is followed by a description of the new symbol, which is independent of the choice of escape probability and is not discussed. In almost all PPM schemes, the expression for the escape probability has been chosen heuristically. Following the method of Shtarkov et al. (see Probl. of Information Transmission, vol.31, no.2, p.20-35, 1995) we use a universal coding approach.</p>}},
  author       = {{Åberg, Jan and Shtarkov, Yu M. and Smeets, B. J.M.}},
  booktitle    = {{Proceedings 1997 IEEE International Symposium on Information Theory}},
  isbn         = {{0780339568}},
  issn         = {{2157-8095}},
  language     = {{eng}},
  pages        = {{65--65}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{IEEE International Symposium on Information Theory - Proceedings}},
  title        = {{Estimation of escape probabilities for PPM based on universal source coding theory}},
  url          = {{http://dx.doi.org/10.1109/ISIT.1997.612980}},
  doi          = {{10.1109/ISIT.1997.612980}},
  year         = {{1997}},
}