Statistical Natural Language Processing

- Syllabus
- Survey
- Automata handout
- N-gram handout
- Exercise #1
- Information theory
- Smoothing
- Smoothing continued
- Toolkits
- Exercise #2
- HMMs
- HMM algorithms
- HMM algorithms continued
- PCFGs
- Handout on prospectus
- PCFG algorithms

- Chris Manning's Statistical NLP links
- Linguistic Data Consortium
- Project Gutenberg
- Oxford Text Archive

- Krenn & Samuelsson's book on statistical NLP*
- Chris Brew's book on Data-intensive Linguistics, as postscript* or HTML
- Abney's paper on Statistical methods and linguistics*
- Chen & Goodman as pdf or as postscript*
- Clarkson & Rosenfeld* (This paper describes the CMU-Cambridge package below.)
- Stolcke as gzipped postscript or as pdf (This paper describes the SRILM package below.)
- Goldsmith on probability for language modeling

- My book manuscript
- Ch.1: Introduction (2/10/03)
- Ch.2: Formal language theory (2/10/03)
- Ch.3: Probability theory (2/10/03)
- Ch.4: N-grams (2/10/03)
- Ch.5: Information theory (2/17/03)
- Ch.6: Sparse data (2/24/03)
- Ch.7: Hidden Markov Models (3/10/03)
- Ch.8: HMM Algorithms (3/24/03)
- References (2/1/03)

- Programs from book (These require that you have Perl installed on your system.)
- Unigrams.pm
- Bigrams.pm
- uniapprox.pl*
- biapprox.pl*
- entropy.pl*
- entropy2.pl*
- hapax.pl*
- novelwords.pl*
- addonecross.pl*
- addx.pl*
- forngram.pl* (Corrected: 3/7/03)

