Statistical Natural Language Processing

- Syllabus
- Survey
- Assignment #1
- Assignment #2
- N-gram approximation handout
- Revised Assignment #3
- HMM N-gram handout
- Assignment #4
- Assignment #5
- Forward probabilities for a PCFG
- Sample from Welsh-English corpus
- Assignment #6
- Sample alignment for Welsh-English corpus
- Final assignment

- Simple PCFG weights
- Efficiency demo
- Austen text
- Russian text
- The+N sequences in Brown
- Two-dice simulation
- Binomial distribution
- Entropy of a die with up to 8 sides
- Entropy of an unfair die with 4 sides
- Distribution of word lengths in Brown
- Mutual information
- Zipfian distribution in Brown
- Special log2 function
- N-gram approximation
- Laplace and Lidstone demo
- Unseen unigram demo
- Comparing Laplace and Lidstone for Austen
- Comparing MLE and Lidstone for Austen
- Second Austen text
- Comparing smoothing techniques for one sentence in second Austen text
- Deleted estimation demo
- One answer to #3 on HW#2
- Translation of Good-Turing
- Demo of regression for Good-Turing
- Applying Good-Turing to Austen
- Multiple tags in Brown
- Dumb tagger for Brown
- One answer to #2 on HW#3
- Forward probabilities
- HMM visualization (kludgy and platform-dependent; Graphviz must be installed)
- One answer to #1 on HW#4
- One (partial) answer to #3 on HW#4
- One answer to #2 on HW#5
- Quick-and-dirty XML parser for Welsh National Assembly data
- Code for setting different colors
- K-means demo
- EM demo
- One answer to #1 on HW#6
- Vector space demo
- Poisson distributions
- Poisson demo
- Two-Poisson distribution demo
- K-mix demo
- Least-squares demo
- Binary independence model demo
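The course demos above are Matlab files. As a rough idea of what the die-entropy demos compute, here is a minimal Python sketch (not the course's code) of Shannon entropy in bits, H(X) = sum over x of p(x) log2(1/p(x)). It is applied to fair dice with 1 to 8 sides and to an unfair 4-sided die. The unfair die's weights are hypothetical and were chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Fair dice with 1 to 8 sides: the entropy of a uniform n-sided die is log2(n) bits.
for n in range(1, 9):
    print(n, round(entropy([1.0 / n] * n), 3))

# Hypothetical unfair 4-sided die (weights made up for illustration).
unfair = [0.5, 0.25, 0.125, 0.125]
print("unfair 4-sided:", entropy(unfair))  # 1.75 bits, less than the fair die's 2 bits
```

Skewing the probabilities always lowers entropy below the uniform maximum of log2(n). In this case it drops from 2 bits to 1.75 bits.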

- Corpora and readings (for enrolled students only)
- Textbook website
- Matlab site license page at UA (free for UA folks)
- Matlab website
- Octave (open-source free alternative to Matlab)
- Graphviz (graph visualization software, used for HMMs)
- Various Matlab files from 478/578 (Spring 2013)
- Various Matlab files from 408/508 (Fall 2013)