This is an introductory course in computational linguistics at an
advanced level.
Reference Textbook: We will make use of selected chapters from Speech and Language Processing by D. Jurafsky and J.H. Martin, Prentice-Hall, 2000. A copy of the book is on reserve in the library.
Email List: Hosted at listserv.arizona.edu. The name of the list is LING538@LISTSERV.ARIZONA.EDU
Software: We will use Perl and SWI-Prolog (both freely available) in the computer laboratory classes. We will also use a spreadsheet, namely Microsoft Excel, for calculations.
Instructor: Sandiway Fong (sandiway@email.arizona.edu)
Office: Douglass 311
Location: S SCI 224 (Computer Lab)
Time: Tuesdays and Thursdays, 12:30-1:45 pm
See Lecture 1 slides for the homework and grading policy.
| Date | Lecture Notes (PDF) | Lecture Notes (PowerPoint) | Number of Slides | Topic |
|---|---|---|---|---|
| 8/21 | lecture1.pdf | lecture1.ppt | 45 | Administrivia and Introduction |
| 8/23 | lecture2.pdf | lecture2.ppt | 20 | Regular expressions (regexp). Introduction to Perl. |
| 8/28 | lecture3.pdf | lecture3.ppt | 14 | Regexps: recap. Perl: variables, conditionals, iterators. Regexps: grouping and backreferences. Homework 1. File: wsj2000.txt |
| 8/30 | lecture4.pdf | lecture4.ppt | 12 | Note on Homework 1. Regexps: =~ operator, multiple matches, search and replace. |
| 9/4 | lecture5.pdf | lecture5.ppt | 19 | Correction to previous lecture's slides. SWI-Prolog. Chomsky Hierarchy. Regular grammars. Definite clause grammar (DCG) system. |
| 9/6 | lecture6.pdf | lecture6.ppt | 16 | More on DCGs. Prolog's computation rule: top-down, depth-first and left-to-right. Left-recursive vs. right-recursive regular grammars: the case of Sheeptalk! |
| 9/11 | lecture7.pdf | lecture7.ppt | 16 | More on regular grammars. Prolog term data structure. Parse tree recovery using an extra parameter. Homework 1 review: sample... homework1_sample.txt |
| 9/13 | | | | New location: 5pm, Speech and Hearing Sciences 205. Cognitive Science Master's Seminar series talk: Statistical Natural Language Parsing and the Penn Treebank: Reliable Models of Language? |
| 9/18 | lecture8.pdf | lecture8.ppt | 18 | More on regular grammars. Language enumeration. Ambiguity and DCGs. Combining left- and right-recursive regular grammar rules. Homework 2. |
| 9/20 | | | | No class today. Attend the Computational Linguistics Colloquium tomorrow. |
| 9/21 | | | | Computational Linguistics Colloquium. Time and location: 3pm, Speech and Hearing Sciences 205. Speakers: Sandiway Fong, Mike Hammond and Ying Lin. |
| 9/25 | lecture9.pdf | lecture9.ppt | 18 | Extra arguments for agreement. The expressive power of extra arguments: example of $a^n b^n$ with a regular grammar extended with one extra argument. Added (2pm 9/25): class exercise on Case agreement. |
| 9/27 | lecture10.pdf | lecture10.ppt | 32 | (Corrected 2:25pm 9/27) Homework 2 review. Recap of regular grammars and extra arguments. New topic: finite state automata (FSA). Regexps and FSA. |
| 10/2 | lecture11.pdf | lecture11.ppt | 17 | FSA in Prolog: two implementations. |
| 10/4 | lecture12.pdf | lecture12.ppt | 28 | More on FSA: NDFSA to FSA conversion, pumping lemma. |
| 10/9 | lecture13.pdf | lecture13.ppt | | FSA to regexp. FSA and complementation. Regexp and complementation. |
| 10/11 | lecture14.pdf | lecture14.ppt | 9 | Midterm exam. Download for Question 1: wsj.txt |
| 10/16 | lecture15.pdf | lecture15.ppt | 46 | Midterm exam review. FST: Finite State Transducers. |
| 10/18 | | | | No class today. |
| 10/23 | lecture16.pdf | lecture16.ppt | 17 | Midterm exam review (correction). FST: Finite State Transducers in Prolog. Worked Example (3.14). Updated: 2pm 10/23. |
| 10/25 | lecture17.pdf | lecture17.ppt | 26 | Running the FST backwards. Word error correction. Minimum edit distance. Excel spreadsheet for edit distance: eds.xls |
| 10/30 | | | | Guest lecture: David Pinkus, Google. Title: Natural Language Processing and the next 10 years of search. Abstract: Google's mission is to organize the world's information and make it universally accessible and useful. Doing this requires not just continually crawling and indexing the World Wide Web (among other sources), but also translating that information, on demand, into potentially any one of the 100 languages currently supported by Google. This talk will explore some of what Google can do with its large corpus of information, and specifically some successes in language translation. |
| 11/1 | lecture18.pdf | lecture18.ppt | 31 | Porter stemmer. Edit distance and Excel programming. The misspellings of Britney Spears. Homework question. |
| 11/6 | lecture19.pdf | lecture19.ppt | 30 | Basic probability. N-gram language models. Slides updated: 2:15pm 11/6. References in Excel spreadsheet: referencing.xls |
| 11/8 | lecture20.pdf | lecture20.ppt | 38 | N-gram language models, continued. Smoothing: (1) add-one; (2) Witten-Bell. Random sentence generation. Pereira's "Colorless green ideas" corpus statistic. Backoff. Interpolation. Add-one Excel spreadsheet: addone.xls. Witten-Bell Excel spreadsheet: wb.xls |
| 11/13 | lecture21.pdf | lecture21.ppt | 33 | Part-of-speech (POS) tagging: three methods: manual rules, statistical models, and machine learning of rules. Updated: Connexor links and demo. |
| 11/15 | lecture22.pdf | lecture22.ppt | 10 | Context-free grammars for English. Passive morphology. Traces. Updated 3:50pm: added grammar developed in class. |
| 11/17 | lecture23.pdf | lecture23.ppt | 25 | Context-free grammars for English, continued. Handling left-recursive VP adjunction rules using lookahead and marking. Homework 4. 538 presentation requirements. |
| 11/27 | lecture24.pdf | lecture24.ppt | 49 | Class schedule. Final exam. 538 presentation assignments. Parsing methods: top-down, left-corner, and bottom-up LR parsing. |
| 11/28 | lecture25.pdf | lecture25.ppt | 30 | 538 presentation assignments so far. WordNet. Applications. Case studies: semantic opposition; GRE word quizzes. |
| 12/4 | lecture26.pdf | lecture26.ppt | 5 | 538 class presentations: Features and Unification (Jeff Berry); Lexicalized and Probabilistic Parsing (Mary Dungan); Language and Complexity (Roeland Hancock); Representing Meaning (Mark Siner); Semantic Analysis (Sean Humpherys); Lexical Semantics (Kevin Moffitt); Word Sense Disambiguation and Information Retrieval (HsinMin Lu); Discourse (Sven Thoms); Discourse (Jamie Samdal); Dialog and Conversational Agents (Sunjing Ji); Natural Language Generation (Brent Ramerth); Machine Translation (Dainon Woudstra); Biomedical Information (Tara Paulsen). |
| 12/6 | lecture27.pdf | lecture27.ppt | 7 | Optional review session. Sample Final Exam questions. |
| 12/11 | lecture28.pdf | lecture28.ppt | | Final exam. Time: 11am to 1pm. File: bigram.txt |