
LING/C SC/PSYC 438/538
Computational Linguistics
Fall 2007

This is an introductory course in computational linguistics, taught at an advanced level.

Reference Textbook

We will make use of selected chapters from Speech and Language Processing by D. Jurafsky and J.H. Martin, Prentice-Hall 2000. A copy of the book is on reserve in the library.

Email List

Hosted at listserv.arizona.edu

The name of the list is LING538@LISTSERV.ARIZONA.EDU

Software

We will use Perl and SWI-Prolog (both freely available) in the computer laboratory classes.

We will also use a spreadsheet, namely Microsoft Excel, for calculations.

Instructor: Sandiway Fong sandiway@email.arizona.edu
Office: Douglass 311

Administrivia

Location S SCI 224 (Computer Lab)
Time Tuesdays and Thursdays 12:30-1:45 pm

Syllabus

Topics. See the chapters of the textbook and the lectures from previous years. New topics are also introduced each year.

See Lecture 1 slides for the homework and grading policy.

Lecture Notes

Available in both Adobe PDF and Microsoft PowerPoint formats.

August

Date   PDF   PowerPoint   Number of Slides   Topic
8/21 lecture1.pdf lecture1.ppt 45 Administrivia and Introduction
8/23 lecture2.pdf lecture2.ppt 20 Regular expressions (regexp). Introduction to Perl.
8/28 lecture3.pdf lecture3.ppt 14 Regexps: recap. Perl: variables, conditionals, iterators. Regexps: grouping and backreferences. Homework 1.
File: wsj2000.txt
8/30 lecture4.pdf lecture4.ppt 12 Note on Homework 1. Regexps: =~ operator, multiple matches, search and replace.

September

Date   PDF   PowerPoint   Number of Slides   Topic
9/4 lecture5.pdf lecture5.ppt 19 Correction to previous lecture's slides. SWI-Prolog. Chomsky Hierarchy. Regular grammars. Definite clause grammar (DCG) system.
9/6 lecture6.pdf lecture6.ppt 16 More on DCGs. Prolog's computation rule: top-down, depth-first and left-to-right. Left recursive vs. right recursive regular grammars: the case of Sheeptalk!
9/11 lecture7.pdf lecture7.ppt 16 More on regular grammars. Prolog term data structure. Parse tree recovery using an extra parameter.
Homework 1 review: sample... homework1_sample.txt
9/13   New location: 5pm Speech and Hearing Sciences 205.
Cognitive Science Master's Seminar series talk
Statistical Natural Language Parsing and the Penn Treebank: Reliable Models of Language?
9/18 lecture8.pdf lecture8.ppt 18 More on regular grammars. Language enumeration. Ambiguity and DCGs. Combining left and right recursive regular grammar rules. Homework 2.
9/20   No class today. Attend Computational Linguistics Colloquium tomorrow.
9/21   Computational Linguistics Colloquium.
Location: 3pm Speech and Hearing Sciences 205.
Speakers: Sandiway Fong, Mike Hammond and Ying Lin.
9/25 lecture9.pdf lecture9.ppt 18 Extra arguments for agreement. The expressive power from extra arguments: example of a^nb^n with a regular grammar extended with one extra argument.
Added (2pm 9/25): class exercise for Case agreement.
9/27 lecture10.pdf lecture10.ppt 32 (Corrected 2:25pm 9/27) Homework 2 review. Recap of regular grammars and extra arguments. New topic: FSA. Regexp and FSA.

October

Date   PDF   PowerPoint   Number of Slides   Topic
10/2 lecture11.pdf lecture11.ppt 17 FSA in Prolog: two implementations.
10/4 lecture12.pdf lecture12.ppt 28 More on FSA: NDFSA to FSA conversion, pumping lemma.
10/9 lecture13.pdf lecture13.ppt   FSA to regexp. FSA and complementation. Regexp and complementation.
10/11 lecture14.pdf lecture14.ppt 9 Midterm exam.
Download for Question 1: wsj.txt
10/16 lecture15.pdf lecture15.ppt 46 Midterm exam review. FST: Finite State Transducers.
10/18   No class today.
10/23 lecture16.pdf lecture16.ppt 17 Midterm exam review (correction). FST: Finite State Transducers in Prolog. Worked Example (3.14). Updated: 2pm 10/23.
10/25 lecture17.pdf lecture17.ppt 26 Running the FST backwards. Word error correction. Minimum edit distance.
Excel Spreadsheet for Edit Distance: eds.xls
10/30 Guest lecture: David Pinkus, Google. Title: Natural Language Processing and the next 10 years of search.
Abstract: Google's mission is to organize the world's information and make it universally accessible and useful. To do this requires not just continually crawling and indexing the world wide web (among other sources), but also translating, on demand, that information into potentially any one of the currently 100 languages supported by Google. This talk will explore some of what Google can do with its large corpus of information, and specifically some successes in language translation.

November

Date   PDF   PowerPoint   Number of Slides   Topic
11/1 lecture18.pdf lecture18.ppt 31 Porter Stemmer. Edit distance and Excel programming; The misspellings of Britney Spears. Homework Question.
11/6 lecture19.pdf lecture19.ppt 30 Basic probability. N-gram language models. Slides updated: 2:15pm 11/6
References in Excel spreadsheet: referencing.xls
11/8 lecture20.pdf lecture20.ppt 38 N-gram language models contd. Smoothing: (1) add one. (2) Witten-Bell. Random sentence generation. Pereira's Colorless green ideas corpus statistic. Backoff. Interpolation.
Add one Excel spreadsheet: addone.xls
Witten-Bell Excel spreadsheet: wb.xls
11/13 lecture21.pdf lecture21.ppt 33 Part of speech (POS) tagging: three methods. Manual rules. Statistical models. Machine learning of rules.
Updated: connexor links and demo
11/15 lecture22.pdf lecture22.ppt 10 Context-free grammars for English. Passive morphology. Traces.
Updated 3:50pm: added grammar developed in class
11/17 lecture23.pdf lecture23.ppt 25 Context-free grammars for English contd. Handling left recursive VP adjunction rules using Lookahead and Marking.
Homework 4
538 Presentation Requirements.
11/27 lecture24.pdf lecture24.ppt 49 Class schedule. Final. 538 Presentation assignments.
Parsing methods: top-down, left corner, bottom-up LR parsing.
11/28 lecture25.pdf lecture25.ppt 30 538 Presentation assignments so far.
WordNet. Applications. Case studies: Semantic opposition. GRE word quizzes.

December

Date   PDF   PowerPoint   Number of Slides   Topic
12/4 lecture26.pdf lecture26.ppt 5 538 class presentations.
Features and Unification: Jeff Berry
Lexicalized and Probabilistic Parsing: Mary Dungan
Language and Complexity: Roeland Hancock
Representing Meaning: Mark Siner
Semantic Analysis: Sean Humpherys
Lexical Semantics: Kevin Moffitt
Word Sense Disambiguation and Information Retrieval: HsinMin Lu
Discourse: Sven Thoms
Discourse: Jamie Samdal
Dialog and Conversational Agents: Sunjing Ji
Natural Language Generation: Brent Ramerth
Machine Translation: Dainon Woudstra
Biomedical Information: Tara Paulsen
12/6 lecture27.pdf lecture27.ppt 7 Optional review session. Sample Final Exam questions.
12/11 lecture28.pdf lecture28.ppt   Final exam.
bigram.txt
Time: 11am to 1pm.

