To my linguistics homepage

LING/C SC/PSYC 438/538
Computational Linguistics
Fall 2008

This is an introductory course in computational linguistics at an advanced level.

Reference Textbook

We will make use of selected chapters from Speech and Language Processing, 2nd edition, by D. Jurafsky and J.H. Martin, Prentice-Hall, 2008.

Email List

Hosted at listserv.arizona.edu

The name of the list is LING538@LISTSERV.ARIZONA.EDU

Software

We will use Perl and SWI-Prolog (freely available) in the computer laboratory classes. Students will implement finite state automata, transducers, parsers and translation programs based on grammar rules in a series of computer laboratory exercises.

In the case of numerical calculations, we will make use of Microsoft Excel for worked examples and homework questions.

Instructor: Sandiway Fong sandiway@email.arizona.edu
Office: Douglass 311

Administrivia

Location SS 224
Time Tuesdays and Thursdays, 12:30-1:45 pm

Syllabus

See lecture1 slides.

Lecture Notes

Available in Adobe PDF and PowerPoint formats.

August

Date  PDF  PowerPoint  Number of Slides  Topic
8/26 lecture1.pdf lecture1.pptx 24 Administrivia and Introduction. Homework 1.
Updated: 8/26 9pm
8/28 lecture2.pdf lecture2.pptx 9 Quiz. Introduction to Perl notes. Homework 2.
Updated: 8/29 5pm

September

Date  PDF  PowerPoint  Number of Slides  Topic
9/2 lecture3.pdf lecture3.pptx 6 Introduction to Perl contd. Homework 3.
Updated: 3pm 9/2
9/4 lecture4.pdf lecture4.pptx 5 Introduction to Perl contd.
Updated: 3pm 9/4.
9/9 lecture5.pdf lecture5.pptx 18 Homework 3 review. Regular Expressions.
Homework exercise - ungraded
Updated 11:39pm 9/9.
9/11 lecture6.pdf lecture6.pptx 11 Homework exercise review. Regular Expressions contd.
Homework 6.
9/16 lecture7.pdf lecture7.pptx 21 Finite State Automata (FSA). Implementation in Perl. Epsilon transitions. Non-deterministic FSA. Set-of-states construction.
Ungraded homework.
Updated 2:10pm 9/16
9/18 lecture8.pdf lecture8.pptx 24 Homework 6 Review. Ungraded homework from last time (review).
FSA and REs.
Updated: 2:30pm 9/16
Updated: 2:40pm 9/30
9/23 lecture9.pdf lecture9.pptx 11 Guest lecture: Dr. Ray Tillman, US Air Force Research Labs, Mesa AZ
Lecture slides. Link to Podcast here
9/25 lecture10.pdf lecture10.pptx 15 Introduction to the Chomsky Hierarchy. Regular Grammars. SWI-Prolog.
Ungraded Homework Exercise.
Updated: 4:14pm 9/25
Updated: 2pm 9/30
9/30 lecture11.pdf lecture11.pptx 20 Regular Grammars and Prolog contd. Converting FSA to REs.
Homework 11.
Updated: 2pm 9/30

October

Date  PDF  PowerPoint  Number of Slides  Topic
10/2 lecture12.pdf lecture12.pptx 18 Regular Grammars and left recursion. Beyond regular languages. The Pumping Lemma for regular languages.
10/7 lecture13.pdf lecture13.pptx 28 Important dates: midterm exam and guest lectures #2 and #3.
Homework 11 review. Morphology and stemming. Google and stemming.
Updated: 2:30pm 10/7
Updated: 8:30pm 10/14
10/9 lifescience.pdf iplant.pdf suebrown.pdf 85 Guest lecture: Nirav Merchant and Prof. Sue Brown, iPlant Project, University of Arizona
Podcast: here
Title: Enhancing the discovery life cycle: Application of information mining and retrieval in life sciences
... Text mining plays a very important role in connecting and transforming text into more "computable" forms that facilitate integration of data. We will discuss some of the challenges and impediments, along with advances and standards such as BioCreative and the Semantic Web.
To effectively utilize the emerging tools and future data sets, we need a fundamental shift in paradigm at multiple levels of the discovery process. Based at the University of Arizona, the iPlant Collaborative (iPC) is a distributed, cyberinfrastructure-centered, international community of plant and computing researchers. It aims to enable new conceptual advances through the adoption of computational thinking, addressing compelling grand challenges in the plant sciences and the associated cutting-edge research challenges in the computing sciences. We will discuss recent activities and the roadmap for iPC, along with opportunities for students to interact with various groups involved in iPlant.
10/14 lecture15.pdf lecture15.pptx 29 Stemming: the Porter Stemmer. Finite State Transducers.
10/16 No lecture.
10/21 lecture16.pdf lecture16.pptx 20 Spelling errors. Edit Distance Computation.
Updated: 2:30pm 10/21
10/23 (slides not available) Guest lecture: David Pinkus, Google (Tempe AZ)
Title: Natural Language Processing and the next 9 years of search.
Abstract:
Google's mission is to organize the world's information and make it universally accessible and useful. To do this requires not just continually crawling and indexing the world wide web (among other sources), but also translating, on demand, that information into potentially any one of the currently 100 languages supported by Google. This talk will explore some of what Google can do with its large corpus of information, and specifically some successes in language translation.
10/28 lecture18.pdf lecture18.pptx 12 Midterm exam.
10/30 lecture19.pdf lecture19.pptx 31 Introduction to probability. N-grams. N-gram software.

November

Date  PDF  PowerPoint  Number of Slides  Topic
11/4 lecture20.pdf lecture20.pptx 35 N-grams contd. Smoothing. Back-off interpolation.
Excel: addone.xls, wb.xls
11/6 lecture21.pdf lecture21.pptx 33 Midterm review session. Homework 21.
Corpus for Homework 21: WSJ9_041.txt
Updated: 4:30pm 11/6
Updated: 4:15am 11/8. Typo in HW: Bristol-Myers is the correct spelling.
11/11 Veterans Day: no lecture.
11/13 lecture22.pdf lecture22.pptx 24 Homework review.
Updated: 2:30pm 11/13
11/18 lecture23.pdf lecture23.pptx 38 Part of speech tagging.
Updated: 8pm 11/18
11/20 lecture24.pdf lecture24.pptx 26 Context-free grammars. The uses for extra arguments in DCGs: parse tree computation and agreement.
11/25 lecture25.pdf lecture25.pptx 34 Context-free grammars contd. Using lookahead to deal with recursion. Treebanks. The Penn Treebank. Tgrep2.
Homework handed out: due next Tuesday
Updated: 3:20pm 11/25
11/27 Thanksgiving: no lecture.

December

Date  PDF  PowerPoint  Number of Slides  Topic
12/2 lecture26.pdf lecture26.pptx 14
Presentation assignments
Treebanks. The Penn Treebank. Demos: Tgrep2 and Tregex.
Final homework: out today, due next Tuesday
Optional Project
Updated: 3pm 12/2
12/4 lecture27.pdf lecture27.pptx 45 Context-free grammar parsing: left-corner and LR parsing.
12/9 lecture28.pdf lecture28.pptx 4 Class presentations.

