Syllabus:  Linguistics 478/578 (also Speech and Hearing Science 478/578)

Speech Technology, Fall 2007



Time:  Tuesday/Thursday 9:30-10:45
Place:  Social Sciences 224 (ICL)
Professor:  Ying Lin
Office hours:  T/Th 2:00-3:00, Douglass 305
Phone:  626-0678
Email:  yinglin@email.arizona.edu
Course webpage:  http://d2l.arizona.edu

Course description:
This course is an introduction to speech technologies for linguistics, speech science, and computer science students. The main focus will be speech synthesis and speech recognition. As time allows, we will also cover other speech technologies.  Speech technology is an active industry, and there is great potential both for people with knowledge of speech and people with knowledge of computers (or ideally both) to work in that industry.  The purpose of this course is to give you background that would be useful if you pursue work in the industry, or conduct research that uses speech technology.

Prerequisites:
Either a background in phonetics (such as Linguistics 314 or 515, or Speech and Hearing 267) or programming skills is required for this course.  (A background in both is desirable, but not required.)  The course will be accessible to students with knowledge of speech but no programming background.  It will also be useful for students with a strong programming background but no knowledge of speech.  There will be some readings on basic acoustic phonetics, and students with no previous experience with phonetics should read these as soon as possible.


Learning Objectives:
1. Describe characteristics of spoken language data with the correct linguistic terminology.
2. Carry out phonetic transcription with the aid of computer software.
3. Perform simple formant synthesis using digital source signals and filters.
4. Identify key problems in text-to-speech and speech recognition systems.
5. Use appropriate resources to obtain pronunciations from text data.
6. Create prototype recognition and synthesis systems by integrating off-the-shelf software tools.
7. Carry out calculations used in key algorithms of automatic speech recognition.
8. Identify sources of error in existing commercial TTS and ASR systems and possible solutions to those problems.


Readings:

There are a lot of readings for this course. Most reading materials will be drawn from the following three texts:

Rodman, R.D.  1999.  Computer speech technology.  Artech House. (at the bookstore)

This is a friendly, non-technical text that serves as a starting point for the material. However, we will often need to go beyond it and read chapters of the following two texts.

Jurafsky, D. and Martin, J. Speech and Language Processing, 2nd Edition (draft). Prentice-Hall.
Electronic version available from http://www.cs.colorado.edu/~martin/slp2.html


Taylor, P. Text-to-Speech Synthesis (draft).

This book is freely available from http://mi.eng.cam.ac.uk/~pat40/book.html


Other References:

These books, mostly written from an Electrical Engineering perspective, are also useful for students who wish to study the materials in depth (for example, as references for a term project):


Holmes, J., and Holmes, W. (2001). Speech Synthesis and Recognition, 2nd ed. Taylor and Francis.
Huang, Acero, and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall.
Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall.
Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall.
Quatieri, T. (2002). Discrete-Time Speech Signal Processing: Principles and Practice. Prentice-Hall.
Coleman, J. (2005). Introducing Speech and Language Processing. Cambridge University Press.



Software and Server:

Software used in this class includes, but is not limited to:


PRAAT: a free program for carrying out phonetic analysis on a computer. PRAAT can be downloaded from http://www.praat.org. A tutorial on how to use PRAAT is provided in D2L.

MATLAB: a programming language and interactive environment. MATLAB 7.0 is already installed on all ICL computers (PC). Instructions for accessing MATLAB on the U of A server from your home computer can be found in D2L.


Hidden Markov Model ToolKit (HTK): a speech recognition library with a set of command-line tools. You can download HTK (including its source) for free from http://htk.eng.cam.ac.uk/, but you need to register first. HTK compiles and runs under most major operating systems.


Festival: a free speech synthesis engine used in many commercial as well as non-commercial systems. You can download a free version here: http://festvox.org/festival/. Festival compiles and runs in most Unix-like environments (Linux, OS X, Cygwin, etc.).


Familiarity with Unix and access to a Unix machine are desirable. Each group will also be assigned an account on the HLT server: hlt.sbs.arizona.edu. Passwords will be distributed after the teams have been formed.


Course Requirements: 478
Homework assignments*    50%
Paper topic proposal    5%
Preliminary presentation    5%
Project progress report    5%
Final presentation    5%
Final team project    25%
Class participation    5%

* Some of the homework assignments will be carried out as part of group work.

 

Course Requirements:  578

Same as above, except that each item worth 5% above is worth 4% for 578, and a written review of a paper related to your project is also required and is worth 5%.
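As a quick check of the arithmetic, the two weighting schemes can be tabulated as below. This is only an illustrative sketch: the dictionary names and the label "Written paper review" are made up for the example, and the percentages are copied from the 478 list above.

```python
# Weights for 478, copied from the list above (percent of final grade).
weights_478 = {
    "Homework assignments": 50,
    "Paper topic proposal": 5,
    "Preliminary presentation": 5,
    "Project progress report": 5,
    "Final presentation": 5,
    "Final team project": 25,
    "Class participation": 5,
}

# For 578, every 5% item drops to 4% and a 5% written review is added.
weights_578 = {item: (4 if pct == 5 else pct) for item, pct in weights_478.items()}
weights_578["Written paper review"] = 5  # label is illustrative

total_478 = sum(weights_478.values())
total_578 = sum(weights_578.values())
print(total_478, total_578)
```

Both totals come to 100: the five items that drop from 5% to 4% free up exactly the 5% assigned to the written review.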

The homework assignments, progress report, presentations, and final project are required of all students.  Some homework assignments may also include additional questions which are required only of the students in 578, but which can be done by students in 478 for extra credit.


All students must read all of the required readings and be prepared to discuss them in class.  Questions on the readings will be included in the homework assignments.

All students should attend class every day except in cases of dire emergency or serious illness.  Attendance will not be taken, but you cannot get a good grade for participation without being here to participate.  If attendance becomes a problem, I reserve the right to give short pop quizzes and add these to the grading system, adjusting the percentages above as necessary.  If I feel a need to do this, the change will be announced in advance.

All assignments and the final report must be turned in by 5 PM on the day they are due.  Late assignments will be docked 10% of the possible grade per day late, unless you have a very good documented reason for the lateness.  Methods of submission will vary with the assignment.
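The late penalty can be worked through with a small sketch like the one below. The function name, the whole-day counting, and the floor at zero are assumptions made for illustration; the policy above only states the 10%-per-day rate.

```python
def late_score(earned, possible, days_late):
    """Score after docking 10% of the possible points per day late."""
    penalty = possible * 10 * days_late / 100
    # Flooring at zero is an assumption; the policy does not say what
    # happens once the penalty exceeds the points earned.
    return max(0.0, earned - penalty)


# An assignment worth 100 points, earning 85, turned in 2 days late.
print(late_score(85, 100, 2))
```

In this example the penalty is 20 points (2 days at 10% of 100), so the 85 becomes 65.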


Group Work:

The ability to work in teams with members of different backgrounds is crucial in a corporate environment. The team project also allows you to put together some of the ideas and tools that you have learned and apply them to a problem that you are interested in. Teams are formed by the instructor, based on the information sheet collected on the first day of class. All team members are expected to participate fully in all the group activities, including the appropriate homework assignments and the term project. Members of the team must have well-defined roles. Non-participating members of any team may be demoted or fired from the team.  Such a team member may be required to turn in his/her own homework/project, or may receive reduced or zero points for that item. Should you have a non-participating member in your team, please notify me immediately.


Each team must prepare a progress report, a preliminary presentation, a final presentation, and a final project (including the writeup, data, code, and documentation if necessary). The due dates for these items are included in the schedule. Presentation order is determined at random.  All teams must be ready to present on the first day of presentations.  All team members should be ready to address questions related to the project. No single person should hold the team's only copies of project materials, in case of an emergency. You must stay for all presentations. If you miss any presentations or leave early, points will be deducted.


Submission for the final group project must include an appropriate write-up, any code that you have written, and instructions about how to run your code. Poor documentation and lack of discussion of your results will result in reduced points for your team.

Approximate course schedule (subject to change)

Week of      Topic                                                   Requirements
8/21         Introduction, IPA and ARPA transcriptions               read Rodman Ch. 1, Ch. 7, J&M Ch. 7, Taylor Ch. 7
8/28         Acoustic phonetics                                      read Rodman Ch. 1, Ch. 7, J&M Ch. 7, Taylor Ch. 7
9/4          Representation and analysis of acoustic signal          read Rodman Ch. 2, Taylor Ch. 10, Ch. 12
9/11         Formant and LPC synthesis                               read Rodman Ch. 4.1-4.2, Taylor Ch. 13
9/18         Text cleanup and analysis                               read Rodman Ch. 4.4, J&M Ch. 8.1, Taylor Ch. 4, Ch. 5.1-5.4
9/25         Text-to-phoneme, Festival                               read J&M Ch. 8.2, Taylor Ch. 8
10/2         Prosody                                                 read J&M Ch. 8.3, Taylor Ch. 6. Preliminary presentations 10/4
10/9         Concatenative synthesis, PSOLA, unit selection          read J&M Ch. 8.4-8.5, Taylor Ch. 14, Ch. 16
10/16        Evaluation of synthesis, further issues                 read Rodman Ch. 4.5-4.8, J&M Ch. 8.6, Taylor Ch. 17
10/23        Intro to ASR, units, HTK                                read Rodman Ch. 6, Ch. 3.1-3.6, J&M Ch. 9.1, 9.2
10/30        Feature extraction, acoustic modeling                   read Rodman Ch. 3.7, J&M Ch. 9.3, 9.4. Progress report due 11/1
11/6         Hidden Markov models, Viterbi algorithm, training       read Rodman Ch. 3.8, J&M Ch. 6
11/13        Language modeling, errors, evaluation of ASR            read J&M Ch. 4, Ch. 9.6, 9.7
11/20        Variation, speaker identification                       read Rodman Ch. 5, Ch. 8, J&M Ch. 10.3, 10.5. No class on 11/22 (Thanksgiving)
11/27        Spoken language understanding, student presentations    read J&M Ch. 23. All teams should be ready to present on 11/29
12/4         Student presentations                                   Final project presentations
Finals week                                                          Project due Tuesday, 12/11, 5pm


 
We hope to have a guest speaker during the semester.  The date will be announced later, and the schedule adjusted accordingly.

Note
Appropriate academic behavior is expected: cheating and plagiarism are unacceptable, disruptive behavior in class is unacceptable, and the student code of conduct (http://info-center.ccit.arizona.edu/~studpubs/policies/studcofc.htm) must be followed.  Students are also expected to treat others in the classroom with respect.