Syllabus: Linguistics 478/578 (also Speech and Hearing Science 478/578)
Speech Technology, Fall 2007
Time: Tuesday/Thursday 9:30-10:45
Place: Social Sciences 224 (ICL)
Professor: Ying Lin
Office hours: T/Th 2:00-3:00, Douglass 305
Phone: 626-0678
Email: yinglin@email.arizona.edu
Course webpage: http://d2l.arizona.edu
Course description:
This course is an introduction to speech technologies for linguistics,
speech science, and computer science students. The main focus will be
speech synthesis and speech recognition. As time allows, we will also
spend some time on other speech technologies. Speech technology is an
active industry, and there is great potential both for people with
knowledge of speech and people with knowledge of computers (or ideally
both) to work in that industry. The purpose of this course is to give you
background that would be useful if you pursue work in the industry, or
conduct research that uses speech technology.
Prerequisites:
Either a background in phonetics (such as Linguistics 314 or 515 or
Speech and Hearing 267) or programming skills is required for this
course. (A background in both is desirable, but not required.) The
course will be accessible to students with knowledge of speech but no
programming background. It will also be useful for students with a
strong programming background but no knowledge of speech. There will be
some readings on basic acoustic phonetics, and students with no previous
experience with phonetics should read these soon.
Learning Objectives:
1. Describe characteristics of spoken language data with the correct
linguistic terminology.
2. Carry out phonetic transcription with the aid of computer software.
3. Perform simple formant synthesis using digital source signals and
filters.
4. Identify key problems in text-to-speech and speech recognition
systems.
5. Use appropriate resources to obtain pronunciations from text data.
6. Create prototype recognition and synthesis systems by integrating
off-the-shelf software tools.
7. Carry out calculations used in key algorithms of automatic speech
recognition.
8. Identify sources of error in existing commercial TTS and ASR systems
and possible solutions to those problems.
Readings:
There are a lot of readings for this course. Most reading materials
will be drawn from the following three texts:
Rodman, R.D. 1999. Computer speech technology. Artech House. (at the
bookstore)
This is a friendly, non-technical reading that serves as a starting
point for the materials. However, we will often need to go beyond this
text and read chapters of the following two texts.
Jurafsky, D. and Martin, J. Speech and Language Processing, 2nd Edition
(draft). Prentice-Hall.
Electronic version available from http://www.cs.colorado.edu/~martin/slp2.html
Taylor, P. Text-to-Speech Synthesis (draft).
This book is freely available from http://mi.eng.cam.ac.uk/~pat40/book.html
Other References:
These books, mostly written from an Electrical Engineering perspective, are also useful for students who wish to study the materials in depth (for example, as references for a term project):
Holmes, J., and Holmes, W. (2001). Speech Synthesis and Recognition,
2nd ed. Taylor and Francis.
Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to
Theory, Algorithm, and System Development. Prentice-Hall.
Rabiner and Juang (1993). Fundamentals of Speech Recognition.
Prentice-Hall.
Rabiner and Schafer (1978). Digital Processing of Speech Signals.
Prentice-Hall.
Quatieri, T. (2002). Discrete-Time Speech Signal Processing: Principles
and Practice. Prentice-Hall.
Coleman, J. (2005). Introduction to Speech and Language Processing.
Cambridge University Press.
Software and Server:
Software used in this class includes, but is not limited to:
PRAAT: free software for carrying out phonetic analysis on a computer.
PRAAT can be downloaded from http://www.praat.org. A tutorial on how to
use PRAAT is provided in D2L.
MATLAB: a programming language and interactive environment. MATLAB 7.0
is already installed on all ICL computers (PC). Instructions for
accessing MATLAB on the U of A server from your home computer can be
found in D2L.
Hidden Markov Model ToolKit (HTK): a package of speech recognition
libraries and command-line tools. You can download HTK (including its
source) for free from http://htk.eng.cam.ac.uk/, but you need to
register first. HTK compiles and runs under most major operating systems.
Festival: a free speech synthesis engine used in many commercial as
well as non-commercial systems. You can download a free version here:
http://festvox.org/festival/. Festival compiles and runs in most
Unix-like environments (Linux, OS X, Cygwin, etc.).
Familiarity with Unix and access to a Unix machine are desirable. Each
group will also be assigned an account on the HLT server:
hlt.sbs.arizona.edu. Passwords will be distributed after the teams have
been formed.
Course Requirements: 478
Homework assignments* 50%
Paper topic proposal 5%
Preliminary presentation 5%
Project progress report 5%
Final presentation 5%
Final team project 25%
Class participation 5%
* Some of the homework assignments will be carried out as part of group work.
Course Requirements: 578
Same as above, except that each item worth 5% above is worth 4% for 578,
and a written review of a paper related to your project is also
required and is worth 5%.
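As a quick check on the two weighting schemes above, the arithmetic can be sketched in Python. The weights come from the requirements list; the dictionary keys and the `weighted_grade` helper are illustrative names only, not part of any course system.

```python
# Weights for 478, taken from the requirements list (percent of final grade).
WEIGHTS_478 = {
    "homework": 50,
    "topic_proposal": 5,
    "preliminary_presentation": 5,
    "progress_report": 5,
    "final_presentation": 5,
    "final_project": 25,
    "participation": 5,
}

# For 578, every 5% item drops to 4% and a 5% paper review is added.
WEIGHTS_578 = {item: (4 if w == 5 else w) for item, w in WEIGHTS_478.items()}
WEIGHTS_578["paper_review"] = 5


def weighted_grade(scores, weights):
    """Combine per-item scores (each 0-100) into a final percentage."""
    return sum(scores[item] * w for item, w in weights.items()) / 100


# Both schemes should account for exactly 100% of the grade.
assert sum(WEIGHTS_478.values()) == 100
assert sum(WEIGHTS_578.values()) == 100
```

Five items move from 5% to 4% for 578, which frees the 5% assigned to the paper review.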
The homework assignments, progress report, presentations, and final
project are required of all students. Some homework assignments may
also include additional questions which are required only of the
students in 578, but which can be done by students in 478 for extra
credit.
All students must read all of the required readings and be prepared to
discuss them in class. Questions on the readings will be included in
the homework assignments.
All students should attend class every day except in cases of dire
emergency or serious illness. Attendance will not be taken, but you
cannot get a good grade for participation without being here to
participate. If attendance becomes a problem, I reserve the right to
give short pop quizzes and add these to the grading system, adjusting
the percentages above as necessary. If I feel a need to do this, the
change will be announced in advance.
All assignments and the final report must be turned in by 5 PM on the
day they are due. Late assignments will be docked 10% of the possible
grade per day late, unless you have a very good documented reason for
the lateness. Methods of submission will vary with the assignment.
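The late penalty can be illustrated with a short Python sketch. This is one reading of the rule, not an official grading script: it assumes whole days late, no documented excuse, and a score that cannot drop below zero. The function name `late_adjusted_score` is hypothetical.

```python
def late_adjusted_score(raw_score, possible, days_late):
    """Dock 10% of the possible points per day late.

    Assumptions (not stated in the syllabus): days_late is a whole
    number of days, and the adjusted score is floored at zero.
    """
    penalty = possible * days_late / 10  # 10% of possible points per day
    return max(0.0, raw_score - penalty)


# An assignment worth 50 points, scored 45, turned in 2 days late:
print(late_adjusted_score(45, 50, 2))  # 35.0
```

Note that the penalty is taken from the possible points, not from the score earned, so two days late on a 50-point assignment costs 10 points regardless of the raw score.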
Group Work:
The ability to work in teams with members of different backgrounds is
crucial in a corporate environment. The team project also allows you to
put together some ideas and tools that you have learned and apply them
to a problem that you are interested in. Teams are formed by the
instructor, based on the information sheet collected on the first day
of class. All team members are expected to participate fully in all the
group activities, including the appropriate homework assignments and
the term project. Members of the team must have well-defined roles.
Non-participating members of any team may be demoted or fired from the
team. A member who is fired may be required to turn in his/her own
homework/project or will receive reduced or zero points for that item.
Should you have a non-participating member in your team, please notify
me immediately.
Each team must prepare a progress report, a preliminary presentation, a
final presentation, and a final project (including the writeup, data,
code, and documentation if necessary). The due dates for these items
are included in the schedule. Presentation order is determined at
random. All teams must be ready to present on the first day of
presentations. All team members should be ready to address questions
related to the project. No single person should hold the team's only
copies of project materials, in case of an emergency. You must stay for
all presentations. If you do not stay for all presentations or have to
leave early, points will be deducted.
Submission for the final group project must include an appropriate
write-up, any code that you have written, and instructions about how to
run your code. Poor documentation and lack of discussion of your
results will result in reduced points for your team.
Approximate course schedule (subject to change)
| Week of | Topic | Requirements |
| 8/21 | Introduction, IPA and ARPA transcriptions | read Rodman Ch. 1, Ch. 7, J&M Ch. 7, Taylor Ch. 7 |
| 8/28 | Acoustic phonetics | read Rodman Ch. 1, Ch. 7, J&M Ch. 7, Taylor Ch. 7 |
| 9/4 | Representation and analysis of acoustic signal | read Rodman Ch. 2, Taylor Ch. 10, Ch. 12 |
| 9/11 | Formant and LPC synthesis | read Rodman Ch. 4.1-4.2, Taylor Ch. 13 |
| 9/18 | Text cleanup and analysis | read Rodman Ch. 4.4, J&M Ch. 8.1, Taylor Ch. 4, Ch. 5.1-5.4 |
| 9/25 | Text-to-phoneme, Festival | read J&M Ch. 8.2, Taylor Ch. 8 |
| 10/2 | Prosody | read J&M Ch. 8.3, Taylor Ch. 6. Preliminary presentations 10/4 |
| 10/9 | Concatenative synthesis, PSOLA, unit selection | read J&M Ch. 8.4-8.5, Taylor Ch. 14, Ch. 16 |
| 10/16 | Evaluation of synthesis, further issues | read Rodman Ch. 4.5-4.8, J&M Ch. 8.6, Taylor Ch. 17 |
| 10/23 | Intro to ASR, units, HTK | read Rodman Ch. 6, Ch. 3.1-3.6, J&M Ch. 9.1, 9.2 |
| 10/30 | Feature extraction, acoustic modeling | read Rodman Ch. 3.7, J&M Ch. 9.3, 9.4. Progress report due 11/1 |
| 11/6 | Hidden Markov models, Viterbi algorithm, training | read Rodman Ch. 3.8, J&M Ch. 6 |
| 11/13 | Language modeling, errors, evaluation of ASR | read J&M Ch. 4, Ch. 9.6, 9.7 |
| 11/20 | Variation, speaker identification | read Rodman Ch. 5, Ch. 8, J&M Ch. 10.3, 10.5. No class on 11/22 (Thanksgiving) |
| 11/27 | Spoken language understanding, student presentations | read J&M Ch. 23. All teams should be ready to present on 11/29 |
| 12/4 | Student presentations | Final project presentations |
| Finals week | | Project due Tuesday, 12/11, 5pm |
We hope to have a guest speaker during the semester. The date will be
announced later, and the schedule adjusted accordingly.
Note:
Appropriate academic behavior is expected: cheating and plagiarism are
unacceptable, disruptive behavior in class is unacceptable, and the
student code of conduct (http://info-center.ccit.arizona.edu/~studpubs/policies/studcofc.htm)
should be followed. Students are also expected to treat others in the
classroom with respect.