Much of the material in this handout is technical, and I don't expect you to know how to prove any of the results mentioned here. We will be focusing on the linguistic significance of the material, and it is that that I want you to try to understand.

Recall that a (formal) language *L* is any subset of S*, where S is a finite vocabulary of symbols. The powerset Ã of S* is the set of all subsets of S*, i.e. set of all languages over S. A class ** CL** of languages is any subset of Ã(S*), or member of Ã(Ã(S*)). We consider here the

The smallest infinite class of languages in the Chomsky hierarchy is the class *RL *of **regular languages**. These are the languages that can be represented by **regular expressions**.

Let *R1* be the regular language represented by the regular expression (*ab*cb**)+. In any string *r* of *R1* there is at least one **dependency** between an *a* and a *c*, which we can describe in this way: Following any occurrence of *a* in *r*, a *c* must occur, and between that *a* and that *c* only *b*s can occur. An example is *a _{1}b^{4}c_{1}ba_{2}b^{9}c_{2}b^{15}a_{3}c_{3}b*

a_{1}b^{4}c_{1}ba_{2}b^{9}c_{2}b^{15}a_{3}c_{3}b^{6}

│ │ │ │ │ │

└───┘ └───┘ └──┘

Notice that the distance between any two dependent elements in *R1* can be arbitrarily large or small (e.g., between *a _{2}* and

Let *R2* be the regular language (*he thinks | they think*)* *I stink*). The dependencies in *R2* between *he* and *thinks* and between *they *and *think* are serial and unbounded, so if *R2* is part of English, then some set of English sentences exhibits unbounded serial dependencies.

Pumping lemma for regular languages (PLRL)

* PLRL: *Let

The primary use for PLRL is to prove that certain languages are not regular. Here I show how to use it to prove that the palindrome language *P* ={*ocô *: *o Î *(*a | b*)* and *ô = x *backwards} = {*c*, *aca*, *bcb*, *a ^{2}ca^{2}*,

The next larger class of languages in the Chomsky hierarchy is the class *CFL *of **context-free languages**. Every regular language is also a context-free language, but the converse is not true. The language *P* above is a context-free language, but is not a regular language. We call the class *SCFL* of context-free languages that are not regular languages **strictly** context-free languages (i.e. *SCFL* = *CFL* - *RL*). Unlike regular languages which can be expressed by regular expressions, there is no class of "expressions" for context-free languages. They are defined only by the class of grammars that generate them. These grammars are presented in the context-free languages handout.

In the string *a _{1}a_{2}b_{1}a_{3}b_{2}cb_{2*}a_{3*}b_{1*}a_{2*}a_{1}*

a_{1}a_{2}b_{1}a_{3}b_{2 }c_{ }b_{2}a_{3}b_{1}a_{2}a_{1}

│ │ │ │ │ │ │ │ │ │

│ │ │ │ └──┘ │ │ │ │

│ │ │ └──────┘ │ │ │

│ │ └──────────┘ │ │

│ └──────────────┘ │

└──────────────────┘

Clearly there are nesting dependencies in natural languages, as in the English sentence *the cats that the dog chases howl*, in which the dependency between *the dog* and *chases* nests within the dependency between *the cats* and *howl*. (The strings **the cats that the dog chase howl*, **the cats that the dog chases howls*, and **the cats that the dog chase howls* are all ungrammatical in English.) The question then arises whether natural languages have unbounded nesting dependencies. The answer depends on judgments like whether the string *the cats that the dog that the horse that the zebras admire frightens chases howl* is grammatical, whereas the strings in which at least one of the following replacements is made are ungrammatical: *admires* for *admire*, *frighten* for *frightens*, *chase *for *chases*, and *howls* for *howl*.

* PLCFL*: Let

Again the major utility of the PLCFL is to show that certain languages are not context free. An example is the language *I* = *a ^{i}b^{i}c^{i}* = {

Context-free languages are closed under intersection with regular languages. Proof of this theorem is also in Hopcroft & Ullman (1979). That is, if *F* Î *CFL* and *R* Î *RL*, the language *F'* = *F Ç R* Î *CFL*. This result can also be used together with *PLCFL* to show that certain languages are not context free, for example the language *C* = {*xx *: *x* Î (*a | b*)+} = {*aa*, *bb*, *a ^{2}a^{2}, abab*,

Context-free languages are not closed under intersection. For example, let *I1* = *a ^{i}b^{i}c^{j}* and

The next larger class of languages in the Chomsky hierarchy is the class *CSL* of **recursive languages**, often called **context-sensitive languages**. These are the languages for which there is a decision procedure for determining whether an arbitrary string does or does not belong to the language. *CFL Ì CSL*. For example, the languages *C* and *I* described above are in *CSL* but not in *CFL*. *CSL* is closed under intersection with *RL*, just as *CFL* is.

Linguists are not interested in the class *CSL* as a whole, but only with a proper subclass *IXL* that includes *C*, *I* and *J* together with the *CFL*s, called **indexed** (or **mildly context-sensitive**) **languages**. We may talk more about this class of languages (which was discovered after the Chomsky hierarchy was established) and grammars for generating them later on in this course. The class *SIXL* of **strictly** indexed languages is *IXL - CFL*.

In *a _{1}a_{2}b_{1}a_{3}b_{2}a_{1*}a_{2*}b_{1*}a_{3*}b_{2}*

a_{1}a_{2}b_{1}a_{3}b_{2 }a_{1}a_{2}b_{1}a_{3}b_{2}

│ │ │ │ │ │ │ │ │ │

└─┼─┼─┼─┼─┘ │ │ │ │

└─┼─┼─┼───┘ │ │ │

└─┼─┼─────┘ │ │

└─┼───────┘ │

└─────────┘

Figure 3. Crossing dependencies in *a _{1}a_{2}b_{1}a_{3}b_{2}a_{1*}a_{2*}b_{1*}a_{3*}b_{2*}*

If unbounded reduplication is grammatical in some language, then the reduplicated strings would show unbounded crossing dependencies. In almost all cases, reduplication is bounded, so that the crossing dependencies they give rise to are bounded. However, Culy (1985) has argued that Bambara (a Northwestern Mande language spoken in Mali) exhibits a kind of unbounded reduplication, and the *A-not-A* construction of Chinese may be another instance.

The largest computable class of languages is the class *REL *of **recursively enumerable languages**. These are the languages for which there is a decision procedure for determining for each string in the language that it is in the language, but not necessarily for determining for each string not in the language that it is not. *CSL* Ì *REL*. The class *SREL *of **strictly** recursively enumerable languages *REL - CSL* is the class of **undecidable languages**. The study of undecidable languages is an interesting branch of mathematics, but not directly relevant to linguistics.