Full Schemas and Categoricity

What I intend to do here is introduce the notion of a schema from scratch, complete with a formalized logic, and prove that a suitable version of arithmetic is in a suitable sense categorical. Today will be the first time that I have presented such a result by itself, instead of along with allied results about other logics. I have tried to restrict myself to the case at hand, but if some unnecessary generality seems to have crept in, it probably has: I've failed to expunge it in adapting earlier more general accounts.

Full Schemas

J. H. Harris—and, following him, McGee—says that the rules of logic are open ended—they apply not just in the present language but in any extension of it, whether that extension has been envisioned or not.

We do in fact take our logical rules to be open ended: Consider, for example, the rule $\phi\land\psi\vdash\phi $ or, for that matter, its informal counterpart in natural language. We do not stop to reëvaluate the rule when moving to an expanded vocabulary or even when introducing modalities—all the new formulas are automatically to be allowed into the rule. As McGee puts it for the case of reductio ad absurdum:

We do not accept reductio ad absurdum because we have surveyed the forms of expression found in English and found that its expressive power is circumscribed in such a way as to validate the rule.
Thus, for example, the rule $\phi\land\psi\vdash\phi $ of a formalized logic in a language ${\mathcal L}$ is not, if the rule is to reflect our ordinary usage,
if $ \phi\land\psi$ is a formula of ${\mathcal L}$, then from $\phi\land\psi$ infer $\phi$,
but rather,
if $ \phi$ and $ \psi$ are any formulas whatsoever, then from $ \phi\land\psi$ infer $\phi$.

The notion of 'any formula whatsoever' is, of course, hopelessly vague. I see no reason to provide a precise account here, and indeed some reason not to: since our formalism is going proxy for natural language, we should surely avoid unnecessary commitment to a particular semantics. That is all very well, but how can we live with rules as imprecise as the one just suggested? The rule expresses a very precise idea: It is used to express a commitment that however we expand or modify our language in the future, we shall always continue to uphold certain principles, in this case, the conjunction elimination principle. Even imprecise rules have clear cases of application, and we shall only need the rules of logic and mathematics in clear cases in this work. Given that we shall only be interested in clear cases, we have no present reason to worry about the ambiguity of the formulation.

The "open ended" rules of logic are clearly in a certain sense very general, but what is the nature of that generality? We surely do not, if at all possible, wish to understand

\[\phi\land\psi\vdash\phi\]
(or its natural language counterpart) along the lines of
\[(\forall \phi )(\forall \psi )(\phi\land\psi\vdash\phi ).\]
It is not even clear that the universally quantified "formula" is well formed: the turnstyle '$\vdash$' is not a part of the language, but a symbol that is used to specify a rule—from the antecedent, infer the consequent.

If the formula is not well formed, it is surely not an appropriate way in which to understand the relevant generality. But even if we brush such niceties aside, expressing the generality of the open-ended rules of logic using quantification will still be deeply problematic. To use such a formalism would be to employ the formalism of logic in introducing that very formalism—an undesirable circularity that can be avoided. One might even consider it to be worse than a circularity, since it requires the comparatively sophisticated notion of quantification even to introduce sentential logic.

To make the circularity more stark, consider how we would apply the rule universal universal specification: $^{\text{footnote}}$ $^{\text{footnote}}$

\[(\forall \phi )(\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau ))\qquad{\text{UUS}}\]
to show that, say,
\[(\forall u)P(u)\vdash P(c)\]
is a valid inference. We would first apply UUS to UUS (circularity!) with $\phi = (\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau )),$ $x=\phi $, and $\tau =P(x)$ to obtain
\[(\forall \phi )(\forall x)(\forall \tau )((\forall x)\phi \vdash \phi (\tau ))\vdash(\forall x)(\forall \tau )((\forall x)P(x) \vdash P (\tau )).\]
We would then apply MP (Actually, meta--modus ponens, since '$\vdash$' is not in the language. But we have already agreed to put such considerations aside.) to UUS and the formula immediately above to yield
\[(\forall x)(\forall \tau )\,((\forall x)P \vdash P (\tau )),\]
knocking off the first quantifier ($(\forall \phi )$), and then apply UUS twice more. We would have to apply UUS to use the rule UUS. The point is entirely parallel to one made by Quine in "Truth by Convention." There is nothing inconsistent about describing what we mean by universal specification in this circular way, but it eliminates the possibility of telling any straightforward story about how quantification could be introduced or learned, and it ignores the existence of a pre-quantificational mechanism that has generally been thought to exist and to play a genuine role in our reasoning: the schema.

The schema is another, more primitive, form of generality that will do the job of introducing logical rules without any attendant circularity: we can take $\phi\land\psi\vdash\phi $ and the other logical rules to be schemas used to declare that any substitution instance is valid. In that case, $\phi $ and $\psi $ are schematic variables, and to say that the schema is open ended is to declare that the variables are what I have elsewhere called full schematic variables: 'full' is added to indicate that what counts as an acceptable substitution instance is open ended and automatically expands as the language in use expands.

We may use the following example to see how a schema differs in conception from a universally quantified formula, even an implicitly universally quantified formula. The sentence $(\forall x)x=x$ is a truth bearer, and it is indeed true. In contrast, the schema $x=x$, with $x$ a schematic variable is not a truth bearer at all, though it does deserve some sort of honorific, since it can be used to declare that all of its suitable instances are true. The free schematic variable in a schematic formula, like an ordinary free variable, prevents the formula from being a truth bearer. The schematic variable, moreover, cannot be regarded as a free referential variable: it does not take on values and has no domain. To assert that a formula involving a schematic variable is an axiom schema does not commit us to any true axioms involving a schematic—or any other kind of—free variable: Rather, it commits us to closed instances of the schema as axioms.

The idea that schemas are involved in the basis of logic has been endorsed by Quine and schemas are arguably employed by Russell—his typical ambiguity—and Hilbert. Full schemas are used by Feferman (who introduced them as early as 1977, in discussing Gödel-type independence phenomena, by Tyler Burge in his theory of truth, by McGee in his theory of "How we learn mathematical language," and by Field, Parsons, and I—we have all taken mathematical induction to be a full schematic principle. In each case, full schemas are used to codify the idea that certain principles are open ended—that we are committed to upholding them even through additions to and modifications of our language. The same applies to the axiom of replacement of set theory, as I shall argue below.

The case against assimilating schematic generality to that of referential or substitutional quantification is strongest in the case of a full schema: Full schemas, which are the ones of chief interest to us here, are explicitly required to allow new instances as our language expands, and so the very idea of a full schema is incompatible with the notion of a fixed domain or substitution class.

Quantificational variables are only auxiliaries for quantification: formulas with free variables are not truth bearers. One might therefore wish to avoid the notion that anything is a consequence of a formula with a free variable. One can, for example, infer $0=0$ from $(\forall x)x=x$ as well as from $x=x$, and so perhaps one might wish to argue that the schematic version of $x=x$ is really just an alternative notation for the more familiar universally quantified sentence, perhaps construing the hypothesized implicit quantifier as referential, perhaps as substitutional. That cannot be made to work for several reasons.

A schematic variable cannot be viewed as a free variable, whether substitutional or referential, one that could be bound by a quantifier, since schematic and quantifiable variables have different inferential roles: If $x$ is a schematic variable, one can infer $0=0$ from $x=x$, but that is not so if $x$ is a quantifiable variable—in that case in some proof systems the formula cannot appear in a proof at all, while in those that do allow it, the inference is valid only if $x$ does not occur free in any of the premises of the argument. No such proviso is required in the case of a schematic variable.

Schematic variables are more closely allied to substitutional than to referential quantification: When we give a schema that is not full, we typically restrict the schematic variable to all terms or formulas in a language, not to a range of all objects or properties in a domain. Intuitively, however, no schema has a domain of application like the substitution class of a substitutional quantifier—a schema isn't used to make a general assertion about a class of possible instances. Instead, a schema provides a general method of obtaining particular assertions about suitable instances without any need in advance of a clear notion of all the suitable instances. Thus, for example, one can infer $(\forall n)\phi (n)$ from $\phi (0)$ and $(\forall n)(\phi (n)\rightarrow \phi (Sn))$, but not, if $n$ is a schematic letter, from $\phi (0)$ and $\phi (n)\rightarrow \phi (Sn)$, since the schema, rather than making an assertion about all numbers, which is what is required to reach the conclusion, provides a mechanism for making assertions about particular numbers.

To summarize, schematic variables are neither referential nor substitutional variables. Schematic generality has important preformal differences from the generality of substitutional universal quantification. Finally no universally quantified sentences can have the same consequences as does a suitable corresponding full schema.

Let us now look briefly at how schematic variables can be restricted to terms that refer to objects in a particular domain. Just as in the case of referential variables, we usually leave such restrictions implicit. If a referential quantification over $x$, $(\forall x)\phi$ or $(\exists x)\phi$, is restricted to a domain $\bf D$, we can make that explicit by using $(\forall x)(D(x)\rightarrow\phi)$ and $(\exists x)(D(x)\land\phi)$ in place of the original formulations, where I have used '$D$' as a predicate symbol with extension $\bf D$. Similarly, a schema $\phi$ may include a schematic letter $s$ that we intend to restrict to terms that denote objects in $\bf D$. We can then write $D(s)\rightarrow\phi$ to make the restriction explicit. When a referential variable $x$ is restricted to a domain $\bf D$ and a schematic variable $s$ is restricted to terms that denote objects in $\bf D$, we obtain

\[(\exists x)x=s,\]
or, when we make the restrictions explicit,
\[D(s)\rightarrow(\exists x)(D(x)\land x=s).\]

Full schemas codify the open-ended principles that will make it possible to provide categorical axiomatizations of the natural numbers. We therefore discuss the formalism of axiom schemas in sufficient detail to provide the background for a categoricity proof. We shall only need to employ schemas in one or two first-order schematic variables and two or three second-order schematic variables, and we reserve the letters $s$, $t$, $P$, $Q$ and $R$ for the purpose. We allow them to have subscripts so that in principle we have infinitely many schematic variables of each order. In our applications the schematic variables will have various numbers of substitutable places. We leave it to the reader to determine the number of substitutable places of a schematic variable by counting the number of terms enclosed in parentheses immediately following it. In this work we shall always allow extra places for parameters. For a fully general treatment, we would have to specify infinitely many schematic variables of each order and number of places.

For any term or formula $ \phi $, variables $x_{1},\dots ,x_{n}$, and terms $t_{1},\dots ,t_{n}$, let $\phi \left[\!\!\left[\!{{x_{1},\dots ,x_{n}}\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$ denote the result of simultaneously replacing every free occurrence of every $x_i$, $i=1,\dots ,n$, in $\phi$ by $t_i$ and renaming bound variables as necessary to prevent collisions with the (free) variables of the terms $t_{1},\dots ,t_{n}$. Given a schematic formula $\Phi ({\bf P})$, $n$ variables $x_{1},\dots ,x_{n}$, and a formula $\phi$, where $\bf P$ is a first- or second-order schematic variable that has $n$ substitutable places, we define $\Phi\left[\!\!\left[\!{{\bf P}\atop{\hat{x}_{1}\cdots\hat{x}_{n}}\phi \!\right]\!\!\right]}$ or, usually, but less precisely, $\Phi(\hat{x}_{1}\cdots\hat{x}_{n}\phi )$, to be the formula that results from $\Phi ({\bf P})$ when, for every list of $n$ terms $t_{1},\dots ,t_{n}$, every occurrence of ${\bf P}(t_{1},\dots ,t_{n})$ in $\Phi ({\bf P})$ is replaced by $\phi \left[\!\!\left[\!{{x_{1},\dots ,x_{n}}\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$, changing bound variable in $ \Phi $ as necessary to prevent collisions with the free variables of $\phi\left[\!\!\left[\!{{x_{1},\dots ,x_{n}}\atop{t_{1},\dots ,t_{n}}}\!\right]\!\!\right]$ that are not among the (free) variables of the terms $t_{1},\dots ,t_{n}$. (For example, $ (\forall x) (\exists y)P(x)\left[\!\!\left[\!{{P}\atop{\hat{z}(\exists x) R(x,z,y)}}\!\right]\!\!\right] = (\forall x) (\exists v)(\exists u) R(u,x,y)$, where $u$ replaced $x$ to avoid a collision in $ (\exists x) R(x,z,y)\left[\!\!\left[\!{{z}\atop{x}}\!\right]\!\!\right]$ and $v$ replaced $y$ to avoid a collision in $ (\forall x) (\exists y)P(x)\left[\!\!\left[\!{{P}\atop{ (\exists u) R(u,x,y)}}\!\right]\!\!\right]$.) The resulting formula may have free variables. We take them to be implicitly universally quantified.

For any language $\mathcal L$, we introduce the $\mathcal L$ substitution rule: from any formula $\Phi ({\bf P})$ in which $\bf P$ has $n$ substitutable places, infer $\Phi (\hat{x}_{1}\cdots\hat{x}_{n}\phi )$, where ${x_{1},\dots ,x_{n}}$ are any $n$ variables and $\phi$ is any term of $\mathcal{L} $ if $\bf P$ is a first-order schematic variable and any formula of $\mathcal L$ if $\bf P$ is a second-order schematic variable.

We now associate a full schematic theory with any base theory $T_{0}$ in a language $\mathcal L$. The base theory $T_{0}$ may include some formulas with schematic variables in them. We do not regard a theory as a set of sentences in a language $\mathcal L$ as is usual, but instead as a function from languages ${\mathcal L}^{+}\supset{\mathcal L}$ to sets of axioms and rules. If the schematic variable $\bf P$, whether of first or second order, has $n$ substitutable places, let $\overline{x}$ stand for some fixed $n$-tuple of distinct variables. The full schematic theory associated with $T_0$ over $\mathcal L$, $T^{(+)}$, associates with each ${\mathcal L}^{+}\supset{\mathcal L}$ the theory with axioms $T_{0}$ and with the ${\mathcal L}^{+}$-substitution rule.

When $T_{0}$ is empty, we refer to the corresponding theory as full schematic logic.

Let us define what it is for a structure to be a model of a full schematic theory in line with the redefinition of the notion of a theory.

Let $\phi $ be a full schematic formula in a language ${\mathcal L}$. An assignment $s$ satisfies the formula $ \phi $ with respect to full schematic logic in a structure $ {\mathfrak A} $} if $ {\mathfrak A} $ is a structure for a language including ${\mathcal L}$, $s$ is an assignment for the structure $ {\mathfrak A} $, and if $s$ satisfies in the familiar sense every schematic-variable-free deductive consequence with respect to full schematic logic of $\phi $ in any language ${\mathcal L}^{+}\supset{\mathcal L}$ in any expansion of $ {\mathfrak A}$ to a structure for a language including ${\mathcal L}^{+}$.

Let $T $ be a theory in a language ${\mathcal L}$. A structure ${\mathfrak A} $ is a model of the full schematic theory $T$ if every assignment for ${\mathfrak A}$ satisfies every formula in $T$ with respect to full schematic logic.

To say that any expansion has a certain property is not the universally quantified statement that every expansion has the property—that would require a notion of the class of all possible expansions, a notion to which I am not entitled. It is rather a full schematic metalinguistic statement that will let us infer, whenever we are given an expansion, that it has the property. That is a perfectly clear and formally precise account of a constraint on how we may expand our language. It can be used to represent our metaphysical commitments concerning models of a theory, that is for describing what properties we are committed to any model maintaining even as we revise our language.

ASIDE: The models of vagueness I introduced in the vagueness seminar yesterday were for a fixed language, but it would be routine to modify them to allow different languages at different leaves. In such a model of vagueness, each full scheme would have each linguistic object of suitable kind (that is, of the right order and number and obeying whatever restrictions are imposed by the schema) of the total language as a potential substituend at each node. Of course, that would include different items at different nodes. I do not pursue the details further because typical cases of adding new items to the language that are of interest for our purposes are not cases in which it is vague whether a given item is part of the language, but cases of genuine change—additions to the language that are new and represent an expansion of our linguistic resources. Such innovation is orthogonal to the issue of vagueness.

The present definition is the correct one in the present setting. When a theory is defined, not for a single language, but for a single language and all of its expansions, it only seems natural to require not only that a model of the theory in the original language satisfy the theory, but also that expansions of the model continue to satisfy the theory in the corresponding expanded languages.

Peano Arithmetic with Primitive Recursion

Though primitive recursion is not a part of the axiomatic basis of Peano arithmetic, it is central to another standard arithmetic theory, namely, primitive recursive arithmetic. The first theory I wish to consider is a kind of amalgam of Peano arithmetic and primitive recursive arithmetic. The fact that primitive recursive arithmetic is a standard codification of what many have considered to be the most fundamental and basic properties of the natural numbers serves nicely to make the point that taking primitive recursion to be a fundamental part of the basis of arithmetic is not an ad hoc trick in defense of categoricity, but simply a recognition of the importance in that context of a long and deeply held view.

The language of our schematic theory PAPR$^{(+)}$ (full schematic Peano arithmetic with primitive recursion), which will be a first-order theory with two schemas, will be $\{0,S\}$ plus a functor $\mathbb P$ (where the symbol is chosen arbitrarily) that takes two formulas of our language plus four variables to a relation symbol of our language—we give a more complete specification below. Since we take the functor itself to be part of the language, relation symbols formed using it will be relation symbols of the language and, if we expand the language with new symbols then new relation symbols formed using them in conjunction with the functor will also become relation symbols of the augmented language.

The axioms will be the successor axiom of Peano arithmetic, the induction schema, and an additional schema, the primitive recursion schema:

\[({\mathbb P}(P,Q)(0,w)\leftrightarrow P(w))\land ({\mathbb P}(P,Q)(Sx,z)\leftrightarrow (\exists y) ({\mathbb P}(P,Q)(x,y)\land Q(x,y,z))).\]
I must explain how to substitute $\hat{w}\phi$ for $P$ and $\hat{x}\hat{y}\hat{z}\psi $ for $Q$ in ${\mathbb P}(P,Q)(t_{1},t_{2})$, where $w$, $x$, $y$, and $z$ are any variables, $\phi$ and $\psi$ are any formulas, and $t_1$ and $t_2$ are any terms. For any formula $\theta$, let $\Fv (\theta)$ be the set of free first-order variables in it. The relation symbol ${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi)$ has $n+2$ places, where $n$ is the number of variables in
\[p=(\Fv (\phi )-\{ w\})\cup (\Fv (\psi )-\{ x,y,z\} ).\]
The atomic formula that results from the indicated substitution into ${\mathbb P}(P,Q)(t_{1},t_{2})$ is ${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi )(t_{1},t_{2},\overline{x})$, where $\overline{x}$ is the variables (parameters) in $p$ listed in alphabetical order. We interpret the $\mathcal L$-substitution rule to apply to substitutions in the newly extended sense and extend it to apply to both $P$ and $Q$, and we similarly extend the definitions that employ the rule.

Since we are taking the functor $\mathbb P$ to be in the base language, we shall be able to prove theorems about relation symbols constructed from $\mathbb P$ by induction. In particular, we shall be able to prove that if there is a unique $w$ such that $\phi (w)$ and for every $x$ and $y$ there is a unique $z$ such that $\psi (x,y,z)$, then ${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi )(x,y,z)$ is a well-defined function—I omit showing the parameters here and frequently below. Thus, we can define addition by

\[x+y=z&\leftrightarrow{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(y,z,x)\]
and multiplication by
\[x\cdot y=z&\leftrightarrow{\mathbb P}(\hat{w}w=0,\hat{x}\hat{y}\hat{z}{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(u,z,y))(y,z,x).\]

PAPR$^{(+)}$ has the conceptual advantage of employing a uniform primitive recursion schema instead of treating each primitive recursive definition separately. That, for example, makes it possible to prove in schematic form that if there is a unique $w$ such that $\phi (w)$ and for every $x$ there is a unique $y$ such that $\psi (x,y)$, then ${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\psi )(x,y)$ is a well-defined function, instead of proving that separately for each primitive recursive definition.

The fragment of PAPR$^{(+)}$ in the fixed language

\[\{0,S,{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z),{\mathbb P}(\hat{w}w=0,\hat{x}\hat{y}\hat{z}{\mathbb P}(\hat{w}w=u,\hat{x}\hat{y}\hat{z}Sy=z)(u,z,y))\}\]
(where the last two relation symbols are the primitive recursive definitions of addition and multiplication given above is definitionally equivalent to PA, that is, that fragment and PA share a common extension by definitions. I shall therefore say, speaking somewhat loosely, that PA is included in PAPR$^{(+)}$. Analogously, PA$^{(+)}$ is included in PAPR$^{(+)}$.

The theory PAPR$^{(+)}$ does not include primitive recursive arithmetic for the simple reason that primitive recursive arithmetic is formulated in terms of function symbols instead of relation symbols. It is, however, a straightforward exercise to show that primitive recursive arithmetic is included in an extension by definitions of PAPR$^{(+)}$, which I take to mean that PAPR$^{(+)}$ incorporates primitive recursive arithmetic.

It is a deeper and more surprising fact, due to Gödel, that all of PAPR$^{(+)}$ restricted to the base language is included in an extension by definitions of PA, and hence that PAPR$^{(+)}$ restricted to that language is in effect merely a somewhat redundant formulation of PA. It is presumably for that reason that axiomatic theories like PAPR are not familiar. Nonetheless, PAPR is an axiomatization that employs only core ideas about the natural numbers.

When we consider adding a new unary relation $R$ to the base language of PAPR$^{(+)}$ things become more interesting. (We consider adding a unary relation for definiteness—the same kinds of considerations would apply for any relation or function.) The theory PAPR$^{(+)}$ automatically adds the requisite new instances of both the primitive recursion and induction schemas when the language is expanded. Since we are adding the new symbols into the induction schema, we must establish that the addition is acceptable. I will offer two arguments for that. The first argument is direct: The use of primitive recursion in concert with new relations and even across domains without any presumption of a background theory is such a well-established part of traditional mathematical practice that no extra-mathematical objection to it could be tenable. It just is a part of standard mathematical practice that we do take arbitrary primitive recursive definitions to be unexceptionable, and we allow them into the induction schema freely. Moreover, that mathematical practice long antedates the idea of a background set theory, which provides some additional support for the common view that the practice does not require the presupposition of a background set theory.

The second argument is that primitive recursive definitions of relations are acceptable because they are canonically equivalent to positive inductive definitions, which I shall argue below are acceptable on independent grounds.

The theory PAPR$^{(+)}$ in the language formed by adding the new predicate $R$ to the base language has an extension by definitions that has as parts both PA$^{(+)}$ and the theory of functions primitive recursive in $R$. The proofs are straightforward extensions of the proofs of the comparable results without the new predicate $R$. Those facts provide an additional argument that the full theory PAPR$^{(+)}$ is the best codification of our preformal intuitions concerning the natural numbers. It is the theory that extends in the way that reflects our unreflective expectations: Anyone who understands primitive recursion automatically understands primitive recursion in additional relations. The idea is not so much a new one as an automatic extension of the old one. It is the theories PAPR$^{(+)}$ that reflect that reality.

Perhaps surprisingly, PAPR$^{(+)}$ is not an essentially new theory: it can be obtained as an extension by definitions of PA$^{(+)}$ by straightforward application of Gödel's method.

It is only when we consider the relativized theory PAPR$^{(+)U}$ that the reason for considering PAPR$^{(+)}$ and not just PA$^{(+)}$ becomes apparent. The theory PAPR$^{(+)U}$ is PAPR$^{(+)}$ with the quantifiers in the axioms and schemas relativized to a unary predicate $U$. Unrelativized formulas are allowed as substituends for the schematic letters in the substitution rule. In addition, we take PAPR$^{(+)U}$ to have the formulas $U(0)$ and $(\forall x)(U(x)\rightarrow U(Sx))$ as axioms. (The need for the additional axioms could be avoided by formulating the theory without constant or function symbols in the first place. Comparable axioms for addition and multiplication are not needed: they are theorems proved by induction.) The two arguments mentioned earlier for believing the primitive recursion schema to be an acceptable addition to PAPR$^{(+)}$ are equally arguments for believing the primitive recursion schema to be an acceptable addition to PAPR$^{(+)U}$, where, of course, the schema is now taken to be relativized to $U$.

The raison d'être of PAPR$^{(+)U}$ is that it makes it possible to consider the natural numbers in conjunction with other systems. For example, consider the natural numbers along with a group. The axioms may be taken to be PAPR$^{(+)U}\cup{\text{G}}^{U'}$, where ${\text{G}}$ is the axioms for the group. Natural number exponentiation is

\[{\mathbb P}(\hat{w}w=e,\hat{x}\hat{y}\hat{z}y\times u=z)(x,z,y),\]
and existence and uniqueness on the appropriate domains is readily proved by induction. With PAPR$^{(+)U}$ it is not necessary to add anything new to get exponentiation. Gödel's method does not apply here, since it does not yield codes for sequences of members of a group, and so PAPR$^{(+)U}$ is truly stronger than PA$^{(+)U}$ in the sense that for an arbitrary theory $T$ the theory PAPR$^{(+)U}\cup T^{U'}$ need not be an extension by definitions of PA$^{(+)U}\cup T^{U'}$. $^{\text{footnote}}$

The theory PAPR$^{(+)U}$ is my candidate for a codification of our basic preformal intuitions concerning the natural numbers. I shall not attempt to give here all my reasons for taking such primitive recursion to be central, resting content with noting that many such functions have historically been employed without the special notice that would have been required had they been essentially new (for example, natural number exponentiation of fractions, polynomials, and complex numbers and iteration of functions) and that those same functions are introduced today in secondary schools without any idea that a new method or procedure is being invoked: it is taken for granted, I believe quite rightly, that such procedures are an integral part of what has already been taught about the natural numbers. Moreover, the procedures are acceptable because they are positive inductive definitions—see below.

Now that I have shown that PAPR$^{(+)U}$ is a natural, independently motivated theory of the natural numbers not introduced in an ad hoc way to solve the Skolem problem, we can see how it helps with that problem:

Theorem. Let $T$ be the theory in the language ${\mathcal A}^{2}\cup\{{\mathbb P},{\mathbb P}'\}$ that is the union of the theory PAPR$^{(+)U}$ and the primed version PAPR$^{(+)U}{}'$ of the very same theory. Then there is a binary relation symbol $I$ in the language such that the following formulas are theorems of $T$:

  1. $(\forall x)(U(x)\rightarrow (\exists y)(U'(y)\land (\forall z)(I(x,z)\leftrightarrow z=y)))$
    $\quad$ ($I$ is a function from $U$ to $U'$),
  2. $(\forall x)(U'(x)\rightarrow (\exists y)(U(y)\land (\forall z)(I(z,x)\leftrightarrow z=y)))$
    $\quad$ ($I$ is one-to-one and onto),
  3. $I(0,0')$,
  4. $(\forall x)(\forall x')(\forall y)(\forall y')(I(x,x')\land I(y,y')\rightarrow Sx=y\leftrightarrow S'x'=y')$,
  5. for any variables $w$, $x$, $y$, and $z$ and formulas $\phi$ and $\psi$ that are obtained from formulas of the base language of PAPR by bounding all quantifiers by $U$,
    \[(\forall x_{1})(\forall x'_{1})\dotsm (\forall x_{n})(\forall x'_{n})(I(x_{1},x'_{1})\land\dots\land I(x_{n},x'_{n})\]
    \[ \rightarrow{\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi)(x_{1}\dots x_{n})\leftrightarrow{\mathbb P}'(\hat{w}\phi ',\hat{x}\hat{y}\hat{z}\psi ')(x'_{1}\dots x'_{n})),\]
    where $n$ is the number of places of ${\mathbb P}(\hat{w}\phi ,\hat{x}\hat{y}\hat{z}\psi )$ and $\phi '$ and $\psi '$ are the primed versions of $\phi$ and $\psi$.

In other words, in any model of $T$, $I$ is an isomorphism between the unprimed and primed models of PAPR.

The theorem should be compared with work of Parsons, who noted that one can prove two models of arithmetic isomorphic via a primitive recursively defined isomorphism. The $I$ of the theorem is

\[{\mathbb P}(\hat{w}w=0',\hat{x}\hat{y}\hat{z}z=S'y).\]

The theorem does not require adding any symbols or definitions to the combination of the theory with the primed version. The theory PAPR$^{(+)U}$ provides a class of functions, the primitive recursive functions, that can be defined between the natural numbers and other structures without a background notion of an extension. Thus, one can hold that we can successfully characterize the natural numbers with a fully formal theory, PAPR$^{(+)U}$, without any need for the ill-defined notion of acceptability of expansions of a language.

Why primitive recursion?

The very notion of a full schematic theory involves systematically adding new axioms to an antecedently given theory. There is, of course, no formal obstacle to adding any axioms whatsoever to any theory whatsoever. The main question is when the resulting theory will be satisfiable—when it will have a model, and in the case of full schematic theories, when it will have a model in our new specially defined sense. That question has a well-known answer in the case of ordinary first-order theories: by the completeness theorem, a necessary and sufficient condition for a theory to be satisfiable is that it be consistent. That condition is not sufficient for full schematic theories: for example, the theory ${\text{PA}}^{(+)}+\lnot{\text{Con(PA)}}$, where Con(PA) is the sentence asserting the consistency of PA used in Gödel's second incompleteness theorem, is (if PA is consistent) consistent but it has no models. Here is the easy proof: A model in the ordinary first-order sense of this theory must be nonstandard, since PA is consistent, but PA$^{(+)}$ has no nonstandard models by the categoricity theorem.

I do not know of a nontrivial necessary and sufficient condition for a full schematic theory to be satisfiable, certainly not one as straightforward as simple consistency, but there is a useful general sufficient condition. There is also an additional sufficient condition useful for some special cases, including PA.

My main interest in full schematic theories has been their use in characterizing mathematical domains. The full schematic principles of interest concern characteristics of the domain that survive the arbitrary addition of new relations. It is therefore natural to ask when the addition of a theory $A$ to a full schematic theory $T^{(+)}$ adds new relations without changing the domain. I shall call such an addition an acceptable addition. For example, adding $\lnot{\text{Con(PA)}}$ to a theory of arithmetic is not acceptable, because satisfying it may require adding new elements (nonstandard numbers) to the domain.

If an addition to a full schematic theory is an acceptable addition, then any intended model of the full schematic theory can be expanded to a model of the addition. Thus, if a full schematic theory is satisfiable, so is the theory plus any acceptable addition.

The condition is sufficient, but it is surely not necessary. For example, large cardinal axioms are not acceptable additions to set theory—the whole point is that they add things to the domain. Nonetheless, there may well be good reason to allow adding on large cardinal axioms.

We can give the notion of an acceptable addition a mathematically precise explication as follows: an addition $A$ to a theory $T$ is acceptable with respect to a class of models $\Gamma$ if every model of $T$ in $\Gamma$ has an expansion to a model of $A$. Actually, we shall need a slightly more general definition to allow additions with free variables for parameters: an addition $A$ to a theory $T$ is acceptable with respect to a class of ordered pairs of the form $\langle {\mathfrak A},s\rangle$, where ${\mathfrak A}$ is a model and $s$ is an assignment for ${\mathfrak A}$, if every pair $\langle {\mathfrak A},s\rangle$ in $\Gamma$ is such that if ${\mathfrak A}$ is a model of $T$ then ${\mathfrak A}$ has an expansion to a model in which $s$ satisfies $A$.

In the case of PA$^{(+)}$, the intended model is a smallest model. It is isomorphic, in a suitable sense, to a substructure of every structure that is a model of PA$^{(+)}$ with respect to a fixed language. That follows from the following theorem:

Theorem. Let $T$ be a theory that is complete for atomic sentences (that is, if $\sigma$ is an atomic sentence then either $\sigma$ or $\lnot\sigma$ is a consequence of $T$). Then, with respect to a fixed language, any model of $T$ in which every member of the domain is denoted by a closed term is a smallest model of $T$.

The theory PA, and hence PA$^{(+)}$, satisfies the hypothesis: The intended model is one in which every member of the domain is denoted by a closed term. The atomic sentences are all variable-free equations, and PA suffices to prove all the true ones and to disprove all the false ones.

Proof: Let $T$ be a theory and $\mathfrak A$ be a model as in the hypothesis of the theorem. Let $\Delta$ be the set of all atomic and negation atomic sentences that are consequences of $T$. Then $\mathfrak A$ is a model of $\Delta$, and $\Delta$ is a maximal consistent set of atomic sentences. Thus, $\Delta$ is the atomic diagram of $\mathfrak A$. Since every model of $T$ is a model of $\Delta$, it follows that $\mathfrak A$ is isomorphic to a substructure of every model of $T$. $\square$

Since PA$^{(+)}$ has a smallest model, it will be useful to show that any universal theory (that is, any set of formulas of the form $(\forall \overline{x})\phi$, where $\phi$ is free of quantifiers) is an acceptable addition to any theory $T^{(+)}$ with respect to the class of smallest models of $T^{(+)}$ in a fixed language if $T^{(+)}+A$ is consistent:

Theorem. Suppose that some theory $T$ has a smallest model $\mathfrak A$. Then ${\mathfrak A}$ can be expanded to a model of any universal theory $A$ that is consistent with $T$.

Proof: Since $T+A$ is consistent, it has a model $\mathfrak B$. Since $\mathfrak A$ is a smallest model of $T$, we may as well assume that $\mathfrak A$ is a substructure of the reduct of $\mathfrak B$ to the language of $\mathfrak A$. The substructure of $\mathfrak B$ with universe the universe of $\mathfrak A$, call it ${\mathfrak A}^+$, is thus a model of $T$. Since ${\mathfrak A}^+$ is a submodel of $\mathfrak B$ and $\mathfrak B$ is a model of $A$, it follows that ${\mathfrak A}^+$ is a model of $A$—the theory $A$, being universal, has the submodel property. $\square$

By the Matijasevi$\v{\text{c}}$--Robinson--Davis resolution of Hilbert's tenth problem, there is a universal sentence $\sigma$ in the language of PA that can be proved in PA to be equivalent to Con(PA). Since $\sigma$ is universal, ${\text{PA}}^{(+)}+\sigma$ is satisfiable, by the theorem. The preceding considerations now give us a justification for our intuitive preference of PA+Con(PA) over PA$+\lnot$Con(PA): By the theorem, if PA is consistent, PA$^{(+)}+\sigma$, and therefore PA$^{(+)}+$Con(PA), is satisfiable in the base language, and therefore, by the categoricity result, PA$^{(+)}+\lnot$Con(PA) is not.

Next, we show that one can always add certain inductive definitions of new relations to any full schematic theory, that is, that such additions are acceptable additions to any full schematic theory with respect to the class of all ordered pairs of models and assignments for them.

I shall introduce the notion of an inductive definition following Moschovakis, though my terminology is perhaps closer to that of Aczel. An $n$-_ary operator_ $\Gamma$ on a set $A$ is a function from ${\text{power}} (A^{n})$ to ${\text{power}} (A^{n})$. An $n$-ary operator $\Gamma$ on $A$ is monotone if for all $X, Y\subseteq A^{n}$, if $X\subseteq Y$, then $\Gamma (X)\subseteq \Gamma (Y)$. A set $B\subseteq A^{n}$ is a fixed point of an $n$-ary operator $\Gamma$ on $A$ if $\Gamma (B)=B$.

Theorem. Every monotone operator $\Gamma$ has a fixed point, and indeed a least fixed point.

The proof is straightforward: Define, for any ordinal $\xi$, $I_{\Gamma}^{\xi}=\Gamma (\bigcup_{\eta <\xi }I_{\Gamma}^{\eta})$ and let $\kappa$ be the least ordinal whose predecessors form a set of cardinality the cardinality of $A$. Then $I_{\Gamma}^{\kappa ^{+}}$ is the least fixed point (compare Moschovakis).

Let $\mathfrak A$ be a structure for the language $\mathcal L$ with universe $A$, let $S$ be an $n$-placed relation symbol not in the language $\mathcal L$, let $\phi (x_{1}, \dots , x_{n},y_{1},\dots ,y_{m})$ be a formula in the language ${\mathcal L}\cup\{S\}$ with all free variables among those displayed, and let $a_{1}, \dots ,a_{m}$ be members of $A$. Then the $n$-ary operation $\Gamma _{\phi , a_{1}, \dots ,a_{m}}^{\mathfrak A}$ on $A$, the operation defined on $\mathfrak A$ by $\phi$ with parameters $a_{1}, \dots ,a_{m}$, is the operation that takes any $B\subseteq A^{n}$ to

\[\left\{\langle b_{1}, \dots ,b_{n}\rangle\in A^{n}:({\mathfrak A},B)\models\phi\left[\!{{x_{1},\dots , x_{n},y_{1}, \dots ,y_{m}}\atop{b_{1},\dots , b_{n},a_{1}, \dots ,a_{m}}}\!\right]\right\},\]
where $({\mathfrak A}, B)$ is the expansion of $\mathfrak A$ to the language ${\mathcal L}\cup\{ S\}$ in which the interpretation of $S$ is $B$ and where the material in square brackets indicates the obvious assignment of objects to variables. If $S$ occurs positively in $\phi$ (that is, if the connectives $\rightarrow$ and $\leftrightarrow$ do not occur in $\phi$ and if no occurrence of $S$ in $\phi$ is inside the scope of a negation sign), then