Constituent Structure

References: Feature Organization / Positional Features

One of the major areas in which CFS PAPPI differs from the older 2.x PAPPI series is the implementation of constituents. In the CFS implementation, constituents have the form:

[c(C,Fs,Pn,Pd,I),...]

That is, all constituents are encoded by non-empty lists. The first element of the list is used to name its category and also to store its associated features. Any sub-constituents will occupy positions starting from the second element of the list.
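
The list encoding can be taken apart with ordinary head-and-tail unification. The following is a minimal sketch only: the predicate names and the example constituent (category names, empty feature lists, zero bitsets) are hypothetical placeholders, not PAPPI primitives or real PAPPI output.

```prolog
% Hypothetical helpers, not PAPPI primitives.
% A constituent is a non-empty list: its first element is the c/5 term,
% and the remaining elements (possibly none) are the sub-constituents.
constituent_label([Label|_], Label).
constituent_daughters([_|Daughters], Daughters).

% The category is the first argument of the c/5 term.
constituent_category([c(C,_,_,_,_)|_], C).

% A made-up constituent with two sub-constituents.
% All field values here are placeholders chosen for illustration.
example_constituent([c(x1,[],0,0,_),
                     [c(a,[],0,0,_)],
                     [c(b,[],0,0,_)]]).
```

With these definitions, the query `example_constituent(X), constituent_category(X, C), constituent_daughters(X, Ds), length(Ds, N)` should bind C to x1 and N to 2.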

The first element of the list is a term c/5 whose five arguments are:

C is the category label, typically an atom.
Fs is the (non-positional) feature list. By default, these features are shared with all projections of the category. Other (positional) features are stored in Pn/Pd and require special handling and interpretation (see below).
Pn is the bitset encoding information about the internal structure of the constituent. These features are known, and are instantiated, as soon as the constituent is constructed by normal methods. The purpose of this bitset is to enable efficient navigation within the constituent. The features are as follows:

Bit      Weight/Pattern  Name      Value  Description
0 (LSB)  1/0000000_      proj      0      inactive, e.g. in primitive constituents
                                   1      active; see the next bits for information about the head
1        2/000000_0      adjoined  0      nothing is adjoined to this constituent
                                   1      there are two subconstituents: an adjunct and a lower segment
2-3      4/0000__00      head      0      1st (left) subconstituent is the head
                                   1      2nd subconstituent is the head
                                   2      3rd subconstituent is the head (currently unused)
                                   3      4th subconstituent is the head (currently unused)
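
As an illustration of how the Pn fields might be read, here is a small sketch using standard bitwise arithmetic. The predicate names are hypothetical, and the sketch assumes that bit 0 (the LSB) has weight 1, that the head field is the two-bit number stored in bits 2-3, and that every element after the c/5 term is a sub-constituent.

```prolog
:- use_module(library(lists)).   % for nth0/3

% Hypothetical decoders for the Pn bitset, not PAPPI primitives.
pn_proj(Pn)     :- Pn /\ 1 =:= 1.      % bit 0: projection is active
pn_adjoined(Pn) :- Pn /\ 2 =:= 2.      % bit 1: adjunct + lower segment

% Bits 2-3: 0-based index of the head among the sub-constituents.
pn_head_index(Pn, Index) :-
    Index is (Pn >> 2) /\ 3.

% head_daughter(+Constituent, -Head): pick out the head sub-constituent
% of a projecting constituent without searching the daughters.
head_daughter([c(_,_,Pn,_,_)|Daughters], Head) :-
    pn_proj(Pn),
    pn_head_index(Pn, Index),
    nth0(Index, Daughters, Head).
```

For example, Pn = 5 (binary 101) has proj set, adjoined clear, and a head field of 1, so head_daughter/2 would select the 2nd sub-constituent.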

Pd is the bitset encoding information about how the constituent attaches to higher constituents. Note that this information is not available at constituent construction time, and must be filled in when the constituent is attached to its parent. The features are as follows:

Bit      Weight/Pattern  Name            Value  Description
0 (LSB)  1/0000000_      lseg            0      constituent is not a lower segment
                                         1      constituent is a lower segment
1        2/000000_0      adjunct         0      constituent is not an adjunct
                                         1      constituent is an adjunct
2        4/00000_00      apos[1]         0      constituent occupies an A-bar-position
                                         1      constituent occupies an A-position
3        8/0000_000      apos[2]         0      bitset A-position field cannot be overridden
                                         1      field can be overridden, i.e. consult Fs for the true value
4        16/000_0000     compl           0      constituent occupies a non-complement position
                                         1      constituent occupies a complement position
5        32/00_00000     indirectObject  0      constituent occupies a non-indirect object position
                                         1      constituent occupies an indirect object position
6        64/0_000000     matrix          0      constituent occupies a non-matrix position
                                         1      constituent occupies a position in the matrix clause

(See Feature Organization for details on how access to these features is provided. See Positional Features for examples of how these features are used in PAPPI primitives.)
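
The Pd bits can be modelled in the same way. The sketch below is an assumption-laden illustration, not the access mechanism PAPPI actually provides: the predicate names are hypothetical, the atoms apos and apos_override are stand-ins for apos[1] and apos[2], bit 0 is assumed to have weight 1, and attachment is assumed to fill in Pd by unifying a still-unbound variable.

```prolog
% Hypothetical name-to-weight map for the Pd bitset (see the table above).
pd_weight(lseg,            1).
pd_weight(adjunct,         2).
pd_weight(apos,            4).   % stand-in name for apos[1]
pd_weight(apos_override,   8).   % stand-in name for apos[2]
pd_weight(compl,          16).
pd_weight(indirectObject, 32).
pd_weight(matrix,         64).

% pd_has(+Pd, ?Name): the named bit is set in Pd.
pd_has(Pd, Name) :-
    pd_weight(Name, W),
    Pd /\ W =:= W.

% pd_encode(+Names, -Pd): bitwise OR of the weights of the named bits.
pd_encode([], 0).
pd_encode([Name|Names], Pd) :-
    pd_encode(Names, Pd0),
    pd_weight(Name, W),
    Pd is Pd0 \/ W.

% attach_as(+Names, +Constituent): assumed attachment step that fills in
% the Pd argument, which is unbound until the parent is known.
attach_as(Names, [c(_,_,_,Pd,_)|_]) :-
    pd_encode(Names, Pd).
```

Under these assumptions, `pd_encode([apos, compl], Pd)` gives Pd = 20, and `findall(N, pd_has(20, N), Ns)` gives Ns = [apos, compl]. When the apos_override bit is set, a caller would still need to consult Fs for the true A-position value; that lookup is not sketched here.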

I is a unique, reserved variable. I should be used for comparison purposes only and should never be instantiated.
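
One plausible reading of "comparison purposes only" is that I should be tested with the identity operator ==/2 rather than unified with =/2, since unification would bind or alias the reserved variable. The predicate name below is hypothetical, and the sketch is not taken from the PAPPI sources.

```prolog
% same_constituent(+X, +Y): X and Y carry the very same I variable.
% ==/2 succeeds only if both sides are already identical and never binds
% anything; =/2 would instead unify the two variables, which amounts to
% instantiating I and must be avoided.
same_constituent([c(_,_,_,_,I1)|_], [c(_,_,_,_,I2)|_]) :-
    I1 == I2.
```

Two constituents that merely look alike, such as two separately built `[c(a,[],0,0,_)]` terms, are therefore not treated as the same constituent, because each has its own distinct I variable.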

Note that no upward "pointers" are provided in the definition: it is not possible to reference the parent of a constituent, and only sub-constituents can be accessed. However, the Pd bitset records how a given constituent is attached to its parent, positional information that would otherwise be unavailable without ancestor access.
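
To make this concrete, the following sketch walks a constituent strictly downward and collects every constituent that sits in a complement position, using only each constituent's own Pd value. The predicate names are hypothetical, and the sketch assumes that the compl bit has weight 16, that every element after the c/5 term is itself a constituent, and that an unattached constituent still has an unbound Pd.

```prolog
:- use_module(library(lists)).   % for append/3

% complements(+Constituent, -Cs): constituents at or below Constituent
% whose Pd marks a complement position. No parent is ever consulted.
complements(Self, Cs) :-
    Self = [c(_,_,_,Pd,_)|Daughters],
    complements_list(Daughters, Below),
    (   integer(Pd), Pd /\ 16 =:= 16   % compl bit set on this constituent
    ->  Cs = [Self|Below]
    ;   Cs = Below                     % not a complement, or not yet attached
    ).

% Collect the complements found under each daughter, left to right.
complements_list([], []).
complements_list([D|Ds], Cs) :-
    complements(D, Cs0),
    complements_list(Ds, Cs1),
    append(Cs0, Cs1, Cs).
```

The integer/1 guard keeps the arithmetic test from raising an error on a root constituent whose Pd has not been filled in.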
