This is the third post in the agreement series. Reading the previous posts (part I, part II) is recommended for the general context, but as long as you know GF and some basic linguistics, the examples in this post are understandable.
In the first post, we learned that many languages mark the core argument(s) like subject, object and indirect object in the verb inflection.
In the second post, we learned that some languages also mark the addressee. Then I argued that register can also be thought of as agreement: the argument that is marked is the situation itself.
In this post, I’m going to generalise a bit further still. If an orthographical word in any category depends on other elements in the tree, I’ll call that agreement. The rest of this post is dedicated to a single example: the contraction of adpositions, pronouns and the negation particle in Somali. In the Somali resource grammar, we deal with these contractions in GF just like we deal with any other agreement.
I’ll start by showing 3 examples from Saeed (1999) pages 39 and 110, and Nilsson (2022) page 127.
The following pair from Saeed (1999, p. 110) shows a single adposition u ‘for’, introducing the oblique argument Cali ‘Ali’.
In the English translation, we have a prepositional phrase for Ali. In Somali, the construction is discontinuous, with the NP Cali at the start of the sentence, and the adposition u later, before the verb.
Discontinuity in itself is not a hard problem in GF: thanks to the record syntax, linearisation rules can add strings to a record, and postpone building the phrase until more features are known. However, what makes Somali adpositions a challenging exercise for GF is their obligatory contractions with other parts of speech.
The next example, Saeed (1999 p. 39), shows two adpositions merging.
Again the English translation features two prepositional phrases as continuous constituents: ‘in this way’ and ‘for Farah’. In Somali, the noun phrases sidan ‘this way’ and Faarax ‘Farah’ appear at the beginning of the sentence, and the two adpositions u and u merge into one orthographic word, ugu, before the verb.
The final example (Nilsson 2022, p. 127) features an adposition ka, the object pronoun i and the negation particle ma, all in a single orthographic word igama.
Note that the adposition ka on its own has meanings such as ‘from’ or ‘about’. In this expression, it is part of the compound adposition ka dul[1] ‘over’, where the ka component merges with the object pronoun and the negation at the beginning of the sentence, and dul appears before the infinitive verb.
From the systematic listing in Saeed (1999) pp. 38–41, we count at least 80 distinct combinations, many of which feature nontrivial assimilations and metathesis, and are therefore best stored as full forms, not combined from smaller pieces.
Neither Saeed nor Nilsson covers all of the possible combinations, so it is unclear whether the missing combinations do not contract, or are just unlikely to occur together in a single verbal group. For instance, there is no mention of combining the first item on the list, an impersonal subject pronoun and an object pronoun (4 distinct forms), with one or more adpositions. We assume the reason is that such sentences were not attested in a corpus.
However, later examples in Saeed (1999) reveal forms that are missing from the systematic listing on pages 38–41, such as impersonal subject pronoun + reflexive pronoun. So in addition to the explicitly listed 80 forms, I have included in the GF implementation all forms I could find elsewhere in the two sources, which brings the total number to 88 unique forms.
This is still not a complete list, but based on the source material, it should cover the bulk of the combinations appearing spontaneously. Later you’ll see the params that are the LHS of the inflection tables, and you can count that there are many more than 88 forms. Those not found in the sources are at the moment linearised as "???". This is very unsatisfactory, and if you have a more complete resource, get in touch with me!
The negation particle ma can also appear with these 88+ combinations, but its addition is purely concatenative, so I have chosen to attach the negation in a separate step, using run-time gluing with the BIND token.
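A minimal sketch of that separate step, assuming a Boolean polarity flag and a helper called addNeg. Both are hypothetical names for illustration, not taken from the Somali resource grammar; BIND is the gluing token from GF's Prelude.

```gf
resource NegSketch = open Prelude in {
  oper
    -- Hypothetical helper: glue "ma" onto an already selected contraction.
    -- BIND asks the run-time to print its neighbours without a space,
    -- so "iga" ++ BIND ++ "ma" comes out as the single token "igama".
    addNeg : Bool -> Str -> Str = \isNeg,contr ->
      case isNeg of {
        True  => contr ++ BIND ++ "ma" ;
        False => contr
      } ;
}
```

To try it in the GF shell, import the file with `i -retain NegSketch.gf` and evaluate `cc addNeg True "iga"`. Note that this naive version would glue ma onto whatever token precedes it when the contraction is the empty string, so a real grammar needs to treat that case separately.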
These 88+ forms are incorporated into an inflection table in GF.
The inflection table is built out of three parameter types: Adposition, AdpCombination (adpositions + impersonal subject pronoun) and AdpObjAgr (all object pronouns that contract).
First, we introduce a parameter for single adpositions, called Adposition.
param
  Adposition = NoAdp | U | Ku | Ka | La ;
This parameter Adposition is present in lexical categories, such as Prep, Adv, V2 and others that may introduce a nominal argument with an adposition.
The lack of adposition, NoAdp, is included as an explicit value, corresponding to a direct object.
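As a rough illustration of what "present in lexical categories" can look like, here is a sketch of a linearisation type for Prep. The record fields s and adp, the constructor mkPrep and the two lexical entries are assumptions made for this example, not the actual definitions in the Somali resource grammar.

```gf
resource LexSketch = {
  param
    Adposition = NoAdp | U | Ku | Ka | La ;

  oper
    -- Hypothetical lincat for Prep: the contracting adposition is stored
    -- as a parameter, any non-contracting part stays a plain string.
    Prep : Type = {s : Str ; adp : Adposition} ;

    mkPrep : Str -> Adposition -> Prep = \str,a -> {s = str ; adp = a} ;

    -- Simple adposition u 'for': nothing but the parameter.
    for_Prep : Prep = mkPrep [] U ;

    -- Compound adposition ka dul 'over': ka goes into the contraction,
    -- dul is kept as a separate string to be placed before the verb.
    over_Prep : Prep = mkPrep "dul" Ka ;
}
```

The point of the design is that the adposition is never stored as a string at the lexical level: only the parameter travels up the tree, and the string is chosen once all contributing elements are known.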
The second parameter, called AdpCombination, lists the combinations of adpositions and impersonal subject pronouns.[2]
They are represented as one parameter, because both of these components can combine with object pronouns. This was a natural way to cluster the parts of speech: object pronouns vs. everything else.
param
  AdpCombination =
      Single Adposition      -- 0-1 adpositions (0 = NoAdp)
    | ImpersSubj Adposition  -- impersonal subject + 0-1 adpositions
    | Ugu | Uga | Ula        -- two adpositions (6 distinct forms)
    | Kaga | Kula | Kala ;
The combinations of two adpositions exhibit syncretism: there are only 6 distinct forms, because some of the forms cover many combinations. AdpCombination is present in phrasal categories where the full information of all objects and obliques and negation is still open.
Referring to the RGL category hierarchy, Adposition appears in V2*, V3, N2, A2, Prep, Adv, IAdv, and AdpCombination in VP, VPSlash and ClSlash.
The third parameter, AdpObjAgr, lists all possible object pronouns that combine with the AdpCombination values, to produce the 88+ distinct forms.
param
  AdpObjAgr =          -- NB. separate from verbal agreement
      Sg1Obj
    | Sg2Obj
    | Pl1Obj Inclusion
    | Pl2Obj
    | ReflexiveObj
    | ZeroObj ;        -- i.e. the AdpCombination value on its own
The last value is a zero morpheme, used for third person objects, to add no extra segment into the final contraction.
Combining an AdpCombination with a zero object just returns the combination itself: ZeroObj + Ugu returns ugu, whereas Sg1Obj + Ugu returns iigu.
AdpObjAgr appears in the categories where a nominal object or oblique argument can be added: VP, VPSlash, ClSlash and Adv. Note also that the AdpObjAgr parameter is separate from the parameter that affects the form of the finite verb of the clause. The category NP records the full agreement Agr, and AdpObjAgr is computed from Agr whenever an NP becomes the object of a VP or an Adv.
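The conversion from Agr to AdpObjAgr could look roughly like the sketch below. The post does not show the definition of Agr or of Inclusion, so the constructors used here (Sg1, Sg3 Gender, Excl, Incl and so on) and the function name agr2objAgr are stand-ins invented for the example.

```gf
resource ObjAgrSketch = {
  param
    Inclusion = Excl | Incl ;   -- assumed constructor names
    AdpObjAgr =
        Sg1Obj | Sg2Obj | Pl1Obj Inclusion | Pl2Obj
      | ReflexiveObj | ZeroObj ;

    -- Hypothetical, simplified stand-in for the NP's full agreement.
    Gender = Masc | Fem ;
    Agr = Sg1 | Sg2 | Sg3 Gender | Pl1 Inclusion | Pl2 | Pl3 ;

  oper
    -- Only 1st and 2nd person objects contribute a segment to the
    -- contraction; every 3rd person collapses into the zero morpheme.
    agr2objAgr : Agr -> AdpObjAgr = \agr ->
      case agr of {
        Sg1      => Sg1Obj ;
        Sg2      => Sg2Obj ;
        Pl1 incl => Pl1Obj incl ;
        Pl2      => Pl2Obj ;
        _        => ZeroObj
      } ;
}
```

ReflexiveObj is deliberately not produced by this mapping, since it does not follow from the person and number of an ordinary NP; in this sketch it would have to be set by whichever function builds a reflexive object.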
The inflection table is of type AdpObjAgr => AdpCombination => Str and it produces all combinations, within the limits of what data I could find.
allContrs : AdpObjAgr => AdpCombination => Str = table {
  Sg1Obj => table {
    Single NoAdp => "i" ;        -- just the object pronoun
    Single Ka => "iga" ;
    ...
    ImpersSubj NoAdp => "lay" ;  -- impers.subj + obj.pron
    ...
    Kala => "igala"              -- object pronoun + Ka + La
  } ;
  ...
  ZeroObj => table {
    Single NoAdp => "" ;         -- P3 direct object = ∅
    Single Ka => "ka" ;          -- just the adposition
    ...
    ImpersSubj NoAdp => "la" ;   -- just impers. subj
    ImpersSubj U => "loo" ;      -- impers. subj + adposition
    ...
    Kala => "kala"
  }
} ;
In the table, we see values from an empty string (for ZeroObj + Single NoAdp, i.e. third person direct object), up to combinations of 3 elements, such as igala for Sg1Obj + Kala (which originally comes from Ka + La).
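To make the lookup concrete, here is a cut-down version of the table that compiles on its own. It contains only forms quoted in this post and sends everything else to the "???" placeholder; the name miniContrs, the two example constants and the constructors of Inclusion are assumptions made for the sketch.

```gf
resource ContrSketch = {
  param
    Adposition = NoAdp | U | Ku | Ka | La ;
    AdpCombination =
        Single Adposition
      | ImpersSubj Adposition
      | Ugu | Uga | Ula
      | Kaga | Kula | Kala ;
    Inclusion = Excl | Incl ;   -- assumed constructor names
    AdpObjAgr =
        Sg1Obj | Sg2Obj | Pl1Obj Inclusion | Pl2Obj
      | ReflexiveObj | ZeroObj ;

  oper
    -- Miniature table: a handful of attested forms, "???" for the rest.
    miniContrs : AdpObjAgr => AdpCombination => Str = table {
      Sg1Obj => table {
        Single NoAdp => "i" ;
        Single Ka    => "iga" ;
        Ugu          => "iigu" ;
        Kala         => "igala" ;
        _            => "???"
      } ;
      ZeroObj => table {
        Single NoAdp => "" ;
        Single Ka    => "ka" ;
        Ugu          => "ugu" ;
        Kala         => "kala" ;
        _            => "???"
      } ;
      _ => table { _ => "???" }
    } ;

    -- Selection with ! picks one full form out of the table.
    example1 : Str = miniContrs ! Sg1Obj  ! Single Ka ;  -- "iga"
    example2 : Str = miniContrs ! ZeroObj ! Ugu ;        -- "ugu"
}
```

Storing full forms means that no string operation has to model the assimilations and metathesis: the table is a pure lookup from two parameters to a string.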
Below is a part of the rules on how to combine two adpositions into a combination parameter.
combine : Adposition -> Adposition -> AdpCombination ;
combine adp1 adp2 = case <adp1,adp2> of {
  <U, U|Ku>      => Ugu ;
  <U, Ka>        => Uga ;
  <U, La>        => Ula ;
  <Ku|Ka, Ku|Ka> => Kaga ;  -- 4 combinations, same form
  <Ku, La>       => Kula ;
  <Ka, La>       => Kala ;
  <NoAdp, p>     => Single p
} ;
Thanks to the syncretism of the contractions, we get away with a much smaller parameter type, which means better performance in the GF code.
Suppose that a VP inherits the adposition Ku from a V2 and the adposition Ka from an Adv. Instead of storing the pair <Ku,Ka>, the combine function merges them into a single AdpCombination value Kaga.
As explained previously, lexical categories have an Adposition field, which is elevated to AdpCombination in phrasal categories.
Below are some examples of how the values are updated at the application of different functions.
UseV : V -> VP elevates an intransitive verb into a VP, and inserts a Single NoAdp as the VP’s AdpCombination. This value can still be updated, if an adverbial is added to the VP.
PassV2 : V2 -> VP makes a V2 into a passive VP, for which the corresponding construction in Somali is to use an impersonal subject: one saw me for I was seen. The function PassV2 elevates the V2’s inherent Adposition into an AdpCombination with the constructor ImpersSubj.
A V2* -> VP function uses the constructor Single to wrap the verb’s Adposition into an AdpCombination.
AdvVP : VP -> Adv -> VP adds an adverbial into a VP. This process updates the VP’s AdpCombination with the Adposition of the adverb, with rules similar to those shown previously, but this time taking an Adposition and an AdpCombination.
The same happens in questions: QuestIAdv : IAdv -> Cl -> QCl updates the Cl’s AdpCombination with the IAdv’s Adposition, and stores the result into the newly constructed QCl.
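The update performed by AdvVP and QuestIAdv can be sketched as a function from an AdpCombination and an Adposition to a new AdpCombination. The name addAdp and its fallback cases are assumptions for this example. The combine function repeats the pairs listed earlier, plus one extra case and a catch-all that are only there to make the sketch compile: the post elides the remaining pairs, and the catch-all does not claim to give the right form for them.

```gf
resource AddAdpSketch = {
  param
    Adposition = NoAdp | U | Ku | Ka | La ;
    AdpCombination =
        Single Adposition
      | ImpersSubj Adposition
      | Ugu | Uga | Ula
      | Kaga | Kula | Kala ;

  oper
    combine : Adposition -> Adposition -> AdpCombination = \adp1,adp2 ->
      case <adp1,adp2> of {
        <U, U|Ku>      => Ugu ;
        <U, Ka>        => Uga ;
        <U, La>        => Ula ;
        <Ku|Ka, Ku|Ka> => Kaga ;
        <Ku, La>       => Kula ;
        <Ka, La>       => Kala ;
        <NoAdp, p>     => Single p ;
        <p, NoAdp>     => Single p ;    -- assumption: mirror of the case above
        _              => Single adp1   -- PLACEHOLDER for pairs elided in the post
      } ;

    -- Hypothetical: add one more adposition to an existing combination.
    addAdp : AdpCombination -> Adposition -> AdpCombination = \comb,adp ->
      case <comb,adp> of {
        <_, NoAdp>            => comb ;             -- nothing to add
        <Single a, _>         => combine a adp ;    -- 0-1 adpositions so far
        <ImpersSubj NoAdp, _> => ImpersSubj adp ;   -- impersonal subject + 1 adposition
        _                     => comb               -- PLACEHOLDER: no value in the
                                                    -- param type for this state
      } ;
}
```

With these definitions, a VP that carries Single Ku and receives an Adv with Ka ends up with Kaga, which is the scenario described above. The last branch of addAdp silently drops the new adposition, so a real grammar would need a better strategy for the states that the parameter type cannot represent.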
Below is a GF parse tree for igama dul boodi kartid ‘you cannot jump over me’, where the concrete strings are shown in black Times New Roman, and the internal parameters in colourful Courier font with dotted lines from the lexical categories. The string igama is only linearized when all three contributing parameters are known: negation from Pol, adposition from Prep and object pronoun from Pron.
All these arguments propagate their inherent parameter up to the S level, at which time the inflection table allContrs is consulted.
The VP didn’t contain any other adposition, so the full value of the AdpCombination became Single Ka.
The Adv’s object NP was built out of i_Pron, which contributed the AdpObjAgr value Sg1Obj.
Finally, the negation morpheme ma was glued onto the iga that came from the inflection table, and the full contraction was linearized into a single token igama.
[1] Contrast with the sentence “I can jump over it” where the adposition doesn’t contract, because the object is 3rd person (i.e. zero morpheme) and the sentence is positive.
[2] Due to gaps in data, the impersonal subject pronoun can’t combine with two adpositions: such a state is simply not represented in the param type. Ideally, the ImpersSubj constructor would combine with one of the 6 combinations of 2 adpositions (and an object pronoun at a later step). I do allow the combination impersonal subject + one adposition + object pronoun, but some of the forms have a dummy linearisation, because I couldn’t find a form.