This is the second post of the agreement series. If you already know what agreement means in linguistics, and you know some GF (topics introduced in lessons 1–4 of the GF tutorial), keep reading. Otherwise, I recommend starting with the first post.
Aside from an explicit vocative and imperative, we don’t really think of the addressee as an argument. Yet it is there—usually a speech act is directed to some audience. So this is a thing that happens in some languages.
The verb form for eat may or may not agree with I and bread—we mark that with the morpheme X. But crucially, in addition to the explicit arguments, it encodes you, the addressee. This raises a couple of questions:
I’m going to address the question with examples from Japanese and Basque.
There’s a funny passage from Surely You’re Joking, Mr. Feynman, about learning Japanese. (You can read the whole thing in the link.) Basically, Feynman learns that in order to translate a verb like to see, you have to know the whole context, who is talking to whom:
| | my garden | your (respectful) garden | your (extra respectful) garden |
|---|---|---|---|
| I see… | | “May I observe your gorgeous garden?” | “May I hang my eyes on your most exquisite gardens?” |
| You see… | “Would you like to glance at my lousy garden?” | | |
In English, these are just different sentences: different verbs (glance at, observe and hang one’s eyes on) and different modifiers on garden. But Japanese has more strategies for politeness. Like English, Japanese can swap a plain verb for a more polite variant, but it can also keep the same verb and just inflect it in a more polite way.
“How is this related to agreement? Isn’t there another term for this, like register?”
Yes, but a speaker chooses the register depending on their audience. So it’s ultimately the addressee, or rather the broader situation, that is encoded in the verb form.
“I think you’re stretching the definitions here.”
Save your complaints for the third post, where I’m going to argue that preposition contraction is agreement. Now let’s look at Japanese.
Japanese has been in the RGL since 2012. All credit for the actual code goes to Liza Zimina (link to paper). Any misunderstandings about Japanese morphosyntax are mine.
Here are two sentences, loosely inspired by Feynman’s examples, as rendered by the Japanese RG.
Lang> p "I want to see the garden" | l -lang=Jpn
私 は 庭 を 見たい です
watashi wa niwa o mitai desu
Lang> p "the teacher wants to see the garden" | l -lang=Jpn
先生 は 庭 を 見たがって います
sensei wa niwa o mitagatte imasu
In these two sentences, we see different forms of the verb 見る ‘to see’. Note that this difference is not person agreement! The form mitagatte imasu reflects that I should be more respectful when talking about other people, but I can spend fewer morphemes (mitai desu) when talking about myself.[1]
“But isn’t that the definition of person agreement? You conjugate the verb differently when the subject is yourself vs. other people.”
If you always spoke plainly about yourself and politely about others, and mixing the two was as wrong as English I wants or she want, then I too would call it person agreement. But that’s not the case. In a context where everyone speaks politely, it’s normal to refer to yourself politely.
“Why does the Japanese RG then have this distinction? Would you sometimes want to say watashi wa … mitagatte imasu?”
Some individual constructions are particularly weird to use on the wrong person. Wanting is one such case, as are giving and receiving. The majority of the verbs in the RGL lexicon inflect the same for different subjects, but since some verbs have this distinction, `VP` needs to have the structure for it.
Let’s look at the types of NP, VP and Cl. I have omitted parameters that are unrelated to politeness.
param
  Speaker = Me | SomeoneElse ;
  Style = Plain | Resp ;
lincat
  NP = {s : Style => Str ; speaker : Speaker ; … } ;
  VP = {s : Speaker => Style => … => Str ; … } ;
  Cl = {s : Style => … => Str ; … } ;
lin
  PredVP np vp = {
    s = \\style,… => np.s ! style ++
                     vp.s ! np.speaker ! style ! …
    } ;
There are two dimensions in this sociolinguistic puzzle: the speaker and the overall style. The Japanese resource grammar forces you to spend more morphemes to say what others want vs. what you want, but you can still choose the general style of your sentence. Here’s the table for “__ want(s) to see”.
| | Plain | Respectful |
|---|---|---|
| Me | 見たい ‘mitai’ | 見たいです ‘mitai desu’ |
| SomeoneElse | 見たがっている ‘mitagatte iru’ | 見たがっています ‘mitagatte imasu’ |
Luckily, it’s extremely easy to know if an NP is me or someone else: either it’s `UsePron i_Pron` or any other NP. So `PredVP` can choose the speaker, and `Cl` has only the overall style open. The overall style is chosen by explicit constructors, as we will see in the next section.
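To see the mechanism in isolation, here is a minimal toy grammar that I sketched for this purpose; it is not the code of the Japanese resource grammar. The module names `Want` and `WantJpn`, the function names, and the simplified lincats (for instance `NP.s` is a plain `Str` here) are all assumptions made for illustration. Only the four verb forms come from the table above.

```gf
-- File Want.gf: hypothetical toy abstract syntax, not part of the RGL
abstract Want = {
  flags startcat = Cl ;
  cat
    Cl ; NP ; VP ;
  fun
    PredVP : NP -> VP -> Cl ;
    I_NP, Teacher_NP : NP ;
    WantToSeeGarden_VP : VP ;
}

-- File WantJpn.gf: hypothetical toy concrete syntax
concrete WantJpn of Want = {
  param
    Speaker = Me | SomeoneElse ;
    Style = Plain | Resp ;
  lincat
    NP = {s : Str ; speaker : Speaker} ;   -- simplified: no Style in NP
    VP = {s : Speaker => Style => Str} ;
    Cl = {s : Style => Str} ;              -- only Style is still open
  lin
    -- The subject decides the Speaker value; Style is passed on upwards.
    PredVP np vp = {
      s = \\style => np.s ++ "は" ++ vp.s ! np.speaker ! style
      } ;
    I_NP = {s = "私" ; speaker = Me} ;
    Teacher_NP = {s = "先生" ; speaker = SomeoneElse} ;
    WantToSeeGarden_VP = {
      s = \\spk,style => "庭" ++ "を" ++ wantToSee ! spk ! style
      } ;
  oper
    -- The four forms from the table above.
    wantToSee : Speaker => Style => Str = table {
      Me => table {
        Plain => "見たい" ;
        Resp  => "見たい" ++ "です"
        } ;
      SomeoneElse => table {
        Plain => "見たがって" ++ "いる" ;
        Resp  => "見たがって" ++ "います"
        }
      } ;
}
```

With both files in the same directory, importing `WantJpn.gf` in the GF shell and running `l -table PredVP Teacher_NP WantToSeeGarden_VP` should list one string per `Style` value. The `Speaker` dimension is gone by then, because `PredVP` has already fixed it from the subject.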
We have seen two strategies for encoding politeness: swapping a plain verb for a more polite variant, and inflecting the same verb in a more polite way. The resource grammar covers the second strategy in `VP`, as a combination of `Speaker` and `Style`. The `Speaker` parameter makes a difference only in very specific constructions, like wanting or giving.
Just to be clear: I don’t think RGL should support different lexemes as inflection forms of the same verb, unless it’s a question of well established suppletion. A resource grammar should offer an API to syntactic structures and lexicon, not try to be too clever about usage.
If you need to use glance at and observe as if they belonged to the same verb, I recommend creating a custom type: a record with several `V*` fields, and a custom parameter to control when each of them is used.
To choose the forms of those verbs, see the next section.
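A custom type of that kind could look roughly like the sketch below. I use the English resource grammar only because the verbs in question are English; the module name `PoliteVerbSketch`, the parameter `Deference`, the type `DeferentialV2` and the helper `pickV2` are all hypothetical names, and which verb counts as modest or elevated is an arbitrary choice for illustration.

```gf
-- File PoliteVerbSketch.gf: a hypothetical application-level helper module
resource PoliteVerbSketch = open SyntaxEng, ParadigmsEng in {
  param
    -- Custom parameter that controls which lexeme is used (an assumption)
    Deference = Modest | Neutral | Elevated ;
  oper
    -- A record with several V2 fields, one per lexeme
    DeferentialV2 : Type = {modest, neutral, elevated : V2} ;

    see_DV2 : DeferentialV2 = {
      modest   = mkV2 (mkV "glance") (mkPrep "at") ;
      neutral  = mkV2 (mkV "see" "saw" "seen") ;
      elevated = mkV2 (mkV "observe")
      } ;

    -- Pick an ordinary V2, which can then go into mkVP, mkCl and so on
    pickV2 : Deference -> DeferentialV2 -> V2 = \d,v ->
      case d of {
        Modest   => v.modest ;
        Neutral  => v.neutral ;
        Elevated => v.elevated
      } ;
}
```

The point of the design is that the resource grammar never sees the custom type: `pickV2` returns a normal `V2`, so the lexeme choice stays in the application grammar.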
Speaker: The `Speaker` parameter only makes a difference in a few select constructions, and I don’t think you would ever want to override it.
Style: The `Style` parameter is open all the way up to `Utt`. The default style for `Phr` is respectful, chosen by the RGL API function `mkPhr`.
The plain style can be chosen with `ExtraJpn.StylePartPhr`, which has the type signature `Level -> Part -> PConj -> Utt -> Voc -> Phr`.
To use it in your grammar, you need to open `ExtraJpn` in your concrete syntax like this:
concrete TestJpn of Test = open SyntaxJpn,
                                (E=ExtraJpn),
                                (P=PhraseJpn) in {
  lincat
    MyUtt = Utt ;
    MyPhr = Phr ;
  lin
    -- : MyUtt -> MyPhr ;
    MyPolitePhr utt = mkPhr utt ; -- use the API function
    -- Arguments in the order Level -> Part -> PConj -> Utt -> Voc
    MyPlainPhr utt = E.StylePartPhr E.Informal E.PartGa
                       P.NoPConj utt P.NoVoc ;
}
I opened ExtraJpn qualified, `(E=ExtraJpn)`, and prefixed all functions from it with `E.`, so you can see the origins clearly.
The function `StylePartPhr`, as well as its argument categories `Level` (politeness level) and `Part` (particle), come from ExtraJpn. That makes sense: we define a function that is beyond the core RGL API, so then we also need to define its argument types and values, all in one module.
The only criticism I have for the type signature of `StylePartPhr` is that `PConj` (phrase-beginning conjunction, e.g. “therefore”) and `Voc` (vocative[2]) are obligatory.
In contrast, the API function `mkPhr` has this overload instance:
mkPhr : (PConj) -> Utt -> (Voc) -> Phr ; -- but sleep, my friend
The arguments in parentheses mean that you can leave them out, in which case the phrase won’t have a conjunction or a vocative expression. But the RGL API doesn’t export `NoPConj` or `NoVoc`, so we have to import another low-level RGL module, in this case PhraseJpn.
In part I, we learned that Basque verbs mark their subject, object and indirect object in the verb form. And that’s not all! In certain sociolinguistic contexts, Basque verbs also mark the addressee. This marking of the addressee is called allocutive agreement.
First comes a short intro to Basque verbs, and then I will demonstrate allocutivity with a GF grammar. The code is available at gf-agreement-tutorial/allocutive. There are simplifications and strategic omissions all over the place, but if you spot a genuine error, let me know!
Most Basque verbs inflect with a combination of a content-bearing participle and inflection-bearing auxiliary. Imagine if English verbs like sleep, eat and talk had no inflection, but you had to say do sleeping, do eating and do talking. (In fact, English negation behaves like that: I don’t/she doesn’t sleep/eat/talk.)
Basque has several auxiliaries for different numbers of arguments. It’s not a direct translation, but think be sleeping, have eating and give talking. The auxiliaries are also used independently. In a context like I am sleeping, the intransitive be is just an auxiliary, and without a participle it functions as a copula, like I am old or I am Inari. (Notice how well this example worked in English too.)
Let’s start with the form that has no allocutive agreement. I could be talking to myself, or addressing a group of people.
> p "they are cats" | l -lang=Eus
katuak dira
Katuak means ‘cats’, and dira is the intransitive auxiliary, inflected for a 3rd person plural subject. I’ve dropped all pronouns in my grammar, because all arguments are marked in the verb inflection. Basque word order is SOV, so if there was a subject pronoun, the whole sentence would be hauek katuak dira, ‘they cats are’.
Now suppose I’m saying “they are cats” to a close friend of a binary gender. Then I need to use one of the following forms.
> p "they are cats ( spoken to a woman )" | l -lang=Eus
katuak ditun
> p "they are cats ( spoken to a man )" | l -lang=Eus
katuak dituk
These forms, ditun for a woman and dituk for a man, are an example of allocutive agreement. The form ditun encodes a 3rd person plural argument (just like dira), and in addition, it encodes a 2nd person singular feminine argument, in this case[3] the addressee. Dituk is the same, just for a male addressee.
The logic is exactly the same with transitive verbs. In the first example, I’m saying “I see cats” to nobody in particular; in the latter two, to a close friend or a family member.
> p "I see cats" | l -lang=Eus
katuak ikusi ditut
> p "I see cats ( spoken to a woman )" | l -lang=Eus
katuak ikusi ditinat
> p "I see cats ( spoken to a man )" | l -lang=Eus
katuak ikusi ditiat
For those curious about morphology, ikusi is the participle of the verb to see, and ditut/ditinat/ditiat are forms of the transitive auxiliary.
Ditransitive verbs work the same way. To add some interest, I omitted the last form, so if you never got to participate in linguistic olympiads, now is your chance to predict the ditransitive auxiliary form of “I give them cats”, spoken to a man.
> p "I give them cats" | l -lang=Eus
katuei eman dizkiet
> p "I give them cats ( spoken to a woman )" | l -lang=Eus
katuei eman zizkienat
> p "I give them cats ( spoken to a man )" | l -lang=Eus
katuei eman ________
To check your answer, you can linearise the tree `PredVPMasc i_NP (ComplV3 give_V3 they_NP cats_NP)` in the GF grammar.
These are the implementation details of my demo grammar at gf-agreement-tutorial/allocutive. I have simplified some things to concentrate on allocutivity, but the basic principles are the same in the actual Basque RG.
V*
As mentioned earlier in this post, Basque verbs consist of an auxiliary, which inflects in hundreds of forms, and a participle, which for the purposes of this grammar doesn’t inflect.
With the lexical categories `V`, `V2` and `V3`, we only need to store the participle, so their lincat is as simple as `{s : Str}`.
lincat
V, V2, V3 = {s : Str} ; -- Invariant participle
In addition to the participle, `VP` contains a parameter `ObjAgr`, to record its origin (`V`, `V2` or `V3`) and the agreement of its objects.
param
  ObjAgr = Intrans     -- No object
         | Trans Obj   -- Direct object (Sg1..Pl3)
         | Ditrans
             DObj      -- Direct object (only number)
             IObj ;    -- Indirect object (Sg1..Pl3)
lincat
  VP = {
    s : Str ;   -- Invariant participle + maybe object
    a : ObjAgr  -- The object(s) agreement
    } ;
`UseV` and `ComplV*`
The objects are added in `ComplV*`: the string in the `s` field, and the agreement in the `a` field. `UseV` adds no objects, just the `ObjAgr` value `Intrans` to keep track that the `VP` is intransitive (i.e. it came from `V`).
lin
  UseV v = v ** {
    a = Intrans
    } ;
  ComplV2 v2 obj = {
    s = obj.s ! Abs ++ v2.s ; -- OV word order
    a = Trans obj.a ;         -- Obj. agreement kept in the VP
    } ;
  ComplV3 v3 dobj iobj = {
    s = dobj.s ! Abs ++ iobj.s ! Dat ++ v3.s ;
    a = Ditrans (agr2num dobj.a) iobj.a
    } ;
We define three types for inflection tables for the auxiliaries. These are not used as lincats, but only internally in functions of type `NP -> VP -> Cl`.
oper
Verb : Type = Subj => Str ;
Verb2 : Type = Obj => Subj => Str ;
Verb3 : Type = DObj => Subj => IObj => Str ;
These inflection tables match the `ObjAgr` param as follows, with subject agreement added.
| `ObjAgr` | Inflection table |
|---|---|
| `Intrans` | `Subj => Str` |
| `Trans Obj` | `Obj => Subj => Str` |
| `Ditrans DObj IObj` | `DObj => Subj => IObj => Str` |
The non-allocutive auxiliaries are of types `Verb*`, and the allocutive auxiliaries are `Gender => Verb*`. I demonstrate below with the intransitive auxiliary; for the rest, you can imagine the same but with more nested tables.
intransAux : Verb = table {
  Sg1 => "naiz" ;
  {- ... -}
  Pl3 => "dira" } ;
allocutive_intransAux : Gender => Verb = \\gend,agr =>
  transAux ! agr ! Sg2 gend ; -- Spurious Sg2 agreement
The intransitive case was easy to write, because all of the allocutive intransitives are also forms of the ordinary transitive auxiliary. But the general pattern is the same, regardless of whether we type every form by hand or borrow an already existing inflection table. The allocutive version of the transitive auxiliary has 18 unique forms (in present indicative), and the rest are borrowed from the ditransitive auxiliary. I’m not going to paste it in the blog, but you can read the source code.
`NP -> VP -> Cl` functions
These auxiliaries are not a part of any GF category’s lincat. They only exist as opers, and are called in the functions that create a clause from `NP` and `VP`. There are two considerations: the valency and object agreement, which are stored in the `VP` in the `ObjAgr` parameter, and the addressee, which is determined by the choice of `NP -> VP -> Cl` function.
The only change to the standard RGL is that I’ve added two new functions to form clauses.
fun
  PredVP,     -- spoken to any audience or no audience
  PredVPFem,  -- spoken to a woman
  PredVPMasc  -- spoken to a man
    : NP -> VP -> Cl ;
The implementations of the predication functions are identical, except for the concrete auxiliaries chosen. In `PredVP`, we have this local helper function:
getAux : NounPhrase -> VerbPhrase -> Str = \np,vp ->
  case vp.a of {
    Intrans           => intransAux ! ...
    Trans obj         => transAux ! ...
    Ditrans dobj iobj => ditransAux ! ...
  } ;
and in `PredVP{Fem,Masc}` we have this instead:
getAuxAllocutive : Gender -> NounPhrase -> VerbPhrase -> Str =
  \g,np,vp -> case vp.a of {
    Intrans           => allocutive_intransAux ! g ! ...
    Trans obj         => allocutive_transAux ! g ! ...
    Ditrans dobj iobj => allocutive_ditransAux ! g ! ...
  } ;
If all verbs had different inflection tables, then I’d need to store inflection tables in the VPs. But in this grammar, all `VP`s with the same valency have the same auxiliary, so I can encode the auxiliary in a parameter, and have the actual inflection tables as free-floating opers. I’m not sure how much the GF compiler could optimise if the tables were stored in the VPs instead: the actual strings wouldn’t be repeated, but I think there would be hundreds of redundant labels in the PGF. (Future work: write a naive version of this grammar and test how well the GF compiler can optimise.)
Regardless of the performance, I think that this design is also nicer to read and write, and reflects the participle–auxiliary situation more accurately.
This is a bit of an anticlimax, but allocutives are not yet implemented in the full Basque resource grammar. One day I’ll have a GF retreat in a Basque village by the sea and not emerge until the resource grammar is finished.
Once I implement the allocutives, I’m planning to use the same design I used in the demo grammar. I already have a param like `ObjAgr` in `VP`, and `PredVP` chooses the auxiliaries, which have types just like `Verb`, `Verb2` and `Verb3`. So all I need to do is to add allocutive versions of the auxiliary opers, and then add `PredVPFem` and `PredVPMasc` in an Extra module only for Basque.
The core RGL was not prepared to encode the addressee in the verb inflection. There is a sort of vocative construction with `VocNP`, but that’s only for an explicit addressee: I see cats, John. That’s not enough: we also want to express just I see cats when you say it to someone without saying their name. So an explicit constructor is the only way to go.
Cl
Why at the `Cl` level and not higher up? The biggest reason is the size of the inflection tables.
At `Cl`, we know already all core arguments to the verb: subject, object and indirect object.
But we’re not done with inflection! There’s still tense, mood, polarity, aspect and whether the clause is main or subordinate. If we kept allocutivity open, the table size would be multiplied by 3. That’s because there are 3 options: no addressee, female addressee and male addressee.
Furthermore, it’s no problem at all for the API to introduce the new `PredVP*` funs. Application grammarians only need to replace one `mkCl` with `Extra.PredVP*`, and the resulting `Cl` can be used just like any other `Cl`.
More about allocutivity in general: Wikipedia and Antonov (2015).
If you’re already familiar with Basque and were frustrated by my simplifications, here’s another description on Basque allocutivity. I’m a bit confused though, because it says “in the case of bivalent transitives, the allocutive is formed only if the object is 3rd person.” My source for implementing the GF grammar is a text dump of 11889 morphologically analysed verb forms I got from Måns Huldén years ago, and there are lots of allocutive forms with non-3rd person object.
[1] If you think this is weird, consider that even in English, people make linguistic distinctions based on if they’re talking about themselves or others. For example, the names of parents: `father_N2` could have an inflection table `Me => "dad" ; SomeoneElse => "father"`.
[2] If you’re confused about the use of the word “vocative” in the GF RGL jargon, see this post. In short: we call `please_Voc` and `VocNP (UsePN john_PN)` vocatives, because they turn any sentence into a vocative sentence. “I’d like a cookie” and “it’s raining” can be said in any situation, but “I’d like a cookie, please” and “it’s raining, John” are undoubtedly directed at someone.
[3] In fact, dituk and ditun are ambiguous: in addition to being allocutive forms of the intransitive auxiliary, they are also the ordinary forms of the transitive auxiliary (with `Pl3` object and `Sg2{masc,fem}` subject). With the transitive auxiliary, some (not all) of the allocutive forms are ambiguous with the ordinary ditransitive auxiliary. With the ditransitive auxiliary, there is no further auxiliary to be ambiguous with, so all ditransitive allocutive forms are unique, and genuinely encode 4 arguments.
(This was just a side note; whether the allocutive forms are unique or not has no effect on how we implement it in GF.)