17 November 2020

Generalising agreement, part II: Addressee and other implicit arguments

This is the second post of the agreement series. If you already know what agreement means in linguistics, and you know some GF (topics introduced in lessons 1–4 of the GF tutorial), keep reading. Otherwise, I recommend starting with the first post.

Aside from an explicit vocative and imperative, we don’t really think of the addressee as an argument. Yet it is there: a speech act is usually directed at some audience. And in some languages, the verb form marks that audience, as in this schematic example.

I eat-X some bread. (written in a personal diary, nobody sees it.)
I eat-X-SG2 your bread.
I eat-X-SG2 some bread. (addressed to someone)

The verb form for eat may or may not agree with I and bread—we mark that with the morpheme X. But crucially, in addition to the explicit arguments, it encodes you, the addressee. This raises a couple of questions:

Are we dealing with morphosyntax or sociolinguistics?
How should we model it in the GF RGL (Resource Grammar Library)?

I’m going to address these questions with examples from Japanese and Basque.

Politeness in Japanese
- Implementation in GF
- Controlling style in an application grammar
Allocutive agreement in Basque
- Implementation in GF
- Controlling allocutivity in an application grammar
Read more
Footnotes

Politeness in Japanese

There’s a funny passage from Surely You’re Joking, Mr. Feynman, about learning Japanese. (You can read the whole thing in the link.) Basically, Feynman learns that in order to translate a verb like to see, you have to know the whole context, who is talking to whom:

	my garden	your (respectful) garden	your (extra respectful) garden
I see…		“May I observe your gorgeous garden?”	“May I hang my eyes on your most exquisite gardens?”
You see…	“Would you like to glance at my lousy garden?”

In English, these are just different sentences: different verbs (glance at, observe and hang one’s eyes on) and different modifiers on garden. But Japanese has more strategies for politeness. Like English, Japanese can swap a plain verb for a more polite variant, but it can also keep the same verb and just inflect it in a more polite way.

“How is this related to agreement? Isn’t there another term for this, like register?”

Yes, but a speaker chooses the register depending on their audience. So it’s ultimately the addressee, or rather the broader situation, that is encoded in the verb form.

“I think you’re stretching the definitions here.”

Save your complaints for the third post, where I’m going to argue that preposition contraction is agreement. Now let’s look at Japanese.

Implementation in GF

Japanese has been in the RGL since 2012. All credit for the actual code goes to Liza Zimina (link to paper). Any misunderstandings about Japanese morphosyntax are mine.

Agreement or politeness?

Here are two sentences, loosely inspired by Feynman’s examples, as rendered by the Japanese RG.

Lang> p "I want to see the garden" | l -lang=Jpn
私       は  庭  を 見たい です
watashi wa niwa o mitai desu

Lang> p "the teacher wants to see the garden" | l -lang=Jpn
先生     は  庭  を 見たがって います
sensei wa niwa o mitagatte imasu

In these two sentences, we see different forms of the verb 見る ‘to see’. Note that this difference is not person agreement! The form mitagatte imasu reflects that I should be more respectful when talking about other people, but I can spend fewer morphemes (mitai desu) when talking about myself.¹

“But isn’t that the definition of person agreement? You conjugate the verb differently when the subject is yourself vs. other people.”

If you always spoke plainly about yourself and politely about others, and mixing the two was as wrong as English I wants or she want, then I too would call it person agreement. But that’s not the case. In a context where everyone speaks politely, it’s normal to refer to yourself politely.

“Why does the Japanese RG then have this distinction? Would you sometimes want to say watashi wa … mitagatte imasu?”

Some individual constructions are particularly weird to use on the wrong person. Wanting is one such case, as are giving and receiving. The majority of the verbs in the RGL lexicon inflect the same for different subjects, but since some verbs have this distinction, VP needs to have the structure for it.

Speaker vs. style

Let’s look at the types of NP, VP and Cl. I have omitted parameters that are unrelated to politeness.

param
  Speaker = Me | SomeoneElse ;
  Style = Plain | Resp ;

lincat
  NP = {s : Style => Str ; speaker : Speaker ; … } ;
  VP = {s : Speaker => Style => … => Str ; … } ;
  Cl = {s :            Style => … => Str ; … } ;

lin

  PredVP np vp = {
    s = \\style,… => np.s ! style ++
                     vp.s ! np.speaker ! style ! …
    } ;

There are two dimensions in this sociolinguistic puzzle: the speaker and the overall style. The Japanese resource grammar forces you to spend more morphemes to say what others want vs. what you want, but you can still choose the general style of your sentence. Here’s the table for “__ want(s) to see”.

	Plain	Respectful
Me	見たい ‘mitai’	見たいです ‘mitai desu’
SomeoneElse	見たがっている ‘mitagatte iru’	見たがっています ‘mitagatte imasu’

Luckily, it’s extremely easy to know if an NP is me or someone else: either it’s UsePron i_Pron or any other NP. So PredVP can choose the speaker, and Cl has only the overall style open. The overall style is chosen by explicit constructors, as we will see in the next section.
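
To make the two dimensions concrete, here is a minimal sketch written as a stand-alone resource module. It is not the code of the Japanese RG. The module name SpeakerStyle and the opers NounPhrase, VerbPhrase, Clause, predVP, i_NP, teacher_NP and wantToSee_VP are all made up for illustration, and the topic particle は is simply glued to the NP string. The only real data are the four verb forms from the table above.

```gf
-- File: SpeakerStyle.gf (hypothetical demo module, not part of the RGL)
resource SpeakerStyle = {

  param
    Speaker = Me | SomeoneElse ;
    Style = Plain | Resp ;

  oper
    -- Reduced versions of the lincats: politeness-related fields only.
    NounPhrase : Type = {s : Style => Str ; speaker : Speaker} ;
    VerbPhrase : Type = {s : Speaker => Style => Str} ;
    Clause     : Type = {s : Style => Str} ;

    -- Assumption: the first person pronoun is the only NP with speaker = Me.
    i_NP       : NounPhrase = {s = \\_ => "私 は" ; speaker = Me} ;
    teacher_NP : NounPhrase = {s = \\_ => "先生 は" ; speaker = SomeoneElse} ;

    -- "__ want(s) to see", copied from the table above.
    wantToSee_VP : VerbPhrase = {
      s = table {
        Me => table {
          Plain => "見たい" ;
          Resp  => "見たい です" } ;
        SomeoneElse => table {
          Plain => "見たがって いる" ;
          Resp  => "見たがって います" }
        }
      } ;

    -- The subject fixes Speaker; Style stays open in the clause.
    predVP : NounPhrase -> VerbPhrase -> Clause = \np,vp -> {
      s = \\style => np.s ! style ++ vp.s ! np.speaker ! style
      } ;
}
```

After loading the file with `i -retain SpeakerStyle.gf`, the command `cc (predVP teacher_NP wantToSee_VP).s ! Resp` should select the SomeoneElse/Resp cell of the table, and the same command with i_NP should select the Me/Resp cell.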

Controlling style in an application grammar

We have seen two strategies for encoding politeness:

Lexical—the choice of verb (glance at vs. observe vs. hang one’s eyes on). Not supported in the Japanese RG.
Morphological—the choice of verb form (mitai vs. mitai desu vs. mitagatte iru vs. mitagatte imasu). Supported in VP as a combination of Speaker and Style. The Speaker parameter makes a difference only in very specific constructions, like wanting or giving.

Choice of verb

Just to be clear: I don’t think RGL should support different lexemes as inflection forms of the same verb, unless it’s a question of well-established suppletion. A resource grammar should offer an API to syntactic structures and lexicon, not try to be too clever about usage.

If you need to use glance at and observe as if they belonged to the same verb, I recommend creating a custom type: a record with several V* fields, and a custom parameter to control when each of them is used. To choose the forms of those verbs, see the next section.
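
Such a custom type could look roughly like the sketch below. It uses the English resource grammar only because the example verbs are English. Everything defined in the module (PoliteVerb, Register, PoliteV2, see_PV2, politeVP) is a hypothetical name for illustration and not part of any RGL module. The only RGL functions assumed are mkV, mkV2 and mkPrep from ParadigmsEng and mkVP from SyntaxEng.

```gf
-- File: PoliteVerb.gf (hypothetical helper for an application grammar)
resource PoliteVerb = open SyntaxEng, ParadigmsEng in {

  param
    -- Custom parameter: which lexeme to use.
    Register = Humble | Respectful ;

  oper
    -- One "verb" of the application is a record of several RGL verbs.
    PoliteV2 : Type = {humble : V2 ; respectful : V2} ;

    see_PV2 : PoliteV2 = {
      humble     = mkV2 (mkV "glance") (mkPrep "at") ;
      respectful = mkV2 (mkV "observe")
      } ;

    -- Pick the lexeme first, then build the VP with the ordinary API.
    politeVP : Register -> PoliteV2 -> NP -> VP = \r,v,np ->
      case r of {
        Humble     => mkVP v.humble np ;
        Respectful => mkVP v.respectful np
      } ;
}
```

A lin rule in the application grammar would then call, for example, `politeVP Humble see_PV2 np`, and the RGL never needs to know that the two lexemes are related.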

Choice of verb form

Speaker—The Speaker parameter only makes a difference in a few select constructions, and I don’t think you would ever want to override it.

Style—The Style parameter is open all the way up to Utt. The default style for Phr is respectful, chosen by the RGL API function mkPhr. The plain style can be chosen with ExtraJpn.StylePartPhr, which has the type signature Level -> Part -> PConj -> Utt -> Voc -> Phr. To use it in your grammar, you need to open ExtraJpn in your concrete syntax like this:

concrete TestJpn of Test = open SyntaxJpn,
                               (E=ExtraJpn),
                               (P=PhraseJpn) in {
 lincat
   MyUtt = Utt ;
   MyPhr = Phr ;

 lin
   -- : MyUtt -> MyPhr ;
   MyPolitePhr utt = mkPhr utt ; -- use the API function
   MyPlainPhr utt = E.StylePartPhr E.Informal E.PartGa
                                   P.NoPConj utt P.NoVoc ;
}

I opened ExtraJpn qualified (E=ExtraJpn) and prefixed all functions from it with E., so you can see the origins clearly. The function StylePartPhr, as well as its argument categories Level (politeness level) and Part (particle), come from ExtraJpn. That makes sense: we define a function that is beyond the core RGL API, so we also need to define its argument types and values, all in one module.

The only criticism I have for the type signature of StylePartPhr is that PConj (phrase-beginning conjunction, e.g. “therefore”) and Voc (vocative²) are obligatory. In contrast, the API function mkPhr has this overload instance:

mkPhr : (PConj) -> Utt -> (Voc) -> Phr ; -- but sleep, my friend

The arguments in parentheses mean that you can leave them out, in which case the phrase won’t have any conjunction or vocative expression. But the RGL API doesn’t export NoPConj or NoVoc, so we have to import another low-level RGL module, in this case PhraseJpn.

Allocutive agreement in Basque

In part I, we learned that Basque verbs mark their subject, object and indirect object in the verb form. And that’s not all! In certain sociolinguistic contexts, Basque verbs also mark the addressee. This marking of the addressee is called allocutive agreement.

First comes a short intro to Basque verbs, and then I will demonstrate allocutivity with a GF grammar. The code is available at gf-agreement-tutorial/allocutive. There are simplifications and strategic omissions all over the place, but if you spot a genuine error, let me know!

Basque verbs

Most Basque verbs inflect with a combination of a content-bearing participle and inflection-bearing auxiliary. Imagine if English verbs like sleep, eat and talk had no inflection, but you had to say do sleeping, do eating and do talking. (In fact, English negation behaves like that: I don’t/she doesn’t sleep/eat/talk.)

Basque has several auxiliaries for different numbers of arguments. It is not a direct translation, but think be sleeping, have eating and give talking. The auxiliaries are also used independently. In a context like I am sleeping, the intransitive be is just an auxiliary, and without a participle it functions as a copula, like I am old or I am Inari. (Notice how well this example worked in English too.)

No allocutive agreement

Let’s start with the form that has no allocutive agreement. I could be talking to myself, or addressing a group of people.

> p "they are cats" | l -lang=Eus
katuak dira

Katuak means ‘cats’, and dira is the intransitive auxiliary, inflected for a 3rd person plural subject. I’ve dropped all pronouns in my grammar, because all arguments are marked in the verb inflection. Basque word order is SOV, so if there was a subject pronoun, the whole sentence would be hauek katuak dira, ‘they cats are’.

Allocutive agreement with intransitive verbs

Now suppose I’m saying “they are cats” to a close friend of a binary gender. Then I need to use one of the following forms.

> p "they are cats ( spoken to a woman )" | l -lang=Eus
katuak ditun

> p "they are cats ( spoken to a man )" | l -lang=Eus
katuak dituk

These forms, ditun for a woman and dituk for a man, are an example of allocutive agreement. The form ditun encodes a 3rd person plural argument (just like dira), and in addition, it encodes a 2nd person singular feminine argument—in this case³, the addressee. Dituk is the same, just for a male addressee.

Allocutive agreement with transitive verbs

The logic is exactly the same with transitive verbs. In the first example, I’m saying “I see cats” to nobody in particular; in the latter two, to a close friend or a family member.

> p "I see cats" | l -lang=Eus
katuak ikusi ditut

> p "I see cats ( spoken to a woman )" | l -lang=Eus
katuak ikusi ditinat

> p "I see cats ( spoken to a man )" | l -lang=Eus
katuak ikusi ditiat

For those curious about morphology, ikusi is the participle of the verb to see, and ditut/ditinat/ditiat are forms of the transitive auxiliary.

Allocutive agreement with ditransitive verbs

Ditransitive verbs work the same way. To add some interest, I omitted the last form, so if you never got to participate in linguistic olympiads, now is your chance to predict the ditransitive auxiliary form of “I give them cats”, spoken to a man.

> p "I give them cats" | l -lang=Eus
katuei eman dizkiet

> p "I give them cats ( spoken to a woman )" | l -lang=Eus
katuei eman zizkienat

> p "I give them cats ( spoken to a man )" | l -lang=Eus
katuei eman ________

To check your answer, you can linearise the tree PredVPMasc i_NP (ComplV3 give_V3 they_NP cats_NP) in the GF grammar.

Implementation in GF

These are the implementation details of my demo grammar at gf-agreement-tutorial/allocutive. I have simplified some things to concentrate on allocutivity, but the basic principles are the same in the actual Basque RG.

lincats of `V*`

As mentioned earlier in this post, Basque verbs consist of an auxiliary, which inflects in hundreds of forms, and a participle, which for the purposes of this grammar doesn’t inflect.

With the lexical categories V, V2 and V3, we only need to store the participle, so their lincat is as simple as {s : Str}.

lincat
  V, V2, V3 = {s : Str} ; -- Invariant participle

In addition to the participle, VP contains a parameter ObjAgr, to record its origin (V, V2 or V3) and the agreement of its objects.

param
  ObjAgr = Intrans    -- No object
         | Trans Obj  -- Direct object (Sg1..Pl3)
         | Ditrans
              DObj    -- Direct object (only number)
              IObj ;  -- Indirect object (Sg1..Pl3)
lincat
  VP = {
    s : Str ;  -- Invariant participle + maybe object
    a : ObjAgr -- The object(s) agreement
    } ;

`UseV` and `ComplV*`

The objects are added in ComplV*: the string in the s field, and the agreement in the a field. UseV adds no objects, just the ObjAgr value Intrans to keep track that the VP is intransitive (i.e. it came from V).

lin
  UseV v = v ** {
    a = Intrans
    } ;

  ComplV2 v2 obj = {
    s = obj.s ! Abs ++ v2.s ; -- OV word order
    a = Trans obj.a ; -- Obj. agreement kept in the VP
    } ;

  ComplV3 v3 dobj iobj = {
    s = dobj.s ! Abs ++ iobj.s ! Dat ++ v3.s ;
    a = Ditrans (agr2num dobj.a) iobj.a
    } ;

Inflection tables of the auxiliaries

We define three types for inflection tables for the auxiliaries. These are not used as lincats, but only internally in functions of type NP -> VP -> Cl.

oper
  Verb  : Type =                 Subj => Str ;
  Verb2 : Type =          Obj => Subj => Str ;
  Verb3 : Type = DObj => Subj => IObj => Str ;

These inflection tables match the ObjAgr param as follows, with subject agreement added.

`ObjAgr`	Inflection table
`Intrans`	`Subj => Str`
`Trans Obj`	`Obj => Subj => Str`
`Ditrans DObj IObj`	`DObj => Subj => IObj => Str`

The non-allocutive auxiliaries are of types Verb*, and the allocutive auxiliaries are Gender => Verb*. I demonstrate below with the intransitive auxiliary—for the rest, you can imagine the same but with more nested tables.

  intransAux : Verb = table {
    Sg1 => "naiz" ;
    {- ... -}
    Pl3 => "dira" } ;

  allocutive_intransAux : Gender => Verb = \\gend,agr =>
    transAux ! agr ! Sg2 gend ; -- Spurious Sg2 agreement

The intransitive case was easy to write, because all of the allocutive intransitives are also forms of the ordinary transitive auxiliary. But the general pattern is the same, regardless of whether we type every form by hand or borrow an already existing inflection table. The allocutive version of the transitive auxiliary has 18 unique forms (in present indicative), and the rest are borrowed from the ditransitive auxiliary. I’m not going to paste it in the blog, but you can read the source code.
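
The borrowing can be tried out in isolation with a heavily reduced sketch. This is not the code of the demo grammar: the module name AllocSketch, the three-valued Agr and the oper transAuxSg2 (only the slice of the transitive auxiliary whose subject is Sg2) are assumptions made to keep the example small. The forms naiz, dira, ditun and dituk appear earlier in this post, and the remaining forms are the corresponding present indicative forms.

```gf
-- File: AllocSketch.gf (hypothetical, reduced; not the demo grammar)
resource AllocSketch = {

  param
    Gender = Fem | Masc ;
    -- Reduced agreement: only three of the real Sg1..Pl3 values.
    Agr = Sg1 | Sg3 | Pl3 ;

  oper
    -- Ordinary intransitive auxiliary, indexed by subject.
    intransAux : Agr => Str = table {
      Sg1 => "naiz" ;
      Sg3 => "da" ;
      Pl3 => "dira" } ;

    -- Slice of the transitive auxiliary with a Sg2 subject:
    -- indexed by object, then by the gender of the Sg2 subject.
    transAuxSg2 : Agr => Gender => Str = table {
      Sg1 => table {Fem => "naun"  ; Masc => "nauk"} ;
      Sg3 => table {Fem => "dun"   ; Masc => "duk"} ;
      Pl3 => table {Fem => "ditun" ; Masc => "dituk"} } ;

    -- Allocutive intransitive: the real subject takes the object slot,
    -- and the addressee takes the (spurious) Sg2 subject slot.
    allocutive_intransAux : Gender => Agr => Str = \\gend,agr =>
      transAuxSg2 ! agr ! gend ;
}
```

With the file loaded via `i -retain AllocSketch.gf`, the command `cc allocutive_intransAux ! Fem ! Pl3` should evaluate to ditun, the form used in katuak ditun above.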

`NP -> VP -> Cl` functions

These auxiliaries are not a part of any GF category’s lincat. They only exist as opers, and are called in the functions that create a clause from NP and VP. There are two considerations:

Which auxiliary to use? The answer is stored in VP in the ObjAgr parameter.
Allocutive or ordinary version of the auxiliary? The answer is in the choice of the NP -> VP -> Cl function.

The only change to the standard RGL is that I’ve added two new functions to form clauses.

fun
  PredVP,    -- spoken to any audience or no audience
  PredVPFem, -- spoken to a woman
  PredVPMasc -- spoken to a man
    : NP -> VP -> Cl ;

The implementations of the predication functions are identical, except for the concrete auxiliaries chosen. In PredVP, we have this local helper function:

getAux : NounPhrase -> VerbPhrase -> Str = \np,vp ->
  case vp.a of {
    Intrans => intransAux ! ... ;
    Trans obj => transAux ! ... ;
    Ditrans dobj iobj => ditransAux ! ...
  } ;

and in PredVP{Fem,Masc} we have this instead:

getAuxAllocutive : Gender -> NounPhrase -> VerbPhrase -> Str =
  \g,np,vp -> case vp.a of {
    Intrans => allocutive_intransAux ! g ! ... ;
    Trans obj => allocutive_transAux ! g ! ... ;
    Ditrans dobj iobj => allocutive_ditransAux ! g ! ...
  } ;

If all verbs had different inflection tables, then I’d need to store inflection tables in the VPs. But in this grammar, all VPs with the same valency have the same auxiliary, so I can encode the auxiliary in a parameter, and have the actual inflection tables as free-floating opers. I’m not sure how much the GF compiler can optimise the alternative design, where every VP carries its own copy of the table: the actual strings won’t be repeated, but I think there would be hundreds of redundant labels in the PGF. (Future work: write a naive version of this grammar and test how well the GF compiler can optimise it.)

Regardless of the performance, I think that this design is also nicer to read and write, and reflects the participle–auxiliary situation more accurately.

Controlling allocutivity in an application grammar

This is a bit of an anticlimax, but allocutives are not yet implemented in the full Basque resource grammar. One day I’ll have a GF retreat in a Basque village by the sea and not emerge until the resource grammar is finished.

Once I implement the allocutives, I’m planning to use the same design I used in the demo grammar. I already have a param like ObjAgr in VP, and PredVP chooses among the auxiliaries, which have types just like Verb, Verb2 and Verb3. So all I need to do is to add allocutive versions of the auxiliary opers, and then add PredVPFem and PredVPMasc in an Extra module only for Basque.

Why custom constructors

The core RGL was not prepared to encode the addressee in the verb inflection. There is a sort of vocative construction with VocNP, but that’s only for an explicit addressee: I see cats, John. That’s not enough: we also want to express just I see cats when you say it to someone without saying their name. So an explicit constructor is the only way to go.

Why `Cl`

Why at the Cl level and not higher up? The biggest reason is the size of the inflection tables.

At Cl, we already know all core arguments of the verb: subject, object and indirect object. But we’re not done with inflection! There’s still tense, mood, polarity, aspect and whether the clause is main or subordinate. If we kept allocutivity open, the table size would be multiplied by 3. That’s because there are 3 options: no addressee, female addressee and male addressee.

Furthermore, it’s no problem at all for the API to introduce the new PredVP* funs. Application grammarians only need to replace one mkCl with Extra.PredVP*, and the resulting Cl can be used just like any other Cl.
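
The factor of 3 can be seen in a small sketch of the design that this post decides against, where the addressee stays open as a parameter in Cl. All names here (AddresseeParam, Addressee, Familiar, ClFixed, ClOpen, fixAddressee) are hypothetical, and Tense and Polarity are two-valued stand-ins for the much larger real inflection.

```gf
-- File: AddresseeParam.gf (hypothetical; the rejected alternative)
resource AddresseeParam = {

  param
    Gender = Fem | Masc ;
    Addressee = NoAddressee | Familiar Gender ; -- 1 + 2 = 3 values
    Tense = Pres | Past ;       -- stand-in for tense, mood, aspect ...
    Polarity = Pos | Neg ;

  oper
    -- Allocutivity fixed by the choice of PredVP*: 2 * 2 = 4 forms.
    ClFixed : Type = {s : Tense => Polarity => Str} ;

    -- Allocutivity left open in Cl: 3 * 2 * 2 = 12 forms.
    ClOpen : Type = {s : Addressee => Tense => Polarity => Str} ;

    -- Choosing the addressee later selects one third of the open table.
    fixAddressee : Addressee -> ClOpen -> ClFixed = \a,cl -> {
      s = cl.s ! a
      } ;
}
```

With three PredVP* functions, each clause only ever carries the smaller ClFixed table, and the choice between the three variants is made once, in the abstract syntax tree.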

Read more

More about allocutivity in general: Wikipedia and Antonov (2015).
If you’re already familiar with Basque and were frustrated by my simplifications, here’s another description of Basque allocutivity. I’m a bit confused though, because it says “in the case of bivalent transitives, the allocutive is formed only if the object is 3rd person.” My source for implementing the GF grammar is a text dump of 11889 morphologically analysed verb forms I got from Måns Huldén years ago, and there are lots of allocutive forms with non-3rd person object.

Footnotes

1. If you think this is weird, consider that even in English, people make linguistic distinctions based on whether they’re talking about themselves or others. For example, the names of parents: father_N2 could have an inflection table Me => "dad" ; SomeoneElse => "father".
2. If you’re confused about the use of the word “vocative” in the GF RGL jargon, see this post. In short: we call please_Voc and VocNP (UsePN john_PN) vocatives, because they turn any sentence into a vocative sentence. “I’d like a cookie” and “it’s raining” can be said in any situation, but “I’d like a cookie, please” and “it’s raining, John” are undoubtedly directed at someone.
3. In fact, dituk and ditun are ambiguous: in addition to being allocutive forms of the intransitive auxiliary, they are also the ordinary forms of the transitive auxiliary (with Pl3 object and Sg2{masc,fem} subject). With the transitive auxiliary, some (not all) of the allocutive forms are ambiguous with the ordinary ditransitive auxiliary. With the ditransitive auxiliary, there is no further auxiliary to be ambiguous with, so all ditransitive allocutive forms are unique, and genuinely encode 4 arguments.

(This was just a side note; whether the allocutive forms are unique or not has no effect on how we implement it in GF.)

tags: gf, linguistics
