14 May 2020

Generalising agreement, part I: Introduction

Lately I’ve been thinking about verbal agreement in various GF resource grammars.

This will be a series of three posts. The first post—this one—is an introduction to verbal agreement, most useful for a reader with little knowledge in linguistics or GF. The second post will be about implicit arguments: how do we model the real world adding all kinds of inconvenient restrictions on how we speak. In the third post I will talk about obliques and how, if you squint enough, you can kind of interpret preposition contraction and other unrelated things as agreement!

If you’re a seasoned GF grammarian and have any familiarity with linguistics, the first post won’t be all that new information, but you surely wouldn’t want to miss my hot takes on ergativity. If you don’t know what ergativity means, that’s a sign that you need to read the first post.

Agreement in linguistics
No person marking
1-dimensional person marking
Interlude: Nominative-accusative vs. Ergative-absolutive alignment
2-dimensional person marking
- Nominative-accusative: Hungarian
- Ergative-absolutive: Basque
Read more
Footnotes

Agreement in linguistics

Verbal agreement means that one or more of the verb’s arguments are marked in the verb form. If you’re not familiar with languages outside Standard Average European, you’ve probably only seen subject agreement:

	1st person singular	3rd person singular
Intransitive	I sleep-∅	the cat sleeps
Transitive	I drink-∅ water	the cat drinks water

Both verbs, sleep and drink, only show marking for the subject: null morpheme (marked as -∅) for I, and -s for the cat. Water, which is the object of drink, doesn’t contribute to the conjugation. Contrast this with Hungarian [Wikipedia]:

Screenshot of Wikipedia. The text reads: In Hungarian, verbs not only show agreement with their subjects but also carry information on the definiteness of their direct objects. This results in two types of conjugations: definite (used if there is a definite object) and indefinite (if there is no definite object).

In fact, multiple agreement is pretty common in languages around the world. The map below is taken from the World Atlas of Language Structures (WALS): a red dot means “like English”, an orange dot means “like Hungarian”.

Now I’ll present different options for argument marking, and how to do them in GF.

For simplicity, I’m only showing positive present indicative forms. You can freely assume, in addition to person marking, some form of tense, aspect, mood, polarity etc.

No person marking

The simplest person marking is no person marking. In the WALS map you’ve just seen, languages without verbal person marking are shown with white dots.

Here’s what an inflection table looks like in Chinese:

lincat
  V,V2 = {s : Str} ;

lin
  sleep_V  = {s = "睡觉"} ;

  drink_V2 = {s = "喝"} ;

Of course, languages without person marking in the verb are perfectly capable of expressing who does what. Presence or absence of person marking doesn’t imply anything about the overall morphological complexity of the language: e.g. Japanese verbs mark tons of other things, just not person.

1-dimensional person marking

Moving on to higher dimensions. The verb can mark one of its core arguments—shown by the yellow and red dots on the previous map. The marked argument can be consistent, i.e. always the subject or object, or it may depend on more complex criteria.

Subject marking

Let’s use English as an example of subject marking:

param
  Agr = Sg3 | Other ;

lincat
  V,V2 = {s : Agr => Str} ;

lin
  sleep_V  = {s = table { Sg3   => "sleeps" ;
                          Other => "sleep" }} ;

  drink_V2 = {s = table { Sg3   => "drinks" ;
                          Other => "drink" }} ;
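
Both tables follow the same pattern: the Sg3 form is the stem plus -s, and the other form is the bare stem. As a minimal sketch, that pattern could be captured once in an oper. The module name ParadigmsSketch and the helper regV are made up for this illustration; they are not taken from the post’s repository.

```gf
resource ParadigmsSketch = {

  param
    Agr = Sg3 | Other ;

  oper
    -- The record type shared by V and V2 in this section
    Verb : Type = {s : Agr => Str} ;

    -- Hypothetical helper: builds a verb from its stem.
    -- Only covers verbs that take a plain -s (sleep, drink), not e.g. "watch".
    regV : Str -> Verb = \stem -> {
      s = table { Sg3   => stem + "s" ;
                  Other => stem }
      } ;

    sleep_V  : Verb = regV "sleep" ;
    drink_V2 : Verb = regV "drink" ;
}
```

Saved as ParadigmsSketch.gf, this can be loaded in the GF shell with `i -retain ParadigmsSketch.gf`, and `cc sleep_V` should then show the same two forms as the hand-written table.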

So far we have only created an inflection table: nothing in the code says that we have subject marking, just that we have a one-dimensional table. But when we add lincats and linearisations for VP and Cl, it will become apparent which argument the verb marks. The following GF code models English (and other red dots on the WALS map):

GF grammar for English

-- English
lincat
  V,
  V2,
  VP = {s : Agr => Str} ;
  NP = {s : Str ; a : Agr} ;
  Cl = {s : Str} ;

lin
  -- UseV : V -> VP
  UseV v = v ;

  -- ComplV2 : V2 -> NP -> VP
  ComplV2 v2 np = {s = \\agr => v2.s ! agr ++ np.s} ;

  -- PredVP : NP -> VP -> Cl
  PredVP np vp = {s = np.s ++ vp.s ! np.a} ;
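
To try the fragment above in the GF shell, it needs an abstract syntax and a few noun phrases. Here is a minimal self-contained sketch. The module names MiniAgr and MiniAgrEng and the entry cat_NP are assumptions made for this example, and the abstract syntax in the repository may look different.

```gf
-- File MiniAgr.gf (hypothetical abstract syntax, made up for this sketch)
abstract MiniAgr = {
  flags startcat = Cl ;
  cat
    V ; V2 ; VP ; NP ; Cl ;
  fun
    UseV     : V -> VP ;
    ComplV2  : V2 -> NP -> VP ;
    PredVP   : NP -> VP -> Cl ;
    sleep_V  : V ;
    drink_V2 : V2 ;
    i_NP, cat_NP, water_NP : NP ;
}

-- File MiniAgrEng.gf
concrete MiniAgrEng of MiniAgr = {
  param
    Agr = Sg3 | Other ;
  lincat
    V, V2, VP = {s : Agr => Str} ;
    NP = {s : Str ; a : Agr} ;
    Cl = {s : Str} ;
  lin
    UseV v = v ;
    ComplV2 v2 np = {s = \\agr => v2.s ! agr ++ np.s} ;
    PredVP np vp = {s = np.s ++ vp.s ! np.a} ;

    sleep_V  = {s = table { Sg3 => "sleeps" ; Other => "sleep" }} ;
    drink_V2 = {s = table { Sg3 => "drinks" ; Other => "drink" }} ;

    -- Each NP carries its own agreement value in the field a
    i_NP     = {s = "I" ; a = Other} ;
    cat_NP   = {s = "the" ++ "cat" ; a = Sg3} ;
    water_NP = {s = "water" ; a = Sg3} ;
}
```

After `i MiniAgrEng.gf`, the command `l PredVP cat_NP (ComplV2 drink_V2 water_NP)` linearises to the cat drinks water, and `l PredVP i_NP (UseV sleep_V)` to I sleep. The object water_NP also has a = Sg3, but ComplV2 never looks at it: only PredVP reads an agreement field, and it reads the subject’s.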

Object marking

Now, let’s implement the missing categories and functions for a language that marks only the object, not the subject. According to WALS, this is quite rare but not unheard of; 24 languages in the sample of 378 do that. (They are shown as yellow dots on the previous map.)

To keep it simple, let’s assume that there is a language just like English but with object marking.

Transitive	Sg1 subject	Sg3 subject
Sg1 object	I drink-∅ me	the cat drink-∅ me
Sg3 object	I drinks water	the cat drinks water

You see, “I drinks water” would be ungrammatical in ordinary English, where verbs mark the subject. In contrast, Like-English-but-object-marking puts the -s in I drinks water because of water, not because of I.

Now, the WALS data doesn’t tell me what these 24 languages mark in an intransitive verb, so I decide that in Like-English-but-object-marking, intransitives have no person marking. (There are other options, more on those later!)

Intransitive	Sg1 subject	Sg3 subject
	I sleep-∅	the cat sleep-∅

GF grammar for Like-English-but-object-marking

-- Like English but object-marking
lincat
  V, VP = {s : Str} ;
  V2 = {s : Agr => Str} ;
  NP = {s : Str ; a : Agr} ;
  Cl = {s : Str} ;

lin
  -- UseV : V -> VP
  UseV v = v ;

  -- ComplV2 : V2 -> NP -> VP
  ComplV2 v2 np = {s = v2.s ! np.a ++ np.s} ;

  -- PredVP : NP -> VP -> Cl
  PredVP np vp = {s = np.s ++ vp.s} ;

Let’s look at the lexical categories, V and V2. The easy case first: V2 is still an inflection table Agr => Str, only this time Agr marks the object. In contrast, the lincat for V doesn’t need a table, just a string!

This change is reflected in VP. It’s clear when you look at how to construct a VP: either from a V, which never inflected, or from a V2 and an object, which is just what is needed to pick out the correct form from the V2. In turn, PredVP is just trivial concatenation, because the subject doesn’t contribute anything to the verb inflection.

Object marking: another option for intransitive verbs

Previously we just had zero marking for intransitive verbs. But there is another strategy that appears in some languages¹:

For transitive verbs, mark the object in the verb.
For intransitive verbs, mark the subject in the same way the object is marked for transitive verbs.

(There is a name for this and I will introduce it soon. Let’s just finish our Like-English-but-… series!)

We established already how transitive verbs work:

Transitive	Sg1 subject	Sg3 subject
Sg1 object	I drink-∅ me	the cat drink-∅ me
Sg3 object	I drinks water	the cat drinks water

Now let’s add intransitive verbs. As you can see, a transitive verb is marked with an -s, when its object is a 3rd person singular NP. Logically then, an intransitive verb will be marked with -s, when its subject is a 3rd person singular NP.

The same logic holds for 1st person. A transitive verb is unmarked, when its object is a 1st person singular NP (me). When me is a subject of an intransitive verb like sleep, then that verb should also be unmarked.

Yes, the subject case for an intransitive verb is now me—I is reserved for transitive verbs.

Intransitive	Sg1 subject	Sg3 subject
	me sleep-∅	the cat sleeps

However, I have ignored NP case so far in my GF examples, and I continue ignoring it. (Apologies if you got excited about novel subject cases!) One thing at a time, and now we’re looking at verbal agreement.

GF grammar for Like-English-but-object-marking-ERG

I realise that the names of my invented languages are getting quite unwieldy. So let’s just call them LEBOM (Like-English-but-object-marking) and LEBOM-ERG, for reasons that will become obvious soon.

-- LEBOM-ERG
lincat
  V, V2, VP = {s : Agr => Str} ;
  NP = {s : Str ; a : Agr} ;
  Cl = {s : Str} ;

lin
  -- UseV : V -> VP
  UseV v = v ;

  -- ComplV2 : V2 -> NP -> VP
  ComplV2 v2 np = {s = \\_ => v2.s ! np.a ++ np.s} ;

  -- PredVP : NP -> VP -> Cl
  PredVP np vp = {s = np.s ++ vp.s ! np.a} ;

We’re back to {s : Agr => Str} for all verbal categories. Unlike LEBOM, where intransitive verbs always had zero marking, in LEBOM-ERG all verbs have some marking; it just varies which argument is marked. So both V and V2 need an inflection table.

Because V needs a subject to know the correct form, VP needs to be an inflection table too. Just looking at UseV and PredVP, they are exactly like in ordinary English.

In contrast, ComplV2 has a different behaviour: it picks out the right form of the V2 already when forming a VP, using the object’s agreement (v2.s ! np.a). But VP still needs a table, that’s how GF works. So we put the chosen form in a dummy inflection table: all left-hand sides lead to that form.
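
The dummy table can be looked at in isolation. The sketch below compares the wildcard abstraction `\\_ => ...` used in ComplV2 with a table where every row is written out. The module name and both oper names are invented for this illustration.

```gf
resource DummyTableSketch = {

  param
    Agr = Sg3 | Other ;

  oper
    -- What ComplV2 does in LEBOM-ERG: ignore the argument of the table
    constTable : Str -> (Agr => Str) = \form ->
      \\_ => form ;

    -- The same table with every row spelled out
    explicitTable : Str -> (Agr => Str) = \form ->
      table { Sg3   => form ;
              Other => form } ;
}
```

After `i -retain DummyTableSketch.gf`, both `cc constTable "drinks"` and `cc explicitTable "drinks"` should evaluate to a table whose rows all contain drinks. Whatever agreement value PredVP later selects with, it gets the form that the object already chose.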

Subject or object: Differential argument marking

Who says you need to do your argument marking based on its role? Instead, you can mark based on something inherent in the argument.

For instance, if a transitive verb has one animate and one inanimate argument, mark the verb based on the animate argument, regardless of their roles. In the following examples, the verb write agrees with its animate argument whenever it has one.

Animate > Inanimate	Sg1 subject	Sg3 subject
Sg1 object	I write-∅ me	the blog write-∅ me
Sg3 object	I write-∅ the blog	the blog writes the blog

If there are two animate arguments, they can be ordered by person hierarchy, such as 1st > 2nd > 3rd. Here’s an example, where love agrees with the argument that is highest in the person hierarchy:

P1 > P2 > P3	P2 subject	P3 subject
P2 object	you love-∅ you	he love-∅ you
P3 object	you love-∅ him	he loves him

Now, I think you’ve seen enough GF code that you should be able to do this on your own!

I have created a repository at github.com/inariksit/gf-agreement-tutorial, which contains examples to this post and the following posts.

In the directory like-english-but, you find a complete abstract syntax and finished concrete syntaxes for English, LEBOM and LEBOM-ERG. However, Like-English-but-differential-argument-marking (LEBDAM) is just a dummy concrete syntax, so you have a chance to implement it properly.

Some instructions:

Intransitive verbs mark their only argument.
You will need to modify the NP category (and obviously all verbal categories). However, you can still ignore case in NP: you love he and the blog write I are acceptable output, if you only want to concentrate on person marking in the verb.
You don’t need to use animacy or person hierarchy—you can make your system as wild as you want, as long as you can encode it in GF.

If you want feedback, make your solution available to me! You can make a pull request on GitHub or send your code to inari.listenmaa@gmail.com.

And we’re done with 1-dimensional agreement! If I have missed something, please let me know. If it’s about, say, case or clitics or adpositions or word order or wearing a cowboy hat to mark an argument, I’ll be even more interested in it for the 3rd post. There’s no time limit, even if you’re reading this when I have published the 3rd part (so like 2027), a blog post can always be edited.

Soon we will go on to 2-dimensional agreement. But before that, I need to introduce properly “the other strategy” for the alignment of argument marking, which was used in LEBOM-ERG.

Interlude: Nominative-accusative vs. Ergative-absolutive alignment

So far I’ve used the term subject for two things: the only argument of an intransitive verb (subject sleeps), and the doer argument of a transitive verb (subject eats object).

Now, let’s make the definitions a bit narrower. By subject, we refer to the only argument of an intransitive verb, whereas transitive verbs have an agent and an object (also called patient²). Like this:

the cat sleeps
the cat eats the mouse

With these three roles, we have four reasonable³ strategies of marking them. This is called the alignment of argument marking.

1. All same:
- subject sleep
- agent eat object
2. Subject and agent same, object different:
- subject sleeps
- agent eats object
3. Subject and object same, agent different:
- subject sleeps
- agent eats object
4. All different:
- subject sleeps
- agent eats object

Strategy #2 is called nominative-accusative alignment, and strategy #3 is called ergative-absolutive alignment.

Where do the names come from? All four—nominative, accusative, absolutive and ergative—are originally nominal cases, used to mark the arguments. However, a language doesn’t need to have cases in order to implement one of the strategies; there are lots of ways to treat arguments “same” or “different” that don’t involve case! I’ll rant about the names later, now let’s look at both alignment strategies in detail.

Nominative-accusative alignment

In languages such as English, nominative is the case which you use for subject and agent: she drinks water, she sleeps. Accusative is used for the object: you see her. So that’s why the whole alignment strategy is called nominative-accusative.

To shorten a bit, languages with this alignment are often called “nominative-accusative languages”, and to shorten even more, just “accusative languages”, named after the more marked argument.

Keep in mind that no single feature of the picture below is necessary alone, they are just common tendencies.

Ergative-absolutive alignment

In languages such as Basque, the subject of an intransitive verb and the object of a transitive verb are marked with a case called absolutive.

Ni naiz
I am

Inarik ni nauka
Inari has me

See how the first ni is translated as I, and the second as me. That’s absolutive case in action. In contrast, the -k after Inari (that’s my name, not some obscure Basque word) is the marker for ergative case.

Let’s translate now “Inari is” and “I have Inari”:

Inari da
Inari is
Nik Inari daukat
I have Inari

Pretty solid ergative-absolutive alignment from the NPs! Notice also how the verb indexes the absolutive argument (S or O): naiz and nauka for 1st person singular, da and daukat for 3rd person singular. If you just looked at the translations of “Inari is” and “Inari has me”, you wouldn’t find any similarities in the verb forms, da and nauka.

Languages that behave like this are called “ergative-absolutive languages”, or “absolutive-ergative languages”. (Why the two orders? Ergative-absolutive mimics nominative-accusative in that the agent case comes first, and absolutive-ergative in that the unmarked case comes first.) However, everyone agrees on the short name, “ergative languages”—again, named after the marked case.

Terminology notes

As you’ve seen, these terms conflate features from nominal morphology, verbal person marking and even syntax, whereas in the real world, they don’t need to appear in just those combinations.

For example, Somali is a nominative-accusative language with a marked subject case and unmarked object case. The name “accusative” means “marked object case”, so technically Somali’s object case isn’t an accusative (well, and a few other reasons which this margin is too narrow to contain). Despite Somali not having accusative as a nominal case, it still is a nominative-accusative language in terms of its person marking alignment.

So if you’re new to linguistics, try not to get confused! To finish this digression, here’s an absurd example to remind you that despite the names, person marking alignment doesn’t need to rely on cases.

2-dimensional person marking

Back to the regularly scheduled program! By now you’re surely tired of pretending that English has morphology, so we’re graduating from Like-English-but-… to actual languages.

I will show one nominative-accusative language, and one ergative-absolutive language—not because there is a huge difference, but rather because there isn’t.

There are, in fact, fewer possible 2-dimensional systems than 1-dimensional, because we must include both arguments. For 1-dimensional systems, we had 3 main classes: a) always mark subject, b) always mark object or c) it depends. For 2-dimensional systems, we have one main class, “always mark both”, and variation is more fine-grained, like “mark both arguments in the verb inflection, but use a weird case for one of them”.

Nominative-accusative: Hungarian

Here’s a grammar of simplified Hungarian—just enough morphology to illustrate how to deal with 2-dimensional agreement.

param
  Agr = Sg1 | Sg2 | Sg3 | Pl1 | Pl2 | Pl3 ;
  Definiteness = Def | Indef ;
  Case = Nom | Acc | Dat ; -- Most cases omitted here

lincat
  V,
  VP = {s :                 Agr => Str} ;
  V2 = {s : Definiteness => Agr => Str ; c2 : Case} ;
  NP = {s : Case => Str ; a : Agr ; d : Definiteness} ;

lin
  sleep_V = {s =
    table { Sg1 => "alszok" ;
            Sg2 => "alszol" ;
            Sg3 => "alszik" ;
            Pl1 => "alszunk" ;
            Pl2 => "alszotok" ;
            Pl3 => "alszanak" }
  } ;

  drink_V2 = {
    s = table { Def   => table {
                           Sg1 => "iszom" ;
                           Sg2 => "iszod" ;
                           Sg3 => "issza" ;
                           Pl1 => "isszuk" ;
                           Pl2 => "isszátok" ;
                           Pl3 => "isszák" } ;
                Indef => table {
                           Sg1 => "iszok" ;
                           Sg2 => "iszol" ;
                           Sg3 => "iszik" ;
                           Pl1 => "iszunk" ;
                           Pl2 => "isztok" ;
                           Pl3 => "isznak" }
              } ;
    c2 = Acc ;
    } ;

  UseV v = v ;

  ComplV2 v2 np = {s =
    \\subjAgr => v2.s ! np.d ! subjAgr
              ++ np.s ! v2.c2
    } ;

  PredVP np vp = {s = np.s ! Nom ++ vp.s ! np.a} ;

Now transitive verbs have inflection tables in two layers, as you can see in drink_V2. The first choice is made at ComplV2, when the verb gets an object.

ComplV2 v2 np = {s =
  \\subjAgr => v2.s ! np.d ! subjAgr
            ++ np.s ! v2.c2
  } ;

The second choice is made in PredVP, when the verb gets a subject:

PredVP np vp = {s = np.s ! Nom ++ vp.s ! np.a} ;

If these were in one function, you would notice the pattern more clearly. If we add a function that takes a subject and an object at the same time, then it would look like this:

-- MakeSentence : NP -> V2 -> NP -> S ;
-- (assuming lincat S = {s : Str})
MakeSentence subj verb obj = {s = subj.s ! Nom
                              ++ verb.s ! obj.d ! subj.a
                              ++ obj.s ! verb.c2} ;
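
To watch the two choices happen, here is a self-contained sketch that can be evaluated without any abstract syntax. The module name HunSketch, the three noun phrases and the oper makeSentence are assumptions made for this example; the verb table is copied from drink_V2 above.

```gf
resource HunSketch = {

  flags coding = utf8 ;

  param
    Agr = Sg1 | Sg2 | Sg3 | Pl1 | Pl2 | Pl3 ;
    Definiteness = Def | Indef ;
    Case = Nom | Acc | Dat ;

  oper
    NP : Type = {s : Case => Str ; a : Agr ; d : Definiteness} ;
    V2 : Type = {s : Definiteness => Agr => Str ; c2 : Case} ;

    -- Hypothetical lexicon entries, not from the post
    cat_NP : NP = {
      s = table { Nom => "a" ++ "macska" ;
                  Acc => "a" ++ "macskát" ;
                  Dat => "a" ++ "macskának" } ;
      a = Sg3 ; d = Def } ;

    water_NP : NP = {
      s = table { Nom => "víz" ; Acc => "vizet" ; Dat => "víznek" } ;
      a = Sg3 ; d = Indef } ;

    -- Same noun with the definite article; only d changes
    the_water_NP : NP = {
      s = \\c => "a" ++ water_NP.s ! c ;
      a = Sg3 ; d = Def } ;

    drink_V2 : V2 = {
      s = table {
            Def   => table { Sg1 => "iszom"  ; Sg2 => "iszod" ;
                             Sg3 => "issza"  ; Pl1 => "isszuk" ;
                             Pl2 => "isszátok" ; Pl3 => "isszák" } ;
            Indef => table { Sg1 => "iszok"  ; Sg2 => "iszol" ;
                             Sg3 => "iszik"  ; Pl1 => "iszunk" ;
                             Pl2 => "isztok" ; Pl3 => "isznak" } } ;
      c2 = Acc } ;

    -- Both choices in one place: object definiteness, then subject agreement
    makeSentence : NP -> V2 -> NP -> Str = \subj,verb,obj ->
      subj.s ! Nom ++ verb.s ! obj.d ! subj.a ++ obj.s ! verb.c2 ;
}
```

After `i -retain HunSketch.gf`, `cc makeSentence cat_NP drink_V2 water_NP` should give the tokens a macska iszik vizet, and swapping in the_water_NP should give a macska issza a vizet. The subject is the same in both, so the change from iszik to issza comes only from the object’s d field.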

As usual, there is runnable code for this little exercise grammar at github.com/inariksit/gf-agreement-tutorial. The actual grammar is quite different from this blog post, because I implemented a few things unrelated to verbal agreement, and moved some things to opers.

Ergative-absolutive: Basque

This is the final example of this blog post: a simplified version of Basque transitive verbs. (Home assignment is to implement Basque ditransitive verbs!)

param
  Agr = Sg1 | Sg2 | Sg3 | Pl1 | Pl2 | Pl3 ;
  Case = Abs | Erg ; -- Most cases omitted here

lincat
  NP = {s : Case => Str ; a : Agr} ;

  VP = {s :        Agr => Str ; sc : Case} ; -- subj/agent case
  V  = {s :        Agr => Str} ;
  V2 = {s : Agr => Agr => Str} ;

V and V2 are just like in Hungarian. However, VP needs something extra due to ergativity: namely, the case of its future subject/agent. Trees don’t track their origin automatically, so without the parameter sc, the VP wouldn’t know if it came from an intransitive verb (via UseV) or a transitive verb (via ComplV2), and hence whether it has a subject or an agent.

Next, let me show some verbs in action. The real code is at the usual place and will be linked later, let’s just imagine for a moment that allomorphs and stem changes don’t exist. The code below produces forms like lo egin gara ‘we sleep’, edan dituzu ‘you drink them’, edan zaitut ‘I drink you’ and so forth.

lin
  sleep_V = {
    s = \\subj => "lo egin" ++ case subj of {
                      Sg1 => "naiz" ; Pl1 => "gara" ;
                      Sg2 => "zara" ; Pl2 => "zarete" ;
                      Sg3 => "da"   ; Pl3 => "dira" }
    } ;

  drink_V2 = {s =
    \\obj,ag => "edan" ++ (prefix ! obj + suffix ! ag) ;
  } where {
      prefix : Agr => Str = table {
        Sg1 => "nau"   ; Pl1 => "gaitu" ;
        Sg2 => "zaitu" ; Pl2 => "zaituzte" ;
        Sg3 => "du"    ; Pl3 => "ditu" } ;

      suffix : Agr => Str = table {
        Sg1 => "t"  ; Pl1 => "gu" ;
        Sg2 => "zu" ; Pl2 => "zue" ;
        Sg3 => []   ; Pl3 => "te" } ;
    } ;

Finally, the syntactic functions look as follows.

UseV v = v ** {
  sc = Abs ; -- Comes from V, has absolutive subject
  } ;

ComplV2 v2 np = {
  s = \\agAgr => np.s ! Abs ++ v2.s ! np.a ! agAgr ;
  sc = Erg ; -- Comes from V2, has ergative agent
  } ;

PredVP np vp = {
  s = np.s ! vp.sc ++ vp.s ! np.a ;
  } ;

Nothing exciting to see. ComplV2 picks the right object agreement in v2 and leaves the agent open. PredVP completes the job and picks the right form, now that it has an agent. It also picks the right case from the NP, using the VP’s sc field. Thus we get the following sentences:

Agreement: PredVP i_NP (UseV sleep_V)
AgreementEus: ni lo egin naiz

Agreement: PredVP i_NP (ComplV2 drink_V2 water_NP)
AgreementEus: nik ura edan dut
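
The trees above use i_NP and water_NP, which are not shown in this post. A minimal sketch of what such entries could look like follows. The module name, the helper mkNP and the entry inari_NP are invented for this illustration, and the entries in the repository may be defined differently.

```gf
resource EusNPSketch = {

  param
    Agr = Sg1 | Sg2 | Sg3 | Pl1 | Pl2 | Pl3 ;
    Case = Abs | Erg ;

  oper
    NP : Type = {s : Case => Str ; a : Agr} ;

    -- Hypothetical helper: ergative = absolutive + "k".
    -- This holds for the three vowel-final entries below;
    -- it is not meant as a general rule of Basque morphology.
    mkNP : Str -> Agr -> NP = \absForm,agr -> {
      s = table { Abs => absForm ;
                  Erg => absForm + "k" } ;
      a = agr } ;

    i_NP     : NP = mkNP "ni" Sg1 ;
    inari_NP : NP = mkNP "Inari" Sg3 ;
    water_NP : NP = mkNP "ura" Sg3 ;
}
```

After `i -retain EusNPSketch.gf`, `cc i_NP.s ! Erg` should give nik and `cc i_NP.s ! Abs` should give ni, the two forms that PredVP chooses between via the VP’s sc field.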

If you read through the interlude expecting some mindblowing grammar tricks due to ergativity, you’ll be disappointed. The only thing we did was to add a subject case to VP, but that’s what we do in other languages as well. In the actual Hungarian implementation I have added a verb that takes its subject in dative. (Well, the difference is that the subject case is present already in the verb, not just VP.)

So, now you can go and check out the full grammar for Basque! If you get bored when waiting for the next posts, you can always take up the challenge of implementing ditransitive verbs, with their 3-dimensional agreement.

Read more

WALS article about person marking: wals.info/chapter/102
WALS article about alignment of person marking: wals.info/chapter/100

Footnotes

¹ In case you’re wondering what “some languages” means, here’s a WALS map. Pink shapes are different varieties of the strategy mark transitive object same as intransitive subject (at least some of the time).
² Sometimes you see the whole trio called by semantic rather than syntactic roles: experiencer sleeps, and agent eats patient. A common combination is Subject, Agent and Patient. In this post, I use object instead of patient, because I want to introduce as little new terminology as possible. But if you e.g. read the WALS articles, it’s useful to know what they mean by patient, shortened P.
³ Technically there’s a 5th alternative: subject is different but agent and object are the same. But it would be strange to spend precious morphemes on just the case where it doesn’t make a difference (intransitive subject vs. both transitive arguments) and then not spend them where it is actually important.

tags: gf, linguistics