
Flat vs. nested structure

The previous examples used a structure where we define each morpheme (e.g. that the root can be "call" or "jump", that the suffix can be "s" or "ing" or "ed", etc.) and then at the end we concatenate all the morphemes together. Let's call that a "flat" structure. (You might also call it a "templatic" structure, but I'm going to avoid that term here because it can mean a lot of things. You might be modeling a language that linguists say has "templatic morphology", but you might not want to structure your grammar that way.)

But there's another way of doing this, where we build up the structure piece by piece. It barely matters for English verbs because they're so simple, but for illustration let's consider the noun "counter-reformation". The intuitive way of talking about how this word is formed is to say that we start with "form", then add a piece to get "reform", then add another piece to get "reformation", and then another piece to get "counter-reformation". This might seem more complex at first (because we're going from the inside out, rather than left to right), but some things are just easier to express from the inside out.
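The same inside-out idea can be mimicked in a few lines of plain Python. This is only an illustration, not the grammar notation this tutorial uses. add_prefix and add_suffix are hypothetical helpers, and the morpheme boundaries are just the ones described above.

```python
# Hypothetical helpers: each stage wraps the stem built so far.
def add_prefix(prefix, stem):
    return prefix + stem

def add_suffix(suffix, stem):
    return stem + suffix

stages = ["form"]                                  # start with the innermost piece
stages.append(add_prefix("re", stages[-1]))        # reform
stages.append(add_suffix("ation", stages[-1]))     # reformation
stages.append(add_prefix("counter-", stages[-1]))  # counter-reformation

print(" -> ".join(stages))  # form -> reform -> reformation -> counter-reformation
```

Every intermediate stem exists on its own here, so you could do something to "reformation" before "counter-" is added, which is exactly what a single left-to-right concatenation at the end doesn't give you.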

For example, certain phonological rules might only apply at particular stages of the construction, and that's hard to express if we're assembling everything at the end, left-to-right, in one fell swoop.

Example: Swahili verbs

We'll use Swahili verbs for illustration. Swahili is the first or second language for more than 100 million people in East Africa, and serves as a regional lingua franca. Swahili verbs are fairly complex overall, but there are parts of the verb system that are relatively straightforward, and we'll start with those.

The first thing to do before you start writing a program is to understand exactly what phenomenon you want to model, so that you can make a plan. Let's say our goal is to model Swahili verbs in the present, perfect, and past tenses, with 1st, 2nd, and 3rd person singular subjects (I, you, he/she/it). They look like this:

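text      | subj | tense | root
ninapenda | 1    | pres  | love
unapenda  | 2    | pres  | love
anapenda  | 3    | pres  | love
nilipenda | 1    | past  | love
ulipenda  | 2    | past  | love
alipenda  | 3    | past  | love
nimependa | 1    | perf  | love
umependa  | 2    | perf  | love
amependa  | 3    | perf  | love
ninaona   | 1    | pres  | see
unaona    | 2    | pres  | see
anaona    | 3    | pres  | see
niliona   | 1    | past  | see
uliona    | 2    | past  | see
aliona    | 3    | past  | see
nimeona   | 1    | perf  | see
umeona    | 2    | perf  | see
ameona    | 3    | perf  | see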

Study that table for a few minutes to get a good sense of what we need to model. Do you notice any patterns? Here are three patterns we can notice:

  • All the 1st person subject forms start with ni, all the 2nd person forms start with u, and all the 3rd person forms start with a.
  • After these subject markers, you find na in all the present forms, li in all the past forms, and me in all the perfect forms.
  • All of the forms with "love" contain penda, so presumably that means love. (Actually it's pend, and it happens that all of these verb forms have a suffix a at the end. Most verb forms end in a, although a few root/conjugation combinations end in other vowels. We'll ignore that for now, though, and pretend the root is penda.) Meanwhile all the forms with "see" contain ona.
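These patterns can also be checked mechanically before writing any grammar. Here's a minimal Python sketch, assuming the table above is pasted in as whitespace-separated rows. SUBJ, TENSE, and ROOT are our own lookup tables built from the three patterns, not part of any tool.

```python
# The table above, one row per line: text subj tense root.
TABLE = """\
ninapenda 1 pres love
unapenda 2 pres love
anapenda 3 pres love
nilipenda 1 past love
ulipenda 2 past love
alipenda 3 past love
nimependa 1 perf love
umependa 2 perf love
amependa 3 perf love
ninaona 1 pres see
unaona 2 pres see
anaona 3 pres see
niliona 1 past see
uliona 2 past see
aliona 3 past see
nimeona 1 perf see
umeona 2 perf see
ameona 3 perf see
"""
FORMS = [tuple(line.split()) for line in TABLE.splitlines()]

# The three patterns noticed above, written as lookup tables.
SUBJ = {"1": "ni", "2": "u", "3": "a"}
TENSE = {"pres": "na", "past": "li", "perf": "me"}
ROOT = {"love": "penda", "see": "ona"}

# Rows whose text is NOT exactly subject marker + tense marker + root.
mismatches = [row for row in FORMS
              if row[0] != SUBJ[row[1]] + TENSE[row[2]] + ROOT[row[3]]]
print(len(FORMS), mismatches)  # 18 []
```

An empty mismatch list means every form in the table is exactly subject marker + tense marker + root, which is what the grammars below encode.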

The flat grammar

Here's the "flat" style of grammar that we've been using so far:

Person = | text   | subj
         | ni     | 1
         | u      | 2
         | a      | 3

Tense =  | text   | tense
         | na     | pres
         | li     | past
         | me     | perf

Root =   | text   | root
         | penda  | love
         | ona    | see

Verb =   | embed  | embed | embed
         | Person | Tense | Root

Remember that an embed header causes the cell below it (say, Person) to be interpreted as containing any form generated by the Person table. That's 3 forms, and those are concatenated with any of the 3 forms from the Tense table, for 9 possible forms. Then those 9 forms are concatenated with any of the 2 forms from the Root table. That's 3 x 3 x 2 = 18 possible forms in total, which is exactly how many forms were in the original table above.
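For comparison, here's a rough Python analogue of the flat grammar. It's only a sketch, not how the grammar tool works internally: the tables are assumed to be lists of dicts (one key per column), concat is a hypothetical helper that joins the text left to right and merges the other columns, and itertools.product stands in for "any form from each embedded table".

```python
from itertools import product

# Each table is a list of rows; each row maps a column header to a value.
PERSON = [{"text": "ni", "subj": "1"}, {"text": "u", "subj": "2"}, {"text": "a", "subj": "3"}]
TENSE = [{"text": "na", "tense": "pres"}, {"text": "li", "tense": "past"}, {"text": "me", "tense": "perf"}]
ROOT = [{"text": "penda", "root": "love"}, {"text": "ona", "root": "see"}]

def concat(*rows):
    """Join the text of several rows left to right and copy their other columns."""
    out = {"text": ""}
    for row in rows:
        for key, value in row.items():
            if key == "text":
                out["text"] += value
            else:
                out[key] = value
    return out

# The "flat" Verb: every Person x Tense x Root combination, concatenated in order.
VERB = [concat(p, t, r) for p, t, r in product(PERSON, TENSE, ROOT)]
print(len(VERB))  # 18
print(VERB[0])    # {'text': 'ninapenda', 'subj': '1', 'tense': 'pres', 'root': 'love'}
```

All the concatenation happens in one place at the end, which mirrors the single Verb table with three embed columns.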

The nested grammar

Here's another way of structuring the same grammar, by building up the stem bit by bit. For example, to build "ninapenda", we start with the Root penda, then build a TenseStem napenda by adding a tense prefix, and finally build a Verb ninapenda by adding a person prefix.

Root =      | text  | root
            | penda | love
            | ona   | see

TenseStem = | text  | tense | embed
            | na    | pres  | Root
            | li    | past  | Root
            | me    | perf  | Root

Verb =      | text  | subj  | embed
            | ni    | 1     | TenseStem
            | u     | 2     | TenseStem
            | a     | 3     | TenseStem

Note that there's nothing special about the order we wrote these in; we could also have listed Verb first and Root last. The important part is the structure of the embeds: each Verb line embeds the TenseStem table, and each TenseStem line embeds the Root table.

Let's look closely at what the first line of the TenseStem table is doing, stepping through it left to right. It adds text: na to the output, then adds tense: pres, giving us {text:na, tense:pres}. Then it reaches embed: Root. The Root table has two possible outputs, so the line produces both: it takes the current output and adds {text:penda, root:love}, giving {text:napenda, tense:pres, root:love}, and it also takes the current output and adds {text:ona, root:see}, giving a second output {text:naona, tense:pres, root:see}.

So this line has two outputs, and so do the other lines in TenseStem, so TenseStem has 2 x 3 = 6 outputs.

The same kind of thing is happening in the Verb table. Each line embeds TenseStem (with 6 outputs), and there are again 3 lines, so that's 6 x 3 = 18 outputs: the same as the "flat" grammar, and the same as the original table.
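To make the embed step concrete, here's a rough Python analogue of the nested grammar. It's a sketch under our own assumptions, not how the grammar tool is implemented: each table is a function that returns its list of outputs, and the hypothetical helper extend plays the role of an embed cell, producing one output per row of the embedded table.

```python
def root():
    """Outputs of the Root table."""
    return [{"text": "penda", "root": "love"}, {"text": "ona", "root": "see"}]

def extend(prefix_row, embedded_rows):
    """Start from prefix_row's cells, then add each embedded row (one output per row)."""
    outputs = []
    for row in embedded_rows:
        out = dict(prefix_row)
        out["text"] = prefix_row["text"] + row["text"]
        for key, value in row.items():
            if key != "text":
                out[key] = value
        outputs.append(out)
    return outputs

def tense_stem():
    """Each TenseStem line: tense prefix, then embed Root."""
    lines = [{"text": "na", "tense": "pres"}, {"text": "li", "tense": "past"}, {"text": "me", "tense": "perf"}]
    return [out for line in lines for out in extend(line, root())]

def verb():
    """Each Verb line: subject prefix, then embed TenseStem."""
    lines = [{"text": "ni", "subj": "1"}, {"text": "u", "subj": "2"}, {"text": "a", "subj": "3"}]
    return [out for line in lines for out in extend(line, tense_stem())]

print(len(tense_stem()))  # 6
print(tense_stem()[:2])   # the napenda and naona outputs traced above
print(len(verb()))        # 18
```

The first line of tense_stem yields exactly the two outputs traced above, {text: napenda, ...} and {text: naona, ...}, and verb multiplies the 6 stems by its 3 lines.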

Why nested?

This structure is a little more complicated, so why would we want to do it?

  • It's often simpler to describe the construction of words as an inside-out, nested process (like "counter-reformation" above).

  • As mentioned, sometimes phonological rules only apply at certain stages of the construction process. By giving those stages names like TenseStem, we can apply phonological rules to that unit alone (rather than to the whole word).

  • We can also apply tests to those units alone. For example, we can check that a particular unit generates the form waona but not the incorrect form uona. Testing an individual unit is often easier than testing whole words, because many things can go wrong in a whole word besides what you're testing for.

  • It lets us handle circumfixes and certain kinds of long-distance dependencies more easily... more on that in the next chapter!
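The unit-testing idea can be sketched in Python too. This is not the grammar tool's own testing mechanism; it's a minimal sketch assuming a TenseStem unit that simply returns its set of text outputs (tense_stem and the test names are hypothetical). It checks forms that the grammar above actually produces, such as napenda and meona, rather than the waona/uona example, since the grammar built so far doesn't generate waona.

```python
# Text-only stand-ins for the Root and TenseStem units (hypothetical names).
ROOTS = ["penda", "ona"]
TENSE_PREFIXES = ["na", "li", "me"]

def tense_stem():
    """All TenseStem outputs: a tense prefix followed by an embedded Root."""
    return {prefix + root for prefix in TENSE_PREFIXES for root in ROOTS}

# Unit-level tests in pytest style (plain functions using assert).
def test_tense_stem_generates_expected_forms():
    stems = tense_stem()
    assert "napenda" in stems
    assert "meona" in stems

def test_tense_stem_has_no_subject_marker():
    # Subject prefixes are added later, in Verb, so they must not show up here.
    assert "ninapenda" not in tense_stem()

def test_tense_stem_size():
    assert len(tense_stem()) == 6

if __name__ == "__main__":
    test_tense_stem_generates_expected_forms()
    test_tense_stem_has_no_subject_marker()
    test_tense_stem_size()
    print("all TenseStem tests passed")
```

Because TenseStem is tested on its own, a failure points straight at the tense prefixes or the Root embed, not at anything the Verb level adds.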