Starting in Saarbrücken: Information density, complexity and cross-linguistic variation

It’s October already, and I have just received my appointment as a guest professor in the SFB (special research unit) on information density in Saarbrücken. I will teach a class on “Cross-linguistic variation in structural complexity”, and I’m excited to learn more about information density and about possible applications of existing hypotheses and tools to typological comparison.

Compiling a list of glosses from your glossed examples in a LaTeX document (under UNIX)

If you have many interlinearized examples in your LaTeX documents, you have probably wondered about the best way to handle them. Here are some ideas. Glosses pose two potential problems: 1) different publishers may have different requirements for how to print them, so transferring glossed examples from one manuscript to another may be difficult; 2) you’ll want a list of all the glosses in your document, and it should be complete and consistent. To solve both, the main strategy is to mark every gloss explicitly as such with a new command, which we may call “Gloss”:

\newcommand{\Gloss}[1]{\textsc{#1}}

This command prints all your glosses in small caps. If a publisher requires all-caps instead, you can change this command to:

\newcommand{\Gloss}[1]{\MakeUppercase{#1}}

Then you need to use this command, of course, when interlinearising examples. This might look as follows:

\exg. \dots a mwe \textbf{yur}-yurmiline suku-on nyoo, suku-on ane gyes=an nyoo. \\
  {} and \Gloss{real} \Gloss{redup}-forget stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{3pl} stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{tr} work=\Gloss{nmlz} \Gloss{3pl}\\
  \enquote{and he repeatedly forgot his things, his tools for work.}

In order to use this command for the automatic compilation of glosses, it is important not to include any separators inside a gloss. As you can see above, separators such as “.”, “-” and “=” are not included within the curly braces of a \Gloss argument, but stay outside.

Now open your terminal and type in the following command:

grep 'Gloss' INFILE.tex | tr -s ' .;:\-=()\\' '\n' | grep 'Gloss' | sort -u

I usually use ack instead of grep, but here grep works just fine. Let’s break this down a little: the first grep command selects all lines of INFILE.tex that contain the sequence “Gloss” (this step isn’t strictly necessary, since the second grep does the actual filtering). The tr command translates every occurrence of the characters in its first argument (space, period, semicolon, colon, hyphen, equals sign, parentheses and backslash) into the newline character “\n”, and the -s option squeezes runs of newlines into a single one. As a result, each gloss ends up on its own line, without its leading backslash. The next grep takes the result of that action and keeps only those lines that contain the word “Gloss”. Finally, “sort -u” sorts uniquely, that is, it gives you a sorted list of your search with duplicates thrown out. The result of this action is displayed at the end of this post.
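If you want to try the pipeline on a throwaway file before running it on a manuscript, you can wrap it in a small shell function. This is a sketch: the function name list_glosses and the sample lines are made up for illustration, and the LC_ALL=C in front of sort is my own addition, so that the sort order does not depend on your locale. Everything else is the command above.

```shell
#!/bin/sh
# list_glosses FILE (hypothetical helper name)
# Prints every distinct \Gloss{...} token in FILE, one per line.
list_glosses() {
  grep 'Gloss' "$1" |
    tr -s ' .;:\-=()\\' '\n' |   # separators and backslashes become newlines
    grep 'Gloss' |               # keep only the lines containing a gloss
    LC_ALL=C sort -u             # byte-order sort, duplicates removed
}

# Made-up sample: one well-formed gloss line, plus one gloss with a
# separator typed inside the braces (\Gloss{3sg.poss}).
sample=$(mktemp)
cat > "$sample" <<'EOF'
  and \Gloss{real} \Gloss{redup}-forget stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{3pl}\\
  \Gloss{3sg.poss} house
EOF

list_glosses "$sample"
rm -f "$sample"
```

On this sample the function prints Gloss{3pl}, Gloss{3sg, Gloss{3sg}, Gloss{poss}, Gloss{real} and Gloss{redup}, one per line. The unclosed Gloss{3sg is exactly the kind of entry that points to a separator inside the braces.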

You can take this list directly and put it into your LaTeX document. If you use the Sublime Text editor, you can use multiple cursors to add dashes and semicolons to all entries at the same time. Usually, there will be inconsistencies and typos in your glosses. If you get any unclosed braces, as in “Gloss{xx”, that probably means you included a separator within the braces, and you’ll have to look for those cases. Entries with characters glued to the closing brace, such as “Gloss{np}s”, deserve a second look too. So first use the list to clean up your glosses, run the above command again after each round, and once your list looks perfect, include it in your LaTeX document.
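The clean-up can also be partly scripted. The sketch below reads the gloss list on standard input, for example piped in from the command above. The helper names and the target format (\Gloss{xx} -- ; with the meaning still to be typed in by hand) are my own assumptions, not a fixed convention.

```shell
#!/bin/sh
# Hypothetical helpers; both read the gloss list on stdin.

# suspicious_glosses: print every entry that is not exactly Gloss{...},
# e.g. "Gloss{3sg" (separator inside the braces), "Gloss{it}/" or a bare "Gloss".
suspicious_glosses() {
  grep -v '^Gloss{[^{}]*}$'
}

# to_latex_entries: turn each well-formed entry into "\Gloss{xx} -- ;",
# leaving the meaning to be typed in between the dash and the semicolon.
to_latex_entries() {
  grep '^Gloss{[^{}]*}$' |
    sed 's/^/\\/; s/$/ -- ;/'   # prepend a backslash, append " -- ;"
}

# Example run on a few entries from the list in this post.
printf '%s\n' 'Gloss' 'Gloss{3sg' 'Gloss{3sg}' 'Gloss{it}/' 'Gloss{poss}' |
  suspicious_glosses
printf '%s\n' 'Gloss' 'Gloss{3sg' 'Gloss{3sg}' 'Gloss{it}/' 'Gloss{poss}' |
  to_latex_entries
```

The first call prints Gloss, Gloss{3sg and Gloss{it}/, the entries to fix; the second prints \Gloss{3sg} -- ; and \Gloss{poss} -- ;, ready to be completed by hand.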

Gloss
Gloss{1du}
Gloss{1excl}
Gloss{1incl}
Gloss{1pl}
Gloss{1sg}
Gloss{1s}
Gloss{2sg}
Gloss{2s}
Gloss{2}
Gloss{3du}
Gloss{3pc}
Gloss{3pl}
Gloss{3sg
Gloss{3sg}
Gloss{3}
Gloss{adv}
Gloss{ad}
Gloss{agr}
Gloss{ana}
Gloss{art}
Gloss{asr}
Gloss{aux}
Gloss{bi}
Gloss{body
Gloss{caus}
Gloss{clf}
Gloss{comp}
Gloss{cond}
Gloss{conj}
Gloss{cons}
Gloss{cont}
Gloss{cop}
Gloss{def}
Gloss{dem}
Gloss{detr}
Gloss{det}
Gloss{disc}
Gloss{dist}
Gloss{dl}
Gloss{dst}
Gloss{es}
Gloss{excl}
Gloss{freq}
Gloss{fut}
Gloss{hab}
Gloss{hesit}
Gloss{impf}
Gloss{incl}
Gloss{incpt}
Gloss{irr
Gloss{irr}
Gloss{it}
Gloss{it}/
Gloss{loc}
Gloss{med}
Gloss{name}
Gloss{nec}
Gloss{neg2}
Gloss{neg}
Gloss{nmlz}
Gloss{np}s
Gloss{num}
Gloss{obj}
Gloss{part}
Gloss{pft}
Gloss{pl}
Gloss{poss1}
Gloss{poss2}
Gloss{poss}
Gloss{pos}
Gloss{pot}
Gloss{pp}
Gloss{prep}
Gloss{prf}
Gloss{prog}
Gloss{prox}
Gloss{prsup}
Gloss{real}
Gloss{recp}
Gloss{redup}
Gloss{res}
Gloss{sbj}

Pretty WALS maps

A pretty map with WALS data, generated by GMT

The World Atlas of Language Structures (WALS) maps data from typological studies onto a world map. In addition to the online version, there is also a program for producing maps locally.

An even prettier map, in SVG format, with the background customised in Inkscape

However, the options for customisation are limited. I use the free and open-source command-line tool GMT (Generic Mapping Tools) to produce linguistic maps. It has awesome tools for all kinds of tasks, including plotting symbols from a file of coordinates. Here is a quick guide on how to produce your own pretty WALS map.

  1. Download your data set from WALS as tab-separated values (there is a button just underneath the header). Save it as walsXY.xy, where XY is the WALS feature you want to map.
  2. Remove the metadata lines at the top of the file and the header of the table.
  3. GMT treats tabs and ordinary spaces alike as column separators, so a space inside a field (e.g. in a language name) would shift all following columns. Replace all space characters with nothing or with a character of your choice.
  4. Start GMT and move to the directory to which you have downloaded your data set and where you want to produce your map.
  5. In the same folder, create a cpt (color palette table) file containing the colors that you want to assign to the different values. My wals.cpt file has the following content (WALS value number, then its RGB color):
    1 240/11/0 
    2 0/210/240
    3 240/180/0
    4 28/142/59
    5 28/54/142
    6 90/28/142
    7 211/211/170
    8 0/0/0
  6. Run the following commands in GMT:
    pscoast -R-180/180/-70/80 -JQ7i -K -Ssteelblue > walsXY.ps
    psxy walsXY.xy -R -i5,4,2 -J -O -Sc0.15c -Cwals.cpt >> walsXY.ps
    
  7. For more options, see the GMT documentation. In GMT 5 and later, the modules are called through the gmt program, e.g. gmt pscoast and gmt psxy.
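Steps 2, 3 and 5 can be scripted as well. This is a sketch under assumptions: the function names are hypothetical, and the column layout (value in the 3rd column, latitude in the 5th, longitude in the 6th, counting from 1) is inferred from the option -i5,4,2 in step 6, which counts from 0. Instead of counting metadata lines, the awk filter simply keeps the rows whose value column is a whole number, and the result goes to a new file so that the download stays untouched. The underscore as a replacement for spaces is just my choice.

```shell
#!/bin/sh
# prepare_wals IN OUT (hypothetical helper)
# Step 2: keep only data rows, i.e. rows with at least 6 tab-separated
#         columns whose 3rd column (the WALS value) is a whole number;
#         this drops the metadata lines and the table header.
# Step 3: replace every space with an underscore, so that GMT does not
#         split multi-word fields such as language names into extra columns.
prepare_wals() {
  awk -F'\t' 'NF >= 6 && $3 ~ /^[0-9]+$/' "$1" | tr ' ' '_' > "$2"
}

# write_cpt FILE (hypothetical helper)
# Step 5: write the color table used in this post (WALS value, then R/G/B).
write_cpt() {
  cat > "$1" <<'EOF'
1 240/11/0
2 0/210/240
3 240/180/0
4 28/142/59
5 28/54/142
6 90/28/142
7 211/211/170
8 0/0/0
EOF
}

# Usage (file names are placeholders):
#   prepare_wals walsXY.tsv walsXY.xy
#   write_cpt wals.cpt
# and then run the two GMT commands from step 6.
```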