Compiling a list of glosses from your glossed examples in a LaTeX document (under UNIX)

If you have many interlinearized examples in your LaTeX documents, you have probably wondered about the best way to handle them. Here are some ideas. There are two potential problems with the glosses: (1) different publishers may have different requirements for how to print them, so transferring glossed examples from one manuscript to another may be difficult; (2) you’ll want a list of all the glosses in your document, and it should be complete and consistent. To solve both, the main strategy is to mark every gloss explicitly as such with a new command, which we may call “Gloss”:

\newcommand{\Gloss}[1]{\textsc{#1}}

This command prints all your glosses in small caps. If a publisher requires all-caps instead, you can change this command to:

\newcommand{\Gloss}[1]{\MakeUppercase{#1}}

Then, of course, you need to use this command whenever you interlinearize an example. This might look as follows:

\exg. \dots a mwe \textbf{yur}-yurmiline suku-on nyoo, suku-on ane gyes=an nyoo. \\
  and \Gloss{real} \Gloss{redup}-forget stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{3pl} stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{tr} work=\Gloss{nmlz} \Gloss{3pl}\\
  \enquote{and he repeatedly forgot his things, his tools for work.}

For the automatic compilation of glosses to work, it is important not to include any separators in a gloss. As you can see above, separators such as “.”, “-” and “=” are not included within the curly braces of a Gloss argument, but stay outside.
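To check for separators that have slipped inside the braces, something like the following might help. It is only a sketch: the function name find_bad_glosses is made up for illustration, and the set of separators it looks for (hyphen, period, equals sign, semicolon, colon) is an assumption you may need to adjust to your own conventions.

```shell
# Hypothetical helper (not part of the original workflow): print, with line
# numbers, every line of a .tex file where a separator occurs inside the
# braces of a Gloss argument.
find_bad_glosses() {
    # [^}]* stays inside one brace group, so a separator that follows it
    # must sit before the closing brace. The hyphen comes first in the
    # bracket expression so that grep reads it literally.
    grep -n 'Gloss{[^}]*[-.=;:]' "$1"
}
```

Calling find_bad_glosses INFILE.tex prints the offending lines; if it prints nothing, no gloss contains one of the listed separators.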

Now open your terminal and type in the following command:

grep 'Gloss' INFILE.tex | tr -s ' .;:\-=()\\?\t' '\n' | grep 'Gloss' | sort -u

I usually use ack instead of grep, but here grep works just fine. Let’s break this down a little: the first “grep” command selects all lines that contain the sequence “Gloss” from your INFILE.tex document (this step isn’t strictly necessary, since the second “grep” filters again). The “tr” command translates every occurrence of the characters in the first pair of quotation marks into the newline character “\n”, and the “-s” option squeezes each run of consecutive newlines into a single one. This puts every gloss on a line of its own. The next command takes that result and again keeps only those lines that contain the word “Gloss”. Finally, “sort -u” sorts uniquely, that is, it gives you an alphabetical list with duplicates thrown out. The result is displayed at the end of this post.
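To see the individual steps at work, here is a minimal sketch that wraps the pipeline in a function and runs it on a single made-up line. The function name list_glosses and the temporary sample file are assumptions for illustration only. LC_ALL=C is also an addition: it is not in the command above and is only there to make the sort order independent of your locale settings.

```shell
# Hypothetical wrapper around the pipeline from the post.
list_glosses() {
    # 1. keep only lines that mention Gloss (optional pre-filter)
    # 2. turn every separator into a newline; -s squeezes runs of newlines
    # 3. keep only the pieces that still contain Gloss
    # 4. sort and drop duplicates (LC_ALL=C added here for a stable order)
    grep 'Gloss' "$1" |
        tr -s ' .;:\-=()\\?\t' '\n' |
        grep 'Gloss' |
        LC_ALL=C sort -u
}

# Demo on one sample line written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'and \Gloss{real} \Gloss{redup}-forget stuff.of-\Gloss{3sg}.\Gloss{poss} \Gloss{3pl}' > "$sample"
list_glosses "$sample"
rm -f "$sample"
```

With GNU tools, this should print Gloss{3pl}, Gloss{3sg}, Gloss{poss}, Gloss{real} and Gloss{redup}, one per line. The words “and”, “forget”, “stuff” and “of” also end up on lines of their own after the “tr” step, but the second “grep” discards them.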

You can take this list directly and put it into your LaTeX document. If you use the Sublime Text editor, you can use multiple cursors to add dashes and semicolons to all entries at the same time. Usually, there will be inconsistencies and typos in your glosses. If you get any unclosed braces, as in “Gloss{xx”, that probably means you included a separator within the braces, and you’ll have to look for those cases. So first use the list to clean up your glosses, run the above command again after each round, and once your list looks perfect, include it in your LaTeX document.
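If you would rather not add the dashes and semicolons by hand, a small sed filter can do a first pass. This is only a sketch: the function name format_gloss_entries is made up, and the output layout (the gloss, a dash, and a closing semicolon, with the explanation left for you to fill in) is an assumed format, not a requirement of any publisher.

```shell
# Hypothetical helper: read the gloss list on standard input and turn every
# well-formed entry Gloss{x} into a LaTeX line of the form \Gloss{x} -- ;
format_gloss_entries() {
    # -n with the p flag prints only lines where the substitution applied,
    # so malformed entries such as "Gloss{3sg" or "Gloss{it}/" are skipped.
    sed -n 's/^Gloss{\([^}]*\)}$/\\Gloss{\1} -- ;/p'
}
```

You would pipe the output of the grep command above into format_gloss_entries. Because malformed entries are silently dropped, it is best to run this only after the list has been cleaned up.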

Gloss
Gloss{1du}
Gloss{1excl}
Gloss{1incl}
Gloss{1pl}
Gloss{1sg}
Gloss{1s}
Gloss{2sg}
Gloss{2s}
Gloss{2}
Gloss{3du}
Gloss{3pc}
Gloss{3pl}
Gloss{3sg
Gloss{3sg}
Gloss{3}
Gloss{adv}
Gloss{ad}
Gloss{agr}
Gloss{ana}
Gloss{art}
Gloss{asr}
Gloss{aux}
Gloss{bi}
Gloss{body
Gloss{caus}
Gloss{clf}
Gloss{comp}
Gloss{cond}
Gloss{conj}
Gloss{cons}
Gloss{cont}
Gloss{cop}
Gloss{def}
Gloss{dem}
Gloss{detr}
Gloss{det}
Gloss{disc}
Gloss{dist}
Gloss{dl}
Gloss{dst}
Gloss{es}
Gloss{excl}
Gloss{freq}
Gloss{fut}
Gloss{hab}
Gloss{hesit}
Gloss{impf}
Gloss{incl}
Gloss{incpt}
Gloss{irr
Gloss{irr}
Gloss{it}
Gloss{it}/
Gloss{loc}
Gloss{med}
Gloss{name}
Gloss{nec}
Gloss{neg2}
Gloss{neg}
Gloss{nmlz}
Gloss{np}s
Gloss{num}
Gloss{obj}
Gloss{part}
Gloss{pft}
Gloss{pl}
Gloss{poss1}
Gloss{poss2}
Gloss{poss}
Gloss{pos}
Gloss{pot}
Gloss{pp}
Gloss{prep}
Gloss{prf}
Gloss{prog}
Gloss{prox}
Gloss{prsup}
Gloss{real}
Gloss{recp}
Gloss{redup}
Gloss{res}
Gloss{sbj}
