Arabic Typography: An Interactive History.

Vita Nouva has a remarkable interactive introduction to the terrific experience of rendering Arabic typography and its technical debt:

Once upon a time, a frontend ticket landed in my queue which was not properly mine, but the only other Arabic reader on the team was on leave. It went roughly as follows: a block of mixed-content Arabic prose on the customer-facing dashboard was rendering with a ragged left edge (the rag falls on the left in Arabic, since the lines set out from the right margin; the ticket said “ragged right”) when the design team had explicitly specified justified text. Attached were three screenshots from three browsers and a polite note from the product manager observing that the Latin-script version of the same block looked, I quote, “fine.”

In the same six months I had closed three other tickets against the same product, each of which had presented to its filer as the only bug. A customer’s name had appeared with its letters unjoined on a printed agreement, the way a sign-painter would have laid them out in 1962, because the PDF library on the receipt server pre-dated the existence of a shaping engine in its language runtime. A search index had been returning empty for accounts the customer service team could see in the database because a 2017 import had encoded twelve thousand names using fossil Unicode codepoints from 1991 instead of regular ones from 1995, and the index, very reasonably, treated the two encodings as different strings. So that ragged-left ticket was the smallest of the four; however, it sat on top of the same iceberg and pointed at the same thing. […]

It did look fine. I spent about half an hour with it: I walked the rendered DOM, I set text-align: justify in many different combinations of font-family and direction declarations, and at the end of the exercise I wrote a reply explaining, more or less honestly, that the problem was not a bug in our stylesheet but the state of Arabic typography on the web.

The reply and the closure of the ticket took half an hour or so. The reasons behind it took five hundred years to pile up, and they involve a twice-mutilated vizier, a Qurʾān that vanished for four centuries, a Beirut newspaperman with a deadline, and an Egyptian physician who taught himself font engineering for fun (or that is what I imagine about him). Walking through these ended up being the most enjoyable couple of weeks in that job, and I want to go through it here too.
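
For the coders among you, the search-index bug in that excerpt is easy to reproduce. Here is a minimal Python sketch, assuming (the excerpt doesn't say so) that the "fossil" codepoints were Arabic presentation forms; the sample name and the helper fold_for_index are my own hypothetical choices, not anything from the article:

```python
import unicodedata

# The same name spelled two ways. The first uses positional
# "presentation form" codepoints (assumed here to be the legacy encoding);
# the second uses the ordinary Arabic letters.
legacy = "\uFEE3\uFEA4\uFEE4\uFEAA"   # meem initial, hah medial, meem medial, dal final
modern = "\u0645\u062D\u0645\u062F"   # meem, hah, meem, dal


def fold_for_index(text: str) -> str:
    """Hypothetical helper: apply NFKC compatibility normalization so that
    presentation forms collapse onto their base letters before indexing."""
    return unicodedata.normalize("NFKC", text)


# A naive index compares raw codepoints, so the two spellings do not match.
raw_match = legacy == modern

# After folding both sides, they compare equal.
folded_match = fold_for_index(legacy) == fold_for_index(modern)
```

NFKC folds a good many other compatibility characters too, so a real index might well want something narrower; this only shows why the two spellings look like different strings to a byte-for-byte comparison.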

Trust me, the resultant story is worth your while, especially if you know or care anything about Arabic and/or coding. I got it via Lycaste’s MeFi post, where there are some good comments:
[Read more…]

Miscellanea 3.

It’s too hot and muggy today to construct a coherent post, so I’ll just toss out some bits and pieces I’ve had sitting around.

1) In some earlier thread I ran across a mention of the widespread Afroasiatic root *lis- ‘tongue,’ which has the following descendants:

• Proto-Berber: *iləs (see there for further descendants)
• Egyptian: ns (see there for further descendants)
• Proto-Semitic: *lišān- (see there for further descendants)
• Proto-Chadic: *lis-um-
   • Hausa: harshè (see there for further comparisons)

Check out the Berber writing systems: Central Atlas Tamazight: ⵉⵍⵙ (ils), ⵉⵔⵙ (irs); Medieval Tashelhit: ايلس (iles).

2) I recently watched the 1925 movie The Big Parade (it’s about American soldiers in WWI; an intertitle reads “THE BIG PARADE/ Men! Guns! Men!/ Men! Guns!”), and at one point one Yank calls to another (per an intertitle): “Yo….Slim!” I’m not sure if this is an antedate; the OED (entry revised 2016) lumps all these senses together: “An exclamation used to attract attention, to express warning, surprise, etc., or to incite or encourage action; hey! Later (colloquial (frequently in African American usage)) used as a greeting or in response to a greeting,” and these are the two WWI-era citations:

1919 Every morning we fall out at six o’clock and yell ‘YO’ when our name is called. We like it.
Marine (Paris Island) 12 March 5

1920 Yo!! Breakfast.
W. B. Ellington, Company ‘A’ 23rd Engineers 112

Meanwhile, Green’s has “1. yes!” (first cite: 1918 [US] D.G. Rowse Doughboy Dope 29: You are assisted in answering ‘Yo!’ to your name by the fortunate knowledge of where it comes on the company roster) and “2. (US) a general term of address” (first cite very late: 1961 [US] (con. 1945) G. Forbes Goodbye to Some (1963) 94: ‘Greetings.’ ‘Yo.’). At any rate, I was struck by the very modern use in a 1925 film (set in 1917).

3) Just now I watched the 1994 movie English, August (based on a 1988 novel of the same name), which is a multilingual delight — it’s about a young Bengali, Agastya Sen, who as a reluctant civil servant is posted to the rural town of Madna (the movie was filmed in Visakhapatnam and the nearby seaside town Bheemunipatnam), and the movie has dialogue in English, Hindi, Bengali, and Telugu, the local language (which poor Agastya/August has to learn for his job by means of lessons based exclusively on official phrases like “The decision will be deferred until next week”). I was surprised to learn that Telugu was spoken so far north on the east coast of India. Also, Visakhapatnam has complicated onomastics:
[Read more…]

Forward Into Foreignness

As soon as I started reading Joseph O’Neill’s “Forward Into Foreignness” (called “Polyglotism” in the paper version of the issue of the New Yorker I was reading; archived), I knew I was going to post it:

In the nineteen-sixties, my father, a Corkman, was employed by Chicago Bridge & Iron, an American corporation that built industrial plants worldwide. He worked in hardhat management positions. An early project took him to Mersin, in Turkey. There, he met my mother. She had just spent a year at Langham Secretarial College, in London. They courted in English, then married at Mersin’s Church of St. Anthony of Padua, the patron saint of lost things.

My mother belonged to Mersin’s well-off Christian community, which was mainly of Syrian origin. This Levantine subculture socialized in French, voiced endearments in Arabic, communicated with functionaries in Turkish. Polyglotism was prized. My mother’s father spoke French, Arabic, Turkish, German, English, Italian, and Ladino. He sent my mother to French-language boarding schools in Lyon and Aleppo. She used French with her four children. We called her Maman and my father Papa. My first word was “attends,” because “attends” was my mother’s invariable response to my cries from the crib.

That was in Neuchâtel, in Switzerland. We kept moving—to Tripoli, in Lebanon; to Amanzimtoti, in South Africa; and to Matola, in colonial Mozambique. Our nanny there, Victoria, chatted to us in the language of Lisbon, and my first ironic remark was made in Portuguese. I was four years old. The remark came in response to my parents turning off my bedroom light. “Muito obrigado,” I said. I added, translating, “Thank you very much.”

[Read more…]

The Science of Bruschetta.

I posted about a video by Taylor “Language” Jones last year, and now he’s got another one I can’t resist sharing, Dear Hank Green, here’s the science of “Bruschetta”. It’s about how we choose which version of borrowed words to say (with detours into what even counts as “borrowed,” e.g. the fake-French “nom de plume”), and it’s one of the few video essays I wish were longer — it’s under fifteen minutes, and I would happily have watched for half an hour if he analyzed a bunch of good examples. One of his trick questions (“leave the answer in the comments if you know it!”) is what the real French phrase is; another is “in what language is bgadim an actual word?” (spoiler: it’s Hebrew). I was distressed that he called /beɪˈʒɪŋ/ (“Beizhing”) the “standard” pronunciation of Beijing, but I can’t in good conscience dispute it. And I’m reminded of my comment here: “The problem, of course, is knowing with whom to use which pronunciation.” (I just thought of a good example: I’ve heard enough Cantonese-speakers call Wong Kar-wai “Wong GAH-wei” that that’s how I say it in my head, but I would never dream of saying that in conversation — I will keep calling him WONG kar-WYE like a normal English-speaker.)

Relatedly, I ran across the name of the luxury clothing line Xuly Bët (or, as they apparently style it, XULY.Bët Funkin’ Fashion Factory), which is said to mean “to open your eyes wide” in Wolof. Amazingly, that is actually correct; Arame Fal’s Dictionnaire wolof-français has “xulli, v. écarquiller les yeux, faire les gros yeux” (Bul xulli xale bi, dafay tiit ! Ne fais pas les gros yeux à l’enfant, il va avoir peur !) and “bët b-, n. oeil.” As I understand the writing system of Wolof, this should be pronounced /ˈxulli bət/; my question, in case any of you have any dealings with luxury clothing lines, is: how do non-Senegalese pronounce this? If I were an English-speaker who knew nothing about Wolof, I might try /ˈzuwli bɛt/, but I can think of all sorts of other possibilities, and I can’t imagine either English- or French-speakers hitting on the correct (i.e., Wolof) one.

Gesha, Geisha, Geshe.

Stone Creek Coffee has a page on a variety I was unfamiliar with, and there’s a linguistic angle:

If you’ve been following our Reserve coffee line-up over the past few years, and just the coffee world in general, you may have noticed a few observations. This year, we had Wildflower Geisha Colombia and Apricot Glaze Geshe Peru; several years ago, we had Cinnamon Blackberry Gesha. These variations aren’t misspellings; they’re different names for the same coffee variety, shaped by history, geography, language, and each producer’s preference. […]

Why Are There Different Spellings?

The variety traces back to Ethiopia, referring to Mount Gesha, which is why many coffee historians consider Gesha the most historically accurate spelling. Somewhere along the way, Gesha also became Geisha, a spelling that stuck in Panama and throughout much of Latin America. And in Peru, you may also, more rarely, see Geshe, particularly among some indigenous growers and producer communities.

But coffee names don’t travel through history in a clean, straight line. Seeds moved between countries. Agricultural records were translated. Names were handwritten, copied, and adapted between languages over decades. There’s also a fascinating etymological layer here. When words move between languages, Amharic to Spanish, Spanish to English, and through Indigenous languages like Quechua along the way, they rarely arrive as perfect one-to-one translations. Sounds shift. Spellings adapt to fit new alphabets and pronunciation habits. A single original word can branch into multiple versions, each shaped by the language that adopted it.

Gesha, Geisha, and Geshe all reflect different moments in that journey. […] At Stone Creek Coffee, we use the spelling the producer uses. If a farm calls the variety Geisha, that’s what goes on our label. If another producer uses Gesha or Geshe, we use that instead.

I’d say that’s a pretty sophisticated analysis for a coffee site. (The Amharic word appears to be ጌሻ.) Thanks, Sven!

Yealms and Broaches.

I’m a sucker for the technical vocabulary of traditional fields, the more obsolescent the better (cf. retting flax), so of course I enjoyed Rukmini Callimachi’s NY Times piece (archived) on the beleaguered master thatchers of olde England and the roofs they thatch:

For the most ardent traditionalists, the only true thatch is “long straw” — typically cereal straw, like wheat, which is threshed to remove the grain — believed by historians to be England’s original roof. Then there’s water reed, the more durable alternative that is increasingly imported from abroad.

For master thatcher Stephen Letch, the difference is unmistakable. The problem is that for almost everyone else it’s undetectable — which is one reason long-straw roofs are going extinct. “There’s 20 or 30 long-straw thatchers left in all of England,” said Mr. Letch, 66, who has spent much of his life trying to preserve this dying art. “We’re the last and we know we’re the last — and we know that once we’re gone, those skills will be lost.” […]

Long before Britain was stitched together by a railway, roofs were made from whatever grew nearby, like heather in the northern highlands and reed near bogs and waterways. Overwhelmingly, though, most areas of the country used straw, a byproduct of the wheat grown to make bread, according to historians. It’s a lightweight material that keeps homes well insulated in the summer heat and the winter cold, but it is also flammable, attracts insects and the spiders that feed on them, and requires costly maintenance.

[Read more…]

Semantic Antics.

Back in 2010 I posted about the death of Sol Steinmetz, rabbi and etymologist; now a longtime LH reader has sent me a copy of his 2008 book Semantic Antics: How and Why Words Change Meaning, and it’s a pure delight. In the introduction, he says:

Changes in meanings make language flexible and malleable. But how do words take on new meanings? The study of meanings and the changes of meaning that words undergo is called semantics (from Greek sēmantikos “having meaning, signifying”). I’ve titled this work Semantic Antics because many English words have changed meaning in fascinating, unusual, and unexpected ways. Those are the words I focus on in this book.
[…]

As a language consultant to the Oxford English Dictionary, I was fortunate to have had access to the OED’s treasury of historical citations, which I used to trace and illustrate the development of meanings discussed in this book.

His very first entry, about “A1,” taught me something I didn’t know; after citing the first use in the sense ‘first-class, outstanding’ in Dickens’s Pickwick Papers (1837) — “‘He must be a first-rater,’ said Sam. ‘A, 1,’ replied Mr. Roker.” — he explains:

Dickens adopted a technical shipping term, A1, and used it figuratively. The shipping term was created by Lloyd’s Register of Shipping, a British publication founded in 1760 by Edward Lloyd to circulate and exchange shipping news among merchants and underwriters. Lloyd published his first Register of Ships in 1764, and in it he devised a system for classifying the condition of every registered ship. In this system, the top classification was A1, the letter A denoting a first-class condition of a ship’s hull, and the number 1, a top condition of the ship’s stores. When shipping merchants would describe a ship’s condition as being “A1,” it was the highest praise they could assign to it, and so inevitably the term passed into figurative use as a synonym of “first-class, excellent.”

And paging through it I see all sorts of entries I look forward to exploring; many thanks, Brian!

Sokoto.

I was looking at the Wikipedia article for Sokoto when I noticed the “Name and etymology” section, which begins: “The name Sokoto (which is the modern/anglicised version of the local name, Sakkwato) is of Arabic origin, representing sooq, ‘market’ in English.” Is this true? If so, how is it derived? Sounds like folk etymology to me, but others will know more.

Green or Gray?

Beth of the Cassandra Pages is an old friend (my wife and I visited her in Montreal in 2004: 1, 2) and it’s always a pleasure to hear from her; she’s sent me a link to You see grēne where I see grœg from a Scottish knitting blog written by Kate Davies, whom Beth calls “a very smart designer,” and while it’s mostly about colors themselves, there’s enough linguistic material I thought I’d bring it here.

Your responses to yesterday’s piece – in which I introduced KC’s fabulous Chingly Yorlin – really interested me. In both the Ravelry group and newsletter comments, many of you suggested that you do not see Chingly as I do – as a greenish-grey – but as very definitely green. […] Whether we see / name a colour as “green” or “grey” can depend on many factors: the physical mechanics of perception, our cultural heritage, our linguistic positioning, and (it is now increasingly clear) our age. […]

As grey is one of those shades which, for many of us it seems, perpetually hovers in an area of chromatic indeterminacy, you may be interested to know that, in some languages, it is among the first colours to be named. In Old English, grœg (grey, grey-ish) is a basic colour term (or BCT) that appears in the language at an earlier date than blue (hœwen) and which is used in a wide variety of contexts in reference to everything from wolves and stones to stormy seas.* [*My discussion of grey and green in Old English and Old Norse-Icelandic draws heavily on Carole Biggham and Kirsten Wolf’s excellent A Cultural History of Colour in the Medieval Age, volume 2 in Bloomsbury’s Cultural History of Colour series (2021; 2024)]

Green (grēne) is a BCT that precedes blue in the Old English language too: in reference to freshness or newness, to un-ripe or uncooked things, to glassy gemstones and to metals with a colourful patina, such as copper or brass. Grēne also frequently appears in Old English place names in association with landmarks, property boundaries, and objects in the natural world, such as paths, hills, and trees.

[Read more…]

AI Model for Ancient Papyri.

As anyone who has been following LH for any length of time will be aware, I am no fan of “AI,” but this seems like a situation in which large language models could be of great use; the Austrian Academy of Sciences reports:

The Austrian Academy of Sciences (OeAW) is collaborating with Mistral AI and Sail Reply, a Reply Group Company, on the development of a Large Language Model (LLM) for Ancient Greek: Apollo, named after the Greek god of light and patron of the arts and sciences, will propel research on ancient Greek texts. The model supports advanced searching and automatic text restoration in hundreds of thousands of undeciphered papyri and inscriptions, making it possible to accurately capture content in a matter of hours rather than years. The OeAW and its partners are doing pioneering work, as LLMs have not yet been developed for a historical language evolving over many centuries or the reconstruction of heavily damaged ancient texts.

On behalf of the OeAW, the project is led by Anna Dolganov, an ancient historian and papyrologist at the Austrian Archaeological Institute of the OeAW, who provides field-specific guidance, oversees the integration of relevant sources, and guarantees scientific quality. Through her expertise, Dolganov ensures that historical contextualization and methodological standards are upheld. […]

Anna Dolganov: “Our project with Mistral AI and Sail Reply is building the world’s first advanced multimodal Large Language Model for an ancient language, trained on the largest digital corpus of historical Greek to date. This AI system can be developed in many directions for a wide range of research tasks, from reconstructing fragmentary inscriptions and papyri to conducting semantic and thematic searches across the entire Greek textual tradition to deciphering handwritten texts. For example: there are one million Greek papyri worldwide that have never been read, tens of thousands of which are held by the Papyrus Collection of the Austrian National Library. Such treasures of historical knowledge are our target. This LLM marks the beginning of an exciting journey in the study of antiquity.”

I didn’t realize there were so many unread papyri — if this works as advertised, it could be a boon. Thanks, Martin!