May 01, 2008

TYPESETTING ARABIC.

A very interesting article by Eildert Mulder about the difficulty of setting Arabic script in type:

The technical problem is this: Arabic letters are generally not written separately but joined to each other in groups or entire words, like a script typeface in English. And though the Arabic alphabet has only 28 letters, most letters have four forms, depending on whether they occur at the beginning of the word, in the middle of the word, at the end of the word, or stand alone. Furthermore, each combination of letters is unique, creating a typographic challenge greater than Chinese. Because all letters connect dynamically with the preceding one, and most also with the following one, the number of unique combinations is almost astronomical.

The esthetic problem comes from the dizzying mutability of written Arabic. For example, there are actually three ways the letter ha can be written in the middle of a word, and the calligrapher’s choice is influenced not only by the letter immediately preceding the ha, but also by the letters earlier in the word, and even by letters that follow it—yet, in whatever form, it is still in essence the ha in the beginner’s textbook. A sequence of letters can run along a baseline the way Roman letters do—though Arabic runs from right to left, of course—or they may start above the baseline and descend in a diagonal if the connections from one letter to the next make that an esthetically pleasing choice.

The result is that the individual letters in a well-written piece of text are in constant motion, like dancers in a polonaise: In the course of the dance, they bow to each other, embrace each other, push each other away, hug each other’s necks and fall at each other’s feet—and there are some real acrobats among them. Thus, well-written Arabic texts feel alive to their readers, whereas mechanically typeset ones feel like graveyards: At their best they are only still photographs of the calligrapher’s living, moving polonaise.
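The four-form rule in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real typesetting system works. The letter tables are hand-picked, diacritics and the obligatory lam-alef ligature are ignored, and the function name `positional_forms` is hypothetical. Real shaping engines take joining types from the Unicode Character Database and apply them through OpenType features such as `init`, `medi`, `fina` and `isol`. Even then, a sketch like this captures none of the calligraphic variation the article describes.

```python
# Minimal sketch: pick one of the four positional forms for each Arabic letter.
# Assumptions: only a handful of letters are classified; diacritics and the
# mandatory lam-alef ligature are ignored. Input is in logical (typing) order.

DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهي")  # connect on both sides
RIGHT_JOINING = set("اأإآدذرزوة")  # connect only to the preceding letter


def joining_type(ch: str) -> str:
    if ch in DUAL_JOINING:
        return "D"
    if ch in RIGHT_JOINING:
        return "R"
    return "U"  # non-joining: spaces, punctuation, Latin letters


def positional_forms(word: str) -> list[tuple[str, str]]:
    forms = []
    for i, ch in enumerate(word):
        jt = joining_type(ch)
        if jt == "U":
            forms.append((ch, "none"))
            continue
        prev_jt = joining_type(word[i - 1]) if i > 0 else "U"
        next_jt = joining_type(word[i + 1]) if i + 1 < len(word) else "U"
        # A letter joins backward only if the previous letter reaches forward,
        # which only dual-joining letters do.
        joins_prev = prev_jt == "D"
        # A letter reaches forward only if it is dual-joining and a joining letter follows.
        joins_next = jt == "D" and next_jt in ("D", "R")
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms


print(positional_forms("كتاب"))
# [('ك', 'initial'), ('ت', 'medial'), ('ا', 'final'), ('ب', 'isolated')]
```

The last letter of كتاب ("book") comes out isolated, not final, because alif never connects to the letter after it. That one-sided joining is part of why "each combination of letters is unique."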
Thomas Milo, Mirjam Somers, and Peter Somers have solved this problem:
Using the calligraphy of Mustafa Izzet Effendi and other great calligraphers, the Milo–Somers team took the concept of script analysis further than either Müteferrika or Mühendisyan, making the basic unit they examined not the letter but the penstroke. That made it possible to derive the dancing, shifting letters, the tens of thousands of combinations, and the variable words all from a few hundred individual penstrokes and a clear and limited set of rules—just the sort of fundamental, tabular information that computers like to use. And with modern computers, it became possible finally to resolve the conflict that has blighted the relationship between Arabic script and book-printing technology for most of five centuries.
Fascinating stuff, with some gorgeous illustrations. (Via MetaFilter.)

Posted by languagehat at May 1, 2008 09:57 AM

Comments

That's a fine article.

Müteferrika: Hungarian. Mühendisyan: Armenian. Milo-Somers: Dutch. The article gives the impression that all the most important innovations in Arabic (-script) typesetting were made by foreigners. I confess that that struck me.

Posted by: komfo,amonan at May 1, 2008 12:57 PM

Typesetting Arabic Arabic must be easy, at least using a standard newspaper font of the Naskh type (http://en.wikipedia.org/wiki/Naskh_%28script%29), compared to the Arabo-Persian beauties of the Nasta'liq type: http://en.wikipedia.org/wiki/Nasta'liq_script

I have found a couple of pdf converters that handle plain tedious Naskh, but so far, none that manages the lovely Urdu Nastaliq.

Posted by: Lugubert at May 1, 2008 02:49 PM

It's so interesting how complex orthography translates into such settings (I am still utterly fascinated by Chinese typesetting, with its seemingly endless array of characters), and it's a pleasure to read about it in such an engaging article. Great link!

Posted by: lengli at May 1, 2008 03:47 PM

An interesting overview of the history of Arabic type lives here.

Posted by: komfo,amonan at May 1, 2008 05:10 PM

@ "I have found a couple of pdf converters that handle plain tedious Naskh, but so far, none that manages the lovely Urdu Nastaliq."

Conversion which way - from PDF or to PDF?
I used Nuance PDF Converter Pro 5 to make this PDF:
http://maxqnzs.com/pdfs/Nana.pdf

Posted by: Stuart at May 1, 2008 05:33 PM

Here is another history article from Saudi Aramco World, but from 25 years earlier. It mentions the first Arabic book printed with metal type, the 1514 Kitāb ṣalāt al-sawāʻi (Book of Hours); the first such book printed in the Middle East, the 1706 Kitāb al-Zabūr al-Sharīf (Psalter); and the petition to Sultan Ahmed III and his 1726 decree authorizing the printing of secular works.

Posted by: MMcM at May 1, 2008 07:33 PM

MMcM,

Can you access the book? Apparently Krek mentions Bratislava, and I'd love to know in what context.

Posted by: bulbul at May 1, 2008 08:22 PM

Just snippets, of which only one (of three) is legible enough to tell what it refers to: Arabische, türkische und persische Handschriften der Universitätsbibliothek in Bratislava. I'll try to look up the other two when I happen to be at a library with a physical copy.

Posted by: MMcM at May 1, 2008 10:48 PM

@ Stuart: "Linear" fonts like yours work fine. I mean conversion to PDF, and I think ScanSoft PDF Create 3.0 is an earlier version of your Nuance converter. My PDF Create handles lots of fonts (for Arabic, Chinese, Hebrew, Hindi, Panjabi) without any protest. Adobe Acrobat doesn't even try; it just hangs immediately.

The problem with proper Nasta'liq fonts is that they are sloping, so the number of possible letter combinations must be enormous. Look at the font name in the wiki article I quoted! Another font that doesn't work is Tibetan Machine, also vertically complicated but in another way.

I need all of them, because I'm trying to explain all non-English words and expressions in Kipling's Kim and write them correctly using language-appropriate fonts. I think I've managed some that aren't in the Kipling Society's material, and I've added some illustrations.

Posted by: Lugubert at May 2, 2008 08:10 AM

"I need all of them, because I'm trying to explain all non-English words and expressions in Kipling's Kim and write them correctly using language appropriate fonts. I think I've managed some that aren't in the Kipling society's material, and add some illustrations."

Arre vah! (Wow!) That sounds REALLY interesting. I'd never be able to master nastaliq, and devanagari is beyond my impaired handwriting without resorting to the keyboard, but I love the poetry of Urdu, and Kim is my favourite of Kipling's works because his world was the world of my father and his father, and Kipling writes about it the way my Dad told stories of it when I was a kid. Plus, I've long thought that my Dad's cursive Roman looks suspiciously like the nastaliq he learned at boarding school, only slightly less readable. My only beef with Kipling is his shonky transliteration, but then I'm looking at it from a Hindi POV, not his Urdu one, so I guess I have to allow for that. If access to your work is permitted to a pieriansipist such as I, I would be VERY keen to read it.

Posted by: Stuart at May 2, 2008 08:37 AM

That sounds REALLY interesting

Indeed. Please alert me when it becomes available; if it's online, I'll link to it.

I'd never be able to master nastaliq

I feel the same way. Lovely to look at, but yikes.

Posted by: language hat at May 2, 2008 08:49 AM

I'd love to see the Kim material too.

I well remember the first time I got my hands on an Arabic word processor -- the Xerox Star of the 1980s, the spiritual ancestor of every desktop computer in use today. I didn't (and don't) know any Arabic, so I could only type either copied examples or gibberish -- fortunately, the Star provided an on-screen keyboard so I could find the proper keys. Watching the cursor chug along from right to left was impressive enough in itself. But watching the letters mutate from isolated to initial form, or from medial form to final form, as I typed more letters -- that was really something. They seemed alive on the screen, instead of just lying there as Latin letters and symbols did.

The Star also did bidi, although a less sophisticated flavor than Unicode provides for, and it was a very strange feeling, typing Latin script inside the Arabic and watching the letters slide away leftward from the unmoving cursor. But the ultimate hack was typing "The Arabic for Islam and the Arabs is al-islam wa al-arab" (a sample sentence), selecting the "wa" and changing it to English "and" -- and watching the surrounding Arabic words magically switch places! Spooky actions at a distance indeed.
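The "words switching places" effect falls out of how bidirectional text is reordered for display: inside a left-to-right paragraph, a run of right-to-left words is shown in reverse order, and an English word in the middle splits that run in two. The Python sketch below is a toy, word-level illustration under loud assumptions. It is not the Unicode Bidirectional Algorithm (UAX #9) and not what the Star did. It ignores numbers, neutral characters, embeddings and mirroring. The names `is_rtl` and `visual_order` are hypothetical, and the separate "و" token simply follows the sample sentence in the comment.

```python
# Toy, word-level illustration of bidi reordering in a left-to-right paragraph.
# Assumption: this is NOT UAX #9; it only reverses each maximal run of
# right-to-left words and ignores numbers, neutrals, embeddings and mirroring.


def is_rtl(word: str) -> bool:
    # Treat any word containing a character from the basic Arabic block as RTL.
    return any("\u0600" <= ch <= "\u06FF" for ch in word)


def visual_order(words: list[str]) -> list[str]:
    """Return words in left-to-right display order."""
    result: list[str] = []
    run: list[str] = []
    for w in words:
        if is_rtl(w):
            run.append(w)  # collect consecutive RTL words
        else:
            result.extend(reversed(run))  # an RTL run displays right to left
            run = []
            result.append(w)
    result.extend(reversed(run))
    return result


print(visual_order("is الإسلام و العرب".split()))
# ['is', 'العرب', 'و', 'الإسلام']
print(visual_order("is الإسلام and العرب".split()))
# ['is', 'الإسلام', 'and', 'العرب']
```

With "و" the three Arabic words form one run, so الإسلام ends up at the far right. Replacing it with "and" breaks the run, and each Arabic word is displayed on its own, in typing order.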

Finally, as to bogus transliteration, or rather transcription, Kipling was probably mostly trying to be helpful to his monolingual readers, giving them the flavor without confusing them. Doubtless "sati" is a better transcription than "suttee", but unquestionably the latter form has become the English spelling, and its natural pronunciation in English is closer to the underlying Hindi. See also Hat's transcription (ha!) of T.E. Lawrence's attitude toward transliteration (scroll down a bit).

Posted by: John Cowan at May 2, 2008 11:09 AM

1 hr presentation by Thomas Milo

Posted by: caffeind at May 2, 2008 07:16 PM

"See also Hat's transcription (ha!) of T.E. Lawrence's attitude toward transliteration (scroll down a bit)."

Thanks for that. I've only read 7 Pillars once, when I was 15, and clearly missed that delightful gem. I wholeheartedly agree with this:
"There are some 'scientific systems' of transliteration, helpful to people who know enough Arabic not to need helping, but a wash-out for the world. I spell my names anyhow, to show what rot the systems are."
I realised after reading Lawrence's replies that my attitude to Kipling's transcription of Hindustani words was exactly the same as that of Lawrence's correspondent toward his transcription of Arabic. Now that I've been able to see that for what it is, and laugh at myself accordingly, I can relish Kipling with unmarred delight. For that, and for reminding me how much I enjoyed 7 Pillars, thank you!

Posted by: Stuart at May 2, 2008 07:49 PM

so pleased to see this addressed here, as well as in the previous arabic-related posts. it's even more pleasing to see it addressed in a way that might end my hours of frustration typing arabic on my own computer.

as for lawrence's transliteration, i confess that the introduction to seven pillars in which he outlines the letters between his editor and himself was one of my favourite parts of the whole text, which i otherwise greatly enjoyed.

Posted by: Kellen Parker at May 3, 2008 01:47 AM

Yes, it's completely irresistible.

Posted by: language hat at May 3, 2008 08:44 AM

JC,

I agree that Kipling's "transcription" is quite suitable to English readers. But it took me twoscore years and a few until I realized that the sorceress Huneefa must be named Hanifa, and what that meant, or that Hurree Babu should be Hari Babu.

I have interrupted my Kim work, because my Hindi prof. didn't accept my project for a third semester thesis, the book being in English, not Hindi.

Chapter one is fairly completely covered; the rest is not particularly detailed. I suppose my 1.5 MB Word file would be readable to people who, in addition to the standard fonts, have installed the open SimSun, Tibetan Machine, Nafees Nastaleeq and Raavi fonts (which I could supply if you can't find them). For Hindi/Sanskrit, I use Sanskrit 2003 (also free), because I find it much more pleasing than the (retch) Mangal supplied by MS.

Oh, for many transliterations, TITUS Cyberbit Basic will be needed.

Try this excerpt:

P آباد ābād ‘a city, building; cultivated etc.’ When added to a noun, it denotes a city or place of abode, as in allāhābād ‘the city of Allah’. This ābād should not be confused with the ending -abad, with an initial short ‘a’, of Persian and Urdu expressions like “Zindabad!” and “Mordabad!” (‘Long live!’ and ‘Down with!’, literally ‘may [X] be alive’ and ‘may [X] be dead’), where it is a remnant of an earlier Persian optative.

This paragraph needs heavy amendment: adding a reference to the mistranslations of Ahmadinejad's supposed threats, and changing ‘the city of Allah’ to, say, Ilahabad ‘the city of God’. What I have written on Urdu/Hindi/Hindustani has to be severely reviewed as well.

Anyway, if you're sufficiently interested, I'd be quite happy to inform you of the current project state and supply my humble effort file. Just (before 1 June, when that supplier vanishes) write a few lines to aring at rixmail dot se.

Posted by: Lugubert at May 3, 2008 03:22 PM

MMcM,

thanks, I figure the Bašagić collection (which also includes prints) would be there.

Posted by: bulbul at May 3, 2008 04:21 PM

Fascinating post, thanks.

Posted by: beth at May 7, 2008 09:39 AM