CARON.

A longstanding mystery has just been solved for me. Every time I look down a list of Unicode characters (e.g., this one), I see something like “Latin Capital Letter S With Caron” (next to Š) and think “That’s not a ‘caron,’ that’s a haček.” I always meant to look it up and find out where they got “caron,” but never got around to it. Now John Cowan’s post on the stability of standard names has brought to my attention Unicode Technical Note #27: Known Anomalies in Unicode Character Names, which is almost as much fun as a collection of newspaper corrections:

In this document we list all Unicode character names with known clerical errors in the spelling of their names at the time of its writing. In addition, we have compiled information on many misnamed characters, misleading character names, and characters with other known problems with their names.
Because Unicode Standard is a character encoding standard and not the Universal Encyclopedia of Writing Systems and Character Identity, the stability and uniqueness of published character names is far more important than the correctness of the name… The authors therefore intend this Technical Note to serve as a convenient summary of the information about character name anomalies in the Unicode Standard at the time of its writing.

And alongside embarrassments like “LATIN SMALL LETTER OI” (“should have been called letter GHA”) and “TAMIL SIGN VISARGA” (“This character is the aaytham”), we find:

The “caron” should have been called hacek and combining hacek. The term “caron” is suspected by some to be an invention of some early standards body, but it has also been claimed by others to have been in use at Linotype before the days of digital typography. Its true origin may be lost in the mists of time.

How wonderful! Does anybody know anything more about this mysterious “word”?
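
The official names are easy to inspect for yourself. As a minimal sketch, assuming Python 3 and its standard `unicodedata` module (the variable names are only illustrative), this looks up the names in question and shows that Š decomposes into a plain S plus the combining mark:

```python
import unicodedata

s_caron = "\u0160"  # Š

# The formal Unicode names, "caron" and all.
letter_name = unicodedata.name(s_caron)
combining_name = unicodedata.name("\u030c")  # the combining mark
spacing_name = unicodedata.name("\u02c7")    # the spacing mark ˇ

# Canonical decomposition (NFD) splits Š into S + combining caron.
decomposed = unicodedata.normalize("NFD", s_caron)

print(letter_name)     # LATIN CAPITAL LETTER S WITH CARON
print(combining_name)  # COMBINING CARON
print(spacing_name)    # CARON
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0053', 'U+030C']
```

Because the names are stable, `unicodedata.lookup("LATIN CAPITAL LETTER S WITH CARON")` will keep returning Š regardless of what anyone thinks the mark should have been called.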

Comments

  1. And here I was, thinking it’s a perfume!

  2. Or one of the names of Pushkin’s favorite writer!

  3. You may have seen the series of BabelStone posts (1 2 3) from around the same time as that note and giving some more details and background. Nothing on your caron question that I recall, though.

  4. I have always wondered why you English speaking people had to adopt the Czech name for ” ˇ “, i.e. “háček”. Why not use the wonderful Slovak word “mäkčeň”?

  5. I thought it was the NESN studio host for Red Sox games.

  6. michael farris says

    The hacek is my favorite diacritic ever (narrowly winning over the cedilla), and I detest calling it a “caron” (sounds like a relative of the cubic zirconia). I think maybe part of the problem in English is that the word needs a … hacek, as in háček. Is it ever spelled hachek?
    mäkčeň is also a nice word, but requires two haceks and an umlaut, so that’s even less likely to be used; makcen looks more like [‘maksn] or even [‘mAksn]. And if memory serves the Czechs did come up with it first. I often wish that Polish had dumped its archaic sz, cz, rz, and ż for haceked letters (according to my boss, this was more or less seriously considered at one time).

  7. According to German Wikipedia, other names are Slovenian strešica, Croatian kvačica, Finnish hattu—the latter being simply the word for ‘hat,’ which of course endears it to me and makes me want to suggest we ditch haček with its annoying haček and adopt hattu instead.

  8. Funny you should mention this today when I was just looking for the names of Croatian diacritics! The Caron perfume manufacturing company is of course named after a person (not Leslie Caron!), and as Wikipedia mentions that the caron is used in mathematics, my first thought was that perhaps there was a Caron who was a mathematician (or linguist, or typesetter..). However, having also discovered in a text referring to the use of the caron to indicate tones in Mandarin that its opposite (which I would call a circumflex) is called the macron, I wondered what that derived from (Greek?) and found that Wikipedia calls this a CARET! Which, when used as a proofreading symbol, indicates something to be inserted, i.e. missing, from Latin carere, to lack (it says, 3rd person sing. present tense, it is lacking). But, while there seems to be a link between caret (the “hat”) and caron (inverted “hat”) (not language hat) I don’t see where the -on comes from.

  9. Jim Tucker says

    My guess is “caron” is Romanian for “corona,” which could easily be a Romance name for this mark.

  10. Jim Tucker says

    Oh ok, the heck with Romanian. Something like that though. The Finns put a hat on it, like the guy upstairs said. I say vive le roi.

  11. I’ve always loved the name Caron, because she was so adorable in Gigi… but as an accent?? I’ve always wondered where ‘they’ got that spare name for the well-known hacek, and this is the perfect solution: it’s just some made-up geeky thing like octothorp, not a real name at all. Thank you for that.

  12. I just realized that a macron (indeed from Greek, makros, and indicating long vowels) is flat, falling in-between the hat-shaped caret and saucer-shaped caron, or wedge. In Spanish they (carons) are called “cuernitos”, or little horns – just like the sign made with the thumb and little finger to indicate a cuckold, or “cornudo”. Perhaps that supports a possible Spanish influence, mentioned here: http://www.proz.com/kudoz/619623. Or perhaps that is sheer fancy!

  13. LH, do the Finns wear their hats upside down? I’ve heard people use “hat” to refer to the circumflex in mathematical notation.

  14. I dug around in some of my old MLP docs.
    The Xerox Character Code Standard of May 1986 has: “317 Hachek accent = caron”. This is Joe Becker’s work from the Star, one of the direct antecedents of Unicode. (Also, that would be yes to michael farris’ question above on whether it’s ever spelled that way.)
    I do not seem to have an early printout and there were revisions, but indications are that ISO 6937/2 always had it. That is 1983, a year before the earliest cite from the FAQ, but still within their “mid ’80s”. But that would also mean that someone in ISO/TC 97 knew it or invented it.
    Still no help on the actual etymology, I’m afraid.

  15. Is the origin of the hacek in its being a shorthand for “z”? (Czechs put haceks where Poles add a “z”). A sort-of paper-saving mark, like the accent-circumflex in French?

  16. Gawain: as far as I know, “mäkčeň” and other diacritics currently used in Slavic languages were invented by the Czech theologian and scholar Jan Hus and laid out in his treatise “De orthographia Bohemica”. Originally, however, he used a dot (punctus rotundus) to mark palatalized consonants and acute accent (gracilis virgula) to mark long vowels (” á é í ó ú “). If I recall correctly, punctus rotundus later developed into a short horizontal line mostly due to reasons of technical nature (the difficulty of writing a dot using quills) and finally into the shape it has now.

  17. Czechs put haceks where Poles add a “z”
    It’s not just the Czechs, thank you very much :o), and it hasn’t always been that way. The most common way of indicating a palatalized consonant was doubling it. For example, a famous letter from the 15th century features words like ” nassih ” (“našich” = “our”) and ” zvessaty ” (“zvešaty” = “hang”).

  18. Poles (like the Chinese!) have to distinguish between pairs of retroflex and palatalized consonants:
    z’ – rz
    s’ – sz
    c’ – cz
    d’ – dz
    (I would put the lines over the letters, but I can’t – why is the page encoding not UTF-8 here?)
    Getting rid of the z in favor of a hacek would possibly further confuse what is already one of Europe’s most painful writing systems, especially where handwriting is concerned.
    You could make a good case for switching Polish over to Cyrillic, with appropriate diacritics where needed. The political implementation of this highly beneficial reform is left as an exercise for the reader.
    Full disclosure: the distinctive hacek and the Polish digraphs are extremely handy in automated language identification, so I am loath to see any kind of orthographic pan-slavism take hold.
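
To illustrate the point about language identification, here is a toy sketch in Python. It only compares how often the combining caron appears against how often the digraphs sz, cz and rz appear. The function names, the digraph list and the labels are assumptions made up for this example; real language identifiers use far richer statistics.

```python
import unicodedata

COMBINING_CARON = "\u030c"
POLISH_DIGRAPHS = ("sz", "cz", "rz")  # illustrative, not exhaustive


def count_carons(text):
    # NFD splits letters like š into s + combining caron, so counting
    # the combining mark counts every haceked letter.
    return unicodedata.normalize("NFD", text).count(COMBINING_CARON)


def count_digraphs(text):
    lowered = text.lower()
    return sum(lowered.count(digraph) for digraph in POLISH_DIGRAPHS)


def guess_orthography(text):
    # Crude comparison of the two counts; ties are left undecided.
    carons, digraphs = count_carons(text), count_digraphs(text)
    if carons > digraphs:
        return "hacek-style"
    if digraphs > carons:
        return "digraph-style"
    return "undecided"


print(guess_orthography("našich zvešaty"))   # hacek-style
print(guess_orthography("szczęście rzeka"))  # digraph-style
```

A heuristic this crude will misfire on short or mixed text, which is why it is offered only as a sketch of the idea.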

  19. Paul Clapham says

    The Wikipedia article on “Caron” has a brief note on etymology which leads to this FAQ entry:
    http://www.unicode.org/faq/casemap_charprop.html#14

  20. It’s inexact for typographical uses etc. but we use the word hattu for both circumflexes and haceks in Finnish. When one needs to be exact, those words have been borrowed and can be used. Sirkumfleksi is very rare, though.
    It works well enough in casual language; if we’re talking about maths, the hats go the circumflex way, if typography, they generally go the other way. (Esperantists don’t have it so easy, though.) In addition, you may be amused to know that the tilde, which our linguistic relatives, the Estonians, use in one of their vowels, has been dubbed “the worm”: mato.

  21. Maciej; the page accepts numeric entity references, which is what the major browsers (Safari included) pass to ISO-8859-1-encoded pages if you try to submit form data outside that character set. Yes, it should be in UTF-8.
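
The fallback described in this comment can be reproduced in a few lines. This is a minimal sketch, assuming Python 3 and its standard codecs; the `xmlcharrefreplace` error handler stands in for what a browser does and is not a claim about how any particular browser is implemented.

```python
import html

text = "háček"

# á (U+00E1) exists in ISO-8859-1, but č (U+010D) does not, so it is
# replaced by a decimal numeric character reference.
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(encoded)  # b'h\xe1&#269;ek'

# Reading the bytes back and resolving the reference recovers the word.
decoded = html.unescape(encoded.decode("iso-8859-1"))
print(decoded)  # háček
```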

  22. Roger Depledge says

    According to Logos Dictionary, “karon”, in Cyrillic characters, is the word for “crown” in the Finno-Ugrian language Moksha, spoken in Mordovia. Now to track down the programmer with Mordovian links who chose the term.

  23. John Emerson says

    Jan Hus? Oh My God!
    Not much new here, except the Estonian “katus” (= “roof”), but anyway: http://encyclopedia.thefreedictionary.com/hacek

  24. Erkki, isn’t French more common than Esperanto, even in Finland? I’d imagine typographers do occasionally have to deal with it.
    And now thanks to John Emerson I’m wondering about the construction of Estonian roofs. It seems like they’d develop a lot of leaks when the water collects in the middle.

  25. KCinDC, to ease your worries: Estonian roofs.
    Roger Depledge: no need to look for a Mokshan programmer, since what you call “karon in Cyrillic characters” is more than likely the Russian word корона (pronounced karOna), which means exactly the same thing.

  26. Yes, it occurred to me too some time after posting that French has letters like ê. I believe that I could get away with calling it a hattu-e, but that would depend on the other guy knowing which way the hat goes. The word is ambiguous between ‘circumflex’ and ‘hacek’, but in everyday language that hasn’t so far been problematic enough to force people to make the distinction.
    Typographers in their work obviously have to use the more exact terms. They could have some of their own, too, but I don’t know about that.

  27. John Emerson says

    After due consideration, I would like to propose that Jan Hus be named Language Hat’s official Christian theologian and heretic. How many Christian theologians or heretics have ever invented a diacritic?
    Some of the Buddhists probably did. They seemed to like that kind of thing.

  28. Bishop Ulfilas was an Arianist theologian and invented a whole alphabet. And Saint Cyril invented an alphabet as well, though I don’t know that there was anything heretical about him. Surely there are other examples.
    Granted, both alphabets were inspired by existing alphabets, but still I think they rate higher than a single diacritic (besides I think both included diacritics).

  29. John Emerson says

    Ulfilas was a heretic too. I guess Hus is out of luck.

  30. michael farris says

    “Getting rid of the z in favor of a hacek would possibly further confuse what is already one of Europe’s most painful writing systems, especially where handwriting is concerned.”
    Polish? Painful? Polish spelling is ridiculously easy (I always found it to be at any rate).
    And the Latin Belarusian alphabet (more commonly used than Cyrillic before the Soviets) had both haceks and acute accents and it didn’t look cluttered.
    “You could make a good case for switching Polish over to Cyrillic, with appropriate diacritics where needed.”
    The tradition in Cyrillic is not so much to use diacritics but to create new letters (look at all the Soviet created alphabets). And I do think every Slavic language should have standardized Cyrillic and Latin alphabets (I at least think that would be cool).

  31. > Polish? Painful? Polish spelling is ridiculously easy (I always found it to be at any rate).
    Then you are a lucky and unique individual. The written language is full of painful quirks, such as the several duplicates (ch/h, ó/u, rz/ż), with mainly arbitrary rules on when to use which.
    You can see evidence of the Polish people’s struggle with their orthography on any convenient Warsaw wall or viaduct; one of the more pungent expletives requires the correct choice of both ‘h’ and ‘u’ and as the laws of probability predict is misspelled about 75% of the time.

  32. michael farris says

    Only h/ch is really arbitrary, ó usually alternates somewhere with o (samochód – samochodu, mówić – mowa, etc) and rz often enough alternates with r (morze – morski) and when it doesn’t a little knowledge of other Slavic languages can come in handy (as in rzeka, rzecz etc).
    And if you learn the spelling as an adult from the written word first, then it’s not difficult at all (though for some reason I try to extend -ęk when it should be -enk).
    I think some misspellings in graffiti are on purpose and expressive (like the group Defekt Muzgó). I actually think spelling chuj (dick, cock) with ch would look strange in graffiti.
    I think a lot of spelling problems that other people have are the result of an inflexible education system that depends a lot on rote memorization rather than an understanding of the principles of Polish spelling which is (by world standards) pretty straightforward.

  33. There’s also St. Mesrop Mashtots, who created the Armenian alphabet. For further discussion of creation of alphabets by early Christian theologians, see this post at Glosses.

  34. John Emerson says

    OK, how about two awards, one for early Christian theologian-heretics who designed scripts, and one for modern ones? 1400 would be a good cutoff point.
    The Jehovah’s Witnesses don’t design scripts, do they?

  35. Not so far as I’m aware. The Mormons, on the other hand…

  36. John Emerson says

    I don’t wanna give no award to no Mormons.

  37. Huh. I did not know about the Deseret alphabet.

  38. Really, hat? You surprise me. It’s the kind of thing I’d expect you to know about anyway, but I thought I remembered you from a comment thread on the topic at pf’s late blog a couple of years ago. I posted a link there to this page, which has a pdf of the Book of Mormon in the Deseret alphabet. Anyway, a pleasure to inform, even on so inconsequential a matter.
    Seriously, though, John, I can think of any number of reasons why Hus is a better candidate, but once you’d mentioned Jehovah’s Witnesses I couldn’t very well fail to bring up the Mormons.

  39. Actually, I may very well have read about it at pf’s late (sob) blog and then forgotten it again. I seem to have reached an age at which that can happen. I should always include caveats with statements like that (“I don’t think I’ve seen that before”).

  40. John Emerson says

    The Hussite military arm was far superior to the Mormon one. Perhaps we should write that into the qualifications.

  41. It doesn’t look like anyone has come up with a language with such an accent so described. So for now we don’t seem to be any further than the original Unicode FAQ.
    I double-checked Herzog et al., “Some orthographic recommendations”, American Anthropologist 36 (4): 629-631 (1934). (JSTOR) It was important for recommending standard use of this mark in the pre-IPA days. But it does not give any particular name to it.
    The similarity to caret, its mirror, is tempting. But that’s a proofreader’s mark and a Latin verb form. Likewise something having to do with corona ‘crown’. Like the top part of a Jughead emoticon >:o;. But it isn’t clear exactly where the vowels came from or how it was applied to the diacritic.
    The internet isn’t so good at material from twenty-five years ago. Plus there is a lot of noise. Caron is of course a surname. And Carón is Charon. carón in Argentinean slang means ‘cara grande’. carón in Galician is part of the phrase a carón de meaning ‘beside’.
    Interestingly enough, I did find that this same question and more or less the same guesses came up in a Russian lingua-forum a few years ago.
    The Deseret alphabet is mentioned in The City of the Saints (1861) by Sir Richard Burton (of Arabian Nights and Kama Sutra fame). He observes that it is “a stereographic modification of Pitman’s and other systems” and correctly predicts, “it will probably share the fate of the ‘Fonetik Nuz’”. I was disappointed that the chart he included wasn’t reproduced in my cheap paperback edition. It’s different from any linked to above, with the charming slogan, “Learn this alphabet and appreciate its advantages”, along the side. Plus, isn’t it a little insensitive of Omniglot to link to a description on a LDS-debunking site?

  42. Indeed, there’s no need to worry about the Estonian roofs. The word ‘katus’ (‘roof’) is used for the circumflex. For caron, we use ‘haak’ (‘hook’) instead.
    Also, should you ever need Estonian names for any other characters, http://www.eki.ee/letter is your friend.

  43. Mattěj Cepl says

    Being Czech, I have always wondered about two things:
    1) why in the world don’t you translate the Czech term háček into English and call the thing “a hook”?
    2) why do so many people writing about háček in English take all the pains to write č but omit the acute sign over á? It is háček, not haček. 🙂 Which is actually strange, because in my opinion non-Czechs generally struggle more with the acute sign (how long is long?) than with hooks. Of course, the first meeting with a hook is a terrible shock (and we are talking here about the quite simple č; what about ř? It is estimated that around a third of native Czechs, including me and our former president Václav Havel, are not able to pronounce it properly :-), and the name of the famous Czech composer is Antonín Dvořák, not what all American radio hosts make of his name), but once they get it they usually manage it quite easily. However, the struggle with the length of vowels is quite often a never-ending story :-).
    Matěj

  44. Matěj/Mattěj, “hook” isn’t specific enough. It could just as well apply to the cedilla or the ogonek.
    As for why English speakers care about the háček in “háček” more than the acute accent, I’d imagine it’s for much the same reason they care more about the last acute accent in “résumé” than the first: it’s a more important indicator of a difference in pronunciation that’s significant to the English speaker.

  45. 1) English is a whore.
    2) When care is taken, háček is spelled with both diacritics in English. For instance, in the ANSI Z39.47 standard.

  46. michael farris says

    Matěji (wild guess about the vocative),
    One of the main reasons I use hacek or hachek and not ‘hook’ is because it doesn’t much look like a hook to me (it looks like an arrowhead or little v to my eyes). By way of contrast, the Vietnamese tone mark in the words cả or của or the nasalization marker in Polish in się or są look much more like hooks as does the cedilla ç.
    And the tradition in English is mostly to borrow names of diacritics from languages they’re associated with (which is why I wrote cedilla instead of ‘little z’).
    As for ř, I’m curious, what is it exactly that those Czech speakers that don’t pronounce it ‘correctly’ do instead? Do they pronounce it like a plain r? (as in Slovak?) like ž (as in Polish)? something else (what)?

  47. I just happened on this thread and read it with great enjoyment. The good news is that caron is finally in a dictionary; the fifth edition of the AHD has it as an entry. The bad news is that the etymology is [Origin unknown.]. And I’m sorry nobody ever answered michael farris’s interesting question:

    As for ř, I’m curious, what is it exactly that those Czech speakers that don’t pronounce it ‘correctly’ do instead?

  48. David Marjanović says

    ^ AFAIK, they’re generally Moravians, and Moravia is next to Slovakia, so most likely plain [r]…

  49. John Cowan says

    In old books the left arm of the hacek is longer than the right, making it a hook indeed.

  50. All these years later and the OED still doesn’t have an entry; Wiktionary says:

    Etymology unknown; first known use is the United States Government Printing Office Style Manual of 1967, where it apparently referred to an inverted caret. Possibly derived from caret after its similar shape (^), and with -on either from macron or as an augmentative after reanalysis of -et as a diminutive.
    […]
    The term caron gained usage through the computer world, through usage at Adobe and later in Unicode. As such, it is the most common name in many computer environments, whereas some form of háček is more common in linguistic circles.

    I don’t often say this kind of thing, but it’s a stupid and unnecessary word, and the proposed etymology is both plausible and stupid. Why did some idiot decide to run with this misbegotten lexeme?

  51. David Marjanović says

    Saw it somewhere, didn’t know it, and thought “huh, so that’s the proper technical name – I better start using it”, is my guess.

  52. jack morava says

    Jan Hus Rules, OK ?

    [I learned about him, and

    https://en.wikipedia.org/wiki/Jan_%C5%BDi%C5%BEka

    at my daddy’s knee…]

  53. first known use is the United States Government Printing Office Style Manual of 1967, where it apparently referred to an inverted caret

    Page 180, bottom right. I note also that circumflex and caret are shown separately but look identical.

  54. @jack morava: I also grew up with a father who was an admirer of the Hussites.

  55. jack morava says

    @ Brett, Thanks:

    This gives me an opportunity to write Jan \v{Z}i\v{z}ka as we say in my native LaTeX: he was also an interesting guy…
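
For readers outside LaTeX, the `\v{...}` accent command used above maps neatly onto Unicode. The following is a minimal sketch in Python; the function name `latex_caron_to_unicode` and the restriction to a single ASCII letter inside the braces are assumptions for illustration, not a full LaTeX parser.

```python
import re
import unicodedata

COMBINING_CARON = "\u030c"


def latex_caron_to_unicode(text):
    """Replace LaTeX \\v{X} with the letter X carrying a caron."""

    def replace(match):
        # Append the combining caron, then compose (NFC) so that a
        # precomposed letter such as Ž is used where one exists.
        return unicodedata.normalize("NFC", match.group(1) + COMBINING_CARON)

    return re.sub(r"\\v\{([A-Za-z])\}", replace, text)


print(latex_caron_to_unicode(r"Jan \v{Z}i\v{z}ka"))  # Jan Žižka
```

Letters with no precomposed form simply keep the combining mark after the base letter.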
