Uhm, isn't it just that the page claims to be GB2312 but aside from the title is actually 8859-1? You can tell your browser to display it correctly, right?
Posted by MMcM at December 23, 2004 11:15 AM
Not that strange; it's being served with metadata suggesting an ASCII-compatible Chinese character set, but the French words (fédératif, démocracie) with accents have been taken directly from a document encoded in the ASCII-compatible ISO-8859-1 character set.
While both encodings _are_ ASCII-compatible, the non-ASCII parts of the two are mutually incompatible. And the result is yet another advertisement for Unicode :-)
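The mismatch described above can be reproduced in a few lines. This is only a sketch under stated assumptions: it decodes with Python's "gbk" codec, on the assumption that the browser treats a page labelled GB2312 as the GBK superset (strict GB2312 would reject these byte pairs instead of showing characters), and the variable names are illustrative.

```python
# Sketch: ISO-8859-1 bytes misread under a Chinese double-byte encoding.
original = "fédératif"

# What the author's document actually contained: one byte per character.
raw = original.encode("iso-8859-1")

# What a browser does when told the page is Chinese. Assumption: it uses
# GBK, a superset of GB2312, so each accented byte (0xE9) is taken as the
# lead byte of a two-byte character and swallows the ASCII letter after it.
garbled = raw.decode("gbk")

# The damage is reversible as long as the bytes themselves were not altered.
repaired = garbled.encode("gbk").decode("iso-8859-1")

print(garbled)
print(repaired)
```

The ASCII letters that do not follow an accented byte come through untouched, which is why only the accented stretches of the page turn into Chinese characters.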
(I'm turning into a bit of a character-set pedant lately, as is obvious, I think. Ah well, it's a relatively harmless peculiarity.)
Is Uighur a common language among Turks in China or a particular dialect of Turkish? Or is it a synthetic merging of several dialects? I had heard the term before as the name of an ethnicity, not of a language.
Posted by Jeremy Osner at December 23, 2004 01:13 PM
Uighur is the language of the ethnic group of the same name. It's a Turkic language related to Turkish, Uzbek, Turkmen, Azeri, etc. According to Ethnologue, there are over 7m speakers in China.
Posted by Andrew at December 23, 2004 01:39 PM
Chinese characters showing up inappropriately is pretty common. An online publisher's catalogue I use as well as PSU library's in-house catalog occasionally toss up Cyrillic or Chinese by mistake.
Completely off-topic, but where did the name "Austria" come from? Sounds like it should mean "southern" from "Austr-", like Australia or the Argentinian currency austral. In fact the OED lists "southern" as a rare meaning of "austrian".
Austria IS south of Germany, but they call themselves Oesterreich (with an umlaut) = "Eastern nation". Is this a fight over the Eastern heritage of Otto the Great -- claimed by both Germany and Austria?
Posted by John Emerson at December 23, 2004 02:23 PM
I, too, too often encounter Chinese characters instead of non-US letters. It happens ever so often for the Swedish åäö, and often persists despite changing the code page. I suppose it has to do with character codes above the ASCII range (128 and up).
Writing systems for Uyghur could probably fill several theses; nowadays it seems that they mainly use a variety of the Arabo-Persian system, with several additions. Like Kurdish, for example, they have invented characters for their vowels: http://www.omniglot.com/writing/uyghur.htm
Posted by anders at December 23, 2004 02:55 PM
Two comments:
"four of them — Tibetan, Mongolian, Uighur and Zhuang — appear on Chinese bank notes...."
Well, actually quite a few more than that since all Chinese languages and/or dialects have a shared written form. (Traditional versus simplified characters is a separate issue related to governments, not languages).
As for the French accents coming out as Chinese characters, as pointed out by several, this is a well-known side-effect of the myriad character encodings used by computers before Unicode. There is even a Chinese term for it: 亂碼 / 乱码, luan4 ma3, which means "chaotic codes". Though the Japanese term 文字化け, mojibake, is probably more familiar to English speakers.
Andrew Dunbar.
Actually, it's an AP story that dates back to at least December 5.
Although I'm glad that some attention is being given to Beijing's suppression of many languages within China, the article falls prey to myths about Chinese characters, which are at the root of misunderstandings about the nature of the Chinese languages and help support this suppression.
For example:
"Chinese dialects are based on the same system of writing."
Aaaagh! What the author is saying isn't so different than claiming that Chinese people wrote their languages before they spoke them, which is of course absurd. (DuPonceau noted this problem nearly 200 years ago.) But this is typical of how the myths about characters and languages have confused people, even about what ought to be fairly obvious.
That means that Cantonese speakers in Hong Kong can enjoy subtitled Mandarin movies and Mandarin-speakers can order off Chinese menus in the far west of the country.
I've come across Chinese sites written entirely in Japanese -- using GB2312 encoding! I suppose that the odd Japanese visitor might take the trouble to tweak the encoding and read the content, but I'm doubtful.
It's amazing how many so-called web site professionals in China are totally ignorant of the need for the correct tag in the head of web pages.
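As a rough illustration of how such a mislabelled page could be caught, here is a sketch that reads the charset declared in a page's head and checks whether the bytes actually decode under it. The function names and the sample page are hypothetical, the regular expression is a simplification of what a real HTML parser does, and Python's strict "gb2312" codec is stricter than what browsers apply.

```python
import re

def declared_charset(html_bytes):
    """Return the charset named in the first 1024 bytes, or None."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)',
                      html_bytes[:1024], re.I)
    return match.group(1).decode("ascii").lower() if match else None

def decodes_cleanly(html_bytes, charset):
    """True if every byte sequence is valid under the named charset."""
    try:
        html_bytes.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

# Hypothetical page: declared as GB2312, body text saved as ISO-8859-1.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=gb2312"></head><body>'
        + "fédératif".encode("iso-8859-1") + b'</body></html>')

label = declared_charset(page)
print(label, decodes_cleanly(page, label))
```

A clean decode is only weak evidence that the label is right: ISO-8859-1 accepts any byte sequence at all, so the check is useful mainly for catching a declared multi-byte encoding that the bytes do not fit.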
Posted by bathrobe at December 27, 2004 02:27 AM
I have just posted a short page on the question of characters as a bridge between dialects at my website. At the risk of sounding as though I'm beating my own drum, I would welcome any comments language hat lovers might have. (The article is only half finished. I intend to add a brief section on the differences among dialects).
The URL is:
http://www.cjvlang.com/Writing/writsys/dial.html
All comments welcomed, and please delete this if it is felt to transgress the rules of this blog.
Posted by bathrobe at December 27, 2004 02:40 AM
No no, I welcome links to interesting entries on other people's blogs! Here's the direct link for anyone who's interested.
Posted by language hat at December 27, 2004 06:04 PM