Tuesday, March 24, 2009

An English Mode for Cyrillic

Among my experiments with alternate orthographies over the past few months has been adapting various scripts to write English, much as the Latin alphabet has been extended to the languages of the world. While I may aspire to write English phonemically in the Arabic script, Cyrillic is a more plausible writing system to start with. Following the custom of Tolkien's tengwar, whose adaptations to particular languages are called "modes," I call this an English mode for Cyrillic.

Is there a rational reason to do this? Well, as I learned in China, being able to decipher street signs and place names is enormously valuable when traveling in a country, even if your command of the language is poor. Learning a phonemic script well enough for passable use can take a week or three, but learning a language can take years. Russian is a beautiful and elegant language, but realistically I am unlikely ever to learn more than a smattering of it. Learning to read English written in Cyrillic may help in learning to scan Cyrillic text generally, although for people seriously interested in learning a Slavic language, it could plausibly increase errors in deciphering non-English text.

Why would any English speakers write in Cyrillic? I imagine a lost Arctic island, New Sannikov Island, peopled by the descendants of illiterate English whalers marooned in the 17th century, or of later American prospectors, who only learned to read after a Ukrainian doctor washed ashore. They probably wouldn't write this way, though.

This script is geared most strongly to the Russian values of the letters, although many characters are used in different ways to represent the greater number of English vowels. I also drew on Cyrillic characters that are archaic in modern Russian, and on the Cyrillic alphabet used to write Romanian before the 1860s. Romanian, as a Romance language, seemed like an instructive example of using Cyrillic to write a non-Slavic language.

Most of the letters (АБВГДЖЗИКЛМОПРСТФЦЧШЫ) have common values in old Romanian and Russian. For some letters (ЕӖЙНУЭЮЯ) I preferred the Russian value; for others (ЅѸХѠЪѢѲ) I leaned toward the Romanian sound. And I probably just made stuff up, too. Several of the characters from Romanian, such as Yat, are archaic letters that were reformed out of modern alphabets decades ago. Glyphs like the Russian hard sign are used here for separate vowels.

This system should work reasonably well for transcribing my dialect, a western/mid-southern US dialect that merges lots of vowels, including "Mary-merry-marry," "cot-caught," and "pin-pen." It might work less well if your dialect distinguishes more vowels. Diphthongs are represented with digraphs: эи, аи, ои, ѣу.

Here are some samples, followed by some Benjamin Franklin quotes; I'll post some longer extracts when I get my keyboard for made-up languages finished.


Аи лыв ын Лѹъвъл, Кынтъки.

"Ѳъ Кѣт ын ѳъ Хѣт" ыз ъ бѹк баи Др. Соис.

Ѳъ рэин ын Спэин стэиз мэинли ын ѳъ плэин.

Ѳэир нэвър ѡъз ъ гѹд ѡор ор ъ бѣд пис.

Аи ѳинк ал ѳъ хэрътыкс аи хѣв нон хѣв бын анъръбл мын. …Ыт ыз нат ту маи гѹд фрынд'з хэръси ѳѣт аи ымпют хыз анэсти. Он ѳъ кантрэри, 'тыз хыз анэсти ѳѣт хѣз брат ъпан хым ѳъ кэиръктър ъв хэрътык.

А а Az "a" in "calm"
Б б Bet "b" in "bay"
В в Ve "v" in "vie"
Г г Glagol "g" in "gay"
Д д De "d" in "day"
Е е Ye "ye"*
Ӗ ӗ Yo "yo"*
Ж ж Zhe "zh", the sound in "J'ai," "vision," or "pleasure"
Ѕ ѕ Je "j" in "jay"
З з Ze "z" in "zoo"
И и I "ee", the vowel in "eat"
Й й Short I "y" in "you"
К к Ka "k" in "key"
Л л El "l" in "low"
М м Em "m" in "may"
Н н En "n" in "no"
О о O "o", the sound in "oak"
П п Pe "p" in "pea"
Р р Er "r" in "roe"
С с Es "s" in "see"
Т т Te "t" in "tea"
Ѹ ѹ Uk "oo," the sound in "foot," "pull," and "good"
У у U "oo," the sound in "ooze," "truth," and "boot"
Ф ф Ef "f" in "fee"
Х х Ha "h" in "he"
Ѡ ѡ Omega "w" in "way"
Ц ц Tse "ts", the sound in "pizza"
Ч ч Che "ch" in "cheer"
Ш ш Sha "sh" in "she"
Ъ ъ Yer "u" in "utter"
Ы ы Yeru "i" in "it"
Ѣ ѣ Yat "a" in "at" or "cat"
Э э E "e" in "ever"
Ю ю Yu "yoo" in "you"*
Я я Ya "yaw"*
Ѳ ѳ Theta "th" in "thaw", "they"

*iotated vowels
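As a rough illustration of how the table works as a lookup, here is a minimal Python sketch that spells already-transcribed English in this mode. It assumes a hypothetical set of ASCII phoneme labels loosely based on ARPAbet (the notation of the CMU Pronouncing Dictionary), plus an invented "ts" label for Ц; the mode itself defines no such labels. Capitalization is ignored, and converting ordinary English spelling into phonemes is left out entirely.

```python
# Hypothetical phoneme labels, loosely modeled on ARPAbet; "ts" is an extra
# label for Ц. Each maps to the letter(s) given for that sound in the table.
PHONEME_TO_CYRILLIC = {
    "aa": "а",  # "a" in "calm" (also "caught", given the cot-caught merger)
    "b": "б", "v": "в", "g": "г", "d": "д",
    "zh": "ж", "jh": "ѕ", "z": "з",
    "iy": "и",  # "ee" in "eat"
    "y": "й",   # "y" in "you"
    "k": "к", "l": "л", "m": "м", "n": "н",
    "ow": "о",  # "o" in "oak"
    "p": "п", "r": "р", "s": "с", "t": "т",
    "uh": "ѹ",  # "oo" in "foot"
    "uw": "у",  # "oo" in "boot"
    "f": "ф", "hh": "х", "w": "ѡ", "ts": "ц", "ch": "ч", "sh": "ш",
    "ah": "ъ",  # "u" in "utter"
    "ih": "ы",  # "i" in "it"
    "ae": "ѣ",  # "a" in "cat"
    "eh": "э",  # "e" in "ever"
    "th": "ѳ", "dh": "ѳ",  # one letter for both "thaw" and "they"
    # Diphthongs are written as digraphs.
    "ey": "эи", "ay": "аи", "oy": "ои", "aw": "ѣу",
}

# Iotated vowels: a "y" glide plus one of these vowels becomes a single letter.
IOTATED = {"eh": "е", "ow": "ӗ", "uw": "ю", "aa": "я"}


def transcribe(words):
    """Spell words (each a list of phoneme labels) in the English mode, lowercase."""
    spelled = []
    for phonemes in words:
        letters = []
        i = 0
        while i < len(phonemes):
            following = phonemes[i + 1] if i + 1 < len(phonemes) else None
            if phonemes[i] == "y" and following in IOTATED:
                letters.append(IOTATED[following])
                i += 2  # the glide and the vowel are consumed together
            else:
                letters.append(PHONEME_TO_CYRILLIC[phonemes[i]])
                i += 1
        spelled.append("".join(letters))
    return " ".join(spelled)


print(transcribe([["ay"], ["l", "ih", "v"], ["ih", "n"]]))  # аи лыв ын
```

Checking for the iotated pair before the single-letter lookup is one simple way to make "you" come out as ю rather than йу. A fuller version would also need capital letters and sounds the table doesn't cover, such as the "ng" in "sing."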

Monday, March 23, 2009

My experience with alternate orthographies

I think the first non-Latin script I learned to read was Japanese, when I moved there back in 1998. I had perused some Japanese in rōmaji books, but when confronted with having to get by in Japan, one of the first things I learned was the kana. If you want to learn a language, learning the script that best represents its speech and grammar is certainly the most important thing to do.

Learning the script takes a week or so. Learning the language is much harder, and takes some years and hundreds of hours of study.

When I left Japan, I could get by in ordinary conversations, read the kana, and understand about 300 of the 2,000 or so jōyō kanji.

Rather than flying home, I took a boat from Ōsaka to Shanghai and spent two months travelling around China third class. The Chinese ideographs are the lingua franca of East Asia, and although I could barely speak any Chinese, I was able to get around the country, reading maps and place names and train destinations, deciphering menus in a basic way. It wasn't my understanding of the Chinese language that helped me get around China, so much as my understanding of the script shared between the Chinese and Japanese languages.
  • 東京 means "East Capital." In Japanese, it's pronounced "Tōkyō." You could pronounce it "Dongjing" in Mandarin.

  • 北京 means "North Capital." In Mandarin, it's pronounced "Beijing." You could pronounce it "Hokkyō" in Japanese, although most people don't.

  • 台北 means "Platform North." The capital of Taiwan is pronounced "Taibei" in Mandarin and was "Taihoku" under Japanese administration.

  • 仙台 means "Wizard Platform." It's the Japanese city "Sendai."

I've been fascinated with Central Asia for some time: the steppe, the Altay Mountains, the Gobi Desert, and on to Siberia and the coldest places on Earth. I went to Tibet on my trip to China, and I'd like to traverse the Silk Road from Beijing to Mongolia and Kazakhstan, and on to Tehran and points south. One of the things I find so fascinating about this region is that the terrain is in many ways similar to the dramatic Western landscape of my childhood, the Great Plains and the Rocky Mountains, but has a completely different cultural history, recorded over hundreds and thousands of years.

Uyghur is a Turkic language spoken around the Tian Shan, the mountains of heaven, from Kashgar to Ürümqi. Over the centuries, it has been written in Old Turkic runes, a variant of the Sogdian alphabet derived from Aramaic, variants of the Perso-Arabic script, several variants of the Latin alphabet, and Cyrillic. The Old Uyghur alphabet was adapted by the Mongols during the time of Genghis Khan to write Mongolian, and Nurhaci adapted the Mongolian alphabet to Manchu before laying some of the groundwork for the Qing Empire; the Manchu alphabet is one of the world's loveliest scripts.

In other words, Uyghur has been written in most of the dominant scripts of Eurasia, and some original ones besides. I wondered: why shouldn't English be so diversely written? What would English look like if it were written in a truly phonemic orthography? What would it be like to write poetry in a script in which words that rhyme, like "mate" and "weight," were written in a way that looked similar?

I have been a fairly digitally oriented person in this decade, but I came to be interested in the book as a physical object about a year and a half ago, after reading about the surprising number of tomes bound in human skin.

In a thousand years, all our DVDs and hard drives will be blank, meaningless stones. The historians of the future will look for individual objects to interpret and to corroborate whatever data survives, as we look back to the baked clay tablets the Sumerians scratched, the carved obelisks of Egypt, and the ink-scrawled bamboo strips of war-torn China. No book can be more uniquely individual than one bound in a person's skin, and such a precious work would deserve to be handwritten rather than blogged or printed on an inkjet.

So I got some calligraphy pens to see about writing insular minuscule, perhaps the most attractive script used to write English, dating from the years before Charlemagne made Carolingian minuscule Europe's standard Latin script. I also set about learning to write in the Read alphabet.

It was interesting going at first, trying to re-discern the phonemes of my idiolect, using guidebooks keyed to Received Pronunciation and northeastern dialects. First I had to figure out that my basically Oklahoma dialect, mixing Western and Southern elements, had undergone all the vowel mergers one might guess, including "pin-pen," "Mary-marry-merry," and "cot-caught."

Writing it has become fairly easy, although most everything there is to read is stuff I wrote myself or things written by people with very different dialects. Except for copying back notes, I have little practical reason to read it, so writing is actually much easier than reading. I want to see what can be done scribing it with some proper calligraphy tools.

While Quikscript is an easy-to-write and attractive script, useful for many purposes, a truly phonemic script has some drawbacks. It's problematic to create a universal English dictionary when dialect and pronunciation vary so much. And I wonder if there is a more appropriate way of representing English vowels.

Saturday, March 21, 2009

First Day of Spring 2009

Well, tonight I'm getting married.

March 20 was the Vernal Equinox. I wanted to do it yesterday, but Saturday is more convenient.

Monday, March 2, 2009

Pork brains in milk gravy

Now with only 1170% of your daily cholesterol intake.

(via MR/NC)

Gold pieces & US dollars ...back when there were dragons

It's surprisingly easy to convert between US dollars (or other contemporary currency) and the currency of D&D and d20 games. The currency of D&D is a fixed weight of metal, and gold, silver, and copper pieces convert into one another at a rate of 10:1: one hundred copper pieces are worth 10 silver or 1 gold.

The measures are a little confusing, because gold and silver are priced in troy ounces, while copper is priced in standard avoirdupois pounds. But in troy, avoirdupois, and apothecaries' measures, a grain is a grain is a grain. One troy ounce is 480 grains, and one avoirdupois pound is 7000 grains.

Since in D&D there are 50 coins to the pound, each coin weighs 140 grains. That's very slightly more than 9.07 grams, or just under 1/3 of an (avoirdupois) ounce (for comparison, a Sacagawea dollar weighs 8.1 grams; four dimes are about 9 g). Get the spot prices for gold, silver, and copper from Kitco.com, and plug them all into a spreadsheet. Official coinage is somewhat more valuable than scrap and bullion, because it carries official authenticity, has numismatic value, and is scarcer. But the spot price sets its intrinsic value.
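The weight arithmetic can be checked in a few lines of Python, using the grain as the common unit. This is a sketch under the assumption stated in the text (50 coins per avoirdupois pound); the grams-per-grain figure is the exact international definition of the grain.

```python
# Unit definitions: the grain is the common unit across the systems.
GRAINS_PER_AVOIRDUPOIS_POUND = 7000
GRAINS_PER_AVOIRDUPOIS_OUNCE = 7000 / 16  # 437.5
GRAINS_PER_TROY_OUNCE = 480
GRAMS_PER_GRAIN = 0.06479891  # exact, by international definition

COINS_PER_POUND = 50  # D&D/d20 coin weight

grains_per_coin = GRAINS_PER_AVOIRDUPOIS_POUND / COINS_PER_POUND
grams_per_coin = grains_per_coin * GRAMS_PER_GRAIN
troy_ounces_per_coin = grains_per_coin / GRAINS_PER_TROY_OUNCE
avoirdupois_ounces_per_coin = grains_per_coin / GRAINS_PER_AVOIRDUPOIS_OUNCE

print(f"{grains_per_coin:.0f} grains = {grams_per_coin:.4f} g")
# 140 grains = 9.0718 g
print(f"= {troy_ounces_per_coin:.4f} troy oz = {avoirdupois_ounces_per_coin:.4f} oz avdp")
# = 0.2917 troy oz = 0.3200 oz avdp
```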

Last Friday (February 27, 2009), gold cost $939.60/oz, silver cost $13.23/oz, and copper cost $1.51/lb. So a d20 gold piece is worth $274.05, a silver piece is worth $3.86, and a copper $0.03.
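A minimal sketch of the spreadsheet calculation, in Python: each coin is 140 grains, so its value is the spot price times the coin's fraction of the quoted unit. The function name `coin_value` and the unit keys are my own, not anything standard.

```python
GRAINS_PER_COIN = 7000 / 50  # 50 coins per avoirdupois pound: 140 grains each
GRAINS_PER_UNIT = {"troy_oz": 480, "lb": 7000}


def coin_value(spot_price, unit):
    """Melt value in dollars of one 140-grain coin, given a spot price per unit."""
    return spot_price * GRAINS_PER_COIN / GRAINS_PER_UNIT[unit]


# Spot prices quoted in the text for February 27, 2009.
gp = coin_value(939.60, "troy_oz")  # gold is quoted per troy ounce
sp = coin_value(13.23, "troy_oz")   # so is silver
cp = coin_value(1.51, "lb")         # copper is quoted per avoirdupois pound
print(f"gp ${gp:.2f}, sp ${sp:.2f}, cp ${cp:.2f}")  # gp $274.05, sp $3.86, cp $0.03
```

Because everything is expressed in grains, the troy/avoirdupois mismatch disappears: the same function handles gold and silver per troy ounce and copper per pound.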

Today's levels are the result of a decade-long run-up in commodity prices, which persists for precious metals; older prices are a little more typical. On June 1, 2001, gold was $266.70/oz; silver was $4.58/oz; and copper was $0.745/lb. This means that a d20 gold piece was worth $77.79, a silver piece was worth $1.34, and a copper piece was worth a penny.

Since the d20 system uses standard equipment prices, it's simple to compare game prices to contemporary real ones. A longsword costs 15 gp, or $4,110.75 in today's money; a +1 longsword costs more than a quarter million dollars. A 2-gp backpack costs $548.10, a vial of ink costs $2,192.40, a fishhook costs $3.86, and a day's worth of trail rations is $19.29. A donkey costs as much as a vial of ink, a light riding horse costs $20,553.75, and a pig is worth about eight hundred dollars. An ordinary, skilled 1st-level commoner earns nearly $30k/year. Raising someone from the dead, a spellcasting service too expensive to be generally available, would cost at least $1.49 million, while removing a negative level would merely cost about a hundred thousand dollars.
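Pricing the equipment works the same way, with one subtlety the dollar figures above already reflect: each item is priced in the coin the rules quote it in. At these spot prices gold is worth about 71 times as much as silver, not the game's 10, so converting 5 sp of trail rations to 0.5 gp first would give about $137 instead of $19.29. The sketch assumes d20 SRD prices (15 gp longsword, 8 gp ink, 1 sp fishhook, and so on) as I recall them, consistent with the dollar amounts in the text; the dictionary and function names are hypothetical.

```python
# Dollar value of one coin at the February 27, 2009 spot prices
# (140-grain coins; gold and silver per troy oz, copper per avoirdupois lb).
COIN_VALUE = {
    "gp": 939.60 * 140 / 480,
    "sp": 13.23 * 140 / 480,
    "cp": 1.51 * 140 / 7000,
}

# Prices as (amount, coin) pairs, assumed from the d20 SRD equipment tables.
PRICES = {
    "longsword": (15, "gp"),
    "backpack": (2, "gp"),
    "vial of ink": (8, "gp"),
    "donkey": (8, "gp"),
    "light horse": (75, "gp"),
    "pig": (3, "gp"),
    "fishhook": (1, "sp"),
    "trail rations, 1 day": (5, "sp"),
}


def dollar_price(item):
    """Price an item in the metal it is quoted in, not converted at 10:1."""
    amount, coin = PRICES[item]
    return amount * COIN_VALUE[coin]


for item in PRICES:
    print(f"{item}: ${dollar_price(item):,.2f}")
```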

The currency of Rome, by contrast, used a multi-metal standard. After the currency reforms of Augustus Caesar, the gold aureus was worth 25 silver denarii, 100 bronze sestertii, or 400 copper asses. The bronze sestertius weighed about 25-28 g. The silver denarius weighed about 3.9 g (about $1.66 of silver) and served as the backbone of Roman finance. Roman prices and currency varied greatly over time.
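The Augustan ratios amount to a fixed table of values in asses, the smallest unit, so conversion is a multiplication and a division. This sketch also reproduces the denarius's silver value, assuming (as the figure in the text does) that its 3.9 g were pure silver at the $13.23/troy oz spot price; the names are illustrative only.

```python
GRAMS_PER_TROY_OUNCE = 31.1034768  # exact by definition

# Augustan denominations expressed in asses, the smallest unit.
VALUE_IN_ASSES = {"aureus": 400, "denarius": 16, "sestertius": 4, "as": 1}


def convert(amount, from_coin, to_coin):
    """Convert between Augustan denominations at their fixed official ratios."""
    return amount * VALUE_IN_ASSES[from_coin] / VALUE_IN_ASSES[to_coin]


def silver_value(grams, spot_per_troy_oz):
    """Melt value in dollars of a given weight of pure silver."""
    return grams / GRAMS_PER_TROY_OUNCE * spot_per_troy_oz


print(convert(1, "aureus", "denarius"))    # 25.0
print(convert(1, "aureus", "sestertius"))  # 100.0
print(convert(1, "aureus", "as"))          # 400.0
print(f"${silver_value(3.9, 13.23):.2f}")  # $1.66
```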

A century ago, the US used a silver dollar, a coin with 24.057 g actual silver weight. That specie is worth $10.23 in today's money, but an inflation calculator will show that a 1909 dollar can buy $22.81 worth of goods in 2007 dollars. The word dollar comes from the German "thaler," after a Bohemian coin first minted in 1518. The actual American dollar comes from the Spanish real de a ocho, the piece of eight, which Spain had shipped by the galleon-full from its rich silver mines in Mexico and Peru. At the time of the American Revolution, the states had no currency of their own, and approved the pieces of eight for American commerce until a dollar with the same value could be established. The eight-real coin was often cut into eight bits for change: two bits was thus equivalent to a quarter dollar. The New York Stock Exchange traded in eighths and sixteenths of a dollar until as late as 2001.
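The same melt-value arithmetic gives the silver dollar figure, and Python's standard `fractions` module is a convenient way to show the eighths behind "two bits" and the old NYSE ticks. The $13.23 spot price and the $22.81 purchasing-power figure are the ones quoted in the text; the variable names are hypothetical.

```python
from fractions import Fraction

GRAMS_PER_TROY_OUNCE = 31.1034768  # exact by definition

# Melt value of the old silver dollar's 24.057 g of silver at $13.23/troy oz.
melt_value = 24.057 / GRAMS_PER_TROY_OUNCE * 13.23
# The text's inflation-calculator figure for a 1909 dollar, in 2007 dollars.
purchasing_power = 22.81
print(f"melt ${melt_value:.2f}, purchasing power ${purchasing_power:.2f}")

# A piece of eight cut into eight bits: each bit is 1/8 dollar.
bit = Fraction(1, 8)
print(2 * bit)  # 1/4, i.e. "two bits" is a quarter

# Pre-2001 NYSE tick sizes, as exact fractions and as decimal dollars.
for tick in (Fraction(1, 8), Fraction(1, 16)):
    print(tick, float(tick))  # 1/8 0.125, then 1/16 0.0625
```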

Of course, comparing prices between historical periods, much less fantasy worlds, is misleading. Agatha Christie never expected to be rich enough to buy a car, or poor enough to be without servants. And mass production and economic development have made things much, much more affordable, as Brad DeLong notes:

Our goods are not only plentiful but cheap. I am a book addict. Yet even I am fighting hard to spend as great a share of my income on books as Adam Smith did in his day. Back on March 9, 1776 Adam Smith's Inquiry into the Nature and Causes of the Wealth of Nations went on sale for the price of 1.8 pounds sterling at a time when the median family made perhaps 30 pounds a year. That one book (admittedly a big book and an expensive one) cost six percent of the median family's annual income. In the United States today, median family income is $50,000 a year and Smith's Wealth of Nations costs $7.95 at Amazon (in the Bantam Classics edition). The 18th Century British family could buy 17 copies of the Wealth of Nations out of its annual income. The American family in 2009 can buy 6,000 copies: a multiplication factor of 350.
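DeLong's ratios are easy to check. This quick sketch uses only the figures in the quoted passage: his "350" comes from the rounded counts of 17 and 6,000 copies, while the unrounded figures give a factor of about 377, which if anything strengthens the point.

```python
# Figures from the DeLong passage.
share_1776 = 1.8 / 30                         # book price / median family income
copies_1776 = 30 / 1.8                        # copies per year of income: about 17
copies_2009 = 50_000 / 7.95                   # about 6,289, rounded to 6,000 in the text
factor_rounded = 6_000 / 17                   # about 353: DeLong's "350"
factor_unrounded = copies_2009 / copies_1776  # about 377

print(f"{share_1776:.0%} of income; {copies_1776:.1f} vs {copies_2009:,.0f} copies")
print(f"factor {factor_rounded:.0f} (rounded inputs), {factor_unrounded:.0f} (unrounded)")
```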

Books are not an exceptional category. Today, buttermilk-fried petrale sole with pickled vegetables and parsley mayonnaise, served at Chez Panisse Café, costs the same share of a day-laborer's earnings as the raw ingredients for two big bowls of oatmeal did in the 18th Century. ...Today we still spend about one dollar in five on food—down from the half of income that Americans spent in 1776. The share hasn't fallen more because some of us buy buttermilk-fried petrale sole with pickled vegetables and parsley mayonnaise cooked, served, and cleaned up by others rather than (or in addition to) oats in the gunnysack. One reason is that the oats-for-five-meals-out-of-six-diet of 18th Century Scots was monotonous, and we are glad to escape it.

Sunday, March 1, 2009

English spelling reform & alternate orthographies

For years, decades, and centuries, writers and others have been complaining about the ridiculousness of English spelling. The Latin alphabet was adapted to English over a thousand years ago and fit it pretty well at the time, with the addition of a number of Germanic runes and Irish letters to represent sounds not found in Latin. Later these letters (such as thorn, eth, and the ligature ae) were dropped, and English went through a series of sound changes and vowel shifts, so that now the letters used to spell a word bear little relationship to the sounds they are intended to represent. It is a parody of an ostensibly phonemic, alphabetic system.

Spanish has five vowels. It is well served by the Latin alphabet, which has five vowel letters. English has more than a dozen vowels and diphthongs and two dozen or more consonants, depending on dialect, but the Latin alphabet has only 26 letters. These days, a vowel letter can represent nearly any English vowel sound in the right context. The letter "A," for example, represents different vowel sounds in "what," "cat," and "father." How is it that the first vowel of "women" sounds like the first vowel of "linen," but not like the first vowel of "woman"? Why does the spell-checker insist that "correspondance" is incorrect, while "correspondence" is great? Does it really matter which letter represents that unstressed vowel, when the sounds that letters represent vary so greatly?

The consonants are bad enough. "Q" has no job that couldn't be done by "K": it's a completely superfluous letter. "C" sometimes sounds like "K," and sometimes like "S." The only sound "C" contributes that neither of those letters covers is in the digraph "CH," as in "cheese," and even "CH" often stands for the sound of "K," as in "chorus." Why not just have "K" do its job, "S" do its job, and let "C" represent the consonant at the beginning of the word "cheese"?

The digraph "GH" is among the most infamous, since it used to represent a phoneme barely present in modern English. It can represent the sound we associate with "F" at the end of words, as in "cough." It represents a "hard G" at the beginning of words such as "ghost." In the word "through," the digraph is silent; the alternate spelling "thru" is unambiguous, understood by everyone, simpler, and almost universally regarded as incorrect for some reason. The tetragraph "ough" is great fun in general.

The guidelines for deriving the sounds of spoken English from the patterns of the letters are of infernal complexity. One list of 56 rules for deciding how to pronounce words from their spelling gets only about 85% of words right; it stumbles on more than one word in 10. The popularly simplified form of "night" is not "nit" or "nait," but "nite." The relationship between letters and sounds is not arbitrary, but perhaps it is more accurate to say that groups of letters correspond to groups of sounds, with numerous, significant exceptions.

Some of these irregularities are a consequence of English's plunder of the world's vocabulary, but many of them have terrible justification. "Island" was originally written as Anglo-Saxon "iland," but an "S" was added just to make it look more like the French word "isle," derived from the etymologically separate Latin word "insula." There is no good reason for the "P" in "ptarmigan"; someone tacked a "P" onto the beginning of a Scottish Gaelic word to make it look Greek, and for some reason it stuck as the "correct" spelling. We'd do just as well or better without it.

While it is true that English orthography represents the etymology of words and the lexical relationships between them well, it is an extraordinarily complex written language that is difficult to learn proficiently. People often regard Asian ideographic characters as extremely difficult because there are thousands of individual glyphs, each composed of several strokes and ideally representing a discrete idea. But is not English orthography, which spells each word as a unique but seemingly arbitrary sequence of letters, similarly complicated?

Attempts at more phonemic writing began long ago. In the mid-17th century, Francis Lodwick developed an abugida to write the languages of western Europe, including the vowels of English, French, and Dutch, as a "universal alphabet" in the intellectual currents of the time. Benjamin Franklin, a printer by trade, proposed a reformed alphabet in 1768. But the biggest advance for reformed English spelling came after Isaac Pitman published his system of shorthand in 1837. Quick, legible, and attractive handwriting was very important in the 19th century, after literacy and printing became widespread but before everyone had a typewriter or computer. Shorthand, which allowed the quick transcription of speech, had been developed much earlier, but Pitman's shorthand was phonemic rather than relying on a system of symbols to represent common words. Shorthand became popular among many authors of the time, who, recognizing the advantages of phonemic handwriting, became advocates of phonemic orthography and simplified spelling more broadly. Mark Twain praised Pitman's "phonography" in his 1899 essay "A Simplified Alphabet."

Nineteenth-century spelling reformers such as Alexander J. Ellis took great delight in the absurdity of English writing: "fish" written as "ghoti" is the most famous example, but he described comical respellings of "scissors" as "schiesourrhce," "favorite" as "phaighpheawraibt," and "orthography" as "eolotthowghrhroighway." Or, you write potato, I write ghoughpteighbteau. "Ghoti" is often criticized by the apologists for conventional orthography, but "phoche" illustrates much of the same point.

For many of the 19th-century reformers, the chief benefit claimed for phonemic writing was saving labor in handwriting. They meticulously counted the number of strokes needed to write a selected passage and compared it to the number needed in some preferred system. These arguments seem unconvincing these days, since most of the writing contemporary people do is on a keyboard (also organized in a confusing way to slow down writing). But some research shows it takes children twice as long to learn to read English as to learn to read other European languages. Partly this is because English syllables are complicated, but this complicated language is also written in a very complicated way.

People learn to read by sounding out words, and this learning is hampered by the opaque, complicated relationship between letter and sound in English. Fast adult readers have become proficient at quickly sounding out words in their heads. But how can we blame bad spellers, when any particular sound sequence could be written any number of different ways?

As Hieratic and Demotic originated as shorthand for Egyptian hieroglyphics, Cicero's secretary recorded his speeches in the Notae Tironianae, the Japanese syllabary began as simplified phonemic glyphs for the ideographic Chinese characters, and the lower-case Latin letters started as cursive versions of the Roman capitals, shorthand systems can yield important innovation in writing standards. Pitman's shorthand inspired the Deseret alphabet created by Mormons at the University of Deseret in the 1850s.

Another 19th-century advocate of shorthand and phonemic writing was Anglo-Irish playwright George Bernard Shaw, who wrote his plays in shorthand and had them transcribed by his secretary. A Fabian and socialist, Shaw saw how class discrimination on the basis of sociolect inhibited the advancement of working people, portraying the abandonment of (rich, historic, and interesting) dialects as desirable in his preface to "Pygmalion."

In his will, Shaw stipulated that a sum of money be set aside to fund the creation of a new alphabet for English, but after his death in 1950 this aim was delayed. After several years, a contest held for the design of the alphabet was won by amateur typographer Ronald Kingsley Read in 1958, and Shaw's "Androcles and the Lion" became the only book published in the Shavian alphabet. Eight years later, Read distributed a reform of the Shavian alphabet, intended to make the script easier to handwrite and to allow connected, cursive forms. This script became known as Quikscript or the Read alphabet. The PDF of the Quikscript manual is the place to start; the Read Alphabet Yahoo Group has many invaluable resources.

I started learning Quikscript about a year and a half ago; here's a sample, a near-pangram of my idiolect (missing one diphthong). It says: "With pleasure, the fox is baiting the dog with more chow: a juicy dough of fish oil and hair."

Other alternate orthographies for English include the Pitman Initial Teaching Alphabet, Unifon, and Visible Speech.

Miles per burger

For a long time, I thought of bicycle fuel efficiency in terms of "miles per spaghetti dinner." A few weeks ago, I realized that "miles per burger" was a better acronym: short and accessible, even if spaghetti is a better source of carbs.

Lo and behold, Good magazine uses the metric in their fuel efficiency graph.

No real high-speed rail on the chart tho: just Amtrak.