Help:Scripts
This wiki can produce its output in several phonemic scripts. In each case, every word is looked up within the wiki itself in order to find the pronunciation. Each script is referred to by its code in the international standard for classifying scripts, ISO 15924.
If you choose any script, a cookie will be set so that pages will display in that script thenceforth. This includes following the links to Alice's Adventures in Wonderland given below.
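How the wiki stores the choice is not documented here. As a rough sketch only, a preference cookie of this kind could be built and read back as follows; the cookie name `script`, the helper names, and the `Latn` default are all assumptions made for illustration.

```python
import re
from http.cookies import SimpleCookie

# Shape of an ISO 15924 code: one capital letter followed by three lowercase.
SCRIPT_CODE = re.compile(r"[A-Z][a-z]{3}")

def script_cookie_header(code: str) -> str:
    """Build a Set-Cookie header that remembers the chosen script."""
    if not SCRIPT_CODE.fullmatch(code):
        raise ValueError(f"not an ISO 15924-style code: {code!r}")
    cookie = SimpleCookie()
    cookie["script"] = code         # hypothetical cookie name
    cookie["script"]["path"] = "/"  # apply to every page on the site
    return cookie.output(header="Set-Cookie:")

def chosen_script(cookie_header: str, default: str = "Latn") -> str:
    """Read the script code back from a Cookie header, or use the default."""
    cookie = SimpleCookie(cookie_header)
    return cookie["script"].value if "script" in cookie else default
```

For instance, `chosen_script("script=Dsrt")` gives back `Dsrt`, while an empty header falls back to the conventional alphabet.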
Details of all the scripts available are on this page.
A word about fonts
This wiki attempts to embed the fonts needed. This currently works only with standard Safari, standard Explorer, and very recent builds of Firefox (version 3.5 or later). If you are running an earlier version of Firefox, or Opera, you will need to download fonts yourself. Each alphabet below suggests a font you can use.
The Mac can display Shavian and Deseret fonts out of the box, but not Runic, Tengwar, or Unifon.
The scripts
Conventional
This is the ordinary alphabet generally used for English. Selecting this option will show you the underlying text of any document.
For example, here is Alice in the conventional spelling.
The ISO 15924 code for the conventional alphabet is Latn.
Shavian
The Shavian alphabet, designed in the 1960s, is the raison d'être of this wiki. All the entries in the lexicon are written in this alphabet. Turning this option on will show you the lexicon entries as they really are. More on Shavian can be found on Wikipedia.
For example, here is Alice in the Shavian alphabet.
The Mac can display Shavian out of the box. If you can't see Shavian text, try downloading the font Androcles.
The ISO 15924 code for the Shavian alphabet is Shaw.
Deseret
The Deseret alphabet was devised in the mid-1800s by the board of regents of the University of Deseret (now the University of Utah). More information on this alphabet is at deseretalphabet.com.
For example, here is Alice in the Deseret alphabet.
The Mac can display Deseret out of the box. If you can't see Deseret text, try downloading the font Analecta. There is no automatic font embedding yet for Deseret.
The ISO 15924 code for the Deseret alphabet is Dsrt.
Unifon
Unifon is a caseless alphabet designed in the 1950s. More information can be found at unifon.org.
For example, here is Alice in the Unifon alphabet.
If you can't see Unifon text, try downloading the font Constructium. (Its home page is here.)
There is no ISO 15924 code for Unifon. Instead we use Qabu from the standard's private use area, as suggested by Doug Ewell.
Unifon characters do not appear in Unicode. Instead we use the code points given in ConScript.
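Text that uses ConScript code points lives in Unicode's Private Use Area, so it only displays correctly with a font that follows the same assignments. A minimal sketch of how one might detect such text; the function name is ours, and only the Basic Multilingual Plane range U+E000 to U+F8FF is checked.

```python
# Bounds of the Private Use Area in Unicode's Basic Multilingual Plane.
PUA_START = 0xE000
PUA_END = 0xF8FF

def uses_private_use(text: str) -> bool:
    """Return True if any character of text lies in the BMP Private Use Area."""
    return any(PUA_START <= ord(ch) <= PUA_END for ch in text)
```

Conventional text such as `"Alice"` gives `False`; any string containing a private-use code point gives `True`.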
Runic
The Runic alphabet is not yet fully working on this wiki.
The ISO 15924 code for the Runic alphabet is Runr.
Ewellic
Ewellic is a caseless alphabet designed by Doug Ewell as a secret code while he was in high school. More information can be found at ewellic.org.
For example, here is Alice in the Ewellic alphabet.
If you can't see Ewellic text, try downloading the font Fairfax. (Its home page is here.)
There is no ISO 15924 code for Ewellic. Instead we use Qabe from the standard's private use area, as suggested by Doug Ewell.
Ewellic characters do not appear in Unicode. Instead we use the code points given in ConScript.
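Codes such as Qabu and Qabe fall in the range that ISO 15924 reserves for private use, Qaaa through Qabx. A small sketch of a check for that range follows; the function name is illustrative and is not part of the wiki's code.

```python
def is_private_use_script_code(code: str) -> bool:
    """Return True for ISO 15924 codes in the private-use range Qaaa..Qabx."""
    # A script code is exactly four ASCII letters.
    if len(code) != 4 or not code.isascii() or not code.isalpha():
        return False
    # The first letter is a capital and the rest are lowercase.
    if not (code[0].isupper() and code[1:].islower()):
        return False
    # With the shape fixed, plain string comparison orders the codes correctly.
    return "Qaaa" <= code <= "Qabx"
```

Under this check `Qabu` and `Qabe` are private-use codes, while registered codes such as `Shaw` or `Dsrt` are not.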
Tengwar
Tengwar is an alphabet designed by the linguist J.R.R. Tolkien as a script for his constructed languages. Much more information about these languages is available at Ardalambion.
Our support for Tengwar is based on Omniglot's guide. However, this guide differs. We may alter our support based on that guide at some point.
For example, here is Alice in Tengwar.
If you can't see Tengwar text, try downloading the font Constructium. (Its home page is here.) There may be better fonts out there for Tengwar, and we invite suggestions.
Tengwar characters do not currently appear in Unicode. Instead we use the code points given in ConScript. This will soon change, since Tengwar is due to be admitted into Unicode proper. When it changes, so will we.
The ISO 15924 code for Tengwar is Teng.
Arabic
The Arabic script is used to write not only Arabic, but also other languages such as Persian and Urdu. Formerly, it was also used to write other languages such as Turkish (Osmani) and Malay.
Modern Standard Arabic has only three short vowels /a i u/ and three long vowels /ā ī ū/, and usually only the long vowels are written. However, by using the diacritics that mark short vowels in texts for learners, together with a diacritic for the glottal stop, it is possible to represent nearly all of the vowel sounds of English at least semi-intuitively.
The conventions used on this wiki for the Arabic script are inspired, in part, by the conventions used for writing Yiddish in the Hebrew script; specifically, this includes using 'alif with diacritics for "a" and "o" sounds (𐑨 𐑭 𐑪 𐑷) and using `ayin for "e" (𐑧).
In general, the use of hamza (the glottal stop diacritic) marks a short vowel; this distinguishes the pairs 𐑨 𐑭; 𐑪 𐑷; 𐑫 𐑵; 𐑦 𐑰, as in Sam/psalm, cot/caught, pull/pool, and fit/feet. (For example, pull=پؤل, pool=پول; fit=فئت, feet=فيت.)
The transformation from Shavian to Arabic is nearly reversible. The main difficulties are that 𐑰 and 𐑘 are merged as ي, 𐑵 and 𐑢 as و, and 𐑩 and 𐑳 as أ (and, because of this, 𐑼 and 𐑻 as أر), and that the ligatures with 𐑮 are written with a separate 𐑮 letter, merging, for example, "merry" (𐑥𐑧𐑮𐑦) and "Mary" (·𐑥𐑺𐑦) as معرئ.
The ISO 15924 code for Arabic is Arab.
Braille
Braille is a way of making writing available to the touch by means of a system of embossed dots. It is nowadays probably the most common system of embossed writing, though others (such as the Moon script) are also in use, and still others (such as New York Point) were designed but have largely fallen out of use. Regardless of which method is "best" or "superior", the Braille script certainly has the advantage of the widest use. It is based on a script elaborated by Louis Braille, a Frenchman, in 1821.
Braille traditionally uses a matrix of six dots, two wide by three high, for a total of 64 possible patterns (including the empty pattern) -- enough for the Latin alphabet in one case plus punctuation and a number of meta-characters (for things such as "the next letter is capitalised"). Computers sometimes extend this by adding two dots at the bottom, for a total of 256 possible patterns; among other things, this allows upper-case and lower-case letters to use distinct patterns and gives more letters with diacritics patterns of their own.
There is no fixed correspondence of Braille patterns to letters; every language is free to make its own assignment. However, most languages using the Roman alphabet base their correspondence on that devised by Louis Braille, and even languages using other scripts often base their assignments on it too, using the phonetic or visual similarity of their letters to a certain Roman letter. In this way, for example, dots 1 and 2 represent not only Latin B but also Cyrillic Б, Greek Β, Hebrew בּ, or Arabic ب, and even Korean final -B (ᆸ; ㅂ) and Chinese initial B- (ㄅ).
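The pattern counts and the dot-number notation can be made concrete with Unicode's Braille Patterns block, which starts at U+2800 and uses one bit per dot (dot 1 is the lowest bit, dot 8 the highest). A minimal sketch, with names of our own choosing:

```python
BRAILLE_BASE = 0x2800  # U+2800 is the empty Braille pattern

SIX_DOT_PATTERNS = 2 ** 6    # 64, including the empty pattern
EIGHT_DOT_PATTERNS = 2 ** 8  # 256, including the empty pattern

def braille_pattern(*dots: int) -> str:
    """Return the Unicode Braille pattern with the given dots (1-8) raised."""
    code = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot numbers run from 1 to 8, got {dot}")
        code |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(code)
```

Here `braille_pattern(1, 2)` yields U+2803, the cell read as B in Latin-based Braille codes.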
My first idea was to base Shavian Braille on this, too, but this quickly got unwieldy due to the large number of vowel signs Shavian has compared to Latin, and also because for many letters, it was not clear what the closest Latin letter would be.
Then I noticed that Shavian, as presented in the Shaw Alphabet Reading Key, has four rows of 10 letters each (10 tall, 10 deep, 20 short), and one method of laying out Braille patterns systematically also starts with four rows of 10 letters each.
This also means that certain pairs of voiced and unvoiced consonants, and certain pairs of "short" and "long" vowels, will have related shapes, identical except for the presence of dot-3 in one letter and its absence in its partner.
I decided to use the first two rows for the short characters, since these tend to have fewer dots (especially the first row, which only uses dots 1-2-4-5); other assignments are also conceivable.
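As a sketch of the layout, and assuming the systematic method meant here is the traditional one (a first row of ten cells on dots 1-2-4-5, with later rows adding dot 3, dots 3 and 6, and dot 6), the four rows can be generated as follows. Which Shavian letter sits in which cell is not stated on this page, so the code only produces the cells.

```python
# Dots of the ten cells in the first row (the cells for a-j in Latin Braille).
FIRST_ROW = [(1,), (1, 2), (1, 4), (1, 4, 5), (1, 5),
             (1, 2, 4), (1, 2, 4, 5), (1, 2, 5), (2, 4), (2, 4, 5)]

# Dots added to every first-row cell to form rows one to four.
ROW_EXTRA_DOTS = [(), (3,), (3, 6), (6,)]

def pattern(dots) -> str:
    """Unicode Braille pattern for a collection of dot numbers."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))

rows = [[pattern(base + extra) for base in FIRST_ROW]
        for extra in ROW_EXTRA_DOTS]

def dot3_partner(cell: str) -> str:
    """Toggle dot 3, giving the related cell in the partner row."""
    return chr(ord(cell) ^ 0x04)

for row in rows:
    print(" ".join(row))
```

In this layout every cell in row one has its dot-3 partner in row two, and every cell in row four has its partner in row three, which is the kind of pairing described above.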
This leaves the eight letters marked "compound" on the Reading Key; unfortunately, of the remaining 24 possible characters only about 3 are typically used for letters or letter combinations in the Braille versions I am familiar with. Still, by scrounging a little and co-opting a punctuation character and a couple more that are typically used as meta-characters, I was able to find one pattern for each Shaw letter.
Braille patterns are included in Unicode (as abstract patterns rather than with specific sounds, since these are so language-dependent); they are included in some fonts but not all.
The ISO 15924 code for the Braille script is Brai.
Known bugs
The abbreviations bug
The words "the", "to", "of", and "and" are each represented by a single character in Shavian. The other alphabets have no such abbreviations, yet the character corresponding to that single Shavian consonant is what gets used in these alphabets too. Fixes for this will come along eventually.
(This is especially important because some scripts, notably Tengwar, have their own such abbreviations which differ.)
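One possible approach, offered only as a sketch and not as the wiki's planned fix, is to expand the four one-letter words before converting to another script. The full spellings in the table below are assumptions chosen for illustration, and a word with punctuation attached would need extra handling.

```python
# Shavian one-letter abbreviations mapped to assumed full phonemic spellings.
ABBREVIATIONS = {
    "\U0001045E": "\U0001045E\U00010469",            # "the": they + ado
    "\U0001045D": "\U0001046A\U0001045D",            # "of": on + vow
    "\U0001046F": "\U00010468\U0001046F\U0001045B",  # "and": ash + nun + dead
    "\U00010451": "\U00010451\U00010475",            # "to": tot + ooze
}

def expand_abbreviations(text: str) -> str:
    """Expand abbreviations that stand alone as whole space-separated words."""
    words = text.split(" ")
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)
```

Because only whole words are looked up, the same letters inside longer words are left alone.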
Inherent mergers
Mergers which are inherent to Shavian will be replicated in all alphabets. For example, Shavian has the wine/whine merger and the common mode of Tengwar does not; the wiki nevertheless exhibits the merger when displaying Tengwar.
The transliterator
The transliterator can currently only read and write the Shavian and conventional alphabets.
Adding new scripts
Every available script is driven by a mapping table at Shavian:Script magic/Dsrt (replace Dsrt with whatever code is appropriate).
Each line in the mapping table looks like
{{Map|1|2|3}}
where 1, 2, and 3 are:
- for a caseless script such as Unifon, 1 is a Shavian letter, 2 is the equivalent letter in the script, and 3 is empty.
- for a cased script such as Deseret, 1 is a Shavian letter, 2 is the equivalent uppercase letter in the script, and 3 is the equivalent lowercase letter.
- for a few magic special cases, 1 is a key, 2 is a value, and 3 is the word "SPECIAL".
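The three cases can be told apart mechanically. The following is a rough sketch of a reader for such lines, not the wiki's actual parser; the class and field names are ours, and the example values in the usage note are placeholders rather than real mappings.

```python
import re
from typing import NamedTuple, Optional

# One {{Map|1|2|3}} line; each field may be empty and contains no "|" or "}".
MAP_LINE = re.compile(r"\{\{Map\|([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")

class MapEntry(NamedTuple):
    kind: str    # "caseless", "cased" or "special"
    key: str     # Shavian letter, or the key of a special entry
    value: str   # target letter (uppercase if cased), or the special value
    lower: str   # lowercase target letter; empty unless kind is "cased"

def parse_map_line(line: str) -> Optional[MapEntry]:
    """Parse one mapping line, returning None if it is not a Map template."""
    match = MAP_LINE.fullmatch(line.strip())
    if match is None:
        return None
    first, second, third = match.groups()
    if third == "SPECIAL":
        return MapEntry("special", first, second, "")
    if third == "":
        return MapEntry("caseless", first, second, "")
    return MapEntry("cased", first, second, third)
```

For example, `parse_map_line("{{Map|x|Y|y}}")` is read as a cased entry, while `parse_map_line("{{Map|x|y|}}")` is read as a caseless one.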
All the live mappings are listed at Shavian:Script magic, and the menu on the side of documents is taken from that page. That page is protected, so when your script is ready, ask for it to be added.
The transformations are applied from top to bottom, so if, for example, your script uses a single character for a particular sequence of two Shavian letters that differs from the characters it uses for each of those letters on its own, just put an entry for that sequence at the top.
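The effect of the ordering can be sketched with plain sequential replacement. The entries below are hypothetical (the pair kick + so and the Latin targets are chosen purely for illustration), and the sketch assumes that target characters never coincide with Shavian letters matched by later entries.

```python
def transliterate(text: str, mapping: list[tuple[str, str]]) -> str:
    """Apply each (source, target) replacement in order, top to bottom."""
    for source, target in mapping:
        text = text.replace(source, target)
    return text

KICK = "\U00010452"  # Shavian letter kick
SO = "\U00010455"    # Shavian letter so

# Sequence entry first, so it is matched before its individual letters.
sequence_first = [(KICK + SO, "X"), (KICK, "K"), (SO, "S")]
# Same entries with the sequence last; by then both letters are already gone.
sequence_last = [(KICK, "K"), (SO, "S"), (KICK + SO, "X")]

print(transliterate(KICK + SO + KICK, sequence_first))  # XK
print(transliterate(KICK + SO + KICK, sequence_last))   # KSK
```

With the sequence entry at the bottom it can never match, which is why such entries belong at the top of the table.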
Scripts we should have, yet we don't
| Name | Code | Notes |
|---|---|---|
| Runic | Runr | Under construction |
| The IPA | Latn-IPA | Would be very useful. |
| Visible Speech | Visp | Also a way of representing arbitrary sounds. A PUA font is available. |
| Pitman i.t.a. | Latn-ITA | Famous and worth having. No idea about fonts or codepoints. (Marnanel mailed Michael Everson to ask about codepoints; apparently they're planned.) |
| Cirth | Cirt | Since we have Tengwar. In Constructium. |
| There's always Klingon... | Qaak | No idea about available fonts. |
| Some of these... | | |