Introduction
The Qazaq language is currently undergoing a transition to a Latin script. The existing official Cyrillic alphabet, being a superset of the Russian alphabet, presents numerous inconsistencies in orthography and phonetics, leading to an unnecessarily extensive alphabet. This project proposes a solution to these challenges by reintroducing the historical Yanalif alphabet, which previously served as the first official Latin alphabet for the language.
Review
Throughout history, the Qazaq language has utilized multiple alphabets. Our focus is on the initial Latin-based Yanalif alphabet, introduced in 1929. This alphabet underwent one modification in 1938 before being entirely abolished in 1940 in favor of a Cyrillic version. Since then, various proposals for reintroducing a Latin script have emerged from both governmental bodies and independent enthusiasts. A comparative analysis is presented, contrasting the current alphabet with the historical Yanalif, the latest official proposal from 2021, and the Qazaq Grammar (QG) community version. We have omitted letters such as ABDEFGKLMNOPQRSTZ due to a consensus regarding their phoneme-symbol mapping.
| Cyrillic (official) | Yanalif 1929 | Version (2021) | Qazaq Grammar |
|---|---|---|---|
| Ә | Ə | Ä | Ä |
| Ө | Ɵ | Ö | Ö |
| І | I | I | İ |
| Ы | Ь | Y | I |
| Ұ | U | Ū | U |
| Ү | Y | Ü | Ü |
| И (vowel) | - | İ | I |
| Й (consonant) | J | İ | Y |
| Ш | C | Ş | C |
| Ж | Ç | J | J |
| Ғ | Ƣ | Ğ | Ğ |
| Х | - | H | H |
| Һ | H | H | - |
| Ң | Ꞑ | Ñ | Ń |
| У (vowel) | - | U | - |
| У (consonant) | V | U | W |
| В | - | V | V |
A significant issue with newly suggested alphabets is the consistent requirement for introducing novel symbols, which subsequently engenders encoding and typography complications. Furthermore, these proposals often attempt a direct mapping of existing Cyrillic symbols to new Latin symbols while preserving the original orthography. This approach perpetuates the same inconsistencies inherent in the current language. While the Qazaq Grammar version partially addresses this problem, its heavy influence from the Turkish alphabet results in complexities, particularly concerning the letters ı I i İ l L in a typographical context.
Proposal
Based on the aforementioned problems, the following criteria have been established for the new Yanalif2025 alphabet:
- Consistent spelling rules regarding letters И and У, with the Yanalif1929 version being the optimal choice.
- Compactness, achieved by eliminating all non-native phonemes and letters.
- Avoidance of new character introductions, maximizing the utilization of existing Latin alphabet letters.
The most suitable option, considering these criteria, involves adopting an existing extended Latin alphabet from European languages. We propose the use of the German alphabet for Yanalif 2025 due to the following advantages:
- It comprises 30 letters, sufficient to encompass all characters of the new alphabet.
- It features vowels with umlauts that possess a one-to-one correspondence with Qazaq alphabet phonemes.
- As an official alphabet in multiple European countries with substantial populations, it ensures robust support in typography, encoding, and various electronic devices such as keyboards, smartphones, and computer software.
Additionally, transliteration rules should be developed to manage situations where only 26-letter Latin symbols are permissible, such as in international passports.
This section presents a comparative table of the old and new Yanalif versions, alongside transliteration rules and illustrative examples.
| Cyrillic (official) | Yanalif 1929 | Yanalif 2025 | Translit Rules | Examples |
|---|---|---|---|---|
| Ә | Ə | Ä | AH | älem/ahlem |
| Ө | Ɵ | Ö | OH | öner/ohner |
| І | I | I | I | temir, bilim |
| Ы | Ь | W | W | tabws, wdws |
| Ұ | U | U | U | tumar |
| Ү | Y | Ü | UH | kün/kuhn |
| И | - | - | IY/WY | |
| Й | J | Y | Y | toy |
| Ш | C | C | C/SH | cam/sham |
| Ж | Ç | J | J | jol |
| Ғ | Ƣ | X | X/GH | xalam/ghalam |
| Х | - | H | H/KH | han/khan |
| Һ | H | - | - | |
| Ң | Ꞑ | ß | NH | teßge/tenhge |
| У (vowel) | - | - | - | |
| У (consonant) | V | V | V | savda |
| В | - | - | - |
Fundamentally, our modifications to Yanalif1929 entail the following alterations:
- The consolidation of Һ and Х into a single letter, "H".
- The remapping of ƏƟYЬJÇƢꞐ to ÄÖÜWYJXẞ, respectively.
- The letter "F" from the Latin alphabet was retained, despite its absence in the original Yanalif.
Full version
Converter, Orthography and Loanwords
Despite the distinct orthography based on Yanalif1929, the current linguistic landscape exhibits a significant presence of foreign words directly adopted from Russian. To address this, we are developing an automatic conversion tool that employs character-based detection to infer the etymology of words, categorizing them as either native or borrowed. For loanwords, a set of simple mapping rules is applied for conversion, and they are capitalized to highlight their non-native origin. Conversely, the primary challenge in converting native words lies in accurately transcribing the vowels И and У. The formalization of this conversion process necessitated an analysis of scanned books and articles published in the 1930s. Given the absence of pre-transcribed texts, an optical character recognition tool was developed by fine-tuning Tesseract OCR. We intend to publish these materials in due course.
Benefits
Leveraging an established alphabet offers significant advantages over creating a new one from scratch. Key benefits include eliminating the need to develop new keyboard layouts, fonts, and text encodings, which greatly accelerates the adoption of the new alphabet.
Controversies
Using letters like W, X, and V for specific Qazaq phonemes might seem counterintuitive. The main argument from opponents of this approach is that non-Qazaq speakers will be confused. However, the design of the alphabet should prioritize the convenience of Qazaq speakers, not foreign speakers, for several reasons:
- It's impossible to represent Qazaq phonemes in a way that non-Qazaq speakers would pronounce correctly. Attempts by Russians to use apostrophes to guide English speakers in softening consonants, for example, have never been effective.
- Even Latin-based alphabets have significant differences in how the same letter is pronounced. Consider how Germans pronounce "Von Neumann" [fon NOY-mahn] or how Spanish speakers pronounce "Jorge" [HOR-heh].
Online converter
Getting started with the Keyboard
- Computers – Just switch to the German keyboard layout
- Phones/Tablets – You can switch to the German layout, or just long-press a, o, u, and s to get ä, ö, ü, and ß.
