Yanalif 2025 / Jaßälip 2025 / Жаңәліп 2025

Abstract

A novel approach to designing a Latin alphabet for the Qazaq language.

Introduction

The Qazaq language is currently undergoing a transition to a Latin script. The official Cyrillic alphabet is a superset of the Russian alphabet; it carries numerous orthographic and phonetic inconsistencies and is larger than the language needs. This project proposes to address these problems by reintroducing the historical Yanalif alphabet, which was the first official Latin alphabet for the language.

Review

Throughout its history, Qazaq has been written in several alphabets. We focus on Yanalif, the first Latin-based alphabet, introduced in 1929. It was modified once, in 1938, and abolished in 1940 in favor of a Cyrillic alphabet. Since then, both governmental bodies and independent enthusiasts have proposed reintroducing a Latin script. The table below compares the current Cyrillic alphabet with the historical Yanalif, the latest official proposal (2021), and the Qazaq Grammar (QG) community version. Letters whose phoneme-symbol mapping is not in dispute (A, B, D, E, F, G, K, L, M, N, O, P, Q, R, S, T, Z) are omitted.

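| Cyrillic (official) | Yanalif 1929 | Official proposal (2021) | Qazaq Grammar |
|---|---|---|---|
| Ә | Ə | Ä | Ä |
| Ө | Ɵ | Ö | Ö |
| І | I | I | İ |
| Ы | Ь | Y | I |
| Ұ | U | Ū | U |
| Ү | Y | Ü | Ü |
| И (vowel) | - | İ | I |
| Й (consonant) | J | İ | Y |
| Ш | C | Ş | C |
| Ж | Ç | J | J |
| Ғ | Ƣ | Ğ | Ğ |
| Х | - | H | H |
| Һ | H | H | - |
| Ң | Ꞑ | Ñ | Ń |
| У (vowel) | - | U | - |
| У (consonant) | V | U | W |
| В | - | V | V |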

A significant problem with the newly proposed alphabets is that they keep introducing novel symbols, which causes encoding and typography complications. In addition, these proposals often map each existing Cyrillic symbol directly to a new Latin symbol while preserving the original orthography, which carries over the inconsistencies of the current Cyrillic orthography. The Qazaq Grammar version partially addresses this problem, but it is heavily influenced by the Turkish alphabet, and the resulting letters ı I i İ l L are easy to confuse in print and on screen.

Proposal

Based on the problems described above, we set the following criteria for the new Yanalif 2025 alphabet:

  1. Consistent spelling rules for the letters И and У, for which the Yanalif 1929 rules are the best choice.
  2. Compactness, achieved by eliminating all non-native phonemes and letters.
  3. No new characters: existing Latin alphabet letters are reused as far as possible.

The most suitable option, considering these criteria, involves adopting an existing extended Latin alphabet from European languages. We propose the use of the German alphabet for Yanalif 2025 due to the following advantages:

  1. It comprises 30 letters, sufficient to encompass all characters of the new alphabet.
  2. It features vowels with umlauts that possess a one-to-one correspondence with Qazaq alphabet phonemes.
  3. As an official alphabet in multiple European countries with substantial populations, it ensures robust support in typography, encoding, and various electronic devices such as keyboards, smartphones, and computer software.
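
The encoding argument can be checked directly. The sketch below is illustrative and not part of the project: it tests which letters fit in the single-byte Latin-1 (ISO 8859-1) encoding, used here as a stand-in for legacy software support. The helper name `unencodable` and the letter groupings are assumptions; the letters themselves are the non-ASCII ones from the comparison table and the German alphabet.

```python
def unencodable(letters: str, codec: str = "latin-1") -> list[str]:
    """Return the characters in `letters` that `codec` cannot represent."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Extended letters of the German alphabet (lowercase ß only).
GERMAN_EXTRAS = "ÄäÖöÜüß"
# Some non-ASCII letters of Yanalif 1929 from the comparison table.
YANALIF_1929_EXTRAS = "ƏəƟɵƢƣ"
# Non-ASCII capitals listed for the 2021 proposal in the comparison table.
VERSION_2021_EXTRAS = "ÄÖŪÜİŞĞ"

if __name__ == "__main__":
    for name, letters in [
        ("German", GERMAN_EXTRAS),
        ("Yanalif 1929", YANALIF_1929_EXTRAS),
        ("2021 proposal", VERSION_2021_EXTRAS),
    ]:
        print(name, "-> not in Latin-1:", unencodable(letters))
```

Latin-1 is only one possible yardstick; the same function accepts any codec name known to Python, so other legacy encodings can be checked the same way.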

In addition, transliteration rules are needed for situations where only the 26 basic Latin letters are permitted, such as in international passports.

This section presents a comparative table of the old and new Yanalif versions, alongside transliteration rules and illustrative examples.

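| Cyrillic (official) | Yanalif 1929 | Yanalif 2025 | Transliteration rules | Examples |
|---|---|---|---|---|
| Ә | Ə | Ä | AH | älem/ahlem |
| Ө | Ɵ | Ö | OH | öner/ohner |
| І | I | I | I | temir, bilim |
| Ы | Ь | W | W | tabws, wdws |
| Ұ | U | U | U | tumar |
| Ү | Y | Ü | UH | kün/kuhn |
| И | - | - | IY/WY | |
| Й | J | Y | Y | toy |
| Ш | C | C | C/SH | cam/sham |
| Ж | Ç | J | J | jol |
| Ғ | Ƣ | X | X/GH | xalam/ghalam |
| Х | - | H | H/KH | han/khan |
| Һ | H | - | - | |
| Ң | Ꞑ | ß | NH | teßge/tenhge |
| У (vowel) | - | - | - | |
| У (consonant) | V | V | V | savda |
| В | - | - | - | |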
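
The transliteration column can be read as a small set of substitution rules. The following is a minimal sketch under that reading, not the project's converter: the function name `to_basic_latin` and the `use_alternates` flag are hypothetical, and treating the slash forms (C/SH, X/GH, H/KH) as optional alternates is an assumption about what the table intends.

```python
# Digraph rules taken from the transliteration column of the table.
# Letters not listed are already within the 26 basic Latin letters.
TRANSLIT = {"ä": "ah", "ö": "oh", "ü": "uh", "ß": "nh"}

# Assumption: forms after a slash in the table are optional alternates.
ALTERNATES = {"c": "sh", "x": "gh", "h": "kh"}


def to_basic_latin(text: str, use_alternates: bool = False) -> str:
    """Rewrite Yanalif 2025 text using only the 26 basic Latin letters."""
    rules = dict(TRANSLIT)
    if use_alternates:
        rules.update(ALTERNATES)
    out = []
    for ch in text:
        repl = rules.get(ch.lower())
        if repl is None:
            out.append(ch)                 # no rule: keep the character
        elif ch.isupper():
            out.append(repl.capitalize())  # e.g. "Ä" becomes "Ah"
        else:
            out.append(repl)
    return "".join(out)


if __name__ == "__main__":
    for word in ["älem", "öner", "kün", "teßge"]:
        print(word, "->", to_basic_latin(word))
    for word in ["cam", "xalam", "han"]:
        print(word, "->", to_basic_latin(word, use_alternates=True))
```

Each input character is replaced once and the output is never rescanned, so a digraph produced by one rule is not rewritten by another. The mapping is not reversible on its own: a sequence such as "ah" in the output could come from ä or from a followed by h.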

Fundamentally, our modifications to Yanalif1929 entail the following alterations:

Full version

[Figure: Yanalif 2025 alphabet chart]

Converter, Orthography and Loanwords

Although the orthography is based on Yanalif 1929, present-day Qazaq contains many words adopted directly from Russian. To handle them, we are developing an automatic conversion tool that uses character-based detection to infer whether a word is native or borrowed. Loanwords are converted with a set of simple mapping rules and capitalized to highlight their non-native origin. For native words, the main challenge is transcribing the vowels И and У correctly. Formalizing this conversion required analyzing scanned books and articles published in the 1930s. Because no transcribed texts were available, we built an optical character recognition tool by fine-tuning Tesseract OCR. We intend to publish these materials in due course.
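
The detection step can be illustrated as follows. This is only a sketch of character-based classification and does not reproduce the tool under development: the marker set `LOAN_MARKERS`, the function names, and the Cyrillic-to-Latin pairs for the undisputed letters are assumptions, while the remaining pairs follow the Yanalif 2025 table. Words containing И or У are deliberately left unconverted, since their spelling needs the orthographic rules discussed above.

```python
from typing import Optional

# Assumption: Cyrillic letters taken here as signs of a borrowed word.
LOAN_MARKERS = set("вёфцчъьэ")

# Cyrillic -> Yanalif 2025. Pairs for ә ө і ы ұ ү й ш ж ғ х ң follow the
# table; the rest are assumed mappings for the undisputed letters.
NATIVE_MAP = {
    "а": "a", "ә": "ä", "б": "b", "г": "g", "ғ": "x", "д": "d",
    "е": "e", "ж": "j", "з": "z", "й": "y", "к": "k", "қ": "q",
    "л": "l", "м": "m", "н": "n", "ң": "ß", "о": "o", "ө": "ö",
    "п": "p", "р": "r", "с": "s", "т": "t", "ұ": "u", "ү": "ü",
    "х": "h", "ш": "c", "ы": "w", "і": "i",
}

# Letters whose transcription depends on spelling rules not modeled here.
NEEDS_RULES = set("иу")


def classify(word: str) -> str:
    """Label a Cyrillic word 'borrowed' if it contains a marker letter."""
    if any(ch in LOAN_MARKERS for ch in word.lower()):
        return "borrowed"
    return "native"


def convert_native(word: str) -> Optional[str]:
    """Convert letter by letter; return None if the word needs И/У rules
    or contains a character outside NATIVE_MAP. Case is not preserved."""
    out = []
    for ch in word.lower():
        if ch in NEEDS_RULES or ch not in NATIVE_MAP:
            return None
        out.append(NATIVE_MAP[ch])
    return "".join(out)


if __name__ == "__main__":
    for w in ["тұмар", "теңге", "ғалам", "сауда", "вагон"]:
        print(w, classify(w), convert_native(w))
```

Returning `None` instead of guessing keeps the unresolved cases visible, so they can be routed to whatever rule set is eventually derived from the 1930s texts.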

Benefits

Leveraging an established alphabet offers significant advantages over creating a new one from scratch. Key benefits include eliminating the need to develop new keyboard layouts, fonts, and text encodings, which greatly accelerates the adoption of the new alphabet.

Controversies

Using letters like W, X, and V for specific Qazaq phonemes might seem counterintuitive. The main argument from opponents of this approach is that non-Qazaq speakers will be confused. However, the design of the alphabet should prioritize the convenience of Qazaq speakers, not foreign speakers, for several reasons:

Online converter

Getting started with the Keyboard