BetaCode for Arabic


The latest version is on GitHub @ github.com/maximromanov/ArabicBetacode

Minor update to the scheme (2015-03-09:10-21)

This change avoids issues with Alpheios translation alignment, which automatically splits supplied texts into words. Essentially, combinations with “.” and “:” are replaced with “*” and “=” respectively.

  • =t is tāʾ marbūṭaŧ
  • *s is ṣād (and the same for other letters transliterated with dots)
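The replacement described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the function name `modernize` is hypothetical, and the set of letters allowed after “.” is an assumption taken from the combinations listed in this post.

```python
import re

# Assumption: "." counts as a betaCode prefix only when it directly precedes
# one of the letters that this post combines with "." or "*".
_DOT_PREFIX = re.compile(r"\.(?=[hsdtzgknaw])", re.IGNORECASE)
# Assumption: ":" counts as a betaCode prefix only before "t" (tāʾ marbūṭaŧ).
_COLON_PREFIX = re.compile(r":(?=t)", re.IGNORECASE)


def modernize(beta):
    """Rewrite the older "." and ":" prefixes as "*" and "="."""
    return _COLON_PREFIX.sub("=", _DOT_PREFIX.sub("*", beta))


print(modernize("`_amma:tu f_i Ba.gd_ada"))
```

A period followed by a space is left alone, so ordinary sentence punctuation survives. An abbreviation such as “e.g.” would still be rewritten, so a sketch like this should only be applied to betaCode passages.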

Why BetaCode?

Although both Windows and Mac OS now support Arabic, it is still quite difficult to type and edit Arabic texts. It is particularly frustrating to edit and manipulate fully vocalized texts, since most fonts either render “short vowels” (ḥarakāt) invisible, or do not render them properly. Because of the “stacking,” i.e. “short vowels” being placed on top of letters and on top of each other, it becomes impossible to edit texts and one is often forced to go into delete-and-retype mode (and there is still no guarantee, because of visual issues, that all the letters and “short vowels” will actually be in the right order). betaCode can make it easy to type fully-vocalized Arabic texts on any machine through the use of simple character combinations and automatic rendering into various transliteration schemes and the Arabic script (scroll below for examples).

betaCode is first converted into a one-to-one transliteration scheme, which combines conventions from various academic transliteration schemes. Such a scheme is necessary because none of the existing academic schemes (American/Library of Congress, British, French, German, etc.) represents Arabic text unambiguously for computational purposes. The Arabic betaCode transliteration can then be converted into any transliteration convention. At the moment the following schemes are implemented:

  • Library of Congress Romanization of Arabic
  • Simplified transliteration (LOC without diacritics)
  • Arabic script (the rules of hamzaŧ orthography are implemented, but may require some additional testing)

NB: The idea of betaCode is borrowed from the Classicists who developed “a method of representing, using only ASCII characters, characters and formatting found in ancient Greek texts”. The current betaCode is inspired by, and is therefore quite similar to, the ArabTeX scheme. Linguists working with Arabic commonly use Buckwalter transliteration, which is very similar to the current betaCode, but less readable.

betaCode and One-To-One Transliteration

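| betacode | translit | Arabic letter |
|----------|----------|---------------|
| _a | ā | alif |
| b | b | bāʾ |
| t | t | tāʾ |
| _t | ṯ | thāʾ |
| ^g, j | ǧ | jīm |
| *h | ḥ | ḥāʾ |
| _h | ḫ | khāʾ |
| d | d | dāl |
| _d | ḏ | dhāl |
| r | r | rāʾ |
| z | z | zayn |
| s | s | sīn |
| ^s | š | shīn |
| *s | ṣ | ṣād |
| *d | ḍ | ḍād |
| *t | ṭ | ṭāʾ |
| *z | ẓ | ẓāʾ |
| ` | ʿ | ʿayn |
| *g | ġ | ghayn |
| f | f | fāʾ |
| *k, q | ḳ | qāf |
| k | k | kāf |
| l | l | lām |
| m | m | mīm |
| n | n | nūn |
| h | h | hāʾ |
| w | w | wāw |
| _u | ū | wāw |
| y | y | yāʾ |
| _i | ī | yāʾ |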

Non-alphabetic letters

| betacode | translit | Arabic |
|----------|----------|--------|
| ' | ʾ | hamzaŧ |
| /a | á | alif maqṣūraŧ |
| =t | ŧ | tāʾ marbūṭaŧ |

Vowels

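| betacode | translit | Arabic |
|----------|----------|--------|
| ~a | ã | dagger alif |
| u | u | ḍammaŧ |
| i | i | kasraŧ |
| a | a | fatḥaŧ |
| *n | ȵ | n of tanwīn |
| *a | å | silent alif |
| *w | ů | silent wāw |
| ?u | ủ | final ḍammaŧ (optional) |
| ?i | ỉ | final kasraŧ (optional) |
| ?a | ả | final fatḥaŧ (optional) |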
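To make the mapping concrete, here is a minimal sketch of a betaCode to one-to-one transliteration step. It is not the repository's implementation: the names `BETA_TO_TRANSLIT` and `beta_to_translit` are hypothetical, the pairs are taken from the tables above and from the converted examples in this post, and the handling of capitals (upper-casing the transliterated letter when the betaCode letter is a capital) is an assumption inferred from those examples.

```python
import re
import unicodedata

# betaCode -> one-to-one transliteration pairs, as listed in this post.
BETA_TO_TRANSLIT = {
    "_a": "ā", "_t": "ṯ", "^g": "ǧ", "j": "ǧ", "*h": "ḥ", "_h": "ḫ",
    "_d": "ḏ", "^s": "š", "*s": "ṣ", "*d": "ḍ", "*t": "ṭ", "*z": "ẓ",
    "`": "ʿ", "*g": "ġ", "*k": "ḳ", "q": "ḳ", "_u": "ū", "_i": "ī",
    "'": "ʾ", "/a": "á", "=t": "ŧ", "~a": "ã", "*n": "ȵ", "*a": "å",
    "*w": "ů", "?u": "ủ", "?i": "ỉ", "?a": "ả",
}

# Assumption: also accept the older "." and ":" prefixes as aliases.
for key, value in list(BETA_TO_TRANSLIT.items()):
    if key.startswith("*"):
        BETA_TO_TRANSLIT["." + key[1:]] = value
    elif key.startswith("="):
        BETA_TO_TRANSLIT[":" + key[1:]] = value

# Longest keys first, so two-character combinations win over single letters.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(BETA_TO_TRANSLIT, key=len, reverse=True)),
    re.IGNORECASE,
)


def _replace(match):
    token = match.group(0)
    result = BETA_TO_TRANSLIT[token.lower()]
    # A capital betaCode letter yields a capital transliterated letter.
    return result.upper() if token != token.lower() else result


def beta_to_translit(text):
    """Convert a betaCode string into the one-to-one transliteration."""
    return unicodedata.normalize("NFC", _PATTERN.sub(_replace, text))


print(beta_to_translit(".hadda_ta-n_a"))
print(beta_to_translit("^Gabal .K_asiy_un"))
```

Because “j” and “q” are themselves betaCode letters, a sketch like this converts every occurrence of them, including those in surrounding English words, so it is best applied to betaCode spans only.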

Basic principles:

Every Arabic letter is betaCoded with its one-letter equivalent, preceded (if necessary) with a technical character that is similar to a diacritical mark in the transliterated version. Thus, most common symbols are as follows:

General

  • _ (underscore), if a letter can be transliterated with macron/breve below or above (ā, ṯ, ḫ, ḏ, ū, ī)
  • . (period), or * (asterisk), if a letter can be transliterated with dot below or above (ḥ, ṣ, ḍ, ṭ, ẓ, ġ, ḳ)
  • ^ (caret), if a letter can be transliterated with caron (ǧ, š)

Specifics

  • attached prepositions/conjunctions and pronominal suffixes must be separated with “-” (mostly relevant for text alignment, treebanking, and general readability):
    • bi-Llah?i
    • fa-_dahaba
  • add “?” before “optional” final vowels that are usually dropped in transliteration and pronunciation (mostly relevant for transliteration):
    • bi-Llah?i , but not:
    • fa-_dahaba
  • tāʾ marbūṭaŧ: add “+” after tāʾ marbūṭaŧ if it ends the first word of an iḍāfaŧ (mostly relevant for transliteration):
    • `_amma:t+u Ba.gd_ada, but:
    • al-`_amma:tu f_i Ba.gd_ada
  • transliterating tanwīn:
    • .n
      • ?u.n
      • ?i.n
      • ?a.n
  • silent wāw and alif:
    • .w (`Amr?u.n.w, for عَمْرٌو)
    • .a (wa-fa`al_u.a, for وَفَعَلُوا)
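The “-” and “?” markers make these features easy to handle mechanically. The sketch below is an illustration only: the helper names `split_clitics` and `drop_optional_endings` are hypothetical, and dropping the n of tanwīn together with the optional vowel is an assumption about how a simplified rendering might treat it, not a description of what the converter does. Silent letters (.w, .a) are not handled here.

```python
import re

# "?" + short vowel, optionally followed by the n of tanwīn (".n" or "*n").
_OPTIONAL_ENDING = re.compile(r"\?[uia](?:[.*]n)?")


def split_clitics(word):
    """Split a betaCode word into its "-"-separated parts."""
    return word.split("-")


def drop_optional_endings(beta):
    """Remove the endings marked as optional with "?"."""
    return _OPTIONAL_ENDING.sub("", beta)


print(split_clitics("fa-_dahaba"))
print(drop_optional_endings("bi-Llah?i"))
```

The final vowel of fa-_dahaba is not marked with “?”, so it is left untouched, which matches the “but not” case in the list above.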

Running the converter

  • (Python 3.xx must be installed on the machine)
  • clone git repository @ github.com/maximromanov/ArabicBetacode
  • save texts that must be transliterated (i.e., the text is in English, but has some Arabic terms that must be transliterated) into ./to_translit/ (follow the format given in the example file).
  • save texts that must be fully transliterated and/or converted into Arabic script (i.e., the entire text is in Arabic) into ./to_arabic/ (follow the format given in the example file).
  • run the script _generateBetaCode.py (in Mac terminal: python3 _generateBetaCode.py; on Windows: double-click on the script should work).
  • converted texts (in all available modes of conversion) will be appended to the file.
  • if you need to make any changes, edit your initial betaCode text and run the script again; the converted results will be replaced with the updated versions.

Examples

betaCode Example

NB: These are examples of converting betaCode to full transliteration and Arabic script. The very last paragraph showcases conversion of hamzaŧ in different positions. The examples use the “.” and “:” combinations, for which the 2015 update substitutes “*” and “=”.

q_ala 'ab_u Mas`_ud?i.n :: 'an_a qad sami`tu h~a_d_a min ras_ul?i All~ah?i ( .sl`m )

.hadda_ta-n_a `Amr?u.w bn?u R_afi`?i.n , .hadda_ta-n_a `Abd?u All~ah?i bn?u al-Mub_arak?i , `an Mu.hammad?i bn?i 'Is.h_aq?a , `an Mu.hammad?i bn?i ^Ga`far?i.n , `an `Ubayd?i All~ah?i bn?i `Abd?i All~ah?i bn?i `Umar?a , `an 'Ab_i-hi , `an?i al-Nabiyy?i ( .sl`m ) na.hwa-hu

'a_hbara-n_a Qutayba:t?u q_ala , .hadda_ta-n_a Sufy_an?u , `an Ya.hy/a bn?i Sa`_id?i.n , `an 'Ab_i Bakr?i bn?i Mu.hammad?i.n , `an `Umar?a bn?i `Abd?i al-`Az_iz?i , `an 'Ab_i Bakr?i bn?i `Abd?i al-Ra.hm~an?i bn?i al-.H_ari_t?i bn?i Hi^s_am?i.n , `an 'Ab_i Hurayra:t?a mi_tla-hu

Ta.hw_il?u al-hamza:t?i ( kalim_at?u.n mufrada:t?u.n )

'amr?u.n 'uns?u.n 'ins?u.n '_im_an?u.n '_aya:t?u.n '_amana mas'ala:t?u.n sa'ala ra's?u.n qur'_an?u.n ta'_amara _di'b?u.n as'ila:t?u.n q_ari'i-hi su'l?u.n mas'_ul?u.n tak_afu'u-hu su'ila q_ari'i-hi _di'_ab?u.n ra'_is?u.n bu'isa ru'_uf?u.n ra'_uf?u.n su'_al?u.n mu'arri_h?u.n abn_a'a-hu abn_a'u-hu abn_a'i-hi ^say'?a.n _ha.t_i'a:t?u.n .daw'u-hu .d_u'u-hu .daw'a-hu .daw'i-hi mur_u'a:t?u.n 'abn_a'i-hi bar_i'u-hu s_u'ila f_il?u.n f_ann?u.n f_unn?u.n s_a'ala fu'_ad?u.n ^surak_a'u-hu ri'_asa:t?u.n tahni'a:t?u.n daf_a'a:t?u.n .taff_a'a:t?u.n ta'r_i_h?u.n fa'r?u.n ^say'?u.n ^say'?i.n ^say'?a.n .daw'?u.n .daw'?i.n .daw'?a.n juz'?u.n juz'?i.n juz'?a.n mabda'?u.n mabda'?i.n mabda'?a.n naba'a q_ari'?u.n tak_afu'?u.n tak_afu'?i.n tak_afu'?a.n abn_a'u abn_a'i abn_a'a jar_i'?u.n maqr_u'?u.n .daw'?u.n ^say'?u.n juz'?u.n `ulam_a'u al-`ulam_a'i al-`ulam_a'a `Amr?u.n.w wa-fa`al_u.a

betaCode converted into one-to-one translit

ḳāla ʾabū Masʿūdỉȵ :: ʾanā ḳad samiʿtu hãḏā min rasūlỉ Allãhỉ ( ṣlʿm )

ḥaddaṯa-nā ʿAmrủů bnủ Rāfiʿỉȵ , ḥaddaṯa-nā ʿAbdủ Allãhỉ bnủ al-Mubārakỉ , ʿan Muḥammadỉ bnỉ ʾIsḥāḳả , ʿan Muḥammadỉ bnỉ Ǧaʿfarỉȵ , ʿan ʿUbaydỉ Allãhỉ bnỉ ʿAbdỉ Allãhỉ bnỉ ʿUmarả , ʿan ʾAbī-hi , ʿanỉ al-Nabiyyỉ ( ṣlʿm ) naḥwa-hu

ʾaḫbara-nā Ḳutaybaŧủ ḳāla , ḥaddaṯa-nā Sufyānủ , ʿan Yaḥyá bnỉ Saʿīdỉȵ , ʿan ʾAbī Bakrỉ bnỉ Muḥammadỉȵ , ʿan ʿUmarả bnỉ ʿAbdỉ al-ʿAzīzỉ , ʿan ʾAbī Bakrỉ bnỉ ʿAbdỉ al-Raḥmãnỉ bnỉ al-Ḥāriṯỉ bnỉ Hišāmỉȵ , ʿan ʾAbī Hurayraŧả miṯla-hu

Taḥwīlủ al-hamzaŧỉ ( kalimātủȵ mufradaŧủȵ )

ʾamrủȵ ʾunsủȵ ʾinsủȵ ʾīmānủȵ ʾāyaŧủȵ ʾāmana masʾalaŧủȵ saʾala raʾsủȵ ḳurʾānủȵ taʾāmara ḏiʾbủȵ asʾilaŧủȵ ḳāriʾi-hi suʾlủȵ masʾūlủȵ takāfuʾu-hu suʾila ḳāriʾi-hi ḏiʾābủȵ raʾīsủȵ buʾisa ruʾūfủȵ raʾūfủȵ suʾālủȵ muʾarriḫủȵ abnāʾa-hu abnāʾu-hu abnāʾi-hi šayʾảȵ ḫaṭīʾaŧủȵ ḍawʾu-hu ḍūʾu-hu ḍawʾa-hu ḍawʾi-hi murūʾaŧủȵ ʾabnāʾi-hi barīʾu-hu sūʾila fīlủȵ fānnủȵ fūnnủȵ sāʾala fuʾādủȵ šurakāʾu-hu riʾāsaŧủȵ tahniʾaŧủȵ dafāʾaŧủȵ ṭaffāʾaŧủȵ taʾrīḫủȵ faʾrủȵ šayʾủȵ šayʾỉȵ šayʾảȵ ḍawʾủȵ ḍawʾỉȵ ḍawʾảȵ ǧuzʾủȵ ǧuzʾỉȵ ǧuzʾảȵ mabdaʾủȵ mabdaʾỉȵ mabdaʾảȵ nabaʾa ḳāriʾủȵ takāfuʾủȵ takāfuʾỉȵ takāfuʾảȵ abnāʾu abnāʾi abnāʾa ǧarīʾủȵ maḳrūʾủȵ ḍawʾủȵ šayʾủȵ ǧuzʾủȵ ʿulamāʾu al-ʿulamāʾi al-ʿulamāʾa ʿAmrủȵů wa-faʿalūå

betaCode converted into Arabic script

قَالَ أَبُو مَسْعُودٍ :: أَنَا قَدْ سَمِعْتُ هٰذَا مِنْ رَسُولِ الـلّٰـهِ ( صْلْعْمْ )

حَدَّثَنَا عَمْرُو بْنُ رَافِعٍ ، حَدَّثَنَا عَبْدُ الـلّٰـهِ بْنُ الْمُبَارَكِ ، عَنْ مُحَمَّدِ بْنِ إِسْحَاقَ ، عَنْ مُحَمَّدِ بْنِ جَعْفَرٍ ، عَنْ عُبَيْدِ الـلّٰـهِ بْنِ عَبْدِ الـلّٰـهِ بْنِ عُمَرَ ، عَنْ أَبِيهِ ، عَنِ النَّبِيِّ ( صْلْعْمْ ) نَحْوَهُ

أَخْبَرَنَا قُتَيْبَةُ قَالَ ، حَدَّثَنَا سُفْيَانُ ، عَنْ يَحْيٰى بْنِ سَعِيدٍ ، عَنْ أَبِي بَكْرِ بْنِ مُحَمَّدٍ ، عَنْ عُمَرَ بْنِ عَبْدِ الْعَزِيزِ ، عَنْ أَبِي بَكْرِ بْنِ عَبْدِ الرَّحْمٰنِ بْنِ الْحَارِثِ بْنِ هِشَامٍ ، عَنْ أَبِي هُرَيْرَةَ مِثْلَهُ

تَحْوِيلُ الْهَمْزَةِ ( كَلِمَاتٌ مُفْرَدَةٌ )

أَمْرٌ أُنْسٌ إِنْسٌ إِيمَانٌ آيَةٌ آمَنَ مَسْأَلَةٌ سَأَلَ رَأْسٌ قُرْآنٌ تَآمَرَ ذِئْبٌ أَسْئِلَةٌ قَارِئِهِ سُؤْلٌ مَسْؤُولٌ تَكَافُؤُهُ سُئِلَ قَارِئِهِ ذِئَابٌ رَئِيسٌ بُئِسَ رُؤُوفٌ رَؤُوفٌ سُؤَالٌ مُؤَرِّخٌ أَبْنَاءَهُ أَبْناؤُهُ أَبْنائِهِ شَيْئًا خَطِيئَةٌ ضَوْءُهُ ضُوؤُهُ ضَوْءَهُ ضَوْئِهِ مُرُوءَةٌ أَبْنائِهِ بَرِيؤُهُ سُوئِلَ فِيلٌ فَانٌّ فُونٌّ سَاءَلَ فُؤَادٌ شُرَكاؤُهُ رِئَاسَةٌ تَهْنِئَةٌ دَفَاءَةٌ طَفّاءَةٌ تَأْرِيخٌ فَأْرٌ شَيْءٌ شَيْءٍ شَيْئًا ضَوْءٌ ضَوْءٍ ضَوْءًا جُزْءٌ جُزْءٍ جُزْءًا مَبْدَأٌ مَبْدَأٍ مَبْدَأً نَبَأَ قَارِئٌ تَكَافُؤٌ تَكَافُؤٍ تَكَافُؤًا أَبْناءُ أَبْناءِ أَبْناءَ جَريءٌ مَقْروءٌ ضَوْءٌ شَيْءٌ جُزْءٌ عُلَماءُ الْعُلَماءِ الْعُلَماءَ عَمْرٌو وَفَعَلُوا

betaCode into Translit

betaCode in English text

NB: This is an example of an English text with terms, names and toponyms given in betaCode and automatically converted into different transliteration flavors (excerpts are from Brill’s Encyclopaedia of Islam).

Dima^s.k, Dima^s.k al-^S_am or simply al-^S_am , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated … very much at the same latitude as Ba.gd_ad and F_as, at an altitude of nearly 700 metres, on the edge of the desert at the foot of ^Gabal .K_asiy_un.

al-_Dahab_i, ^Sams al-D_in Ab_u `Abd All~ah Mu.hammad b. `U_tm_an b. .K_aym_a.z b. `Abd All~ah al-Turkum_an_i al-F_ari.k_i al-Dima^s.k_i al-^S_afi`_i, an Arab historian and theologian, was born at Damascus or at Mayy_afari.k_in on 1 or 3 Rab_i` II (according to al-Kutub_i, in Rab_i` I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subk_i and al-Suy_u.t_i, in the night of Sunday-Monday on 3 _D_u al-.Ka`da:t 748/4 February 1348, or, according to A.hmad b. `Iy_as, in 753/1352-3. He was buried at the B_ab al-.Sa.g_ir.

betaCode converted into one-to-one translit

Dimašḳ, Dimašḳ al-Šām or simply al-Šām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated … very much at the same latitude as Baġdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Ǧabal Ḳāsiyūn.

al-Ḏahabī, Šams al-Dīn Abū ʿAbd Allãh Muḥammad b. ʿUṯmān b. Ḳāymāẓ b. ʿAbd Allãh al-Turkumānī al-Fāriḳī al-Dimašḳī al-Šāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariḳīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Ḏū al-Ḳaʿdaŧ 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaġīr.

betaCode converted into the Library of Congress scheme

Dimashq, Dimashq al-Shām or simply al-Shām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated … very much at the same latitude as Baghdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qāsiyūn.

al-Dhahabī, Shams al-Dīn Abū ʿAbd Allāh Muḥammad b. ʿUthmān b. Qāymāẓ b. ʿAbd Allāh al-Turkumānī al-Fāriqī al-Dimashqī al-Shāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariqīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Dhū al-Qaʿda 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaghīr.

betaCode converted into a searchable string (diacritics removed)

Dimashq, Dimashq al-Sham or simply al-Sham , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated … very much at the same latitude as Baghdad and Fas, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qasiyun.

al-Dhahabi, Shams al-Din Abu Abd Allah Muhammad b. Uthman b. Qaymaz b. Abd Allah al-Turkumani al-Fariqi al-Dimashqi al-Shafii, an Arab historian and theologian, was born at Damascus or at Mayyafariqin on 1 or 3 Rabi II (according to al-Kutubi, in Rabi I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subki and al-Suyuti, in the night of Sunday-Monday on 3 Dhu al-Qada 748/4 February 1348, or, according to Ahmad b. Iyas, in 753/1352-3. He was buried at the Bab al-Saghir.
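The last two renderings can be derived from the one-to-one transliteration with a small substitution table and Unicode decomposition. The sketch below is an assumption-laden illustration rather than the converter's own code: the names `TRANSLIT_TO_LOC`, `translit_to_loc` and `to_searchable` are hypothetical, and the table contains only the substitutions visible in the examples above (it is not a complete Library of Congress table, and it always drops ŧ, ignoring the “+” rule for iḍāfaŧ).

```python
import unicodedata

# Substitutions observed in the examples above; not a full LOC table.
TRANSLIT_TO_LOC = {
    unicodedata.normalize("NFC", key): value
    for key, value in {
        "ǧ": "j", "š": "sh", "ḳ": "q", "ġ": "gh",
        "ḏ": "dh", "ṯ": "th", "ḫ": "kh", "ã": "ā", "ŧ": "",
    }.items()
}


def translit_to_loc(text):
    """Rewrite one-to-one transliteration with the LOC-style substitutions."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        low = ch.lower()
        if low in TRANSLIT_TO_LOC:
            rep = TRANSLIT_TO_LOC[low]
            # Keep the capital: "Š" becomes "Sh", "Ǧ" becomes "J".
            out.append(rep.capitalize() if ch != low else rep)
        else:
            out.append(ch)
    return "".join(out)


def to_searchable(text):
    """Strip combining diacritics and the ʿayn/hamzaŧ signs."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in "ʿʾ"
    )


print(to_searchable(translit_to_loc("Ḏū al-Ḳaʿdaŧ")))
```

Going through NFD decomposition means any precomposed letter (ā, ḥ, ẓ and so on) loses its diacritic without having to be listed individually.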