Tag Archives: phonemes

Arabic Speech Corpus shared by Dr. Nawar Halabi

If you have been using our Arabic symbols page you will have noticed that we have made every phoneme for our lexical entries available as a sound file, so that you can hear how it is pronounced. You can see the audio links at the bottom of the symbol for ‘respond’ in the picture beside this text. This can help those who have difficulties with literacy skills as well as those who wish to learn Arabic.

Nawar, who has been part of our Tawasol Symbols project from the beginning at the same time as successfully completing  his PhD, has made this possible with the development of an Arabic Speech Corpus with support from the University of Southampton and MicrolinkPC.

The synthesised speech output that results from this corpus is a very natural sounding voice, recorded using Levantine Arabic, as heard in and around Damascus. Levantine Arabic is considered one of the three main Arabic dialects and differs from Gulf Arabic in some aspects of grammar and pronunciation, although when phonemes are read aloud they are often nearer to Modern Standard Arabic, and when they are combined there is less dialectal impact.

The corpus has been made available for download as a zip file and is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. As the Arabic Speech Corpus website says, the package includes:

  • 1813 .wav files containing spoken utterances.
  • 1813 .lab files containing text utterances.
  • 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. These files can be opened using Praat software.
  • phonetic-transcript.txt which has the form “[wav_filename]” “[Phoneme Sequence]” in every line.
  • orthographic-transcript.txt which has the form “[wav_filename]” “[Orthographic Transcript]” in every line. Orthography is in Buckwalter Format which is friendlier where there is software that does not read Arabic script. It can be easily converted back to Arabic.
  • There is an extra 18 minutes of fully annotated corpus (separate from above, but with the same structure as above) which was used to evaluate the corpus (see PhD thesis). Feel free to use this in your applications.
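
For anyone wanting to work with the two transcript files, the following is a minimal Python sketch rather than anything supplied with the corpus. It assumes each line really does hold two fields in straight double quotes, as the description above suggests, and that the orthographic transcript uses the standard Buckwalter table (the corpus may use a variant). The function names are our own.

```python
import re

# Assumption: each transcript line holds two fields in straight double quotes,
# e.g. "some_file.wav" "k a t a b a". Adjust the pattern if the files differ.
LINE_RE = re.compile(r'^\s*"([^"]*)"\s+"([^"]*)"\s*$')

# Standard Buckwalter transliteration table, built from two Unicode ranges.
_BW_LETTERS = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # maps onto U+0621..U+063A
_BW_REST = "_fqklmnhwYyFNKaui~o"            # maps onto U+0640..U+0652
BUCKWALTER = {c: chr(0x0621 + i) for i, c in enumerate(_BW_LETTERS)}
BUCKWALTER.update({c: chr(0x0640 + i) for i, c in enumerate(_BW_REST)})
BUCKWALTER.update({"`": "\u0670", "{": "\u0671"})


def parse_transcript_line(line):
    """Return (wav_filename, content), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def buckwalter_to_arabic(text):
    """Convert Buckwalter text to Arabic script; unknown characters pass through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def load_transcript(path):
    """Read a transcript file into a dict keyed by wav filename."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parsed = parse_transcript_line(line)
            if parsed is not None:
                entries[parsed[0]] = parsed[1]
    return entries
```

Calling `load_transcript("orthographic-transcript.txt")` and passing each value through `buckwalter_to_arabic` would then give the utterances in Arabic script, provided the assumptions above hold for the downloaded files.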

Please contact Nawar Halabi by email for further information.

English and Arabic Phonic Representations to aid Literacy Skills in AAC users

Over the past few weeks we have been trying to understand the importance of the various ways phonemes are represented to support literacy skills in Arabic and English and how best to show them alongside words or multiwords that are added to the dictionary via the Symbol Management system.   We have also discussed the need for recorded speech where synthesised speech or text to speech fails for both MSA and Qatari Arabic.   Research has shown how important phonemic awareness skills are for AAC users who go on to develop literacy skills and it appears that listening to the sounds and seeing the text highlighted helps reading skills as well as finger pointing (Vandervelden & Siegel 1999).

One of the hardest problems in English is how to represent the sounds when the spelling of the words bears little resemblance to the spoken version. Decisions have to be made as to whether one uses a system similar to that offered by the BBC, where sounds are written as a combination of vowels or consonants that represent what is said, such as /th a ng k / y oo/, or stays with the original spelling and just divides the word up into segments or syllables with the various blended or individual sounds, e.g. th a n k | y ou.
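
To make the second option more concrete, here is a small Python sketch of how a word could be divided along its original spelling. It is only an illustration: the list of multi-letter graphemes is a hypothetical, deliberately tiny sample, and a real scheme would follow whichever phonics programme is chosen.

```python
# Hypothetical, deliberately small sample of multi-letter graphemes.
MULTI_LETTER_GRAPHEMES = ("th", "ou", "sh", "ch", "ee")


def segment_word(word, graphemes=MULTI_LETTER_GRAPHEMES):
    """Split a word into graphemes, preferring the longest listed match."""
    ordered = sorted(graphemes, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(word):
        for grapheme in ordered:
            if word.startswith(grapheme, i):
                segments.append(grapheme)
                i += len(grapheme)
                break
        else:
            # No listed grapheme starts here, so keep the single letter.
            segments.append(word[i])
            i += 1
    return segments


def segment_phrase(phrase):
    """Segment every word and join words with ' | ' as in the example above."""
    words = phrase.lower().split()
    return " | ".join(" ".join(segment_word(word)) for word in words)


print(segment_phrase("thank you"))  # th a n k | y ou
```

The sound-based alternative (/th a ng k / y oo/) cannot be produced this way, because it needs a pronunciation for each word rather than just its spelling.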

BBC Phonics kit available from the BBC website

Whilst discussing this matter with Professor Annalu Waller, Rolf Black, Andrea Kirton and Simon Judge at the Communication Matters Conference 2014, it was clear that the presentation should follow the way phonics are being taught in schools by primary school teachers, where AAC users developing literacy skills could work alongside their classmates. In the UK, schemes such as Jolly Phonics are being used, and Andrea Kirton and Simon Judge are working on a phonic screen that might well be developed further to present the sounds with speech output, in a similar way to the Macmillan app developed by Vivid Interactive to provide speech therapists with the phonetic alphabet. It is possible that with the English section of the Arabic Symbol Dictionary we will need to take this further, with clusters and blends being part of the segmentation to aid search and categorisation of words, for example using the listings provided in ‘Spotlight On Spelling: A Structured Guide To The Assessment And Teaching Of Spelling’ and the work of Cootes and Jamieson.

In Arabic some thought is needed as to how phonemes are represented with the various diacritical marks. However, it is felt that by offering all the movements (diacritical marks), the text to speech (TTS) voices on offer will be able to provide acceptable pronunciation for most words, even if they fail on individual phonemes, where there will be a need for human recordings.

Below you will find 16 rows with 28 representations of the Arabic alphabet with possible phonemic variations, which can be read using the Arabic version of ATbar. As the phonemes are used in written Arabic their letter shapes will change, with the shape of each letter altering depending on its position in the word and phrase. Arabic keyboards achieve this automatically! You are seeing all the letter combinations as if they are in their initial position. I should point out that corrections to this table may still need to be made by our Arabic speaking experts, but this is just to show the type of discussions taking place at this stage in the research.

ي و ه ن م ل ك ق ف غ ع ظ ط ض ص ش س ز ر ذ د خ ح ج ث ت ب ا
يَ وَ هَ نَ مَ لَ كَ قَ فَ غَ عَ ظَ طَ ضَ صَ شَ سَ زَ رَ ذَ دَ خَ حَ جَ ثَ تَ بَ اَ
يُ وُ هُ نُ مُ لُ كُ قُ فُ غُ عُ ظُ طُ ضُ صُ شُ سُ زُ رُ ذُ دُ خُ حُ جُ ثُ تُ بُ اُ
يِ وِ هِ نِ مِ لِ كِ قِ فِ غِ عِ ظِ طِ ضِ صِ شِ سِ زِ رِ ذِ دِ خِ حِ جِ ثِ تِ بِ اِ
يّْ وّْ هّْ نّْ مّْ لّْ كّْ قّْ فّْ غّْ عّْ ظّْ طّْ ضّْ صّْ شّْ سّْ زّْ رّْ ذّْ دّْ خّْ حّْ جّْ ثّْ تّْ بّْ اّْ
يَّ وَّ هَّ نَّ مَّ لَّ كَّ قَّ فَّ غَّ عَّ ظَّ طَّ ضَّ صَّ شَّ سَّ زَّ رَّ ذَّ دَّ خَّ حَّ جَّ ثَّ تَّ بَّ اَّ
يُّ وُّ هُّ نُّ مُّ لُّ كُّ قُّ فُّ غُّ عُّ ظُّ طُّ ضُّ صُّ شُّ سُّ زُّ رُّ ذُّ دُّ خُّ حُّ جُّ ثُّ تُّ بُّ اُّ
يِّ وِّ هِّ نِّ مِّ لِّ كِّ قِّ فِّ غِّ عِّ ظِّ طِّ ضِّ صِّ شِّ سِّ زِّ رِّ ذِّ دِّ خِّ حِّ جِّ ثِّ تِّ بِّ اِّ
يَا وَا هَا نَا مَا لَا كَا قَا فَا غَا عَا ظَا طَا ضَا صَا شَا سَا زَا رَا ذَا دَا خَا حَا جَا ثَا تَا بَا آ
يُو وُو هُو نُو مُو لُو كُو قُو فُو غُو عُو ظُو طُو ضُو صُو شُو سُو زُو رُو ذُو دُو خُو حُو جُو ثُو تُو بُو اُو
يِي وِي هِي نِي مِي لِي كِي قِي فِي غِي عِي ظِي طِي ضِي صِي شِي سِي زِي رِي ذِي دِي خِي حِي جِي ثِي تِي بِي إِي
يَّا وَّا هَّا نَّا مَّا لَّا كَّا قَّا فَّا غَّا عَّا ظَّا طَّا ضَّا صَّا شَّا سَّا زَّا رَّا ذَّا دَّا خَّا حَّا جَّا ثَّا تَّا بَّا آ
يُّو وُّو هُّو نُّو مُّو لُّو كُّو قُّو فُّو غُّو عُّو ظُّو طُّو ضُّو صُّو شُّو سُّو زُّو رُّو ذُّو دُّو خُّو حُّو جُّو ثُّو تُّو بُّو اُّو
يِّي وِّي هِّي نِّي مِّي لِّي كِّي قِّي فِّي غِّي عِّي ظِّي طِّي ضِّي صِّي شِّي سِّي زِّي رِّي ذِّي دِّي خِّي حِّي جِّي ثِّي تِّي بِّي اِّي
يَة وَة هَة نَة مَة لَة كَة قَة فَة غَة عَة ظَة طَة ضَة صَة شَة سَة زَة رَة ذَة دَة خَة حَة جَة ثَة تَة بَة اَة
يَّة وَّة هَّة نَّة مَّة لَّة كَّة قَّة فَّة غَّة عَّة ظَّة طَّة ضَّة صَّة شَّة سَّة زَّة رَّة ذَّة دَّة خَّة حَّة جَّة ثَّة تَّة بَّة اَّة
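
A table of this kind can also be generated rather than typed by hand, which may help avoid slips. The Python sketch below builds the regular pattern only: 28 letters combined with 16 mark sequences. It is an illustration under our own assumptions: the shadda is placed before the vowel mark, the cells are listed in alphabetical order (the rows above read from right to left), and special cases such as آ in the alef column are not handled.

```python
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"
ALEF, WAW, YEH, TEH_MARBUTA = "\u0627", "\u0648", "\u064a", "\u0629"

# The 28 letters from alef to yeh, skipping teh marbuta, tatweel and alef maksura.
LETTER_CODE_POINTS = (
    [0x0627, 0x0628]
    + list(range(0x062A, 0x063B))
    + list(range(0x0641, 0x0649))
    + [0x064A]
)
LETTERS = [chr(cp) for cp in LETTER_CODE_POINTS]

# One suffix per row, in the same order as the 16 rows above.
ROW_SUFFIXES = [
    "",
    FATHA,
    DAMMA,
    KASRA,
    SHADDA + SUKUN,
    SHADDA + FATHA,
    SHADDA + DAMMA,
    SHADDA + KASRA,
    FATHA + ALEF,
    DAMMA + WAW,
    KASRA + YEH,
    SHADDA + FATHA + ALEF,
    SHADDA + DAMMA + WAW,
    SHADDA + KASRA + YEH,
    FATHA + TEH_MARBUTA,
    SHADDA + FATHA + TEH_MARBUTA,
]


def build_grid():
    """Return 16 rows of 28 letter-plus-mark combinations."""
    return [[letter + suffix for letter in LETTERS] for suffix in ROW_SUFFIXES]


if __name__ == "__main__":
    for row in build_grid():
        print(" ".join(row))
```

Any cell that our Arabic speaking experts decide should be an exception can then be corrected in one place instead of across every row.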

Tullah has also been carrying out research in this area and has discovered an iPad app called ‘Sawti’, developed by Ghada Alofisan from King Saud University, whose work in this area has won awards and has been presented at ICCHP. This is one of the first apps to offer Arabic AAC support, with symbols and their corresponding words being said by male and female children’s voices. It offers users the chance to practise symbol / word recognition, with free text being read aloud with the synthesised voice. There are some colloquial Arabic words as well as MSA, and the user can choose when to use speech feedback.

sawti ipad app


The only problem we have found is that the voice changes depending on the symbol being read, which can be a little distracting, and the way some words are pronounced was questioned by some Arabic speakers.

Both Arabic and English have such a wide range of pronunciation that we are going to have to agree on some guidelines for the way we work with voices / TTS and the way phonemes are presented.


Alarifi, B., Alrubaian, A., Alofisan, G., Alromi, N., & Al-Wabil, A. (2013). Towards an Arabic Language Augmentative and Alternative Communication Application for Autism. In A. Marcus (Ed.), Proceedings of HCI International 2013: DUXU/HCII 2013, Part II, LNCS 8013, pp. 333-341. Springer, Heidelberg.

Black R, Waller A, Pullin G, Abel E. Introducing the PhonicStick: Preliminary evaluation with seven children. Montreal, Canada: ISAAC; 2008.  http://phonicstick.computing.dundee.ac.uk/publications/

Andrea Kirton, Simon Judge, P. B. (2014). Using Phonemes to Construct Utterances for Aided Communication. ISAAC 2014. doi:10.13140/2.1.3524.4162  http://openconf.faiddsolutions.com/modules/request.php?module=oc_program&action=summary.php&id=142

Trinh, H. (2011). Using a Computer Intervention to Support Phonological Awareness Development of Adults with Severe Speech and Physical Impairments. The 13th International ACM SIGACCESS Conference on Computers and Accessibility, Dundee, UK. Accessed 5th September 2014  http://src.acm.org/2012/HaTrinh.pdf

Trinh, H. (2012). iSCAN: A Phoneme-based Predictive Communication Aid for Nonspeaking Individuals. ASSETS ’12: Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility. Accessed 5th September 2014  http://keithv.com/pub/iscan/iSCAN_Final.pdf

Vandervelden, M., & Siegel, L. (1999). Phonological Processing and Literacy in AAC Users and Students with Motor Speech Impairments. Augmentative and Alternative Communication, 15(September), 191–211.  Accessed 5th September 2014  http://informahealthcare.com/doi/abs/10.1080/07434619912331278725 



Phonemic awareness for literacy skills – Possible additional feature for each lexical entry

Whilst we analyse the voting for the symbols and check the results against the core English vocabulary lists provided by Tullah from the various Doha groups of AAC users we have been investigating how we could tag and store the symbols and their matching lexicons in English and Arabic.

Meetings with Professor Annalu Waller in Dundee and Simon Judge in Sheffield confirmed suspicions that if we wanted to ensure that the dictionary not only coped with the communication side of symbol use but also encouraged literacy skills, there needed to be links to the way words are made up of phonemes. Research has shown that phonemic awareness can be used as a predictor for reading ability (Gillon, 2004).

Teaching phonics has become a hot topic in UK Primary schools to the extent that even the newspapers have come up with lists of resources to aid teachers and parents.  An example is the Guardian article in 2013 “How to teach … phonics”

But the dilemma is how we make a dictionary of words / multiwords linking to symbols that can also be searched by phonemes in English and Arabic for those who can often only listen to the sounds or may find it hard to say them or recognise their significance.  Clearly this aspect of the dictionary is only for certain groups of symbol users who may have the ability to learn to read and write.
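
One possible way to think about this is an index that maps each phoneme to the words containing it. The Python sketch below is purely illustrative: the class name, the method names and the sample phoneme spellings are hypothetical, and nothing here reflects how the Symbol Management system actually stores its data.

```python
from collections import defaultdict


class PhonemeIndex:
    """Toy index from phonemes to the dictionary words that contain them."""

    def __init__(self):
        self._words_by_phoneme = defaultdict(set)
        self._phonemes_by_word = {}

    def add(self, word, phonemes):
        """Store a word together with its ordered phoneme sequence."""
        self._phonemes_by_word[word] = tuple(phonemes)
        for phoneme in phonemes:
            self._words_by_phoneme[phoneme].add(word)

    def containing(self, phoneme):
        """All words that include the phoneme anywhere."""
        return sorted(self._words_by_phoneme.get(phoneme, ()))

    def starting_with(self, phoneme):
        """All words whose first phoneme matches."""
        return sorted(
            word
            for word, phonemes in self._phonemes_by_word.items()
            if phonemes and phonemes[0] == phoneme
        )


# Hypothetical sample entries with informal phoneme labels.
index = PhonemeIndex()
index.add("thank", ["th", "a", "ng", "k"])
index.add("you", ["y", "oo"])
index.add("key", ["k", "ee"])

print(index.containing("k"))     # ['key', 'thank']
print(index.starting_with("k"))  # ['key']
```

The same structure would work for Arabic entries if the phoneme labels were Arabic letters with their diacritics, and each word could additionally point to its symbol.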

Synthetic phonics

‘geck’ and ‘chom’ from Year 1 Phonics Screening (Five things about phonics By Kathryn Westcott BBC News, Magazine, 2012)

‘Synthetic phonics’ is often used with ‘made-up’ words to encourage grapheme / phoneme recognition. However, as can be seen from the example, even those words introduced at primary school level may have letter combinations that can be pronounced in different ways. For instance, in ‘geck’ the letter ‘g’ can be said with a hard sound or a soft sound in front of an ‘e’ (/g/ or /dʒ/).


According to teaching documents provided by the Education Department at Oxford Brookes University: “In English, much more than in other languages,

  • many letters or letter-combinations can commonly represent more than one sound – for example ea as in heat and head;
  • most sounds can be spelt in more than one way – for example the vowel sound in heat is also commonly spelt as in he, see, chief and complete;
  • Some very common words contain grapheme-phoneme correspondences that occur in few if any other words – for example one, two, are, said, great, people, laugh.” (TDA, 2011)
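
The first two points in the quote describe a many-to-many relationship between spellings and sounds. A small Python sketch using only the examples quoted above shows what that looks like as data. The sound labels "long-e" and "short-e" are our own informal placeholders, not part of any phonics scheme.

```python
from collections import defaultdict

# (grapheme, informal sound label, example word), taken from the quote above.
# The labels "long-e" and "short-e" are placeholders chosen for this sketch.
CORRESPONDENCES = [
    ("ea", "long-e", "heat"),
    ("ea", "short-e", "head"),
    ("e", "long-e", "he"),
    ("ee", "long-e", "see"),
    ("ie", "long-e", "chief"),
    ("e-e", "long-e", "complete"),
]


def build_maps(rows):
    """Return (sounds per grapheme, spellings per sound) as two dicts of sets."""
    sounds = defaultdict(set)
    spellings = defaultdict(set)
    for grapheme, sound, _example in rows:
        sounds[grapheme].add(sound)
        spellings[sound].add(grapheme)
    return dict(sounds), dict(spellings)


SOUNDS_FOR, SPELLINGS_FOR = build_maps(CORRESPONDENCES)

print(sorted(SOUNDS_FOR["ea"]))         # ['long-e', 'short-e']
print(sorted(SPELLINGS_FOR["long-e"]))  # ['e', 'e-e', 'ea', 'ee', 'ie']
```

Both lookups return more than one answer, which is exactly what makes searching an English dictionary by sound harder than searching it by spelling.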

Children tend to learn the basic 44 phonemes in English which are represented below.

44 phonemes

The Phoneme Table below lists the 44 phonemes of the English Language, from the National Strategies Standards’ phonics sounds. (DFES 2007)

The good news is that in Arabic every letter combination has a set rule and although there may be many sounds that cannot be replicated in English the grapheme/phoneme representation is stable.  The image below shows where there are overlaps between Arabic and English phonemes.


Research carried out by Amor and Maad (2013) working with Arabic speaking Tunisian children has shown that: (direct quote from their article)

The best performances of Good Readers confirm the idea supported by many researchers (e.g. Byrne et al., 1992; Gombert, 1992) that reading failure may manifest itself through a lack of phonemic awareness. The lowest results of the Preliterate children suggest that phonemic awareness does not develop spontaneously, but only in the specific context of learning to read an alphabetic script at school. This phenomenon was observed in many alphabetically written languages, such as  English, French (Gillon, 2004; Morais et al., 1987), and Hebrew (Bentin et al., 1991; Oren, 2001).

The researchers found there were specific difficulties with phonemic segmentation for Arabic words across the cohort of “110 Tunisian children enrolled in primary education schools and kindergartens”. All their participants scored less well than expected in comparison with results from research carried out in other languages. This was felt to be due to the diglossic nature of Arabic, with colloquial Arabic often being spoken at home whereas Modern Standard Arabic is used in schools. Nevertheless, the children coped better with consonants in deletion tasks, for example the /k/t/b/ that make up many words around reading and writing, e.g. /kataba/ (he wrote), /kutiba/ (it was written), /kutubun/ (books), etc.
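
The shared /k/t/b/ consonants in these words can be shown with a few lines of Python. This is a simplified sketch working on the romanised forms quoted above, and it assumes that only the short vowels a, i and u need to be removed. Long vowels, glides and Arabic script are not handled.

```python
SHORT_VOWELS = set("aiu")  # assumption: romanised short vowels only


def consonant_skeleton(word):
    """Drop the short vowels from a romanised word."""
    return "".join(ch for ch in word if ch not in SHORT_VOWELS)


def has_root(word, root):
    """True if the root consonants occur in order within the word."""
    remaining = iter(consonant_skeleton(word))
    return all(consonant in remaining for consonant in root)


for word in ("kataba", "kutiba", "kutubun"):
    print(word, consonant_skeleton(word), has_root(word, "ktb"))
```

Note that /kutubun/ keeps a final n in its skeleton because of the case ending, which is why the check looks for the root consonants in order rather than for an exact match.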

Those learning to read in English tend to find it easier to mark initial and final phonemes as individual sounds with the medial one proving harder to work out. Amor and Maad (2013) found that their participants had difficulties across all three positions. “Divergence between the performances of Arabic-speaking and English-speaking children confirms that the representations about the consonantal segments were not the same.”


It appears that phoneme / grapheme awareness is harder to gain in Arabic than in some other languages, and that the total number of consonant segments in Arabic is higher than in most languages but the number of vowel-based segments is lower (Newman, 2002).

One idea when creating the lexical entry in the symbol dictionary is to include the representative phonemes with their diacritics,  their change in shape (depending on the position in the word) and the sound they make using text to speech or recorded speech. 
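
As a rough illustration of what such an entry might hold, the Python sketch below stores a phoneme with its diacritic, its positional shapes and a link to an audio file. The class, field names and file path are hypothetical. The positional shapes are looked up from the Unicode Arabic presentation forms by character name, which is a convenience for display only: in normal text the rendering software chooses the shape automatically.

```python
import unicodedata
from dataclasses import dataclass, field

POSITIONS = ("ISOLATED", "INITIAL", "MEDIAL", "FINAL")


def positional_forms(letter):
    """Map position names to the presentation-form characters that exist."""
    base_name = unicodedata.name(letter)
    forms = {}
    for position in POSITIONS:
        try:
            forms[position.lower()] = unicodedata.lookup(f"{base_name} {position} FORM")
        except KeyError:
            # Some letters, such as alef, have no initial or medial form.
            pass
    return forms


@dataclass
class PhonemeEntry:
    grapheme: str    # base letter followed by any diacritics
    audio_file: str  # hypothetical path to recorded or synthesised speech
    forms: dict = field(default_factory=dict)


def make_entry(grapheme, audio_file):
    """Build an entry, deriving the shapes from the base letter."""
    return PhonemeEntry(grapheme, audio_file, positional_forms(grapheme[0]))


# Beh with a fatha; the audio path is a made-up placeholder.
entry = make_entry("\u0628\u064e", "audio/ba.wav")
print(entry.grapheme, entry.forms)
```

For a letter such as alef the `forms` dictionary simply has fewer keys, so anything displaying the entry should not assume that all four shapes are present.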


Abu-Rabia, S. (2001). The role of vowels in reading Semitic scripts: Data from Arabic and Hebrew. Reading and Writing: An Interdisciplinary Journal, 14, 39-59.

Amayreh, M. M., & Dyson, A. T. (1998). The Acquisition of Arabic Consonants. Journal of Speech, Language and Hearing Research, 41, 642

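Amor, P. D. M., & Maad, R. Ben. (2013). The Role of Arabic Orthographic Literacy in the Phonological Awareness of Tunisian Children. International Journal on New Trends in Education and Their Implications, April 2013, Volume 4, Issue 2, Article 02, ISSN 1309-6249. Retrieved from http://www.ijonte.org/FileUpload/ks63207/File/02.amor.pdf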

Bentin, S., Hammer, R. & Cahan, C. (1991). The effects of ageing and first grade schooling on the development of phonological awareness. Psychological Science, 2(4), 271-274.

Byrne, B., Freebody, P. & Gates, A. (1992). Longitudinal data on the relation of word-reading strategies to comprehension, reading time and phonemic awareness. Reading Research Quarterly, 27, 141-151.

Department for Education and Skills (2007) Letters and Sounds: Principles and Practice of High Quality Phonics – Primary National Strategy (PDF downloaded July 30th, 2014)

Gillon, G. (2004). Phonological awareness: From research to practice. New York: Guilford Press.

Gombert, J.E. (1992). Metalinguistic development. Chicago: University of Chicago Press.

Kurtz, R. (2010). Phonemic awareness affects speech and literacy. Speech-Language-Development. Retrieved from http://www.speech-language-development.com/phonemic-awareness.html

Khomsi, A. (1993). L’epreuve collective d’identification de mots. Nantes : University of Nantes.

Liberman, I., Shankweiler, D., Fisher, F. & Carter, B. (1974). Explicit syllable and phoneme segmentation in the young child, Journal of Experimental Child Psychology, 18, 201-212.

MacDonald, G. & Cornwall, A. (1995). The relationship between phonological awareness and reading and spelling achievement eleven years later. Journal of Learning Disabilities, 28(8) 523-527.

Morais, J., Alegria, J., & Content, A. (1987). The relationship between segmental analysis and alphabetic literacy: An interactive view. Cahiers de Psychologie Cognitive, 7, 415-438.

Newman, Daniel L. 2002. The phonetic status of Arabic within the world’s languages. Antwerp Papers in Linguistics, 100:63–75 http://uahost.uantwerpen.be/apil/apil100/Arabic1.pdf

Saiegh-Haddad, E. (2005). Correlates of reading fluency in Arabic: Diglossic and orthographic factors. Reading and Writing: An Interdisciplinary Journal, 18, 559-582.

The Training and Development Agency for Schools (TDA) (2011) Systematic synthetic phonics in initial teaching training: Guidance and support materials. (Word document downloaded July 30th, 2014)

Vandervelden, M., & Siegel, L. (1995). Phonological recoding and phonemic awareness in early literacy: Developmental approach. Reading Research Quarterly, 30(4), 854-875.

Ziegler, J. & Goswami, U. (2005). Reading acquisition, developmental dyslexia and skilled reading across languages. Psychological Bulletin, 131, 3–29.