Spell checking and the Arabic script

The Arabic script is cursive, and we have been exploring the difficulties this causes for accurate online spell checking. Fadwa Mohamad has kindly shared her knowledge about some of the issues that arise for people with dyslexia because of the way Arabic characters are linked. Arabic has 28 letters to represent 34 phonemes, and we have already discussed the issues of vowels and diacritics. Now we have learnt of another thorny problem: only 22 of the 28 letters have two-way connectors. The 6 remaining letters can only be joined in one way (to the letter before them, not to the letter after), so a single Arabic word can contain one or more gaps that look like spaces. A word that uses any of these 6 letters may therefore appear to be divided in several places.
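To make the one-way joining concrete, here is a minimal Python sketch that splits a word into the visually connected pieces a reader would see. It assumes the six one-way letters are the basic alif, dal, dhal, ra, zay and waw, and it deliberately ignores hamza-carrying variants (such as أ or ؤ) and the stand-alone hamza; the function name `connected_pieces` is purely illustrative.

```python
# Illustrative sketch: split an Arabic word into its visually connected pieces.
# Assumption: only the six basic one-way letters below are handled; hamza
# variants and other non-joining characters are ignored to keep this short.
NON_CONNECTORS = set("\u0627\u062F\u0630\u0631\u0632\u0648")  # alif, dal, dhal, ra, zay, waw


def connected_pieces(word: str) -> list[str]:
    """Return the runs of letters that are drawn joined together."""
    pieces = []
    current = ""
    for ch in word:
        current += ch
        # A one-way letter joins to the letter before it but not to the
        # letter after it, so the visible piece ends here.
        if ch in NON_CONNECTORS:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


# madrasa ("school"): mim, dal, ra, sin, ta marbuta
print(connected_pieces("\u0645\u062F\u0631\u0633\u0629"))
```

The word مدرسة (madrasa, "school") comes back as three pieces, because dal and ra each break the joining. In the stored text there is no space character at all; the gaps are produced when the font draws the letters. A writer who sees those gaps may be tempted to type real spaces, though, and real spaces would split the word in two for a spell checker.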

The other problem of note is that Arabic does not use capital letters, so once again it may not be easy to see or work out where word boundaries occur. This, along with the irregular spacing, can cause difficulties for some readers, and it may also be one reason why a spell checker can appear to gobble letters when it tries to correct a word!

To add to these issues, the definite article, which in English is the separate word ‘the’, is joined to the following word in Arabic (Arabic has no separate word for ‘a’ or ‘an’). Those who can read Arabic will recognise the letters ‘AL’, or “Arabic: الـ, also transliterated as ul- and in some cases il- and el-” according to Wikipedia. In some cases the reader also has to work out whether the ‘L’ of ‘AL’ is voiced or silent (it is assimilated into the following consonant before the so-called sun letters). This affects text-to-speech engines, and the lack of a space between the article and its word can affect spell checking.
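The two effects of the attached article can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real spell checker or speech engine works: `DICTIONARY` is a hypothetical one-word list, only the bare prefix "al-" is handled (not combinations with prepositions such as "wa-al-"), and `is_known` and `lam_is_silent` are names invented for this example.

```python
# Illustrative sketch of the joined definite article "al-".
# Assumptions: a hypothetical one-word dictionary, and only the bare
# prefix alif + lam is recognised.
ARTICLE = "\u0627\u0644"  # alif + lam

DICTIONARY = {"\u0643\u062A\u0627\u0628"}  # kitab, "book" (hypothetical word list)

# The 14 sun letters: before these the lam of "al-" is assimilated (silent).
SUN_LETTERS = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633"
    "\u0634\u0635\u0636\u0637\u0638\u0644\u0646"
)


def is_known(word: str) -> bool:
    """Accept a word if it, or the word with the article removed, is listed."""
    if word in DICTIONARY:
        return True
    # The article is written joined to its noun, so strip it and retry.
    if word.startswith(ARTICLE) and len(word) > len(ARTICLE):
        return word[len(ARTICLE):] in DICTIONARY
    return False


def lam_is_silent(word: str) -> bool:
    """True when a word starting with "al-" has an assimilated (silent) lam."""
    if not word.startswith(ARTICLE) or len(word) <= len(ARTICLE):
        return False
    return word[len(ARTICLE)] in SUN_LETTERS


print(is_known("\u0627\u0644\u0643\u062A\u0627\u0628"))        # al-kitab, "the book"
print(lam_is_silent("\u0627\u0644\u0634\u0645\u0633"))        # ash-shams, "the sun"
print(lam_is_silent("\u0627\u0644\u0642\u0645\u0631"))        # al-qamar, "the moon"
```

A checker that looks words up only as whole tokens would reject الكتاب ("the book") even though كتاب ("book") is in its list. Stripping the prefix first avoids that. Real text is harder, because some words genuinely begin with alif and lam, so blind stripping can also produce false matches.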

Finally, Arabic letters may take different shapes depending on their position in the word. A letter may change from its isolated form to a different shape when it appears at the start of a word, in the middle, or at the end! This is how arabic-course.com describes the issue.

Arabic letter changes depending on the position in a word
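In digital text the position-dependent shapes are usually chosen by the font when the word is displayed. The stored letter stays the same wherever it appears. Unicode also has separate "presentation form" characters for each shape, and text copied from some sources (older PDFs, for example) may contain these instead. The Python sketch below uses the standard `unicodedata` module to fetch the four shapes of the letter beh and show that NFKC normalisation maps each one back to the single underlying letter. Normalising text like this before a spell-checking comparison is offered here as one reasonable precaution, not as what any particular tool does. The helper name `shape_of` is illustrative.

```python
import unicodedata

# The four contextual shapes Unicode encodes in Arabic Presentation Forms-B.
FORMS = ["ISOLATED", "INITIAL", "MEDIAL", "FINAL"]


def shape_of(letter_name: str, form: str) -> str:
    """Look up a presentation-form character by its Unicode name."""
    return unicodedata.lookup(f"ARABIC LETTER {letter_name} {form} FORM")


for form in FORMS:
    ch = shape_of("BEH", form)
    # NFKC folds each compatibility shape back to U+0628 ARABIC LETTER BEH.
    base = unicodedata.normalize("NFKC", ch)
    print(f"{form:8} U+{ord(ch):04X} -> U+{ord(base):04X} {unicodedata.name(base)}")
```

Four different-looking characters collapse to one code point, which is the form a spell checker's word list would normally store.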

The work to discover how we can overcome letter-gobbling spell checkers and mispronouncing speech synthesis continues!