Over the summer the team have been investigating the issues around TTS in Arabic, and Edrees Abdu Alkinani has completed his MSc report, which makes interesting reading as it summarises many of the findings. It was noted that Arabic speech synthesis did not have the early successes of European languages because of limitations in Natural Language Processing (NLP) and the complexities of using diacritics as substitutes for vowel combinations. However, with advances in NLP and Digital Signal Processing (DSP), plus the development of automatic diacritizers, progress has been made in the commercial world, where there are now several attractive Arabic synthesised voices, as will be seen in an evaluation to follow.
Issue No 1 – Lack of diacritics on web pages.
English speakers may wonder why Arabic TTS is difficult, but it takes no more than a cursory glance at the written language to see where the problems may arise: there are 14 different diacritic marks and 34 phonemes, 28 of which are consonants and only six of which are vowels, so the combinations are easy to confuse. As Edrees pointed out, “كُتُبْ” means books and “كَتَبَ” means wrote; the only difference you will notice is the type of marks used above the letters.
Compare this with English, which has 12 basic vowel sounds and no accents or diacritics. We may complain about our odd pronunciation of some written words (rough, cough, though, thorough and through), but at least some of the letters are different and we cannot leave any out. Yet this is what is happening with written Arabic on the web: the diacritics are being left out. That is the number one problem for a text to speech engine.
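To make the problem concrete, here is a minimal Python sketch of what a TTS engine is faced with when the marks are missing. It is only an illustration, not part of any of the engines discussed here. The function names `strip_diacritics` and `diacritic_ratio` are hypothetical helpers made up for this example, and the sketch assumes that removing every Unicode combining mark (category `Mn`) is an acceptable stand-in for "text typed without diacritics".

```python
import unicodedata

# The two words from the example above, spelled out as code points
# so the marks cannot be lost when copying the text.
BOOKS = "\u0643\u064F\u062A\u064F\u0628\u0652"  # kaf+damma, ta+damma, ba+sukun: "books"
WROTE = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha: "wrote"


def strip_diacritics(text: str) -> str:
    """Remove combining marks (category Mn), which include the Arabic vowel marks.

    Assumption: this also removes non-Arabic combining marks, which is
    fine for an illustration but may be too broad for real use.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def diacritic_ratio(text: str) -> float:
    """Hypothetical measure: combining marks per letter (0.0 means no marks at all)."""
    marks = sum(1 for ch in text if unicodedata.category(ch) == "Mn")
    letters = sum(1 for ch in text if unicodedata.category(ch).startswith("L"))
    return marks / letters if letters else 0.0


if __name__ == "__main__":
    print(BOOKS == WROTE)                                      # False: different words
    print(strip_diacritics(BOOKS) == strip_diacritics(WROTE))  # True: now identical
    print(diacritic_ratio(BOOKS))                              # 1.0: every letter is marked
    print(diacritic_ratio(strip_diacritics(BOOKS)))            # 0.0: typical web text
```

Once the marks are gone both words collapse to the same three consonants, so an engine has to guess the vowels from context. A ratio like the one above could be one rough way of checking how much of a web page is written with diacritics before sending it to a TTS engine.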
Issue No 2 – The differences between the way the TTS is developed and the resulting output.
Research has shown that although there are now a few Arabic text to speech engines, they are commercial and even these vary in quality. The MBROLA project links to work carried out in the open source world, but at present it has not been possible to get the code offered in the various repositories working for evaluation purposes. However, Edrees has supplied the team with these comments based on the demonstrators offered by the various organisations and companies.
- MBROLA project
MBROLA has two Arabic voices, available as recorded audio files. The speed of speech is slow and the quality poor. Moreover, the pronunciation is hard to understand, even for an Arabic speaker. The stress pattern is often incorrect and the distinction between words unclear. The most difficult words to understand have letters like “أ” ‘A’, “ض” ‘th’ and “ل” ‘L’.
- Acapela Group
Acapela offers two good quality voices, one male and one female. The pronunciation of words with and without diacritic marks is understandable, with accurate stress patterns. There are three letters which appear to cause some difficulty: “ج” ‘j’, “ا” ‘a’ and “ك” ‘k’. The pronunciation of numbers in all situations is good.
- Nuance Vocalizer
Nuance provide a very clear male voice with clear pronunciation. The only problem is that the system produces speech without taking diacritics into account. Words which have letters like “ق” ‘q’, “ش” ‘sh’ and “ض” ‘th’ may cause problems, but the speed of speech used in the online demo is good. Numbers are not clearly enunciated due to the lack of diacritics.
- Loquendo
Loquendo offer a recording of a male and a female voice on their site, as the Arabic voice has only been available since October 2010. The system has good sound quality and clear speech. The example on the website has diacritic marks, but as it is a small sample it is hard to judge the overall quality, although it appears to be good.
Issue No 3 – Further Development of eSpeak with Arabic.
The current version of MBROLA does not appear to run with the Arabic voice files, and there seem to be very few people who have had success with them. So this is work in progress…