Over the summer the team has been investigating the issues around TTS in Arabic, and Edrees Abdu Alkinani has completed his MSc report, which makes interesting reading as it summarises many of the findings. It was noted that Arabic TTS did not have the early successes of European languages because of limitations in Natural Language Processing (NLP) and the complexities of using diacritics as substitutes for vowel combinations. However, with advances in NLP and Digital Signal Processing (DSP), plus automatic diacritizers, progress has been made in the commercial world, where there are now several attractive Arabic synthesised voices, as will be seen in an evaluation to follow.
Issue No 1 – Lack of diacritics on web pages.
The Learning Resource - Arabic language
English speakers may wonder why Arabic TTS is so difficult, but it takes no more than a cursory glance at the written language to see the problem: there are 14 different diacritic marks and 34 phonemes, 28 of which are consonants and only six of which are vowels, and the combinations may cause TTS problems. As Eedris pointed out, “كُتُبْ” means books and “كَتَبَ” means wrote; the only difference you will notice is the type of marks used above the letters.
TEFL world wiki - English vowel sounds
Compare this with English, which has 12 basic vowel sounds and no accents or diacritics. We may complain about the odd pronunciation of some written words (rough, cough, though, thorough and through), but at least some of the letters are different and we cannot leave any out. Yet this is what is happening with written Arabic on the web: the diacritics are being left out, which is the number one problem for a text to speech engine.
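To make the problem concrete, here is a minimal Python sketch using the two words quoted above. It is only an illustration, not part of any TTS engine discussed here: the function names are made up for this example, and it treats every Unicode combining mark (category Mn) as a diacritic, which covers the Arabic short-vowel marks.

```python
import unicodedata

# The two example words, written as explicit code points so the marks are
# visible: kaf, teh, beh with damma/sukun (books) or fatha (wrote).
BOOKS = "\u0643\u064F\u062A\u064F\u0628\u0652"
WROTE = "\u0643\u064E\u062A\u064E\u0628\u064E"


def strip_diacritics(text: str) -> str:
    """Drop combining marks (category Mn), as undiacritized web text does."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def has_diacritics(text: str) -> bool:
    """Report whether the text carries any combining marks at all."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


print(BOOKS == WROTE)                                      # False
print(strip_diacritics(BOOKS) == strip_diacritics(WROTE))  # True
print(has_diacritics(strip_diacritics(BOOKS)))             # False
```

Once the marks are gone, both words are the same three letters, so a text to speech engine has to guess the vowels from context.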
Issue No 2 – The differences between the way the TTS is developed and the resulting output.
Research has shown that although there are now a few Arabic text to speech engines, they are commercial, and even these vary in quality. The MBROLA project links to work carried out in the open source world, but so far it has been impossible to get the code offered in the various repositories working for evaluation purposes. However, Eedris has supplied the team with these comments based on the demonstrators offered by the various organisations and companies.
- MBROLA project
MBROLA has two Arabic voices, each offered as a recorded audio file. The speed of speech is slow and the quality poor. Moreover, the pronunciation is hard to understand, even for an Arabic speaker. The stress pattern is often incorrect and the distinction between words unclear. The most difficult words to understand have letters like “أ” ‘A’, “ض” ‘th’ and “ل” ‘L’.
- Acapela Group
Acapela offers two good quality male and female voices. The pronunciation for words with and without diacritic marks is understandable, with accurate stress patterns. There are three letters which appear to cause some difficulty: “ج” ‘j’, “ا” ‘a’ and “ك” ‘k’. The pronunciation of numbers in all situations is good.
- Nuance Vocalizer
Nuance provide a very clear male voice with clear pronunciation. The only problem is that the system produces speech without taking into account diacritics. Words which have letters like “ ق” ‘q’, “ ش” ‘sh’, and “ ض” ‘th’ may cause problems but the speed of speech used in the online demo is good. Numbers are not clearly enunciated due to the lack of diacritics.
- Loquendo
Loquendo offer a recording of a male and a female voice on their site, as the Arabic voice has only been available since October 2010. The system has good sound quality and clear speech. The example on the website has diacritic marks, but as it is a small sample it is hard to judge the overall quality, although it appears to be good.
Issue No 3 – Further Development of eSpeak with Arabic.
The current version of MBROLA does not appear to run with the Arabic voice files and there seem to be very few people who have had success. So this is work in progress…
There are several spell checkers available as open source applications and much has been said about the quality of their output in English but there appears to be very little research when linked to the Arabic language. However, Hunspell is used with many word processing packages.
Seb has succeeded in getting it to work with ATbar version 2, which means that the Kit version is now almost in beta and there are the beginnings of an Arabic spell checker.
Internal Alpha testing of the spell checker.
ATbar is now available as a WordPress Plugin. It is simple to install and allows the website owner to select whether or not the toolbar is persistent (active on all pages).
Future versions look to include the option to exclude the toolbar on specific pages as well as a widget (sidebar) option for greater flexibility.
Download ATbar for WordPress
I have just read an extremely interesting report by Mashael that looks into the issues around creating an Arabic speech recognition module for the ATbar and ATKit. The report has a very useful analysis about the tools available and some important considerations which we will cover in more detail in the future.
Mashael collected data from 41 Arabic speaking post-graduate, under-graduate and secondary school students. In brief, the results showed that this group of users tended to browse for text (44%) and multimedia content (42%), with only 14% playing games, shopping, using social networks and so on. Most (90%) did not seem to know about or use offline services. The conclusion picked this up as a useful way of working with the toolbar when offline, one that should be considered in a similar way to the Silverlight approach: saving useful dictation results or working with forms at a later date.
Speech recognition command and control was not always felt to be useful, and the group surveyed did not specify a need due to a disability. In fact, 80% said they were happy to use the mouse and keyboard for browser control. However, 35 of the users said they would use speech recognition for language learning, 20 selected translation, 16 school work, 15 web activities and 10 work based reports. High accuracy rates were required (90%) with the use of diacritics, despite the fact that these can cause problems for those with visual impairment and for the elderly. 61% felt that it would be useful to save dictated data for re-use.
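Since the results mix counts and percentages, the counts for the intended uses can be put on the same footing with a little arithmetic. This is a rough sketch that assumes all 41 respondents answered the question and could choose more than one use; the report itself may have calculated things differently.

```python
# Counts quoted in the text. The denominator of 41 is an assumption:
# it is the total number of students surveyed.
TOTAL_RESPONDENTS = 41
uses = {
    "language learning": 35,
    "translation": 20,
    "school work": 16,
    "web activities": 15,
    "work based reports": 10,
}


def as_percentages(counts, total):
    """Express each count as a whole-number percentage of the total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


percentages = as_percentages(uses, TOTAL_RESPONDENTS)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

On that assumption, language learning comes out at about 85% of respondents and translation at about 49%.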
Other research that Seb found showed that only 1% of websites are available in Arabic. Mashael found that 44% of her participants wanted to be able to use both English and Arabic for data entry, and over half (59%) wanted to have text to speech to read back content. They appeared to require accuracy over a large vocabulary in terms of speech dictation and its use on the web.
Although several users of the prototype ATbar shown by Mashael in the video below wanted extra features, most were happy with the basic version and were content with the design and core functionality. Mashael highlighted the usefulness of the kit approach with the introduction of a Braille API and the need for a flexible approach to language support.
The ATbar website is at the stage where the framework is complete and the only thing missing is the content on a few minor pages. There are four sections to the website:
— Kit subsite
— API subsite
— Arabic version site
The Kit subsite is to hold the marketplace for additional modules for the toolbar, but until we have decided how that will be implemented we can’t start making the interface. Currently there is a placeholder with a “coming soon…” message.
The API subsite contains a Wiki for developers which will most likely list the code or have links to GitHub where it is currently published.
The Arabic site is the same as the English one except for everything reading from right to left. We will need someone to translate the ATbar.org pages into Arabic, but once that is done it will be finished.
The ATbar installer is available from the site now. Simply navigate to the download page and the automatic installer will pick the appropriate installation files depending on what browser you are using. Currently only Chrome and Safari are supported but we hope to include Firefox and IE in the future.
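For anyone curious how this kind of browser-based selection can work, here is a simplified sketch. It is not the code used on the ATbar site: the function name is hypothetical, and real user agent strings have more edge cases than this handles (for example, other Chromium-based browsers also report "Chrome").

```python
from typing import Optional


def supported_browser(user_agent: str) -> Optional[str]:
    """Return 'chrome' or 'safari' for a supported browser, otherwise None.

    Hypothetical helper for illustration only.
    """
    ua = user_agent.lower()
    # Chrome's user agent string also contains "Safari", so check Chrome first.
    if "chrome" in ua:
        return "chrome"
    if "safari" in ua:
        return "safari"
    return None  # e.g. Firefox or IE, which are not supported yet


chrome_ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 "
             "(KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1")
safari_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 "
             "(KHTML, like Gecko) Version/5.1.1 Safari/534.51.22")
firefox_ua = "Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

for ua in (chrome_ua, safari_ua, firefox_ua):
    print(supported_browser(ua))
```

A download page could then offer the matching files, or a "not yet supported" message when the result is None.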
Currently I am working on improving the instructions page so that it has detailed instructions for use. However, these may need to change depending on what features we decide to use on ATbar v2.0. The Help and FAQ files are also under construction, so any input with questions that you think need answering would be appreciated.
The About and Privacy pages also need to be written (looking to E.A for input on those).