Recent research by Mashael AlKadi using an ATBar simulation.

I have just read an extremely interesting report by Mashael that looks into the issues around creating an Arabic speech recognition module for the ATbar and ATKit. The report has a very useful analysis of the tools available and raises some important considerations, which we will cover in more detail in the future.

Mashael collected data from 41 Arabic-speaking postgraduate, undergraduate and secondary school students. In brief, the results showed that this group of users tended to browse for text (44%) and multimedia content (42%), with only 14% browsing for games, shopping, social networks and so on. Few seemed to know about or use offline services (90% did not). The conclusion commented that offline use would be a useful way of working with the toolbar and should be considered in a similar way to the Silverlight approach: saving useful dictation results, or working with forms, at a later date.

Speech recognition command and control was not always felt to be useful, and the group surveyed did not specify a need due to a disability; in fact, 80% said they were happy to use the mouse and keyboard for browser control. However, 35 of the users said they would use speech recognition for language learning, 20 selected translation, 16 school work, 15 web activities and 10 work-based reports. High accuracy rates were required (90%) with the use of diacritics, despite the fact that these can cause problems for those with visual impairment and for the elderly. 61% felt that it would be useful to save dictated data for re-use.

Other research that Seb found showed that only 1% of websites are available in Arabic. Mashael found that 44% of her participants wanted to be able to use both English and Arabic for data entry, and over half (59%) wanted text to speech to read back content. They appeared to require accuracy over a large vocabulary for speech dictation and its use on the web.

Although several users of the prototype ATbar shown by Mashael in the video below wanted extra features, most were happy with the basic version and were content with the design and core functionality. Mashael highlighted the usefulness of the kit approach with the introduction of a Braille API and the need for a flexible approach to language support.

 

Meeting – Speech in and Speech Out!

This afternoon we debated the issues arising with speech recognition research and text to speech (TTS). Mashael had two very interesting papers showing that Sphinx4 is still the place to be when it comes to Arabic speech recognition, but the debate about recognition rates with or without diacritics continues. In one paper it appeared that rates were higher without diacritic marks.
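As a rough illustration of what a "recognition rate" involves, the sketch below computes word error rate (WER), a common ASR metric. This is an assumption on our part: the exact metric used in the papers is not stated here, and the function name `word_error_rate` and the two sample words are purely illustrative. It shows that when scoring is done on diacritized text, a word that differs only in its vowel marks counts as a full error.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Illustrative words only: the same three letters (kaf, ta, ba)
# carrying different vowel marks.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"

reference = kataba + " " + kataba
hypothesis = kataba + " " + kutiba
wer = word_error_rate(reference, hypothesis)  # one of two words differs
```

If both sides had their diacritics removed before scoring, the two words above would be identical, which is one possible reason (not confirmed by the papers) why reported rates can look better without diacritics.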

We then moved on to listen to the impact of diacritic marks on TTS. Edrees had a web page that let us hear how the recordings he had made with his mobile phone of two synthesised voices were clearer without diacritic marks.

اَللُغَةُ اَلعَرَبِيَةِ لُغَةٍ مَجِيْدَةٌ يَتَحَدْثُ بِهَاَ اَلنْاَسُ فِي أَكْثَرِ مَنْ سِتَةٍ وَ عِشْرِيْنَ دْوُلَةٍ حَوُلَ اَلعَاَلَمِ.

اَللُغَةِ اَلعَرَبِيَةِ مُفْرَدَاتُهَاَ غَيْرِ مَحْدُوُدَةٌ وَ تَحْتَوُيِ عَلَىَ عَدْدٍ مِنْ عَلَامَاتِ اَلْتَشْكِيِلِ اَلَتيِ تُمَيُزِ كَلِمَاَتِهْاَ وَ تَجْعَلَهَاَ لُغَةٌ مُعَقْدَةٌ بَعْضَ اَلشْئِ.

 

listen to .amr file with diacritic marks

اللغة العربية لغة مجيدة يتحدث بها الناس في أكثر من ستة و عشرين دولة حول العالم.

اللغة العربية لغة مفرداتها غير محدودة و تحتوي على عدد من علامات التشكيل التي تميز كلماتها وتجعلها لغة معقدة بعض الشئ.

listen to .amr file without diacritic marks
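For anyone wanting to repeat this comparison with other text, the version without diacritic marks can be produced from the marked version automatically. The following is a minimal sketch, assuming that "diacritic marks" means the tashkeel signs in the Unicode range U+064B to U+0652; the function names are our own and are not part of ATbar or ATKit.

```python
import re

# Assumption: the marks of interest are the tashkeel signs U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return text with tashkeel marks removed; base letters are untouched."""
    return TASHKEEL.sub("", text)


def count_diacritics(text: str) -> int:
    """Count the tashkeel marks found in text."""
    return len(TASHKEEL.findall(text))


# First word of the sample sentence above, written with escapes so the
# code points are explicit: alef, fatha, lam, lam, damma, ghain, fatha,
# ta marbuta, damma.
with_marks = "\u0627\u064E\u0644\u0644\u064F\u063A\u064E\u0629\u064F"
without_marks = strip_diacritics(with_marks)
```

The same function could be applied to a whole paragraph before it is sent to a TTS voice. Other marks, such as the tatweel (U+0640) used for stretching words, are deliberately left alone in this sketch.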

 

Arabic blog explains some of the issues around pronunciation and diacritics in a recent posting called ‘Arabic Diacritics (Al-Tashkeel الـتـشـكـيـــل )’.

Further comments

I have taken the liberty of including a comment that Mashael made about our meeting in this blog as she has supplied us with some very useful links.

Mashael said:

Hello All :)

After our meeting today with Mrs. EA and Edrees, here are some papers which we think will be of interest:

1) Arabic Phonetic Web Sites Platform Using VoiceXML (includes an implementation of Arabic ASR using Sphinx and TTS using the MBROLA project)
http://ieeexplore.ieee.org/

2) Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools (what is notable about this paper is that the system gives higher accuracy when implemented without diacritics)
http://ieeexplore.ieee.org

3) This is a dictation ASR developed by CMU called EvalDictator (it supports Arabic, but it suffers from some problems; we’ll check how much progress has been achieved on the project)
http://www.speech.cs.cmu.edu/sphinx/dictator/

Arabic Speech Recognition

Available Commercial Arabic ASRs:
1. Sakhr ASR:
A promising piece of software from a company with an experienced background in technical solutions for Arabic-based applications. In fact, they claim that their ASR doesn’t require training.
http://international.sakhr.com/arabic-speech-recognition-and-arabic-TTS.html

Sakhr’s clients:
• USA national departments such as Defense, Homeland Security, and Justice
• European Commission
• Government of Canada

* An interesting point is that Sakhr has a partnership with Carnegie Mellon University, the developer of the Sphinx open-source ASR


2. Emerging Technologies ASR:
Deals with dialects & accents because it uses the Nawaz company engine (75% of market share)
The company provides ASR, text-to-speech, and voice verification.
http://www.em-t.com/Arabic-Voice-Recognition-Solutions-s/54.htm
Features:
Supports voice commands (such as in the "Sama’ni" service at STC [KSA])
User independent recognition model (no user intervention)
Clients:
Middle East: Saudi Telecom Company (STC) [KSA], Dubai Airport & Etisalat Telecom Company [UAE], Kuwait Finance House [KWT]
* Dubai Airport as a case study showed that the ASR reduced the number of live agents needed at call centers


Both ASRs support diverse accents and dialects

3. IBM ViaVoice Arabic:
Firstly, when searching the internet I didn’t find any official link for the Arabic version. However, I found many copies of the Arabic version of the IBM ViaVoice engine available for download on Arabic-language sites. Moreover, one of the sites stated that these versions came from earlier IBM ViaVoice support which no longer exists.
On the other hand, IBM is working with King Abdulaziz City for Science and Technology (KACST) to develop an Arabic ASR for technical use & voice verification

http://ceri.kacst.edu.sa/English/Speech_Recognition_for_telephony_applications.html
They offer a Saudi-accented Arabic speech database for researchers & commercial use (http://www.mghamdi.com/SAAVB_Athens.pdf)


4. AppTek ASR:
Especially integrated within a system for news broadcasting
Offers more than speech recognition alone, including services such as data mining and much more
http://aramedia.com/speechtrans.htm
http://www.marketwire.com/press-release/AppTek-Releases-Arabic-Farsi-Pashto-Urdu-Language-Engines-Its-Hybrid-Machine-Translation-1279485.htm


5. VOCAPIA Speech Recognizer API:
http://www.vocapia.com/technos.html


6. Voice-to-text API (cloud, supports Arabic):
http://www.voicecloud.com/hk/


7. Votek:
An Arabic-based speech recognizer, specialized in dealing with various dialects

http://www.votek-group.com/

Arabic ASR

Hello,

The only open-source Arabic ASR that I have found so far is “Arabisc”, which was developed by Dr. Hussein Hiyassat et al.

Here is the download link: http://sourceforge.net/projects/arabisc/ (Gives an error at run-time)
I’m still trying to contact the developer to get full access to the system, and a working download if one is available.

Here is a full description of their work : http://www.springerlink.com/content/n3658k1758140266/