Category Archives: Arabic speech recognition

Arabic Speech Now Recognized by Google

Whilst looking at all the speech recognition apps and software available to those wishing to use Arabic speech recognition, I came across some very good news, and as a result I am copying the entire blog post written by Nina Curley on March 11, 2012 for Wamda:

“Developers throughout the Arab World should be excited: Google’s quiet rollout of Arabic voice recognition continues to create new opportunities for localized apps.

Voice Search, an app which allows users to search by simply using their voices, launched in December but has now expanded to recognize speech in eight dialects, including, as we understand it, Jordanian, Kuwaiti, Lebanese, Qatari, Saudi, Emirati, Egyptian, and Palestinian Arabic.

“I’ve never been this excited about a product in my life,” says MENA Product Marketing Manager Najeeb Jarrar, who was on the in-house team that worked for over two years to hone the app.

Google not only tackled a different algorithmic issue, from an engineering and linguistic perspective, he notes, but also built a product that will open up new ways of searching on the web and new opportunities for developers.

Here’s how it works: when you click the search button, your mobile device records your speech as a sound wave, and transfers it to a Google server, where it is compared to billions of sound waves to determine its meaning. Your sentence is then parsed by keywords and compared to billions of keyword combinations. Google uses the best keyword combination to return the results to your phone, all in under a second, depending on your internet connection.
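The keyword step described above can be illustrated with a toy Python sketch. This is not Google's implementation, which the article does not describe in detail: the stop-word list, the candidate queries and the Jaccard overlap score are all assumptions chosen for illustration, and the spoken sentence is assumed to have already come back from the recogniser as text.

```python
# Toy illustration of "parse the sentence into keywords and pick the best
# keyword combination". NOT Google's algorithm: the stop words, candidate
# queries and scoring function are hypothetical.

# Hypothetical stop-word list (a few common Arabic function words).
STOP_WORDS = {"في", "من", "على", "عن", "الى", "ما", "هو", "هي"}


def keywords(sentence: str) -> set[str]:
    """Split a transcript on whitespace and drop stop words."""
    return {w for w in sentence.split() if w not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap score between two keyword sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_query(transcript: str, candidates: list[str]) -> str:
    """Return the candidate keyword combination closest to the transcript."""
    spoken = keywords(transcript)
    return max(candidates, key=lambda c: jaccard(spoken, keywords(c)))
```

For example, `best_query("مطاعم في عمان", ["مطاعم عمان", "فنادق عمان", "مطاعم دبي"])` ("restaurants in Amman") returns "مطاعم عمان": the word "في" ("in") is dropped as a stop word and the two remaining keywords match that candidate exactly.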

“It’s as accurate as if you tried voice search in English,” says Jarrar. Voice Search also works in Google Maps to return place results.

To maximize its accuracy, the team worked hard to make the app robust, testing it while having local native speakers read popular queries in a train station, in a public cafe, or near echoes, so that it could detect speech patterns despite machine or human noise.

The app, which runs on Android and is a feature of the Google Search app for iPhone and BlackBerry, will also continue to get more accurate as it learns. If it doesn’t understand the user fully, Voice Search will offer a list of suggestions based on the closest matches, which the user can choose from, thus helping to improve future results.
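The "closest matches" behaviour can be sketched with Python's standard `difflib` module. This is only an analogy: Google's real suggestions come from the recogniser's own scoring, and the phrase list, the helper name `suggestions` and the cutoff value below are assumptions.

```python
import difflib

# Hypothetical list of phrases the app already knows (e.g. popular queries).
KNOWN_QUERIES = ["مطاعم عمان", "فنادق عمان", "طقس الدوحة", "اخبار الكويت"]


def suggestions(heard: str, n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to n known queries similar to what was heard, best first.

    The choice the user makes from this list could then be recorded as
    feedback for future recognition (not implemented here).
    """
    return difflib.get_close_matches(heard, KNOWN_QUERIES, n=n, cutoff=cutoff)
```

A slightly misheard query such as "مطاعم عمن" still returns "مطاعم عمان" as the top suggestion, while unrelated input returns an empty list.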

Most importantly for entrepreneurs, Voice Search in Arabic will open up opportunities to localize apps, make programs simpler to use, and increase accessibility for less tech-literate populations. Because Arabic voice recognition is included in the Google Voice Search API, developers can simply load the API and select the dialect of their choice.
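In code, selecting a dialect usually comes down to passing a language tag to the recogniser. The sketch below maps the eight dialects listed above to BCP 47 tags ("ar" plus an ISO 3166 country code). The tags themselves are standard, but whether a particular version of Google's API accepts each one is an assumption to check against that API's documentation, and the helper `dialect_tag` is hypothetical.

```python
# Hypothetical helper: map the dialect names used in the article to
# BCP 47 language tags ("ar" + ISO 3166 country code).
DIALECT_TAGS = {
    "jordanian": "ar-JO",
    "kuwaiti": "ar-KW",
    "lebanese": "ar-LB",
    "qatari": "ar-QA",
    "saudi": "ar-SA",
    "emirati": "ar-AE",
    "egyptian": "ar-EG",
    "palestinian": "ar-PS",
}


def dialect_tag(dialect: str) -> str:
    """Return the language tag for a dialect name (case-insensitive)."""
    try:
        return DIALECT_TAGS[dialect.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect!r}") from None
```

The resulting string (for example "ar-QA" for Qatari Arabic) is what would be handed to whichever recognition API is in use.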

Some of the ideas that I saw recently pitched at Startup Weekend Amman and QITCOM could certainly benefit. “Maybe we’ll start seeing games where [people] are playing simply by speaking,” offers Jarrar.

Especially where those games or apps are educational: it would be great to see this space use Arabic Voice Search to expand flexibility when it comes to including different types of learners.

Demo below [Arabic]. The Google Voice application for iPhone is available in the iTunes Store and for Android as part of Google Play.


Nina is the Editor-in-Chief at Wamda. You can reach her through Wamda, on Twitter @9aa, on Facebook, on Google+ or at nina [AT] wamda.com.

Testing times with Arabic Windows 8 and Arabic eSpeak.

A visit to the Assistive Technology Industry Association 2013 conference, where the Microsoft team kindly showed me how we could work in Arabic and English, plus the arrival of our Dell tablet with Windows 8, has made us look in depth at Qatari Arabic support in Windows.

[Screenshots: Qatari keyboard; Qatari Arabic language pack]

We downloaded the language pack and changed the keyboard, and all seemed well, but the email I received from Microsoft's product advisor indicates that there is no Windows Arabic voice at present.

“I researched the question to see if Windows 8 supports Arabic (namely Qatari dialect) text to speech. Unfortunately, at this time, Windows 8 does not support it. Only certain languages are included in the built in software.”

So it is back to the drawing board for the ATbar desktop option: Narrator is not going to speak in Arabic unless someone has found an Arabic Windows system with a well-hidden free voice from Microsoft! If anyone has found a solution to this problem, please do let us know!

eSpeak


Further research, and a recent development in Arabic eSpeak, mean that we now have a free voice. Testing has shown that the voice needs to be improved, but this could be done with future work on the phonetics. The aim is to ship NVDA with the ATbar desktop version and the Arabic eSpeak voice, although it will not really be an acceptable voice where a Nuance or Acapela option is available.
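For testing the free voice outside NVDA, eSpeak can be driven from the command line. The sketch below assumes the eSpeak NG program (`espeak-ng`, the maintained successor of eSpeak) is installed with its Arabic voice `ar`; `-v` selects the voice and `-w` writes a WAV file instead of speaking aloud. The helper names are hypothetical, and the older `espeak` binary accepts the same two options.

```python
import shutil
import subprocess


def espeak_command(text: str, wav_path: str, voice: str = "ar",
                   program: str = "espeak-ng") -> list[str]:
    """Build the command line that renders `text` to a WAV file."""
    return [program, "-v", voice, "-w", wav_path, text]


def speak_to_wav(text: str, wav_path: str) -> bool:
    """Run eSpeak NG if it is on the PATH; return True on success."""
    cmd = espeak_command(text, wav_path)
    if shutil.which(cmd[0]) is None:
        return False  # eSpeak NG is not installed, so nothing is rendered
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode == 0
```

Rendering to files rather than the speakers makes it easy to keep recordings of the same sentences for before-and-after comparisons as the phonetics improve.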


[Screenshot: translation into Arabic]
The Windows 8 mobile OS has the potential to support more Arabic options and offers translation of text captured by OCR, although the translated text is still not 100% correct. Spot the problem!

Nuance has a choice of Arabic voices for mobile and has added speech recognition, but none of our team has been able to test its success rates. Google has also rolled out speech recognition in Arabic for Android phones.

We have been testing online speech recognition systems that run in Google Chrome, and they really are not very successful with the Arabic dialects offered. Below is an example of Speech Recognizer in Arabic.

[Screenshot: Speech Recognizer in Arabic]

TalkTyper uses Speech Recognizer for speech recognition and also offers text to speech. The latter uses a very good Arabic voice; we are still exploring which one, but it sounds like Nuance's Maged.

Watch this spot for updates next week on the ATbar desktop app and ATbar TTS.


Recent research by Mashael AlKadi using an ATBar simulation.

I have just read an extremely interesting report by Mashael that looks into the issues around creating an Arabic speech recognition module for the ATbar and ATKit.  The report has a very useful analysis about the tools available and some important considerations which we will cover in more detail in the future.

Mashael collected data from 41 Arabic-speaking postgraduate, undergraduate and secondary school students. In brief, the results showed that this group tended to browse for text (44%) and multimedia content (42%), with only 14% using the web for games, shopping, social networks and so on. Few seemed to know about or use offline services (90% did not). The conclusion noted that offline working would be a useful way to use the toolbar and should be considered in a similar way to the Silverlight approach, for example saving useful dictation results or working with forms at a later date.

Speech recognition for command and control was not always felt to be useful, and the group surveyed did not report a need arising from a disability; in fact, 80% said they were happy to use the mouse and keyboard for browser control. However, 35 of the users said they would use speech recognition for language learning, 20 selected translation, 16 school work, 15 web activities and 10 work-based reports. High accuracy rates (90%) were required with the use of diacritics, despite the fact that diacritics can cause problems for people with visual impairments and for the elderly. 61% felt that it would be useful to save dictated data for re-use.
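Because the sample is small, it helps to convert the raw counts above into shares of the 41 respondents. The Python arithmetic below does only that; it assumes participants could pick more than one use, so the shares are not expected to add up to 100%.

```python
RESPONDENTS = 41

# Number of participants choosing each intended use of speech recognition.
intended_uses = {
    "language learning": 35,
    "translation": 20,
    "school work": 16,
    "web activities": 15,
    "work-based reports": 10,
}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Convert counts into percentages of `total`, rounded to one decimal."""
    return {use: round(100 * n / total, 1) for use, n in counts.items()}


for use, pct in shares(intended_uses, RESPONDENTS).items():
    print(f"{use}: {pct}%")
```

This gives roughly 85% for language learning, 49% for translation, 39% for school work, 37% for web activities and 24% for work-based reports.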

Other research that Seb found showed that only 1% of websites are available in Arabic. Mashael found that 44% of her participants wanted to be able to use both English and Arabic for data entry, and over half (59%) wanted text to speech to read back content. For speech dictation on the web, they appeared to require accuracy over a large vocabulary.

Although several users of the prototype ATbar shown by Mashael in the video below wanted extra features, most were happy with the basic version and content with the design and core functionality. Mashael highlighted the usefulness of the kit approach, with the introduction of a Braille API, and the need for a flexible approach to language support.


Meeting – Speech in and Speech Out!

This afternoon we debated the issues arising from speech recognition research and text to speech (TTS). Mashael brought two very interesting papers showing that Sphinx-4 is still the place to be for Arabic speech recognition, but the debate about recognition rates with or without diacritics continues. In one paper, rates appeared to be higher without diacritic marks.
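Recognition rates in papers like these are usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of reference words. A minimal Python sketch follows. Comparing rates with and without diacritics only makes sense if both transcripts are normalised the same way first; the `strip_marks` flag (removing U+064B to U+0652, i.e. the short-vowel marks, tanween, shadda and sukun) is an assumption about how such a comparison might be set up, not the method used in either paper.

```python
# Arabic tashkeel: fathatan (U+064B) through sukun (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def normalise(text: str, strip_marks: bool) -> list[str]:
    """Optionally drop diacritic marks, then split into words."""
    if strip_marks:
        text = "".join(ch for ch in text if ch not in HARAKAT)
    return text.split()


def wer(reference: str, hypothesis: str, strip_marks: bool = False) -> float:
    """Word error rate = (substitutions + deletions + insertions) / N."""
    ref = normalise(reference, strip_marks)
    hyp = normalise(hypothesis, strip_marks)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With `strip_marks=True`, a hypothesis that is correct apart from missing diacritics scores 0% WER, whereas an exact diacritised comparison counts every such word as an error; this is one possible reason rates can look higher without diacritics.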

We then moved on to listen to the impact of diacritic marks on TTS. Edrees had a web page with recordings, made on his mobile phone, of two synthesised voices; these were clearer without diacritic marks.

اَللُغَةُ اَلعَرَبِيَةِ لُغَةٍ مَجِيْدَةٌ يَتَحَدْثُ بِهَاَ اَلنْاَسُ فِي أَكْثَرِ مَنْ سِتَةٍ وَ عِشْرِيْنَ دْوُلَةٍ حَوُلَ اَلعَاَلَمِ.

اَللُغَةِ اَلعَرَبِيَةِ مُفْرَدَاتُهَاَ غَيْرِ مَحْدُوُدَةٌ وَ تَحْتَوُيِ عَلَىَ عَدْدٍ مِنْ عَلَامَاتِ اَلْتَشْكِيِلِ اَلَتيِ تُمَيُزِ كَلِمَاَتِهْاَ وَ تَجْعَلَهَاَ لُغَةٌ مُعَقْدَةٌ بَعْضَ اَلشْئِ.


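(In English: “Arabic is a glorious language spoken by people in more than twenty-six countries around the world. Arabic has an unlimited vocabulary and contains a number of diacritic marks that distinguish its words and make it a somewhat complex language.”)

Listen to the .amr file with diacritic marks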

اللغة العربية لغة مجيدة يتحدث بها الناس في أكثر من ستة و عشرين دولة حول العالم.

اللغة العربية لغة مفرداتها غير محدودة و تحتوي على عدد من علامات التشكيل التي تميز كلماتها وتجعلها لغة معقدة بعض الشئ.

Listen to the .amr file without diacritic marks
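Producing an undiacritised version of a text, like the second pair of sentences above, can be automated before the text is sent to a TTS engine. A minimal Python sketch, assuming the characters to remove are the tashkeel marks U+064B to U+0652, the superscript (dagger) alif U+0670 and the decorative tatweel U+0640; which set suits a given voice is an assumption to test. It deliberately avoids Unicode NFD normalisation, because NFD decomposes letters such as أ (U+0623) into alif plus a combining hamza, so a mark-stripping filter would then remove the hamza as well.

```python
# Characters removed (assumption: this set suits the TTS voice in use).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}  # fathatan .. sukun
TASHKEEL.add("\u0670")  # superscript (dagger) alif
TATWEEL = "\u0640"      # elongation character used for decoration


def plain_arabic(text: str) -> str:
    """Remove diacritic marks and tatweel, keeping base letters and hamza."""
    return "".join(ch for ch in text if ch not in TASHKEEL and ch != TATWEEL)
```

Running both versions of a sentence through the same voice, as Edrees did with recordings, is then a matter of calling `plain_arabic` on the diacritised text.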


Arabic blog explains some of the issues around pronunciation and diacritics in a recent posting called ‘Arabic Diacritics (Al-Tashkeel الـتـشـكـيـــل)’.

Further comments

I have taken the liberty of including a comment that Mashael made about our meeting in this blog as she has supplied us with some very useful links.

Mashael said:

Hello All :)

After our meeting today (me, Mrs. EA and Edrees), here are some papers which we think will be of interest:

1) Arabic Phonetic Web Sites Platform Using VoiceXML (includes an implementation of Arabic ASR using Sphinx and TTS using the MBROLA project)
http://ieeexplore.ieee.org/

2) Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools (what stands out in this paper is that the system gives higher accuracy when implemented without diacritics)
http://ieeexplore.ieee.org

3) EvalDictator, a dictation ASR developed by CMU (Carnegie Mellon University) (it supports Arabic but suffers from some problems; we'll check how much progress has been achieved on the project)
http://www.speech.cs.cmu.edu/sphinx/dictator/
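Paper 2 above describes an HMM-based recogniser built with Sphinx tools. Decoding with an HMM means finding the most likely sequence of hidden states for the observed audio frames, and the standard search for that is the Viterbi algorithm. The toy Python version below only shows the idea: real Sphinx decoders work with phone-level HMMs, continuous acoustic scores and beam pruning, and the two states, two observation symbols and all probabilities here are invented for illustration.

```python
import math


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best log-probability, best state path) for a discrete HMM."""
    # score[s]: best log-probability of any state path ending in s so far
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backptrs = []  # one dict per later time step: state -> best predecessor
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda q: score[q] + _log(trans_p[q][s]))
            new_score[s] = (score[best_prev] + _log(trans_p[best_prev][s])
                            + _log(emit_p[s][o]))
            ptr[s] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the most likely final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


# Toy model (invented numbers): is each audio frame silence or speech?
STATES = ("sil", "speech")
START = {"sil": 0.8, "speech": 0.2}
TRANS = {"sil": {"sil": 0.7, "speech": 0.3},
         "speech": {"sil": 0.2, "speech": 0.8}}
EMIT = {"sil": {"quiet": 0.9, "loud": 0.1},
        "speech": {"quiet": 0.3, "loud": 0.7}}

log_p, path = viterbi(["quiet", "loud", "loud", "quiet"], STATES, START, TRANS, EMIT)
print(path)  # ['sil', 'speech', 'speech', 'speech']
```

Working in log space avoids numerical underflow when multiplying many small probabilities over long utterances.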

Arabic Speech Recognition

Available Commercial Arabic ASRs:
1. Sakhr ASR:
Promising software from a company with a long background in technical solutions for Arabic-based applications. Sakhr claims that its ASR does not require training.
http://international.sakhr.com/arabic-speech-recognition-and-arabic-TTS.html

Sakhr's clients:
• USA national departments such as Defense, Homeland Security, and Justice
• European Commission
• Government of Canada

* Interestingly, Sakhr has a partnership with Carnegie Mellon University, the developer of the open-source Sphinx ASR.


2. Emerging Technologies ASR:
Handles dialects and accents thanks to the Nawaz company's engine (75% of the market share).
The company provides ASR, text-to-speech, and voice verification.
http://www.em-t.com/Arabic-Voice-Recognition-Solutions-s/54.htm
Features:
Supports voice commands (such as the "Sama’ni" service from STC [KSA])
User-independent recognition model (no user intervention)
Clients:
Middle East: Saudi Telecom Company (STC) [KSA], Dubai Airport & Etisalat Telecom Company [UAE], Kuwait Finance House [KWT]
A Dubai Airport case study showed that the ASR reduced the number of live agents needed at call centers.


Both ASRs support diverse accents and dialects

3. IBM ViaVoice Arabic:
Searching the internet, I didn't find any official link for the Arabic version, although I found many copies of the Arabic version of the IBM ViaVoice engine available for download on Arabic sites. One of the sites stated that these versions came from earlier IBM ViaVoice support that no longer exists.
On the other hand, IBM is working with King Abdulaziz City for Science and Technology (KACST) to develop an Arabic ASR for technical use and voice verification:

http://ceri.kacst.edu.sa/English/Speech_Recognition_for_telephony_applications.html
They offer a Saudi-accented Arabic speech database for researchers and commercial use (http://www.mghamdi.com/SAAVB_Athens.pdf).


4. AppTek ASR:
Notably integrated within a news broadcasting system
Offers products beyond speech recognition, including services such as data mining
http://aramedia.com/speechtrans.htm
http://www.marketwire.com/press-release/AppTek-Releases-Arabic-Farsi-Pashto-Urdu-Language-Engines-Its-Hybrid-Machine-Translation-1279485.htm


5. VOCAPIA Speech Recognizer API:
http://www.vocapia.com/technos.html


6. Voice-to-text API (cloud, supports Arabic)
http://www.voicecloud.com/hk/
7. Votek:
An Arabic speech recognizer specialized in dealing with various dialects

http://www.votek-group.com/

Arabic ASR

Hello,

The only open-source Arabic ASR found so far is “Arabisc”, developed by Dr. Hussein Hiyassat et al.

Here is the download link: http://sourceforge.net/projects/arabisc/ (it gives an error at run time).
I'm still trying to contact the developer for full access to the system, if available.

Here is a full description of their work : http://www.springerlink.com/content/n3658k1758140266/