Category Archives: Arabic TTS

New ATbar YouTube video in Arabic and continued maintenance

Nawar Halabi has very kindly provided an introductory video of the Arabic version of ATbar and we have uploaded it to YouTube.

YouTube video overview of ATbar in Arabic
Nawar has also been testing the Arabic version as part of our maintenance programme. We have found that Arabic text is sometimes misaligned, and there are occasions when the CSS of the website needs to be isolated from the toolbar. Otherwise all the plugins appear to have worked well over the last few months.

Testing the Arabic dictionary

Testing the spell checker, text to speech and word prediction.

arabic report

Where failures were reported, these were double-checked and found to be due to the word not being a partial word or not being in the dictionaries – usually because an English-speaking person was trying to cut and paste Arabic words!

Testing times with Arabic Windows 8 and Arabic eSpeak.

A visit to the Assistive Technology Industry Association 2013 conference, where the Microsoft team kindly showed me how we could work in Arabic and English, plus the arrival of our Dell tablet with Windows 8, has made us look in depth at the issue of Qatari Arabic support in Windows.

qatari keyboard Qatari Arabic language pack

We downloaded the language pack and changed the keyboard and all seemed well, but it appears from the email I received from their product advisor that there is no Windows Arabic voice at present.

“I researched the question to see if Windows 8 supports Arabic (namely Qatari dialect) text to speech. Unfortunately, at this time, Windows 8 does not support it. Only certain languages are included in the built in software.”

So back to the drawing board for the ATbar desktop option – Narrator is not going to speak in Arabic unless someone has found an Arabic Windows system with a well hidden free voice from Microsoft!   If anyone has found a solution to this problem please do let us know!

eSpeak

eSpeak logo

After more research, and thanks to a recent development with Arabic eSpeak, we now have a free voice. Testing has shown that the voice needs to be improved, but with future work on the phonetics this is something that could be done. The aim is to ship NVDA with the ATbar desktop version and the Arabic eSpeak voice. It will not really be an acceptable voice where a Nuance or Acapela option is available.

translation into Arabic

The Windows 8 mobile OS has the potential to support more Arabic options and offers translation from OCR, although the actual text is still not 100% correct – spot the problem!

Nuance has a choice of Arabic voices for mobile and has added speech recognition, but none of our team has been able to test its success rates. Google has also rolled out speech recognition in Arabic for Android phones.

We have been testing online speech recognition systems offered by Google Chrome and they really are not very successful in the Arabic dialects offered.  Below is an example of Speech Recognizer in Arabic.

speech recognizer

The TalkTyper system uses Speech Recognizer for speech recognition as well as text to speech – the latter uses a very good voice in Arabic – we are still exploring which voice is used but it sounds like Nuance Maged in Arabic.

Watch this spot for updates next week linked to the ATbar desktop app and ATbar TTS.

ATbar has a choice of voices and a colour overlay plugin.

Arabic voice choices

We have been experimenting with voices on the ATbar as there has been some discussion about using a male voice as this may be more acceptable to some users.  We really would value your input into these thoughts.  The English version of the toolbar now has Lucy (F) and Peter (M) and the Arabic voices are now Leila (F) and Mehdi (M).  This additional service comes thanks to Insipio and the work Lars and Magnus carried out over the last few weeks.

The time taken to return the text to speech from the server has been reduced from 4 to 2 seconds – we will monitor whether this has any other unforeseen consequences.

Magnus has also added another plugin as standard to the Arabic and English toolbars: a colour overlay plugin that will allow users to read websites with less glare. There is a choice of colours – cream, pink, pale blue and pale green. We hope this will help those who have visual stress and find the glare of black on white hard to read, as well as those with other specific learning difficulties such as dyslexia. If you are using the Chrome, Safari or Firefox browsers you will also be able to click through the overlay and even write in most forms. Sadly Opera and Internet Explorer do not support a click-through ability. There is a step-by-step guide on the ATbar wiki in English and Arabic.

pink overlay

WordPress with blue overlay

 Looking to the coming months

Arabic Dictionary

Nawar has been working on a new dictionary that will offer users a word list that is more useful for Arabic speakers. It will be based on prefixes and suffixes with root words that may then link with Arabic Wiktionary if it exists, and he is hoping to adapt the way the results are presented. It is hoped this will be finished by the end of March 2013.

dictionary plan

CSS issues

Magnus is working on the CSS issues that occur with some Arabic websites. We have recently been running evaluations on a series of important sites to see how prevalent the poor presentation of our dialog boxes actually is. I will be blogging about the particular issues and we will be illustrating the results of the changes as they happen. It is possible to change the dialog box for individual sites, as has been done in the past, but this is not the answer because sites constantly change, so we need to find a robust solution that works for all. This will be finished by the end of January.

TTS free voices

TTS work has been ongoing and Nawar has tried the Euler/Mbrola route which, despite much experimentation, has not been successful so far. eSpeak experiments are ongoing for the desktop version and it is hoped that we can still find a solution for both desktop and web-based toolbar TTS functions by the end of March 2013.

Meetings with Mesar at ATSummit resulted in a discussion about NVDA being used for text to speech as well as a screen reader – in other words developing a way for the program to respond to selected text that has been visually highlighted as well as offer more options to reduce the verbosity for dyslexic users.

Arabic ATbar Desktop version (Windows XP, 7)

We want to have a free TTS for the Windows desktop ATbar when we link it to NVDA. At present the desktop version links to Narrator, which does not read in all applications but offers good selected text to speech and screen reading feedback in WordPad, Notepad and Internet Explorer. It also works with all individual letters typed, with all actions on the Windows desktop and with system operations. The help file has useful keyboard shortcuts.

The ATbar desktop version once installed launches at start up and has menu buttons for text to speech, coloured overlays, an onscreen keyboard and magnification as mentioned in our previous news update.

desktop ATbar

There is an ATbar desktop download available. There is now also a portable version of the desktop ATbar that can be used on USB pendrives – see the lower menu button on the ATbar website. The work on the desktop version has been completed and is awaiting any comments from users.

YouTube videos illustrating the ATbar features.

We have set up a series of YouTube videos that include:

Text resizing, font style changes and line spacing. This video has no audio but shows how a user can select the magnifier on the toolbar to enlarge text without resizing the graphics – this tends to allow for more readable text when compared to zooming using the browser Ctrl+ which also enlarges the graphics.  However, this feature does not work when Flash has been used within a webpage or fonts have fixed sizes or styles.  The same applies to increased line spacing which is also demonstrated.

YouTube link to the video

The second video demonstrates how the A.I.Type word prediction works as well as spell checking when writing a blog using WordPress.  Use the HTML mode when working in the edit box rather than the Visual mode and then you will also be able to use the text to speech to aid proof reading.


YouTube link to the video

The last video demonstrates the use of text to speech with the Acapela voice in both Arabic and English.


YouTube link to the video

ATBar Word Prediction and Text to Speech working in text boxes

Arabic wordprediction with keyboard access

Seb has enabled the AIType word prediction with keyboard access and text to speech for simple text boxes in his recent updates to the toolbar for both Arabic and English.

The Word Prediction button needs to be selected before entering text. It is possible to use the ‘Esc’ key to ignore a prediction and close the dialog box, or to use Ctrl+Alt and the word position as a number to insert the required word.

Word Prediction in WordPress

We have found that the prediction and text to speech work with HTML views of text boxes in WordPress and Blogger but not the Visual mode which overrides the ATbar.

The text needs to be highlighted before the text to speech button is selected.  There may be a pause before you hear the speech.

Updates on the progress on Arabic spell checking, TTS, Word Prediction and the ATKit

footsteps

The last few weeks since the Christmas break have flown by in a flurry of activity which, in retrospect, seems at times to have made us feel as if we were going two steps forward only to have to go at least one, if not more, steps backward! But there have been some breakthroughs in the areas of Spell Checking, Text to Speech, Word Prediction and the ATKit website.

Spell Checking

Thanks to Mashael AlKadi we have a really clear evaluation of the spell checker titled Dyslexic Typing Errors in Arabic (PDF download) and also thank you to Mina Monta who commented that:

  • “Some of the words are correct in spell & in the meaning but AT spell checker detect that those are wrong words
  • In the suggested word list, there is no sorting according to the priority of the suggested word (according to the relativity between the suggested word & the original wrong word)
  • Some of the suggested words are wrong in spell
  • The number of the suggested words is to high comparing with MS Word spell checker.
  • MS Word is better in detecting the wrong words in grammar (the word has correct spell) “
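The sorting and list-length points in these comments could be explored by ordering candidate words by their edit distance from the mistyped word and capping how many are shown. The following is only a minimal sketch of that idea in Python, not the code used by the ATbar spell checker; the function names, the default cut-off of five suggestions and the sample words are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_suggestions(word, candidates, limit=5):
    """Return at most `limit` candidates, closest to `word` first.

    sorted() is stable, so candidates at the same distance keep
    the order in which the dictionary supplied them.
    """
    return sorted(candidates, key=lambda c: edit_distance(word, c))[:limit]


if __name__ == "__main__":
    # Hypothetical candidate list for the typed word "كتب".
    print(rank_suggestions("كتب", ["مكتبة", "كتاب", "كاتب"], limit=2))
```

Plain edit distance treats every character equally; a version tuned for dyslexic typing errors in Arabic would probably need weights for commonly confused letters, which this sketch does not attempt.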

Sadly, research into English spell checkers has revealed that they are not as accurate as we had hoped when it comes to false errors and to real-word errors or homophones, as can be seen from this presentation about online spell checking.

I asked Mashael whether adding a new corpus would help as Seb has succeeded in collecting a larger Arabic corpus and has put in some code to make it possible to add this extended vocabulary.   However, Mashael’s comment was:

“regarding adding new words, do you mean expanding the tool’s dictionary? I don’t think you should worry beacuse it was working very well expect for certain remarks that I’ve said such as the tool’s behavior with words attached to prepositions. In such case only some adjustments should be applied to the tool’s mechanism and I think it will work great.”

So with the support of Erik and Mina in our last meeting, it has been decided that we will work on particular improvements as a future aim with the help of our Arabic speaking colleagues.

Text to Speech

It has been a bit of a trial and error period, starting with the withdrawal of Google Translate. We were aware this might happen, but had rather hoped there could be a reprieve as this was a free option, although in the tests carried out with 5 Arabic-speaking students the results were poor in comparison to the Acapela and Vocalizer voices. It is also sad given the time spent on this work, as we had proved that a free TTS on the toolbar was possible to achieve. The Microsoft Speak method was also tried and tested, but the TTS appeared to leave off initial sounds and the voice was unacceptable to our beta testers.

We also learnt that NVDA in Arabic was only going to work with the Arabic TTS offered by Microsoft and eSpeak, and that Festival with the Mbrola project was still an uphill struggle.

As a research project and definitely not for profit we also wondered if we could go back to Google Translate but the agreement  specifically says  “The program may be used only by registered researchers and their teams, and access may not be shared with others.”

Meanwhile Fadwa Mohamad kindly visited King Abdulaziz City for Science and Technology (KACST) over the Christmas period and met Professor Ibrahim A. Almosallam, who has been in touch to say that they are developing an Arabic Text to Speech application, but it has yet to be released. I am enquiring as to whether this is a desktop application or a VAAS (Voice as a Service) system such as that offered by Acapela in Arabic.

Seb then spent time working on the Acapela VAAS system and this was shown to work well in all the tests although there are issues when a whole page is read out.  It is felt that it might be more appropriate to restrict the call on the servers and just allow text to be highlighted and then spoken.  We now have to negotiate the way we can work with this system, as the final output needs to be free to the user.

There is also the option of building a new Arabic voice and this is being explored – although it would take time and effort to generate the corpus, normalise the output and beta test, even when there are engines available to achieve this aim….. A new build Arabic voice needs further discussion but we have the connections in place.

WordPrediction

word prediction screen grab

Seb has been able to show how this feature for the toolbar is possible in English, and the background architecture is in place for the Arabic version pending the language pack.

ATKit website

ATKit site

It has been agreed that the mock-up of the ATKit website that was available as a demonstrator should be taken forward and developed. This has been completed with the ability to add plugins, both free and those that require payment (for instance where a TTS requires a fee). Users can register, build their own toolbar and save the results. The next step is a completed Arabic translation and the ability to author plugins …

Arabic ATKit

Arabic TTS discussions and success with ATKit beta

TTS logos

As we have all suspected, the market for text to speech is now a choice between Nuance and Acapela, with eSpeak and Festival offering a very limited choice of languages. The licences for using options offered by operating system vendors such as Microsoft and Apple do not allow us to use these for a browser-based toolkit.

So we have been trialling the voice with Google translate but that only works for 1000 characters and is liable to disappear as a service.  We discussed the issue with a Google employee who was not very hopeful that we would be able to pursue this idea further although we would still like to keep this door open.
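A 1000-character ceiling means that any longer passage has to be cut into pieces before it is sent for speech. The sketch below shows one possible way to do that in Python, breaking on whitespace so that words are not split mid-way. It does not call any TTS service; the function name `split_for_tts` and its `limit` parameter are assumptions made for illustration.

```python
def split_for_tts(text, limit=1000):
    """Split text into chunks of at most `limit` characters.

    Breaks fall between words; a single token longer than `limit`
    is hard-split as a last resort.
    """
    chunks = []
    current = ""
    for word in text.split():
        # Hard-split any token that could never fit in one chunk.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "كلمة " * 400  # roughly 2000 characters of repeated text
    for piece in split_for_tts(sample):
        print(len(piece))
```

Splitting on sentence punctuation rather than any whitespace would probably give more natural pauses between chunks, at the cost of a little more code.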

We also want to continue to see if we can discover any researchers still working on an open source free TTS for Arabic speech, but in the meantime we have been discussing the use of the Acapela Voice As A Service system that also works well as a plug-in for the new ATKit.

The web site for the English version of ATBar using the ATKit system of plug-ins  is ready for testing and final checks for the Arabic version will be set in place with plugins once agreements have occurred regarding the TTS, as all other sections are complete.  We are still looking for suitable dyslexic type errors to improve the present dictionary and have begun the research on both the word prediction and speech recognition.

Finally we have set up an ATKit plugin Google Group for further collaboration, in the hope that this can become a truly open innovative process and a case study for the REALISE market place, which has just received sponsorship from Devices for Dignity, who are interested in seeing how case studies such as the ATKit develop in the future.

Spell checking and the Arabic script

The Arabic script is cursive and we have been exploring difficulties with accurate online spell checking. Fadwa Mohamad has kindly shared her knowledge about some of the issues that arise for those with dyslexia when it comes to the way Arabic characters are linked. Arabic has 28 letters to represent 34 phonemes and we have already discussed the issues of vowels and diacritics. Now we have learnt there is the thorny problem that only 22 of the 28 letters have two-way connectors. The 6 remaining letters can only be joined in one way – so an Arabic word can contain one or more spaces. This means a word using some of these 6 letters, which can only be joined up in one way, may be divided in several places.
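To see where these visual gaps fall inside a word, the six one-way letters can be listed and a word cut after each of them. This is a minimal Python sketch that assumes the six letters are alif, dal, dhal, ra, zay and waw, and that the input carries no diacritics; hamza-carrying variants and other special cases are ignored. The function name is made up for illustration.

```python
# Assumed set of the six letters that do not join to the following letter:
# alif, dal, dhal, ra, zay, waw.
NON_CONNECTING = set("ادذرزو")


def connected_segments(word):
    """Split a word into the runs of letters that are drawn joined together."""
    segments = []
    current = ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            # The pen lifts after this letter, leaving a small gap.
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for word in ["كتب", "كتاب", "دروس"]:
        print(word, len(connected_segments(word)))
```

A word such as "دروس" comes out as four separate pieces even though it is a single word, which is the kind of internal spacing a reader or a spell checker could mistake for a word boundary.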

The other problem of note is that capital letters are not used in Arabic, so once again it may not be easy to see or work out where word boundaries occur. This along with the odd spacing obviously causes concerns for some readers, but may also be one reason why a spell checker can appear to gobble letters when it tries to correct a word!

To add to these issues, the articles ‘the’, ‘a’ or ‘an’ in English tend to be joined to the following word in Arabic – so those who can read Arabic will recognise the letters ‘AL’ or “Arabic: الـ‎, also transliterated as ul- and in some cases il- and el- ” according to Wikipedia. The reader also has to work out whether the ‘AL’ will be silent or voiced in some cases, which impacts on text to speech engines, and the lack of spacing can affect spell checking.
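One well-known case of this is the lam of the article, which is not pronounced before the so-called sun letters and is pronounced before the others. The Python sketch below separates a leading "ال" from the rest of the word and reports which case applies. It is an illustration only: it assumes undiacritised input and treats every initial "ال" as the article, which is not always true, and the names used are hypothetical.

```python
# The 14 "sun letters"; before these the lam of the article is assimilated.
SUN_LETTERS = set("تثدذرزسشصضطظلن")
ARTICLE = "ال"


def describe_article(word):
    """Return (stem, lam_is_pronounced) if the word starts with the article.

    Returns None when there is no leading article or nothing follows it.
    Naive: any initial alif + lam is assumed to be the article.
    """
    if not word.startswith(ARTICLE) or len(word) <= len(ARTICLE):
        return None
    stem = word[len(ARTICLE):]
    lam_is_pronounced = stem[0] not in SUN_LETTERS
    return stem, lam_is_pronounced


if __name__ == "__main__":
    for word in ["الشمس", "القمر", "كتاب"]:
        print(word, describe_article(word))
```

A spell checker could look up the stem on its own, while a text to speech front end could use the second value to decide how the article is spoken.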

Finally Arabic letters may be formed in different ways depending on their position in the word.  So a shape may change from its isolated form to one that is different when seen as the initial letter in the word or the medial one or even the final one! This is how arabic-course.com describe the issue.

Arabic letter changes depending on the position in a word
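The positional shapes can also be inspected programmatically, because Unicode keeps compatibility code points for the isolated, initial, medial and final forms in the Arabic Presentation Forms-B block. The Python sketch below looks them up for a given base letter using the standard `unicodedata` module; the function name is an assumption, and ordinary text normally stores only the base letter and leaves the shaping to the font.

```python
import unicodedata


def positional_forms(letter):
    """Map form names (isolated/final/initial/medial) to presentation glyphs."""
    wanted = f"{ord(letter):04X}"
    forms = {}
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for codepoint in range(0xFE70, 0xFF00):
        ch = chr(codepoint)
        # Decompositions look like "<initial> 0628"; empty if there is none.
        parts = unicodedata.decomposition(ch).split()
        if len(parts) == 2 and parts[1] == wanted:
            forms[parts[0].strip("<>")] = ch
    return forms


if __name__ == "__main__":
    for name, glyph in positional_forms("ب").items():
        print(name, glyph, unicodedata.name(glyph))
    # Alif does not join to the following letter, so fewer forms are listed.
    print(sorted(positional_forms("ا")))
```

Normalising with NFKC goes the other way, folding a presentation form back to its base letter, which may be useful when pasted text contains these code points and needs to be matched against a dictionary.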


The work to discover how we can overcome the letter-gobbling spell checking and the mispronouncing speech synthesis continues!

Insight into the issues for open source TTS in Arabic.

Over the summer the team have been investigating the issues around TTS in Arabic, and Edrees Abdu Alkinani has completed his MSc report, which has made interesting reading as it summarises many of the findings. It was noted that Arabic TTS synthesis did not have the early successes of European languages due to the limitations in Natural Language Processing (NLP) and the complexities of using diacritics as substitutes for vowel combinations. However, with the advances in NLP and Digital Signal Processing (DSP), plus the development of automatic diacritizers, progress has been made in the commercial world, where there are now several attractive Arabic synthesised voices, as will be seen in an evaluation to follow.

Issue No 1 – Lack of diacritics on web pages.

Arabic diacritics

The Learning Resource - Arabic language

English speakers may wonder at the reasons for the difficulties with Arabic TTS, but it does not take more than a cursory glance at the written language to understand that, with 14 different diacritic marks and 34 phonemes (28 consonants and only six vowels), the combinations may cause TTS problems. As Eedris pointed out… ” كُتُبْ ” means books and ” كَتَبَ ” means wrote – the only difference you will notice is the type of marks used above the letters.

English vowel sounds

TEFL world wiki - English vowel sounds

This compares with the 12 basic English vowel sounds, written with no accents or diacritics. We may complain about our odd pronunciation of some written words – rough, cough, though, thorough and through – but at least some of the letters are different and we cannot leave any out. Yet this is what is happening with written Arabic on the web – the diacritics are being left out….. Number one problem for a text to speech engine.
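The effect of leaving the diacritics out can be shown with the two example words: once the marks are removed, both collapse to the same three letters, so an engine has nothing in the text itself to tell them apart. The short Python sketch below uses the standard `unicodedata` module to drop combining marks; the function name is an assumption made for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


if __name__ == "__main__":
    books = "كُتُبْ"   # "books", fully marked
    wrote = "كَتَبَ"   # "wrote", fully marked
    print(strip_diacritics(books) == strip_diacritics(wrote))
    print(len(books), len(strip_diacritics(books)))
```

Going in the other direction, from bare letters back to the intended marks, is the hard part, and is the job of the automatic diacritizers mentioned earlier.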

Issue No 2 – The differences between the way the TTS is developed and the resulting output.

Research has shown that although there are now a few text to speech engines they are commercial and even these vary in quality.  The MBROLA project links to work carried out in the open source world, but at present it has been impossible to achieve success with the code offered in the various repositories for evaluation purposes.    However, Eedris has supplied the team with these comments based on the demonstrators offered by the various organisations and companies.

  1. MBROLA project
    MBROLA has two Arabic voices as a recorded audio file. The speed of speech is slow, and the quality poor. Moreover, the pronunciation is hard to understand – even for an Arabic speaker. The stress pattern is often incorrect and the distinction between words unclear. The most difficult words to understand have letters like “ أ” ‘A’, “ ض” ‘th’, “ ل” ‘L’.
  2. Acapela Group
    Acapela offers two good quality male and female voices. The pronunciation for words with and without diacritic marks is understandable, with accurate stress patterns. There are three letters which appear to cause some difficulty: “ ج” ‘j’, “ ا” ‘a’, “ ك” ‘k’. The pronunciation of numbers in all situations is good.
  3. Nuance Vocalizer
    Nuance provide a very clear male voice with clear pronunciation. The only problem is that the system produces speech without taking into account diacritics. Words which have letters like “ ق” ‘q’, “ ش” ‘sh’, and “ ض” ‘th’ may cause problems but the speed of speech used in the online demo is good. Numbers are not clearly enunciated due to the lack of diacritics.
  4. Loquendo
    Loquendo offer a recording of a male and female voice on their site, as the Arabic voice has only been available since October 2010. The system has good sound quality and clear speech. The example on the website has diacritic marks, but as it is a small sample it is hard to judge the overall quality, although it appears to be good.


Issue No 3 – Further Development of eSpeak with Arabic.

The current version of MBROLA does not appear to run with the Arabic voice files and there seem to be very few people who have had success. So this is work in progress…


Recent research by Mashael AlKadi using an ATBar simulation.

I have just read an extremely interesting report by Mashael that looks into the issues around creating an Arabic speech recognition module for the ATbar and ATKit.  The report has a very useful analysis about the tools available and some important considerations which we will cover in more detail in the future.

Mashael collected data from 41 Arabic-speaking post-graduate, under-graduate and secondary school students. In brief, the results showed that this group of users tended to browse for text (44%) and multimedia content (42%), with only 14% using games, shopping, social networks etc. Few seemed to know or use offline services (90%), and this was commented upon in the conclusion as being a useful way of working with the toolbar when offline that should be considered in a similar way to the Silverlight approach – saving useful dictation results or working with forms at a later date.

Speech recognition command and control was not felt to always be useful and the group surveyed did not specify a need due to a disability, in fact 80% said they were happy to use the mouse and keyboard for browser control.  However, 35 of the users said they would use speech recognition for language learning, 20 selected translation, 16 school work, 15 web activities and 10 for work based reports.  High accuracy rates were required (90%) with the use of diacritics, despite the fact that these can cause problems for those with visual impairment and for the elderly.  61% felt that it would be useful to save dictated data for re-use.

Other research that Seb found showed that only 1% of websites are available in Arabic and Mashael found that 44% of her participants wanted to be able to use both English and Arabic for data entry and over half (59%) wanted to have text to speech to read back content.  They appeared to require accuracy over a large vocabulary in terms of speech dictation and its use on the web.

Although several users of the prototype ATbar shown by Mashael in the video below wanted extra features, most were happy with the basic version and were content with the design and core functionality. Mashael highlighted the usefulness of the kit approach with the introduction of a Braille API and the need for a flexible approach to language support.