Category Archives: Arabic Word Prediction

Word Prediction update

During the summer we noticed that AIType was cutting off the initial letters when used with Internet Explorer and then we had a complete collapse of the service for a short time. This caused some concern and when we tested the Windows desktop version of the software we could not reach the servers as quickly as expected.

AIType were amazingly quick in reassuring us that they had had some server issues but that these were immediately rectified. When we checked our own servers, the problems turned out to have occurred just as Magnus was updating them as well.

Happily the speed issues have been resolved and so have the problems that were occurring with Internet Explorer. Both the Arabic and English word prediction are working well at the moment, as illustrated in the graphics captured whilst writing this blog. I have been using the HTML view in WordPress. If you use the Visual mode in rich text editors, the dialog box with the word selections tends not to appear, possibly because the JavaScript edit box overrides the word prediction, whereas a simple edit box always works. As many of the menu items on the rich text editor toolbar are not accessible via the keyboard, this may not be an issue for those who do not use a mouse.

Word prediction used with a simple edit box - HTML mode in WordPress


The image shown is a test of the program in Arabic
Arabic word prediction

YouTube videos illustrating the ATbar features.

We have set up a series of YouTube videos that include:

Text resizing, font style changes and line spacing. This video has no audio but shows how a user can select the magnifier on the toolbar to enlarge text without resizing the graphics – this tends to allow for more readable text when compared to zooming in the browser with Ctrl and +, which also enlarges the graphics. However, this feature does not work when Flash has been used within a webpage or when fonts have fixed sizes or styles. The same applies to increased line spacing, which is also demonstrated.

YouTube link to the video

The second video demonstrates how the A.I.Type word prediction works as well as spell checking when writing a blog using WordPress.  Use the HTML mode when working in the edit box rather than the Visual mode and then you will also be able to use the text to speech to aid proof reading.


YouTube link to the video

The last video demonstrates the use of text to speech with the Acapela voice in both Arabic and English.


YouTube link to the video

ATBar Word Prediction and Text to Speech working in text boxes

Arabic word prediction with keyboard access

Seb has enabled the AIType word prediction with keyboard access and text to speech for simple text boxes in his recent updates to the toolbar for both Arabic and English.

The Word Prediction button needs to be selected before entering text. It is possible to use the ‘esc’ key to ignore a prediction and close the dialog box, or to use Ctrl+Alt and the word position as a number to insert the required word.
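The keyboard behaviour described above can be sketched in a few lines. This is only an illustration and not the ATbar code: the function name `resolve_key`, the key names and the assumption that word positions are counted from 1 are all hypothetical.

```python
from typing import List, Optional, Tuple


def resolve_key(
    predictions: List[str], key: str, ctrl: bool = False, alt: bool = False
) -> Tuple[str, Optional[str]]:
    """Decide what to do with a key press while the prediction box is open.

    Returns an (action, word) pair where action is "close", "insert" or "ignore".
    Hypothetical sketch: positions are assumed to start at 1.
    """
    if key == "Escape":
        # 'esc' ignores the predictions and closes the dialog box.
        return ("close", None)
    if ctrl and alt and len(key) == 1 and key in "123456789":
        position = int(key)
        if position <= len(predictions):
            # Ctrl+Alt+<number> inserts the word at that position in the list.
            return ("insert", predictions[position - 1])
    # Any other key is left for the text box to handle as normal typing.
    return ("ignore", None)
```

For example, with the predictions `["word", "world", "work"]`, pressing Ctrl+Alt+2 would return `("insert", "world")`, while a number beyond the end of the list is simply ignored.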

Word Prediction in WordPress

We have found that the prediction and text to speech work with the HTML views of text boxes in WordPress and Blogger, but not with the Visual mode, which overrides the ATbar.

The text needs to be highlighted before the text to speech button is selected.  There may be a pause before you hear the speech.

Updates on the progress on Arabic spell checking, TTS, Word Prediction and the ATKit

The last few weeks since the Christmas break have flown by in a flurry of activity which, in retrospect, has at times made us feel as if we were going two steps forward only to go at least one, if not more, steps backward! But there have been some breakthroughs in the areas of Spell Checking, Text to Speech, Word Prediction and the ATKit website.

Spell Checking

Thanks to Mashael AlKadi we have a really clear evaluation of the spell checker titled Dyslexic Typing Errors in Arabic (PDF download) and also thank you to Mina Monta who commented that:

  • “Some of the words are correct in spell & in the meaning but AT spell checker detect that those are wrong words
  • In the suggested word list, there is no sorting according to the priority of the suggested word (according to the relativity between the suggested word & the original wrong word)
  • Some of the suggested words are wrong in spell
  • The number of the suggested words is to high comparing with MS Word spell checker.
  • MS Word is better in detecting the wrong words in grammar (the word has correct spell)”
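Two of the comments above, sorting suggestions by how close they are to the mistyped word and limiting how many are shown, can be illustrated with a small sketch. It is not the ATbar spell checker: the function name `rank_suggestions` is hypothetical, the similarity measure is the one built into Python's standard `difflib` module, and English words are used only to keep the example readable (an Arabic word list would be passed in the same way).

```python
import difflib
from typing import List


def rank_suggestions(
    misspelled: str, candidates: List[str], limit: int = 3, cutoff: float = 0.6
) -> List[str]:
    """Return at most `limit` candidates, most similar to `misspelled` first.

    Candidates scoring below `cutoff` (0 to 1) are dropped altogether,
    which keeps the suggestion list short.
    """
    return difflib.get_close_matches(misspelled, candidates, n=limit, cutoff=cutoff)


suggestions = rank_suggestions(
    "adress", ["banana", "across", "dress", "address", "adverse"]
)
print(suggestions)
```

A similarity score based only on characters would not address the remark about words attached to prepositions, so this should be read as a starting point for discussion rather than a fix.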

Sadly, research into English spell checkers has revealed that they are not as accurate as we had hoped when it comes to false errors and to real-word errors or homophones, as can be seen from this presentation about online spell checking.

I asked Mashael whether adding a new corpus would help, as Seb has succeeded in collecting a larger Arabic corpus and has put in some code to make it possible to add this extended vocabulary. However, Mashael’s comment was:

“regarding adding new words, do you mean expanding the tool’s dictionary? I don’t think you should worry beacuse it was working very well expect for certain remarks that I’ve said such as the tool’s behavior with words attached to prepositions. In such case only some adjustments should be applied to the tool’s mechanism and I think it will work great.”

So with the support of Erik and Mina in our last meeting, it has been decided that we will work on particular improvements as a future aim with the help of our Arabic speaking colleagues.

Text to Speech

It has been a period of trial and error, starting with the withdrawal of Google Translate. We were aware this might happen, but had rather hoped there could be a reprieve as this was a free option, although in the tests carried out with 5 Arabic-speaking students the results were poor in comparison to the Acapela and Vocalizer voices. It is also sad to lose the time spent on this work, as it had proved that a free TTS on the toolbar was possible to achieve. The Microsoft Speak method was also tried and tested, but the TTS appeared to leave off initial sounds and the voice was unacceptable to our beta testers.

We also learnt that NVDA in Arabic was only going to work with the Arabic TTS offered by Microsoft and eSpeak, and that Festival with the Mbrola project was still an uphill struggle.

As a research project, and definitely not for profit, we also wondered if we could go back to Google Translate, but the agreement specifically says “The program may be used only by registered researchers and their teams, and access may not be shared with others.”

Meanwhile Fadwa Mohamad kindly visited King Abdulaziz City for Science and Technology (KACST) over the Christmas period and met Professor Ibrahim A. Almosallam, who has been in touch to say that they are developing an Arabic Text to Speech application, but it has yet to be released. I am enquiring as to whether this is a desktop application or a VAAS (Voice as a Service) system such as that offered by Acapela in Arabic.

Seb then spent time working on the Acapela VAAS system and this was shown to work well in all the tests although there are issues when a whole page is read out.  It is felt that it might be more appropriate to restrict the call on the servers and just allow text to be highlighted and then spoken.  We now have to negotiate the way we can work with this system, as the final output needs to be free to the user.

There is also the option of building a new Arabic voice and this is being explored, although it would take time and effort to generate the corpus, normalise the output and beta test, even when there are engines available to achieve this aim. A newly built Arabic voice needs further discussion, but we have the connections in place.

WordPrediction

Seb has been able to show that this feature for the toolbar is possible in English, and the background architecture is in place for the Arabic version pending the language pack.

ATKit website

It has been agreed that the mock-up of the ATKit website that was available as a demonstrator should be taken forward and developed. This has been completed with the ability to add plugins, both free and those that require payment (for instance where a TTS requires a fee). Users can register, build their own toolbar and save the results. The next step is a completed Arabic translation and the ability to author plugins …

Arabic ATKit

About the font types and ATbar translation

Hello everyone,

Regarding suitable font types for Arabic readers with dyslexia, I found some papers reporting that we should try to avoid the angular types (e.g. Koufi and Andalus) and that we can use Arabic Transparent and Simplified Arabic Fixed.

Regarding the translation of the ATbar labels, I have uploaded two files: “ATbar translation” and “ATbar translation with diacritics”.

Thank you.

Regards

Fadwa

Arabic Word Prediction

Modelling text prediction systems in low- and high-inflected languages

Abstract – Text prediction was initially proposed to help people with a low text composition speed to enhance their message composition. After the important advancements obtained in the last years, text prediction methods may nowadays benefit anyone trying to input text messages or commands, if they are adequately integrated within the user interface of the application. Diverse text prediction methods are based in different statistic and linguistic properties of natural languages. Hence, they are very dependent on the language concerned. In order to discuss general issues of text prediction it is necessary to propose abstract descriptions of the methods used. In this paper a number of models applied to text prediction are presented. Some of them are oriented to low-inflected languages while others are for high-inflected languages. All these models have been implemented and their results are compared. Presented models may be useful for future discussion. Finally, some comments related to the comparison of previously published results are also done.

TurboType

Turbo Type adds the word prediction feature to all text editors. This is not just about selecting words from a dictionary. Turbo Type is able to predict and select the word you are most likely to type. You can add your own words to the dictionary if you have technical or specific terms that you use often. Also with Turbo Type you can expand a small word into a complex text. You can sign your email in a snap! This program is highly customizable.

Project:Possibility Word Predictor

Overview

The goal of this software is to allow users with limited mobility to input text into a computer more accurately and quickly. The software is intelligent enough to offer suggestions to the user based on context and the characters entered so far.

To implement the word-level prediction, the application reads input text files, which can be all the documents on the user's local machine, all their emails, or any other text that helps the application to initiate word-level contextual prediction.

The application maintains three sorted maps, which store the following as keys:

  1. A unigram map: This map stores individual words in the input text
  2. A bigram map: This map stores consecutive words in the input text
  3. A trigram map: This map stores 3 consecutive words in the input text.

Initially the user enters some input text (a word or two). As the user enters the letters of the words, the application presents some predictions. After entering a word or two, the user can call the application to make predictions based on the usage context. The application crawls over the maps to present a set of predictions based on the bigrams and trigrams that the user has provided as input text. The user can accept or reject the suggestions. In both cases, the user's input is fed back into all the maps so that the context-based prediction becomes more refined.
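The approach described above can be sketched roughly as follows. This is a minimal illustration and not the Project:Possibility code: the class and method names are hypothetical, and plain Python counters keyed by the preceding words stand in for the sorted maps, with candidates ranked by how often they have been seen.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NGramPredictor:
    """Hypothetical sketch of unigram, bigram and trigram word prediction."""

    def __init__(self) -> None:
        self.unigrams: Counter = Counter()
        # Bigram counts: previous word -> counts of the word that followed.
        self.bigrams: Dict[str, Counter] = defaultdict(Counter)
        # Trigram counts: previous two words -> counts of the word that followed.
        self.trigrams: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def learn(self, text: str) -> None:
        """Add the words of `text` to all three maps."""
        words = text.lower().split()
        for i, word in enumerate(words):
            self.unigrams[word] += 1
            if i >= 1:
                self.bigrams[words[i - 1]][word] += 1
            if i >= 2:
                self.trigrams[(words[i - 2], words[i - 1])][word] += 1

    def complete(self, prefix: str, limit: int = 3) -> List[str]:
        """Suggest known words that start with the letters typed so far."""
        prefix = prefix.lower()
        matches = [(w, c) for w, c in self.unigrams.items() if w.startswith(prefix)]
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _ in matches[:limit]]

    def predict_next(self, context: List[str], limit: int = 3) -> List[str]:
        """Suggest the next word, trying trigrams first and then bigrams."""
        context = [w.lower() for w in context]
        candidates: Counter = Counter()
        if len(context) >= 2:
            candidates = self.trigrams.get((context[-2], context[-1]), Counter())
        if not candidates and context:
            candidates = self.bigrams.get(context[-1], Counter())
        ranked = sorted(candidates.items(), key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _ in ranked[:limit]]


predictor = NGramPredictor()
predictor.learn("the cat sat on the mat the cat ran")
print(predictor.complete("c"))        # letters typed so far
print(predictor.predict_next(["the"]))  # context of one word
```

Whatever the user finally types can be passed back through `learn`, which mirrors the feedback step described above. Reading a large set of documents this way at start-up may also be one reason for the slow initialisation time listed under the current issues.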

Features

  • A demonstration of a state of the art word prediction technology which can be applied to other applications in the future

Current Issues

  • Slow startup/initialization time

Future Plans

  • Code cleanup
  • Integrate with non-OS specific “on-screen keyboards”
  • Ability to import impure text formats such as .doc and .html
  • GUI for importing files
  • Offer word predictor as an XML-RPC service

References