Monthly Archives: December 2011

Testing for Arabic spelling errors

Once again, thanks to Areeb's help, we have been uncovering the issues around Arabic spell checking, even in MS Word, which has been our comparator for the toolbar spell checker. Areeb constructed a Word document with Spelling Mistakes so that we could test it first against the Microsoft Arabic spell checker, then against the present toolbar spell checker, and finally against the new corpus once it is uploaded.

Areeb has already made several useful comments:
“There are several issues:

MS Word false positives: words that Word flags as mistakes but that are actually correct.

  • Some rarely used names, which it is reasonable for MS Word to flag
  • Words that should not be flagged: in the doc I sent, بها was considered a mistake by Word 2007 although it is absolutely correct.

MS Word false negatives:

  • Mistakes undetected
  • A mistake that would change a word into another correct word

This happens more often than in English, I think. I realised this when I was trying to force mistakes: sometimes I had to try several times to misspell a word, mimicking common spelling mistakes, and MS Word would still consider it correct. The result is a correct word, just not in that context.

  • Words that should be flagged but aren’t, like ذالك, which should be ذلك

This is a common error, and I believe it should be flagged.”

 

Arabic ATkit 1st paragraph mistakes


It would be very helpful if other Arabic speakers could run the MS Word spell checker over our Spelling Mistakes document and note the types of error it catches. Then connect to the ATbar2 site, delete the present text in the edit box, select the spell checker on the launched toolbar, and copy and paste the document in sections to see whether the same results occur. Please leave comments on the blog. Once the new corpus is in place we will update the present version of the ATbar and review any changes that result.

Thank you for your help in this project – best wishes over this holiday period and for the New Year.

Additions to the ATbar in Arabic spell checker

This post is to thank Mashael Alkadi in Saudi Arabia for discovering another corpus of spelling errors in Arabic. Thanks also go to Areeb Alowisheq, here in the lab, for helping us to understand the differences between the list we had and the new lists.

Seb has been able to access files provided by the Galtawi project.  This has allowed us to experiment with improvements to the spell checker.

The original list of 71,000 words with errors appears to produce a wide range of suggestions, because each suggestion is based on the nearest possible correction. The 120,000 words with suffixes and prefixes that will be added, by contrast, all have exact matches to corrections. We hope this will improve error correction, but we need to test it with a series of paragraphs.
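To make the difference between exact-match and nearest-match correction concrete, here is a minimal Python sketch. It is not the ATbar implementation: the two small word lists and the `suggest` function are hypothetical stand-ins, and the standard library's `difflib.get_close_matches` is used here only as a simple substitute for whatever nearest-correction method the toolbar actually uses.

```python
from difflib import get_close_matches

# Hypothetical stand-ins for the real word lists (assumptions, not project data).
# Exact error -> correction pairs, in the spirit of the new lists.
EXACT_CORRECTIONS = {"ذالك": "ذلك"}
# Known-good vocabulary used for the nearest-match fallback.
VOCABULARY = ["ذلك", "كذلك", "بها"]


def suggest(word):
    """Return a list of suggested corrections; empty if the word is known."""
    if word in VOCABULARY:
        return []
    if word in EXACT_CORRECTIONS:
        # Exact match: a single, unambiguous correction.
        return [EXACT_CORRECTIONS[word]]
    # No exact match: fall back to the nearest vocabulary words,
    # which can yield several candidates instead of one.
    return get_close_matches(word, VOCABULARY, n=3, cutoff=0.6)
```

In this sketch a listed error such as ذالك comes back with exactly one correction, while an unlisted misspelling may come back with several near neighbours, which is the "large range of words" effect described above.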

The paragraphs will contain around 100 common errors. They will be tested first against the Arabic Microsoft Word spell checker, then against the present version of the ATbar spell checker, and finally against the latest version of the spell checker, to see whether any improvement has been achieved.
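One way to tabulate such a comparison is sketched below in Python. It assumes each error in a test paragraph is recorded by its word position, and that each spell checker's output has been noted by hand as the set of positions it flagged. The function names, checker labels and sample positions are all hypothetical and are not real test results.

```python
def score(gold_error_positions, flagged_positions):
    """Compare the positions a checker flagged with the known error positions."""
    gold = set(gold_error_positions)
    flagged = set(flagged_positions)
    return {
        "detected": sorted(gold & flagged),          # real errors that were flagged
        "false_positives": sorted(flagged - gold),   # correct words that were flagged
        "false_negatives": sorted(gold - flagged),   # real errors that were missed
    }


def compare(gold_error_positions, flagged_by_checker):
    """Score several checkers against the same set of known errors."""
    return {
        name: score(gold_error_positions, flagged)
        for name, flagged in flagged_by_checker.items()
    }


# Illustrative input only: position 0 holds a real error, position 1 a correct word.
example = compare({0}, {"checker_a": {1}, "checker_b": {0}})
```

Recording positions rather than the words themselves matters for Arabic, because a misspelling that happens to form another valid word can only be judged in context.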

Watch this space for the outcome!