This post is a big thank you to Mashael Alkadi in Saudi Arabia for discovering another corpus of spelling errors in Arabic. Thanks also go to Areeb Alowisheq, here in the lab, for helping us to understand the differences between the list we had and the new lists.
Seb has been able to access files provided by the Galtawi project. This has allowed us to experiment with improvements to the spell checker.
The original list of 71,000 misspelled words appears to produce a wide range of suggested words, because each suggestion is based on the nearest possible correction. In contrast, the 120,000 words with suffixes and prefixes that will be added all have exact matches to their corrections. We hope this will improve error correction, but we need to test it with a series of paragraphs.
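To make the difference between the two kinds of list concrete, here is a minimal sketch of the idea, not the ATbar code itself. It assumes a hypothetical `exact_corrections` table that maps a known misspelling straight to its correction, and falls back to a nearest-match search over a word list when the misspelling is not in the table. The function name, the tiny word lists and the use of Python's `difflib` for the nearest match are all our own assumptions for illustration.

```python
from difflib import get_close_matches
from typing import Dict, List


def suggest(word: str,
            exact_corrections: Dict[str, str],
            dictionary: List[str],
            n: int = 3) -> List[str]:
    """Return correction suggestions for a misspelled word.

    A known misspelling gives exactly one suggestion (the exact match).
    Anything else falls back to a similarity search, which can return
    several candidate words.
    """
    if word in exact_corrections:
        # Exact match: a single, unambiguous correction.
        return [exact_corrections[word]]
    # Nearest match: up to n similar dictionary words, best first.
    return get_close_matches(word, dictionary, n=n, cutoff=0.6)


# Toy data, invented for this example only.
exact_corrections = {"مدرسه": "مدرسة"}
dictionary = ["مدرسة", "مدرس", "درس"]

# Known misspelling: one suggestion from the exact-match table.
print(suggest("مدرسه", exact_corrections, dictionary))
# Unknown misspelling: several candidates from the nearest-match search.
print(suggest("مدرسسة", exact_corrections, dictionary))
```

In this sketch the exact-match table is consulted first, so a larger table means fewer words reach the nearest-match search, where the range of suggestions is wider.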
The paragraphs will contain around 100 common errors. They will be tested first against the Arabic Microsoft Word spell checker, then against the current version of the ATbar spell checker, and finally against the latest version of the spell checker, to see whether any improvements have been achieved.
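The comparison could be scored along the following lines. This is only a sketch under our own assumptions: each checker is treated as a function that takes a misspelled word and returns a list of suggestions, and results noted by hand (for example from Microsoft Word) are wrapped in a lookup table. The names `score_checker`, `from_recorded` and `compare` are hypothetical and not part of any of the tools mentioned.

```python
from typing import Callable, Dict, List, Tuple

# A checker takes a misspelled word and returns its suggestions, best first.
Checker = Callable[[str], List[str]]


def from_recorded(results: Dict[str, List[str]]) -> Checker:
    """Wrap suggestions recorded by hand so they can be scored like a checker."""
    return lambda word: results.get(word, [])


def score_checker(checker: Checker,
                  cases: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often the expected correction is first, or appears at all."""
    top1 = 0
    listed = 0
    for error, expected in cases:
        suggestions = checker(error)
        if suggestions and suggestions[0] == expected:
            top1 += 1
        if expected in suggestions:
            listed += 1
    return {"total": len(cases), "top1": top1, "listed": listed}


def compare(checkers: Dict[str, Checker],
            cases: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Score every checker on the same list of (error, expected) pairs."""
    return {name: score_checker(c, cases) for name, c in checkers.items()}
```

Running all three checkers over the same list of errors and expected corrections keeps the comparison like for like, and counting both first-place and anywhere-in-the-list hits shows whether a new version improves the ranking as well as the coverage.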
Watch this space for the outcome!