Update on the Wiktionary issues for the Arabic ATbar dictionary

In the last blog post, Nawar mentioned the issues we are having with the Arabic version of Wiktionary and the way it presents definitions and alternative words when text is selected on Arabic web sites. The Arabic Wiktionary pages do not appear to be as well organised as the English ones. They are incomplete, and lookups often return incorrect results or no results at all.

Arabic wiktionary homepage

In a previous blog post we showed a diagram that highlighted the importance of organising the stems related to words alongside the definitions taken from Wiktionary. The way the words are presented with their changing meanings is important, and Maraim and Nawar have been discussing the use of crowd sourcing to achieve a successful outcome. This is not something that can be done immediately if we want a useful dictionary that makes the most of open source software alongside content that is also open and accessible to all.

Maraim has written a blog post about the subject in Arabic. She explains the concept of crowd sourcing and gives examples of three dictionaries that have used this technique to gather data: Lingoz, Wordia and Collins.

6 thoughts on “Update on the Wiktionary issues for the Arabic ATbar dictionary”

  1. E.A. Draffan Post author

    I have been testing Wiktionary and it is interesting to see the differences between the Arabic pages and the translated ones. If you go to the menu and pick another language for the same word, you may find it is categorised in a very different way, so it is not just a straight translation.

    1. EA Draffan

      The real issue is the way the three-letter root is looked up in the older dictionaries: you not only have to know the root but then go to its last letter. The dictionaries were developed for poets, as that is how the literary culture was built. Having found the root, you then have a series of characters and diacritics that are added to show whether the word is going to be a verb, noun etc., or even the subject or object within a sentence. The root letters can be found anywhere within the word you have chosen; they are not necessarily next to each other when they become other words. So the 28 letters of the Arabic alphabet can be made up into basic roots that are then changed, through a series of so-called ‘templates’, to suit the meaning. Experts are trying to develop easier dictionaries as we explore this issue, and we hope to make contact with them via our students. At present, though, there is nothing else online that has an API like Wiktionary, so this seems to be the best route for improvement, with the addition of stems.
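
      To make the point about scattered root letters concrete, here is a minimal Python sketch. It is not part of ATbar or Wiktionary: the function names are hypothetical, and the test it applies (root letters appearing in order, with gaps allowed) is only a rough necessary condition, so it will also accept words that merely happen to contain those letters. The last function mimics "go to its last letter" as a simple sort key, which is an assumed simplification of how the older dictionaries are arranged.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Drop combining marks such as the short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def contains_root(word: str, root: str) -> bool:
    """Return True if the root letters occur in `word` in order, gaps allowed."""
    letters = iter(strip_diacritics(word))
    # `ch in letters` consumes the iterator up to the match, so order is enforced.
    return all(ch in letters for ch in root)


def last_letter_key(root: str) -> tuple:
    """Sort key that files a root under its final letter first (assumed simplification)."""
    return (root[-1], root[:-1])


root = "كتب"  # k-t-b, the root associated with writing
for word in ["كِتَاب", "مكتب", "كاتب", "مدرسة"]:
    # kitab (book), maktab (office), katib (writer), madrasa (school)
    print(word, contains_root(word, root))

# Roots ordered by their last letter rather than their first.
print(sorted(["علم", "درس", "كتب"], key=last_letter_key))
```

      A real stemmer would need the templates themselves to rule out false matches; this sketch only shows why a plain prefix or substring lookup is not enough for Arabic.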

  2. davidbanes

    I’d be interested to know which crowd sourcing projects you think have been really successful in our field. I haven’t really seen a successful accessibility crowd sourced project (such as fixtheweb) but wonder if I have missed something.

  3. E.A. Draffan Post author

    Magnus and I were saying exactly the same yesterday and this is the problem! Dr Mike Wald has talked about it and used these methods when asking groups of students to annotate lectures on Synote. I noticed this article on “Crowdsourcing the components of accessibility” that also discusses the issues from that point of view. There have been articles on the subject in relation to disability and support, and in a way all our mailing lists are a type of crowd sourced collection of ideas and strategies, but there has to be some understandable gain for the contributors. Here is a paper about the way an app could be used for crowd sourcing fashion ideas for those who are blind: “Crowdsourcing subjective fashion advice using VizWiz: challenges and opportunities”.

    Just as a small update: Nawar has been working on using a free dictionary and then possibly allowing people to add to it. I think this is much more realistic, but it will be interesting to see if Maraim and Nawar can also get help from colleagues to check how easy the dictionary is to use and whether additions make a difference.

  4. davidbanes

    I think crowd sourcing / contributing can work once sufficient core resources are available, so adding symbols to an existing corpus is likely to be more successful than starting from scratch. My experience is that a crowd needs some clear direction upon which to work, otherwise it is likely to lose interest.
