Debating multiword phrases and core vocabularies

“Several major transitions in language use take place during the first 5 years of life. Each transition allows the child to move to a higher level of complexity of expression and to accomplish communicative goals more flexibly and precisely than was done at the previous level. At least three of these transitions appear to be modulated to some degree by speech. In the first transition, prelinguistic to early linguistic communication, babbling provides the infant with a prelinguistic form of vocal behavior that is in many ways analogous to language. A second transition takes place in the movement from single words to multiword combinations. In the process of this transition, word order becomes a means by which children convey semantic role information, and transitional forms such as successive one-word utterances help to facilitate the child’s leap from single-word speech to multiword sentences.” (Paul, 1997)

We are aiming to build a dictionary database to collect the initial core vocabularies in Arabic and English. This will help us decide which symbol sets to use and how much work there will be when it comes to adapting them for localisation in Qatar. Once we have made that decision, we can add all the symbols from the set or sets if they have been produced with a Creative Commons licence, providing the right attributes. Finally, the aim is to start voting on how acceptable each symbol is in terms of language and culture, as well as:

  • translucency (How appropriate is a proposed symbol for a suggested meaning?) (Bloomberg et al. 1990),
  • guessability (Can subjects guess the intended meaning of a symbol?) (Hanson & Hartzema 1995, Dowse & Ehlers 2001, 2003), and
  • iconicity (How distinctive are the symbols?) (Haupt & Alant 2003).

whilst beginning the adaptation process for both core and fringe vocabularies.
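
As a rough illustration of what one record in such a dictionary database might hold, here is a minimal Python sketch. The field names, the 1 to 5 voting scale and the example values are all assumptions made for illustration; they are not the project's actual schema.

```python
from dataclasses import dataclass, field
from statistics import mean

# Criteria taken from the list above; "acceptability" stands for the
# language-and-culture vote. The 1-5 scale is an assumption.
CRITERIA = ("acceptability", "translucency", "guessability", "iconicity")


@dataclass
class SymbolEntry:
    """One symbol linked to one word (hypothetical record layout)."""
    word: str          # e.g. "put"
    language: str      # "en" or "ar"
    symbol_set: str    # e.g. "ARASAAC"
    licence: str       # licence attribute stored with the symbol
    vocabulary: str    # "core" or "fringe"
    votes: dict = field(default_factory=lambda: {c: [] for c in CRITERIA})

    def vote(self, criterion: str, score: int) -> None:
        """Record one vote for one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.votes[criterion].append(score)

    def average(self, criterion: str):
        """Mean score for a criterion, or None if nobody has voted yet."""
        scores = self.votes[criterion]
        return mean(scores) if scores else None


# Illustrative values only, not real voting data.
entry = SymbolEntry("put", "en", "ARASAAC", "Creative Commons", "core")
for score in (4, 5, 3):
    entry.vote("translucency", score)
print(entry.average("translucency"))
```

Keeping the votes per criterion, rather than as a single overall score, would let a symbol that is easy to guess but culturally unsuitable be spotted separately from one that is acceptable but hard to guess.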

We know that when building a dictionary that encourages dialogue, it is often the verbs (doing words), together with question words such as What? and Where?, that provide the main part of early conversation, rather than just the user-specific nouns. So it is important to think about how the symbols will develop when it comes to multiword expressions (MWEs), as described in Multiword Expressions: A Pain in the Neck for NLP.

For instance, how many symbols should we add for a core vocabulary when we add the verb ‘put’? Should this word be linked with symbols for ‘put on’, ‘put off’, ‘put under’, ‘put over’ and ‘put out’? ‘Put’ is found in the English core vocabulary, but are all those phrases? Sometimes there are over 16 symbols for one verb in the present tense.

Figure: Picto-Selector showing some ARASAAC and Sclera symbols for multiword phrases with ‘put’

It also happens in Arabic, as has been discovered by Amatullah when collecting data for the core vocabularies …

While I was taking down some word lists I noticed that ‘play’ was a common word with one group and then with another group they were commonly using these images (each one separate, and each its own picture) for ‘play’, e.g. ‘play with the ball’, ‘play on the bike’, ‘play with the toys’.


The debate seems to be swaying in favour of having all options, but where a “word occurs in our core vocabulary then we make a generic symbol for it so it can be combined with other symbols to make phrases UNLESS there is a commonly used phrase which falls in our core vocab. In this case we have one picture to represent the phrase.” (Amatullah)
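
The rule Amatullah describes can be sketched as a longest-match lookup: try the whole phrase first, and only fall back to combining generic symbols when no phrase symbol exists. The Python below is a minimal sketch under assumptions. The symbol identifiers are invented, and treating ‘put on’ as a core phrase is an assumption made purely for illustration.

```python
# Hypothetical lexicon: keys are core words or core phrases, values are
# made-up symbol identifiers (real ones would come from the symbol sets).
CORE_SYMBOLS = {
    "put": "sym_put",
    "on": "sym_on",
    "play": "sym_play",
    "put on": "sym_put_on",  # assumed here to be a commonly used core phrase
}


def symbols_for(utterance, lexicon=CORE_SYMBOLS, max_len=3):
    """Map an utterance to symbols, preferring the longest known phrase.

    Words with no entry (for example fringe vocabulary) yield None so the
    gap stays visible instead of being silently dropped.
    """
    words = utterance.lower().split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                result.append(lexicon[phrase])
                i += n
                break
        else:
            result.append(None)  # no symbol found for this word
            i += 1
    return result


print(symbols_for("put on"))     # one phrase symbol
print(symbols_for("play on"))    # two generic symbols combined
print(symbols_for("put under"))  # generic 'put' plus a gap for 'under'
```

With this approach, adding a new phrase symbol to the lexicon changes the result for that phrase without touching the generic symbols, which matches the idea of keeping all options available.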

If this is the case, then when we select the symbols we will need to allow for the fact that many include a noun from the fringe vocabulary as part of the image. For example, for ‘see television’ or ‘look at’, the symbol shows a man looking at the television, and it is categorised under action verbs as well as nouns and as part of the home environment.

Multiword expressions appear in all languages. If one is considering Natural Language Processing technologies as a way of supporting access to symbols for conversational and written communication strategies, then, as Multiword Expressions: A Pain in the Neck for NLP (Sag et al. 2002) points out, there may be problems to overcome related to ‘overgeneration, idiomaticity, flexibility and lexical proliferation.’


Battle, D. E. (Ed.) (2012). Communication disorders in multicultural and international populations. Mosby, an imprint of Elsevier Inc.

Bloomberg K, Karlan GR, Lloyd LL (1990): The comparative translucency of initial lexical items represented by five graphic symbol systems and sets, Journal of Speech and Hearing Research 33, 717–25.

Dowse R, Ehlers MS (2001): The evaluation of pharmaceutical pictograms in a low literate South African population, Patient Education and Counseling 45, 87–99.

Dowse R, Ehlers MS (2003): The influence of education on the interpretation of pharmaceutical pictograms for communicating medicine instructions, International Journal of Pharmacy Practice 11, 11–18.

Hanson EC, Hartzema A (1995): Evaluating pictograms as an aid for counseling elderly and low-literate patients, Journal of Pharmaceutical Marketing and Management 9(3), 41–54.

Haupt E, Alant E (2003): The iconicity of picture communication symbols for rural Zulu children, South African Journal of Communication Disorders 48, 45–54

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Computational Linguistics and Intelligent Text Processing: Third International Conference, CICLing-2002, Lecture Notes in Computer Science.

Netzer, Y. (2006). Semantic Authoring for Blissymbols Augmented Communication Using Multilingual Text Generation, (November).

Paul, R. (1997). Facilitating transitions in language development for children using AAC. Augmentative and Alternative Communication, 13(3), 141–148. doi:10.1080/07434619712331277958

Van Tatenhove, G. M. (2007). Normal Language Development, Generative Language & AAC, (October), 1–11.