Tuesday, January 17, 2012

The Origin and Evolution of Languages – II: Bio-informatic Approach

From a linguist’s perspective, there is no reason to assume that words behave the way organisms do. The legendary linguist Prof. Noam Chomsky of the Massachusetts Institute of Technology believes that (a) the evolution of spoken language does not follow biological evolution, because spoken language was not present in apes and originated suddenly in humans due to a mutation, (b) genetic clues unravel events that are much older than those mapped by linguistics, and (c) invasions and political revolutions in certain regions could have completely wiped out or modified the languages of those regions.
Psychologist David Premack, emeritus professor of psychology at the University of Pennsylvania, is famous for having taught language to chimpanzees. Premack has shown that chimpanzees can understand language but cannot produce it. The epigraph at the head of the webpage for the ‘Third Conference on Evolution of Languages’ carried this provocative statement by Premack:
The above statement neatly sums up the mood of the linguists. But look at the map of human migration generated by molecular biologists and compare it with the distribution of the various language families in the world.
A map of early human migration (from the Wikimedia Commons). Figure shows number of years from present.

Human Language Families Map (from the Wikimedia Commons)

Linguists believe that the original language of humans has become extinct. They hold that it is difficult to establish relationships across the major families and that no grand evolutionary tree of languages can be formed. Languages totally unrelated to each other are spoken in geographically adjacent zones. From a biologist’s point of view, I have reasons to believe that all human beings carried a common mother tongue from their centre of origin in Africa. During the transmission of words from one generation to another, the words would have replicated and reproduced. Mutations in words would have occurred in much the same way as in genes. Replacement of the original letter by a wrong one would have given rise to phonetic variation (a la biological variation) in vocabulary, giving rise to synonyms. As branches and sub-branches of the original populations settled in different geographical territories along the path of human migration, geographic isolation would have cut off the chances of communication, leading to the speciation of new language forms.
However, some aspects of the evolution of language are difficult to explain by invoking either the classical theories of biological speciation or the theories of Chomskian linguistics. The 144-member Indo-European language family has a vast geographic reach. Cutting across all sorts of geographical barriers, the languages of the Indo-European family differ much less from one another than one would expect, say Bengali and Spanish. On the other hand, there are no equivalent barriers between the territories occupied by speakers of Iranian, an Indo-European language, and Arabic, an Afro-Asiatic language, or between Dravidian Tamil and Indo-European Oriya. We have to find a satisfactory explanation for the abrupt discontinuities between the Afro-Asiatic and Dravidian language families and the Indo-European language family.
There is no historical evidence for the alleged Aryan conquest of a Dravidian India that replaced speakers of Dravidian languages in North India with Indo-Europeans. In the last few years, research in molecular genetics has completely and convincingly demolished the Aryan Invasion Theory (AIT) proposed by the German linguist Max Muller in the 19th century. A complete understanding of the factors affecting the linguistic history of the world continues to elude, as well as excite, investigators and the public alike. M. H. Christiansen and S. Kirby, editors of the book ‘Language Evolution’ (2003), have termed the subject “the hardest problem in science”.

In the year 2002, I was supervising a student’s M.Phil. thesis on bioinformatics. We were performing genome analysis of the plant Arabidopsis using the Basic Local Alignment Search Tool, or BLAST (an algorithm for comparing primary biological sequences, such as amino acids in proteins and bases in DNA and RNA). While editing the thesis, I felt that it was time we applied BLAST to matching the sequences of letters in synonyms. I found that although linguists had been matching words in different languages, they were obsessed with their known theories of the origin of new words. I had no such binding. I was doing it for fun. I did not know the theories of linguistics, and I still don’t. I started doing BLAST for words manually as a hobby, and it had become a serious hobby by February 2004. By the end of 2010, I had matched the sequences of about 14,000 words (and the number is growing).
I was playing with words all these years. During this play, the words started to unravel the rules of the game: which letter will mutate to which one, in a manner similar to the game of bases in the genetic code. The multiple codes for a single amino acid were like synonyms. The letters in words wobbled like the bases in the genetic code. I found that, just like the bases in DNA, the letters in words never disobey those rules. After seven years, the words demanded my full attention, and I quit my job to play with them full time. I invite you to join me in this game! We shall play with the words and also discuss their behavior and evolution.
Welcome aboard the “DNA of Words”!
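For readers who want to try letter-by-letter matching of words on a computer, here is a minimal sketch in Python. It is not the manual procedure described in this post, and it is not BLAST itself: BLAST is a fast heuristic, while the sketch uses the simpler Smith-Waterman local alignment that BLAST approximates. The function name `local_align` and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen only for illustration; they do not encode any of the letter-mutation rules mentioned above.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two words.

    Returns (best_score, aligned_part_of_a, aligned_part_of_b).
    The scoring values are illustrative assumptions, not linguistic rules.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh alignment here
                score[i - 1][j - 1] + sub,  # align the two letters
                score[i - 1][j] + gap,      # letter of a against a gap
                score[i][j - 1] + gap,      # letter of b against a gap
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Illustrative word pair, not taken from the post.
    print(local_align("mother", "mater"))
```

A real study of word mutation would replace the flat match and mismatch scores with a table saying which letter pairs are likely substitutions, much as protein alignment uses a substitution matrix.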


  1. Amazing !! We can discuss your finding.
    I think every language has some restriction in sequence of sounds. Like in Bangla the way we pronounce a sound in a word depends on the neighbouring sounds, thus the deviation from Odiya and Sanskrit. This is to ensure that the tongue doesn't have to make difficult movements and neighbouring sounds are pronounced with very similar movements of the tongue. A common transformation will be of A (the first vowel) which is pronounced both as "awe" (awe-some, all, fall) and "o" (Odissa, Fore, Lore). May be similar "constraints" regulate how the word evolves.

    P.S. Aha! Now I know why you quit so early. You had a greater plan.

  2. Wow


    (Sir - please remove word verification for comments - it becomes difficult to comment)

    1. Thanks Shilpa. If you have liked this post, I am sure that you will like the post dated 26 Feb 2012: Search for the mother tongue of the world: Genes and words mutate in the same way पूरी दुनिया की मातृभाषा की खोज: जीन और शब्द एक तरह से बदलते हैं; Rules of the word game शब्दों के खेल के नियम.

      I have no control on the word verification tool. I suppose that it is an embedded software to prevent spam comments and advertisements.

    2. I am not sure but I think that you are not asked for word verification if you are logged in. Shall check about it.

    3. Sir - word verification is a choice of the blog owner.

      Please go to -->

      -your blogger dashboard
      -settings tab
      -comments (under settings tab - not by the side of it)
      - you will find options for Comments, Who Can Comment?, Comment Form Placement,..... one below the other. Please scroll down -->
      -below the comment moderation option, there is an option "Show word verification for comments? "
      -select "no"
      -please scroll down
      -click on "save settings"

      done :) Thanks

    4. Sir, I think my comment went into spam :(

      Sir- we can adjust settings to remove word verification

      * blogger dashboard
      * settings tab
      * comments tab UNDER settings tab - not by the side of it
      * lots of options appear one below the other - please scroll down
      * click on "Show word verification for comments?" which appears below moderation option
      * click "no"
      * scroll further down
      * Click on "save settings"

      Done :)
      [and sir - please delete this comment later :)]

  3. Thanks Shilpa ji. I have tried my best but I am not finding the said option in the 'settings' on the dashboard. I have read about it in help topics also. i will try to get somebody's help. Yes, your earlier comment had gone to spam folder. I am a new blogger and trying to learn. Hope you will forgive. Shall check the spam folder now onwards. Thanks again.

  4. sir - sorry if you feel i am talking too much - but i think you want to remove it - so am trying to help

    i think you are using the "new blogger interface" which is the reason you are not finding the option. You need to go to the old blogger interface.

    --blogger dashboard
    --below your name there are two tabs - "english (or whatever language you have chosen)" and a small "settings gear" {- side by side - just under your name.}

    -please click on the gear (similar to settings in most mobile phones) under your name

    -drop down menu window appears
    -in that drop down menu - click "old blogger interface"

    - old interface will open
    - now you can use the options as i have explained in the previous comment.

    Thanks, and sorry to trouble you. And - i am new myself - not a very old blogger :), i dont know if i can even call myself a "blogger" ? By the way - could you try tracing the word "blog" ? :)

    No need to thank me sir, I am being selfish :) it will be much more convenient to comment once it is removed :) and many other readers may not be commenting because of the word verification.

  5. Thanks Shilpa for your help. Yes I have done it through the old interface.

  6. @ Shilpa: The etymology of the word blog is well documented. The term 'web log', coined in 1997, was changed to 'we blog' and then to simply 'blog' in 1999 (please see the Wikipedia article). I have researched the words 'web' and 'log'. Shall write about it soon.

  7. Is there a Hindi version of this article as well?