Authors: Luke Carlson, Mike Browne

Introduction

Telugu is a language with roots in several southeastern states of India. Though it is commonly written using the Roman alphabet (with the assistance of pronunciation symbols), its original script is roughly 900 years old and is descended from the Phoenician and Aramaic alphabets. The language itself has Dravidian roots, like most of its brethren in southern India.

Syntax and Morphology

  • Order of Elements
  • Subject Object Verb (SOV): Telugu's ordering of the subject, verb, and object is different from English, using an SOV ordering instead of an SVO ordering.
  • Adjective Placement: Adjectives are placed before the noun or pronoun they modify (the opposite of French and other Romance languages). For example, the phrase "big dog" = pedda kukka = పెద్ద కుక్క where "dog" = kukka = కుక్క and "big" = pedda = పెద్ద.
  • Adverb Placement: Adverbs are placed before the verb they modify, which differs from the usual placement in English. A good example is the phrase "runs quickly" = tvaragā naḍustundi = త్వరగా నడుస్తుంది where "quickly" = tvaragā = త్వరగా and "runs" = naḍustundi = నడుస్తుంది.
  • Word Order
  • Word order is not free in Telugu, as one might conclude from the statements above regarding the placement of subjects, objects, verbs, adjectives, etc.
  • Inflectional Morphology
  • Inflections occur mainly for nouns, which can be inflected for a variety of reasons. The three main reasons are as follows:
    • Number - singular or plural
    • Gender - masculine, feminine, or neuter
    • Case - nominative, accusative, genitive, locative, ablative, dative, instrumental, and vocative
  • Cases & Genders
  • Telugu has three genders: masculine, feminine, and neuter. Masculine nouns almost always end in "-Du". These nouns can be made feminine by removing the "-Du" and adding either "-i" or "-rA-lu".
  • Telugu has eight cases: nominative, accusative, genitive, locative, ablative, dative, instrumental, and vocative. Each case has its own suffixes, which are added to the noun depending on the case in use.
  • Pronouns & Politeness
  • There are four main types of pronouns found in Telugu, all listed below. Pronouns and verbs vary with formality (verbs take specific forms depending on the formality of the pronoun); the formal and informal forms of "you" under Personal Pronouns are one example.
  • Personal Pronouns:
    • Description: The person/people spoken to, speaking, or spoken about.
    • Examples:
      • English: I, you, me, he, she, etc.
      • Telugu: "I" = nēnu = నేను, "you" (formal) = mīru = మీరు, "you" (informal) = nīvu = నీవు
  • Indefinite Pronouns:
    • Description: A group, thing, or object that is acting or being acted upon
    • Example:
      • English: All, any, everybody, few, many, etc.
      • Telugu: "Everybody" = andarū = అందరూ, "many" = anēka = అనేక
  • Relative Pronouns:
    • Description: Used to connect thoughts or ideas within sentences
    • Example:
      • English: Who, whom, which, whoever, etc.
      • Telugu: "Who" = evaru = ఎవరు, "whom" = vīrilō = వీరిలో
  • Reflexive Pronouns:
    • Description: Pronouns used by the actor to describe the actor
    • Example:
      • English: myself, yourself, himself, ourselves, themselves, etc.
      • Telugu: "myself" = nāku = నాకు, "themselves" = tāmu = తాము
  • Punctuation:
  • Telugu script traditionally uses a single bar to indicate a comma and a double bar to indicate a period. Modern Telugu uses punctuation borrowed from English.
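The ordering rules described under Order of Elements can be sketched in a few lines of Python. This is only an illustration of modifier-before-head and SOV ordering, not a grammar of Telugu. The `Phrase` class and `clause` function are hypothetical names introduced here, and the capitalized slot labels in the usage lines are placeholders rather than Telugu words.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    """A head word plus the modifiers (adjectives or adverbs) attached to it."""
    head: str
    modifiers: list[str] = field(default_factory=list)

    def linearize(self, modifier_first: bool = True) -> str:
        # Telugu places adjectives before nouns and adverbs before verbs,
        # so modifier_first=True is the Telugu-style order.
        if modifier_first:
            words = [*self.modifiers, self.head]
        else:
            words = [self.head, *self.modifiers]
        return " ".join(words)


def clause(subject: str, verb: str, obj: str, order: str = "SOV") -> str:
    """Arrange subject, object, and verb according to an order string such as 'SOV'."""
    slots = {"S": subject, "O": obj, "V": verb}
    return " ".join(slots[symbol] for symbol in order)


# The two romanized examples from the list above.
print(Phrase("kukka", ["pedda"]).linearize())          # pedda kukka
print(Phrase("naḍustundi", ["tvaragā"]).linearize())   # tvaragā naḍustundi

# Placeholder labels showing Telugu-style versus English-style clause order.
print(clause("SUBJECT", "VERB", "OBJECT", order="SOV"))  # SUBJECT OBJECT VERB
print(clause("SUBJECT", "VERB", "OBJECT", order="SVO"))  # SUBJECT VERB OBJECT
```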

Speakers

Overview

Telugu is one of the twenty-two scheduled languages of India and is the third most widely spoken language in the country. It's the language most commonly spoken in the Andhra Pradesh, Telangana, and Yanam regions of India, which are located in the southeast, bordering the Indian Ocean (see: http://bit.ly/1agopKx). In India alone, there are roughly seventy-four million native speakers, leaving an estimated one million native speakers scattered around the globe (Canada and the US are home to two large populations). Speakers located outside of India are most likely to be bilingual, with English and French the two most commonly spoken second languages. Within India, it's comparatively more likely that a Telugu speaker knows only Telugu; those who are bilingual are likely to know either Hindi or English. As for a map of the worldwide distribution of Telugu speakers, the link above points to an outdated map of the language's distribution within India. Because the vast majority of speakers are located in India (and those who are not have moved from the country), we did not find any maps of Telugu speakers outside of India.

Twitter Presence

After using the Tweepy library to look for tweets in Telugu, we found a surprisingly small number of tweets in Telugu from the regions where Telugu is most commonly spoken. One likely explanation is that the Twitter data was collected at ~4:30pm EST, which is the middle of the night on the east coast of India. Another may be that Twitter is a fairly English-centric social network: a majority of the content created and shared on the site is written in English, and it is plausible that many users, regardless of their country of origin, prefer to communicate in English.
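The time-of-day explanation can be checked with a short Python sketch, which also shows one way a collected tweet could be tested for Telugu script. This is not the collection code used for this report. The function names, the fixed UTC offsets (which ignore US daylight saving time), the example date, and the idea of filtering by the Telugu Unicode block (U+0C00 to U+0C7F) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: EST is UTC-5 and Indian Standard Time is UTC+5:30.
EST = timezone(timedelta(hours=-5), "EST")
IST = timezone(timedelta(hours=5, minutes=30), "IST")

# Bounds of the Telugu block in Unicode.
TELUGU_START, TELUGU_END = 0x0C00, 0x0C7F


def est_to_ist(naive_est: datetime) -> datetime:
    """Interpret a naive datetime as EST and convert it to IST."""
    return naive_est.replace(tzinfo=EST).astimezone(IST)


def telugu_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Telugu block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(TELUGU_START <= ord(ch) <= TELUGU_END for ch in chars)
    return in_block / len(chars)


# The date is arbitrary; only the 4:30pm time of day comes from the text.
collected = est_to_ist(datetime(2015, 1, 15, 16, 30))
print(collected.strftime("%Y-%m-%d %H:%M %Z"))  # 2015-01-16 03:00 IST

print(telugu_ratio("పెద్ద కుక్క"))  # 1.0
print(telugu_ratio("big dog"))      # 0.0
```

A ratio threshold rather than an all-or-nothing test would let tweets that mix Telugu script with English hashtags or URLs still count as Telugu.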

Writing System

Telugu is written in Telugu script, a beautiful curving alphabet that is roughly 900 years old. Currently, most speakers read and write a modernized version of the script, which was standardized in the second half of the 20th century. As a result, many modern speakers cannot read the original Telugu script. There's a small Indian population at Penn that is fluent in Telugu, but none of them can read the original Telugu script.

The script reads left to right and then downward on the page, like English. With the help of keyboards, online converters, and other internet tools, anyone with access to the internet can write Telugu script. The plethora of online resources in Telugu script backs this up - several periodicals written in the script are listed below.

  • Andhra Bhoomi
  • Leading Telugu daily newspaper distributed in Hyderabad, Vijayawada, Anantapur, Visakhapatnam, Rajahmundry, and Nellore
  • Link: http://www.andhrabhoomi.net/
  • Andhra Jyothy
  • Popular Telugu newspaper published in the Andhra Pradesh region
  • Link: http://www.andhrajyothy.com/
  • Suryaa
  • Another popular Telugu newspaper published in the Andhra Pradesh region
  • Link: http://www.suryaa.com/

Machine Translation Systems

There are only a few machine translation systems for Telugu. The largest is Google, which has incorporated Telugu into its translation site and also has a beta transliteration program. For example, the sentence "this is a sentence" gets translated as దీనిని వాక్యం ఉంది and transliterated as Dīnini vākyaṁ .
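To give a sense of what transliteration involves, here is a minimal Python sketch of rule-based romanization for Telugu script. It is not Google's system. It assumes an ISO 15919-style romanization and covers only the handful of letters needed for the examples in this document; the table names and the `transliterate` function are hypothetical. The core rule is that each consonant carries an inherent "a" unless a vowel sign or a virama follows it.

```python
# Consonants (without their inherent vowel), keyed by Unicode code point.
CONSONANTS = {
    "\u0C15": "k",  # TELUGU LETTER KA
    "\u0C26": "d",  # TELUGU LETTER DA
    "\u0C28": "n",  # TELUGU LETTER NA
    "\u0C2A": "p",  # TELUGU LETTER PA
    "\u0C2E": "m",  # TELUGU LETTER MA
    "\u0C2F": "y",  # TELUGU LETTER YA
    "\u0C30": "r",  # TELUGU LETTER RA
    "\u0C35": "v",  # TELUGU LETTER VA
}

# Dependent vowel signs, which replace the inherent "a" of a consonant.
VOWEL_SIGNS = {
    "\u0C3E": "ā",  # TELUGU VOWEL SIGN AA
    "\u0C3F": "i",  # TELUGU VOWEL SIGN I
    "\u0C40": "ī",  # TELUGU VOWEL SIGN II
    "\u0C41": "u",  # TELUGU VOWEL SIGN U
    "\u0C46": "e",  # TELUGU VOWEL SIGN E
    "\u0C47": "ē",  # TELUGU VOWEL SIGN EE
}

# Characters that map directly, with no inherent vowel.
STANDALONE = {
    "\u0C09": "u",  # TELUGU LETTER U
    "\u0C02": "ṁ",  # TELUGU SIGN ANUSVARA
}

VIRAMA = "\u0C4D"  # suppresses the inherent vowel of the preceding consonant


def transliterate(text: str) -> str:
    """Romanize Telugu text; characters outside the tables pass through unchanged."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # explicit vowel replaces "a"
                i += 1
            elif nxt == VIRAMA:
                i += 1                        # bare consonant, no vowel
            else:
                out.append("a")               # inherent vowel
        else:
            out.append(STANDALONE.get(ch, ch))
        i += 1
    return "".join(out)


print(transliterate("కుక్క"))  # kukka
print(transliterate("పెద్ద"))  # pedda
```

A full transliterator would need the complete consonant and vowel inventory, and production systems typically handle loanwords and numerals as well.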

There are a few machine translation papers on Telugu. The first is a master's thesis on rule-based translation and transliteration, the second addresses morphological segmentation, and the third involves part-of-speech tagging.

(2012) R.SRIBADRI NARAYANAN: English-Telugu Rule Based Machine Translation system. http://nlp.amrita.edu:8080/project/mhrd/ms/Telugu/Final_Thesis.pdf

(2012) Loganathan Ramasamy, Zdeněk Žabokrtský, & Sowmya Vajjala: The study of effect of length in morphological segmentation of agglutinative languages. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.18-24.

(2011) Siva Reddy & Serge Sharoff: Cross-language POS taggers (and other tools) for Indian languages: an experiment with Kannada using Telugu resources. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.11-19