CRRES interviewed Dr. Shobhana Chelliah, Professor of Linguistics in the Department of Linguistics, about her work in language documentation and The Computational Resource for South Asian Languages (CoRSAL), a digital archive for source audio, video, and text on the low-resourced languages of South Asia. The archive is housed at the University of North Texas Digital Library and is managed at IU under a Collaborative Research Agreement between the two Universities. In this interview, we learn more about CoRSAL, the digital archiving process, and upcoming materials to be incorporated into the archive.
An Interview with Dr. Shobhana Chelliah
Could you tell us a little bit about yourself, your academic discipline, and your areas of study?
Dr. Chelliah: My name is Shobhana Chelliah. Originally from South India, I've spent a considerable portion of my life in the United States, both during my childhood and later as a graduate student. I am a linguist interested in all subfields of language science but, in particular, I'm interested in the quality of data that informs our science. I engage in a relatively new subfield of linguistics called language documentation, which involves recording, transcribing, translating, and annotating (for part of speech, for example) language in everyday use. The products of documentation can then feed into our discoveries of language structure. An equally important impact of language documentation is the creation of a corpus of such speech samples (traditional stories and narratives of traditional practices, for example) that speech communities can use to preserve and revitalize their language practices.
What is CoRSAL?
Dr. Chelliah: The Computational Resource for South Asian Languages, or CoRSAL, is a digital language archive which hosts audio, video, and text examples of language in use, as well as experimental data sets. These language materials enable communities to celebrate, maintain, and preserve their linguistic and cultural heritage. The archived materials, when hosted with annotations and analytics, also provide valuable resources for linguists to analyze the structure of language. These analyses, in turn, feed back into language maintenance and preservation activities, for example by helping with the creation of dictionaries and grammars for language learners.
What were the best parts and the more challenging parts of putting an archive together?
Dr. Chelliah: Our objective with CoRSAL was to establish a repository where individuals could contribute their materials with the assurance of long-term preservation and access, and of respectful and ethical use of their deposits. Fortunately, I did not have to create or manage the extensive infrastructure needed to accomplish this myself. The digital library team at the University of North Texas (UNT), particularly Mark Phillips, the Associate Dean for Digital Libraries, provided invaluable assistance. He had already developed a robust infrastructure for the ingestion of digital materials into the UNT library. We piggy-backed on this existing infrastructure and focused on helping depositors understand the advantages of archiving and the workflow toward archiving (file naming, data management, metadata creation). We spend a fair amount of time supporting content creation by interested native speakers, native-speaking linguists, and established language documenters. We provide workshops on recording, transcription, translation, and annotation methods. For example, suppose a speaker collects a thousand words in their language and creates a document with glosses and pronunciation tips. How should they proceed? What software can they use to create a dictionary from this material? How should items be organized and archived? Because we are an 'all-services included' archive, we receive numerous inquiries from individuals seeking to deposit their materials in our archive.
One of the ongoing challenges we face is securing funding for personnel who manage the archiving process. Currently, I have an amazing team, all IU students, doing this work. Sydney Weber (MA) creates metadata, checks file names, and connects with the UNT digital library for upload of collections to the repository. Grayson Pettit (MA) oversees the digitization of physical materials. Alexandra O'Neil (PhD) works with contributors on dictionary creation and publications. Daniel Swanson (PhD) creates query tools and fills other computational needs. Lauren Perkins (PhD) and Grayson Ziegler (PhD) are researching linguistic aspects of data in CoRSAL and thinking about how to improve data formats for further research. Margaret Carpenter (MA) just started as our social media and outreach person. In addition, a senior research scholar, Mary Downs, has joined our team for program development and grant writing. Continued funding for these roles remains a persistent concern. Indiana University (IU) has generously provided financial support for the initial years of this project, specifically to fund such curatorial positions. However, the future of this funding is uncertain, and we have yet to determine how we will sustain these roles once the initial support period concludes. Aside from this challenge, the overall operation of the archive has been proceeding smoothly.
Could you speak about the languages that are covered?
Dr. Chelliah: The archive mostly features languages spoken by approximately 5,000 individuals, though there are some with fewer than 5,000 speakers and others with populations of up to 2 million. While we are open to including larger languages, the focus of our outreach tends to be on smaller languages.
Do people typically approach you with materials or are people invited to submit materials?
Dr. Chelliah: It has been a combination of folks approaching us and us doing outreach to explain the existence of this resource. Outreach has been a crucial component from the outset because archiving is such a new concept to many communities in South Asia. I have dedicated considerable time to traveling across India to various institutions and communities to tell them about CoRSAL, how they can create their own collections and how these can be hosted on CoRSAL. Initially, outreach was a significant effort, but as the archive has grown, we have begun to receive more inquiries. Typically, we encounter three to four new requests per semester, which provides a steady stream of activity and helps maintain our focus on the gradual and consistent expansion of the archive.
Are there any misconceptions about linguistics?
Dr. Chelliah: Many people think that linguists are people who speak a lot of languages and while many linguists do speak a lot of languages, not all do. What is true is that linguists examine both the similarities and the intriguing differences across languages and use language data to develop predictive rules for how languages are structured.
What has your engagement with other disciplines been like?
Dr. Chelliah: There is so much potential for our work to overlap with other social and behavioral sciences. Political scientists use personal narratives to understand the causes and effects of political instability. I don’t know if you listen to NPR, but they recently broadcast a story about the Rwandan massacres in the 1990s. In it, they talked about the power of learning about these events through first-person narratives. You often don’t get that information in English, German, or French. You only get it in somebody’s native language. What did the experiencers go through? How did they react? As one of CoRSAL's activities, we ran a conference with political scientists at UNT called Language Endangerment and Political Instability, where we explored our overlapping interests in personal narratives as a data source. Here is another example. I currently have a grant from the National Endowment for the Humanities to work on environmental changes and how they are reflected in language, focusing on the Lamkang community in Manipur state in northeast India. My co-investigator is an environmental philosopher and expert in water issues. We are working with Lamkang speakers to collect personal narratives in Lamkang that provide insights into past and present practices. So, these are two ways that CoRSAL partners with other social scientists to improve methods and data sets through collaborative research.
What exciting and upcoming work should we expect in the future?
Dr. Chelliah: As we speak, in my office are 30 boxes of materials donated to CoRSAL from the estate of University of Chicago linguists Norman Zide and Arlene Zide. This is an extensive collection of linguistic field notes on the Munda languages, a subgroup within the Austroasiatic language family. The Munda languages are one of the four major language families in India. We know very little about many of the languages in this family, so these field notes are a treasure both for the communities that speak the languages and for linguists exploring language history and structure. We are currently collaborating with Living Tongues, a nonprofit dedicated to documenting endangered languages, to draft proposals that would enable us to bring the Zide collection to CoRSAL. The project would allow us to hire students to digitize and create metadata for these materials. In doing so, they would learn more about linguistics, information science, and language archiving and, who knows, they may even pursue further scholarship on the Munda languages. Ultimately, we would like CoRSAL to acquire such resources, make them accessible to students, and accelerate their research in the represented areas. Instead of starting from scratch, students can build upon existing work and, in subsequent stages, reconnect with the communities associated with these languages. This approach will significantly enhance their research efficiency and impact. These are the exciting developments we anticipate over the next two to three years!
Meet the Researcher
Shobhana Chelliah is a Professor in the Department of Linguistics at Indiana University. She specializes in Sino-Tibetan languages. Her research interests include documentary linguistics, morphosyntax, Trans-Himalayan languages, language archiving, and corpus linguistics for low-resourced languages. Dr. Chelliah also has four different projects funded through various grants, including support from the National Science Foundation, one grant from the National Endowment for the Humanities, and another from the Institute of Museum and Library Services. Each of these is related to a corpus that is being built and will be incorporated into CoRSAL.