The data presented on this website were collected through fieldwork carried out exclusively in rural and semi-urban communities between May 2018 and March 2020. A total of 2285 persons in 231 villages across 25 districts of Maharashtra were interviewed. All interviews were audio-recorded using Sony IC recorders.

Selection of villages

The ‘district’ was used as a research heuristic to organise data collection for the project. In each district, the tehsil containing the district headquarters and two to three other tehsils were selected, depending on considerations such as social demography, proximity to a neighbouring district or state, and our understanding of dialectal differentiation within the region. Two villages were selected in each tehsil: one close to a prominent city or town in the tehsil, and the other located away from that city or town.

Example: Sampling in Kolhapur District.

The district is divided into twelve tehsils or administrative units. The city of Kolhapur is situated in Karveer tehsil, so Karveer tehsil was sampled in the survey. A village close to the urban centre of Kolhapur (viz. Gadmudshingi) and a village away from the urban centre (viz. Khupire) were selected for data collection. Three further tehsils were also included in the survey: Gadahinglaj, close to the neighbouring (Kannada-speaking) state of Karnataka; Chandgad, close to the neighbouring district of Sindhudurg as well as to the state of Karnataka; and Shahuwadi, close to the neighbouring district of Ratnagiri in the Konkan. Kolhapur was the first district to be sampled for the survey. At this early stage, two additional tehsils were sampled: Kagal (situated at the Maharashtra-Karnataka inter-state border) and Radhanagari (close to Sindhudurg district). Thus a total of six tehsils in Kolhapur were sampled.

Details of data collection (until July 2020):

Total no. of districts sampled: 25
Total no. of tehsils sampled: 101
Total no. of villages sampled: 231
Total no. of interviews: 2285
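
As a rough cross-check, the totals reported above can be converted into per-unit averages. The sketch below is illustrative only: the names `TOTALS` and `average` are assumptions made for this example and are not part of the project's own tooling.

```python
# Totals reported in the data collection summary above.
TOTALS = {"districts": 25, "tehsils": 101, "villages": 231, "interviews": 2285}


def average(numerator: str, denominator: str) -> float:
    """Return the mean number of `numerator` units per `denominator` unit,
    rounded to two decimal places."""
    return round(TOTALS[numerator] / TOTALS[denominator], 2)


if __name__ == "__main__":
    print("Tehsils per district:", average("tehsils", "districts"))
    print("Villages per tehsil:", average("villages", "tehsils"))
    print("Interviews per village:", average("interviews", "villages"))
```

These figures are averages over the whole survey. Individual districts vary, as the Kolhapur example (six tehsils) shows.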

Selection of speakers

Standard sociolinguistic criteria were used in the selection of persons for data collection:

Data were collected from male and female speakers of three age groups (18-30, 31-55, 55+ years), belonging to all major demographic groups in the village. Care was taken to include illiterate as well as literate speakers. In some villages, speakers reporting a mother tongue other than Marathi constituted a major demographic group and reported Marathi to be their contact language in the village. Such speakers were also included in the survey. Examples include Bhils in Nandurbar; Kannada speakers in Latur, Solapur and Miraj; Urdu speakers in Ratnagiri and elsewhere; and Gondi speakers in Nagpur.
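
The gender and age criteria above define a simple grid of speaker categories. A minimal sketch of that grid follows. The names `GENDERS`, `AGE_GROUPS` and `speaker_cells` are illustrative assumptions, and literacy, mother tongue and demographic group are deliberately left out. It does not imply that every cell was filled equally in every village.

```python
from itertools import product

# Category labels taken from the selection criteria described above.
GENDERS = ("female", "male")
AGE_GROUPS = ("18-30", "31-55", "55+")


def speaker_cells() -> list[tuple[str, str]]:
    """Return every (gender, age group) combination used in speaker selection."""
    return list(product(GENDERS, AGE_GROUPS))


if __name__ == "__main__":
    for gender, age_group in speaker_cells():
        print(f"{gender}, {age_group} years")
```

Two genders and three age groups give six categories per village before literacy and demographic group are taken into account.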

Elicitation tools

A sociolinguistic questionnaire was designed to collect information about language use within the village community as well as social and economic linkages of the local community.
A set of pictures was used to elicit names for household items and vegetables. Kinship terms and deictic terms were elicited.
The Deccan College Stimulus Kit (consisting of about seventy videos) was used to elicit information regarding the selected grammatical features. This stimulus kit contains videos designed to elicit particular grammatical categories and constructions, for example the gender system, case markers, aspectual markers and agreement patterns.

Two kinds of narrations were recorded to overcome the observer’s paradox (Labov 1972): the narration of a traditional story (Tortoise and Rabbit; The Woodcutter; The Old Woman and Pumpkin) and the narration of a personal experience, a childhood memory, etc.


Data elicited using the pictures and videos were transcribed using the IPA. Recordings of traditional stories and personal experiences collected in each village were fully transcribed manually by the research assistants. The narrative data were glossed using the Leipzig Glossing Rules. (See https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf)

Abbreviations and Symbols

Phonetic Symbols

List of Surveyed Villages

