Project Methodology

This section describes the methods used to collect linguistic material for the SDML project. It also discusses the criteria for selecting lexical concepts and grammatical features, the analysis of those features, and the analysis of narratives.

Data Collection Methods

The data presented on this website were collected through fieldwork carried out exclusively in rural and semi-urban communities. The data were collected between May 2018 and July 2022; a total of 2962 people in 277 villages across 34 districts of Maharashtra were interviewed. All interviews were audio-recorded using Sony IC recorders. The two urban districts—Mumbai City and Mumbai Suburban—are not included in this survey.

Elicitation tools

A sociolinguistic questionnaire was designed to collect information about language use within the village community as well as the social and economic linkages of the local community.

A set of pictures was used to elicit names for household items and vegetables, kinship terms and deictic terms (https://sdml.ac.in/features).

The Deccan College Stimulus Kit (consisting of about seventy videos) was used to elicit information regarding the selected grammatical features (https://sdml.ac.in/stimulus-kit). This stimulus kit contains videos designed to elicit particular grammatical categories and constructions, for example, gender system, case markers, aspectual markers, and agreement patterns.

Two kinds of narrations were recorded to overcome the observer’s paradox (Labov 1972): the narration of a traditional story (Tortoise and Rabbit; The Woodcutter; The Old Woman and Pumpkin) and the narration of a personal experience, a childhood memory, etc.


The data elicited using the pictures and videos were transcribed using the IPA. Recordings of the selected traditional stories and personal experiences collected in each village were fully transcribed manually by the research assistants. The narrative data were glossed using the Leipzig Glossing Rules (see https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf).

Selection of talukas, villages, and speakers

The ‘district’ was used as a research heuristic to conduct data collection for the project. The tehsil of the district headquarters and two to three other tehsils were selected depending on considerations such as social demography, proximity to a neighbouring district/state and our understanding of dialectal differentiation within the region. Two villages were selected in each tehsil: one close to a prominent city or town in the tehsil, and the other located away from the town or city.

Example: Sampling in Kolhapur District.

The district is divided into twelve tehsils, or administrative units. Of these, the city of Kolhapur is situated in Karveer tehsil. Hence, Karveer tehsil was sampled in the survey. A village close to the urban centre of Kolhapur (viz. Gadmudshingi) and a village away from the urban centre (viz. Khupire) were selected for data collection. Further, three more tehsils were included in the survey: Gadahinglaj, close to the neighbouring (Kannada-speaking) state of Karnataka; Chandgad, close to the neighbouring district of Sindhudurg as well as the state of Karnataka; and Shahuwadi, close to the neighbouring district of Ratnagiri in the Konkan. Kolhapur was the first district to be sampled for the survey. At this early stage in the survey, two additional tehsils were sampled: Kagal (situated at the Maharashtra-Karnataka inter-state border) and Radhanagari, close to Sindhudurg district. Thus a total of six tehsils in Kolhapur were sampled.
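The Kolhapur sample described above can be written down as a small data structure. The following Python sketch is illustrative only and is not part of the project's tooling: the field names and the helper function are assumptions, and villages are filled in only for Karveer, since the text names no villages for the other tehsils.

```python
# Tehsils sampled in Kolhapur district, with the reason stated in the text.
# Field names ("tehsil", "reason", "villages") are illustrative assumptions.
KOLHAPUR_SAMPLE = [
    {"tehsil": "Karveer", "reason": "contains the city of Kolhapur",
     "villages": {"near_town": "Gadmudshingi", "away_from_town": "Khupire"}},
    {"tehsil": "Gadahinglaj", "reason": "close to the state of Karnataka"},
    {"tehsil": "Chandgad", "reason": "close to Sindhudurg district and Karnataka"},
    {"tehsil": "Shahuwadi", "reason": "close to Ratnagiri district"},
    {"tehsil": "Kagal", "reason": "on the Maharashtra-Karnataka border"},
    {"tehsil": "Radhanagari", "reason": "close to Sindhudurg district"},
]


def follows_village_rule(record):
    """True if a record names both a near-town and an away-from-town village."""
    villages = record.get("villages", {})
    return {"near_town", "away_from_town"}.issubset(villages)


# Names of all sampled tehsils, in the order they are mentioned in the text.
sampled = [record["tehsil"] for record in KOLHAPUR_SAMPLE]
```

A record of this kind makes it easy to check that the two-village rule (one village near a town, one away from it) is satisfied wherever village names have been entered.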

Details of data collection (until July 2020):

Total no. of Districts sampled   Tehsils   Villages Sampled   Total no. of Interviews   No. of Full interviews
34                               125       277                2962                      2261

Selection of speakers

Standard sociolinguistic criteria were used in the selection of individuals for data collection:

Data were collected from male and female speakers of three age groups (18-30, 31-55, 55+ years), belonging to all major demographic groups in the village. Care was taken to include illiterate as well as literate speakers. There were instances when speakers reporting a mother tongue other than Marathi constituted a major demographic group in the village and reported Marathi to be their contact language in the village. Such speakers were also included in the survey. Examples include Bhils in Nandurbar; Kannada speakers in Latur, Solapur and Miraj; Urdu speakers in Ratnagiri and elsewhere; and Gondi speakers in Nagpur.

Age group   18-30   31-50   50+
Speakers    675     685     903

Gender      Male    Female
Speakers    1225    1036

Selection of lexical concepts

What is meant by lexical concepts?

The term ‘lexical concepts’ refers to the names used for specific objects or persons. Regional varieties of a language are immediately recognised by their use of some distinct words as well as by their prosody (intonation). For example, mulga, porga, lyok, aṇḍor, ḍikra, ǰʰil, čeḍo, bʰurɡe, poṭə, ṭura, lek, ṭuḍal, lekus, sohəra, pošya are some of the lexical terms referring to “Ego’s son” in Marathi.

How were the lexical items identified for elicitation in this survey?

Lexical items elicited for the SDML were identified based mainly on Dhongde (2013). This study, based entirely on lexical evidence, concludes that regional dialects of Marathi vary from the standard dialect in approximately 25% of the 2900 lexical items elicited in the study. The SDML uses some of these most distinctive lexical items, in addition to lexical terms gleaned from other dialectal studies. A total of seventy lexical items were elicited for the survey. These belong to the following semantic fields: household items, vegetables, deictic terms, and kinship terms.

How were the lexical items elicited?

Lexical items were also collected from village speakers who claimed to be fluent in Marathi and reported using Marathi as their village contact language, but whose mother tongue was not Marathi. The words elicited from such speakers are often words from the speaker's L1, i.e. mother tongue/first language (e.g. Hindi-Urdu, Telugu, Kannada, Gondi, Bhil language, etc.).

How were the lexical items transcribed?

Sound variation as well as semantic shifts have been captured in the lexical database.

Selection of grammatical features

What is a grammatical feature?

A grammatical trait, such as a word formation process or a syntactic relation between words in a sentence, is referred to as a ‘grammatical feature’. Example 1: the various case markers which can be attached to the noun rǝsta ‘road’: rǝstyala (accusative), rǝstyat (locative), rǝstyapasun (ablative), rǝstyani (instrumental). Example 2: agreement in a sentence, as in bhintǝ pǝḍli ‘the wall fell’ (feminine agreement) versus bhintǝ pǝḍlǝ ‘the wall fell’ (neuter agreement).

How were grammatical features identified for this survey?

About twenty grammatical features were identified on the basis of previously published works and previous experience of data collection in Maharashtra. It was expected that a sub-set of these features would prove to be diagnostic for identifying the major dialectal divisions in Marathi. These features include the following: (i) case marking (ii) aspect marking (iii) tense marking (iv) person and number distinction in verbal marking (v) case marking in the transitive perfective clause (vi) agreement in the transitive perfective clause, and so on.

How were grammatical features elicited?

The Deccan College Stimulus Kit, consisting of about seventy videos, was used to elicit particular grammatical features (https://sdml.ac.in/stimulus-kit).

Methodology used for analysing and mapping grammatical variation

Considering the social demography of each village included in the survey sample, interviews with representative speakers (each 40 to 60 minutes long) were transcribed using the IPA. The occurrences of the selected grammatical features in each transcript were noted. Care was taken to ensure that the speech of male and female, old and young, and educated and uneducated speakers was used in the analysis of grammatical features. This method enabled us to capture both inter-speaker and intra-speaker variation within the village, and it also facilitates the examination of dialect change. The grammatical data culled from individual interviews were aggregated at the village level.

The alternative realisations of a particular grammatical feature (variable) which were noted across the regions and communities sampled in the survey were examined and assigned values. For example, the following variation was noted in the phonological form of the instrumental suffix across Maharashtra: (1) [-ne/ni/nə/n/na/nu], (2) [-wəri (wər)], (3) [-kʰəl (kʰalə/kʰali/kʰale/kʰal)], (4) [-ɡʰəi (ɡʰi)], (5) [-ši], (6) [-kʰən (kʰun/kən)], (7) [-vowel (i/e/ə)], (8) [kəḍə (kəḍ/kə/kər/kəri)]. The geographical distribution and examples of the variant forms are given below. As shown here, values 1 to 8 were assigned to the observed variants. The geographical distribution of each variant was then displayed in the form of a dialect map.
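The coding and aggregation steps described above can be illustrated with a short Python sketch. It is only a sketch and does not reproduce the project's actual workflow: the variant forms and their values are taken from the list above, while the function names, the (village, speaker, form) record layout, and the village and speaker labels in the usage example are hypothetical.

```python
from collections import Counter, defaultdict

# Values 1-8 for the instrumental suffix, as listed in the text above.
# Forms are written without the leading hyphen.
INSTRUMENTAL_VARIANTS = {
    1: ["ne", "ni", "nə", "n", "na", "nu"],
    2: ["wəri", "wər"],
    3: ["kʰəl", "kʰalə", "kʰali", "kʰale", "kʰal"],
    4: ["ɡʰəi", "ɡʰi"],
    5: ["ši"],
    6: ["kʰən", "kʰun", "kən"],
    7: ["i", "e", "ə"],
    8: ["kəḍə", "kəḍ", "kə", "kər", "kəri"],
}

# Reverse lookup: phonological form -> assigned value.
FORM_TO_VALUE = {
    form: value
    for value, forms in INSTRUMENTAL_VARIANTS.items()
    for form in forms
}


def code_variant(form):
    """Return the value (1-8) for an observed suffix form, or None if unlisted."""
    return FORM_TO_VALUE.get(form.lstrip("-"))


def aggregate_by_village(observations):
    """Count coded variants per village from (village, speaker, form) tuples."""
    counts = defaultdict(Counter)
    for village, _speaker, form in observations:
        value = code_variant(form)
        if value is not None:  # skip forms that are not in the variant list
            counts[village][value] += 1
    return {village: dict(counter) for village, counter in counts.items()}


# Hypothetical observations, for illustration only (not survey data).
example = [
    ("Village A", "S1", "-ni"),
    ("Village A", "S2", "-e"),
    ("Village A", "S1", "-ni"),
    ("Village B", "S3", "-n"),
]
village_counts = aggregate_by_village(example)
```

With per-village counts of this kind, each village can be associated with the variant values attested there, which is the information a dialect map of the feature needs.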

Analysis of narrations

The details of the district, taluka and village in which each narration was recorded, and the details of the narrator's gender, age, education, social group/caste/tribe, etc., are given in tabular form (see Chapter 5, the list given in sub-section 3 under each district).


Phonetic Symbols

List of Surveyed Villages

