Project Methodology

This section describes the methods used to collect linguistic material for the SDML project. It also sets out the criteria for selecting lexical concepts and grammatical features, and the procedures used to analyse lexical features, grammatical features, and narratives.

Data Collection Methods

The data presented on this website were collected through fieldwork carried out exclusively in rural and semi-urban communities. The data were collected between May 2018 and July 2022; a total of 2962 people in 277 villages across 34 districts of Maharashtra were interviewed. All interviews were audio-recorded using Sony IC recorders. The two urban districts—Mumbai City and Mumbai Suburban—are not included in this survey.

Elicitation tools

A sociolinguistic questionnaire was designed to collect information about language use within the village community as well as the social and economic linkages of the local community.

A set of pictures was used to elicit names for household items and vegetables, kinship terms and deictic terms. (https://sdml.ac.in/features)

The Deccan College Stimulus Kit (consisting of about seventy videos) was used to elicit information regarding the selected grammatical features (https://sdml.ac.in/stimulus-kit). This stimulus kit contains videos designed to elicit particular grammatical categories and constructions, for example, gender system, case markers, aspectual markers, and agreement patterns.

Two kinds of narrations were recorded to overcome the observer’s paradox (Labov 1972): the narration of a traditional story (Tortoise and Rabbit; The Woodcutter; The Old Woman and Pumpkin) and the narration of a personal experience, a childhood memory, etc.

Transcription

The data elicited using the pictures and videos were transcribed using the IPA. Recordings of the selected traditional stories and personal experiences collected in each village were fully transcribed manually by the research assistants. The narrative data were glossed using the Leipzig glossing rules (see https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf).

Selection of talukas, villages, and speakers

The ‘district’ was used as a research heuristic to conduct data collection for the project. The tehsil of the district headquarters and two to three other tehsils were selected depending on considerations such as social demography, proximity to a neighbouring district/state and our understanding of dialectal differentiation within the region. Two villages were selected in each tehsil: one close to a prominent city or town in the tehsil, and the other located away from the town or city.

Example: Sampling in Kolhapur District.

The district is divided into twelve tehsils, or administrative units. The city of Kolhapur is situated in Karveer tehsil, so Karveer was sampled in the survey. A village close to the urban centre of Kolhapur (viz. Gadmudshingi) and a village away from the urban centre (viz. Khupire) were selected for data collection. Three further tehsils were included: Gadahinglaj, close to the neighbouring (Kannada-speaking) state of Karnataka; Chandgad, close to the neighbouring district of Sindhudurg as well as the state of Karnataka; and Shahuwadi, close to the neighbouring district of Ratnagiri in the Konkan. Kolhapur was the first district to be sampled for the survey. At this early stage, two additional tehsils were sampled: Kagal (situated at the Maharashtra-Karnataka inter-state border) and Radhanagari, close to Sindhudurg district. Thus a total of six tehsils in Kolhapur were sampled.

Details of data collection (until July 2020):

Total no. of Districts sampled	Tehsils	Villages Sampled	Total no. of Interviews	No. of Full interviews
34	125	277	2962	2261

(https://sdml.ac.in/data-analysed)
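For orientation, the totals in the table can be turned into a few sample-wide averages. The sketch below is illustrative only: the variable names are ours, and the ratios are simple averages computed from the table, not figures reported by the project.

```python
# Totals copied from the table above.
districts = 34
tehsils = 125
villages = 277
interviews = 2962
full_interviews = 2261

# Simple sample-wide averages (illustrative, not project-reported figures).
tehsils_per_district = tehsils / districts
villages_per_tehsil = villages / tehsils
interviews_per_village = interviews / villages
full_interview_share = full_interviews / interviews

print(f"Tehsils per district:    {tehsils_per_district:.2f}")
print(f"Villages per tehsil:     {villages_per_tehsil:.2f}")
print(f"Interviews per village:  {interviews_per_village:.2f}")
print(f"Share of full interviews: {full_interview_share:.1%}")
```

The average of a little over two villages per tehsil is in line with the two-villages-per-tehsil rule described above.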

Selection of speakers

Standard sociolinguistic criteria were used in the selection of individuals for data collection:

Born in the village and brought up in the village or living in the village for the past 20 years or more
Represented the major demographic groups in the village (i.e. caste / religion / tribe, age and sex)
Reported Marathi to be the mother tongue or the primary contact language in village interactions

Data were collected from male and female speakers of three age groups (18-30, 31-55, 55+ years), belonging to all major demographic groups in the village. Care was taken to include illiterate as well as literate speakers. In some villages, speakers reporting a mother tongue other than Marathi constituted a major demographic group and reported Marathi to be their contact language in the village. Such speakers were also included in the survey: for example, Bhils in Nandurbar; Kannada speakers in Latur, Solapur, Miraj; Urdu speakers in Ratnagiri and elsewhere; Gondi speakers in Nagpur, etc.
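The individual-level criteria above can be read as a simple eligibility check. The following is a minimal sketch under our own assumptions: the `Speaker` record and its field names are hypothetical, and the second criterion (representation of the major demographic groups) is a property of the village sample as a whole, so it is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    # Hypothetical record; field names are illustrative, not the project's.
    born_and_raised_in_village: bool
    years_in_village: int
    mother_tongue: str
    village_contact_language: str


def is_eligible(s: Speaker) -> bool:
    """Apply the residence and language criteria to one speaker."""
    # Born and brought up in the village, or resident for 20 years or more.
    residence_ok = s.born_and_raised_in_village or s.years_in_village >= 20
    # Marathi as mother tongue or as the primary village contact language.
    marathi_ok = "Marathi" in (s.mother_tongue, s.village_contact_language)
    return residence_ok and marathi_ok


# A long-term resident whose mother tongue is Kannada but who uses
# Marathi as the village contact language qualifies.
print(is_eligible(Speaker(False, 25, "Kannada", "Marathi")))
# A recent arrival does not, even with Marathi as mother tongue.
print(is_eligible(Speaker(False, 5, "Marathi", "Marathi")))
```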

Age group (years)	18-30	31-50	50+	Total
Speakers	675	685	903	2261

Sex	Males	Females	Total
Speakers	1225	1036	2261

Selection of lexical concepts

What is meant by lexical concepts?

Lexical concepts are the nomenclature used for specific objects or persons. Regional varieties of a language are immediately recognisable by their distinctive words as well as their prosody (intonation). For example, mulga, porga, lyok, aṇḍor, ḍikra, ǰʰil, čeḍo, bʰurɡe, poṭə, ṭura, lek, ṭuḍal, lekus, sohəra, pošya are some of the lexical terms referring to “Ego’s son” in Marathi.

How were the lexical items identified for elicitation in this survey?

Lexical items elicited for the SDML were identified mainly on the basis of Dhongde (2013). This study, based entirely on lexical evidence, concludes that regional dialects of Marathi differ from the standard dialect in approximately 25% of the 2900 lexical items elicited. The SDML uses some of the items from this most distinctive 25%, in addition to lexical terms gleaned from other dialectal studies. A total of seventy lexical items were elicited for the survey. These belong to the following semantic fields: household items, vegetables, deictic terms, and kinship terms.

How were the lexical items elicited?

Local names for household items, fruits, and vegetables were elicited using pictures. (These pictures are shared on the website under ‘Click here to view image’ on the Lexical Features page, https://sdml.ac.in/features)
Deictic terms (e.g. here, there, tomorrow, yesterday, etc.) were elicited using situational cues / pictures.
Kinship terminology was elicited using paraphrases as cues (e.g. What is the relationship between your mother’s brother’s wife and you? What is the relationship between you and your sister’s son? etc.)

Lexical items were also collected from village speakers who claimed to be fluent in Marathi and reported using Marathi as their village contact language, but whose mother tongue was not Marathi. The words they supplied are often from the speaker's L1, i.e. mother tongue/first language (e.g. Hindi-Urdu, Telugu, Kannada, Gondi, Bhil language, etc.).

How were the lexical items transcribed?

The elicited lexical items were transcribed using (i) the Roman-based International Phonetic Alphabet (IPA) and (ii) a modified Devanagari script. E.g. cǝmča – च़मचा ‘spoon’; ja – ज़ा ‘go.IMP’. [A complete list of the symbols used can be found under ‘Methodology’: https://sdml.ac.in/methodology]
The lexical database captures variation. For example, in Palghar district:
mewʰna, bʰawə, daji, mewʰnas, sala, saḷa, saḍa, hala, haḷa, haro, hara were the lexical variants received for the concept “male ego’s wife’s younger brother”.
bʰači, bʰasi, bʰašis, bʰasi, bʰawači porɡi, sun were the variants received for the concept “female ego’s brother’s daughter”.

Sound variation as well as semantic shifts have been captured in the lexical database.
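One way to picture how such variation might be stored is as a mapping from a district and a concept to the list of variants received there. This is a sketch under our own assumptions, not a description of the actual SDML database: the names `lexical_db`, `record_variants`, and `variants_for` are hypothetical, and only the Palghar forms are taken from the text above.

```python
from collections import defaultdict

# Maps (district, concept) to the list of variants recorded there.
lexical_db: dict[tuple[str, str], list[str]] = defaultdict(list)


def record_variants(district: str, concept: str, variants: list[str]) -> None:
    """Store the variants elicited for one concept in one district."""
    lexical_db[(district, concept)].extend(variants)


def variants_for(district: str, concept: str) -> list[str]:
    """Return the recorded variants (an empty list if none were recorded)."""
    return list(lexical_db.get((district, concept), []))


# Variants reported above for Palghar district.
record_variants(
    "Palghar",
    "male ego's wife's younger brother",
    ["mewʰna", "bʰawə", "daji", "mewʰnas", "sala", "saḷa",
     "saḍa", "hala", "haḷa", "haro", "hara"],
)

for form in variants_for("Palghar", "male ego's wife's younger brother"):
    print(form)
```

Keeping every variant as a separate entry, rather than normalising it to one headword, is what allows both sound variation (e.g. sala, saḷa, hala) and unrelated lexical choices (e.g. daji) to be retrieved for the same concept.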

Selection of grammatical features

What is a grammatical feature?

A grammatical trait, such as a word formation process or a syntactic relation between words in a sentence, is referred to as a ‘grammatical feature’. Example 1: the various case markers that can be attached to the noun rǝsta ‘road’: rǝstyala (accusative), rǝstyat (locative), rǝstyapasun (ablative), rǝstyani (instrumental). Example 2: agreement in the sentences bhintǝ pǝḍli ‘the wall fell’ (feminine agreement) and bhintǝ pǝḍlǝ ‘the wall fell’ (neuter agreement).
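The case-marking example can be sketched as a stem to which different suffixes are attached. The segmentation used here is our assumption, read off the four forms listed above (a shared stem rǝstya- followed by a case suffix); it is meant only to illustrate the idea of a case marker and is not a general rule of Marathi morphology.

```python
# Assumed segmentation, inferred from the forms listed above.
OBLIQUE_STEM = "rǝstya"

# Case labels follow the abbreviations used on this page.
CASE_SUFFIXES = {
    "ACC": "la",     # accusative
    "LOC": "t",      # locative
    "ABL": "pasun",  # ablative
    "INS": "ni",     # instrumental
}


def inflect(stem: str, case: str) -> str:
    """Attach the suffix for the given case label to the stem."""
    return stem + CASE_SUFFIXES[case]


for case in CASE_SUFFIXES:
    print(case, inflect(OBLIQUE_STEM, case))
```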

How were grammatical features identified for this survey?

About twenty grammatical features were identified on the basis of previous published works and previous experience of data collection in Maharashtra. It was expected that a sub-set of these features would prove to be diagnostic to identify the major dialectal divisions in Marathi. These features include the following: (i) case marking (ii) aspect marking (iii) tense marking (iv) person and number distinction in verbal marking (v) case marking in the transitive perfective clause (vi) agreement in the transitive perfective clause, and so on.

How were grammatical features elicited?

The Deccan College Stimulus Kit, consisting of about seventy videos, was used to elicit particular grammatical features. (https://sdml.ac.in/stimulus-kit)

Methodology used for analysing and mapping grammatical variation

Taking into account the social demography of each village in the survey sample, interviews with representative speakers (each 40 to 60 minutes long) were transcribed using IPA. The occurrences of the selected grammatical features in each transcript were noted. Care was taken to ensure that the speech of male and female, old and young, and educated and uneducated speakers was used in the analysis of grammatical features. This method enabled us to capture both inter-speaker and intra-speaker variation within the village, and it facilitates the examination of dialect change. The grammatical data culled from individual interviews were aggregated at the village level.
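The step from individual transcripts to village-level data can be sketched as a counting exercise. The observations below are hypothetical (placeholder villages and speakers, with variant labels borrowed from the instrumental suffix), and the function names are ours; the sketch shows one plausible way to aggregate per village and to spot intra-speaker variation, not the project's actual procedure.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (village, speaker, feature, variant).
observations = [
    ("Village A", "S1", "INS", "-ne"),
    ("Village A", "S1", "INS", "-ni"),
    ("Village A", "S2", "INS", "-ne"),
    ("Village B", "S3", "INS", "-n"),
]


def aggregate_by_village(obs):
    """Count how often each variant of each feature occurs per village."""
    totals = defaultdict(Counter)
    for village, _speaker, feature, variant in obs:
        totals[(village, feature)][variant] += 1
    return totals


def speakers_with_variation(obs):
    """Find speakers who used more than one variant of the same feature."""
    seen = defaultdict(set)
    for village, speaker, feature, variant in obs:
        seen[(village, speaker, feature)].add(variant)
    return {key for key, variants in seen.items() if len(variants) > 1}


village_totals = aggregate_by_village(observations)
print(dict(village_totals[("Village A", "INS")]))
print(speakers_with_variation(observations))
```

Counting per speaker before aggregating is what keeps intra-speaker variation visible; aggregating first would merge it with differences between speakers.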

The alternative realisations of a particular grammatical feature (variable) which were noted across the regions and communities sampled in the survey were examined and assigned values. For example, the following variation was noted in the phonological form of the instrumental suffix across Maharashtra: (1) [-ne/ni/nə/n/na/nu], (2) [-wəri (wər)], (3) [-kʰəl (kʰalə/kʰali/kʰale/kʰal)], (4) [-ɡʰəi (ɡʰi)], (5) [-ši], (6) [-kʰən (kʰun/kən)], (7) [-vowel (i/e/ə)], (8) [kəḍə (kəḍ/kə/kər/kəri)]. The geographical distribution and examples of the variant forms are given below. As shown here, values 1 to 8 were assigned to the observed variants. The geographical distribution of each variant was then displayed in the form of a dialect map.
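The assignment of values to variants can be pictured as a lookup table from each observed surface form to its value. The forms and values below are taken from the list above; the names `INSTRUMENTAL_VALUES`, `FORM_TO_VALUE`, and `code_variant` are hypothetical, and treating value 7 as the three bare vowels i, e, ə is our reading of "[-vowel (i/e/ə)]".

```python
# Value -> surface forms of the instrumental suffix, as listed above.
INSTRUMENTAL_VALUES = {
    1: ["ne", "ni", "nə", "n", "na", "nu"],
    2: ["wəri", "wər"],
    3: ["kʰəl", "kʰalə", "kʰali", "kʰale", "kʰal"],
    4: ["ɡʰəi", "ɡʰi"],
    5: ["ši"],
    6: ["kʰən", "kʰun", "kən"],
    7: ["i", "e", "ə"],  # bare vowel suffix (our reading of the list)
    8: ["kəḍə", "kəḍ", "kə", "kər", "kəri"],
}

# Invert the table so that each surface form points to its value.
FORM_TO_VALUE = {
    form: value
    for value, forms in INSTRUMENTAL_VALUES.items()
    for form in forms
}


def code_variant(form: str):
    """Return the value (1 to 8) for a suffix form, or None if unlisted."""
    return FORM_TO_VALUE.get(form.lstrip("-"))


print(code_variant("-ni"))
print(code_variant("kʰun"))
print(code_variant("-xyz"))
```

Once every village has a value for the variable, plotting the values by location gives the kind of dialect map described above.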

Analysis of narrations

Transcription of selected narratives in revised Devanagari script and IPA
Linguistic analysis using Leipzig Glossing Conventions
Free translation of each sentence into English

The details of the district, taluka, and village in which each narration was recorded, and the details of the narrator's gender, age, education, social group/caste/tribe etc., are given in tabular form. (See Chapter 5: the list given in sub-section 3 under each district.)

Abbreviations

Abbreviation	Expanded Form
ABL	Ablative
ACC	Accusative Case
AUX	Auxiliary
COMP	Complementizer
COREL	Correlative
CP	Conjunctive Participle
DAT	Dative
DEM	Demonstrative
DIST	Distal
EMPH	Emphatic
ERG	Ergative
EXCL	Exclusive
F	Feminine
FUT	Future
GEN	Genitive Case
HAB	Habitual
HON	Honorific
HORT	Hortative
IMP	Imperative
INF	Infinitive
INS	Instrumental
IPFV	Imperfective
INCP	Inceptive
INCL	Inclusive
LOC	Locative
M	Masculine
N	Neuter
NEG	Negative
NON.FIN	Non-Finite
OBL	Oblique
PLU.PFV	Pluperfective
PRT	Particle
PST	Past
PFV	Perfective
PL	Plural
PP	Postposition
PRS	Present
PROG	Progressive
PROS	Prospective
PROX	Proximate
PTPL	Participle
Q	Question Particle
QUOT	Quotative
REDUP	Reduplicative
REL	Relative
SG	Singular
VOC	Vocative
1	First Person
2	Second Person
3	Third Person
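A list like this can also be used to check glosses mechanically. The sketch below is an assumption-laden illustration, not part of the project's tooling: the function name `unknown_labels` is ours, and it assumes that category labels are written in capitals or digits and separated by ".", "-", "=", or spaces, as in the gloss go.IMP used earlier on this page. A lexical gloss written as a single capital letter would be wrongly flagged.

```python
import re

# Abbreviations from the table above.
ABBREVIATIONS = {
    "ABL", "ACC", "AUX", "COMP", "COREL", "CP", "DAT", "DEM", "DIST",
    "EMPH", "ERG", "EXCL", "F", "FUT", "GEN", "HAB", "HON", "HORT",
    "IMP", "INF", "INS", "IPFV", "INCP", "INCL", "LOC", "M", "N", "NEG",
    "NON.FIN", "OBL", "PLU.PFV", "PRT", "PST", "PFV", "PL", "PP", "PRS",
    "PROG", "PROS", "PROX", "PTPL", "Q", "QUOT", "REDUP", "REL", "SG",
    "VOC", "1", "2", "3",
}


def unknown_labels(gloss: str) -> list[str]:
    """Return the category labels in a gloss that are not in the list."""
    # Protect the two dotted labels so that splitting on "." keeps them whole.
    text = gloss.replace("NON.FIN", "NON_FIN").replace("PLU.PFV", "PLU_PFV")
    # Separate a person digit fused to a following label, e.g. 3SG -> 3.SG.
    text = re.sub(r"([123])(?=[A-Z])", r"\1.", text)
    parts = re.split(r"[\s.\-=]+", text)
    # Keep only category labels; lower-case lexical glosses are skipped.
    labels = [p.replace("_", ".") for p in parts
              if p and (p.isupper() or p.isdigit())]
    return [label for label in labels if label not in ABBREVIATIONS]


print(unknown_labels("go.IMP"))           # gloss used earlier on this page
print(unknown_labels("fall-PFV.3SG.F"))   # hypothetical gloss string
print(unknown_labels("go.SBJV"))          # SBJV is not in the list
```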

Phonetic Symbols