Bilingualism, Language Shift, and
Economic Development in India, 1931–1961∗
Department of Economics
October 30, 2006
JOB MARKET PAPER
During the past 500 years, a small group of the world’s languages have grown
immensely in relative size. A much larger group of languages have declined, and
many have disappeared altogether. Sociolinguists term such displacement lan-
guage shift. Economists have shown that language aﬀects a variety of economic
outcomes, such as trade, economic growth, and the provision of public goods.
This literature treats language as exogenous. Yet, productivity in manufactur-
ing is higher when workers can communicate with each other. Urban residents
interact and use markets more than their rural counterparts. Both manufac-
turing employment and urbanization provide incentives to learn new languages.
I develop a model of second language learning and language shift and test it
using a new panel dataset of Indian districts for 1931 and 1961. My estimations
are consistent with the model. I show that the growth of both manufacturing
employment and urbanization in mid-20th-century India strongly encouraged bilingualism,
and that bilingualism in turn led to the relative decline of languages.
On average, one secondary language speaker is induced to become bilingual for
every four additional manufacturing jobs and for every twenty new urban res-
idents in a district. The manufacturing and urbanization eﬀects were smaller
for speakers of the district majority language. Initial bilingualism is associated
with relative decline of a language over time. The growth of manufacturing
employment decreased district-level linguistic heterogeneity.
∗Please do not cite without permission.
†Email: email@example.com. I am grateful for discussions with Leah Platt Boustan, Carola
Frydman, Claudia Goldin, Eric Hilt, Lakshmi Iyer, Asim Khwaja, Michael Kremer, Bob Margo,
Simona Mkrtschjan, Rohini Pande, and Jeﬀrey Williamson. Thanks to participants in the Harvard
Economic History Workshop, Harvard Development Lunch and the Harvard Economic History Tea
for useful comments. A portion of this project was conducted while I was a fellow of the Project on
Justice, Welfare, and Economics at the Weatherhead Center for International Aﬀairs at Harvard.
The Harvard Department of Economics’s Warburg Funds supported the data entry.
The world’s languages have been undergoing a profound consolidation for at least half
a millennium. The share of the world’s population speaking a small set of languages,
such as English, Spanish, and Hindi, has grown immensely since around 1500. The
vast majority of languages have declined since then, and some have disappeared altogether.
About 6,700 languages are spoken today (Gordon 2005). Linguists estimate that
there were as many as 10,000 languages spoken in 1500, and that half of the languages
currently spoken are likely to disappear in the next century (Hill 1978; Crystal 1997;
Krauss 1992). Hard evidence on the long-term evolution of language size is scarce,
but the imprint of linguistic consolidation can be seen in the present size distribution
of languages. The ten largest languages are mother tongues to 3.2 billion people,
while the median language has only 10,000 speakers. This distribution was surely
less skewed in the distant past, when the scope of human interaction and political
organization was much narrower.
Sociolinguists use the term language shift to refer to a change in the ﬁrst spoken
language learned by a population, also known as the mother tongue. Language shift
within a lineage always begins with an individual acquiring a second language. Bilin-
gualism is a necessary condition for language shift because parent and child always
share at least one language in common. When a lineage that has become bilingual
abandons its original mother tongue, language shift has occurred. One possible outcome
of language shift is the extinction of a language.
Scholars have proposed a wide variety of causal explanations for language shift,
including industrialization, urbanization, political centralization, and mass media.
Political centralization is probably the most popular explanation for language shift.
States may cause language shift through their choice of standard languages for ad-
ministration and education—both for eﬃciency reasons and to weaken subnational
loyalties—and encourage their adoption (Dasgupta 1970; Weber 1976; Brass 1994).
Mass media, from the newspaper to television, encourage the acquisition of and iden-
tiﬁcation with the majority language (Anderson 1991).
Industrialization and urbanization are the explanations of most interest to economists.
These changes in the structure of the economy make sharing a communication network
with many others more valuable, encouraging speakers of small languages to become
bilingual (Weinreich 1953; Fishman 1964; Gal 1978; Crystal 1997). Bilinguals may
subsequently abandon their ancestral tongue. Communication is more valuable for
factory workers than farmers because the former have larger work groups and more
task coordination. They also are more likely to use the labor market to ﬁnd work.
Urban residents similarly rely more on the market to provide household goods and
services than their rural counterparts.
The eﬀect of language on economic outcomes has generated substantial interest
among economists, though economists have not studied language shift. For example,
gravity models of trade show that common language has large positive eﬀects on the
trade ﬂow between countries (Helliwell 1999). Linguistic diversity within a country is
negatively associated with both economic growth and the provision of public goods
(Alesina et al. 1999; Alesina & La Ferrara 2005). This literature generally assumes
that language ability is exogenous.
I explore the eﬀect of manufacturing employment growth and urbanization on
second language acquisition and language shift in India using a new panel dataset of
Indian districts for 1931 and 1961. I show that, as sociolinguists suspect, industrialization
and urbanization are in fact drivers of investment in learning new languages, and
that we should treat language as endogenous to the process of economic development.
India is one of the world’s most linguistically diverse countries, with more than
180 distinct languages spoken. Even within small geographic regions, one can ﬁnd a
variety of mother tongues spoken. In this way, India is similar to how much of the
world was before the consolidation of languages began in earnest. Between 1931 and
1961, India had substantial increases in bilingualism and major shifts in the structure
of employment toward manufacturing and of the population into cities and towns.
I organize my empirical investigation using a model that links the return to com-
munication to second language acquisition and language shift. In the model, com-
munication generates a return by improving productivity of manufacturing. We may
also imagine communication-generated productivity improvements coming from other
sources, such as living in a city. The model distinguishes between mother tongue
speakers of dominant and secondary languages. I use the term dominant language to
refer to the language spoken by the largest share of the population in an area. I term
all other languages in that area secondary languages. I derive and empirically test
four main predictions: (1) A language which is locally rare will have more bilinguals.
(2) Bilingualism will be greater when the relative return to communication-intensive
activities is greater. (3) The incentive to become bilingual will be smaller for speak-
ers whose mother tongue is the dominant language. (4) Bilingualism among mother
tongue speakers of a small language may or may not encourage its relative decline.
My district-level empirical analysis allows for language-by-district ﬁxed eﬀects.
To deal with remaining simultaneity and omitted variables biases, I develop an in-
strumental variable for both manufacturing employment growth and urbanization.
The instrument predicts the change in district-level manufacturing employment that
results from the channeling eﬀect a district’s initial portfolio of manufacturing indus-
tries has on shifts in national-level employment demand.
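
The instrument described above resembles what is often called a shift-share construction. The following is a minimal sketch, assuming that predicted district growth is the initial-share-weighted average of national industry growth rates; the paper's exact construction may differ. The industry names, employment counts, and growth rates are hypothetical and are not taken from the paper's data.

```python
# Illustrative shift-share prediction. All names and numbers are hypothetical.

def predicted_growth(initial_emp, national_growth):
    """Weight national industry growth rates by a district's initial industry mix.

    initial_emp: dict mapping industry -> district employment in the base year
    national_growth: dict mapping industry -> national employment growth rate
    """
    total = sum(initial_emp.values())
    if total == 0:
        return 0.0
    return sum(emp / total * national_growth[ind] for ind, emp in initial_emp.items())


# Hypothetical national growth rates by industry between two census years.
national_growth = {"textiles": 0.50, "metals": 1.20, "food": 0.10}

# Two hypothetical districts with different initial industry portfolios.
district_a = {"textiles": 800, "metals": 100, "food": 100}
district_b = {"textiles": 100, "metals": 800, "food": 100}

growth_a = predicted_growth(district_a, national_growth)
growth_b = predicted_growth(district_b, national_growth)
print(round(growth_a, 3), round(growth_b, 3))
```

The two districts face the same national shifts, but the district that starts with more employment in the fast-growing industry receives the larger predicted increase. That variation comes only from the initial portfolio, which is the source of identification the text describes.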
I ﬁnd that one secondary language speaker is induced to learn a new language
for every four additional manufacturing jobs and for every twenty new urban residents
in the district. As the model predicts, the effects on bilingualism among
dominant language speakers are smaller. My measures combine both direct effects
and spill-overs. Economic development, measured by manufacturing employment and
urbanization, accounts for 30% of the mean change in bilingualism among secondary
language speakers. The initial share of speakers of a given language that is bilingual
in 1961 is correlated with a relative decline in the number of speakers of that language
between 1961 and 1991.
The combination of a positive eﬀect of economic development on bilingualism and
a negative eﬀect of bilingualism on the relative number of mother tongue speakers
of a language suggests economic development might decrease linguistic heterogeneity.
Average district-level linguistic heterogeneity increased by about 4.8 points between
1931 and 1961. I ﬁnd that on average manufacturing employment growth held back
this increase by about 5.9 points. Urbanization had no significant effect on linguistic
heterogeneity.
Early 20th Century India
Structural Stagnation and Structural Change
India had very slow economic growth in the first half of the 20th century. Per-capita
real GDP in 1947 was only 4% higher than it had been in 1900 (Sivasubramonian
2000). Until the 1920s, the structure of the Indian economy was also quite static. For
the half-century between the ﬁrst all-India census in 1872 and 1921, manufacturing
consistently provided jobs to about 10% of the workforce and about 10% of the
population lived in cities.
India’s structural stagnation broke during the 1920s. The economy began to shift
employment into the manufacturing sector, and urbanization increased. Mortality
also began to decline in the early 1920s, setting India’s demographic transition into
motion. Life expectancy at birth increased from 26.8 to 45.6 years between 1931 and
1961 (Gopalan & Shiva 2000). The employment rate decreased by 2.8% during the
period, reﬂecting in part an increasing share of children and elderly in the population.
India’s development between 1931 and 1961 was rapid compared to the preceding
half century. Manufacturing employment grew at 2.7% annually, expanding from 7.4%
to 11.1% of the total workforce. While this might not seem dramatic, it is similar
to the 3.1% annual growth U.S. manufacturing employment had between 1849 and
1879 (Carter et al. 2006). India also became substantially more urban. In 1961,
18% of India’s population lived in cities and towns, up from 12% in 1931. High
transport costs meant that much of India’s 19th century industry had been located
near to the point of consumption. The location of manufacturing shifted toward rail
lines and ports and into towns as India’s national product markets integrated and
agglomeration economies became more important (Krishnamurty 1983). This process
continued between 1931 and 1961. Nearly all districts became more urbanized, but
some saw a decline in industrial employment (Figure 1). Manufacturing employment
growth and urbanization are positively correlated, though the correlation coeﬃcient
of 0.32 is rather weak.
Manufacturing enterprises increased substantially in scale between 1931 and 1961,
in part due to increased specialization (Roy 2000, 1999). Large factories, deﬁned as
those having more than 10 employees with power or 20 without power, provided
39.9% of all manufacturing jobs in 1961, more than double their 15.6% share in 1931.
Labor productivity in large factories grew at a relatively brisk 2.1% annual rate
between 1931 and 1947, while small factories actually saw a 1.5% annual decline in
labor productivity (Sivasubramonian 2000). The shift to larger factories increased the
communication demands on workers, who now had to coordinate their tasks with a
larger group. While both scale and the productivity of large factories were increasing,
it is important to remember that the bulk of Indian industry continued to use simple,
labor intensive technologies (see Figure 2).
India’s external environment and trade policy were important factors underlying
the structural shift toward manufacturing. India's exports in the early 20th century
were primarily agricultural commodities such as tea, wheat, ﬂax, raw cotton, and
raw jute. The price of these export commodities relative to the manufactured goods
India imported began to fall in the late teens (Appleyard 2006). This negative terms of
trade shock favored Indian manufacturing at the expense of agriculture. Around the
same time, India assumed control of her trade policy under the 1919 Government of
India Act. Tariff policy sought both to raise revenue and to provide protection
to Indian industry (Tomlinson 1979). Average import tariﬀs almost trebled from an
average of 4.5% in the teens to 12.3% in the 1920s (Figure 3). Average tariﬀs were
23.3% between 1931 and 1961, almost double the level of the 1920s. World War II
brought additional restrictions, such as licenses, quantity limits, and other non-tariﬀ
Language, Literacy, and Education
India is one of the most linguistically diverse countries in the world. The probability
that two randomly selected Indians share a mother tongue is only about 10%, similar
to countries such as Papua New Guinea, Nigeria, and the Democratic Republic of
Congo. At least 180 distinct languages and about 600 dialects are indigenous to
India. Dozens of these languages have literary traditions and a large minority of
them are written using one of India’s several scripts.
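
The probability that two randomly selected people share a mother tongue can be computed from language population shares as the sum of squared shares. The sketch below assumes draws with replacement and uses hypothetical shares, not Indian census figures. Treating one minus this probability as a heterogeneity index is a common convention, and whether it matches the index used in this paper is an assumption.

```python
# Sum-of-squared-shares measure of linguistic concentration.
# The shares below are hypothetical, chosen only for illustration.

def share_probability(shares):
    """Probability that two random draws (with replacement) share a mother tongue."""
    total = sum(shares)
    return sum((s / total) ** 2 for s in shares)


def heterogeneity(shares):
    """One minus the match probability; higher values mean more diversity."""
    return 1.0 - share_probability(shares)


example_shares = [0.5, 0.3, 0.2]          # three hypothetical languages
p_same = share_probability(example_shares)
p_same_ten_equal = share_probability([1] * 10)  # ten equally sized languages
print(round(p_same, 3), round(heterogeneity(example_shares), 3), round(p_same_ten_equal, 3))
```

Ten equally sized languages give a match probability of one in ten, which conveys how fragmented a population must be for the probability to fall to roughly 10%.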
Although most of India’s languages are concentrated in particular regions of the
country, there is still substantial diversity in small geographic units (Table 1, Panel
A). The mother tongue of 23% of Indians was a secondary language in their district
of residence in 1931, a share that rose to 26% in 1961. The average district has two or three
secondary languages with substantial population shares (Figure 5, panel (a)).
Between 1931 and 1961, the average bilingualism rate among secondary-language
mother tongue speakers increased from 28.2% to 43.8% (Table 1). Panel (b) of Figure
5 shows bilingualism was negatively correlated with district-level language rank in
1931. Most of the growth in bilingualism between 1931 and 1961 happened among
speakers of languages ranked 2 through 4. Bilingualism among languages ranked 5
or 6 was steady at about 50%, showing that many people found it possible to live as
linguistic isolates. Nearly 80% of secondary language mother tongue speakers who
were bilinguals chose the dominant language of their district as their second language.
Only 1.5% of dominant-language mother tongue speakers were bilingual in 1931,
but bilingualism increased substantially to 6.7% in 1961. This partly reﬂects the de-
cline in the share of the population whose mother tongue was the dominant language
from 77.3% to 73.9%. These bilinguals chose one of India’s two lingua francas, Hindi
and English, as their second language more than half of the time. Otherwise, they
chose one of their district’s secondary languages.
Literacy was a more widespread form of human capital than formal schooling
in the period, and it was expanding rapidly. About 24.0% of adults could read
in 1961, up from just 9.5% in 1931. Most languages that enter my analysis have
written forms, so desire for literacy per se probably did not generate substantial
second language acquisition. In fact, it is possible that literacy and bilingualism
are substitutes. While formal schooling in the vernacular languages of India and in
English had been promoted since the 1850s, the primary school completion rate
was very low even in 1961. The Census of India did not ask about schooling until
1941 (Srivastava 1972); only 7.0% of the population had completed the three to four
years that comprised primary school in 1961.
Economics of Bilingualism and Language Shift
This paper focuses on the value of language as a means of communication. Linguists
believe that all of the world’s languages are equivalent in communicative power. There
is no idea expressible or action realizable in Tamil that cannot be expressed or re-
alized in Bengali or in English. One language is not an inherently better means of
communication than another. This suggests that from an instrumental point of view,
we should not agonize over the displacement or death of a language. However, lan-
guage is also a marker of ethnic identity and a carrier of cultural forms, both of which
may have value, particularly to native speakers but possibly to others as well. Lin-
guists value languages in and of themselves because of the information they contain
about the structural possibilities of human language. They believe the human brain
is programmed so that only a limited range of languages is possible. Each extant
language helps us learn about the range of structural possibilities inherent in the brain.
I model how the return to communication aﬀects the decision to acquire a second
language and the decision to transmit languages to children. Interaction with others
allows individuals to take advantage of many diﬀerent types of gains from trade.
Knowledge of a second language expands the network of individuals with whom one
can interact. Gains from trade thus provide individuals with incentives to expand
their communication network by becoming bilingual. Returns to language knowledge
also influence parents' decisions about which languages their children will learn.
Parents may be less likely to transmit a language that expands their child's
communication network by only a small amount.
Consider a two-period economy with two production sectors. The economy is
populated by N dynasties. Each dynasty has one worker alive in both period 0 and
period 1. Workers are endowed with one unit of labor and engage in production
in each period. Both sectors produce the same ﬁnal good, but one sector makes
use of a more productive technology. Workers in the more productive sector must
communicate with one another to use the technology. The price of the ﬁnal good
is normalized to 1. Workers care about overall consumption for their dynasty $j$:
$U_j = c_0 + c_1$.
Two languages, Gujarati (G) and Hindi (H), are spoken in the economy. Each
worker has either G or H as their mother tongue. Some workers may be bilingual.
The population shares of monolingual G and H speakers in period $t$ are $m_t^G$ and $m_t^H$;
the share of bilinguals is $b_t$. These shares sum to one: $m_t^G + m_t^H + b_t = 1$. The period
$t$ population shares of everyone able to speak G and H, whether as monolinguals or as
bilinguals, are $p_t^G = 1 - m_t^H$ and $p_t^H = 1 - m_t^G$. I assume that a majority of workers
speak H while a minority speak G in period 0: $p_0^H > \tfrac{1}{2} > p_0^G$. This implies that in
period 0 there are fewer monolingual speakers of G than of H.
In the analysis that follows, I will think of the more productive sector as manu-
facturing and the less productive one as owner-operator agriculture. Manufacturing
workers rely more on coordination in their work than farmers. They are also more
likely to be working with unrelated people than farmers because they ﬁnd employ-
ment through the labor market. However, gains from communication also appear in
other settings, such as cities. Urban households rely more heavily on goods and services
purchased in the market in their production of household goods, such as clean clothes.
At the beginning of each period, workers are randomly paired into ﬁrms. If mem-
bers of a ﬁrm share a language in common, they can jointly operate the manufacturing
technology and each earn the return $w_M$. If they do not share a common language,
the workers use the agricultural technology and each earn the return $w_A \le w_M$. Let
$\lambda = w_M - w_A$ be the productivity difference between manufacturing and agriculture.
The productivity difference does not vary over time. The expected period 0 income
of a monolingual H speaker is $p_0^H w_M + (1 - p_0^H) w_A$. Bilinguals earn $w_M$ with
certainty since they can communicate with everyone. While workers in the real world
target their job search based on where they think the opportunities are best, this sim-
ple framework captures the intuitively appealing idea that membership in a bigger
network leads to better matches between worker and ﬁrm.
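
The matching rule can be illustrated with a short calculation. This is a sketch under the text's assumptions: random pairing, manufacturing pay when partners share a language, and agricultural pay otherwise. The population is treated as large, so the chance of a worker being paired with themselves is ignored. The population shares and wages are hypothetical.

```python
# Expected one-period income under random pairing.
# Shares and wages are hypothetical values chosen for illustration.

def expected_income(speaker, m_G, m_H, w_M, w_A):
    """Expected income for a monolingual G ("G"), monolingual H ("H"),
    or bilingual ("B") worker.

    m_G, m_H: population shares of monolingual G and H speakers
    w_M, w_A: manufacturing and agricultural returns, with w_A <= w_M
    """
    p_G = 1.0 - m_H  # share of the population able to speak G
    p_H = 1.0 - m_G  # share of the population able to speak H
    if speaker == "B":
        return w_M   # bilinguals can communicate with any partner
    p = p_G if speaker == "G" else p_H
    return p * w_M + (1.0 - p) * w_A


m_G, m_H = 0.2, 0.7          # the remaining 0.1 of the population is bilingual
w_M, w_A = 2.0, 1.0

income_G = expected_income("G", m_G, m_H, w_M, w_A)
income_H = expected_income("H", m_G, m_H, w_M, w_A)
income_B = expected_income("B", m_G, m_H, w_M, w_A)
print(round(income_G, 3), round(income_H, 3), round(income_B, 3))
```

With these numbers the minority monolingual earns the least in expectation, because only 30% of potential partners can speak G, while the bilingual worker always earns the manufacturing return.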
After workers are matched and produce in period 0, they give birth to one child.
Each period 0 worker’s existing language abilities are inherited by the child. The
period 0 worker may also invest in an additional language for their child. Learning
ability varies across dynasties, reflected in the language learning cost $c_j \sim U[0, \bar{c}]$.
Workers can costlessly borrow against period 1 income to ﬁnance investment if they
wish. In making their decision, workers do not act strategically. Once period 0
workers have made their investment decision the period ends.
A monolingual G speaker will invest in bilingualism if doing so increases income
in his or her dynasty in period 1. This will be the case if the cost of learning H is less
than the expected value of the additional return λ their child will get from being able
to work in manufacturing if they happen to be matched with a monolingual H speaker:
\[ \lambda m_1^H \ge c_j \tag{3.1} \]
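A parallel inequality holds for monolingual H speakers. Bilinguals will never invest
since they already know both languages. The shares $q_G$ and $q_H$ of monolingual G and
H speakers for whom the benefits of becoming bilingual outweigh the costs are given by
\[ q_G = \frac{\lambda m_1^H}{\bar{c}} \quad \text{and} \quad q_H = \frac{\lambda m_1^G}{\bar{c}}, \tag{3.2} \]
where $\bar{c}$ is the upper bound of the learning cost distribution.
Workers make the investment decision at the end of period 0 anticipating the
equilibrium share of workers that will be able to speak G and H in period 1. The
period 1 population shares that speak G and H in turn depend on the decisions of
the monolingual workers:
\[ p_1^G = p_0^G + q_H m_0^H \quad \text{and} \quad p_1^H = p_0^H + q_G m_0^G. \tag{3.3} \]
Because $m_1^H = 1 - p_1^G$ and $m_1^G = 1 - p_1^H$, I use 3.2 and 3.3 to solve for the
equilibrium shares $q_G$ and $q_H$ in terms of the information available in period 0:
\[ q_G = \frac{(\lambda/\bar{c})\, m_0^H \left(1 - (\lambda/\bar{c})\, m_0^G\right)}{1 - (\lambda/\bar{c})^2\, m_0^G m_0^H} \tag{3.4} \]
\[ q_H = \frac{(\lambda/\bar{c})\, m_0^G \left(1 - (\lambda/\bar{c})\, m_0^H\right)}{1 - (\lambda/\bar{c})^2\, m_0^G m_0^H} \tag{3.5} \]
The shares will be bounded between 0 and 1 as long as the wage gap between manufacturing
and agriculture is not too large relative to the cost of becoming bilingual.
I therefore assume that $\lambda \le \bar{c}$.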
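
The equilibrium can be checked numerically. The sketch below assumes a learning cost that is uniform between 0 and an upper bound `c_bar`, and assumes that a monolingual invests when the wage gap times the anticipated period 1 share of monolingual speakers of the other language is at least that cost. Under these assumptions each share solves a pair of linear equations. The closed form in the code is derived from those assumptions, and the parameter values are hypothetical.

```python
# Equilibrium shares of monolinguals who become bilingual, computed two ways.
# Parameter values are hypothetical; c_bar is the assumed upper bound on cost.

def equilibrium_shares(lam, c_bar, m0_G, m0_H):
    """Closed-form solution of q_G = k*m0_H*(1 - q_H), q_H = k*m0_G*(1 - q_G)."""
    k = lam / c_bar
    denom = 1.0 - k ** 2 * m0_G * m0_H
    q_G = k * m0_H * (1.0 - k * m0_G) / denom
    q_H = k * m0_G * (1.0 - k * m0_H) / denom
    return q_G, q_H


def iterate_shares(lam, c_bar, m0_G, m0_H, rounds=200):
    """Fixed-point iteration on the same two best-response equations."""
    k = lam / c_bar
    q_G = q_H = 0.0
    for _ in range(rounds):
        # Period 1 monolingual shares are m0_H*(1 - q_H) and m0_G*(1 - q_G).
        q_G, q_H = k * m0_H * (1.0 - q_H), k * m0_G * (1.0 - q_G)
    return q_G, q_H


params = dict(lam=0.5, c_bar=1.0, m0_G=0.2, m0_H=0.7)
q_G, q_H = equilibrium_shares(**params)
q_G_iter, q_H_iter = iterate_shares(**params)
print(round(q_G, 4), round(q_H, 4))
```

The iteration converges here because each round scales the remaining error by a factor below one when the wage gap does not exceed `c_bar`. Agreement between the two methods is a consistency check on the algebra, not evidence about the data.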
Language Share, Manufacturing, and Bilingualism
Three results flow from equations 3.4 and 3.5 that link bilingualism to the population
share speaking each mother tongue and to the wage gap between manufacturing and
agriculture.
Result 1 A larger share of monolingual speakers will become bilingual when they are
a smaller share of the population.
This follows directly from differentiating $q_G$ and $q_H$ with respect to the initial population
share of speakers of each group's own language: $\partial q_G / \partial p_0^G < 0$ and $\partial q_H / \partial p_0^H < 0$.
Result 2 A larger share of monolinguals will become bilingual when there is a larger
gap in returns between manufacturing and agriculture.
This similarly follows by differentiating $q_G$ and $q_H$ with respect to $\lambda$: $\partial q_G / \partial \lambda > 0$ and
$\partial q_H / \partial \lambda > 0$.
Result 3 The incentive to become bilingual generated by the manufacturing-agriculture
wage gap is larger for the minority G speakers than the majority H speakers.
Differentiating $q_G$ and $q_H$ as before and using the assumption that $p_0^G < p_0^H$ gives
the result that $\partial q_G / \partial \lambda > \partial q_H / \partial \lambda$. Intuitively, because H speakers are a larger share of the
population, the additional return to them from being matched with certainty to someone who
shares a common language is lower, while the cost $c_j$ of becoming bilingual is fixed.
H speakers will thus have less incentive to learn G than G speakers will have to learn H.
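
The three results can be checked at a sample point by finite differences. The sketch assumes the equilibrium shares take the closed form implied by a uniform learning cost with upper bound `c_bar`. It evaluates one hypothetical parameter point only, so it illustrates the comparative statics and does not prove them for all parameter values.

```python
# Finite-difference check of the comparative statics at one hypothetical point.

def shares(lam, c_bar, m0_G, m0_H):
    """Assumed closed-form equilibrium shares (q_G, q_H) of new bilinguals."""
    k = lam / c_bar
    denom = 1.0 - k ** 2 * m0_G * m0_H
    q_G = k * m0_H * (1.0 - k * m0_G) / denom
    q_H = k * m0_G * (1.0 - k * m0_H) / denom
    return q_G, q_H


def d_dlam(lam, c_bar, m0_G, m0_H, h=1e-6):
    """Central-difference derivative of (q_G, q_H) with respect to the wage gap."""
    up = shares(lam + h, c_bar, m0_G, m0_H)
    down = shares(lam - h, c_bar, m0_G, m0_H)
    return tuple((u - d) / (2.0 * h) for u, d in zip(up, down))


base = dict(lam=0.5, c_bar=1.0, m0_G=0.2, m0_H=0.7)
q_G, q_H = shares(**base)
dq_G, dq_H = d_dlam(**base)

# Result 1 at this point: a larger monolingual G group (m0_H held fixed)
# means a smaller share of that group becomes bilingual.
q_G_bigger_group, _ = shares(lam=0.5, c_bar=1.0, m0_G=0.25, m0_H=0.7)

print(q_G_bigger_group < q_G)     # Result 1
print(dq_G > 0 and dq_H > 0)      # Result 2 at this parameter point
print(dq_G > dq_H)                # Result 3
```

At these values the minority group's response to the wage gap is several times larger than the majority's, in line with the intuition given for Result 3.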
Bilingualism and Language Shift
Language shift within a lineage depends on a bilingual parent passing on only their
second language to their children. I have assumed thus far that parents costlessly
pass along to their children all languages known to them. This implies that bilingual
dynasties will remain bilingual, even if one of the languages they know has very
few or no monolingual speakers and therefore low communication value. There are
examples of stable bilingualism of this type. In Wales, for example, about 20% of
the population currently speaks Welsh even though there are no monolingual Welsh
speakers left. However, the global pattern of language consolidation implies that
language shift is a very common outcome.
I extend the model to allow for language shift by assuming that one language
can be transmitted costlessly between parent and child, but transmitting a second
one costs s. Successive dynasties must share a common language, in keeping with
the real-world observation that parents and children always share the same language.
Results 1 to 3 will continue to hold under these assumptions: a that monolingual G