This article is reposted from the old WordPress incarnation of Not Exactly Rocket Science. The blog is on holiday until the start of October, when I'll return with fresh material.
Scientists have long recognised that languages evolve in strikingly similar ways to genes and living things. Their words and grammars change and mutate over time, and new versions slowly rise to dominance while others face extinction.
In this evolutionary analogy, old texts like the Canterbury Tales are the English language's version of the fossil record. They preserve the existence of words that used to be commonplace before they lost a linguistic Darwinian conflict with other, more popular forms.
Now, Erez Lieberman, Martin Nowak and colleagues from Harvard University are looking at this record to mathematically model how our verbs evolved and how they will change in the future.
Today, the majority of English verbs take the suffix '-ed' in their past tense versions. Sitting alongside these regular verbs like 'talked' or 'typed' are irregular ones that obey more antiquated rules (like 'sang/sung' or 'drank/drunk') or obey no rules at all (like 'went' and 'had').
In the Old English of Beowulf, seven different rules competed for governance of English verbs, and only about 75% followed the "-ed" rule. As the centuries ticked by, irregular verbs became few and far between. With new additions to the lexicon taking on the standard regular form ('googled' and 'emailed'), the remaining irregulars face massive pressure to regularise and conform.
Today, fewer than 3% of verbs are irregular, but they wield disproportionate power. The ten most commonly used English verbs - be, have, do, go, say, can, will, see, take and get - are all irregular. Lieberman found that this is because irregular verbs are weeded out much more slowly when they are commonly used.
To get by, speakers have to use common verbs correctly. More obscure irregular verbs, however, are less readily learned and more easily forgotten, and their misuse is less frequently corrected. That creates a situation where 'mutant' versions that obey the regular "-ed" rule can creep in and start taking over.
Lieberman charted the progress of 177 irregular verbs from the 9th-century Old English of Beowulf, to the 14th-century Middle English of Chaucer's Canterbury Tales, to the modern 21st-century English of Harry Potter. Today, only 98 of these are still irregular; many formerly irregular verbs, such as 'laugh' and 'help', have put on new regular guises.
He used the CELEX corpus - a massive online database of modern texts - to work out how often each of these verbs appears in modern English. Amazingly, he found that this frequency governs the way irregular verbs disappear, according to a very simple mathematical formula.
They regularise at a rate 'inversely proportional to the square root of their frequency'. This means that if they are used 100 times less frequently, they will regularise 10 times as fast, and if they are used 10,000 times less frequently, they will regularise 100 times as fast.
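To make the arithmetic concrete, here is a minimal Python sketch that assumes only the relationship stated above: regularisation speed proportional to 1/sqrt(frequency), so a verb used k times less often regularises sqrt(k) times as fast. The helper name is hypothetical, chosen for illustration, and is not code from the paper.

```python
import math


def relative_regularisation_speed(frequency_ratio: float) -> float:
    """How many times faster a verb regularises if it is used
    `frequency_ratio` times less often, assuming the rate scales
    as 1 / sqrt(frequency). Hypothetical helper for illustration."""
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return math.sqrt(frequency_ratio)


# The two examples from the text: 100x rarer and 10,000x rarer.
for ratio in (100, 10_000):
    speed = relative_regularisation_speed(ratio)
    print(f"{ratio}x rarer -> regularises {speed:g}x as fast")
```

Running it prints speed-ups of 10 and 100 for verbs used 100 and 10,000 times less often, matching the examples in the text.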
As Lieberman says, "We measured something no one really thought could be measured, and got a striking and beautiful result." Using this model, the team managed to estimate how much staying power the remaining irregular verbs have and assigned them 'half-lives' just as they would to radioactive isotopes that decay over time.
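The radioactive-decay analogy can be made concrete with a short Python sketch. It assumes each verb faces a constant chance of regularising per year, so the expected fraction of a group of verbs still irregular after t years is 0.5^(t / half-life). The function name and the 1,000-year half-life used in the demo are illustrative assumptions, not values or code from the paper.

```python
def fraction_still_irregular(years: float, half_life: float) -> float:
    """Expected fraction of verbs with the given half-life that are still
    irregular after `years`, assuming simple exponential decay as for a
    radioactive isotope (illustrative model, not the paper's code)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


# Illustrative 1,000-year half-life: each half-life halves the survivors.
HALF_LIFE = 1_000
for n_half_lives in range(4):
    years = n_half_lives * HALF_LIFE
    print(years, fraction_still_irregular(years, HALF_LIFE))
```

After zero, one, two and three half-lives the surviving fractions are 1, 0.5, 0.25 and 0.125, which is exactly the pattern of isotope decay the analogy relies on.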
The two most common irregulars, 'be' and 'have', crop up at least once in every ten words and have half-lives of over 38,000 years. That's such a long time that they are effectively immune to regularisation and are unlikely to change.
Less common verbs like 'dive' and 'tread' turn up only once in every 10,000 to 100,000 words. They have much shorter half-lives of 700 years, and for them regularisation is a more imminent prospect. Out of the 98 remaining irregular verbs examined in the study, a further 16 will probably have adopted the '-ed' ending by 2500.
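As a rough consistency check, the square-root law can be combined with the half-lives quoted above. This Python sketch assumes half-life scales with the square root of frequency and, as a further assumption, anchors that scaling at 'be' and 'have' occurring about once in every ten words (frequency 0.1) with a 38,000-year half-life. The constant names and the helper are hypothetical.

```python
import math

# Assumed reference point, taken loosely from the text: 'be' and 'have'
# occur about once in every ten words and have ~38,000-year half-lives.
REF_FREQUENCY = 0.1
REF_HALF_LIFE_YEARS = 38_000


def predicted_half_life(frequency: float) -> float:
    """Half-life in years, assuming half-life grows with sqrt(frequency),
    i.e. the regularisation rate falls as 1 / sqrt(frequency)."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return REF_HALF_LIFE_YEARS * math.sqrt(frequency / REF_FREQUENCY)


# 'dive' and 'tread': once in every 10,000 to 100,000 words.
for freq in (1e-4, 1e-5):
    print(f"frequency {freq:g}: ~{predicted_half_life(freq):,.0f} years")
```

For frequencies of 1 in 10,000 and 1 in 100,000 it gives roughly 1,200 and 380 years, a range that brackets the 700-year half-life quoted for 'dive' and 'tread'. The match is only approximate, since the reference frequencies here are order-of-magnitude figures taken from the text.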
Which will be next? Lieberman has his speculative sights set on 'wed'. It is one of the least commonly used of modern irregular verbs, and its past form 'wed' will soon be replaced with 'wedded'. As he jokes, "Now is your last chance to be a 'newly wed'. The married couples of the future can only hope for 'wedded' bliss."
That little jibe highlights the greatest strength of this paper - it's not the striking and elegant results, it's Lieberman's delightful turns of phrase. Suitably for a study about language, he describes his results in pithy, measured prose. Observe, for example, his concluding paragraph:
"In previous millennia, many rules vied for control of English language conjugation and fossils of those rules remain to this day. Yet, from this primordial soup of conjugations, the suffix '-ed' emerged triumphant. The competing rules are long dead, and unfamiliar even to well-educated native speakers. These rules disappeared because of the gradual erosion of their instances by a process that we call regularisation. But regularity is not the default state of a language - a rule is the tombstone of a thousand exceptions."
Ah, if only all scientists could write with such poetic flair.
Reference: Lieberman, Michel, Jackson, Tang & Nowak. 2007. Quantifying the evolutionary dynamics of language. Nature doi:10.1038/nature06137
Excellent summary. The same subject is also covered very well in "Words and Rules" by Steven Pinker.
What about "read"? In my opinion that is one participle that needs to change as it frequently causes ambiguity in written English.
I guess this explains where the battle between "sneaked" and "snuck" will eventually go.
"Read" won't change; it's too common. The spelling might, though.
There's also an intermediate stage, where the truly irregular verbs - the ones whose past tense and past participle are different - conflate the two into one, mimicking the regular paradigm. Oddly, sometimes the participle "wins" (as is happening with "seen" and "done"; you don't hear "have saw" or "have did") but sometimes the past does ("have sang" is quite common).
I wonder whether plotting against time might not be the best way to look for a rule... it may depend on the number of speakers too, or the number of new speakers, etc. It MAY be more important to consider the number of people for whom it's a second language, since those are the ones who may rely on rules more (e.g. pidgins).
Perhaps for the sake of future children (and teachers) we should encourage the process of regularisation...?
I wonder how this interacts with the tendency for dialects to speciate into new languages, and whether there's a parallel in evolutionary genetics for this regularization.
Some of the regularisation has happened in some regions and not in others. Surprisingly for a country of great linguistic invention, the USA seems to be holding onto the older forms. In Britain, we always have "dived", never "dove", "pleaded" rather than "pled" (I think), and I'm sure there are others. And certainly "sneaked" rather than "snuck".
"If only all scientists could write with such poetic flair." Haha, I'm inspired. Perhaps I will write my next paper in iambic pentameter.
Great post. Thanks.
Fortunately, the Germans are on the case: the Gesellschaft zur Stärkung der Verben (Society for the Strengthening of Verbs - http://verben.texttheater.de/) is dedicated to encouraging the use of strong (i.e. irregular) verbs in German, which is also experiencing a gradual loss of such forms. Their page of proposed forms for English is at http://verben.texttheater.de/v3/en.htm.
I particularly like "scribble, scrolb, scrolben"
That explains why some words I'm sure I remember as being irregular when I was a kid now show "-ed" as an alternative past-tense form in the dictionary, and are used just as often.
I would expect a fair amount of noise in this process due to local features, though. For instance, I'd wager that "have" is actually a lot more likely to change than "be". "beed" is awkward to pronounce and write, and it sounds nothing like its current past form. "haved", on the other hand, is easy to write, and both its spelling and pronunciation are very close to the current past form "had". I'd say that one has a small but non-zero chance of a fairly rapid flip if, for instance, the regular form is taken up as a joke or slang by young people and subsequently spreads into wider use.
It may also be true that the very existence of the internet will change this progression. How easy a word is to type is already changing how people say and communicate things. Also, the way a word looks becomes much more noticeable, while the sound it makes when you say it matters less. (Emoticons and 'leet speak' are both good examples of this. Internet jargon such as 'pwned' may not even have a consistent pronunciation, but it does have a consistent spelling.)
Lastly, I would look into acronyms as a source of linguistic drift, as an acronym itself will often be suffixed with '-ed' even when the final word of the acronym would not. If, for instance, it becomes popular to say 'iwntb' to mean 'it was not to be', then it could give rise to something like "I iwntb'ed ...", which could lead to someone actually saying 'it was not to beed'.
Obviously just a bit of speculation on my part.
Of course, sometimes regular verbs become irregular. "Catch" is a good example. Historically, the past tense was "catched", but it later became "caught", by analogy with "teach" (which once rhymed with "catch").
Regularization is not the only process at work - if it were, there'd be no irregular forms in any language, given that all natural spoken languages are the end result of hundreds of thousands of years of linguistic evolution!
It would be interesting to see a study of how the process of *irregularization* occurs.
One small nitpick about that paper, though - written documents aren't often reflective of how people actually speak. In some cases, archaic forms may survive for centuries in writing after they'd been dropped from speech*, so that might pose some problems with a study like that.
*And changes in how conservative writing is can distort the perception of linguistic change. The Norman invasion is a great example: prior to 1066, there was a thriving scribal tradition in England that was highly conservative in its language, using forms that had long since been dropped in speech. When English writing was re-established a few decades after the Norman invasion, the new scribes simply wrote the language as it existed, creating the illusion of much more rapid change than actually occurred, as a few centuries of actual change were compressed into a couple of decades of records.
This is a fascinating study. The fact that there could be any measurable correlation is amazing. I love the quote--especially "a rule is the tombstone of a thousand exceptions". Poetic flair in scientific papers would be wonderful!
On the other hand, the Urban Dictionary notes that "wung" is being used as the past tense of "wing," as in "I lost my notes for the speech so I wung it." Much better than "winged," and who knows if it hasn't just awoken from a linguistic slumber.
Regarding "haved" vs "had"... who's to say that "had" isn't actually derived from an original "haved" over a period of time? Think about it, it certainly seems plausible. Say "haved" often enough, and you'll be tempted to drop the "ve" yourself! I know this to be the case with several French verb conjugations.
In fact, maybe it was whittled down to haved..hav'd..ha'd...had...any takers?
I've noticed that most media outlets have ditched "pled" in favor of "pleaded". I, for one, believe it sounds horrendous.
"The defendant pled 'not guilty.'"
"The defendant pleaded 'not guilty.'"
The other one that has always mystified me relates to a bodily function. Why is sh*t considered vulgar, but anytime someone makes a mess in their pants it is okay to say that "he shat himself"?
Really interesting research methodology and all but what exactly's the use of the research? How does it help anybody? It just looks a little pointless to me.