Talking with someone comes so naturally that we sometimes forget how much skill it takes. People have to judge the rhythm of a conversation and its grammatical cues so that they can take their turns at talking without cutting off their partner or leaving pregnant pauses. The former is rude, the latter awkward.
That's certainly how things are usually conducted in English, but a new study suggests that this pattern of turn-taking applies across human cultures. By studying ten languages from all over the world, Tanya Stivers from the Max Planck Institute for Psycholinguistics discovered a universally consistent pattern of avoiding overlaps and minimising pauses.
There are small variations certainly, but they are far smaller than stereotypes might suggest. Anecdotes and academic literature alike often claim that different cultures have radically different preferences for the tempo of conversations, from the reputed long pauses of Scandinavian speakers to the almost "simultaneous speech" of Jewish New Yorkers. But until now, no one had analysed these potential differences across a broad spectrum of languages and cultures.
Stivers did so by collecting video recordings of conversations in ten different languages from five continents - from English to Korean, and from Tzeltal (a Mayan language spoken in Mexico) to Yélî Dnye (a language of just 4,000 speakers used in Papua New Guinea). In terms of grammar or sound, the tongues couldn't be more different and their speakers vary from hunter-gatherers in Namibia to city-dwellers in Japan.
All of the chats were spontaneous and informal, free of the specific rules that govern conversation in, say, interviews or courts of law. However, to get a decent comparison, Stivers restricted her search to dialogue involving questions and answers, and specifically those that entailed yes/no questions. These feature commonly in all ten languages but vary in how they're answered - from direct equivalents of yes/no to longer repetitions of the question. Stivers also checked a large database of Dutch conversation to confirm that the timings of turns in yes/no questions were representative of more general banter.
She found that in all ten languages, the delay between a question and its response follows a unimodal (single-peaked) distribution that peaks around zero and is skewed slightly to the right of it. That paints a clear picture - in all ten cultures, speakers shoot for as little silence as possible without speaking over each other, and the majority of answers follow questions after virtually no delay or overlap.
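For readers who like to see the arithmetic, here is a minimal sketch of how such a distribution could be summarised. It is not the study's method or data: the function names (`turn_offsets`, `modal_bin`), the 100ms bin width and the timings are all invented for illustration. The offset is simply the start of the answer minus the end of the question, so a negative value means the speakers overlapped.

```python
from collections import Counter
from statistics import mean, median


def turn_offsets(turns):
    """Offset in ms from question end to answer start; negative means overlap."""
    return [answer_start - question_end for question_end, answer_start in turns]


def modal_bin(offsets, width_ms=100):
    """Lower edge of the most populated bin of the given width (hypothetical helper)."""
    counts = Counter((offset // width_ms) * width_ms for offset in offsets)
    return counts.most_common(1)[0][0]


# Invented (question_end_ms, answer_start_ms) pairs, NOT data from the study.
turns = [(1000, 950), (2000, 2020), (3000, 3040), (4000, 4060),
         (5000, 5080), (6000, 6300), (7000, 7700)]

offsets = turn_offsets(turns)

print("modal bin starts at", modal_bin(offsets), "ms")
print("median offset:", median(offsets), "ms")
print("mean offset:", round(mean(offsets)), "ms")
```

In this toy sample most answers land in the bin that starts at zero, while a couple of slow replies drag the mean above the median. That is the signature of a single peak near zero with a tail to the right.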
The average delays certainly varied from language to language, but only by tiny amounts. Danish speakers had the longest pauses between turns, which lends some support to the Scandinavian stereotype, but even then the average delay was just half a second, about the time it takes to say two English syllables. For comparison, the most rapid-fire conversationalists were the Japanese, with just 7 milliseconds between turns. Both extremes are only a quarter of a second off from the international average, a pattern that fits much better with the idea of a universal system of turn-taking than it does with the idea of significant cultural differences.
Stivers also found that similar factors altered the gap between speakers in the various languages. In all ten, people answered questions after longer delays if their responses were contradictory or uninformative (such as "I don't know", "I can't remember", or "No, it isn't"). These denials are delivered about 100-500 milliseconds slower than confirmations, a difference that was statistically significant in seven of the ten languages. In contrast, speakers tended to answer more quickly if their questioners were looking directly at them, with significant differences in five of the ten languages.
While the differences weren't always significant, the patterns were consistent and they fit with the idea of universal norms of dialogue. If there were strong cultural differences, you'd expect that different factors would affect the gap between speakers in different languages. By contrast, the universal hypothesis predicts that the same things slow or speed up the transition between turns across different tongues - that's the pattern that Stivers found.
Why then are there variations? They're certainly not a natural consequence of linguistic structure. For example, Danish, Dutch and English, where questions are indicated by words at the start of the sentence (why, how, where, etc.), are no more likely to elicit snappier responses than Japanese and Korean, where questions are indicated by markers at the end of sentences. Incidentally, the same fact shows that close relationships between languages - either through their origins or their geography - don't predict turn-taking customs either.
Instead, Stivers suggests that while all cultures attempt to minimise delays between speakers, they have different concepts of what counts as a delay. She asked independent analysts to watch the various dialogues and classify responses as late or on time, depending on the subjective rhythm of the conversations. Their judgments suggested that gaps of 36ms would feel on time to Japanese speakers, while those of 200ms would feel on time to Danish speakers.
Because language is so important to us, we are very sensitive to these infinitesimal fractions of time. It's this hypersensitivity that makes tiny differences in the tempos of other languages seem like vast gulfs of silence or near collisions of words. To an English speaker, the gaps between Nordic banter may seem enormous even though they're only a quarter of a second longer, barely enough time to say a syllable.
Reference: PNAS 10.1073/pnas.0903616106
More on languages:
- Five-month-old babies prefer their own languages and shun foreign accents
- Bacteria and languages reveal how people spread through the Pacific
- Babies can tell apart different languages with visual cues alone
- Why music sounds right - the hidden tones in our own speech
- Gestures reveal universal word order, regardless of language
"She found that in all ten languages, the delay between a question and its response follows a normal distribution that peaks around zero and is skewed slightly to the right of it."
I'm not sure what you're saying here: a normal distribution isn't skewed, so you're implying that half the time the answer was started before the question was finished.
The graphs are horrible (not your fault, I know). They're pooled to every 500ms (I assume that's the unit), but all the differences you mention are less than that. So we can't see the important detail. It looks like the modal difference is always <250ms, but it's difficult to say if it's zero.
Actually, the graphs could show symmetrical distributions, just with means shifted to above zero by differing amounts. Difficult to say, though.
Yes, you're right. I've changed "normal" to "unimodal" - my bad. I agree with the point of the x-axis units on the graphs.
I'm sceptical. It sounds as if the study focussed on the least interesting dialogues (straightforward yes/no questions and their answers) and only confirmed them to be "representative of more general banter" for one language (Dutch). You'd get more diverse results by looking at questions that require the listener to think before answering. I understand that different cultures have different expectations with respect to whether you can begin to speak before you've fully planned what you're going to say, or whether you can begin to formulate a response while the other person is still speaking.
It would also be interesting to compare how long people wait before making a contribution that either changes the topic or takes it on a new tangent.
One subject that I took at university (not part of my major) was called "Language, Culture and Communication". Our instructor (a French woman) taught us that according to French communication mores, talking over each other is considered a sign of enthusiasm. That wouldn't suit me at all. Conversations in English with more than a few participants are generally way too fast for my tastes and I often have difficulty getting a word in edgeways. (Remember that, next time you drop in to my place for a coffee.)
Best science write-up of the week!
It would be interesting to see this sort of study done with conversations that consist of arguments, point and counter-point, and conversations that consist of anecdotes followed by other participants' anecdotes, to see if there is the same consistency.
"However, to get a decent comparison, Stivers restricted her search to dialogue involving questions and answers, and specifically those that entailed yes/no questions."
That makes this study completely boring.
I agree with the comments about the limitation to yes/no questions. Surely they could've simply recorded ordinary conversations and measured the time delay (or overlap) between participants' speech.
On a secondary note, I've never understood the concept of awkward silences. Long gaps have never bothered me.
The pause in Nordic languages is for hearty laughter and other sweeping forms of gesticulation, like taking deep breaths, or waving around giant weapons.
I was excited to see the headline specify "cultures" rather than "languages" or "countries". Imagine my disappointment when neither goths nor computer-programmers were included in the study. In my experience, both these sub-cultures tend to have longer gaps in speech-response, and to use less formal language (aka "small talk") than other sub-cultures in the same country. Computer programmers also tend to have more overlapping speech, in my unquantified observation.
Speaking of language generation, I am totally digging the word "embiggenise"