Saturday, February 5, 2011

Sneaked, snuck?

From the language log blog:

[...]
The context here is "Snuckward Ho!", 11/29/2009, "Snuck-gate", 6/18/2010, and "Graphically snuckward", 6/19/2010. Michael wrote:

Something you overlooked: people are much more likely to say "snuck" and write "sneaked". A reflection of the influence of style manuals, perhaps?

I was prompted to look at this by your comment

It's not clear whether this is a linguistic change (that is, a change in the words that people choose to express a certain concept) or a cultural change (that is, a change in the concepts that people choose to write about).

since I doubt that most of the "choosing" of words in speech is as self-conscious as it is in writing.

Interestingly, the proportion of sneak : sneaked / snuck is consistent across the corpus (except in fiction, which is its own fictive thing), supporting the idea that some of this may be down to people choosing forms of words to express ideas — see Davidson's "A nice derangement of epitaphs" for the problems this raises for theories of comprehension.

Michael attached a graph based on the following data, from the COCA corpus, expressed as frequencies per million words:

           SPOKEN   MAGAZINE   NEWSPAPER   ACADEMIC
sneak        6.93       8.64        6.49       0.99
sneaks       0.48       1.29        0.77       0.28
sneaked      0.64       1.63        1.77       0.30
snuck        1.87       1.27        0.68       0.16
SUM          9.92      12.83        9.71       1.73


The overall frequency of these forms of the lexeme sneak varies by a factor of about 7.5, from 1.73 per million in academic prose to 12.83 per million in COCA's magazine collection.

And the percentage of choosing snuck, given the choice between snuck and sneaked, varies from 35% in academic prose to 75% in COCA's spoken transcripts — that's the ratio of row 4 to the sum of rows 3 and 4.
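
The row 4 / (row 3 + row 4) ratio can be recomputed from the table as a quick check. This is only a sketch: the dictionary and function names are made up for illustration, and the inputs are the rounded per-million figures shown above, not the underlying counts.

```python
# Frequencies per million words, copied from the COCA table above.
sneaked = {"spoken": 0.64, "magazine": 1.63, "newspaper": 1.77, "academic": 0.30}
snuck = {"spoken": 1.87, "magazine": 1.27, "newspaper": 0.68, "academic": 0.16}


def snuck_share(genre):
    """Share of 'snuck' among the two past forms: row 4 / (row 3 + row 4)."""
    return snuck[genre] / (snuck[genre] + sneaked[genre])


# Hypothetical helper dict: one share per genre.
shares = {genre: snuck_share(genre) for genre in snuck}

for genre, share in shares.items():
    print(f"{genre:<10} {share:.1%}")
```

On these rounded figures the spoken column comes out at about 75% and the academic column at about 35%, matching the text; the magazine column comes out at about 44% and the newspaper column at about 28%.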

But if we divide the sum of row 3 and row 4 by the sum of all the rows, we get a proportion of "sneaked" or "snuck" forms that is remarkably consistent across genres; and, of course, similarly for the complementary sum of rows 1 and 2:

                 SPOKEN   MAGAZINE   NEWSPAPER   ACADEMIC
snuck+sneaked     25.3%      22.6%       25.2%      26.6%
sneak+sneaks      74.7%      77.4%       74.8%      73.4%
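
The division described above (rows 3 and 4 over the sum of all rows) can be sketched as follows. The variable names are illustrative only, and the inputs are again the rounded per-million values from the COCA table.

```python
# Rows of the COCA table (per million words); columns follow `genres`.
genres = ["spoken", "magazine", "newspaper", "academic"]
rows = {
    "sneak":   [6.93, 8.64, 6.49, 0.99],
    "sneaks":  [0.48, 1.29, 0.77, 0.28],
    "sneaked": [0.64, 1.63, 1.77, 0.30],
    "snuck":   [1.87, 1.27, 0.68, 0.16],
}

past_share = {}
for i, genre in enumerate(genres):
    total = sum(row[i] for row in rows.values())   # all four forms
    past = rows["sneaked"][i] + rows["snuck"][i]   # rows 3 and 4
    past_share[genre] = past / total
    print(f"{genre:<10} snuck+sneaked {past / total:.1%}   "
          f"sneak+sneaks {1 - past / total:.1%}")
```

The four genres land between roughly 22.6% and 26.6%, a much narrower band than the spread in the snuck-versus-sneaked ratio.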


In other words, the choice among abstract inflectional categories is much more consistent than either the choice among lexemes or the choice among inflectional variants.

As this morning's Breakfast Experiment™, I thought I'd check this in the LDC's conversational speech collection. The part of this collection that is indexed on line comprises 26,151,602 words of transcript. Adding this source to the previous tables, we get:

           COCA SPOKEN   LDC SPOKEN   MAGAZINE   NEWSPAPER   ACADEMIC
sneak             6.93         7.15       8.64        6.49       0.99
sneaks            0.48         0.73       1.29        0.77       0.28
sneaked           0.64         0.19       1.63        1.77       0.30
snuck             1.87         1.98       1.27        0.68       0.16
SUM               9.92        10.06      12.83        9.71       1.73


The snuck/(snuck+sneaked) percentage in this additional source is the highest of all, at 91%, compared to 75% in the COCA spoken category — as we expect, since the COCA "spoken" material is pretty formal in comparison. However, the snuck+sneaked and sneak+sneaks percentages remain rather consistent:

                 COCA SPOKEN   LDC SPOKEN   MAGAZINE   NEWSPAPER   ACADEMIC
snuck+sneaked          25.3%        21.7%      22.6%       25.2%      26.6%
sneak+sneaks           74.7%        78.3%      77.4%       74.8%      73.4%
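
The two LDC percentages can be rechecked the same way. This is a minimal sketch using only the rounded LDC SPOKEN column from the table above; the names are illustrative.

```python
# LDC SPOKEN column, per million words, copied from the table above.
ldc = {"sneak": 7.15, "sneaks": 0.73, "sneaked": 0.19, "snuck": 1.98}

past = ldc["sneaked"] + ldc["snuck"]
total = sum(ldc.values())

snuck_share = ldc["snuck"] / past   # snuck / (snuck + sneaked)
past_share = past / total           # (snuck + sneaked) / all four forms

print(f"snuck share of past forms: {snuck_share:.0%}")
print(f"past forms share of all:   {past_share:.1%}")
```

The rounded inputs give 91% for the first figure and about 21.6% for the second. The small gap from the 21.7% in the table is presumably a rounding effect, on the assumption that the published percentages were computed from unrounded counts.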

