- As widely noted, optical character recognition (OCR) errors become more frequent the older the scanned text is.
- The "smoothing" function, which is on by default, can make anomalous spikes look like trends.
- Google Books contains a number of magazines, and individual issues are often dated to the year of the magazine's first-ever issue rather than the year they actually appeared.
- Some dates are erroneous in other ways: the periodical Microprocessors and Microsystems is dated 1906. In another case I saw a work assigned a date that was actually a year mentioned in its title, which suggests that dates are entered manually.
In short, the worst thing to use Ngram for (as currently implemented) is dates of first use. The word "robot," as everyone knows, entered English in 1921 with the play R.U.R., but Ngram would have us believe it enjoyed a bit of use in the 1900s. This is a combination of bad dates, mis-OCRing of "Robert," "robbery," etc., and the use of the word in other contexts, such as sociological discussions of forced labor in Eastern Europe.
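The smoothing problem from the list above is easy to see with a toy example. What follows is only a sketch, not Ngram's actual code: it assumes smoothing works as a centered moving average, where a smoothing of n averages each year with up to n years on either side (which is how I understand Ngram's own description, with a default of 3). The function name `smooth` and the frequencies are made up for illustration.

```python
def smooth(values, n=3):
    """Centered moving average (assumed model of Ngram smoothing).

    Each point becomes the mean of itself and up to n neighbours on
    either side; the window is simply cut short at the ends.
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - n)
        hi = min(len(values), i + n + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Hypothetical raw counts for 1900-1914: the word appears in exactly
# one year (1907), e.g. because of a single misdated magazine.
years = list(range(1900, 1915))
raw = [0] * len(years)
raw[years.index(1907)] = 7

smoothed = smooth(raw, n=3)

for year, r, s in zip(years, raw, smoothed):
    print(year, r, round(s, 2))
```

With these made-up numbers, the single spike in 1907 is spread into a flat plateau of 1.0 running from 1904 to 1910. On a chart, that looks like seven years of modest, steady use rather than one bad data point. Setting `n=0` returns the raw values unchanged, which is the safer setting when hunting for first uses.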