Tuesday, September 18, 2012

The Great Google Book Scan

On my way into work today, I heard a segment on NPR about the court case of Google vs. the publishers and authors whose books are under active copyright. The case is many years old, but in the recap the journalist outlined the basics: Google's stance that its intention is to make all knowledge available to everyone, and the publishers' and authors' stance that Google is infringing on copyright law. This got me thinking about a number of issues surrounding the case (note that I have not fully researched it, so these are simply questions that sprang to mind as I was listening).

1) How are the plaintiffs presenting the case?  Of course nothing is simple, but I would have to think it would at minimum be twofold: a cease-and-desist order and a claim for lost revenue.

2) How do the plaintiffs quantify lost revenue?  Without seeing Google's records of the number of downloads of a specific work under active copyright, how can they quantify the revenue lost?  So maybe there is an order to release those records.

3) Even if, on the surface, revenue is lost initially, how do the plaintiffs quantify how many people have downloaded a book from Google and then, after reading it, actually bought a hard copy (or a legitimate eBook)?

Add to all of the above the inaccurate content of the scans Google provides.  They place a disclaimer at the beginning of each eBook, paraphrased something like this: Google uses an algorithm to capture the words within the scan, and as such errors will occur; in their opinion, the small number of errors is acceptable given the greater benefit of having ready access to the book.  This is a point on which I have to differ, having downloaded a number of Google's eBooks myself (all copyright-free, I might add). In each, the number of "typos" was at the very least distracting, and quite often confusing or misleading.  Some words were rendered by Google's software as a jumble of letters that spelled nothing, and I had to guess at what the author intended based on context.  That might be acceptable to a novice reader, or to someone who hasn't cultivated a high level of reading comprehension, but it took me out of the story.  I actually had to think about what I was reading -- not because of the content, but because of the mechanics of the delivery system.
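For the curious, the "algorithm" in question is optical character recognition (OCR). Here is a minimal sketch of what that step amounts to, using the open-source Tesseract engine via its pytesseract wrapper -- an assumption on my part, since Google's actual pipeline is proprietary and surely far more elaborate, and the file name below is made up:

    # A rough sketch of an OCR pass over one scanned page, using the
    # open-source Tesseract engine via pytesseract. (Hypothetical
    # stand-in for Google's proprietary pipeline; "page_scan.png" is
    # a made-up file name.)
    from PIL import Image
    import pytesseract

    # Recognize the characters on the scanned page image. The result is
    # the engine's best statistical guess at the text; smudged ink,
    # unusual typefaces, and tight letter spacing are exactly what
    # produce the jumbled "typos" described above.
    page = Image.open("page_scan.png")
    raw_text = pytesseract.image_to_string(page)

    print(raw_text)  # raw output, with no human proofreading applied

The point of the sketch is that nothing in it ever checks the output against a dictionary, let alone against the author's intent: whatever the engine guesses is what lands in the eBook unless a human reads it afterward.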

I understand, and in some respects applaud, Google's concept of making books available to everyone, but as a reader who cherishes words and the turn of a phrase, I have to question the depth of that intent.  If the entire project is meant to provide books for general use, then the actual books, as originally written, are what need to be made available.  In short, Google needs to grasp the concept that some things just can't be done by a machine.  They need an editor.

Alternatively, one could simply go to Project Gutenberg and get the books instead.  Their large staff of volunteers proofreads everything they scan, which, even though it is not perfectly edited, still provides a very enjoyable reading experience.  And per a disclaimer on their website, all of the books they scan are in the public domain in the US (their copyrights have expired).
