Sunday, November 29, 2009

Resource review #5: Libraries and Google: a love/hate relationship

Waller, V. (22 August 2009) "The relationship between public libraries and Google: Too much information" First Monday, 14 (9). Retrieved from http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2477/2279 

   I'll just get this out of the way. One thing that's really gotten to me in the course of reading all these articles is that librarians and other authors can't seem to keep the name of Google's digitization project straight. I understand that it's changed several times, from Google Print, to Google Library Project, to Google Book Search, to Google Books. According to Google's history of the project, the name was changed to Google Books in 2005. That's plenty of time for librarians to catch up. If you're going to write an article that's critical of something, you've got to have your facts straight. Getting the name of the thing you're criticizing right should be the absolute bare minimum. Similarly, Waller doesn't seem to know the names of Google's founders, Larry Page and Sergey Brin. When citing their published work, she lists their last names correctly, but when she mentions them in passing, she refers to them as Sergey and Brin every time. I'm sure I'm guilty of my own fair share of typos, so I try not to harp on this sort of thing. But in all other respects, this article is very academic (if I were on the reference desk, I'd tell students it was scholarly), so in that context, this kind of error is rather glaring.

   Waller's article describes the partnership between Google and libraries in terms of the stages of a romantic relationship. At first, libraries felt that their goals matched perfectly with Google's, whose "stated mission is to organise the world’s information and make it useful." However, with time, libraries have realized that the match isn't so perfect after all. Most of the problems Waller lists stem from the fact that Google is a private company--it is driven by advertising revenue; it doesn't share the concern of libraries for users' privacy; and as a private company, it could go out of business, taking with it the massive number of books it has digitized. Additionally, Waller notes the suggestion made by many librarians that Google has "[conflated] information retrieval and knowledge." Waller (like the librarians she cites) fears that "the Google effect" will lead to a mindset that ignores traditional research methods in favor of shortcuts. Similarly, she cites "concerns that the relationship between the reader and a digital text will be superficial in comparison with the intimate relationship that can develop between a reader or scholar and a physical text," and fears "the possibility that searching a book will become an increasingly adequate substitute for reading it." Finally, she notes the complicated nature of digital preservation issues.

   To her credit, Waller is not suggesting that librarians cut ties with Google. Rather, she stresses the importance of using Google as a starting point, and emphasizing to library users the limits of Google as a research tool. She argues that librarians "should teach library users, through example, about the difference between freely flowing information and balanced information. They should not be afraid of giving priority to more significant information. They should also be discussing the losses involved in representing an aspect of an analogue world with ones and zeroes. As philosopher of technology Don Ihde says, it is ‘what is revealed is what excites; what is concealed may be forgotten.’ Libraries need to pay attention to that which is concealed by Google’s search results and by digitised information. What is concealed includes vital aspects of human knowledge and culture and it is part of the task of the public library to preserve these things." 

   One of Waller's main points is that for information to be accessible and useful, it has to be organized, with priority given to information of greater significance. She claims that Google's search results have no such organization, and are organized and ranked only to serve the needs of advertisers. In fact, the organization of Google's search results has been well documented--results are ranked by an algorithm (PageRank) influenced by citation analysis. It's certainly not perfect, but it is a form of relevance ranking and organization. Waller's failure to acknowledge this seems like a pretty big oversight.

   I'm also somewhat bothered by Waller's suggestion that if Google goes out of business, its digital library will disappear. Every participating library gets a digital copy of the books it allows Google to digitize. These can be (and are) used in various ways--copies can be printed on the Espresso Book Machine; digital material can be integrated into the library's OPAC; and libraries can create their own digital libraries, like the Hathi Trust Digital Repository, which began at the University of Michigan and is now a partnership between thirteen large research universities. Just because content is originally digitized by Google doesn't mean it has to live there exclusively. However, the problems Waller notes are real. Librarians should feel uncomfortable partnering with an organization that collects data about individual users and shares that data with advertisers. The Google settlement stipulates that public libraries can each have one terminal from which the public can access all of the material available on Google Books--will the convenience and benefits of this access be outweighed by the presence of targeted advertising and the potential invasion of users' privacy? I don't know. I'm planning on tackling privacy issues (and the dreaded copyright/legal challenges) next.

 

Saturday, November 28, 2009

Resource review #4 - Mass digitization leads to more books in print?

Rosen, J. (2009). Bookselling Heads To the Espresso Age. Publishers Weekly, 256(40), 3-4.

Badger, B. (2009, September 9). Books Digitized by Google Available via the Espresso Book Machine.
Retrieved from http://booksearch.blogspot.com/2009/09/books-digitized-by-google-available-via.html

(2008). U of Michigan Library Installs Espresso Book Machine. Advanced Technology Libraries, 37(11), 1, 10-11.

(2009). Espresso Book Machine. Retrieved from http://www.lib.umich.edu/espresso-book-machine

   In September, Google announced that it would partner with On Demand Books to make its more than two million digitized public domain books available for printing on the Espresso Book Machine (EBM). The machine is "capable of making a 300-page perfect-bound book in five to seven minutes" and can print a yearly total of 60,000 books. According to Brandon Badger, a product manager at Google Books, "If sentient robots ever succeed in taking over the world, this is how they will print their books."
   When I originally came across this announcement at Inside Google Books, I thought it was just a neat bit of technology. According to Rosen's article, the implications are much larger. She cites one of the founders of On Demand Books as asserting that the machine's [relatively] low cost and the company's partnership with Google signal "the end of the Gutenberg age." The ability to quickly and inexpensively print books does have the potential to radically decentralize the publishing industry. Rosen's article explores the implications for small independent bookstores, and also alludes to possible library use. Dane Neller (cofounder and CEO of On Demand) suggests that "the Espresso machine enables local retailers to do everything a national behemoth like Amazon does." Additionally, Espresso machines can allow small bookstores to save space while offering a much larger inventory. Booksellers quoted in the article described plans to sell copies of classics, and in a university bookstore, to print copies of books authored by faculty members. Rosen also suggested that libraries may begin printing copies of digitized rare books.
    I have some trouble imagining the use of the EBM in libraries. Would libraries be selling books to patrons, printing books to add to their collections, or printing copies of digitized rare or fragile items? So far, the best example I can find is the University of Michigan library, which in 2008 became the first university library to purchase an EBM. The library planned to sell copies of books it had digitized for the Open Content Alliance, as well as items from its pre-1923 collection, for about $10 a book. U of M's dean of libraries, Paul Courant, stated, "This is a significant moment in the history of book publishing and distribution. As a library, we're stepping beyond the limits of physical space. Now we can produce affordable printed copies of rare and hard-to-find books. It's a great step toward the democratization of information, getting information to readers when and where they need it." According to the library's website, U of M also expects to offer additional uses of the EBM: "Small runs of printed books produced by classes, such as anthologies of creative writing; printed copies of proceedings of University conferences and events; printing and binding course materials; self-publishing for Ann Arbor authors." This is potentially a large expansion of the library's role on campus, and I think it's illustrative of digitization's potential for expanding access to information, in digital form and, paradoxically, in print as well.
   Possibly the most interesting point this article makes is that (at least in theory) the EBM gives booksellers an opportunity to offer readers a convenient print alternative to e-books. Many librarians worry that Google Books represents a serious challenge to the relevance of libraries as physical repositories for printed objects; however, this partnership can provide booksellers and librarians with an opportunity to inexpensively put more materials into print. The University of Michigan, an early and enthusiastic participant in mass digitization projects, suggests that the opportunity to return digitized materials to print provides necessary flexibility in format: "Rather than a one-size-fits-all solution, we believe that the best book format varies in relationship to its uses and its users. Some of the time, an electronic book -- that can be accessed any time, anywhere, and quickly searched -- is exactly what we need. At other times, the ideal form of the book is a nicely bound copy that helps with sustained reading, that serves as a physical reminder of a reading experience, or that can easily be passed from hand to hand." This lends further credence to the argument that the greatest justification for Google Books is the expansion of access to information. If the EBM becomes widely available (a big if, I suppose), users don't even have to have internet access to benefit from mass digitization. They just have to have $10.
   Finally, after looking at the DIY book scanner several weeks ago, I have to wonder how plausible it is that someone will cook up a DIY bookmaking machine. It seems like a pretty huge undertaking, but who knows! Then we'll really democratize access.

Wednesday, November 18, 2009

Resource review #3

Musto, R. G. (2009, June 12). Google Books Mutilates the Printed Past. Chronicle of Higher Education, 55(39).

In this article, Ronald G. Musto, a medieval historian, describes the “promise and perils” of using Google Books for historical research. Musto’s work involves studying archival records related to Naples in the Middle Ages. He briefly describes the repeated destruction and subsequent reconstruction of those records. He notes that “for the few of us who work on the city's urban development, that double mutilation -- of both its archival and architectural past -- makes work difficult at best. More than many other historians, we have to rely on remnants to recreate this history.” Many of these remnants are now available on Google Books, and Musto is decidedly not satisfied with the result.

Like almost everyone involved in the debate about Google Books, Musto is pleased with the level of new access the resource provides. However, citing a key work in his field, he rails against the quality of Google’s scanning:

“In its frenzy to digitize the holdings of its partner collections, in this case those of the Stanford University Libraries, Google Books has pursued a "good enough" scanning strategy. The books' pages were hurriedly reproduced: No apparent quality control was employed, either during or after scanning. The result is that 29 percent of the pages in Volume 1 and 38 percent of the pages in Volume 2 are either skewed, blurred, swooshed, folded back, misplaced, or just plain missing. A few images even contain the fingers of the human page-turner. (Like a medieval scribe, he left his own pointing hand on the page!) Not bad, one might argue, for no charge and on your desktop. But now I'm dealing with a mutilated edition of a mutilated selection of a mutilated archive of a mutilated history of a mutilated kingdom -- hardly the stuff of the positivist, empirical method I was trained in a generation ago.”

While he admits that this is just one book, and that a cursory search of materials outside his field of study fails to reveal a similar concentration of errors, the poor scanning quality seems to essentially push him over the rhetorical edge. He expresses concerns that Google’s poorly scanned books will replace the world’s collections of rare books and archival materials, arguing that “should Google Books prevail, and the resources of the scholarly community be made irrelevant by Google's sheer scale and force, the future of our past will be in great doubt.”

This view seems pretty extreme to me, but it’s expressed often enough in a variety of articles and blog posts to merit discussion. I don’t think that Google Books is about preservation. I think it’s about access. The ability to do full-text searching in four million books is ridiculously convenient, and that massive opportunity comes at the expense of precision and quality. But given the legal complications of the Google Books settlement, I don’t think that access at the expense of preservation is an argument that Google’s leaders want to be publicly pushing. In a recent New York Times op-ed titled “A Library to Last Forever,” Google co-founder Sergey Brin is obviously suggesting (on the basis of the title alone) that the project is justified because it will digitally preserve the world’s libraries.* So I suppose it makes sense to judge Google Books by the stringent standards that the goal of preservation implies, since these are the claims that the company itself is making.

Still, I don’t see any reason to jump to the conclusion that Google’s digital copies of books are going to make physical collections irrelevant, especially in the case of rare books. If Google were to launch a project involving digitization of the world’s art, I don’t think anyone would suggest that museum curators may as well trash the original “Starry Night.” However, I do understand that the tenor of Musto’s argument is provoked in part by the arrogance of Google’s stated goals. He suggests that Google believes that the noble goals and public good resulting from the project grant them the “right to turn copyright on its head.” It’s an important point, as the stakes are pretty high. Whatever comes out of the settlement, the repercussions will be huge. Once I read a bit more about the new proposed settlement, I’ll blog about it.


*Brin does point out that without access, preservation doesn’t really matter: “…if our cultural heritage stays intact in the world’s foremost libraries, it is effectively lost if no one can access it easily.”

Tuesday, November 3, 2009

No weekend plans?


 
scanner + image by Daniel Reetz.

So, apparently you can make a fully functional book scanner yourself for about $300, if you're willing to scavenge a bit and you happen to have some power tools lying around. More info here and here.