Martin, S. (2008). To Google or Not to Google, That Is the Question: Supplementing Google Book Search to Make It More Useful for Scholarship. Journal of Library Administration, 47(1/2), 141-150. Retrieved from http://search.ebscohost.com.ezproxy.library.wisc.edu/login.aspx?direct=true&db=lxh&AN=33007439&loginpage=Login.asp&site=ehost-live
Sutherland, J. (2008). A Mass Digitization Primer. Library Trends, 57(1), 17-23. Retrieved from http://search.ebscohost.com.ezproxy.library.wisc.edu/login.aspx?direct=true&db=lxh&AN=34929033&loginpage=Login.asp&site=ehost-live
Juliet Sutherland's article describes the process of digitization and outlines some of the many problems involved in using optical character recognition (OCR) to convert images of book pages into computer-readable text. She outlines a few potential solutions, including reCAPTCHA and Distributed Proofreaders, and notes the limits of both--reCAPTCHA doesn't correct spacing errors, and it can only be applied to words the computer already knows it's unsure of; it doesn't catch the computer's confident mistakes. The obvious limit of Distributed Proofreaders is time. Proofreading by dedicated humans is by far the most effective way to ensure the accuracy of digitized text, but there's no way human proofreading will ever match the speed of Google's scanning. Sutherland also mentions the possibility of semantic coding, a process of enriching the text with useful information. She explains that "This can be as simple as identifying chapter titles or as complex as identifying whether a particular instance of the word 'Washington' refers to the person, the city, or the state."
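Sutherland's two examples of semantic coding--marking chapter titles and disambiguating "Washington"--can be illustrated with a toy sketch. This is purely my own illustration (the sample text, tag names, and the naive "preceded by a first name" rule are all made up), not any real project's markup scheme:

```python
import re

# Toy OCR output: plain text with no structure.
raw = """CHAPTER ONE
George Washington crossed the Delaware.
The capital moved to Washington in 1800."""

def tag_structure(text):
    # Structural tagging: treat all-caps lines as chapter titles.
    lines = []
    for line in text.splitlines():
        lines.append(f"<title>{line}</title>" if line.isupper() else line)
    return "\n".join(lines)

def tag_washington(text):
    # Semantic tagging: a naive rule that calls "Washington" a person
    # when a first name precedes it, and a place otherwise.
    text = re.sub(r"(?<!George )\bWashington\b",
                  "<place>Washington</place>", text)
    return text.replace("George Washington",
                        "<person>George Washington</person>")

tagged = tag_washington(tag_structure(raw))
```

A real system would use trained named-entity disambiguation rather than hand-written rules, but the output is the same kind of enriched text Sutherland describes.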
Similarly, Shawn Martin's article discusses the limitations of Google Books for academic research and proposes a possible solution. Martin is affiliated with the Text Creation Partnership (TCP). He describes their work as follows: "Instead of relying on a computer to read the book and extract readable text from an image (as OCR does) TCP works with companies whose employees read the text, transcribe it, and add structural tagging (that allows a computer to see elements of the book such as paragraphs, typeface changes, and chapters)." This process typically involves three people--two to transcribe and one to review and edit the results. If I understand the article correctly, Martin is not suggesting that Google Books move away from OCR and begin transcribing texts manually. Rather, he's suggesting that Google Books partner with a company like TCP to enhance its existing collections, particularly through the kind of semantic coding Sutherland describes.
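The double-keying workflow Martin describes--two transcribers, one reviewer--boils down to a reconciliation step: wherever the two independent transcriptions disagree, the discrepancy is queued for the human editor. Here's a minimal sketch using Python's difflib (the sample text and the word-level comparison are my own illustration, not TCP's actual tooling):

```python
import difflib

# Two independent keyings of the same page (hypothetical sample text).
key_a = "It was the best of times, it was the worst of times.".split()
key_b = "It was the best of tirnes, it was the worst of times.".split()

# Collect the word spans where the two keyings disagree; only these
# need the third person's attention, not the whole page.
matcher = difflib.SequenceMatcher(a=key_a, b=key_b)
discrepancies = [
    (key_a[i1:i2], key_b[j1:j2])
    for op, i1, i2, j1, j2 in matcher.get_opcodes()
    if op != "equal"
]

print(discrepancies)  # → [(['times,'], ['tirnes,'])]
```

The appeal of the scheme is that two people making the *same* misreading independently is rare, so disagreements surface nearly all of the errors while the reviewer examines only a handful of words.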
I agree with both authors that problems with the accuracy of OCR are a big obstacle to creating useful digital content. The solutions proposed by Sutherland seem most intriguing to me--despite its limits, reCAPTCHA (acquired by Google since the publication of this article) is an especially clever approach to the problem. Although Martin describes some very successful projects undertaken by TCP, it's hard to imagine how that kind of detail-oriented, labor-intensive approach would work at Google's grand scale. I wonder if eventually Google Books will identify materials that are of particular interest to researchers, and make strides to ensure accuracy and provide enhanced content through semantic coding within those specific materials. In the meantime, though they can't match the scope of Google Books, the smaller projects undertaken by universities that Martin describes, as well as projects like the Internet Archive and Project Gutenberg, simply offer higher-quality products to researchers.
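What makes reCAPTCHA clever is the pairing trick: each challenge shows a control word the system already knows alongside a word OCR couldn't read. A user who types the control word correctly is trusted, and their reading of the unknown word is recorded as a vote. A miniature sketch (the words, IDs, and vote threshold are hypothetical; this is not Google's actual protocol):

```python
# Control word -> its known answer; unknown word id -> collected votes.
known = {"liberty": "liberty"}
unknown_votes = {"word_1234": []}

def submit(control_answer, unknown_answer,
           control="liberty", unknown_id="word_1234"):
    # Only users who pass the control word get their vote counted.
    if control_answer == known[control]:
        unknown_votes[unknown_id].append(unknown_answer)

# Three users attempt the same challenge.
submit("liberty", "parliament")
submit("liberty", "parliament")
submit("libertv", "parlament")   # failed the control; vote discarded

votes = unknown_votes["word_1234"]
consensus = max(set(votes), key=votes.count)  # → "parliament"
```

Once enough trusted votes agree, the "unknown" word graduates into the pool of control words--which is also why, as Sutherland notes, the scheme can only fix words the OCR engine flagged as uncertain in the first place.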
Because this is my last resource review, I'll also throw in a few thoughts about solutions to the larger problems of legality and privacy. Based on what I've read, I'm not sure that the new proposed settlement offers large enough changes to appease the Department of Justice. Grimmelmann's solution ("open up the settlement to any competitor on the same terms Google would receive") seems ideal. Competition will do more than just (hopefully) keep the cost of access down; in addition, if other providers of digitized content find innovative ways to increase accuracy and enrich content while still digitizing books at a rapid pace, this would force Google to make similar improvements.
As for privacy concerns and the larger problems that come with this concentration of content in the hands of a private company, I'm not sure what can be done. Librarians can't expect Google to adhere to the values of our profession (beyond their pledge to not be evil). Google Books is an amazing tool, and if libraries must use it carefully and thoughtfully because of its proprietary nature, so be it. It's still extremely useful. I also think it's important for libraries to consider taking on their own digitization projects when possible, because that allows them a greater degree of quality control. Additionally, it's worth watching to see what libraries like the University of Michigan do with the digitized material that Google provides. Libraries need not "eliminate their print collections and become dependent on Google's institutional subscription, only to see its price rise uncontrollably" (Grimmelmann, 2009); libraries can use Google Books without depending on it exclusively for digital content.
Thursday, December 17, 2009
Wednesday, December 16, 2009
Resource review #7: The amended settlement & objections
Band, J. (November 23, 2009). A Guide for the Perplexed Part III: The Amended Settlement Agreement, American Library Association, Association of College and Research Libraries, and Association of Research Libraries. Retrieved from http://www.arl.org/bm~doc/guide_for_the_perplexed_part3_final.pdf
Grimmelmann, J. (November 23, 2009). James Grimmelmann on The Google Settlement: what’s right, what’s wrong, what’s left to do, Publishers Weekly. Retrieved from http://www.publishersweekly.com/article/CA6708106.html
In cooperation with the American Library Association, the Association of College and Research Libraries, and the Association of Research Libraries, Jonathan Band has published several "[Guides] for the Perplexed," which outline the changes made in various versions of the Google Books Settlement, with a focus on the implications for libraries. I highly recommend these guides, as they briefly and clearly explain concepts that can seem nebulous in other contexts. I'll summarize his most recent guide, which covers the Amended Settlement Agreement (ASA):
1. In response to complaints made by foreign rightsholders, the ASA does not apply to their books (with a few exceptions). This means that the great majority of books published elsewhere will not be available in full text. This is a lot of books, possibly half of the books Google has digitized so far. Google will keep scanning these kinds of books, and will attempt to get permission from rightsholders to provide full-text access. Not only does this remove a huge quantity of books from Google's collection, it also removes the great majority of plaintiffs from "the plaintiff class" in the settlement.
2. OCLC is now included among institutions "that can receive benefits under the settlement."
3. The ASA extends the time in which rightsholders can "request the removal" of books.
4. Changes to the Book Rights Registry set up by the ASA:
-The Book Rights Registry will have "the purely discretionary" ability to make more than one free public access terminal available at public libraries.
-The BRR will no longer be permitted to use unclaimed funds resulting from the sale of orphan works for operating expenses. These will now be given to charity. Up to 25% of these funds can be used to search for rightsholders of unclaimed works.
-An Unclaimed Works Fiduciary will be selected by the BRR (with the court's approval) to "[represent] the interests of the rightsholders of the unclaimed works."
-The BRR will protect the right of rightsholders to distribute their works through Creative Commons licensing, or other alternative licenses.
5. The ASA removes a few clauses that give Google favored treatment over third parties providing similar services.
6. Google waives its right to antitrust immunity -- apparently "under the Noerr-Pennington doctrine, if an activity receives government approval, it cannot form the basis of antitrust liability." This means that the Department of Justice now has time to see how things play out before determining if Google Books should actually be the target of an antitrust investigation.
James Grimmelmann is a professor at New York Law School who has written extensively about Google Books (other articles are available here). He also runs the Public-Interest Book Search Initiative, which in turn maintains (as far as I'm concerned) the best and most comprehensive resource about Google Books, the Public Index. The Public Index provides access to the settlement documents and to related legal documents. Users can annotate the settlement. Additionally, the Public Index links to a wealth of articles on the subject, written from a legal perspective or for a wider audience.
Grimmelmann's most recent general-audience article on the subject discusses the ASA and identifies positive changes and areas that remain problematic. The new settlement, as he sees it, consists of "one big feature cut and a bunch of small bug fixes" (the big feature cut being the exclusion of books with foreign rightsholders). Grimmelmann concludes that while the changes to the settlement are mostly positive, the bigger issues are unchanged. Or as he put it more eloquently, "the dark heart of the deal remains: Google will still have effectively exclusive access to unclaimed books." According to Grimmelmann, the issue of antitrust is unresolved. In addition, the opt-out feature of the settlement threatens to set what Grimmelmann calls "a bad precedent for future class actions." He explains: "the plaintiffs aren't just giving up the right to sue Google for scanning their books; they're also being shanghaied into a complicated commercial deal that includes a controversial allocation of electronic book rights and requires them to give up the right in the future to sue Google for plenty of things it hasn't even contemplated doing yet." Because of this, Grimmelmann argues that the court should reflect not only on the fairness of the proposed settlement, but also on the implications for future class action cases.
I've been trying to focus on resources that present perspectives on the potential library use of Google Books as a tool, without any of the legal and ethical arguments for and against. These kinds of articles are somewhat difficult to locate, because (understandably) it's difficult to write anything about Google Books without addressing its legal implications. It's been easy to dismiss a lot of the arguments librarians have made against Google Books--for example, the problem of poor scanning quality and metadata--because the project provides such unprecedented access to such a massive quantity of materials. However, at the core of the project is a massive trade-off: we have unprecedented access to these materials, but other similar vendors have their access to these materials severely curtailed (as Grimmelmann puts it: "A competitor, however, would need to get individual permission [to sell these works] first or be sued into oblivion. That's hard enough in general, and for orphan books it's impossible. There's no one to ask. The class action opens a door for Google, but leaves it closed for everyone else"). Without competition, it's possible that the costs to use Google Books will rise dramatically, again limiting access to the materials it has digitized. Grimmelmann describes the possible implications: "Will it drive libraries to eliminate their print collections and become dependent on Google's institutional subscription, only to see its price rise uncontrollably? Will the FBI force Google to turn over its lists of who's been reading the Qur'an? If these kinds of broad-reaching policy decisions were being made by Congress, the legislative process would in theory take everyone's interests into account. But in a settlement negotiated by a handful of lawyers, the danger is always that the “public interest” means whatever they say it does."
Grimmelmann's Public-Interest Book Search Initiative exists to encourage the public to discuss and evaluate the settlement. In that sense, the project attempts to allow the public to reflect on the real meaning of "public interest." As such, the materials provided in the Public Index and the guides written by Jonathan Band are valuable resources for librarians concerned about the Google Books Settlement's effects on public access to information.
Tuesday, December 1, 2009
Resource review #6 - ...maybe it is preservation after all?
Blakeley, R. (2009). What Was Lost, Now Is Found: Using Google Books and Internet Archive to Enhance a Government Documents Collection with Digital Documents. DttP, 37(3), 26-29.
In this article, Rebecca Blakeley, a government documents librarian at McNeese State University, describes the process by which she used Google Books and the Internet Archive to supplement the McNeese Library government documents collection. The collection fared badly in Hurricane Rita, suffering water damage and mold. Blakeley eventually stumbled upon some full-text government documents in Google Books while helping a patron, and it occurred to her that digitized materials could compensate somewhat for the library's loss. She describes the search methods she used to find government documents in both Google Books and the Internet Archive, and compares the strengths and weaknesses of the two.
For Blakeley, the best feature in Google Books is the "my library" option, which can be used to compile items and share them with other users. She started compiling full-text government documents she found using that option - her collection is available here. (Because of extensive tagging, in some ways her small digital library is much more easily browsable than physical collections of government documents.) Blakeley notes that it's also possible to create RSS feeds to point out new items added to the collection. She mentions that the quality of scanning and metadata varies, but praises the range of viewing options: zooming, one- or two-page display, plain-text display, thumbnails, or full screen. Her biggest complaint is that Google Books only provides limited viewing of many government documents, even though the great majority of them are in the public domain. (Google responded to an email about this by explaining that rather than taking the time to figure out the rights status, they just add materials in limited view until the status can be determined for sure. Hopefully this means that more government documents will be available in full view later on.) It is for this reason that Blakeley prefers the Internet Archive.
This fits in pretty well with the comparisons drawn by Kalev Leetaru in an article I wrote about previously. The Internet Archive doesn't post books until it has determined that materials are in the public domain or secured permission from the rightsholder in question. It also takes a lot more time to produce high-quality scans. As a result, there's significantly less there, but what's there isn't as messy as Google Books. Additionally, the Internet Archive allows users to upload materials to the collection -- Blakeley notes that some materials have been uploaded by users who originally downloaded them from Google Books. The Internet Archive allows users to bookmark items, which can then be shared via RSS feeds. The site also offers a "bookmark explorer," which allows users to view items bookmarked by others.
This article illustrates a pretty neat use of these two large digital repositories, and provides good examples of the differences between the two, in terms of features and underlying philosophies. The Internet Archive, while growing, looks like a finished product, while Google Books is very much constantly in progress. I came across an interesting blog post by Ed Felten recently, discussing another blog post about the metadata problems at Google Books; it addresses this point:
"What's most interesting to me is a seeming difference in mindset between critics like Nunberg on the one hand, and Google on the other. Nunberg thinks of Google's metadata catalog as a fixed product that has some (unfortunately large) number of errors, whereas Google sees the catalog as a work in progress, subject to continual improvement. Even calling Google's metadata a "catalog" seems to connote a level of completion and immutability that Google might not assert. An electronic "card catalog" can change every day -- a good thing if the changes are strict improvements such as error fixes -- in a way that a traditional card catalog wouldn't."

I think one of the biggest reasons for the backlash against Google Books by librarians stems from overlooking this. They feel like they've turned over a bunch of their best materials to be digitized, but it's been done sloppily in terms of scanning or metadata, and no one knows exactly what the final shape of Google Books will be, once the settlement is (or isn't) finalized. I think it's a good point, but I'm also skeptical about the plausibility of fixing all these errors. Is Google planning to rescan everything that's blurry, or all the pages with visible scanning hands? I think the "beta" label is a good explanation for some problems, but isn't Google digging itself kind of a deep hole by doing so much so quickly and imprecisely?
Either way, Blakeley's article serves as a great example of the flexibility that digitization allows. We've read a great deal this semester about the complicated nature of digital preservation, but in cases like this, digitized documents are certainly preferable to moldy ones.
Sunday, November 29, 2009
Resource review #5: Libraries and Google: a love/hate relationship
Waller, V. (22 August 2009) "The relationship between public libraries and Google: Too much information" First Monday, 14 (9). Retrieved from http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2477/2279
I'll just get this out of the way. One thing that's really gotten to me in the course of reading all these articles is that librarians and other authors can't seem to keep the name of Google's digitization project straight. I understand that it's changed several times, from Google Print, to Google Library Project, to Google Book Search, to Google Books. According to Google's history of the project, the name was changed to Google Books in 2005. That's plenty of time for librarians to catch up. If you're going to write an article that's critical of something, you've got to have your facts straight. Getting the name of the object of your criticism right should be the absolute bare minimum. Similarly, Waller doesn't seem to know the names of Google's founders, Larry Page and Sergey Brin. When citing their published work, she lists their last names correctly, but when she mentions them in passing, she refers to them as Sergey and Brin every time. I'm sure I'm guilty of my own fair share of typos, so I try not to harp on this sort of thing. But in all other respects, this article is very academic (if I were on the reference desk, I'd tell students it was scholarly), so in that context, this kind of error is rather glaring.
Waller's article describes the partnership between Google and libraries in terms of the stages of a romantic relationship. At first, libraries felt that their goals matched perfectly with Google's, whose "stated mission is to organise the world’s information and make it useful." However, with time, libraries have realized that the match isn't so perfect after all. Most of the problems Waller lists stem from the fact that Google is a private company--it is driven by advertising revenue; it doesn't share libraries' concern for users' privacy; and it could go out of business, taking with it the massive number of books it has digitized. Additionally, Waller notes the suggestion made by many librarians that Google has "[conflated] information retrieval and knowledge." Waller (and the librarians she cites) fear that "the Google effect" will lead to a mindset that ignores traditional research methods in favor of shortcuts. Similarly, she cites "concerns that the relationship between the reader and a digital text will be superficial in comparison with the intimate relationship that can develop between a reader or scholar and a physical text," and fears "the possibility that searching a book will become an increasingly adequate substitute for reading it." Finally, she notes the complicated nature of digital preservation issues.
To her credit, Waller is not suggesting that librarians cut ties with Google. Rather, she stresses the importance of using Google as a starting point, and emphasizing to library users the limits of Google as a research tool. She argues that librarians "should teach library users, through example, about the difference between freely flowing information and balanced information. They should not be afraid of giving priority to more significant information. They should also be discussing the losses involved in representing an aspect of an analogue world with ones and zeroes. As philosopher of technology Don Ihde says, it is ‘what is revealed is what excites; what is concealed may be forgotten.’ Libraries need to pay attention to that which is concealed by Google’s search results and by digitised information. What is concealed includes vital aspects of human knowledge and culture and it is part of the task of the public library to preserve these things."
One of Waller's main points is that for information to be accessible and useful, it has to be organized, with priority given to information of greater significance. She claims that Google's search results have no such organization, but are only organized and ranked in order to serve the needs of advertisers. In fact, the organization of Google's search results has been well-documented--results are ranked by an algorithm influenced by citation analysis. It's certainly not perfect, but it is a form of relevance ranking and organization. Waller's failure to acknowledge this seems like a pretty big oversight.
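The citation-analysis idea Waller overlooks is simple at its core: a page (or book) matters more when other important pages point to it. Here is a toy sketch in Python of that ranking idea, using a made-up four-page link graph (the graph and the numbers are purely illustrative, not anything from Google's actual system):

```python
# Toy illustration of citation-based ranking (the idea behind PageRank):
# a page is "important" if other important pages link to it.
links = {          # hypothetical link graph: page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}   # start with equal ranks
damping = 0.85                                # standard damping factor

for _ in range(50):                           # power iteration until ranks settle
    new = {p: (1 - damping) / len(pages) for p in pages}
    for p, outlinks in links.items():
        for q in outlinks:
            new[q] += damping * rank[p] / len(outlinks)
    rank = new

best = max(rank, key=rank.get)                # the most "cited" page wins
```

On this toy graph, page C ends up ranked highest because three of the four pages link to it. Google's production ranking obviously layers many more signals on top of this, but the basic point stands: the results are organized, not arbitrary.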
I'm also somewhat bothered by Waller's suggestion that if Google goes out of business, its digital library will disappear. Every participating library gets a digital copy of the books it allows Google to digitize. These can be (and are) used in various ways: copies can be printed on the Espresso Book Machine, digital material can be integrated into the library's OPAC, and libraries can create their own digital libraries, like the HathiTrust Digital Repository, which began at the University of Michigan and is now a partnership between thirteen large research universities. Just because content is originally digitized by Google doesn't mean it has to live there exclusively. However, the problems Waller notes are real. Librarians should feel uncomfortable partnering with an organization that collects data about individual users and shares that data with advertisers. The Google settlement stipulates that public libraries can each have one terminal from which the public can access all of the material available on Google Books--will the convenience and benefits of this access be outweighed by the presence of targeted advertising and the potential invasion of users' privacy? I don't know. I'm planning on tackling privacy issues (and the dreaded copyright/legal challenges) next.
Saturday, November 28, 2009
Resource review #4 - Mass digitization leads to more books in print?
Rosen, J. (2009). Bookselling Heads To the Espresso Age. Publishers Weekly, 256(40), 3-4.
Badger, B. (2009, September 9). Books Digitized by Google Available via the Espresso Book Machine. Retrieved from http://booksearch.blogspot.com/2009/09/books-digitized-by-google-available-via.html
(2008). U of Michigan Library Installs Espresso Book Machine. Advanced Technology Libraries, 37(11), 1, 10-11.
(2009). Espresso Book Machine. Retrieved from http://www.lib.umich.edu/espresso-book-machine
In September, Google announced that it would partner with On Demand Books to make its two million + digitized public domain books available for printing on the Espresso Book Machine (EBM). The machine is "capable of making a 300-page perfect-bound book in five to seven minutes" and can print a yearly total of 60,000 books. According to Brandon Badger, a product manager at Google Books, "If sentient robots ever succeed in taking over the world, this is how they will print their books."
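As a quick sanity check on those figures (my own back-of-envelope arithmetic, not from the announcement), the quoted per-book time and the yearly total are consistent with a machine that runs most of the time:

```python
# Back-of-envelope check on the Espresso Book Machine figures quoted above.
minutes_per_book = 6                    # midpoint of the quoted "five to seven minutes"
minutes_per_year = 365 * 24 * 60        # 525,600 minutes

max_books = minutes_per_year / minutes_per_book   # books/year if it never stopped
utilization = 60000 / max_books                   # share of the year needed for 60,000

# Nonstop operation would yield 87,600 books/year, so the quoted 60,000
# implies the machine is printing roughly two-thirds of the time.
```

At the six-minute midpoint, that works out to about 68% utilization, which seems plausible for a machine installed at a busy store or library.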
When I originally came across this announcement at Inside Google Books, I thought it was just a neat bit of technology. According to Rosen's article, the implications are much larger. She cites one of the founders of On Demand Books as asserting that the machine's [relatively] low cost and the company's partnership with Google signal "the end of the Gutenberg age." The ability to quickly and inexpensively print books does have the potential to radically decentralize the publishing industry. Rosen's article explores the implications for small independent bookstores, and also alludes to possible library use. Dane Neller (cofounder and CEO of On Demand) suggests that "the Espresso machine enables local retailers to do everything a national behemoth like Amazon does." Additionally, Espresso machines can allow small bookstores to save space while offering a much larger inventory. Booksellers quoted in the article described plans to sell copies of classics, and in a university bookstore, to print copies of books authored by faculty members. Rosen also suggested that libraries may begin printing copies of digitized rare books.
I have some trouble imagining the use of the EBM in libraries. Would libraries be selling books to patrons, printing books to add to their collections, or printing copies of digitized rare or fragile items? So far, the best example I can find is the University of Michigan library, which became the first university library to purchase an EBM in 2008. The library planned to sell copies of books they had digitized for the Open Content Alliance, as well as items from their pre-1923 collection, for about $10 a book. U of M's dean of libraries, Paul Courant, stated "This is a significant moment in the history of book publishing and distribution. As a library, we're stepping beyond the limits of physical space. Now we can produce affordable printed copies of rare and hard-to-find books. It's a great step toward the democratization of information, getting information to readers when and where they need it." According to the library's website, U of M also expects to offer additional uses of the EBM: "Small runs of printed books produced by classes, such as anthologies of creative writing; printed copies of proceedings of University conferences and events; printing and binding course materials; self-publishing for Ann Arbor authors." This is potentially a large expansion of the library's role on campus, and I think it's illustrative of digitization's potential for expanding access to information, in digital form and paradoxically, in print as well.
Possibly the most interesting point this article makes is that (at least in theory) the EBM gives booksellers an opportunity to offer readers a convenient print alternative to e-books. Many librarians worry that Google Books represents a serious challenge to the relevance of libraries as physical repositories for printed objects; however, this partnership can provide booksellers and librarians with an opportunity to inexpensively put more materials into print. The University of Michigan, an early and enthusiastic participant in mass digitization projects, suggests that the opportunity to return digitized materials to print provides necessary flexibility in format: "Rather than a one-size-fits-all solution, we believe that the best book format varies in relationship to its uses and its users. Some of the time, an electronic book -- that can be accessed any time, anywhere, and quickly searched -- is exactly what we need. At other times, the ideal form of the book is a nicely bound copy that helps with sustained reading, that serves as a physical reminder of a reading experience, or that can easily be passed from hand to hand." This lends further credence to the argument that the greatest justification for Google Books is the expansion of access to information. If the EBM becomes widely available (a big if, I suppose), users don't even have to have internet access to benefit from mass digitization. They just have to have $10.
Finally, after looking at the DIY book scanner several weeks ago, I have to wonder how plausible it is that someone will cook up a DIY bookmaking machine. It seems like a pretty huge undertaking, but who knows! Then we'll really democratize access.
Wednesday, November 18, 2009
Resource review #3
Musto, R. G. (2009, June 12). Google Books Mutilates the Printed Past. Chronicle of Higher Education, 55(39).
In this article, Ronald G. Musto, a medieval historian, describes the “promise and perils” of using Google Books for historical research. Musto’s work involves studying archival records related to Naples in the Middle Ages. He briefly describes the repeated destruction and subsequent reconstruction of those records. He notes that “for the few of us who work on the city's urban development, that double mutilation -- of both its archival and architectural past -- makes work difficult at best. More than many other historians, we have to rely on remnants to recreate this history.” Many of these remnants are now available on Google Books, which Musto is decidedly not satisfied with.
Like almost everyone involved in the debate about Google Books, Musto is pleased with the level of new access the resource provides. However, citing a key work in his field, he rails against the quality of Google’s scanning:
“In its frenzy to digitize the holdings of its partner collections, in this case those of the Stanford University Libraries, Google Books has pursued a "good enough" scanning strategy. The books' pages were hurriedly reproduced: No apparent quality control was employed, either during or after scanning. The result is that 29 percent of the pages in Volume 1 and 38 percent of the pages in Volume 2 are either skewed, blurred, swooshed, folded back, misplaced, or just plain missing. A few images even contain the fingers of the human page-turner. (Like a medieval scribe, he left his own pointing hand on the page!) Not bad, one might argue, for no charge and on your desktop. But now I'm dealing with a mutilated edition of a mutilated selection of a mutilated archive of a mutilated history of a mutilated kingdom -- hardly the stuff of the positivist, empirical method I was trained in a generation ago.”
While he admits that this is just one book, and that a cursory search of materials outside his field of study fails to reveal a similar concentration of errors, the poor scanning quality seems to essentially push him over the rhetorical edge. He expresses concerns that Google’s poorly scanned books will replace the world’s collections of rare books and archival materials, arguing that “should Google Books prevail, and the resources of the scholarly community be made irrelevant by Google's sheer scale and force, the future of our past will be in great doubt.”
This view seems pretty extreme to me, but it’s expressed often enough in a variety of articles and blog posts to merit discussion. I don’t think that Google Books is about preservation. I think it’s about access. The ability to do full-text searching in four million books is ridiculously convenient, and that massive opportunity comes at the expense of precision and quality. But given the legal complications of the Google Books settlement, I don’t think that access at the expense of preservation is an argument that Google’s leaders want to be publicly pushing. In a recent New York Times editorial called “A Library to Last Forever,” Google co-founder Sergey Brin is obviously suggesting (on the basis of the title alone) that the project is justified because it will digitally preserve the world’s libraries.* So I suppose it makes sense to judge Google Books by the stringent standards that the goal of preservation implies, since these are the claims that the company itself is making.
Still, I don’t see any reason to jump to the conclusion that Google’s digital copies of books are going to make physical collections irrelevant, especially in the case of rare books. If Google were to launch a project involving digitization of the world’s art, I don’t think anyone would suggest that museum curators may as well trash the original “Starry Night.” However, I do understand that the tenor of Musto’s argument is provoked in part by the arrogance of Google’s stated goals. He suggests that Google believes that the noble goals and public good resulting from the project grant them the “right to turn copyright on its head.” It’s an important point, as the stakes are pretty high. Whatever comes out of the settlement, the repercussions will be huge. Once I read a bit more about the new proposed settlement, I’ll blog about it.
*Brin does point out that without access, preservation doesn’t really matter: “…if our cultural heritage stays intact in the world’s foremost libraries, it is effectively lost if no one can access it easily.”
Tuesday, November 3, 2009