Leetaru, K (2008, October 6). Mass book digitization: the deeper story of Google Books and the Open Content Alliance. First Monday, 13 (10).
In this article, Kalev Leetaru offers a nuanced perspective on the similarities and differences between Google Books and the Open Content Alliance (OCA). He focuses primarily on the technical aspects of their work, their willingness to reveal information about their technical processes, their approaches to copyright and user access, and their use of metadata. Although OCA formed as a reaction to the commercial and secretive nature of Google Books, Leetaru points out that it has not quite delivered on its promise of transparency. While Google has released technical reports about its innovations and revealed information about its processing in speeches, very little is known about OCA's technical process. Based on information gathered about both organizations, Leetaru suggests that the two projects are conducted using similar methods. However, OCA spends more time on quality control, while Google Books focuses on increasing output and efficiency. Additionally, Google's PDFs are bitonal, which makes them easier to view (even with limited bandwidth) than the full-color scans provided by OCA.
Another difference is found in the search options -- Google offers full-text searching, while OCA allows searching only in title and description fields. The two organizations also differ in their approaches to copyright. Google scans copyrighted material, but only allows users to view limited portions in search results. OCA focuses on scanning out-of-copyright materials, but scans in-copyright materials if given permission by the publisher. Possibly the most striking difference described by Leetaru is the approach to restrictions on use of the materials. Public domain materials on Google Books can be downloaded in full. Members of OCA can set their own restrictions on use of the materials they contribute to the project, which means that restrictions vary from item to item. This can apparently get pretty complicated. Google provides metadata explaining the rights policy of each item; OCA does not.
This article provides useful information comparing Google Books to a similar mass digitization project. It's interesting to evaluate OCA's attempt to provide an alternative approach to digitization. Leetaru offers a convincing argument that OCA hasn't been very successful in meeting its stated goals of transparency and open access. The article also includes pretty thorough descriptions of the digitization process. Leetaru makes a point of differentiating between the goals of preservation digitization and access digitization. The latter is focused primarily on providing user access to materials, rather than gathering and preserving that material. He argues that both OCA and Google Books are attempts at access digitization, which negates much of the criticism directed at Google's quality control standards. If large-scale access is the goal, Leetaru suggests, some attention to detail will be lost in order to provide access to more materials. This is an interesting perspective that I hadn't come across before.
Wednesday, October 28, 2009
Wednesday, October 7, 2009
Resource review #1: Metadata and Google Books
Jackson, M. (2008). Using Metadata to Discover the Buried Treasure in Google Books Search. Journal of Library Administration, 47 (1/2).
In this article, Millie Jackson discusses the relative merits of the metadata created by Google Books, compared to that provided by WorldCat and the MBooks project (now known as the Hathi Trust Digital Library) at the University of Michigan. As she points out, full-text keyword searching has advantages and disadvantages. This feature may allow the researcher to search for concepts more easily than can be done with traditional subject headings or controlled vocabulary. However, Jackson notes that the listing of frequently-used words in the text found on Google Books may fail to give the user a sense of the book's "aboutness." In WorldCat, she explains, the user can click on a subject heading and find similar works. In Google Books, it can be more difficult to find relevant materials in a similar manner.
I appreciated the author's even tone and willingness to acknowledge that Google Books will undoubtedly be improved and tweaked over time. Many of the other articles I've come across (some of which I'll discuss later) seem to primarily serve as a list of complaints about poor scanning and bad metadata (as Jackson explains, some of Google's metadata is retrieved automatically from a variety of sources, which can sometimes result in comical and/or frustrating errors). The author also argues that libraries should be looking to Google Books for new ideas, instead of simply finding fault, a perspective I agree with. (Of course, arguments about copyrights and monopolies are another thing entirely, and later on I'll write about resources that address this.)
Additionally, this article is useful because of its discussion of the ways in which libraries can integrate Google Books with other library services. The Hathi Trust Digital Library at the University of Michigan is a good example. As Jackson explains, the Hathi Trust Digital Library offers many features, some of which are similar or identical to those found on Google Books, but in a very different interface. At the Hathi Trust Digital Library, the user can export citations to a citation manager, find print copies in a library using WorldCat, or search an item's full text. More flexible search options are also available -- searches can be narrowed by viewability, subject, author, language, place or date of publication, and original format and location. (Many of these options are also available in Google's advanced search, but aren't as immediately obvious.)
Jackson's article strikes me as a good introduction to the strengths and weaknesses of the search options available in Google Books. Her points, while not discussed in great detail, will be useful in directing me toward other related resources.