Google Faces Legal Challenges in its Effort to Digitize University Library Contents

Krista Kennedy, PhD Student, University of Minnesota
Assistant Chair, CCCC Intellectual Property Caucus

CASE OVERVIEW

In December 2004, Google announced an ambitious new attempt to scan and render searchable millions of volumes from the libraries of Harvard, Stanford, Oxford, and the University of Michigan, as well as the New York Public Library. The original project name was Google Print, which was changed to Google Book Search in November 2005. There are two central facets to the initiative: Google Publisher and Google Library. The first works with publishers to coordinate permissions, direct contributions of texts, and promotion. Compensation is provided in the form of links that encourage searchers to purchase the product from booksellers or directly from the publisher. Publishers may also share in contextual advertising (“Google ads”) revenue if they agree that advertising be included on the pages for their books. All material generated under this project is digitized and offered with full permission of the copyright holder.

Google Library, on the other hand, partners with libraries to arrange and facilitate scanning of materials under fair use doctrine. No permission is sought from the publisher for reproduction of these materials. The initial stage of the Google Library project involves a six-year partnership between Google and the University of Michigan called the Michigan Digitization Project. The seven million volumes in the UM Libraries collection would constitute the initial acquisitions for Google Book Search. In return for their cooperation, the Libraries will receive digitized copies for their own use. While UM has been a leader in digital preservation and has pursued an internal digitization project for a number of years, it has been able to digitize only about 5,000 volumes annually. At that rate, digitization of the entire seven-million-volume collection would take approximately 1,400 years. By partnering with Google, the Libraries are able to drastically increase the pace of this project while also making strides toward opening their collection to users worldwide. In the process, they also reduce their own digitization expenses, since Google bears the costs of reproduction, conversion, and transmission, as well as costs associated with pulling and reshelving materials. Scanned and converted works are made available immediately through Google as they are processed and are stored in perpetuity on the company servers.

From the project launch, Google has drawn a sharp distinction between public and proprietary works. Works that have passed into the public domain are made available in full. Out-of-print works that are still under copyright are made available in “snippets” consisting of approximately three sentences. Availability of in-print works is at the discretion of the copyright holder, who may choose to allow availability of the entire work, of a few sample pages, or of a snippet. The owner may also choose to opt out of the project altogether, much in the way that domain owners can request that Google not index their sites. On August 22, 2005, Google announced that it would not begin the project until November, so as to give publishers a chance to make decisions about participation and submit a list of works to be excluded from the project.

On September 20, 2005, the Authors Guild (AG) filed a class action complaint against Google. The three named plaintiffs were authors whose works are in the UM collection, and the Class was initially defined as all persons or entities holding copyright to one of the seven million volumes in the UM Libraries. The complaint alleged that “by reproducing for itself a copy of those works that are not in the public domain, Google is engaging in massive copyright infringement. It has infringed, and continues to infringe, the electronic rights of the copyright holders of those works.” This infringement, the plaintiffs claimed, adversely affected the market for their works and damaged their goodwill and reputations. They also claimed that Google intended to derive revenue from those works by using them specifically to attract visitors and consequently generate advertising revenue.

A month later, the Association of American Publishers (AAP) also filed suit. The named publishers include McGraw-Hill, Pearson Education, Penguin, Simon & Schuster, and John Wiley & Sons. They claimed that they were engaged or planned to engage in similar digitization endeavors that would eventually be made available to all search engines, including Google, and that Google’s project impinges on this potential market. The suit also objects specifically to Google’s announcement that publishers could provide the company with a list of books to be excluded from the project by November 2005, arguing that this opt-out arrangement is a clear inversion of the default rights afforded authors to control the reproduction, distribution, and display of their works under 17 U.S.C. §106. It further characterizes Google’s actions as willful infringement executed with conscious disregard for author and publisher rights.

DISCUSSION OF THE CASE

The primary decision to be made in this case concerns the application and limits of fair use doctrine. The application of fair use rests on a four-factor test: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion taken, and the effect of the use upon the potential market.

The precedent most often cited in fair use issues relevant to search engine operations is Kelly v. Arriba Soft, 336 F.3d 881 (9th Cir. 2003). Much like Google Image Search, Arriba Soft created a database of images from websites without obtaining the permission of the site owners or copyright holders. It then displayed the images as thumbnails that linked to the original content on external sites. Kelly, a photographer, discovered that his images were being used as thumbnails and sued for copyright infringement. The district court found that the reproduction of the photographs as thumbnails did satisfy the conditions of fair use, and the Ninth Circuit affirmed that finding. The opinion addressed the four factors as follows:

Purpose and character of the use: Arriba was not using the images to promote itself, nor did it attempt to profit through their use. Kelly’s images were only a few among many thousands in the database. More importantly, Arriba’s use of the images served a different function than the original prints, namely directing access to material on the Internet rather than facilitating original expression. The court ruled that this use was sufficiently transformative, since the images were reduced in size and reproduced at a lower resolution. Since the use was not exploitative, the commercial aspects of Arriba’s venture weighed only slightly against it.
The nature of the work: The court observed that while creative works are closer to the core intent of copyright law than factual works, “published works are more likely to qualify as fair use because the first appearance of the artist’s expression has already occurred.” Kelly’s works were both creative and published. Because of their publication, the court ruled that this factor only slightly favored Kelly.
The amount and substantiality of the portion used: Arriba copied each of the images in their entirety. However, the court ruled that this was necessary in order to construct an identifiable link that would allow users to recognize the content and decide whether or not to continue on to the originating website. In the end, this factor favored neither party.
The effect of use upon the potential market: By providing direct links to Kelly’s original site and content, Arriba steered potential customers directly to him. Since the thumbnails were small and of low resolution, the court ruled that they did not dilute Kelly’s market for full size images. This final factor favored Arriba.

The purpose and character of the use is a primary concern in both the AG and AAP complaints. Fair use doctrine extends protections for specific types of use: critical comment, parody, educational purposes, and news reporting. Google satisfies none of these criteria. While the materials in question largely come from educational institutions, Google itself is not a neutral or altruistic entity or technology. Rather, it is a for-profit, publicly traded venture. Its business model relies heavily on contextual advertising, and a significant portion of its revenue comes from advertisements of one sort or another. It will in fact profit from advertising associated with this new material. (The fact that Google’s market value three months after its IPO was half that of Viacom lends some perspective.) However, it will not profit directly from the sale of reproduced copies. On the contrary: search results will bring the texts to the attention of the reader, and links to booksellers and the publisher will encourage the reader to purchase a hard copy of the text.

The question of the nature of the work is easily satisfied in this case: all of it is previously published. While some of it is indeed creative, the majority of the texts in question are non-fiction and technical works. (It’s perhaps relevant to note that all of the named authors in the AG suit are authors of creative works.)

As in Arriba, duplication of entire works is necessary in order to ensure effective operation of the search engine. However, Google will not provide users with access to the entire text. In most cases, users will receive only snippets or a few pages in their search results. If the search term appears multiple times throughout the work, Google will return only three results. Repeat access attempts will be blocked in order to reduce the chances of the searcher viewing too much of the text.

Through the use of direct links to purchasing opportunities, Google Book Search will increase the demand for searchable texts. This should be particularly true for lesser-known texts that readers might not happen upon in any other fashion. Even if readers check out a book from a library rather than purchasing it themselves, libraries will in turn respond to increased demand. If users were able to print out entire works, diminishment of the market would be conceivable. A three-sentence snippet simply cannot do similar harm.

The McGraw-Hill suit suggests that the project restricts their ability to license digitized copies themselves, thus reducing potential market share. In his copyright analysis of the Google project, Jonathan Band argues that the existence of the Publisher program negates this complaint. By opting to license works through the Publisher program, publishers receive revenue from contextual advertising and linkage, thus opening up revenue streams unavailable to them elsewhere.

Following the line of argument presented here, the Google Book Search project is lawful under U.S. fair use doctrine. However, Google results are available internationally, and copyright exceptions vary from country to country. Band reminds us that copyright infringement is specific to the jurisdiction in which it was committed. Since Google is working in the United States to scan books from United States libraries, the relevant law concerning these actions is U.S. law. While few other countries would allow reproduction of entire texts, most countries do permit short quotations similar to what might appear in a snippet. Band suggests that these exceptions for quotations should protect Google’s international transmission of search results.

IMPLICATIONS FOR EDUCATORS AND WRITING TEACHERS

In a statement issued the day after the AG suit was filed, UM associate provost and interim librarian James Hilton addressed the crux of this issue:

This is a tremendously important public policy discussion. … We need to decide whether we are going to allow the development of new technology to be used as a tool to restrict the public’s access to knowledge, or if we are going to ensure that people can find these works and that they will be preserved for future generations.

As educators, we should be particularly concerned about the preservation of our written culture and the access that we and our students have to written artifacts. Our cultural history is rapidly disappearing, as Lawrence Lessig has pointed out in various books, articles, and lectures. In his lecture on Google Print, he reminds us that only 9% of published American literature is currently in print and under copyright. Another 16% is in the public domain. The remaining 75% is out of print but still in copyright. Because of our loose registration requirements, there is often no practical means of obtaining permission from the owners. The volumes in this predominant segment of written culture are largely orphaned works. We are faced with opposing options: either reproduce the materials without permission and preserve them, or observe the letter of the law and lose them. Copyright exceptions (such as fair use doctrine) and complements (such as Creative Commons licenses) provide the only viable solutions to this current and future dilemma.

We are also faced with deciding exactly how we should harness emerging technologies. Whenever we discuss issues of cultural production, be it text, audio, or video, we are also forced to discuss control of the technology that delivers and transmits them. Will the Internet be a technology that helps us preserve and share our culture, or will it be a means for corporations to sell our culture to us bit by bit and destroy whatever isn’t profitable?

A different but related question is, what sort of texts do we want to see on the Internet? As teachers of research and argumentation, we often caution our students about wholesale acceptance of materials found online. We hold a wide variety of opinions about the value and reliability of collaboratively constructed resources such as Wikipedia. If a wide range of vetted publications from established publishing houses were available for searches (whether the results be full-text or snippets), would we feel that the Internet had become a more reliable place? Would a mix of commercial and personal publication increase its inherent value?

As educators who are also advocates of culture, our basic responsibilities lie in the preservation of cultural works. As educators who are also intellectual property scholars, our responsibilities lie in the creation and dissemination of a technological philosophy that encourages progress and creativity. And as writing teachers and disciples of text in all its forms, it is imperative that we work toward converting those commitments into policy and law.

RELEVANT SOURCES

Band, Jonathan. “The Google Print Library Project: A Copyright Analysis.” www.policybandwidth.com/doc/googleprint.pdf

Google Books. http://print.google.com/googlebooks/about.html

Hilton, James. “U-M Statement on Google Library Project.” http://www.umich.edu/news/?Releases/2005/Sep05/r092105

Kelly v. Arriba Soft, 336 F.3d 881 (9th Cir. 2003). http://homepages.law.asu.edu/~dkarjala/cyberlaw/KelllyvArriba(9C2003).htm

Lessig, Lawrence. “Google Book Search: The Argument.” http://www.lessig.org/blog/archives/003292.shtml

Michigan Digitization Project. http://www.lib.umich.edu/mdp/index.html

Download Authors Guild v. Google
http://files.findlaw.com/news.findlaw.com/hdocs/docs/google/aggoog92005cmp.pdf
Download McGraw-Hill et al. v. Google
http://files.findlaw.com/news.findlaw.com/hdocs/docs/google/mcggoog101905cmp.pdf
