Saturday, March 11, 2006
Book Digitization and the Revenge of the Librarians
Mike Zarro and I are at SXSW this week, learning whatever we can about what's going on in web technology. Right now, he's learning about AJAX and I'm at a session about book digitization. I just went to a session on tagging and I'll write about that later. I went analog for that session so I have to gather my notes. I'm live blogging, so pardon the lack of coherence.
We have on the panel Dan Clancy, manager of the Google book search project; Danielle Tiedt from Microsoft's book search project; and Bob Stein from a think tank about the future of the book.
Questions:
1) Once books are digitized, what can we do with the material?
2) Transliteracy.
3) How do we deal with the fear of corporatization of the library and its materials?
4) What's the relationship between book digitization and living documents like the Wikipedia?
5) Is our pool of knowledge actually shrinking because the digital content is only a small part of what's available?
Lawley on concerns:
To what extent is this process an exclusive process? Will the digitized material be available more widely or just via the libraries that have partnered with companies?
Personal information issues. What kind of information on your searches will be tracked? For example, if you log in to book search systems, what information about you do they monitor?
Ranking issues. Who decides what books move to the top? When you do a search, what book gets listed first?
Google Book Search. Clancy describes the program. It seeks to address the problem of a large amount of content that isn't in digital form. Trying to get valuable content available. Lots of students don't go to the library anymore.
Publisher program--get rights from a publisher to digitize a book. See about 10% of the content.
Library program--over 85% of the books in the world are no longer available from a publisher. Partnered with several libraries to create a full-text searchable index of books. Can only provide the full text of books that they've gotten permission for or that are in the public domain.
From the audience--asks whether others can use public domain materials to create other content based on what Google has scanned.
Tiedt on Microsoft's book search program--not yet publicly available. A lot of the same reasons as Google's program. Only 5% of the world's information is online, which is about 8 petabytes of information. Microsoft is trying to answer people's questions better via search. In order to do that better, they want to put more of the world's information online. Book digitization is a very long-term project. Cost is high--about 10 cents a page. Going to require a lot of community effort. Microsoft joined the Open Content Alliance. The OCA is focused on public domain works and helps partner technology companies with libraries. Internet Archive--creates three copies--one for the library, one for the OCA, and one for the company.
Stein says Google always says trust us, we're the good guys, but is that really true? Different models of digitizing books. Has a problem with any corporation controlling the archive of our information. (Mark C., if you're reading this, you would be right in line with this guy.) It bothers him to cede the collection of our culture to the corporations. Advocates an open source approach. [Aside from me: are the universities not doing this because they're opposed to it and they lack the funds to do it?]
Tiedt says she agrees. She doesn't want to be in the business of digitizing the books, but of searching them and providing a great user experience. The ideal scenario is that all the content is already digitized and all they have to do is search it. She understands the concern about having corporations in charge of this process.
Clancy asks, how would you have us do this differently? Good question. Further, are you comfortable
Liz: why does it have to be centralized? Offers the model of Wikipedia.
From the audience: federal funding could help fund a distributed model. Need to develop an infrastructure. Need to know what I'm referring to.
Is the perception of people really wanting this information imaginary?
Clancy reassures us that the agreements with the libraries are not exclusive.
Lawley is asked about the role of librarians. She says that librarians are still needed to organize and help people choose the right sources. They will serve as guides to information. Joy of searching vs. joy of finding.
Clancy says the need for librarians is actually increasing because of the proliferation of information. Search is not the be-all and end-all of finding information. The library is still a community. People still need to find community; they want to be around people.
Tiedt discusses the challenge of relevance ranking.
A good panel. Raises more questions than it answers, really. They did not address copyright issues, which they said up front they weren't going to address because it's such a huge issue. But Google and Microsoft are plowing ahead with their projects. So books and other printed material are being digitized, and libraries will have to decide how to deal with that and think about how they will participate.