Librarians and the Net: Why Librarians Should Rule the Net


      EEEEE      N   N    OOO    DDDD    EEEEE
      E          NN  N   O   O   D   D   E
      EEEE  ===  N N N   O   O   D   D   EEEE
      E          N  NN   O   O   D   D   E
      EEEEE      N   N    OOO    DDDD    EEEEE

Vol 1, No. 4


-- by R. Anders Schneiderman, Ph.D.

A recent Business Week article, "Has The Net Hit The Wall," complained that it is getting harder and harder to find anything on the Net. One solution that holds great promise, the article said, is using artificial intelligence to catalog the web.

Meanwhile, back at the lab, scientists were discovering that this is easier said than done. The National Center for Supercomputing Applications (NCSA), the people who brought us the first popular software for creating and browsing the World Wide Web, tackled a relatively small collection of documents (ten million abstracts from an engineering library). But even this small set overwhelmed their powerful workstations; eventually, they had to run their programs on a massive supercomputer for four days.

NCSA's experience is a good reminder of one of the central problems with the Internet: the people building it have largely ignored the profession that knows the most about organizing information. Most of us think of libraries as quaint, antiquated places, home of "Marian the Librarian." In reality, librarians have a lot to offer the Information Age; they have been managing complex information for over two hundred years. If we were smart, we'd let librarians rule the Net.

Let's start with the issue of searching. Until recently, computer scientists argued that the best way to search for information on the web was keyword searching: you type in a word or two, and the computer searches documents for them. But keyword searching often fails miserably. If I'm interested in poems about love, what do I search for? If I simply search for "love," I will miss the many famous love poems that never use the word. If I'm interested in housing policy, I have the opposite problem: there is no easy way to distinguish among government housing policy, campus dorm housing policy, housing ads, and detailed housing codes. Clearly, keyword searching isn't enough; information needs to be cataloged.

As librarians know from years of experience, cataloging information is a tricky business. If I'm interested in ancient Egypt, the kind of information I want can differ greatly depending on who I am: a child, an adult who wants a quick overview, an Egyptologist, and an anthropologist have very different needs. And as NCSA learned the hard way, computers aren't very good at cataloging information, even when that information, like NCSA's engineering abstracts, is already quite specialized. If we're going to catalog the web, people will have to do the bulk of the work.

Given how quickly the Web grew, no system of cataloging would have worked perfectly. But if librarians had been in charge, they would have insisted that every web author have access to simple programs for briefly cataloging any document or collection of documents they put on the web. That way, every document would have been identified by at least its author, title, date, and a subject heading drawn from one or more standard classification schemes. It wouldn't have been as accurate as a standard library card catalog, but it would have given us a fighting chance of finding the information we really need, no matter how vast the Web becomes.

There are a number of similar issues where librarians would have saved us pain and suffering. For example, one of the most irritating aspects of the web is that if someone moves their web site, there is no easy way to find it. This is because it never occurred to the web's creators that documents might move, so they didn't build in a way to keep track of them. Nor did it occur to them that some system of collaboration was needed to ensure that if the owner of a frequently used web site could no longer provide access (e.g., because they had left a university where they could house the site for free), another web site would take over the collection. As a result, extremely valuable information sometimes disappears from the web without a trace. Librarians have spent years handling these and other complex problems that arise when managing large archives of information over time, and their experience would have been invaluable if computer scientists had been smart enough to use it.

Perhaps the most tragic consequence of having computer scientists, rather than librarians, rule the Net stems from the differences between the cultures of these two professions. Both believe in providing information for free, but they go about it in very different ways.

Computer programmers operate by what we might call the "Treehouse" ethic of sharing. The Net contains a wealth of computer resources that are free for the taking: programming languages, programs, and Frequently Asked Questions (FAQ) lists. But at the same time, there is no sense that everyone has the right to join the club. In fact, programmers often have a certain amount of disdain for those who can't play by their rules.

Computer culture is also laced with the attitude of "I'll do what I want, and tough luck if you don't like it." The people deciding whose needs get served by software that's given away for free are, for the most part, programmers who are fortunate enough to have the time and the freedom to putter around (the people, as a friend who's a secretary pointed out, who don't have to worry about having their keystrokes monitored at work or changing diapers at home). As a result, the Internet tends to be driven by their desire for the coolest toys rather than by the needs of most people.

Libraries, in contrast, are built around the idea that they need to serve everyone. Instead of focusing on the latest toys, they focus on resources that everyone will be able to use, and they strongly believe in ensuring universal access. In short, libraries are based on a culture that says that knowledge and information must be available to everyone if our democracy is to survive. Computer science types occasionally make grandiose statements about helping humanity; librarians actually try to do it.

Unfortunately, far from being in charge of the rapidly expanding Net, libraries and librarians are simply struggling to survive. While the Federal government pours millions into questionable experiments with "digital libraries," funding for libraries continues to suffer.

The Net also poses a direct threat to libraries through the battle over "fair use." Libraries work because they are allowed to freely lend out books and other items they have purchased. However, on the World Wide Web, if you make one copy freely available, you've essentially made millions of free copies. Not surprisingly, the publishing industry wants to radically restrict "fair use," outlawing the creation of any freely available copies. Some of the industry's favorite proposals are probably unworkable, as they would essentially make web surfing illegal: some go so far as to define viewing a web page as "copying." But even some of the more moderate proposals could devastate libraries' ability to serve the public as more and more information moves online (an issue we'll cover in more detail in a future E-NODE column).

In the long run, the only way the Net will rise to its true potential is if librarians become an integral part of the discussion of the Net's future. In the meantime, we need to fight to make sure that libraries survive and thrive in the new Information Age, and we need to start giving librarians the respect they are due.

Special thanks to Karen Coyle, UC librarian and head of the Berkeley chapter of Computer Professionals for Social Responsibility (CPSR). Karen is one of the smartest people around on issues related to libraries, information, and the Net, and she's responsible for completely changing my understanding of what libraries are all about. To learn more, visit her web site. You can also check her out in the latest issue of HotWired.

ENODE: to loose, untie a knot; to solve a riddle.

E-NODE is a monthly column about the Internet. To subscribe to E-NODE, send an email message containing the following line: subscribe e-node
