As the volume and variety of network information grows, several trends, needs and possibilities are increasingly evident. For instance, perhaps the greatest immediate impact of the World Wide Web is that it has made network publishing a viable enterprise. The advantages include instant, world-wide availability, hypertext and multimedia content, and extreme flexibility in the material and format of publications. Besides traditional books and articles, we can now potentially publish data, software, images, animation and audio.
There is a growing trend in many areas of research towards large scale projects and studies that involve contributions from many sources (Green, 1993a). Also, there is no need for a "publication" to be stored all in one place. For instance, many Web sites, acting independently, have put together national or regional guides. Many of these documents, such as the Guide to Australia, integrate information from many different sources. In turn, these documents are now themselves being merged to form encyclopaedic information bases, such as the Virtual Tourist.
There are also great advantages in publishing raw data, as well as the conclusions of scientific studies. In many cases data that are gathered for one purpose can be recycled and, combined with other data, add value to related studies. Perhaps the most prominent example is the growth of molecular biology databases. International databases, such as Genbank (Bilofsky & Burks, 1988) and EMBL (Cameron, 1988), are public compilations consisting of contributions from thousands of scientists. Attempts are now underway to expand this practice into other areas, such as biodiversity (e.g. Burdet, 1992; Canhos et al., 1992; Green, 1994; Greuter, 1991).
The trends described above have made several needs increasingly obvious. These include:
SINS consist of a series of participating "nodes" that each contribute to the network's functions. More specifically the nodes carry one or more of the following:
For research activity, SINs are the modern equivalent of learned societies. Some may even be the communications medium for societies (e.g. Burdet, 1992). We can also consider SINs as a logical extension of newsgroups and bulletin boards. Namely, they aim to provide a complete working environment for their members and users. SINs differ from SIGs ("special interest groups") in two important ways. First, SIGs are usually part of larger organizations. The second, and greater, distinction lies in the use of networks. Whereas a group usually has a focus, SINs are explicitly decentralized.
A good example of a SIN is the European Molecular Biology Network. EMBNet is a special interest network that serves the European molecular biology and biotechnology research community. It consists of nodes operated by biologically oriented centers in different European countries. It features a number of services and activities, especially genomic databases such as EMBL (Cameron, 1988).
The following features characterize most large special interest networks. They also provide guidelines for setting one up.
For instance, an international biodiversity database project might consist of agreements on the above points by a set of participating sites ("nodes"). Contributors could submit their entries to any node, and each node would either "mirror" the others or else provide on-line links to them.
A few of the services currently available include: Gopher, WAIS, World Wide Web, FTP, Usenet News, Telnet, Hytelnet (a bibliographic protocol for libraries, a library SIN), X.500 and network resource location services, such as Archie, Veronica and Jughead, for searching the network. For details of available services, see for example, The Biologist's Guide to the Internet.
The key factors in the success of Gopher are its simplicity - just point and click on a menu - and the availability of "client" software for all of the most commonly used computing platforms. Previously, using the Internet had required a fair measure of computer literacy. Gopher made it possible for many people to explore "The Net" for the first time.
Furthermore, Gopher server sites are very easy to set up and maintain; basically, ASCII files are formatted and placed in a Gopher file system. However, more sophisticated implementations involving such things as gateways to SQL databases are also possible.
WWW's hypertext formatting language (HTML) is an application of SGML (see earlier). The freeware program RTFtoHTML converts Rich Text Format (an output option on many word processors) to HTML, and macros for converting text to HTML are available for MS Word. The HTML browser tkWWW (freeware for Unix/X11) includes a WYSIWYG editor for HTML.
During 1993 World Wide Web (WWW) began to have a profound effect on the academic community. Like Gopher, participation on the "Web" is growing exponentially (doubling time is at present 3 months). The stimulus of the explosion was NCSA's release of a new program (Mosaic) that realized the full potential of WWW's hypermedia capability. NCSA Mosaic is now available under X-Windows, Macintosh and DOS-Windows systems. Important features of Web browsers (first introduced by NCSA's program Mosaic) include:
Many of the above steps will be automated. "Mirroring" is the process of duplicating a set of information that originates from another site. Whereas it is generally better to provide a pointer to the site that maintains an item of information, it is desirable to mirror any information (e.g. a "home" page for the SIN) that is frequently used, especially to reduce international traffic. Mirroring is also desirable in case of disk crashes or breaks in network connections.
To ensure validity, molecular biology databases use the simple, but effective criterion of publication in a refereed journal. Many other approaches can be used. For example one might insist that a description of methodology accompany each data set that has not been published (say) in the scientific literature. Alternatively, a site might accept all contributions and categorize them on the basis of the evident quality of information.
Whatever criteria are used it is desirable to include indicators of reliability for the information in the attribute standard. Ideally every item of information should include a tag denoting accuracy or validity. Quality control fields need to include information about what error checks have been applied to ensure that the values have been recorded and entered correctly.
The compiling agent can apply consistency and outlier checks to filter out errors that may have been missed earlier (Green 1991, 1992). If the data incorporate sufficient redundancy, then consistency checks can reveal many errors. For instance, does the named species exist? Does the location given for a field site lie on land, and within the country indicated? If the database maintains suitable background information, then outlier tests can reveal suspect records that need to be rechecked. For instance, if a record indicates that a plant grows at a site that has significantly lower rainfall than any other for that species, then the record needs to be checked in case of error. Both sorts of checks can be automated and are now routine for census data. They have recently been applied to herbarium records and other environmental data (e.g. Chapman, 1992).
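As a rough illustration of how the two kinds of checks might be automated, the following Python sketch tests a single record. Everything in it is assumed for the sake of the example: the record fields, the small reference tables, the use of a latitude/longitude bounding box as a crude stand-in for a proper "on land and within the country" test, and the 0.8 margin that decides what counts as "significantly lower" rainfall.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical field observation."""
    species: str
    country: str
    lat: float
    lon: float
    rainfall_mm: float


# Illustrative reference data only; a real node would hold far more.
KNOWN_SPECIES = {"Eucalyptus regnans", "Acacia dealbata"}
# country -> (lat_min, lat_max, lon_min, lon_max), a very rough bounding box
COUNTRY_BOUNDS = {"Australia": (-44.0, -10.0, 112.0, 154.0)}


def consistency_errors(rec):
    """Return a list of consistency problems found in a record."""
    errors = []
    if rec.species not in KNOWN_SPECIES:
        errors.append("unknown species")
    bounds = COUNTRY_BOUNDS.get(rec.country)
    if bounds is None:
        errors.append("unknown country")
    else:
        lat_min, lat_max, lon_min, lon_max = bounds
        inside = lat_min <= rec.lat <= lat_max and lon_min <= rec.lon <= lon_max
        if not inside:
            errors.append("location outside country")
    return errors


def is_rainfall_outlier(rec, accepted_rainfall, margin=0.8):
    """Flag a record whose rainfall falls well below every accepted value.

    accepted_rainfall holds rainfall values (mm) from records already
    accepted for the same species; margin is an assumed threshold.
    """
    if not accepted_rainfall:
        return False  # nothing to compare against
    return rec.rainfall_mm < margin * min(accepted_rainfall)
```

A record that fails either test would not be rejected outright; as described above, it would be marked as suspect and rechecked.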
The general publication procedure (Fig. 1) includes a quality control step. When a contribution is received, the editor applies tests to ensure that the information conforms to the standard and to check for any obvious errors. For text material this quality control process might simply be a careful reading of the manuscript. If any faults are detected, the information is returned to the source for correction. After this initial checking, new items are placed in an updates area (Fig. 1) and users are invited to submit comments about them. After suitable checks, and corrections by the contributor, the new entry is transferred to the database proper.
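The procedure just described can be read as a small sequence of stages. The sketch below is one possible encoding in Python, based only on the text rather than on Fig. 1; the stage names and the single `checks_passed` flag are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages of a contribution."""
    RECEIVED = "received by editor"
    RETURNED = "returned to contributor"
    UPDATES = "updates area"
    DATABASE = "database proper"


def next_stage(stage, checks_passed):
    """Advance a contribution by one step.

    checks_passed is the outcome of whatever tests apply at the current
    stage: the editor's tests on receipt, or the later checks and user
    comments while the item sits in the updates area.
    """
    if stage is Stage.RECEIVED:
        return Stage.UPDATES if checks_passed else Stage.RETURNED
    if stage is Stage.RETURNED:
        # The contributor corrects the item and resubmits it, so it
        # goes back to the editor whatever the flag says.
        return Stage.RECEIVED
    if stage is Stage.UPDATES:
        return Stage.DATABASE if checks_passed else Stage.UPDATES
    return stage  # an item already in the database stays there
```

For example, a contribution with a fault moves from `RECEIVED` to `RETURNED`, comes back as `RECEIVED` after correction, and then passes through `UPDATES` before reaching `DATABASE`.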
The logical design of the system could be based around major projects & themes and the library can be compiled and maintained in several ways:
The above information could be made available via a series of menus and pages available on the Internet via Gopher, World Wide Web and other suitable protocols. Copies of the main pages and hierarchy of documents could be available at each node in the network.
This will require a regular "mirroring" process to ensure that all nodes are kept up to date. It is very important to ensure that all information items in this library are visible at all nodes and not just visible as an isolated reference at a particular site.
An important principle in network publication is that the site that maintains an item of information publishes the information. This rule applies especially to items that are updated regularly. Secondary sources (other sites that want to provide their users with access to the item concerned) should adopt one of two options: either provide a link to the primary site, or else mirror the original by downloading copies at regular intervals. These practices ensure that users always have access to the most up-to-date information available.
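The second option, mirroring at regular intervals, might be sketched in Python as follows. The `Mirror` class and its parameters are hypothetical. The `fetch` argument stands for whatever routine retrieves the item from the primary site (it is passed in so that the sketch does not assume any particular protocol), and falling back to the last copy when the primary site cannot be reached is an added assumption, not something the principle itself requires.

```python
import time


class Mirror:
    """Keep a local copy of one item and refresh it at regular intervals."""

    def __init__(self, fetch, interval_seconds):
        self.fetch = fetch                # callable that returns the current item
        self.interval = interval_seconds  # how long a copy counts as fresh
        self.copy = None
        self.fetched_at = None

    def get(self, now=None):
        """Return the mirrored item, downloading a new copy if it is stale."""
        now = time.time() if now is None else now
        stale = self.fetched_at is None or now - self.fetched_at >= self.interval
        if stale:
            try:
                self.copy = self.fetch()
                self.fetched_at = now
            except OSError:
                # Primary site unreachable: serve the old copy if there
                # is one, otherwise pass the error on.
                if self.copy is None:
                    raise
        return self.copy
```

A shorter interval keeps the mirror closer to the primary copy at the cost of more network traffic, so frequently updated items may be better served by a plain link to the primary site.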
One approach to publishing that a SIN can adopt is simply to register relevant existing activities. This benefits both the SIN as a whole and the publishing site:
Once the necessary scripts and programs have been developed, they could be provided with other standard files as a startup package to new nodes. In many cases the scripts and programs needed to automate particular procedures already exist and are freely available on the Internet.
Second, the evident success of molecular biology databases and physics preprint services suggests that the underlying principles can be extended both to other fields and to other areas of activity. Across the entire range of science, for instance, observations and experiments yield a wealth of raw data which, if suitably organized, can add value to future studies.
Finally there is the problem of how to organize an exploding pool of information on the network. Librarians have struggled with this problem for centuries. Whilst their solutions are useful, the information explosion on the network poses problems never encountered before: the sheer volume of information, rapid turnover and change (especially the need to maintain information), and the flexibility of hypertext and multimedia. The SINS approach provides a user-driven solution, in which groups of people interested in a particular topic organize and index information in ways that they find most useful.
Various projects are putting the SINS concept, as outlined here, into practice. For example, FireNet is a SIN concerned with all aspects of landscape fires (Green et al., 1994) and the Biodiversity Information Network (BIN21) has now organized its network activity as a SIN (Green and Croft, 1994). These and other similar activities have provided many useful lessons about putting the SINS idea into practice. I have tried to incorporate some of this practical experience into the above account. The interest shown in such groups encourages my belief that the SINS approach is a very fruitful way to organize activity via the Internet.
To put current developments into perspective, we can consider the changes that have taken place in the way that scientific results are disseminated. We might term the Sixteenth and Seventeenth Centuries the era of correspondence between great scholars. The Nineteenth Century can be classed as the era of the great societies and the Twentieth as the era of the great journals. The Twenty-First Century will surely become the era of the knowledge web and I expect that SINS, whatever form they may take, will play a major role in its organization.