Inside an Environmental Data Archive WWW Site*

T. G. Yow, Ph.D. (1), S. V. Jennings (2), J. W. Grubb (2), A. W. Smith (1)

(1) Oak Ridge National Laboratory, (2) University of Tennessee

Abstract

The Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), which is associated with NASA's Earth Observing System Data and Information System (EOSDIS), provides access to tabular and imagery datasets used in ecological and environmental research. Because of its large and diverse data holdings, the ORNL DAAC must not only find an efficient way to manage the data but must also help users find data of interest from thousands of files available at the DAAC. To accomplish these goals, the ORNL DAAC has developed several World Wide Web (WWW) tools such as the Biogeochemical Information Ordering Management Environment (BIOME), a WWW search and order system, and WWW-based data management and configuration control tools. This paper describes these systems and the special features that allow for easy access to and management of the data.

Keywords

Metadata, data management, configuration control, data archive

1 INTRODUCTION

1.1 The ORNL DAAC

The Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) is one of nine data archive and distribution centers belonging to NASA's Earth Observing System Data and Information System (EOSDIS). Both the Earth Observing System (EOS) and EOSDIS are components of NASA's contribution to the U.S. Global Change Research Program through its Mission to Planet Earth Program.

The ORNL DAAC archives and distributes data relating to the Earth's biogeochemical dynamics. These data come from NASA-sponsored ground-based field investigations and include tabular data and imagery from satellite and aircraft sensors. Non-NASA biogeochemical data relevant to global change research are also included.

1.2 The ORNL DAAC WWW Site

To meet the needs of our growing user community, in 1994 the ORNL DAAC created a World Wide Web (WWW) site using a National Center for Supercomputing Applications (NCSA) httpd server and the Unix operating system on a Silicon Graphics workstation.

The first ORNL DAAC home page was a relatively simple text description of the DAAC, its holdings, and contact information for obtaining the data. In addition, detailed descriptions of each dataset were available. These text pages have been improved and updated and continue to serve as an access option. They are located at http://www-eosdis.ornl.gov.

As the number of datasets and users grew, we saw the need for a more sophisticated search and order system. In response to this need we developed the Biogeochemical Information Ordering Management Environment (BIOME) search and order system in 1995. BIOME is located at http://www-eosdis.ornl.gov/BIOME/biome.html and can be accessed from the DAAC home page.

Also, as the complexity of the archive increased, we saw the need for tools to automate site management. In 1996 we created WWW-based database management and configuration management (CM) tools.

The ORNL DAAC WWW site's customized search and order system and its two customized utilities use many generic Web features as well as advanced features that help users locate data quickly and easily and help site personnel manage the archive's metadata. The following sections describe some of these features.

2 BIOME

BIOME, the ORNL DAAC's search and order system, has several customized features that make data retrieval fast and efficient. A few of the more interesting features are described in the following subsections.

2.1 Browser-Aware Linking and Dynamic Paging

Because many of our users are scientific researchers working in remote areas, we try to balance their needs with those of users who have access to the latest technology. On-the-fly browser customization allows the ORNL DAAC WWW site to take advantage of the most innovative WWW features while maintaining backwards compatibility with older browsers and text-based browsers.

The ORNL DAAC WWW interface allows any user with a Web browser and Internet access to search and order/download data directly to his/her machine. Because we have no way of knowing which browser a user might have and do not wish to exclude any potential users, we designed the interface to work with any browser capable of supporting forms, including non-GUI browsers (e.g., Lynx).

2.1.1 Browser Identification and Classification

A shell script is executed as the home page is being constructed prior to being sent to the user's browser. The script examines a Unix environment variable that contains the descriptive identification of the user's browser. The script then launches a C program that parses the browser description and categorizes it into one of four classes of browsers (additional classes can be added in the future): (1) incompatibles that cannot handle forms (Cello, etc.); (2) character based that can handle forms (Lynx, etc.); (3) Mosaic-compatible and variants (various Mosaics, HotJava, etc.); and (4) Netscape-compatible and variants (Netscape, Internet Explorer, etc.).

When a new, previously unknown browser accesses the system, the user is asked to call or email ORNL DAAC User Services with a description of the browser. The browser is then entered into one of the classes listed above. From August 1995 to August 1996, for example, 1152 unique browser/platform combinations accessed the ORNL DAAC site.
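The classification step can be sketched as follows. This is a minimal illustration in Python, not the shell script and C program described above. It assumes the browser description arrives in the standard CGI variable HTTP_USER_AGENT, and the lookup table KNOWN_BROWSERS is hypothetical; the actual table used at the DAAC is not given in this paper.

```python
import os
from enum import IntEnum


class BrowserClass(IntEnum):
    """The four browser classes listed in the text."""
    NO_FORMS = 1        # incompatible: cannot handle forms (Cello, etc.)
    TEXT_FORMS = 2      # character based, can handle forms (Lynx, etc.)
    MOSAIC_LIKE = 3     # Mosaic-compatible and variants
    NETSCAPE_LIKE = 4   # Netscape-compatible and variants


# Hypothetical lookup table: lowercase substring -> class.
# The first matching entry wins, so more specific markers go first.
KNOWN_BROWSERS = [
    ("cello", BrowserClass.NO_FORMS),
    ("lynx", BrowserClass.TEXT_FORMS),
    ("mosaic", BrowserClass.MOSAIC_LIKE),
    ("hotjava", BrowserClass.MOSAIC_LIKE),
    ("mozilla", BrowserClass.NETSCAPE_LIKE),
]


def classify_browser(description):
    """Return the BrowserClass for a browser description, or None if unknown."""
    lowered = description.lower()
    for marker, browser_class in KNOWN_BROWSERS:
        if marker in lowered:
            return browser_class
    return None  # unknown browser: ask the user to contact User Services


if __name__ == "__main__":
    # Under a CGI server the description is passed in the environment.
    print(classify_browser(os.environ.get("HTTP_USER_AGENT", "")))
```

Returning None for an unrecognized description leaves room for the manual step described above, in which a new browser is assigned to a class by hand and added to the table.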

2.1.2 Constructing Dynamic Pages

The user's browser class is stored in a file whose name is based on the user's IP address. This file can be read by any other process that needs to know the browser class, allowing us to dynamically alter the pages.

A shell script uses the browser class to determine whether or not the user's browser can display images. If so, the section of the page that allows the user to select information based on images is included in the page. If the browser class does not allow images, this portion of the page is not included for display.

Each page contains only links that the user's browser can display. Links to pages requiring image capabilities are all in page components shown only to browsers that can display images. In this way, the site also exhibits dynamic linking.
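The page construction described in this subsection can be illustrated with a short Python sketch. It is an assumption-laden stand-in for the shell scripts actually used: the class numbers follow the list in Section 2.1.1, classes 3 and 4 are assumed to be the image-capable ones, and the page fragments and file names (attribute.html, map.html, world.gif) are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: only the Mosaic-like (3) and Netscape-like (4) classes show images.
IMAGE_CAPABLE_CLASSES = {3, 4}


def save_browser_class(state_dir, ip_address, browser_class):
    """Record the browser class in a file named after the user's IP address."""
    Path(state_dir, ip_address).write_text(str(browser_class))


def load_browser_class(state_dir, ip_address):
    """Read the class back; any process serving this user can call this."""
    return int(Path(state_dir, ip_address).read_text())


def build_page(browser_class):
    """Assemble a page from components, omitting those the browser cannot show."""
    parts = [
        "<h1>Search for data</h1>",
        '<a href="attribute.html">Search by attribute</a>',
    ]
    if browser_class in IMAGE_CAPABLE_CLASSES:
        # The image component carries its own link, so a text-only browser
        # never receives a link to a page that requires images.
        parts.append('<a href="map.html"><img src="world.gif" alt="Map search"></a>')
    return "\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        save_browser_class(state_dir, "192.0.2.10", 2)  # a character-based browser
        print(build_page(load_browser_class(state_dir, "192.0.2.10")))
```

Keeping each link inside the component that needs it is what makes the linking dynamic: removing a component removes its links as well.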

2.2 User Search Methodology Awareness

In general, users who are searching for data already have some information that they are trying to complete or expand. Our web site contains a great deal of information that can be difficult to search without a convenient starting point. The BIOME user interface offers several ways to search for data based upon what the user already knows; a few of these search options are described below.

2.2.1 Attribute Searching

Attribute searching allows the user to search the DAAC Sybase RDBMS metadata database by some attribute(s) of the data. BIOME displays a list of attributes from which to select the search criteria (i.e., dataset name, investigator, source, sensor, or geophysical parameter). The search criteria are formed into a query that searches our Sybase database for a match; the search results are then returned to the user's browser. The user may then select additional criteria to narrow the search results; a continuously shrinking results set eventually returns the desired data.
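A query of this kind might be assembled as in the following Python sketch. It uses an in-memory SQLite database purely as a stand-in for the Sybase metadata database. The table name, column names, and sample rows are invented for illustration and do not reflect the DAAC schema.

```python
import sqlite3

# Searchable attributes named in the text, mapped to hypothetical column names.
SEARCHABLE = {
    "dataset name": "dataset_name",
    "investigator": "investigator",
    "source": "source",
    "sensor": "sensor",
    "geophysical parameter": "parameter",
}


def build_query(criteria):
    """Turn {attribute: value} into a parameterized SELECT and its arguments."""
    clauses, args = [], []
    for attribute, value in criteria.items():
        column = SEARCHABLE[attribute]  # KeyError rejects unknown attributes
        clauses.append(f"{column} = ?")
        args.append(value)
    sql = "SELECT dataset_name FROM metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY dataset_name", args


def search(conn, criteria):
    """Run the query and return the matching dataset names."""
    sql, args = build_query(criteria)
    return [row[0] for row in conn.execute(sql, args)]


def demo_connection():
    """In-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata "
        "(dataset_name, investigator, source, sensor, parameter)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)", [
        ("Dataset A", "PI-1", "aircraft", "Sensor X", "albedo"),
        ("Dataset B", "PI-1", "satellite", "Sensor Y", "albedo"),
        ("Dataset C", "PI-2", "satellite", "Sensor Y", "soil moisture"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(search(conn, {"investigator": "PI-1"}))
    print(search(conn, {"investigator": "PI-1", "source": "satellite"}))
```

Each additional criterion adds one AND clause, which is why the results set can only shrink as the user refines the search.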

2.2.2 Subject Searching

The data at this DAAC cover a variety of subjects, e.g., meteorology, hydrology, and atmospheric chemistry. However, these broad subject categories may not be specifically mentioned in the documentation or the metadata. To assist the user in searching for information about a general subject, we have categorized datasets into various subject areas. The user can select one or more subjects from a subject list that is linked to a page containing dataset titles and links to data and documents.

2.2.3 Map Searching

BIOME allows users to select a point on one of several geographic maps (a world map and several continental maps); the system then returns all datasets that contain data about the selected point. The maps are clickable images in which each pixel location corresponds to a map position. The latitude and longitude of the selected map position are retrieved from the database and compared with the latitude and longitude ranges recorded in the metadata for each data file. All files whose ranges include the selected point are identified and listed, as are the datasets to which the files belong.
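The two steps of a map search, converting a click to a position and testing it against each file's ranges, can be sketched in Python as shown below. The conversion formula is an assumption: it treats the world map as a simple equirectangular image, whereas BIOME retrieves the position from its database. The file records and their field names are hypothetical.

```python
def pixel_to_lat_lon(x, y, width, height):
    """Convert a click on a whole-world equirectangular image to degrees.

    Assumes pixel (0, 0) is the top-left corner at 90 N, 180 W.
    """
    lon = (x / width) * 360.0 - 180.0
    lat = 90.0 - (y / height) * 180.0
    return lat, lon


def datasets_at_point(files, lat, lon):
    """Return the sorted names of datasets with a file covering the point."""
    hits = set()
    for record in files:
        in_lat = record["south"] <= lat <= record["north"]
        in_lon = record["west"] <= lon <= record["east"]
        if in_lat and in_lon:
            hits.add(record["dataset"])
    return sorted(hits)


# Made-up file metadata: one bounding box per data file.
SAMPLE_FILES = [
    {"dataset": "Dataset A", "south": 50.0, "north": 60.0, "west": -110.0, "east": -95.0},
    {"dataset": "Dataset A", "south": 52.0, "north": 56.0, "west": -108.0, "east": -100.0},
    {"dataset": "Dataset B", "south": -10.0, "north": 10.0, "west": -70.0, "east": -50.0},
]

if __name__ == "__main__":
    lat, lon = pixel_to_lat_lon(135, 62, 640, 320)
    print(lat, lon, datasets_at_point(SAMPLE_FILES, lat, lon))
```

This sketch does not handle files whose longitude range crosses the 180-degree meridian; a real implementation would need to split or normalize such ranges.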

2.3 User-Selected Dynamic Packaging and Delivery

Once the user has selected the dataset(s) of interest, he/she has the option of downloading data files directly to the browser or selecting complete datasets to be collected and either made available by FTP or recorded onto physical media and shipped to the user. Physical media requests require User Services intervention. FTP delivery requests are as automated as possible. To limit the load on the network, each order is compressed into a single file in the platform format requested by the user.

The web site maintains Unix (tar, compress, gzip), PC (PK-Zip), and Mac (Stuffit) packaging software that is invoked whenever the user requests FTP delivery, which is the most popular type of data delivery. The user is emailed an order confirmation when the order is placed and then emailed FTP instructions when the order is ready, usually within a few minutes. This is all done without any human intervention although every action is logged for human review at a later date.
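The packaging step can be approximated with the Python standard library, as in the sketch below. It is not the DAAC's implementation, which invokes the external packaging programs named above. It covers only the Unix (gzip-compressed tar) and PC (Zip) formats, because the standard library has no Stuffit support. The function name, order identifier, and platform labels are hypothetical.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def package_order(files, out_dir, order_id, platform):
    """Bundle the ordered files into one compressed archive for FTP pickup."""
    out_dir = Path(out_dir)
    if platform == "unix":
        archive = out_dir / f"{order_id}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            for path in files:
                # Store only the file name, not the server's directory layout.
                tar.add(path, arcname=Path(path).name)
    elif platform == "pc":
        archive = out_dir / f"{order_id}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in files:
                zf.write(path, arcname=Path(path).name)
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        data_file = Path(work, "sample.dat")
        data_file.write_text("1,2,3\n")
        print(package_order([data_file], work, "order-0001", "unix").name)
        print(package_order([data_file], work, "order-0001", "pc").name)
```

In a fully automated flow, the returned archive path would be placed in the FTP area and referenced in the second email sent to the user.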

3 WWW-BASED DATABASE MANAGEMENT TOOLS

Metadata for the data archived at the ORNL DAAC are stored in several Sybase databases. As the complexity of the data holdings has increased, maintaining the databases has become increasingly difficult and time-consuming. The Web-based database administrator (DBA) maintenance tool provides options that make this task less difficult. The tool is a graphical interface built from HTML 3.2 pages, CGI scripts, and C processes that the CGI scripts execute to access the databases and perform database functions through Sybase's DB-Library.

What makes this interface useful and robust are its custom-built design options. For example, the DBA tool handles the ingest of new metadata by providing on-the-fly templates of database tables generated dynamically from Sybase's system tables. New data can be typed into the templates, eliminating the need to construct Sybase bulk copy files manually in a text editor, a task that is tedious and error prone. The DBA tool also handles updates to existing metadata. It offers global updates, in which a change is applied to every table in a database that contains a particular field as well as to other databases containing the same table and field, and it handles single updates to a database.
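The idea of generating an entry template from the database's own catalog can be sketched in Python as follows. SQLite's PRAGMA table_info is used here only as a stand-in for a query against Sybase's system tables, and the form layout, the action URL /cgi-bin/insert, and the example table are hypothetical.

```python
import sqlite3
from html import escape


def table_columns(conn, table):
    """List a table's column names from the catalog (SQLite stand-in)."""
    # PRAGMA arguments cannot be bound as parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError(f"bad table name: {table!r}")
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def entry_form(conn, table, action="/cgi-bin/insert"):
    """Build an HTML form with one text input per column of the table."""
    lines = [
        f'<form method="POST" action="{escape(action)}">',
        f'<input type="hidden" name="table" value="{escape(table)}">',
    ]
    for column in table_columns(conn, table):
        lines.append(
            f'{escape(column)}: <input type="text" name="{escape(column)}"><br>'
        )
    lines.append('<input type="submit" value="Insert row">')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (dataset_id INTEGER, title TEXT)")
    print(entry_form(conn, "dataset"))
```

Because the form is derived from the catalog at request time, a change to a table's structure is reflected in the template without editing any HTML by hand.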

Other options include automated bulk copies out of the database and the printing of the current structure for each table. The DBA tool also automatically generates a transaction log that provides a record of all DBA actions on the databases. Future enhancements will include automated database backups, table creation options, and the granting of user privileges.

4 WWW-BASED CONFIGURATION MANAGEMENT TOOLS

One of the many software management issues the ORNL DAAC has addressed is how to allow multiple Web developers to work on a common set of HTML documents. We chose the Revision Control System (RCS) as the tool for archiving and managing these documents.

Our Web interface to RCS is the Directory Management System (DMS). The WWW interface makes RCS easy to use for those who are not familiar with RCS and its UNIX commands. Furthermore, the information provided by RCS is arranged in a clear and concise layout on a single HTML page rather than appearing as a series of UNIX and RCS commands at the prompt. For example, DMS lets the user (1) quickly view which files are and are not maintained in the RCS archives, (2) check files in to and out of the software archive, and (3) view which users have checked out which documents.

Another function of the DMS is to copy documents from one UNIX machine to another. This is important to the DAAC because all WWW development occurs on the development machine; then when the software is ready, it is moved to the operational machine. The DMS provides the mechanism for determining which files on the development server are new and which ones have been modified compared to the files on the operational server. Furthermore, DMS has the capability of tagging these new or modified files and uploading them to the operational environment. This system has greatly reduced confusion over which files in the development area have been modified but not yet upgraded to operational status.
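The comparison and promotion steps can be illustrated with the Python sketch below. It simplifies the situation described above by treating the development and operational areas as two directory trees on one machine, whereas DMS copies between two UNIX machines. The function names are hypothetical, and a real tool would probably skip RCS archive directories.

```python
import shutil
import tempfile
from pathlib import Path


def compare_trees(dev_root, ops_root):
    """Return (new, modified) lists of relative paths, development vs. operations."""
    dev_root, ops_root = Path(dev_root), Path(ops_root)
    new, modified = [], []
    for dev_file in sorted(p for p in dev_root.rglob("*") if p.is_file()):
        rel = dev_file.relative_to(dev_root)
        ops_file = ops_root / rel
        if not ops_file.exists():
            new.append(rel)
        elif dev_file.read_bytes() != ops_file.read_bytes():
            modified.append(rel)
    return new, modified


def promote(dev_root, ops_root, tagged):
    """Copy the tagged files from the development tree to the operational tree."""
    for rel in tagged:
        target = Path(ops_root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(dev_root) / rel, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dev, tempfile.TemporaryDirectory() as ops:
        Path(dev, "index.html").write_text("new version")
        Path(ops, "index.html").write_text("old version")
        Path(dev, "about.html").write_text("brand new page")
        new, modified = compare_trees(dev, ops)
        print("new:", new, "modified:", modified)
        promote(dev, ops, new + modified)
        print("after promotion:", compare_trees(dev, ops))
```

Comparing file contents rather than timestamps keeps the report accurate even when a file has been touched but not actually changed.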

5 CONCLUSION

The ORNL DAAC provides WWW access to a large number of tabular and imagery datasets relating to ecological and environmental information. The ORNL DAAC has accomplished this task by designing and offering a customized WWW search and order system that allows efficient and rapid data search and retrieval. To manage the metadata and documentation that supports this data archive, DAAC team members have also developed several WWW data management and configuration control utilities.

By developing customized WWW tools to manage global ecological and environmental data, the ORNL DAAC has made an important contribution to NASA's Mission to Planet Earth Program. By staying on the cutting edge of WWW technology, the ORNL DAAC remains an important player in this program.

6 REFERENCES

Lemay, L. (1995) Teach Yourself Web Publishing with HTML in a Week. Sams Publishing, Indianapolis, Indiana.

Lemay, L. (1995) Teach Yourself More Web Publishing with HTML in a Week. Sams Publishing, Indianapolis, Indiana.

7 BIOGRAPHY

Dr. Teresa G. Yow is a Systems Analyst in the Computational Physics and Engineering Division of ORNL. She is database designer and database administrator for the ORNL DAAC.

Sarah V. Jennings is a Research Associate at the University of Tennessee Transportation Center's Pellissippi Research Institute and serves as WWW Curator and Documentation Specialist for the ORNL DAAC.

Jon W. Grubb is a Research Associate at the University of Tennessee Energy, Environment, and Resources' Pellissippi Research Institute, and serves as the WWW Programming Specialist for the ORNL DAAC.

Anthony W. Smith is a Systems Analyst in the Computational Physics and Engineering Division of ORNL. He serves as the User Interface Specialist for the ORNL DAAC.

* Research sponsored by NASA under Interagency Agreement DOE No. 2013-F044-A1 under Lockheed Martin Energy Research Corp., contract DE-AC05-96OR22464 with the U.S. Department of Energy.

"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."