Unfortunately, many on-line systems designed for satellite data do not take into account the somewhat different nature of associated ground-based data. Data queries that work most of the time but fail because the system has not taken into account less common data are not robust enough for today’s users. In order to avoid embarrassing problems, EO system designers must be aware of the nature of ground-based data.
It is also imperative that the EO community be willing to embrace all the various subgroups that constitute the earth observation community. Excluding certain segments (such as modelers or biologists) in order to cater to other segments of the community can have unanticipated backlash effects. Also, unnecessary divisions within the community may be created, an unbalanced view of earth science may be presented, customers may become frustrated and stop using the system if they cannot easily find the data they are looking for, and ultimately funding may be affected.
Fortunately, at the WWW Access to Earth Observation/GeoReferenced Data (EO/GEO) Workshop held as part of the Fifth International World Wide Web Conference in Paris in May 1996, a small but determined group voiced their views, experiences, and concerns about finding themselves archivers of ground-based data that are part of systems designed for satellite data. The Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), a data archive and distribution center that is part of NASA's Earth Observing System Data and Information System (EOSDIS) program, has some experience in this area. In this paper we describe some of our insights on this subject in the hope that the designers of other systems may learn from our experience.
Another difference is that while satellites carry a limited number of sensors, ground-based scientists use a multitude of small sensors to collect and measure data. Furthermore, a satellite dataset typically consists of a steady stream of data measuring, for example, one parameter with one sensor; ground-based datasets, on the other hand, are typically made up of many, perhaps thousands of, very small tabular files produced by a multitude of sensors that measure many different parameters. In addition, the sensor may not be an instrument at all; it may be a process such as radiocarbon dating or human observation. Therefore, the metadata (source, sensors, parameters, etc.) associated with ground-based data are inherently more complicated than those for satellite data.
The time resolution for ground-based data is also different from that for satellite data. Whereas satellite data generally involve information that can be subset by the day and hour, ground-based data may include paleoclimatic data that date back thousands of years with absolutely no distinction or concern for day or hour.
Browse also means very different things for satellite data and ground-based tabular data. Browsing satellite imagery data involves displaying a subsampled (reduced-resolution) view of the data, whereas browsing tabular data involves viewing either a portion of the data itself or a plot of various parameters.
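As a rough illustration of what a tabular browse could look like, the sketch below returns the header and the first few rows of a whitespace-delimited ASCII table. The function name `browse_table`, the `max_rows` parameter, and the sample values are hypothetical and chosen only for illustration; this is not a description of how any existing system implements browse.

```python
from itertools import islice


def browse_table(lines, max_rows=5):
    """Return (header, first rows) of a whitespace-delimited ASCII table.

    Assumes the first non-blank line is a header row; real files may
    carry extra documentation lines that would need to be skipped first.
    """
    # Split every non-blank line into fields, lazily.
    rows = (line.split() for line in lines if line.strip())
    header = next(rows, [])
    # Take at most max_rows data rows as the "browse" view.
    return header, list(islice(rows, max_rows))


# Made-up sample data, for illustration only.
SAMPLE = """site year precip_mm
A 1990 812
A 1991 790
B 1990 1021
B 1991 998
"""

header, rows = browse_table(SAMPLE.splitlines(), max_rows=2)
print(header)
for row in rows:
    print(row)
```

A plot of selected parameters would be the other natural browse product for such a file; the preview above is simply the cheaper of the two to produce.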
For instance, HDF seems to be the standard data format within the EOSDIS realm. Unfortunately, it is rather unwieldy for small, tabular data files. We at the ORNL DAAC have thousands upon thousands of small tabular data files that may contain only a few dozen lines of data in flat ASCII text. Transferring all of these files into HDF is overkill. In addition, our users prefer to receive their tabular data in flat ASCII text or other native formats. These are formats users are comfortable with, that are appropriate to the data, and that users are able to use with no further effort on their part. If we were to put these data files into HDF, we would find ourselves in the business of explaining to all our customers what HDF is and how to use it.
The modeling community also has very real and important data format needs, but their needs may be very different from the needs of other parts of the community. They use standardized model input files, and model outputs can be datasets. The needs of the modeling community within EO systems are just now beginning to be recognized and addressed.
For instance, satellite-based data may have a satellite as the platform, with instruments such as various cameras and sensors attached to the platform. With field-based data, on the other hand, platforms and sources can be more unusual, such as those shown in the following list:
Platform/Source | Instrument/Sensor
meteorological station | rain gauge
ocean-going research vessel | procedure of gas chromatography
computer | computer model
field investigation | procedure of collecting and burning leaves to determine chemical properties

Also, often it is the procedure that is important, not the actual apparatus that is used to perform the procedure (e.g., instrument = radiocarbon dating). So allowing techniques and procedures into the metadata model is important if ground-based data are to fit within the system.
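One way to let procedures sit alongside instruments in a metadata model is sketched below. The class names `SensorKind` and `SourceRecord`, the `kind` field, and the helper `procedures` are assumptions introduced for illustration; only the platform and sensor values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class SensorKind(Enum):
    """Hypothetical classification of what produced the measurement."""
    INSTRUMENT = "instrument"  # a physical device, e.g. a rain gauge
    PROCEDURE = "procedure"    # a technique, e.g. gas chromatography
    MODEL = "model"            # a computer model producing output data


@dataclass(frozen=True)
class SourceRecord:
    platform: str     # platform or source of the data
    sensor: str       # instrument, procedure, or model name
    kind: SensorKind  # lets procedures be recorded as first-class "sensors"


# The platform/sensor pairs listed in the text.
EXAMPLES = [
    SourceRecord("meteorological station", "rain gauge", SensorKind.INSTRUMENT),
    SourceRecord("ocean-going research vessel", "gas chromatography",
                 SensorKind.PROCEDURE),
    SourceRecord("computer", "computer model", SensorKind.MODEL),
    SourceRecord("field investigation",
                 "collecting and burning leaves to determine chemical properties",
                 SensorKind.PROCEDURE),
]


def procedures(records):
    """Select the records whose 'sensor' is a procedure, not an apparatus."""
    return [r for r in records if r.kind is SensorKind.PROCEDURE]


for record in procedures(EXAMPLES):
    print(f"{record.platform}: {record.sensor}")
```

The point of the `kind` field is that a query for "sensor" does not silently exclude data whose source is a technique or a model.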
Furthermore, it is necessary to have the ability to subset information into more specific subcategories that are applicable to the particular type of information in question. Just as satellite data would lose meaning within a system that subsets only to the level of Satellite Data - Imagery, ground-based data lose meaning if the only subcategory is Nonsatellite Data - Field Investigation. Precise definitions make data easier to find.
In the satellite world, time is generally accepted to be the time of acquisition of the data, with a precise start time, an end time, and perhaps a measurement interval. The data are usually assumed to have been collected within the last 40 years. Ground-based data often have similar timeframes, but occasionally there are data that do not fit within a tidy box. For instance, ground-based data are sometimes dated in geological eras rather than years, e.g., the Mesozoic, which began roughly 250 million years B.P. (Before Present). A dataset from a satellite commonly contains one day's worth of images, but paleoclimatic data may contain information that spans thousands of centuries.
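A minimal sketch of a time-coverage record that accommodates both cases follows. It stores coverage as fractional calendar years, so that hour-level satellite acquisitions and multi-millennial paleoclimatic ranges can be compared in one query. The names `TimeCoverage`, `from_datetimes`, `from_years_bp`, and `overlaps` are hypothetical, and taking 1950 as "present" is an assumption borrowed from radiocarbon-dating convention, not something this paper specifies.

```python
from dataclasses import dataclass
from datetime import datetime

PRESENT_YEAR = 1950  # assumed reference year for "Before Present"


@dataclass(frozen=True)
class TimeCoverage:
    start_year: float  # fractional calendar year; may be far below zero
    end_year: float
    resolution: str    # e.g. "hour", "day", "year", "era"


def fractional_year(moment):
    """Convert a datetime to a fractional calendar year."""
    year_start = datetime(moment.year, 1, 1)
    year_end = datetime(moment.year + 1, 1, 1)
    return moment.year + (moment - year_start) / (year_end - year_start)


def from_datetimes(start, end, resolution="hour"):
    """Coverage for data with precise acquisition times."""
    return TimeCoverage(fractional_year(start), fractional_year(end), resolution)


def from_years_bp(older_bp, younger_bp, resolution="year"):
    """Coverage for data dated in years Before Present."""
    return TimeCoverage(PRESENT_YEAR - older_bp, PRESENT_YEAR - younger_bp,
                        resolution)


def overlaps(a, b):
    """True if the two coverages share any span of time."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


satellite = from_datetimes(datetime(1996, 5, 1), datetime(1996, 5, 2))
paleo = from_years_bp(12000, 8000)
print(paleo)
print(overlaps(satellite, paleo))
```

Python's `datetime` cannot represent years before year 1, which is why the sketch falls back to plain numbers; a system that assumes every dataset has a calendar timestamp runs into the same limit.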
U.S. Federal agencies are required to use the Content Standards for Digital Geospatial Metadata as approved on June 8, 1994, to document geospatial data. This may be a useful guideline for non-Federal agencies as long as a certain flexibility in interpretation is retained. It should be noted that following these guidelines can, at times, result in documentation created at a high cost that totally dwarfs the data that it is intended to describe.
Ground-based data can play a significant role in our understanding of the world if they are used in conjunction with satellite data to explore and explain physical phenomena. We hope our paper has provided some insight into this integration process and that designers of future satellite-based systems will learn from our experience.
* Research sponsored by NASA under Interagency Agreement DOE No. 2013-F044-A1 under Lockheed Martin Energy Research Corp., contract DE-AC05-96OR22464 with the U.S. Department of Energy.
"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."