Integrating Ground-Based EO Data in Satellite-Based Systems*

by Sarah V. Jennings, Patricia F. Daugherty, and Teresa G. Yow, Ph.D.

1. Introduction

Earth observation (EO) and other forms of geo-referenced data are typically thought of as being "satellite data." It is true that the majority of EO data are satellite oriented; thus, most on-line EO data systems are designed primarily for satellite image data. However, there is a small but significant minority of EO data that is not satellite image data; i.e., it is ground-based or terrestrial data.

Unfortunately, many on-line systems designed for satellite data do not take into account the somewhat different nature of associated ground-based data. A query interface that works for most data but fails on less common data types is not robust enough for today’s users. To avoid such failures, EO system designers must be aware of the nature of ground-based data.

It is also imperative that the EO community be willing to embrace all the various subgroups that constitute the earth observation community. Excluding certain segments (such as modelers or biologists) in order to cater to others can have unanticipated consequences: unnecessary divisions may be created within the community, an unbalanced view of earth science may be presented, customers who cannot easily find the data they are looking for may become frustrated and stop using the system, and ultimately funding may be affected.

Fortunately, at the WWW Access to Earth Observation/GeoReferenced Data (EO/GEO) Workshop held as part of the Fifth International World Wide Web Conference in Paris in May 1996, a small but determined group voiced their views, experiences, and concerns about finding themselves archivers of ground-based data within systems designed for satellite data. The Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), a data archive and distribution center that is part of NASA's Earth Observing System Data and Information System (EOSDIS) program, has some experience in this area. In this paper we describe some of our insights on this subject in the hope that the designers of other systems may learn from our experience.

2. Characteristics of Ground-Based Data

Let’s start by saying that ground-based data are fundamentally different from satellite data. The most obvious difference is that satellite data are remotely sensed: they are produced when sensors on satellites in space automatically record observations. Ground-based data, on the other hand, are produced when a scientist records observations of the immediate surroundings (e.g., soil moisture content or wind velocity).

Another difference is that while satellites carry a limited number of sensors, ground-based scientists use a multitude of small sensors to collect and measure data. Furthermore, a satellite dataset typically consists of a steady stream of data measuring, for example, one parameter with one sensor, whereas a ground-based dataset is typically made up of many small tabular files, perhaps thousands, produced by a multitude of sensors that measure many different parameters. In addition, the sensor may not be an instrument at all; it may be a process such as radiocarbon dating or human observation. Therefore, the metadata (source, sensors, parameters, etc.) associated with ground-based data are inherently more complicated than those for satellite data.

The time resolution for ground-based data is also different from that for satellite data. Whereas satellite data can generally be subset by day and hour, ground-based data may include paleoclimatic data that date back thousands of years, for which day and hour have no meaning.

Browse also means very different things for satellite data and ground-based tabular data. Browsing satellite imagery data involves displaying a subsampled (reduced-resolution) view of the data, whereas browsing tabular data involves viewing either a portion of the data itself or a plot of various parameters.
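
The contrast between the two kinds of browse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample table, and its column names are hypothetical and do not describe any existing EOSDIS or ORNL DAAC software. The image browse keeps every Nth pixel to give a reduced-resolution view, while the tabular browse simply returns the header and the first few rows of a flat ASCII file.

```python
import csv
import io
from itertools import islice


def browse_image(pixels, step):
    """Return a reduced-resolution view: every `step`-th pixel in each direction."""
    return [row[::step] for row in pixels[::step]]


def browse_tabular(text, max_rows=5):
    """Return the header and the first `max_rows` data rows of a delimited ASCII table."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = list(islice(reader, max_rows))
    return header, rows


# Hypothetical 4x4 "image" and a hypothetical field-data table.
IMAGE = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

SAMPLE_TABLE = """site,date,soil_moisture_pct
A,1994-06-01,21.5
A,1994-06-02,20.9
B,1994-06-01,33.0
"""

small_image = browse_image(IMAGE, step=2)
header, rows = browse_tabular(SAMPLE_TABLE, max_rows=2)
```

A plot of selected parameters, the other form of tabular browse mentioned above, would start from the same parsed rows.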

3. Solving the Problem

Luckily the task of integrating ground-based data into EO systems is not particularly difficult. Being aware that the problem exists is a good starting point. Using broader definitions and having a certain amount of system flexibility can be easy ways to incorporate different data types with very little effort. The following paragraphs explore ways the problem can be solved.

3.1 Data Formats

A data system must be designed to accommodate various types of data, according to the requirements of the data and of the user communities of those data. One issue that must be addressed is the matter of data formats. Satellite imagery data may be in formats different from those of nonsatellite data. Field data may be tabular row-and-column data, with very small file sizes in relation to imagery files. GIS information may be in other, very different formats. There currently does not seem to be a single data format that can meet the needs of all these (and other) types of data.

For instance, HDF seems to be the standard data format within the EOSDIS realm. Unfortunately, it is rather unwieldy for small, tabular data files. We at the ORNL DAAC have thousands upon thousands of small tabular data files that may contain only a few dozen lines of data in flat ASCII text. Transferring all of these files into HDF is overkill. In addition, our users prefer to receive their tabular data in flat ASCII text or other native formats. These are formats users are comfortable with, that are appropriate to the data, and that users are able to use with no further effort on their part. If we were to put these data files into HDF, we would find ourselves in the business of explaining to all our customers what HDF is and how to use it.

The modeling community also has very real and important data format needs, but their needs may be very different from the needs of other parts of the community. They use standardized model input files, and model outputs can be datasets. The needs of the modeling community within EO systems are just now beginning to be recognized and addressed.

3.2 Metadata

3.2.1 Flexible Definitions

We have found that it is possible for ground-based data to fit within the confines of a satellite-based system if a certain amount of flexibility is available within the system definitions. If it is possible to define satellite-based terminology such as platforms, instruments, and even projects somewhat loosely, it is much easier for nonsatellite data to reside within the system.

For instance, satellite-based data may have a satellite as the platform, with instruments such as various cameras and sensors attached to the platform. With field-based data, on the other hand, platforms and sources can be more unusual, such as those shown in the following list:

    Platform/Source                Instrument/Sensor

    meteorological station         rain gauge
    ocean-going research vessel    procedure of gas chromatography
    computer                       computer model
    field investigation            procedure of collecting and burning leaves
                                     to determine chemical properties

Also, often it is the procedure that is important, not the actual apparatus that is used to perform the procedure (e.g., instrument = radiocarbon dating). So allowing techniques and procedures into the metadata model is important if ground-based data are to fit within the system.

Furthermore, it is necessary to have the ability to subset information into more specific subcategories that are applicable to the particular type of information in question. Just as satellite data would lose meaning within a system that subsets only to the level of Satellite Data - Imagery, ground-based data lose meaning if the only subcategory is Nonsatellite Data - Field Investigation. Precise definitions make data easier to find.
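
One way to picture such flexible definitions is a metadata record in which the "sensor" may be an instrument, a procedure, or a model, and in which each record carries a subcategory path of arbitrary depth. The Python sketch below is a minimal illustration only; the class name, field names, and category labels are assumptions made for this example and are not taken from the EOSDIS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    platform: str                    # satellite, research vessel, computer, ...
    sensor: str                      # an instrument OR a procedure OR a model
    sensor_kind: str = "instrument"  # hypothetical vocabulary: instrument, procedure, model
    category: tuple = ()             # subcategory path, most general term first


# Hypothetical records; the category labels are invented for illustration.
RECORDS = [
    Source("meteorological station", "rain gauge", "instrument",
           ("Field Investigation", "Meteorology", "Precipitation")),
    Source("field investigation", "radiocarbon dating", "procedure",
           ("Field Investigation", "Paleoclimate", "Dating")),
    Source("computer", "computer model", "model",
           ("Model Output", "Ecosystem")),
]


def find(records, prefix):
    """Return the records whose category path starts with `prefix`."""
    prefix = tuple(prefix)
    return [r for r in records if r.category[:len(prefix)] == prefix]


all_field = find(RECORDS, ("Field Investigation",))
meteorology = find(RECORDS, ("Field Investigation", "Meteorology"))
```

Searching on the full path rather than on the top-level term alone is what keeps a rain gauge record from being lost among all other field investigations.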

3.2.2 Temporal Characteristics

In addition to the flexibility within the definitions of instruments and platforms, looser constraints are needed on other metadata contents. For instance, both satellite and nonsatellite data use time and location as primary metadata. However, even within those categories there are significant differences in application.

In the satellite world, time is generally accepted to be the time of acquisition of the data, with a precise start time, an end time, and perhaps a measurement interval. The data are usually assumed to have been collected within the last 40 years. Ground-based data often have similar timeframes, but occasionally there are data that do not fit within a tidy box. For instance, ground-based data are sometimes dated by geological era rather than by calendar year, e.g., the Mesozoic era, which began roughly 250 million years B.P. (Before Present). A dataset from a satellite commonly contains one day's worth of images, but paleoclimatic data may contain information that spans thousands of centuries.
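
A single time axis that accommodates both kinds of data is one possible approach. The Python sketch below stores every temporal extent in years B.P., using the common convention that "present" is the year 1950, so that a recent calendar-year range and a deep-time range can be compared in the same query. The class and method names are hypothetical, and the example extents are invented for illustration.

```python
from dataclasses import dataclass

BP_REFERENCE_YEAR = 1950  # conventional "present" for B.P. dates (an assumption of this sketch)


@dataclass(frozen=True)
class TimeExtent:
    start_bp: float  # older bound, in years before 1950 (negative means after 1950)
    end_bp: float    # younger bound, in years before 1950

    @classmethod
    def from_calendar_years(cls, start_year, end_year):
        """Build an extent from calendar years (A.D.)."""
        return cls(BP_REFERENCE_YEAR - start_year, BP_REFERENCE_YEAR - end_year)

    def overlaps(self, other):
        """True if the two extents share at least one point in time."""
        return self.start_bp >= other.end_bp and other.start_bp >= self.end_bp


# Hypothetical holdings and queries.
satellite = TimeExtent.from_calendar_years(1990, 1996)
paleo = TimeExtent(start_bp=250_000, end_bp=10_000)
recent_query = TimeExtent.from_calendar_years(1995, 1995)
deep_query = TimeExtent(start_bp=12_000, end_bp=0)

paleo_hit = paleo.overlaps(deep_query)
satellite_hit = satellite.overlaps(recent_query)
satellite_miss = satellite.overlaps(deep_query)
```

Day and hour resolution, where it exists, could be carried in a separate optional field so that records without it are not forced to invent one.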

3.2.3 Geographic Location of Atmospheric, Solar, Lunar, and Astral Data

Geographic location is usually a simple matter of identifying the latitude and longitude of a point location or an area. However, this omits all solar, lunar, and other astral data that may have an impact on our climate. For instance, the Atmospheric Radiation Measurement (ARM) Archive at the Oak Ridge National Laboratory archives ground-based data about atmospheric radiation balance and cloud feedback processes that are critical to the understanding of global climate change. The points where the data were collected are much less relevant than the part of the atmosphere to which the data apply. In this context it is meaningless to classify data based on the latitude and longitude of the collecting instruments. Atmospheric data, sunspot data, and many other types of data are legitimate candidates for inclusion in earth observation data systems, yet there is often no way to indicate, for example, solar latitude and longitude. The differing nature of these types of data needs to be addressed in system design and system standards.
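
As a rough illustration of what such a design might allow, the Python sketch below attaches a reference frame to each location and makes point coordinates optional, so that a sunspot record can carry solar coordinates and an atmospheric record can state what it applies to without pretending that the instrument position is the location of interest. The field names and frame labels are assumptions invented for this example, not part of any existing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    frame: str                         # hypothetical vocabulary: "earth", "sun", "moon", ...
    latitude: Optional[float] = None   # degrees, in the named frame
    longitude: Optional[float] = None  # degrees, in the named frame
    applies_to: str = ""               # free-text description of what the data describe


def describe(loc):
    """Return a one-line, human-readable summary of a location record."""
    if loc.latitude is None or loc.longitude is None:
        coords = "no point coordinates"
    else:
        coords = f"{loc.latitude:.2f}, {loc.longitude:.2f}"
    suffix = f" ({loc.applies_to})" if loc.applies_to else ""
    return f"{loc.frame}: {coords}{suffix}"


# Hypothetical records.
sunspot = Location("sun", 15.0, -30.0, "sunspot group")
column = Location("earth", applies_to="atmospheric column")

sunspot_text = describe(sunspot)
column_text = describe(column)
```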

3.2.4 Definition of Geographic Areas

Ground-based data are often closely associated with geographic areas. If an area is the site of an intensive field campaign, scientists may refer to the data by the geographic area (e.g., Konza Prairie data or Amazon data). However, there is currently no commonly used standard for defining what those areas are. For instance, what are the accepted borders of the North Atlantic region? This would appear to be an obvious candidate for some standardization within the earth observation community.

3.2.5 Principal Investigators

Satellite data are generally anonymous data streams, while ground-based data are very much associated with the scientists who braved the elements to collect the data. For instance, the dataset named "Atmospheric Concentrations - Mauna Loa Observatory, Hawaii 1958-1986" is commonly known within the scientific community as "Keeling's data" after the principal investigator, C. D. Keeling. A related issue, which is currently being studied, is how to ensure that the scientist is given credit when his or her data are used. There are many accepted methods for citing documents, but the citation of data is less well defined. This issue is vitally important to the scientific community because it can affect funding. Assurances that proper credit will be received can significantly affect the willingness of scientists to place their data in data archives.

3.3 Data Streams vs. Intermittent Data

Satellite data tend to come in streams, while ground-based data are typically much more intermittent in nature. Sampling intervals may be very irregular, with intensive measurements during some time periods and few or none at other times. Unexpected circumstances may arise (e.g., "A moose kicked the instrument, so no data were recorded for several days"). Ground-based data that do stream may be significantly less well characterized than satellite data streams. The characteristics of a satellite data stream are defined when the sensor is developed, which can be years before launch. Conversely, it is often not known which instruments will be deployed in the field until the field campaign is in progress.
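
A system that expects intermittent data can report the gaps rather than assume a fixed sampling interval. The Python sketch below, with a hypothetical function name and invented timestamps, lists every pair of consecutive observations separated by more than a chosen threshold; the threshold itself is an assumption that would depend on the dataset.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval):
    """Return (before, after) pairs of consecutive observations more than max_interval apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_interval]


# Hypothetical hourly record interrupted for several days.
OBSERVATIONS = [
    datetime(1994, 6, 1, 12),
    datetime(1994, 6, 1, 13),
    datetime(1994, 6, 1, 14),
    datetime(1994, 6, 5, 9),
    datetime(1994, 6, 5, 10),
]

gaps = find_gaps(OBSERVATIONS, max_interval=timedelta(hours=6))
```

Each reported gap could then be annotated in the documentation with its cause, when one is known.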

3.4 Documentation

Documentation is vital for on-line data to be useful to the end user. Without accurate and thorough documentation the user has little more than pretty pictures or lists of numbers. Yet the needs and types of documentation within the EO community vary greatly. The documentation needed for an instrument on a satellite may be very different from that used to describe a tool (e.g., caliper) or method (e.g., analysis of the composition of burned leaves) used to collect ground-based data. (In EOSDIS parlance, this is called a noninstrument instrument.) For instance, the EOSDIS Sensor Guide Document contains an entry for Calibration, which is nonsensical when the "Sensor" is a Human Observer.

U.S. Federal agencies are required to use the Content Standards for Digital Geospatial Metadata as approved on June 8, 1994, to document geospatial data. This may be a useful guideline for non-Federal agencies as long as a certain flexibility in interpretation is retained. It should be noted that following these guidelines can, at times, result in costly documentation that dwarfs the data it is intended to describe.

4. Summary

Our experiences at the ORNL DAAC have convinced us that ground-based data can easily be incorporated into satellite-based systems with a bit of forethought during the system design process. Flexibility seems to be the key to making different types of data work together.

Ground-based data can play a significant role in our understanding of the world if they are used in conjunction with satellite data to explore and explain physical phenomena. We hope our paper has provided some insight into this integration process and that designers of future satellite-based systems will learn from our experience.

* Research sponsored by NASA under Interagency Agreement DOE No. 2013-F044-A1 under Lockheed Martin Energy Research Corp., contract DE-AC05-96OR22464 with the U.S. Department of Energy.

"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."