Chapter 2. GRASP software package

2.1. GRASP architecture

Figure 2.1, “The architecture of the GRASP software package” illustrates how the GRASP software package is organized. The architecture uses decoupled, independent modules: the configuration (settings) module, the scientific core, and the controller module, which handles communication between them. The dashed boxes show the application-specific modules (input/output drivers) that can optionally be added to the GRASP software package.

This modular design aims to minimize dependencies between the developed subsystems and to enable extensibility of the package. As a result, the two most valuable aspects realized in the GRASP architecture are:

  1. common interfaces are defined for the replaceable elements;
  2. the scientific core can evolve without modifications to the rest of the package.

Figure 2.1. The architecture of the GRASP software package


2.2. GRASP input and retrieved data

2.2.1. Measurements and retrieved parameters

The list of measurements and retrieved parameters can vary widely, depending on the selected application and the inversion strategy. An example of the measurement and retrieved-parameter configuration for the PARASOL space observation application can be found in Dubovik et al. (2011). For this reason, this document refers to the inverted observations and retrieved properties using the generic terms "input measurements/observations" and "retrieved parameters/characteristics". The details of applying the GRASP algorithm to specific measurements are expected to be clear to users from the comments included in the input files and in the text of the source files. Examples and relevant scientific discussion can be found in the referenced articles. Additionally, the software package is distributed together with some examples. Once the code is compiled and ready to be executed, users are encouraged to consult and run the examples.

2.2.2. GRASP inputs

The inputs of GRASP contain the following information:

  • measurements;
  • definition of unknowns (and the employed forward model);
  • retrieval setting and a priori constraints.

It should be noted that the inputs include not only the actual observations to be inverted, but also ancillary information that drives many aspects of the retrieval: for instance, the employed "exact forward model" and its assumptions, the variety of a priori constraints, the mathematical and logical procedures, etc. Although such information is generally expected in the input of any retrieval, GRASP stands out from most existing retrieval methods/codes by the flexibility of the retrieval and the versatility of its applicability.

The GRASP input is separated into two groups:

  • "measurements" – includes the actual values of the measurements and some information about their configuration;
  • "retrieval settings" – includes all information about the retrieval implementation (description of the retrieved characteristics, all settings for forward simulations and numerical inversions, etc.).

As shown in Figure 2.2, “Illustration of managing input data for GRASP software package.”, when the GRASP package is employed for operational processing, the observations are provided by the control unit from the data reader and the user defines the retrieval using a YAML configuration file. If the scientific core is run as a standalone code, the following input text files are used:

  • SDATA_INSTRUMENT.dat – the file containing the observation data;
  • INPUT_INSTRUMENT.txt – the file containing the retrieval configuration information.

The SDATA_INSTRUMENT.dat input file can also be used with the control unit. This is a useful option when applying GRASP to a new type of data for functionality and sensitivity tests.

Figure 2.2. Illustration of managing input data for GRASP software package.


2.2.3. GRASP input data structures

The control unit provides the measurements to the retrieval via the "SDATA measurement structure", prepared using a specific data reader. The structure is described in Section 4.2.1, “The SDATA format”. The configuration information is provided by the control unit from the YAML configuration files, described in Section 4.1.1, “Settings file”. The list of parameters and their explanations is provided in the section “HELP argument”. This information can be accessed directly by typing the "help" command.

2.2.4. Input text files for running the Scientific Core alone

The retrieval library keeps its earlier capability of running as a standalone application. In this case, the description of the measurements has to be provided in SDATA format (see Section 4.2.1, “The SDATA format”). In fact, this format also works when the entire system is executed, because it was ported from the scientific library to the entire system. This format is the easier one for the scientific community. When the entire system is executed, a settings file in YAML format defines the inversion strategy. When the scientific core is used as a standalone application, a settings file in ASCII format has to be provided. Each parameter in this file has a specific fixed location that must be respected. The current guide, however, does not describe this file structure. Please note that the scientific module can be isolated and tested independently, but this is intended only for development purposes. Readers interested in more details can consult the technical documentation. Note also that running the scientific library separately is not recommended for general users.

2.3. GRASP Scientific Core algorithm

2.3.1. Overall structure

The structure of the scientific GRASP code is shown in Figure 2.3, “General structure of the GRASP scientific algorithm (Fig.3 in Dubovik et al. 2011).”. The code and the retrieval are organized as an interaction of two main, functionally different modules: "Numerical Inversion" and "Forward model". The "Numerical Inversion" module drives the whole retrieval; it can therefore be considered the hierarchically main part of the core algorithm, the part that determines the retrieval data flow. The "Forward model" implements simulations of the inverted observations. The overall GRASP development concept emphasizes a generalized structure of the algorithm and the retrieval. This means that the algorithm should be versatile, i.e. applicable to a variety of remote sensing observations, and should allow some flexibility in choosing retrieval approaches: for instance, different assumptions for the overall retrieval, mathematical procedures, physical models for simulating observations, presentations of the obtained results, etc. Therefore, both the "Numerical Inversion" and the "Forward model" modules are adapted for implementing a variety of different procedures. At the same time, the data flow between these modules is kept well defined, and the overall code is highly tolerant to modifications inside each module. The information transmitted from the input "Observation definition" and "Inversion settings" modules determines the actual regime of the retrieval execution.

Figure 2.3. General structure of the GRASP scientific algorithm (Fig.3 in Dubovik et al. 2011).


The data flow exchange between the "Numerical Inversion" and the "Forward model" modules, as illustrated in Fig. 6, includes the information about the following values:

$\mathbf{f}^*$: vector of inverted measurements,

$\mathbf{f}(\mathbf{a}^p)$: vector of the measurement fit at the p-th iteration,

$\mathbf{a}^p$: vector of unknowns (retrieved parameters) at the p-th iteration.

The content of these vectors is described in Section 2.2, “GRASP input and retrieved data”.

The scientific GRASP code is written in Fortran 90. The technical documentation describes the structure of the data flows, the source file structure and the file locations.

2.3.2. Forward model

The "forward model" module implements simulations of the inverted remote sensing observations. The GRASP "forward model" is fairly universal, i.e. it can simulate a large variety of remote sensing observations (passive and active observations obtained from the ground and from space). It consists of several distinct blocks (Figure 2.1, “The architecture of the GRASP software package”): aerosol single scattering, surface reflectance, and radiative transfer calculations. These blocks are semi-independent in the sense that each block can be changed or entirely replaced with no effect or minimal effect on the other parts of the "forward model" routine. For example, the GRASP "forward model" allows the user to choose among physical approaches/models for simulating surface reflectance.

Figure 2.4. General organization of Forward modeling in the algorithm


Depending on the inverted data, only a part of the "forward model" may be used. The dashed lines in Figure 2.4, “General organization of Forward modeling in the algorithm” indicate that the code can use only the single scattering or surface reflectance calculations if accounting for multiple scattering is not needed, as is the case when measurements of spectral AOD, phase matrices or lidar data are inverted. Moreover, the design allows users to add other routines implementing similar simulations to the GRASP "forward model". For example, the subroutine implementing radiative transfer calculations can be replaced by a subroutine implementing another method of accounting for multiple scattering. In the future, several new modules are planned for inclusion in the "forward model", such as a module for accurate modeling of gaseous absorption, a module for radiative transfer calculations in the thermal infrared spectral range, etc.

In the GRASP code, the "Forward model" is driven by a single subroutine, "forward_model_pixel_PHMX", located in the file "forw_model.f90" (see the technical documentation). Aerosol single scattering properties are simulated assuming that the aerosol is a mixture of randomly oriented spheroids, using the DLS spheroid package (Dubovik et al. 2006). This package can be provided as an independent program with some descriptive documentation. The surface reflectance (BRDF and BPDF) can be calculated using a variety of subroutines representing different models (see the scientific description in Dubovik et al. 2011 and the technical description included directly in the GRASP code settings file). Radiative transfer calculations accounting for multiple scattering effects are performed on-line in GRASP with the Successive Order of Scattering method, using the program developed by M. Herman (the method is documented in the paper by Lenoble et al. 2007). The modules for aerosol single scattering and for BRDF and BPDF are easily extractable from the program and can be used with other radiative transfer codes if needed. In addition, some input parameters in the configuration file define the regimes of the radiative transfer calculations. Specifically, a number of trade-offs between accuracy and speed are available, including the possibility of changing the number of terms M used in the expansion of the phase matrix into Legendre polynomials, the number of terms N used in the Gaussian quadrature for zenithal integration, the number of numerical layers used in the vertical integration of atmospheric properties, etc.
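The accuracy/speed trade-off controlled by the number of quadrature terms N can be illustrated with a small, self-contained sketch. This is not GRASP code (the scientific core is written in Fortran 90); Python is used only for illustration. The Henyey-Greenstein phase function serves as a convenient stand-in because its integral over the cosine of the scattering angle is known to equal 2, and the function names gauss_legendre, henyey_greenstein and quadrature_error are hypothetical.

```python
import math

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        # Standard initial guess for the i-th root of the Legendre polynomial P_n
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            # Three-term recurrence: after the loop p1 = P_n(x), p0 = P_{n-1}(x)
            p0, p1 = 1.0, x
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)  # derivative P_n'(x)
            dx = p1 / dp
            x -= dx                                  # Newton update
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def henyey_greenstein(mu, g):
    """Stand-in phase function (assumption); its integral over mu in [-1, 1] is 2."""
    return (1.0 - g * g) / (1.0 + g * g - 2.0 * g * mu) ** 1.5

def quadrature_error(n, g=0.6):
    """Absolute error of the n-point rule for the stand-in phase function."""
    nodes, weights = gauss_legendre(n)
    approx = sum(w * henyey_greenstein(x, g) for x, w in zip(nodes, weights))
    return abs(approx - 2.0)

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, quadrature_error(n))
```

In this toy case the error falls quickly as N grows while the cost grows linearly with N, which is the kind of compromise the configuration parameters expose.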

2.3.3. Numerical Inversion

The "numerical inversion" is the main and most complex part of the code from both the functional and the logistical point of view; it governs the flow of the data. The description of the algorithm and the details of the approach are given in the scientific papers listed in Section 1. Here we provide only a short description, sufficient for understanding the structure and the organization of the GRASP Scientific Core.

The program includes two main "layers" (parts): single-pixel inversion and multi-pixel inversion.

Single-pixel inversion

The structure of the single-pixel inversion is illustrated in Figure 2.5, “Organization of GRASP Numerical Inversion: Single-Pixel Scenario”. It includes the following main operations:

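  1. Modeling the observations $\mathbf{f}(\mathbf{a}^p)$ for the p-th approximation of the state vector (for p = 0, the initial guess is used);

  2. Calculating the matrices of first derivatives (Jacobians) $\mathbf{K}_p$;

  3. Forming the p-th Normal System $\mathbf{A}_p \, \Delta\mathbf{a}^p = \nabla\Psi_p$, where $\mathbf{A}_p$ is the Fisher matrix, $\Psi(\mathbf{a}^p)$ is the residual, and $\nabla\Psi_p$ is the gradient of $\Psi(\mathbf{a}^p)$;

  4. Solving the Normal System to determine $\Delta\mathbf{a}^p$, and correcting the solution approximation, $\mathbf{a}^{p+1} = \mathbf{a}^p - t_p \, \Delta\mathbf{a}^p$, with the multiplier $t_p$ chosen so that $\Psi_p - \Psi_{p+1} > 0$;

  5. Repeating steps 1 - 4 while $\Delta\Psi = \Psi_p - \Psi_{p+1}$ changes significantly, i.e. until $\Delta\Psi / \Psi_p < \varepsilon$.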
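The iteration listed above can be sketched on a toy problem. The following is only an illustration under stated assumptions, not the GRASP implementation: the forward model is a two-parameter exponential decay, the Fisher matrix is approximated as the product of the transposed Jacobian and the Jacobian (unit measurement weights, no a priori terms), and the step multiplier is found by simple halving. All function names are hypothetical.

```python
import math

def forward_model(a, xs):
    """Toy forward model f(a): amplitude a[0], decay rate a[1] (assumption)."""
    return [a[0] * math.exp(-a[1] * x) for x in xs]

def jacobian(a, xs):
    """Analytical first derivatives K[j][i] = d f_j / d a_i."""
    return [[math.exp(-a[1] * x), -a[0] * x * math.exp(-a[1] * x)] for x in xs]

def residual(f_model, f_meas):
    """Psi = 1/2 * sum of squared misfits (unit weights)."""
    return 0.5 * sum((fm - fo) ** 2 for fm, fo in zip(f_model, f_meas))

def solve_2x2(A, g):
    """Solve the 2x2 normal system A * da = g by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(g[0] * A[1][1] - A[0][1] * g[1]) / det,
            (A[0][0] * g[1] - A[1][0] * g[0]) / det]

def invert_single_pixel(f_meas, xs, a_init, eps=1e-8, max_iter=50):
    a = list(a_init)
    psi = residual(forward_model(a, xs), f_meas)
    for _ in range(max_iter):
        if psi == 0.0:
            break
        f_p = forward_model(a, xs)                       # step 1
        K = jacobian(a, xs)                              # step 2
        A = [[sum(row[i] * row[k] for row in K) for k in range(2)]
             for i in range(2)]                          # step 3: A = K^T K
        grad = [sum(row[i] * (fm - fo) for row, fm, fo in zip(K, f_p, f_meas))
                for i in range(2)]                       # gradient of Psi
        da = solve_2x2(A, grad)                          # step 4
        t = 1.0
        while t > 1e-10:                                 # halve t until Psi decreases
            a_new = [ai - t * di for ai, di in zip(a, da)]
            psi_new = residual(forward_model(a_new, xs), f_meas)
            if psi_new < psi:
                break
            t *= 0.5
        else:
            break                                        # no decreasing step found
        relative_change = (psi - psi_new) / psi
        a, psi = a_new, psi_new
        if relative_change < eps:                        # step 5
            break
    return a, psi

if __name__ == "__main__":
    xs = [0.2 * k for k in range(11)]
    f_star = forward_model([2.0, 0.5], xs)               # synthetic "measurements"
    print(invert_single_pixel(f_star, xs, [1.0, 1.0]))
```

With noise-free synthetic measurements the retrieved vector approaches the parameters used to generate them; with real data, the stopping threshold and the weighting would need to reflect the measurement errors.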

Figure 2.5. Organization of GRASP Numerical Inversion: Single-Pixel Scenario

Multi-pixel inversion

The multi-pixel retrieval approach proposed by Dubovik et al. (2011) is illustrated in Figure 2.6, “Organization of GRASP Numerical Inversion: Single-Pixel Scenario”. This is a new and very promising retrieval concept in which a large group of "pixels" (a pixel being an instantaneous set of satellite data over one location) is inverted simultaneously. This approach allows a significant enhancement of the retrieval of atmospheric properties from remote sensing imagery by using additional a priori information on the "correlation" between characteristics in different pixels of the inverted group. In addition, this principle allows different sets of coordinated observations to be combined, even when they are not perfectly coincident and co-located (see Dubovik et al. 2014).

Figure 2.6. Organization of GRASP Numerical Inversion: Single-Pixel Scenario


The multi-pixel scenario retrieval was implemented in the code with the idea of achieving maximum benefits from the similarities in the mathematical and logistical operations between the single and multi-pixel retrievals. As a result, the multi-pixel retrieval, which is a more complex procedure compared to conventional single-pixel retrieval, was realized by implementing only limited modifications of the program. This approach practically does not increase calculation time (per pixel) and does not change (complicate) the code organization.

The structure of the multi-pixel inversion is illustrated in Figure 2.6, “Organization of GRASP Numerical Inversion: Single-Pixel Scenario” for a segment, i.e. a group of N inverted pixels. It includes the following operations in addition to those performed in the single-pixel retrieval scenario:

  1. A loop implementing steps 1 - 3 (of the single-pixel procedure) for N pixels and forming N single-pixel Normal Systems $\mathbf{A}_{i,p} \, \Delta\mathbf{a}_i^p = \nabla\Psi_i^p$;

  2. Forming a single Normal System for the tile of N pixels by arranging the N single-pixel Normal Systems into a sparse block-diagonal matrix structure and adding the matrix $\mathbf{\Omega}_{\mathrm{inter}}$ defined using a priori inter-pixel smoothness constraints;

  3. Forming the p-th Normal System:

    $\mathbf{A}_p \, \Delta\mathbf{a}^p = \nabla\Psi_p$,

    where $\mathbf{A}_p$ is the Fisher matrix, $\Psi(\mathbf{a}^p)$ is the residual, and $\nabla\Psi_p$ is the gradient of $\Psi(\mathbf{a}^p)$;

  4. Solving the Normal System for the tile of N pixels to determine $\Delta\mathbf{a}^p$, and correcting the solution approximation, $\mathbf{a}^{p+1} = \mathbf{a}^p - t_p \, \Delta\mathbf{a}^p$, with the multiplier $t_p$ chosen so that $\Psi_p - \Psi_{p+1} > 0$;

  5. Repeating steps 1 - 4 while the change of the residual $\Delta\Psi = \Psi_p - \Psi_{p+1}$ is significant, i.e. until $\Delta\Psi / \Psi_p < \varepsilon$.
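The assembly described in the steps above can be pictured with a minimal sketch. It rests on several assumptions that are not taken from the GRASP code: each pixel has a linear forward model with the same Jacobian K, the inter-pixel matrix is a first-difference smoothness penalty between neighbouring pixels scaled by a hypothetical weight gamma, its contribution to the gradient is taken as the matrix applied to the current state, and a dense Gaussian elimination stands in for the sparse solver. All names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for a sparse solver)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def single_pixel_system(K, f_meas, a):
    """Normal system of one pixel: A_i = K^T K, gradient = K^T (K a - f*)."""
    n = len(a)
    f_mod = [sum(row[k] * a[k] for k in range(n)) for row in K]
    A = [[sum(row[i] * row[k] for row in K) for k in range(n)] for i in range(n)]
    g = [sum(K[j][i] * (f_mod[j] - f_meas[j]) for j in range(len(K))) for i in range(n)]
    return A, g

def smoothness_matrix(n_pix, n_par, gamma):
    """Omega_inter = gamma * D^T D, D = first differences between neighbouring pixels."""
    size = n_pix * n_par
    omega = [[0.0] * size for _ in range(size)]
    for i in range(n_pix - 1):
        for k in range(n_par):
            p, q = i * n_par + k, (i + 1) * n_par + k
            omega[p][p] += gamma
            omega[q][q] += gamma
            omega[p][q] -= gamma
            omega[q][p] -= gamma
    return omega

def multi_pixel_step(K, f_all, a_all, gamma):
    """One iteration for all pixels: block-diagonal system plus Omega_inter."""
    n_pix, n_par = len(a_all), len(a_all[0])
    size = n_pix * n_par
    A = [[0.0] * size for _ in range(size)]
    g = [0.0] * size
    for i in range(n_pix):                       # loop over pixels (step 1)
        A_i, g_i = single_pixel_system(K, f_all[i], a_all[i])
        for r in range(n_par):                   # place block i on the diagonal (step 2)
            g[i * n_par + r] = g_i[r]
            for c in range(n_par):
                A[i * n_par + r][i * n_par + c] = A_i[r][c]
    omega = smoothness_matrix(n_pix, n_par, gamma)
    a_flat = [v for a in a_all for v in a]
    for r in range(size):                        # add the inter-pixel constraints
        g[r] += sum(omega[r][c] * a_flat[c] for c in range(size))
        for c in range(size):
            A[r][c] += omega[r][c]
    da = solve(A, g)                             # step 4, with multiplier t = 1
    a_new = [a_flat[r] - da[r] for r in range(size)]
    return [a_new[i * n_par:(i + 1) * n_par] for i in range(n_pix)]
```

With gamma set to zero the pixels decouple and each one is fitted independently; increasing gamma pulls the parameters of neighbouring pixels towards each other, which is the effect the inter-pixel constraints are meant to provide.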

Figure 2.7. Organization of GRASP Numerical Inversion: Multi-Pixel Scenario


2.4. GRASP Control Unit

The control unit is a set of "service" programs that brings the application of the scientific GRASP algorithm to the operational level, primarily in the context of processing data from satellite missions such as PARASOL. It also provides a number of user-friendly features for applying GRASP to observations, and it significantly reduces and simplifies the effort involved in developing new GRASP applications.

The control unit addresses a number of practical aspects:

  • The original GRASP scientific core was designed as a standalone application for processing a limited amount of observations, in both spatial and temporal extent. Integrating this original program into the operational processing of remote sensing observations, such as global satellite observations, requires significant effort in refactoring the scientific module and adapting it to an operational data production environment.

  • The data preparation for the GRASP multi-pixel retrieval in processing satellite images is more complex than for classic operational retrievals, since the level-1 inputs needed for one level-2 output may span from a few days to several weeks of observations. Correspondingly, the system must be able to load a significant volume of data without exhausting the available memory. Also, a compromise between the spatial and temporal extent of the multi-pixel retrieval has to be found in order to satisfy the available memory constraints and the processing time requirements.

  • Although the performance of the GRASP algorithm is under constant improvement, GRASP is a more complex and generally slower code than most conventional retrieval approaches. Therefore, the possibility of running retrievals simultaneously is desirable in order to benefit from parallel processing of the observations.

  • The level of input data preprocessing for the GRASP multi-pixel retrieval is significantly higher, because the inverted tiles of (satellite) observations are composed of observations acquired at different times, which should all characterize the same grid of geolocations. Therefore, some kind of regridding is generally required in addition to common data preprocessing (application of a cloud mask, gas corrections, etc.).

  • GRASP is a versatile algorithm that has the potential to perform retrievals from diverse remote sensing observations and their combinations, with sensors ranging from ground-based photometers, radiometers and lidars to imagers onboard satellites. Therefore, adaptation of the GRASP algorithm to diverse observations should always be foreseen. One of the main objectives of the control unit is to separate the operations of the scientific algorithm from those of the data preparation.
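The regridding mentioned in the list above can be pictured with a minimal sketch. It assumes the simplest possible scheme, which is not necessarily the one used by the GRASP input drivers: each observation, given as a hypothetical (latitude, longitude, value) tuple, is assigned to the regular grid cell that contains it, and the values falling in the same cell are averaged.

```python
import math
from collections import defaultdict

def regrid(observations, resolution):
    """Average (lat, lon, value) observations onto a regular grid.

    Cells are identified by integer indices (floor(lat / resolution),
    floor(lon / resolution)); this binning scheme is an assumption made
    for illustration only.
    """
    cells = defaultdict(list)
    for lat, lon, value in observations:
        key = (math.floor(lat / resolution), math.floor(lon / resolution))
        cells[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

if __name__ == "__main__":
    # Two acquisitions over nearly the same location plus one further north
    obs = [(10.1, 20.1, 1.0), (10.4, 20.3, 3.0), (11.2, 20.1, 5.0)]
    print(regrid(obs, 1.0))
```

Once observations from different acquisition times share the same cell indices, they can be stacked into the multi-temporal structure that the multi-pixel retrieval expects.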

The control unit manages all the interactions of the system with the processing environment. It loads the configuration settings. It is also responsible for receiving events from the system and for providing the control commands for the application (the connection with the user interface). The control unit consists of the following units (see Figure 2.1, “The architecture of the GRASP software package”):

2.4.1. Configuration manager

One of the first responsibilities of the controller is to load the configuration settings for the processing (production settings and scientific settings, such as initial guesses, the number of parameters for the forward model, etc.). The configuration manager makes it possible to handle all the settings, both the production and the scientific ones, in a single, uniform way. In the development of the control unit this approach was considered strategic, even though the production settings and the scientific settings are of an entirely different nature, since they do not intervene at the same levels.

The configuration management is a key part of the developed system because the user experience depends on it. This module defines the user interface, i.e. how to work with the code to achieve results. In addition, developing this module required understanding, documenting and organizing all the possible options (different behaviors) of the complex retrieval code. Moreover, some refactoring of the scientific package was done to realize the configuration management concept.

The configuration manager controls the behavior of the control unit, as well as that of the peripheral elements such as the scientific input settings. Therefore, changes in the interfaces of the subsystems (especially of the scientific package as it evolves) can be accommodated.

2.4.2. Controller Module

The GRASP executable, which results from compiling the controller module, contains the main routine of the system. As illustrated in Figure 2.8, “Illustration of the data processing by the Controller”, the controller governs the data processing:

  • gets orders and other events from the runtime interface
  • performs actions in response to the events

Figure 2.8. Illustration of the data processing by the Controller


The controller is responsible for making all the parts of the control unit work together. While it receives events from the runtime interface, it takes actions and delegates most of its work to other modules of the control unit, such as the input and output drivers, and certainly to the scientific package.

There are two main workflows implemented in the controller. In the sequential version, the controller retrieves a tile (a block of data that can be decomposed into many segments, a segment being the minimum unit of instrument data treated inside the retrieval) and works segment by segment, sequentially. In the parallel version, the controller can retrieve many segments at the same time using MPI technology. The parallelization allows the controller to send jobs to different cores in the system, reducing the total processing time.
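The two workflows can be sketched as follows. This is an illustration only: the controller itself relies on MPI, whereas the sketch uses a thread pool from the Python standard library so that it stays self-contained, and retrieve_segment is a hypothetical placeholder for the call to the scientific core.

```python
from concurrent.futures import ThreadPoolExecutor

def split_tile(tile, segment_size):
    """Decompose a tile (here simply a list of values) into segments."""
    return [tile[i:i + segment_size] for i in range(0, len(tile), segment_size)]

def retrieve_segment(segment):
    """Hypothetical placeholder for the retrieval of one segment."""
    return sum(segment) / len(segment)

def process_sequential(tile, segment_size):
    """Sequential workflow: segments are retrieved one after another."""
    return [retrieve_segment(s) for s in split_tile(tile, segment_size)]

def process_parallel(tile, segment_size, workers=4):
    """Parallel workflow: segments are dispatched to a pool of workers.

    A thread pool stands in for MPI here (assumption for illustration).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(retrieve_segment, split_tile(tile, segment_size)))

if __name__ == "__main__":
    tile = [float(v) for v in range(12)]
    print(process_sequential(tile, 4))
    print(process_parallel(tile, 4))
```

Because each segment is retrieved independently, both workflows return the same results in the same order; only the elapsed time differs.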

2.4.3. Abstract input and output drivers

These sub-systems are responsible for preparing the input data for the scientific module and for gathering the output data in a unified "abstract" format. This procedure does not depend on the particular application and is managed in a unified manner by the GRASP scientific core. The creation of these sub-systems within the control unit ensures the versatile and "generalized" character of the GRASP algorithm, allowing the system to be extended for specific purposes.

2.4.4. Concrete input and output data drivers

These sub-systems can be considered peripheral, since they can be replaced in the context of each specific application. The concrete input data drivers are responsible for loading the satellite (e.g. PARASOL, MERIS) or ground-based (e.g. photometer or lidar) data. The rest of the system should never communicate directly with the loading driver, but always with the abstract input bridge. The GRASP multi-pixel retrieval scenario uses multi-temporal data organized in so-called segments, while the native formats of the input data may take the form of many independent files (orbits for a given period, ancillary data, etc.). Therefore, it is the role of the concrete input data drivers to obtain the data in the native format, gather them into a single, easy-to-use tile object, and present them as if they came from a single data source. The input drivers may also include some preprocessing of the data, such as atmospheric gaseous correction for satellite data, application of calibration, etc.

The concrete output data drivers are responsible for storing the scientific retrieval output products. They can be provided for several output formats, depending on the needs of the users and of the applications, and also on the requirements of the data centers: HDF, NetCDF, GIS databases, etc. The design of the control unit ensures that the rest of the system does not interact directly with a concrete output driver, but with an abstract output bridge that delegates the action of writing to a concrete driver. This is because not all storage formats are suited to all data sources and all applications. In addition, the control unit allows a straightforward replacement of the storage module by another one if the GRASP retrieval is adapted to a new application or if an instrument is changed in the developed application.
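The bridge arrangement described above can be pictured with a short sketch. It is not taken from the GRASP sources: the class names are hypothetical, and JSON and CSV writers from the Python standard library stand in for formats such as HDF or NetCDF, which would require external libraries.

```python
from abc import ABC, abstractmethod
import csv
import io
import json

class OutputDriver(ABC):
    """Common interface that every concrete output driver implements."""

    @abstractmethod
    def write(self, results):
        """Serialize a list of per-pixel result dictionaries to text."""

class JsonOutputDriver(OutputDriver):
    def write(self, results):
        return json.dumps(results, sort_keys=True)

class CsvOutputDriver(OutputDriver):
    def write(self, results):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=sorted(results[0]))
        writer.writeheader()
        writer.writerows(results)
        return buffer.getvalue()

class OutputBridge:
    """Abstract bridge: the rest of the system talks only to this class."""

    def __init__(self):
        self._drivers = {}

    def register(self, name, driver):
        self._drivers[name] = driver

    def write(self, name, results):
        # Delegate the actual writing to the selected concrete driver
        return self._drivers[name].write(results)

if __name__ == "__main__":
    bridge = OutputBridge()
    bridge.register("json", JsonOutputDriver())
    bridge.register("csv", CsvOutputDriver())
    print(bridge.write("csv", [{"pixel": 1, "aod": 0.12}]))
```

Supporting a new storage format then amounts to writing one more driver class and registering it; the code that calls the bridge does not change.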

2.4.5. GRASP file organization

The following list shows the organization of the GRASP source files. The code is organized into folders; each folder name is followed by an explanation of its content.

  • build: Compiled executable. It appears after the code compilation.

  • doc: Technical and user documentation of the software package. The text of this documentation is stored in raw (source) format in this folder.

  • examples: Some examples of retrieving instrument data

  • libs: Bridges to certain libraries (facades)

  • src: Source code of the GRASP software

    • controller: contains the source files used by the controller main program, responsible for organizing the calls to all the modules of the system.

    • global: contains the source files of some functionalities that can be used by different submodules of GRASP. This code is GRASP dependent, thus can not be located in the "libs" folder, but is general enough to be used by the entire system.

    • input: contains the source files used by the abstract input driver, the module responsible for handling input data and injecting them into the scientific unit functions (the retrieval algorithm). This module can be extended by adding a concrete input driver that can include two additional kinds of functions: i) specific instrument drivers, which are the functions called for loading data from a specific instrument; and ii) "transformers", which are the functions called after reading the input and before calling the scientific unit, in order to transform the input data for the GRASP scientific core algorithm.

    • output: contains the source files used by the abstract output driver, the module responsible for handling the retrieval output. For example, the module creates a tile output based on single-segment outputs. This module can be extended by a concrete output driver that may include different functions: 1) output segment functions, which receive the output from a segment (provided by the abstract output driver) and can use it for extracting and printing target information; 2) output tile functions, which are called at the end of the process (once the retrieval information has been received by the abstract output driver) in order to print the output for the entire tile; 3) output current functions, which can be called after processing a segment (once the retrieval information has been received by the abstract output driver). Unlike the segment functions, they receive the tile output information, i.e. the retrieval results accumulated for the entire tile, as an argument. This approach can be used for printing the current status of the retrieval for a tile before the complete retrieval process has finished.

    • retrieval: source files used by the scientific unit

      • constants_set: different sets of constants that define the main array sizes used in the code. The use of these constants makes it possible to optimize the memory used by GRASP for different applications.

      • inversion: Fortran functions related to the numerical inversion.

      • forward_model: Fortran files for the computation of modeled measurements (forward model).

      • interfaces: routines that provide data preparation, validation and exchange between different submodules of the scientific module

      • external_interfaces: definition of the connections of the code with some external software (mainly the SuperLU solver).

      • utilities: general routines used in many different submodules of the scientific code such as print routines.

      • internal_files: kernels used for computing particle single scattering properties by the forward model part of the code

    • settings: contains the source files used by the configuration unit that defines the settings for the calls to all the modules of the system.

2.4.6. External Libraries used by GRASP code

The GRASP software package allows the performance of both the scientific retrieval and the control unit to be optimized by using external standard libraries. These libraries are not distributed as part of the GRASP Open Code, but they can improve the performance of the code. They are openly available on the Internet and can be downloaded by users directly, free of charge. Figure 2.9, “Structure of the utilization of public standard libraries in the GRASP code” shows the utilization of the standard software libraries in GRASP.

Figure 2.9. Structure of the utilization of public standard libraries in the GRASP code

(Green indicates the optional libraries, violet indicates the optional but highly desirable libraries, and the reddish color indicates the libraries that are currently mandatory for the control unit but will be separated from the code before GRASP open is released. The license of each library is indicated in parentheses.)


The following main libraries are used by GRASP:

MPI library: the control unit has the optional feature of parallelizing segment processing using MPI technology. Various MPI libraries can be used, each with its own license. Thus, the user can choose the MPI implementation, bearing in mind that implementations may differ in performance and license.

lib csv: this library helps to parse databases prepared in CSV (Comma-Separated Values) format, which is used in some concrete input drivers. This library is not needed if a specific compilation is used (depending on the concrete data and driver used).

grib api: this library is needed to read the GRIB format, which is used for reading climatology information in concrete satellite data drivers.

HDF4: this library is used to read/write files in HDF4 format. It is used in some optional GRASP output functions. With a specific compilation (removing these output functions), the code can be run without this library.

solver: a software package optimized for solving linear systems. Such a solver can significantly improve the performance of GRASP in certain situations, since the GRASP scientific core performs the retrieval by sequentially solving a number of linear systems. For example, when the multi-pixel retrieval is performed, the GRASP scientific core solves linear systems that can be of very large dimension and have a pronounced sparse structure. The code was adapted and tested for using libraries such as SuperLU, ViennaCL, and MUMPS. The solver is applicable to the solution of any linear system in the GRASP internal routines.

GLib: a general-purpose utility library for programming in C (not to be confused with glibc, the GNU C library). Specifically, it is used by the YAML settings library (whose source code is part of the GRASP settings module), which reads YAML files (using the lib yaml dependency) and translates them into C structures. In that process, GLib is used to define internal tree structures.

lib yaml: a low-level library for parsing files in YAML format.