Mastering Map Projection for GCOM-C L1 HDF5 Data: A Step-by-Step Guide

I Created a Python Program for Converting JAXA GCOM-C SGLI Level 1 HDF5 Files to GeoTIFF and Performing Map Projection.

Development Environment
Overview of GCOM-C (Shikisai)
Acquiring HDF5 Data
Overview of L1B Data
Explanation of the L1B format
briefly explanation of the program flow
github page is here

Development Environment

Since gdal-python may have environment dependencies, I will provide details of my development environment. Personally, I find it convenient to set up the environment using Docker.

python 3.7.4
h5py 2.9.0
hdf5 1.10.4
numpy 1.16.5
gdal 1.11.4

Overview of GCOM-C (Shikisai)

GCOM-C is equipped with an optical observation sensor called SGLI (Second Generation Global Imager). It obtains various physical quantities related to the Earth's environment from its observational data.

GCOM-C@EORC

Acquiring HDF5 Data

You can obtain GCOM-C data from JAXA G-Portal.

G-PortalTop

Overview of L1B Data

According to the GCOM-C Data User's Guide, the L1B data is derived from the observation data in L1A format, where radiance is computed and geometric corrections are applied to generate the product. The Product Format Guide for L1B (a document explaining the file structure) can be found below.

https://gportal.jaxa.jp/gpr/assets/mng_upload/GCOM-C/SGLI_Level1_Product_Format_Description_en.pdf

Explanation of the L1B format

I will provide a brief explanation of the information needed to create an image from the HDF5 files available for download from G-Portal. In simple terms, HDF5 is a binary file with a hierarchical structure. If you have knowledge of programming, you can think of it as a list containing labels and arrays.

For GCOM-C SGLI L1B data, the data is divided into the following major components:

Geometry_Data: This section contains information about the direction of the sun on the Earth's surface observed by GCOM-C, as well as latitude and longitude information of the observed Earth's surface.

Image_Data: In the case of L1B, this section contains information about land and ocean areas, quality information, and the actual image data.

Program Flow:

The program is generally structured with the following flow:

Extract image data from the HDF5 file.
Convert the image data to GeoTIFF format.
Attach Ground Control Points (GCP) to the image data.
Perform a projection transformation using gdal_translate based on the GCP to EPSG:4326.

briefly explanation of the program flow

lat_arr=np.array(f['Geometry_data']['Latitude'],dtype='float64')
lon_arr=np.array(f['Geometry_data']['Longitude'],dtype='float64')
#making GCP list
gcp_list=[]
for column in range(0,lat_arr.shape[0],20):
    for row in range(0,lat_arr.shape[1],20): 
        gcp=gdal.GCP(lon_arr[column][row],lat_arr[column][row],0,row*10,column*10)
        gcp_list.append(gcp)

The Geometry_Data is extracted from the HDF5 file, and the correspondence between the positions (row, column) on the image and the (lat, lon) coordinates is converted into a list and stored.

In the HDF5 file, latitude and longitude information is provided at intervals of 10 pixels. However, extracting all the points and providing them to gdalWarp for projection transformation can lead to errors. To address this, GCPs (Ground Control Points) are set at intervals of 200 pixels to ensure a smoother transformation process.

    # extract only RGB 3bands
    sgli_bands=['Lt_VN03','Lt_VN05','Lt_VN08']
    rgb_list=[]

    for sgli in (sgli_bands):
        print(sgli)
        band_image_arr=f['Image_data'][sgli]
        slope=f['Image_data'][sgli].attrs['Slope']
        offset=f['Image_data'][sgli].attrs['Offset']

        #numpy array
        band_image_arr=np.array(band_image_arr,dtype='uint16')
        #Set the most significant bit (MSB) to 0
        band_image_arr=np.where(band_image_arr>=32768,band_image_arr-32768,band_image_arr)
        #Set 15 th bit to 0
        band_image_arr=np.where(band_image_arr>=16384,band_image_arr-16384,band_image_arr)
        #Set missing to nan
        band_image_arr=np.where(band_image_arr==16383,np.nan,band_image_arr)
        # calculate top-of-atmosphere radiance 
        Lt=slope*band_image_arr+offset
        Lt=np.array(Lt,dtype='float64')
        rgb_list.append(Lt)

The key point is to read the Image_data array as uint16. If not specified, it is likely to be read as int16, which would result in values equal to or greater than 215 being interpreted as negative. By explicitly reading the array as uint16, we ensure that the data is interpreted correctly as unsigned 16-bit integers, ranging from 0 to 65535. This prevents any negative interpretation of values. After reading the array as uint16, the next step is to calculate the top-of-atmosphere radiance. This is achieved by multiplying the array by the slope and adding the offset. This transformation applies the calibration parameters to accurately convert the digital numbers to radiance values.

By performing the multiplication and addition, we obtain the top-of-atmosphere radiance, which is a crucial parameter for various applications and further analysis of the satellite data.

github page is here

github.com

たぶん動く...

多分GIS系の人。あくまで個人的見解であり、所属団体を代表するものではありません。