This lesson is in the early stages of development (Alpha version)

Introduction to Geospatial Raster and Vector Data with Python

Key Points

Introduction to Raster Data
  • Raster data is pixelated data where each pixel is associated with a specific location.

  • Raster data always has an extent and a resolution.

  • The extent is the geographical area covered by a raster.

  • The resolution is the area covered by each pixel of a raster.
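
As a rough illustration of how extent and resolution relate, the sketch below derives both from a raster's origin, pixel size, and shape. The `RasterGrid` class and all numbers are hypothetical and chosen only for illustration; real raster files carry this information in their metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterGrid:
    """Hypothetical north-up raster grid, for illustration only."""

    west: float          # x coordinate of the left edge
    north: float         # y coordinate of the top edge
    pixel_width: float   # pixel size along x, in CRS units
    pixel_height: float  # pixel size along y, in CRS units
    n_cols: int
    n_rows: int

    def extent(self):
        """Return (min_x, min_y, max_x, max_y) covered by the raster."""
        east = self.west + self.n_cols * self.pixel_width
        south = self.north - self.n_rows * self.pixel_height
        return (self.west, south, east, self.north)

    def pixel_area(self):
        """Return the ground area covered by one pixel (the resolution)."""
        return self.pixel_width * self.pixel_height

    def pixel_center(self, row, col):
        """Return the (x, y) location associated with a pixel."""
        x = self.west + (col + 0.5) * self.pixel_width
        y = self.north - (row + 0.5) * self.pixel_height
        return (x, y)


grid = RasterGrid(
    west=500000.0, north=4000000.0,
    pixel_width=10.0, pixel_height=10.0,
    n_cols=100, n_rows=50,
)
print(grid.extent())
print(grid.pixel_area())
print(grid.pixel_center(0, 0))
```

With the same extent, halving the pixel size would quadruple the number of pixels, which is why resolution drives file size and processing time.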

Introduction to Vector Data
  • Vector data structures represent specific features on the Earth’s surface along with attributes of those features.

  • Vector objects are either points, lines, or polygons.
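
To make the three vector object types concrete, here is a minimal sketch using plain Python dictionaries laid out like GeoJSON features. The feature names, coordinates, and the helper functions `line_length` and `ring_area` are invented for illustration; in practice a library such as geopandas handles these structures.

```python
import math

# Each feature pairs a geometry with attributes ("properties") describing it.
features = [
    {
        "geometry": {"type": "Point", "coordinates": (2.0, 1.0)},
        "properties": {"name": "weather station"},
    },
    {
        "geometry": {"type": "LineString", "coordinates": [(0.0, 0.0), (3.0, 4.0)]},
        "properties": {"name": "footpath"},
    },
    {
        "geometry": {
            "type": "Polygon",
            # One closed ring: the first and last vertex are identical.
            "coordinates": [[(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]],
        },
        "properties": {"name": "field"},
    },
]


def line_length(coords):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def ring_area(ring):
    """Area of a closed ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


geometry_types = [f["geometry"]["type"] for f in features]
footpath_length = line_length(features[1]["geometry"]["coordinates"])
field_area = ring_area(features[2]["geometry"]["coordinates"][0])
print(geometry_types, footpath_length, field_area)
```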

Coordinate Reference Systems
  • All geospatial datasets (raster and vector) are associated with a specific coordinate reference system.

  • A coordinate reference system includes datum, projection, and additional parameters specific to the dataset.
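
The components of a coordinate reference system can be inspected programmatically. The sketch below assumes the pyproj package is installed and uses two EPSG codes (4326 and 32631) picked purely as examples; neither is taken from the lesson data.

```python
from pyproj import CRS

# EPSG:4326 is a geographic CRS (latitude/longitude on the WGS 84 datum).
wgs84 = CRS.from_epsg(4326)
# EPSG:32631 is a projected CRS (UTM zone 31N) built on the same datum.
utm31n = CRS.from_epsg(32631)

print(wgs84.is_geographic, utm31n.is_projected)
# The datum and the projection method are separate parts of the definition.
print(utm31n.datum.name)
print(utm31n.coordinate_operation.method_name)
# Axis units differ: degrees for the geographic CRS, metres for the projected one.
print([axis.unit_name for axis in wgs84.axis_info])
print([axis.unit_name for axis in utm31n.axis_info])
```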

The Geospatial Landscape
  • Many software packages exist for working with geospatial data.

  • Command-line programs allow you to automate and reproduce your work.

  • JupyterLab provides a user-friendly interface for working with Python.

Access satellite imagery using Python
  • Accessing satellite images via the providers’ API enables more reliable and scalable data retrieval.

  • STAC catalogs can be browsed and searched using the same tools and scripts.

  • rioxarray allows you to open and download remote raster files.

Read and visualize raster data
  • rioxarray and xarray are to multidimensional arrays what pandas is to tabular data.

  • rioxarray stores CRS information as a CRS object that can be converted to an EPSG code or PROJ4 string.

  • Missing raster data are filled with nodata values, which should be handled with care for statistics and visualization.
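
The effect of nodata values on statistics is easy to demonstrate. The following sketch uses NumPy only, with a made-up 2 x 3 band and an assumed nodata value of -9999; it is a simplified stand-in for the masking that rioxarray can perform when a raster is opened.

```python
import numpy as np

NODATA = -9999.0  # assumed fill value; check the raster's metadata for the real one

band = np.array([
    [120.0, 130.0, NODATA],
    [NODATA, 110.0, 140.0],
])

# Treating the fill value as real data badly distorts the statistic.
naive_mean = band.mean()

# Replace the fill value with NaN, then use NaN-aware functions.
masked = np.where(band == NODATA, np.nan, band)
valid_mean = np.nanmean(masked)

print(naive_mean, valid_mean)
```

The same masked array is also safer to plot, because the fill value no longer stretches the colour scale.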

Vector data in Python
  • Vector dataset metadata include geometry type, CRS, and extent.

  • Load spatial objects into Python with the geopandas.read_file() function.

  • Spatial objects can be plotted directly with GeoDataFrame’s .plot() method.

Crop raster data with rioxarray and geopandas
  • Use clip_box in DataArray.rio to crop a raster with a bounding box.

  • Use clip in DataArray.rio to crop a raster with a given polygon.

  • Use buffer in geopandas to create a buffer polygon around a (multi)point or a polyline. This polygon can then be used to crop data.

  • Use the reproject_match function in DataArray.rio to reproject and crop a raster so that it matches another raster.

Raster Calculations in Python
  • Python’s built-in math operators are fast and simple options for raster math.

  • numpy.digitize can be used to classify raster values into a small number of classes, producing a simpler map.
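
Both points can be shown on a small array. In the sketch below, the surface and terrain values and the class boundaries are made up; the subtraction stands for any element-wise raster calculation.

```python
import numpy as np

# Made-up surface and terrain elevations (metres) on the same 2 x 3 grid.
surface = np.array([[10.5, 13.0, 22.0],
                    [18.0, 35.0, 11.0]])
terrain = np.full_like(surface, 10.0)

# Raster math with a plain Python operator: element-wise difference.
height = surface - terrain

# Class boundaries (hypothetical): <2 m, 2-10 m, 10-20 m, >=20 m.
bins = [2.0, 10.0, 20.0]
classes = np.digitize(height, bins)

print(height)
print(classes)
```

Each output value is the index of the bin the input falls into, so four classes (0 to 3) replace the continuous heights.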

Calculating Zonal Statistics on Rasters
  • Zones can be extracted by attribute columns of a vector dataset.

  • Zones can be rasterized using rasterio.features.rasterize.

  • Calculate zonal statistics with xrspatial.zonal_stats over the rasterized zones.
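
The idea behind zonal statistics can be sketched with NumPy alone. This is a simplified stand-in for xrspatial.zonal_stats, not its implementation: the `zonal_mean` helper, the zone ids, and the values are hypothetical, and the zones array plays the role of the rasterized vector zones.

```python
import numpy as np

# Rasterized zones: each pixel holds a zone id; 0 means "no zone" (assumption).
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [0, 0, 2]])

# Values raster on the same grid as the zones.
values = np.array([[1.0, 2.0, 3.0],
                   [3.0, 5.0, 7.0],
                   [9.0, 9.0, 9.0]])


def zonal_mean(zones, values, background=0):
    """Return {zone id: mean of the values inside that zone}."""
    zone_ids = np.unique(zones)
    return {
        int(zone_id): float(values[zones == zone_id].mean())
        for zone_id in zone_ids
        if zone_id != background
    }


stats = zonal_mean(zones, values)
print(stats)
```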

Parallel raster computations using Dask
  • The %%time Jupyter magic command can be used to time calculations.

  • Data ‘chunks’ are the unit of parallelization in raster calculations.

  • (rio)xarray can open raster files as chunked arrays.

  • The chunk shape and size can significantly affect the calculation performance.

  • Cloud-optimized GeoTIFFs have an internal structure that enables performant parallel reads.
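
To illustrate chunks as the unit of parallelization, the sketch below builds a chunked array directly with dask.array, assuming dask and NumPy are installed. The array contents and the 250 x 250 chunk shape are arbitrary; with rioxarray, a comparable chunked array is obtained by passing a chunks argument when opening a raster file.

```python
import numpy as np
import dask.array as da

# In-memory stand-in for a single raster band.
data = np.arange(1_000_000, dtype="float64").reshape(1000, 1000)

# Split the array into 250 x 250 chunks; each chunk can be processed separately.
chunked = da.from_array(data, chunks=(250, 250))
print(chunked.numblocks)  # number of chunks along each dimension
print(chunked.chunksize)  # shape of a single chunk

# Operations are lazy: this only builds a task graph over the chunks.
lazy_mean = chunked.mean()

# compute() runs the per-chunk tasks (possibly in parallel) and combines them.
mean_value = float(lazy_mean.compute())
print(mean_value)
```

Very small chunks add scheduling overhead, while very large chunks limit parallelism and increase memory use, so the chunk shape is worth tuning for each workload.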
