Introduction to Raster Data
|
Raster data is pixelated data where each pixel is associated with a specific location.
Raster data always has an extent and a resolution.
The extent is the geographical area covered by a raster.
The resolution is the area covered by each pixel of a raster.
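As a minimal sketch of how extent and resolution relate, the pixel size follows from the extent and the number of rows and columns. The `Extent` class, the `pixel_size` helper, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Bounding box of a raster, in the units of its CRS (hypothetical helper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def pixel_size(extent, n_rows, n_cols):
    """Return (pixel_width, pixel_height) in the units of the extent."""
    width = (extent.xmax - extent.xmin) / n_cols
    height = (extent.ymax - extent.ymin) / n_rows
    return width, height


# Illustrative raster: 1000 x 500 map units, stored as 100 columns x 50 rows.
extent = Extent(xmin=0.0, ymin=0.0, xmax=1000.0, ymax=500.0)
res_x, res_y = pixel_size(extent, n_rows=50, n_cols=100)
pixel_area = res_x * res_y  # area covered by one pixel
```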
|
Introduction to Vector Data
|
Vector data structures represent specific features on the Earth’s surface along with attributes of those features.
Vector objects are either points, lines, or polygons.
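The three vector object types can be sketched with shapely geometries, each paired with an attribute. The coordinates and names below are invented for illustration, and shapely is assumed to be installed.

```python
from shapely.geometry import LineString, Point, Polygon

# One geometry of each type, with made-up coordinates.
station = Point(2.0, 1.0)
road = LineString([(0.0, 0.0), (3.0, 4.0)])
parcel = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])

# Attributes travel alongside the geometry of each feature.
features = [
    {"geometry": station, "name": "weather station"},
    {"geometry": road, "name": "access road"},
    {"geometry": parcel, "name": "field parcel"},
]

geometry_types = [f["geometry"].geom_type for f in features]
road_length = road.length                  # length in coordinate units
parcel_area = parcel.area                  # area in squared coordinate units
station_in_parcel = parcel.contains(station)
```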
|
Coordinate Reference Systems
|
All geospatial datasets (raster and vector) are associated with a specific coordinate reference system.
A coordinate reference system includes datum, projection, and additional parameters specific to the dataset.
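A short sketch with pyproj (assumed to be installed) shows the datum and projection parts of a CRS and how coordinates move between two systems. The choice of EPSG:4326 and EPSG:32631 is only an example.

```python
from pyproj import CRS, Transformer

wgs84 = CRS.from_epsg(4326)    # geographic CRS (longitude/latitude)
utm31n = CRS.from_epsg(32631)  # projected CRS (UTM zone 31N, metres)

datum_name = wgs84.datum.name
projection_method = utm31n.coordinate_operation.method_name

# always_xy=True fixes the axis order to (x, y), i.e. (longitude, latitude).
transformer = Transformer.from_crs(wgs84, utm31n, always_xy=True)

# 3 degrees east on the equator lies on the central meridian of zone 31N.
easting, northing = transformer.transform(3.0, 0.0)
```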
|
The Geospatial Landscape
|
Many software packages exist for working with geospatial data.
Command-line programs allow you to automate and reproduce your work.
JupyterLab provides a user-friendly interface for working with Python.
|
Access satellite imagery using Python
|
Accessing satellite images via the providers' APIs enables more reliable and scalable data retrieval.
STAC catalogs can be browsed and searched using the same tools and scripts.
rioxarray allows you to open and download remote raster files.
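STAC catalogs can share tools because a STAC API search takes the same parameters everywhere: collections, a bounding box, and a date range. The sketch below only builds such a request body with the standard library and sends nothing. `build_search_body` is a hypothetical helper and the collection name is a placeholder. Client libraries such as pystac_client accept similarly named search arguments.

```python
import json


def build_search_body(collections, bbox, start, end, limit=10):
    """Build the JSON body of a STAC API item search (hypothetical helper).

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are
    RFC 3339 timestamps joined into a closed interval.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have four values")
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


body = build_search_body(
    collections=["example-collection"],  # placeholder collection id
    bbox=(4.7, 52.2, 5.1, 52.5),
    start="2020-03-01T00:00:00Z",
    end="2020-03-31T23:59:59Z",
)
payload = json.dumps(body)  # what would be POSTed to a catalog's /search endpoint
```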
|
Read and visualize raster data
|
rioxarray and xarray are to multidimensional arrays what pandas is to tabular data.
rioxarray stores CRS information as a CRS object that can be converted to an EPSG code or PROJ4 string.
Missing raster data are filled with nodata values, which should be handled with care for statistics and visualization.
|
Vector data in Python
|
Vector dataset metadata include geometry type, CRS, and extent.
Load spatial objects into Python with the geopandas.read_file() function.
Spatial objects can be plotted directly with GeoDataFrame's .plot() method.
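The metadata listed above can be read from any GeoDataFrame. This sketch builds one in memory from made-up points rather than loading a file; with real data it would come from `geopandas.read_file()`.

```python
import geopandas as gpd
from shapely.geometry import Point

# Small in-memory dataset with invented names and coordinates.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(4.0, 52.0), Point(5.0, 53.0), Point(6.0, 51.0)],
    crs="EPSG:4326",
)

geometry_types = set(gdf.geom_type)          # geometry type(s) in the dataset
epsg_code = gdf.crs.to_epsg()                # CRS as an EPSG code
minx, miny, maxx, maxy = gdf.total_bounds    # extent of all features

# gdf.plot() would draw the features; it needs matplotlib, so it is not called here.
```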
|
Crop raster data with rioxarray and geopandas
|
Use clip_box in DataArray.rio to crop a raster with a bounding box.
Use clip in DataArray.rio to crop a raster with a given polygon.
Use buffer in geopandas to make a buffer polygon of a (multi)point or a polyline. This polygon can be used to crop data.
Use the reproject_match function in DataArray.rio to reproject a raster and crop it to the grid of another raster.
|
Raster Calculations in Python
|
|
Calculating Zonal Statistics on Rasters
|
Zones can be extracted by attribute columns of a vector dataset.
Zones can be rasterized using rasterio.features.rasterize.
Calculate zonal statistics with xrspatial.zonal_stats over the rasterized zones.
|
Parallel raster computations using Dask
|
The %%time Jupyter magic command can be used to measure how long a calculation takes.
Data ‘chunks’ are the unit of parallelization in raster calculations.
(rio)xarray can open raster files as chunked arrays.
The chunk shape and size can significantly affect the calculation performance.
Cloud-optimized GeoTIFFs have an internal structure that enables performant parallel read.
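Chunking can be sketched on an in-memory array, assuming dask is installed. The chunk sizes are arbitrary; with a raster file the chunks would normally be requested through the `chunks` argument of `rioxarray.open_rasterio`.

```python
import numpy as np
import xarray as xr

data = np.arange(16, dtype="float64").reshape(4, 4)
da = xr.DataArray(data, dims=("y", "x"))

# Split the array into 2 x 2 chunks; each chunk can be processed in parallel.
chunked = da.chunk({"y": 2, "x": 2})
chunk_layout = chunked.chunks

# Operations on a chunked array are lazy: this only builds a task graph.
lazy_mean = (chunked * 2).mean()

# compute() runs the graph, chunk by chunk.
result = float(lazy_mean.compute())
```

Very small chunks add scheduling overhead and very large ones limit parallelism, so the chunk shape is worth timing on the actual workload.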
|