This lesson is in the early stages of development (Alpha version)

Introduction to Vector Data

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • What are the main attributes of vector data?

Objectives
  • Describe the strengths and weaknesses of storing data in vector format.

  • Describe the three types of vectors and identify types of data that would be stored in each.

About Vector Data

Vector data structures represent specific features on the Earth’s surface, and assign attributes to those features. Vectors are composed of discrete geometric locations (x, y values) known as vertices that define the shape of the spatial object. The organization of the vertices determines the type of vector that we are working with: point, line or polygon.
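As a minimal sketch of how the organization of vertices relates to vector type, the Python snippet below stores each object as a list of (x, y) pairs. The coordinates and the `vector_type` helper are hypothetical and exist only for illustration. Real formats record the geometry type explicitly rather than inferring it from the vertices, so treat the classification rule as a simplifying assumption.

```python
# Hypothetical coordinates, for illustration only. Each vertex is an (x, y) pair.
point = [(2.0, 3.0)]                                  # one vertex
line = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]           # connected vertices, left open
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]  # ring that closes on itself


def vector_type(vertices):
    """Classify a vertex list as 'point', 'line', or 'polygon'.

    Assumption for this sketch: a ring whose first and last vertex match
    (and which has at least three distinct vertices) is treated as a polygon.
    """
    if not vertices:
        raise ValueError("a vector object needs at least one vertex")
    if len(vertices) == 1:
        return "point"
    if len(vertices) >= 4 and vertices[0] == vertices[-1]:
        return "polygon"
    return "line"


for name, obj in [("point", point), ("line", line), ("polygon", polygon)]:
    print(name, "->", vector_type(obj))
```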

Types of vector objects

Image Source: National Ecological Observatory Network (NEON)

Data Tip

Sometimes boundary layers, such as states and countries, are stored as lines rather than polygons. However, a boundary represented as a line does not form a closed object with a defined area that can be filled.
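To make the difference concrete, here is a small sketch that computes the area of a closed ring with the shoelace formula and refuses to do so for an open line. The function name `ring_area` and the rectangle coordinates are hypothetical, chosen only for this example.

```python
def ring_area(vertices):
    """Return the area enclosed by a closed ring (shoelace formula).

    Raises ValueError when the vertices do not close on themselves,
    because an open line has no defined area.
    """
    if len(vertices) < 4 or vertices[0] != vertices[-1]:
        raise ValueError("not a closed ring: no defined area")
    total = 0.0
    # Sum the cross products of each consecutive pair of vertices.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# The same hypothetical boundary stored two ways.
boundary_as_polygon = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]  # closed
boundary_as_line = [(0, 0), (4, 0), (4, 3), (0, 3)]             # open

print(ring_area(boundary_as_polygon))  # area of a 4 x 3 rectangle
try:
    ring_area(boundary_as_line)
except ValueError as err:
    print("line:", err)
```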

Identify Vector Types

The plot below includes examples of two of the three types of vector objects. Use the definitions above to identify which features are represented by which vector type.

Vector Type Examples

Solution

State boundaries are polygons. The Fisher Tower location is a point. There are no line features shown.

Vector data has some important advantages:

The downsides of vector data include:

Vector datasets are in use in many industries besides geospatial fields. For instance, computer graphics are largely vector-based, although the data structures in use tend to join points using arcs and complex curves rather than straight lines. Computer-aided design (CAD) is also vector-based. The difference is that geospatial datasets are accompanied by information tying their features to real-world locations.

Vector Data Format for this Workshop

Like raster data, vector data can come in many different formats. For this workshop, we will use the Shapefile format. A Shapefile consists of multiple files in the same directory, of which the .shp, .shx, and .dbf files are mandatory. Other non-mandatory but very important files are the .prj and .shp.xml files.
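Because a Shapefile is really a group of files, it can be useful to check that the mandatory pieces are all present before trying to open it. The sketch below uses only Python's standard library. The helper names and the `data/roads.shp` path are hypothetical, and the check assumes lowercase file extensions.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".shp.xml")


def shapefile_parts(shp_path):
    """Report which component files exist alongside a .shp file."""
    base = Path(shp_path).with_suffix("")  # data/roads.shp -> data/roads
    return {ext: Path(str(base) + ext).exists() for ext in MANDATORY + OPTIONAL}


def missing_mandatory(shp_path):
    """List the mandatory extensions that are absent for this Shapefile."""
    parts = shapefile_parts(shp_path)
    return [ext for ext in MANDATORY if not parts[ext]]


if __name__ == "__main__":
    # Hypothetical path; replace it with a Shapefile on your own machine.
    print(missing_mandatory("data/roads.shp"))
```

An empty list from `missing_mandatory` means all three mandatory files were found in the same directory.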

Together, the Shapefile includes the following information:

Because the structures of points, lines, and polygons are different, each individual shapefile can only contain one vector type (all points, all lines, or all polygons). You will not find a mixture of point, line, and polygon objects in a single shapefile.
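The one-type-per-file rule can be pictured with a small container that rejects features of the wrong type. This is a sketch only: `SingleTypeLayer` is a hypothetical class written for this example, not part of any Shapefile library, and the feature names and coordinates are made up.

```python
class SingleTypeLayer:
    """Hypothetical container that, like a shapefile, holds one vector type."""

    ALLOWED = ("point", "line", "polygon")

    def __init__(self, geometry_type):
        if geometry_type not in self.ALLOWED:
            raise ValueError(f"unknown geometry type: {geometry_type!r}")
        self.geometry_type = geometry_type
        self.features = []

    def add(self, geometry_type, vertices, **attributes):
        """Store a feature, refusing any type other than the layer's own."""
        if geometry_type != self.geometry_type:
            raise TypeError(
                f"layer holds {self.geometry_type} features, got {geometry_type}"
            )
        self.features.append({"vertices": list(vertices), "attributes": attributes})


towers = SingleTypeLayer("point")
towers.add("point", [(2.0, 3.0)], name="Example tower")  # accepted
try:
    towers.add("line", [(0.0, 0.0), (1.0, 1.0)], name="Example road")
except TypeError as err:
    print(err)
```

Each stored feature carries both its vertices and its attributes, mirroring how vector data pairs geometry with descriptive information.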

More Resources on Shapefiles

More about shapefiles can be found on Wikipedia. Shapefiles are often publicly available from government services, such as the US Census Bureau or Australia’s Data.gov.au website.

Why not both?

Very few formats can contain both raster and vector data - in fact, most are even more restrictive than that. Vector datasets are usually locked to one geometry type, e.g. points only. Raster datasets can usually only encode one data type; for example, you can’t have a multiband GeoTIFF where one band is integer data and another is floating-point. There are sound reasons for this: format standards are easier to define and maintain, and so is metadata. The effects of particular data manipulations are also more predictable if you are confident that all of your input data has the same characteristics.

Key Points

  • Vector data structures represent specific features on the Earth’s surface along with attributes of those features.

  • Vector objects are either points, lines, or polygons.