20 Minutes
JSON (JavaScript Object Notation) is a data format widely used by web data services. Although JavaScript is in the name, most programming languages can read and write JSON files. The format uses human-readable text, so it’s easy to see the data. Objects in JSON are built from key-value pairs: each key is separated from its value by a colon, and the whole object is surrounded by braces:
{ "key" : "value" }
An object can hold multiple key-value pairs, separated by commas:
{ "key1" : "value1",
"key2" : "value2" }
You can also have arrays of values, designated by brackets:
{ "affiliations": [
"California Institute of Technology, Pasadena, CA (US)",
"National Institute of Meteorological Sciences, Seogwipo-si (KR)"
] }
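As a minimal sketch of how these pieces fit together, the Python snippet below parses a small JSON document with the standard `json` module. The `name` and `year` entries are made up for illustration; only the `affiliations` array comes from the example above.

```python
import json

# Illustrative record: "name" and "year" are invented for this sketch;
# the "affiliations" array is the one shown in the lesson.
text = """
{
  "name": "Example dataset",
  "year": 2014,
  "affiliations": [
    "California Institute of Technology, Pasadena, CA (US)",
    "National Institute of Meteorological Sciences, Seogwipo-si (KR)"
  ]
}
"""

record = json.loads(text)            # a JSON object becomes a Python dict
affiliations = record["affiliations"]  # a JSON array becomes a Python list

print(record["name"])                # look a value up by its key
for affiliation in affiliations:     # walk through the array in order
    print(affiliation)

# Writing JSON back out: indent=2 makes the text easier for humans to read.
print(json.dumps(record, indent=2))
```

Note that keys are always double-quoted strings in valid JSON, which is why `json.loads` would reject an unquoted key.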
By combining these elements you can generate complex data structures that both
humans and machines can easily read. Now we’re going to take a look at the
results from a web API. We’re going to use curl, a Unix command for transferring
data from or to an Internet server without human interaction.
First, check that you are in your Desktop directory. Type:
$ pwd
This will print your location to the terminal. In most cases you can move to the desktop by typing
$ cd ~/Desktop
Now we’ll use curl to grab web content. Note that you can copy and paste these commands into your terminal window to save typing.
$ curl https://doi.org/10.14291/tccon.ggg2014.ascension01.R0/1149285
We see a short HTML file with the link the DOI redirects to. If we want the
actual content we have to add the -L option to enable curl to follow redirects.
Because curl is not a web harvesting tool, web pages collected this way may look
odd. We’re going to use curl to gather JSON files, so we have to specify that
this is what we want by adding a header with the -H option.
$ curl -LH "Accept:application/vnd.datacite.datacite+json" https://doi.org/10.14291/tccon.ggg2014.ascension01.R0/1149285
Most DOIs have a useful property: you can access an API just by using the normal DOI URL. In this case, the text in the header requests the DataCite standard JSON file. You can look at all the possible formats in the DataCite documentation.
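For readers who prefer a script to the command line, here is a rough Python equivalent of the curl command above, using only the standard library. The function names `build_request` and `fetch_metadata` are hypothetical helpers invented for this sketch. It assumes the DOI resolver still honors the `Accept` header shown in the lesson.

```python
import json
import urllib.request

DOI_URL = "https://doi.org/10.14291/tccon.ggg2014.ascension01.R0/1149285"
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def build_request(url, accept=DATACITE_JSON):
    """Prepare a GET request carrying an Accept header (like curl -H)."""
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_metadata(url):
    """Send the request and parse the JSON reply (hypothetical helper).

    urllib follows HTTP redirects by default, which plays the role of
    curl's -L option here.
    """
    with urllib.request.urlopen(build_request(url)) as response:
        return json.load(response)
```

Calling `fetch_metadata(DOI_URL)` needs network access and should return the metadata as a Python dictionary; the exact fields depend on what the service sends back.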
You can save this metadata to a file by typing
$ curl -LH "Accept:application/vnd.datacite.datacite+json" https://doi.org/10.14291/tccon.ggg2014.ascension01.R0/1149285 > datacite_metadata.json
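One way to confirm that the saved file really contains JSON is to load it and list its top-level keys. The sketch below does this in Python; `summarize_json_file` is a hypothetical helper name, and the code makes no assumption about which fields the metadata contains.

```python
import json
from pathlib import Path


def summarize_json_file(path):
    """Load a JSON file and return its top-level keys in sorted order."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)
```

For example, `summarize_json_file("datacite_metadata.json")` run from the directory where the file was saved would return the names of the top-level fields. If the download produced HTML instead of JSON, the call fails with a decoding error instead.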
Much more detail about DOIs and APIs is available in the AuthorCarpentry lesson.