Different Shapes of Data. APIs.

Overview

Teaching: 90 min
Questions
  • What the popular flat file formats used by data analysts?

  • How I get data from the internet?


Objectives
  • Familiarize popular data file format

  • Build data files in different formats

  • Understanding URLs used on the internet

  • Introducing APIs as data sources

  • Exercise REST APIs

Keywords

#CSV

#XML

#JSON

#API

#POSTMAN

#EUROSTAT API

#WORLDBANK API

Table of Content

Prerequisites for this chapter
Flat files as data store
CSV
XML
JSON
Anatomy of a URL
Getting data from internet
Using Eurostat API



Prerequisites for this chapter

  • Install: Postman

  • Read: Chapter: “When Databases Attack: A Guide for When to Stick to Files” from “Bad Data Handbook”



Flat files as data store

Advantages

Disadvantages



*.csv

CSV

Format

Column name1,Column name2 --> <Header_line>
Value1,Value2 --> <Record_1_line>
Value3,Value4 --> <Record_2_line>



Example

First Name,Last Name,Job Title
Kate,Middleton,Princess
Bill,Gates,Retired
Ronald,McDonald,Clown



*.xml

Format

<element>information</element>
<element attribute=“information”>information</element>
<element attribute=“information”/>
<element>
	<nested-element>information</nested-element>
</element>



Example - Variation 1

<persons>
  <person>
	    <first_name>Kate</first_name>
	    <last_name>Middleton</last_name>
	    <job_title>Princess</job_title>
  </person>
  <person>
	    <first_name>Bill</first_name>
	    <last_name>Gates</last_name>
	    <job_title>Retired</job_title>
  </person>
  <person>
	    <first_name>Ronald</first_name>
	    <last_name>McDonald</last_name>
	    <job_title>Clown</job_title>
  </person>
</persons>


Example - Variation 2

<persons>
  <person>
  	<name first=“Kate” last=“Middleton”/>
  	<job title=“Princess”/>
  </person>
  <person>
  	<name first=“Bill” last=“Gates”/>
  	<job title=“Retired”/>
  </person>
  <person>
  	<name first=“Ronald” last=“McDonald”/>
  	<job title=“Clown”/>
  </person>
</persons>


Example - Variation 3

<persons>
  <person first_name=“Kate” last_name=“Middleton” job_title=“Princess”/>
  <person first_name=“Bill” last_name=“Gates” job_title=“Retired”/>
  <person first_name=“Ronald” last_name=“McDonald” job_title=“Clown”/>
</persons>



*.json



Example - The previous Variation 3 in JSON

{
   "persons": [
       {
           "first name": "Kate",
           "last name": "Middleton",
           "job title": "Princess"
       },
       {
           "first name": "Bill",
           "last name": "Gates",
           "job title": "Retired"
       },
       {
           "first name": "Roland",
           "last name": "McDonalds",
           "job title": "Clown"
       }
   ]
}



JSON validation

Learning JSON takes time and practice. The learning curve is higher than in case of CSV or XML.

In this learning process a validation tool could be really handy: https://jsonformatter.curiousconcept.com/#. Use this to validate the JSON structures you are creating during the exercises.



Elements of JSON format: Key/Value data

In the heart of JSON is the key value data represented like this:

"key":"value"

key - is represented by a string enclosed in single or double quotation marks.

value - is typically represented a data type.



Elements of JSON format: Data types

A value can use the following data types:

Strings: Characters that are enclosed in single or double quotation marks. Example:

"name":"Mickey"

Number: A number could be integer or decimal, positive or negative. Example:

"year of birth":1989 

Booleans: The Boolean value could be either true or false without any quotation marks. Example:

"human":true 

Null: Here, null means nothing without any quotation marks. Example:

"known illness":null 



Elements of JSON format: The JSON Object

Objects are JSON elements that are enclosed in curly brackets. In objects, keys and values are separated by a colon ‘:’, pairs are separated by comma. This is the smallest JSON possible (one JSON Object with one key/value):

{"key":"value"}

Now, here is a JSON Object with two key/value:

{"key1":"value1","key2":"value2"}

Besides the data types above, a value can be a JSON Object as well (nesting):

{"key1":{"key2":"value2"}}



Elements of JSON format: The JSON Arrays

To complicate further, besides the data types and JSON Objects a value can be an array as well. Arrays are the lists that are represented by the square brackets, and the values have commas in between them. The value of an array can be the well known data types or JSON objects or another array. We can also mix these i.e., a single array can have strings, boolean, numbers or objects:

{"example1":[1, 2, 7.8, 5, 9, 10]}
{"example2":["red", "yellow", "green"]}
{"example3":[{"key1":30}, {"key2":40}]}
{"example4":[1,2,[56,57]]}
{"example5":[8, "hello", null, true, {"key1":30}]}



Elements of JSON format: The JSON Document

The root element (top of the tree hierarchy) in a .json file should be always one JSON object or a JSON Array. This said the following JSON document is invalid:

{"key1":"value1"},{"key2":"value2"}

but this one is valid:

[{"key1":"value1"},{"key2":"value2"}]




Exercise 1 XML to JSON exercise

Convert bookstore.xml to JSON.



Exercise 2 JSON design exercise

Create a JSON file which represents best the information we stored for “University” DB (Students, Programs, Courses etc)

Anatomy of a URL

URL


Getting data from internet

Web scraping

Data through APIs


Example - Eurostat API

Eurostat is offering a wide range of databases: https://ec.europa.eu/eurostat/web/main/data/database




Exercise 3 - Worldbank API

Using World Bank Data API, request “GDP per capita, PPP (current international $)” [indicator:NY.GDP.PCAP.PP.CD] for all 266 countries and regions.

Hints:

  • The information requested is from “World Development Indicators” dataset, which can be browsed here
  • More details in the API you can find here



Exercise 4 - Worldbank API Advanced

Using World Bank Data API, request “CO2 emissions (metric tons per capita)” between 2000 and 2002 for EU,Hungary, USA and China.