Introduction
In traditional data analysis, the analyst is usually asked to “cleanse” the data and bring it into a format suitable for analysis. This process assumes that the data already exists in consolidated form in a file. However, this is not always the case. The problem with analysing a static dataset is that it does not reflect any values added after it was downloaded: to include them, I have to download the data again and rerun the analysis in R or whatever tool I’m using. This workflow is not productive, and its inefficiency becomes most apparent when the data arrives as a constant flow. This is where an API proves useful, since it gives us the new values with relatively little effort. At the same time, an API makes it easy to integrate new data, so if we build a predictive model, it can be refreshed with the latest values and keep its accuracy at a satisfactory level. Finally, APIs are also useful in Shiny apps, where they can automate the analysis so that the information shown to visitors is reliable and up to date.
In general, there are many APIs that provide useful information. For more details about which free APIs are available, you can consult a relevant repository that lists them. In this article we will work with the Greek Government’s API, available through data.gov.gr.
Requesting an API Key
Like most APIs, this one requires us to register with the platform, which we can do on the relevant page by filling in our details. All fields of the form must be completed, as shown below.
Next, check your e-mail: you will receive a message containing a token, which is used to authenticate your API requests. Be sure to check the spam folder as well. If you lose the token and have already deleted the e-mail, you can reapply with the same e-mail address and the token will be resent to you.
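Rather than pasting the token directly into every script, one option is to keep it in an environment variable and read it at run time. The sketch below assumes a variable named `DATA_GOV_GR_TOKEN`; that name is hypothetical and any name would work.

```r
# Read the data.gov.gr token from an environment variable.
# DATA_GOV_GR_TOKEN is a hypothetical name chosen for this sketch.
get_token <- function(var = "DATA_GOV_GR_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  # Fail early with a clear message instead of sending an empty token
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  token
}
```

One way to set the variable is to add a line such as `DATA_GOV_GR_TOKEN=...` to `~/.Renviron` and restart R; the token then stays out of any code you share.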
Using the API
Once we receive the token, we need a way to get the data. This platform offers two options:
- Downloading the data from the website
- Using the API with R
The first option (and probably the least efficient) is to download the data directly from the data.gov.gr website. This is extremely simple, but it gives us a fixed snapshot of the data, so every time I want to update it I have to download it again.
A use case
Before finishing this article, I decided it would be helpful to provide an example. At the time of writing, there are 49 datasets to choose from. To demonstrate the usefulness of the API, I will select one that is updated fairly frequently: the dataset on passengers travelling by ship.
First, I set the base URL of my data. Since I am interested in sailing traffic, I will use the corresponding endpoint:
base = "https://data.gov.gr/api/v1/query/sailing_traffic"
As the documentation points out, we need to set the date range we are interested in. Note that you cannot retrieve a large range with a single API call. In this example, ship passenger traffic data starts in 2017 and runs up to the present day (2023). Let’s assume we don’t need the whole history and only want the data for the first four days of July 2023.
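Because a single call cannot cover a large range, a longer period has to be requested in pieces. Below is a minimal sketch that splits a range into consecutive windows, each of which can be passed as `date_from`/`date_to`. The function name `split_date_range` and the 30-day default window are assumptions made for illustration; the actual maximum range per call should be checked in the data.gov.gr documentation.

```r
# Hypothetical helper: split [from, to] into consecutive, non-overlapping
# windows of at most `days` days. The 30-day default is an assumption,
# not a documented limit of the API.
split_date_range <- function(from, to, days = 30) {
  from <- as.Date(from)
  to <- as.Date(to)
  if (to < from) stop("`to` must not be earlier than `from`")
  # Window start dates, `days` apart
  starts <- seq(from, to, by = days)
  # Each window ends one day before the next starts, capped at `to`
  ends <- pmin(starts + (days - 1), to)
  data.frame(date_from = format(starts), date_to = format(ends),
             stringsAsFactors = FALSE)
}

split_date_range("2017-01-01", "2023-07-04")
```

For the four-day example that follows, a single call is enough, so no splitting is needed there.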
date_from = "2023-07-01"
date_to = "2023-07-04"
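API_URL = paste0(base, "?date_from=", date_from, "&date_to=", date_to)
token_id = "<your token>"  # the token sent to you by data.gov.gr
# Send the token in the Authorization header as "Token <token>"
call = httr::GET(url = API_URL,
                 httr::add_headers(Authorization = paste0("Token ", token_id)))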
Where the code refers to token_id, use the token that data.gov.gr sent you by e-mail. After sending the GET request and waiting a moment, we receive a response object: a list with various pieces of information. What we are interested in is the data, which is stored in the content element of the list. However, it is not readable yet, because it is a raw vector of bytes that R prints in hexadecimal form.
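To see why the content looks like hexadecimal, the round trip below converts a small made-up JSON string (not real API output) into raw bytes and back. The raw vector is the same kind of object as `call$content`, and `rawToChar` turns it back into text.

```r
# A made-up JSON body, used only to illustrate raw bytes vs text
json_text <- '[{"passengers": 10}]'
body <- charToRaw(json_text)   # a raw vector, like call$content
print(body)                    # printed as hex bytes: 5b 7b 22 ...
text <- rawToChar(body)        # back to a character string
identical(text, json_text)     # TRUE
```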
data = base::rawToChar(call$content)
rawToChar converts these bytes into readable text (a JSON string), but I still need to convert the data into tabular form before I can begin my analysis.
data = jsonlite::fromJSON(data, flatten = TRUE)
Finally, using the jsonlite package, I get a data frame named data that includes, for each requested date, every destination along with its passenger and car counts.
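The steps above can be wrapped into one reusable function. This is a sketch that assumes the `httr` and `jsonlite` packages are installed; `fetch_sailing_traffic` is a hypothetical name. Passing the dates through httr's `query` argument (instead of pasting the URL together) and calling `stop_for_status` to fail on HTTP errors are choices made here, not part of the original code.

```r
# Hypothetical helper: fetch sailing traffic for one date window.
# Assumes the httr and jsonlite packages are installed.
fetch_sailing_traffic <- function(date_from, date_to, token) {
  resp <- httr::GET(
    "https://data.gov.gr/api/v1/query/sailing_traffic",
    httr::add_headers(Authorization = paste("Token", token)),
    query = list(date_from = date_from, date_to = date_to)
  )
  # Stop with an informative error on 4xx/5xx responses (e.g. a bad token)
  httr::stop_for_status(resp)
  # Decode the body as UTF-8 text, then parse the JSON into a data frame
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body, flatten = TRUE)
}
```

For example, `fetch_sailing_traffic("2023-07-01", "2023-07-04", token = token_id)` should return the same kind of data frame as the step-by-step code, and rerunning it later with newer dates picks up new values without any manual download.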
Acknowledgements
Image by Christopher Kuszajewski from Pixabay