Forecasting Unemployment in Greece

Make a prediction about the future value of Greece’s unemployment using ARIMA model.

R
Time Series
Author

stesiam

Published

October 22, 2022

Introduction

Background

Time series

Methodology

Short Answer

If you are in a hurry, I predicted that unemployment on Greece is expected to further reduce. It will range between 10% - 13% in February, 2023.

Prerequisites

Import Libraries

For this analysis we will need standard libraries for importing and processing my data, such as readr (Wickham, Hester, & Bryan, 2024) and dplyr (Wickham, François, Henry, Müller, & Vaughan, 2023). The kableExtra (Zhu, 2024) package was used to print the results in table format, while the flextable (Gohel & Skintzos, 2024) package was used to print the results of the Dickey-Fuller and KPSS tests.

Then, due to the nature of the data (time series) it was deemed necessary to use relevant libraries such as lubridate(Spinu, Grolemund, & Wickham, 2024), tseries(Trapletti & Hornik, 2024) & forecast(R. Hyndman et al., 2025) packages.

Finally, the ggplot2 (R-ggplot2?) package was used to create some visualizations, as well as an auxiliary package, ggtext (R-ggtext?), for further formatting those.

Show the code
# General purpose R libraries
library(dplyr)
library(readr)
library(kableExtra)
library(flextable)

# Graphs
library(highcharter)

# Time Series 

library(lubridate)
library(tseries)
library(forecast)
Show the code
acf_plot_function = function(data, min = -1, max = 1){
  ts_data <- data
  N <- length(ts_data)

# Compute confidence limit
conf_limit <- qnorm(0.975) / sqrt(N)

# Use forecast::Acf (biased estimator, recommended for ARIMA)
acf_obj <- Acf(ts_data, plot = FALSE, lag.max = 20)

# Extract lags and values
# Note: Acf() returns acf_obj$acf as a matrix; acf_obj$lag as a matrix
lags_acf <- as.numeric(acf_obj$lag[-1])             # skip lag 0
acf_values <- as.numeric(acf_obj$acf[-1])           # skip lag 0

# Build the highcharter ACF plot
acf_chart <- highchart() %>%
  hc_chart(type = "column") %>%
  hc_title(text = "Autocorrelation Plot") %>%
  hc_xAxis(categories = lags_acf, title = list(text = "Lag")) %>%
  hc_yAxis(min = min, max = max,
    title = list(text = "ACF"),
    plotLines = list(
      list(value = 0, color = "#000000", width = 3),
      list(value = conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1),
      list(value = -conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1)
    )
  ) %>%
  hc_legend(enabled = FALSE) %>%
  hc_add_series(name = "ACF", data = round(acf_values, digits = 4))

# Show chart
acf_chart
}

pacf_plot_function <- function(data, min = -1, max = 1) {
  ts_data <- data
  N <- length(ts_data)
  
  # Compute confidence limit (same formula as for ACF)
  conf_limit <- qnorm(0.975) / sqrt(N)
  
  # Compute PACF using forecast::Pacf (biased estimator, recommended for ARIMA)
  pacf_obj <- forecast::Pacf(ts_data, plot = FALSE, lag.max = 20)
  
  # Extract lags and PACF values
  lags_pacf <- as.numeric(pacf_obj$lag)  # already excludes lag 0
  pacf_values <- as.numeric(pacf_obj$acf)
  
  # Build the highcharter PACF plot
  pacf_chart <- highcharter::highchart() %>%
    highcharter::hc_chart(type = "column") %>%
    highcharter::hc_title(text = "Partial Autocorrelation Plot") %>%
    highcharter::hc_xAxis(categories = lags_pacf, title = list(text = "Lag")) %>%
    highcharter::hc_yAxis(
      min = min, max = max,
      title = list(text = "PACF"),
      plotLines = list(
        list(value = 0, color = "#000000", width = 3),
        list(value = conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1),
        list(value = -conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1)
      )
    ) %>%
    highcharter::hc_legend(enabled = FALSE) %>%
    highcharter::hc_add_series(name = "PACF", data = round(pacf_values, digits = 4))
  
  # Show chart
  pacf_chart
}

Import dataset

After loading the libraries I am able to use the commands of the readr package to import my data. My data is in .csv format, so I’ll use the read_csv() command (Wickham et al., 2024) to import them.

Additionally, I choose not to include EA-19 values (as I investigate Greece’s unemployment).

Show the code
unemployment <- read_csv("data/unemployment.csv") %>%
  select(LOCATION, TIME, Value) %>% filter(LOCATION != "EA19")

Preview Dataset

Show the code
head(unemployment, 10) %>%
  kbl(., 
    align = 'c',
    booktabs = T,
    centering = T,
    valign = T) %>%
  kable_styling()
Table 1: Preview Dataset (first 6 rows)
LOCATION TIME Value
GRC 1998-04 10.9
GRC 1998-05 11.0
GRC 1998-06 10.9
GRC 1998-07 11.0
GRC 1998-08 11.2
GRC 1998-09 11.1
GRC 1998-10 11.1
GRC 1998-11 11.4
GRC 1998-12 11.4
GRC 1999-01 11.4

Dataset Structure

Our dataset is consisted by 3 variables (columns). More specifically, concerning my variables, are as follows :

Variable Property Description
LOCATION qualitative
(nominal)
Specific country’s statistics
TIME qualitative
(ordinal)
Month of the reported data
Value quantitative
(continuous)
Unemployment at the given Time and Country

Thus, my sample has 3 variables, of which 2 are qualitative and 1 is quantitative property.

Time Series Preprocessing

The TIME variable needs to be a Date variable which is not fulfilled on our case.

Show the code
sapply(unemployment, class) %>% 
  kbl() %>% kable_styling(full_width = F, position = "center")
x
LOCATION character
TIME character
Value numeric

So, above I see that I have dates in a format “YYYY-MM” (Year - Month) and they are considered as characters. With the help of lubridate package I will convert my time series on a Date format.

Show the code
unemployment$TIME <- lubridate::ym(unemployment$TIME)

And let’s check again :

Show the code
sapply(unemployment, class) %>% kbl() %>% kable_styling(full_width = F, position = "center")
x
LOCATION character
TIME Date
Value numeric

And now I got the Date format. I am able to continue my analysis.

Missing Values

Show the code
sum(is.na(unemployment))

Good news! On this dataset there are 0 missing values, in total.

In case the dataset had missing values, I would first look at which variables those were.In a second phase, it might be necessary to replace the missing values.

Descriptive Statistics

The unemployment data for Greece refer to the period from April 1998 το August 2022. Regarding Europe we have data from January 2000 to August 2022.

The last 20 years have seen particularly large changes mainly in Greece, due to the domestic economic crisis. There is a noticeable change between September 2010 (10.1%) and September 2013 (28.1 %), in terms of unemployment in Greece. For the record, below you can find a table with the months that presented the highest unemployment in Greece. As you will notice the five highest values were observed in 2013.

Show the code
unemployment %>% dplyr::filter(LOCATION == "GRC") %>% select(TIME, Value) %>% as.data.frame() %>%
  arrange(Value) %>% tail(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")
TIME Value
289 2013-11-01 27.9
290 2013-04-01 28.0
291 2013-05-01 28.0
292 2013-07-01 28.1
293 2013-09-01 28.1

Accordingly, the highest unemployment rates observed at the European level are the following:

Show the code
unemployment %>% dplyr::filter(LOCATION == "EU27_2020") %>% select(TIME, Value)  %>%
  as.data.frame() %>% arrange(Value) %>% tail(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")
TIME Value
268 2013-06-01 11.6
269 2013-01-01 11.7
270 2013-02-01 11.7
271 2013-03-01 11.7
272 2013-04-01 11.7

The lowest unemployment ever recorded in Greece was in the period of 2008:

Show the code
unemployment %>% dplyr::filter(LOCATION == "GRC") %>% select(TIME, Value) %>% as.data.frame() %>%
  arrange(Value) %>% head(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")
TIME Value
2008-05-01 7.4
2008-06-01 7.6
2008-07-01 7.6
2008-10-01 7.6
2008-01-01 7.7

On the other hand, at the moment in Europe we are at the lowest levels of the last 20 years, with unemployment hovering around 6%.

Show the code
unemployment %>% dplyr::filter(LOCATION == "EU27_2020") %>% select(TIME, Value) %>%
  as.data.frame() %>% arrange(Value) %>% head(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")
TIME Value
2022-07-01 6.0
2022-08-01 6.0
2022-04-01 6.1
2022-05-01 6.1
2022-06-01 6.1

Finally, it is evident that in the case of Greece the changes are more intense than in Europe.

All of the above can be summarized by the following graph:

Show the code
g = unemployment %>%
  mutate(Year = year(TIME)) %>%
  group_by(LOCATION, Year) %>%
  summarise(mean_unemployment = mean(Value))
`summarise()` has grouped output by 'LOCATION'. You can override using the
`.groups` argument.
Show the code
highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = g,
        type = "line",
        hcaes(x = Year, y = mean_unemployment, group = LOCATION)
    ) %>%
  hc_plotOptions(
  series = list(
    dataLabels = list(
      enabled = TRUE,
      formatter = JS("function() {
        if (this.point.index === this.series.data.length - 1) {
          return this.y.toFixed(2) + ' %';
        }
        return null;
      }")
    )
  )
)

Examine trend and seasonality

In time series analysis, it’s important to distinguish where the variability in the final form of the series originates from. Time series consist of three main components: trend, seasonality, and randomness. The trend refers to whether the series shows a specific direction (upward/downward). Seasonality refers to recurring patterns at regular intervals.

y_t = S_t + T_t + E_t Όπου:

  • y_t δηλώνουν τα δεδομένα που έχουμε διαθέσιμα,
  • S_t
  • T_t
  • E_t

Similarly, the multiplicative model is defined:

y_t = S_t \cdot T_t \cdot E_t

where the components that make up the time series are multiplied instead of added.

Examine stationarity

Definition of stationarity

An important concept in studying time series is stationarity. A time series is called stationary (“Applied Time Series Analysis,” n.d.) if:

  1. E(X_t):\text{constant}
  2. Var(X_t): \text{constant}
  3. Cov(X_t, X_s): \text{constant}

Examine Stationarity Graphically

It is apparent that there is a big variation on the values of unemployment. This time series is not stationary and the differencing is justified.

Show the code
highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = g,
        type = "line",
        hcaes(x = Year, y = mean_unemployment, group = LOCATION)
    ) %>%
  hc_plotOptions(
  series = list(
    dataLabels = list(
      enabled = TRUE,
      formatter = JS("function() {
        if (this.point.index === this.series.data.length - 1) {
          return this.y.toFixed(2) + ' %';
        }
        return null;
      }")
    )
  )
)

Here we can see a big improvement in comparison with original data. I have some concerns about points close to 150 (mildly upwards trend) and 250 (outlier).

Show the code
grc_unemployment = unemployment %>% dplyr::filter(LOCATION == "GRC")
grc_unemployment_diff1 <- diff(grc_unemployment$Value, differences = 1)

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = grc_unemployment_diff1
    )

Given the concerns of above, I made also a second difference plot. It seems to solve the problem on points close to 150.

Show the code
grc_unemployment_diff2<- diff(grc_unemployment$Value, differences = 2)

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = grc_unemployment_diff2
    )

Examine Stationarity with Statistical tests

The graphical interpretation of stationarity can be beneficial for a quick assessment on topic of stationarity. However it can be considered a subjective metric, which leads on a non consistent decision (someone may consider the second figure as stationary and some others not.

Thankfully, there are some statistical tests which can help us on our decisions. Some commonly used are :

  • Augmented Dickey-Fuller (ADF) test
  • Phillips- Perron (PP) test
  • Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test
  • Zivot-Andrews (ZA) test

But an other problem arises. We are not in a fancy IDE with menus and checkboxes. We use R and its packages. There are many packages that have already write functions to study stationarity with the aforementioned tests. Some of them are:

  • tseries
  • urca
  • CADFtest

We should carefully evaluate our choices, as these packages advertise similar functionality but differ significantly in the scope of their functions and, more importantly, in the level of flexibility they offer for handling time series characteristics. In short, the tseries package is the one I’m most familiar with, but it is also the most limited in terms of configurable parameters. Although many tutorials use this package, I found that I couldn’t set the number of lags for tests or specify characteristics like trends when applicable. On the other hand, the urca package addresses these limitations by allowing users to specify both the trend component and the number of lags in the respective tests. One minor drawback of urca, however, is that it only provides critical values, not p-values.

Summary

Show the code
summary_stationarity_results <- data.frame(
                             "Type" = c("levels", "Diff(GRC)", "Diff2(GRC)"),
                             "ADF test" = c("Non Stationary", "Stationary", "Stationary"),
                              "PP test" = c("Non Stationary", "Stationary", "Stationary"),
                             "KPSS test" =c("Non Stationary", "Non Stationary", "Stationary")
)


summary_stationarity_results  %>% kbl() %>% kable_styling() 
Type ADF.test PP.test KPSS.test
levels Non Stationary Non Stationary Non Stationary
Diff(GRC) Stationary Stationary Non Stationary
Diff2(GRC) Stationary Stationary Stationary

ADF test

One of the most

H_0 : \text{The time series has a unit root} \\ H_1 : \text{The time series has not a unit root} Selecting the lags

Show the code
Loading required package: MASS

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select
Loading required package: strucchange
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: urca
Loading required package: lmtest

Attaching package: 'lmtest'
The following object is masked _by_ '.GlobalEnv':

    unemployment
The following object is masked from 'package:highcharter':

    unemployment
Show the code
G = VARselect(grc_unemployment$Value, lag.max = 20, type = "const")
Show the code
get_adf_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    adf_orig = adf.test(variable)
    adf_diff1 = adf.test(diff(variable, differences = 1))
    adf_diff2 = adf.test(diff(variable, differences = 2))
    
    # Extract results
    adf_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "Dickey-Fuller" = c(
            adf_orig$statistic[["Dickey-Fuller"]],
            adf_diff1$statistic[["Dickey-Fuller"]],
            adf_diff2$statistic[["Dickey-Fuller"]]
        ),
        "Lag Order" = c(
            adf_orig$parameter[["Lag order"]],
            adf_diff1$parameter[["Lag order"]],
            adf_diff2$parameter[["Lag order"]]
        ),
        "p-value" = c(
            adf_orig$p.value,
            adf_diff1$p.value,
            adf_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(adf_table)
}
Show the code
get_adf_results(grc_unemployment$Value)
Augmented Dickey-Fuller test results
Transformation Dickey.Fuller Lag.Order p.value
Level -0.93123 6 0.94846
1st Difference -3.50738 6 0.04240
2nd Difference -9.48682 6 0.01000

Συνεπώς, είναι προφανές από τα αποτελέσματα του στατιστικού ελέγχου Dickey Fuller ότι η χρονοσειρά μου δεν είναι στάσιμη. Θα πρέπει να εφαρμόσω τον έλεγχο Dickey-Fuller στις δοαφορές.

PP test

H_0 : \text{The time series has a unit root} \\ H_1 : \text{The time series has not a unit root}

Show the code
get_pp_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    pp_orig = pp.test(variable)
    pp_diff1 = pp.test(diff(variable, differences = 1))
    pp_diff2 = pp.test(diff(variable, differences = 2))
    
    # Extract results
    pp_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "Phillips-Perron" = c(
            pp_orig$statistic[["Dickey-Fuller Z(alpha)"]],
            pp_diff1$statistic[["Dickey-Fuller Z(alpha)"]],
            pp_diff2$statistic[["Dickey-Fuller Z(alpha)"]]
        ),
        "Lag Order" = c(
            pp_orig[["parameter"]][["Truncation lag parameter"]],
            pp_diff1[["parameter"]][["Truncation lag parameter"]],
            pp_diff2[["parameter"]][["Truncation lag parameter"]]
        ),
        "p-value" = c(
            pp_orig$p.value,
            pp_diff1$p.value,
            pp_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(pp_table)
}
Show the code
get_pp_results(grc_unemployment$Value)
Phillips-Perron test results
Transformation Phillips.Perron Lag.Order p.value
Level 0.19401 5 0.99
1st Difference -279.81456 5 0.01
2nd Difference -337.13053 5 0.01

KPSS test

H_0 : \text{The time series has not a unit root} \\ H_1 : \text{Alternatively}

Show the code
get_kpss_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    kpss_orig = kpss.test(variable)
    kpss_diff1 = kpss.test(diff(variable, differences = 1))
    kpss_diff2 = kpss.test(diff(variable, differences = 2))
    
    # Extract results
    pp_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "KPSS" = c(
            kpss_orig[["statistic"]][["KPSS Level"]],
            kpss_diff1[["statistic"]][["KPSS Level"]],
            kpss_diff2[["statistic"]][["KPSS Level"]]
        ),
        "Lag Order" = c(
            kpss_orig[["parameter"]][["Truncation lag parameter"]],
            kpss_diff1[["parameter"]][["Truncation lag parameter"]],
            kpss_diff2[["parameter"]][["Truncation lag parameter"]]
        ),
        "p-value" = c(
            kpss_orig$p.value,
            kpss_diff1$p.value,
            kpss_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(pp_table)
}
Show the code
get_kpss_results(grc_unemployment$Value)
KPSS test results
Transformation KPSS Lag.Order p.value
Level 2.53719 5 0.01000
1st Difference 0.59510 5 0.02308
2nd Difference 0.01426 5 0.10000

Identify Model

Show the code
acf_plot_function(unemployment$Value, min=-0.2, max=1)
Show the code
pacf_plot_function(unemployment$Value, min=-0.2, max=1)
Show the code
acf_plot_function(grc_unemployment_diff1, min=-0.1, max = 0.6)
Show the code
pacf_plot_function(grc_unemployment_diff1, min=-0.2, max=0.5)
Show the code
acf_plot_function(grc_unemployment_diff2, min=-0.5, max = 0.5)
Show the code
pacf_plot_function(grc_unemployment_diff2, min=-0.5, max = 0.5)

Build Time Series Model

Show the code
auto_model <- auto.arima(grc_unemployment$Value, trace = T,seasonal = TRUE)

 Fitting models using approximations to speed things up...

 ARIMA(2,2,2)                    : 302.1399
 ARIMA(0,2,0)                    : 473.9345
 ARIMA(1,2,0)                    : 405.2316
 ARIMA(0,2,1)                    : 297.5442
 ARIMA(1,2,1)                    : 301.0253
 ARIMA(0,2,2)                    : 299.5392
 ARIMA(1,2,2)                    : 302.154

 Now re-fitting the best model(s) without approximations...

 ARIMA(0,2,1)                    : 296.71

 Best model: ARIMA(0,2,1)                    
Show the code
auto_model 
Series: grc_unemployment$Value 
ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.9075
s.e.   0.0229

sigma^2 = 0.1597:  log likelihood = -146.33
AIC=296.67   AICc=296.71   BIC=304.02
Show the code
arimaModel_1=arima(grc_unemployment$Value, order=c(0,1,2))
arimaModel_1

Call:
arima(x = grc_unemployment$Value, order = c(0, 1, 2))

Coefficients:
         ma1     ma2
      0.1445  0.0760
s.e.  0.0588  0.0571

sigma^2 estimated as 0.1714:  log likelihood = -156.83,  aic = 319.67
Show the code
arimaModel_2=arima(grc_unemployment$Value, order=c(1,1,2))
arimaModel_2

Call:
arima(x = grc_unemployment$Value, order = c(1, 1, 2))

Coefficients:
         ar1      ma1     ma2
      0.9710  -0.9048  0.0168
s.e.  0.0185   0.0666  0.0629

sigma^2 estimated as 0.1569:  log likelihood = -144.19,  aic = 296.37
Show the code
arimaModel_3=arima(grc_unemployment$Value, order=c(9,2,1))
arimaModel_3

Call:
arima(x = grc_unemployment$Value, order = c(9, 2, 1))

Coefficients:
          ar1      ar2      ar3      ar4      ar5      ar6      ar7      ar8
      -0.7879  -0.7083  -0.6715  -0.6944  -0.5922  -0.3838  -0.3160  -0.4457
s.e.   0.1713   0.1539   0.1493   0.1434   0.1379   0.1172   0.0883   0.0718
          ar9      ma1
      -0.2704  -0.1456
s.e.   0.0690   0.1767

sigma^2 estimated as 0.1388:  log likelihood = -126.92,  aic = 275.83

Compare Models

After building some ARIMA models, I should decide which is the best one to make my estimations. One metric to evaluate those models is AIC (Akaike Information Criterion). The lower the value of AIC, the better my model.

Show the code
accuracy_table <- data.frame(
                             "Name of Model" = c("Auto Model", "ARIMA Candidate #1", "ARIMA Candidate #2", "ARIMA Candidate #3"),
                             Model = c("ARIMA(0,2,1)", "ARIMA(0,1,2)", "ARIMA(1,1,2)", "ARIMA(9,2,1)"),
                             AIC =c(auto_model$aic, arimaModel_1$aic, arimaModel_2$aic, arimaModel_3$aic)
)


accuracy_table %>% kbl() %>% kable_styling()
Name.of.Model Model AIC
Auto Model ARIMA(0,2,1) 296.6684
ARIMA Candidate #1 ARIMA(0,1,2) 319.6669
ARIMA Candidate #2 ARIMA(1,1,2) 296.3705
ARIMA Candidate #3 ARIMA(9,2,1) 275.8314

So, the best model is the Auto Model (ARIMA(9,2,1)), which has the lowest AIC value.

Checking best models

Forecast Future Unemployment

Previously, I identify which is the best model. Now, I will use this model in order to predict unemployment for the next 6 months. It should be recalled that the last available value was from August of 2022 (12.2%). Therefore, I will make a prediction for unemployment in Greece until February 2023.

ARIMA (0,2,1) forecasts

Show the code
s = forecast(auto_model,20)

month = seq(max(grc_unemployment$TIME) %m+% months(1),  # start at next month
              by = "1 month",
              length.out = 20)

highchart() %>%
    hc_title(text = "Prediction of Unemployment with ARIMA(0,2,1)") %>%
    hc_xAxis(type = "datetime")  %>%
    hc_yAxis(title = list(text="Unemployment (%)")) %>%
  hc_add_series(name="Historic Data",
    data = list_parse2(data.frame(x = datetime_to_timestamp(grc_unemployment$TIME), y = grc_unemployment$Value)),
    type = "line"
  )  %>%
  hc_add_series(name = "Forecast",
                  data = list_parse2(data.frame(
                      x = datetime_to_timestamp(month),
                      y = round(s$mean, digits=2)
                  )),
                  type = "line",
                  color = "#d62728") %>%
    hc_add_series(name = "Confidence Interval",
                data = list_parse2(data.frame(
                  x = datetime_to_timestamp(month),
                  low = round(s$lower[, "80%"], digits=2),
                  high = round(s$upper[, "80%"], digits=2)
                )),
                type = "arearange",
                color = hex_to_rgba("#d62728", 0.2),
                linkedTo = ":previous")
Warning: `unite_()` was deprecated in tidyr 1.2.0.
ℹ Please use `unite()` instead.
ℹ The deprecated feature was likely used in the highcharter package.
  Please report the issue at <https://github.com/jbkunst/highcharter/issues>.

ARIMA (9,2,1) forecasts

Show the code
s1 = forecast(arimaModel_3,20)

month = seq(max(grc_unemployment$TIME) %m+% months(1),  # start at next month
              by = "1 month",
              length.out = 20)

highchart() %>%
    hc_title(text = "Prediction of Unemployment with ARIMA(0,2,1)") %>%
    hc_xAxis(type = "datetime")  %>%
    hc_yAxis(title = list(text="Unemployment (%)")) %>%
  hc_add_series(name="Historic Data",
    data = list_parse2(data.frame(x = datetime_to_timestamp(grc_unemployment$TIME), y = grc_unemployment$Value)),
    type = "line"
  )  %>%
  hc_add_series(name = "Forecast",
                  data = list_parse2(data.frame(
                      x = datetime_to_timestamp(month),
                      y = round(s1$mean, digits=2)
                  )),
                  type = "line",
                  color = "#d62728") %>%
    hc_add_series(name = "Confidence Interval",
                data = list_parse2(data.frame(
                  x = datetime_to_timestamp(month),
                  low = round(s1$lower[, "80%"], digits=2),
                  high = round(s1$upper[, "80%"], digits=2)
                )),
                type = "arearange",
                color = hex_to_rgba("#d62728", 0.2),
                linkedTo = ":previous")

Results

Given the diagram as well as the forecast table (of the best performing model, candidate #3), I conclude that a reduction in unemployment in Greece is expected in the next period of time (in the next six months). More specifically, Greece’s unemployment in February 2023 will range between 10% (9.4%) and 13% (14%) with an 80% (95%) probability (based on ARIMA(9,2,1) model).

Acknowledgements

References

Applied time series analysis. (n.d.). Retrieved October 19, 2022, from https://online.stat.psu.edu/stat510/
Gohel, D., & Skintzos, P. (2024). Flextable: Functions for tabular reporting. Retrieved from https://ardata-fr.github.io/flextable-book/
Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. Retrieved from https://www.jstatsoft.org/v40/i03/
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22. https://doi.org/10.18637/jss.v027.i03
Hyndman, R., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., Kuroptev, K., … Yasmeen, F. (2025). Forecast: Forecasting functions for time series and linear models. Retrieved from https://pkg.robjhyndman.com/forecast/
Kunst, J. (2022). Highcharter: A wrapper for the highcharts library. Retrieved from https://jkunst.com/highcharter/
OECD. (2022). Unemployment rate (indicator). Retrieved October 14, 2022, from doi: 10.1787/52570002-en
R Core Team. (2025). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
R Graph Gallery. (2018). Time series. URL Https://r-Graph-Gallery.com/Time-Series.html.
Spinu, V., Grolemund, G., & Wickham, H. (2024). Lubridate: Make dealing with dates a little easier. Retrieved from https://lubridate.tidyverse.org
Trapletti, A., & Hornik, K. (2024). Tseries: Time series analysis and computational finance. https://doi.org/10.32614/CRAN.package.tseries
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). Dplyr: A grammar of data manipulation. Retrieved from https://dplyr.tidyverse.org
Wickham, H., Hester, J., & Bryan, J. (2024). Readr: Read rectangular text data. Retrieved from https://readr.tidyverse.org
Zhu, H. (2024). kableExtra: Construct complex table with kable and pipe syntax. Retrieved from http://haozhu233.github.io/kableExtra/

Citation

BibTeX citation:
@online{2022,
  author = {, stesiam},
  title = {Forecasting {Unemployment} in {Greece}},
  date = {2022-10-22},
  url = {https://stesiam.com/posts/forecasting-greek-unemployment/},
  langid = {en}
}
For attribution, please cite this work as:
stesiam. (2022, October 22). Forecasting Unemployment in Greece. Retrieved from https://stesiam.com/posts/forecasting-greek-unemployment/