Forecasting Unemployment in Greece

Introduction

Background

Time series

Methodology

Short Answer

If you are in a hurry, I predicted that unemployment on Greece is expected to further reduce. It will range between 10% - 13% in February, 2023.

Prerequisites

Import Libraries

For this analysis we will need standard libraries for importing and processing my data, such as readr (Wickham, Hester, & Bryan, 2024) and dplyr (Wickham, François, Henry, Müller, & Vaughan, 2023). The kableExtra (Zhu, 2024) package was used to print the results in table format, while the flextable (Gohel & Skintzos, 2024) package was used to print the results of the Dickey-Fuller and KPSS tests.

Then, due to the nature of the data (time series) it was deemed necessary to use relevant libraries such as lubridate(Spinu, Grolemund, & Wickham, 2024), tseries(Trapletti & Hornik, 2024) & forecast(R. Hyndman et al., 2025) packages.

Finally, the ggplot2 (R-ggplot2?) package was used to create some visualizations, as well as an auxiliary package, ggtext (R-ggtext?), for further formatting those.

Show the code

# General purpose R libraries
library(dplyr)
library(readr)
library(kableExtra)
library(flextable)

# Graphs
library(highcharter)

# Time Series 

library(lubridate)
library(tseries)
library(forecast)

Show the code

acf_plot_function = function(data, min = -1, max = 1){
  ts_data <- data
  N <- length(ts_data)

# Compute confidence limit
conf_limit <- qnorm(0.975) / sqrt(N)

# Use forecast::Acf (biased estimator, recommended for ARIMA)
acf_obj <- Acf(ts_data, plot = FALSE, lag.max = 20)

# Extract lags and values
# Note: Acf() returns acf_obj$acf as a matrix; acf_obj$lag as a matrix
lags_acf <- as.numeric(acf_obj$lag[-1])             # skip lag 0
acf_values <- as.numeric(acf_obj$acf[-1])           # skip lag 0

# Build the highcharter ACF plot
acf_chart <- highchart() %>%
  hc_chart(type = "column") %>%
  hc_title(text = "Autocorrelation Plot") %>%
  hc_xAxis(categories = lags_acf, title = list(text = "Lag")) %>%
  hc_yAxis(min = min, max = max,
    title = list(text = "ACF"),
    plotLines = list(
      list(value = 0, color = "#000000", width = 3),
      list(value = conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1),
      list(value = -conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1)
    )
  ) %>%
  hc_legend(enabled = FALSE) %>%
  hc_add_series(name = "ACF", data = round(acf_values, digits = 4))

# Show chart
acf_chart
}

pacf_plot_function <- function(data, min = -1, max = 1) {
  ts_data <- data
  N <- length(ts_data)
  
  # Compute confidence limit (same formula as for ACF)
  conf_limit <- qnorm(0.975) / sqrt(N)
  
  # Compute PACF using forecast::Pacf (biased estimator, recommended for ARIMA)
  pacf_obj <- forecast::Pacf(ts_data, plot = FALSE, lag.max = 20)
  
  # Extract lags and PACF values
  lags_pacf <- as.numeric(pacf_obj$lag)  # already excludes lag 0
  pacf_values <- as.numeric(pacf_obj$acf)
  
  # Build the highcharter PACF plot
  pacf_chart <- highcharter::highchart() %>%
    highcharter::hc_chart(type = "column") %>%
    highcharter::hc_title(text = "Partial Autocorrelation Plot") %>%
    highcharter::hc_xAxis(categories = lags_pacf, title = list(text = "Lag")) %>%
    highcharter::hc_yAxis(
      min = min, max = max,
      title = list(text = "PACF"),
      plotLines = list(
        list(value = 0, color = "#000000", width = 3),
        list(value = conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1),
        list(value = -conf_limit, color = "#FF0000", dashStyle = "ShortDash", width = 1)
      )
    ) %>%
    highcharter::hc_legend(enabled = FALSE) %>%
    highcharter::hc_add_series(name = "PACF", data = round(pacf_values, digits = 4))
  
  # Show chart
  pacf_chart
}

Import dataset

After loading the libraries I am able to use the commands of the readr package to import my data. My data is in .csv format, so I’ll use the read_csv() command (Wickham et al., 2024) to import them.

Additionally, I choose not to include EA-19 values (as I investigate Greece’s unemployment).

Show the code

unemployment <- read_csv("data/unemployment.csv") %>%
  select(LOCATION, TIME, Value) %>% filter(LOCATION != "EA19")

Preview Dataset

Show the code

head(unemployment, 10) %>%
  kbl(., 
    align = 'c',
    booktabs = T,
    centering = T,
    valign = T) %>%
  kable_styling()

Table 1: Preview Dataset (first 6 rows)

LOCATION	TIME	Value
GRC	1998-04	10.9
GRC	1998-05	11.0
GRC	1998-06	10.9
GRC	1998-07	11.0
GRC	1998-08	11.2
GRC	1998-09	11.1
GRC	1998-10	11.1
GRC	1998-11	11.4
GRC	1998-12	11.4
GRC	1999-01	11.4

Dataset Structure

Our dataset is consisted by 3 variables (columns). More specifically, concerning my variables, are as follows :

Variable	Property	Description
`LOCATION`	qualitative (nominal)	Specific country’s statistics
`TIME`	qualitative (ordinal)	Month of the reported data
`Value`	quantitative (continuous)	Unemployment at the given Time and Country

Thus, my sample has 3 variables, of which 2 are qualitative and 1 is quantitative property.

Time Series Preprocessing

The TIME variable needs to be a Date variable which is not fulfilled on our case.

Show the code

sapply(unemployment, class) %>% 
  kbl() %>% kable_styling(full_width = F, position = "center")

	x
LOCATION	character
TIME	character
Value	numeric

So, above I see that I have dates in a format “YYYY-MM” (Year - Month) and they are considered as characters. With the help of lubridate package I will convert my time series on a Date format.

Show the code

unemployment$TIME <- lubridate::ym(unemployment$TIME)

And let’s check again :

Show the code

sapply(unemployment, class) %>% kbl() %>% kable_styling(full_width = F, position = "center")

	x
LOCATION	character
TIME	Date
Value	numeric

And now I got the Date format. I am able to continue my analysis.

Missing Values

Show the code

sum(is.na(unemployment))

Good news! On this dataset there are 0 missing values, in total.

In case the dataset had missing values, I would first look at which variables those were.In a second phase, it might be necessary to replace the missing values.

Descriptive Statistics

The unemployment data for Greece refer to the period from April 1998 το August 2022. Regarding Europe we have data from January 2000 to August 2022.

The last 20 years have seen particularly large changes mainly in Greece, due to the domestic economic crisis. There is a noticeable change between September 2010 (10.1%) and September 2013 (28.1 %), in terms of unemployment in Greece. For the record, below you can find a table with the months that presented the highest unemployment in Greece. As you will notice the five highest values were observed in 2013.

Show the code

unemployment %>% dplyr::filter(LOCATION == "GRC") %>% select(TIME, Value) %>% as.data.frame() %>%
  arrange(Value) %>% tail(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")

	TIME	Value
289	2013-11-01	27.9
290	2013-04-01	28.0
291	2013-05-01	28.0
292	2013-07-01	28.1
293	2013-09-01	28.1

Accordingly, the highest unemployment rates observed at the European level are the following:

Show the code

unemployment %>% dplyr::filter(LOCATION == "EU27_2020") %>% select(TIME, Value)  %>%
  as.data.frame() %>% arrange(Value) %>% tail(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")

	TIME	Value
268	2013-06-01	11.6
269	2013-01-01	11.7
270	2013-02-01	11.7
271	2013-03-01	11.7
272	2013-04-01	11.7

The lowest unemployment ever recorded in Greece was in the period of 2008:

Show the code

unemployment %>% dplyr::filter(LOCATION == "GRC") %>% select(TIME, Value) %>% as.data.frame() %>%
  arrange(Value) %>% head(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")

TIME	Value
2008-05-01	7.4
2008-06-01	7.6
2008-07-01	7.6
2008-10-01	7.6
2008-01-01	7.7

On the other hand, at the moment in Europe we are at the lowest levels of the last 20 years, with unemployment hovering around 6%.

Show the code

unemployment %>% dplyr::filter(LOCATION == "EU27_2020") %>% select(TIME, Value) %>%
  as.data.frame() %>% arrange(Value) %>% head(5) %>% kbl() %>% kable_styling(full_width = F, position = "center")

TIME	Value
2022-07-01	6.0
2022-08-01	6.0
2022-04-01	6.1
2022-05-01	6.1
2022-06-01	6.1

Finally, it is evident that in the case of Greece the changes are more intense than in Europe.

All of the above can be summarized by the following graph:

Show the code

g = unemployment %>%
  mutate(Year = year(TIME)) %>%
  group_by(LOCATION, Year) %>%
  summarise(mean_unemployment = mean(Value))

`summarise()` has grouped output by 'LOCATION'. You can override using the
`.groups` argument.

Show the code

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = g,
        type = "line",
        hcaes(x = Year, y = mean_unemployment, group = LOCATION)
    ) %>%
  hc_plotOptions(
  series = list(
    dataLabels = list(
      enabled = TRUE,
      formatter = JS("function() {
        if (this.point.index === this.series.data.length - 1) {
          return this.y.toFixed(2) + ' %';
        }
        return null;
      }")
    )
  )
)

Examine trend and seasonality

In time series analysis, it’s important to distinguish where the variability in the final form of the series originates from. Time series consist of three main components: trend, seasonality, and randomness. The trend refers to whether the series shows a specific direction (upward/downward). Seasonality refers to recurring patterns at regular intervals.

y_t = S_t + T_t + E_t Όπου:

y_t δηλώνουν τα δεδομένα που έχουμε διαθέσιμα,
S_t
T_t
E_t

Similarly, the multiplicative model is defined:

y_t = S_t \cdot T_t \cdot E_t

where the components that make up the time series are multiplied instead of added.

Examine stationarity

Definition of stationarity

An important concept in studying time series is stationarity. A time series is called stationary (“Applied Time Series Analysis,” n.d.) if:

E(X_t):\text{constant}
Var(X_t): \text{constant}
Cov(X_t, X_s): \text{constant}

Examine Stationarity Graphically

It is apparent that there is a big variation on the values of unemployment. This time series is not stationary and the differencing is justified.

Show the code

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = g,
        type = "line",
        hcaes(x = Year, y = mean_unemployment, group = LOCATION)
    ) %>%
  hc_plotOptions(
  series = list(
    dataLabels = list(
      enabled = TRUE,
      formatter = JS("function() {
        if (this.point.index === this.series.data.length - 1) {
          return this.y.toFixed(2) + ' %';
        }
        return null;
      }")
    )
  )
)

Here we can see a big improvement in comparison with original data. I have some concerns about points close to 150 (mildly upwards trend) and 250 (outlier).

Show the code

grc_unemployment = unemployment %>% dplyr::filter(LOCATION == "GRC")
grc_unemployment_diff1 <- diff(grc_unemployment$Value, differences = 1)

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = grc_unemployment_diff1
    )

Given the concerns of above, I made also a second difference plot. It seems to solve the problem on points close to 150.

Show the code

grc_unemployment_diff2<- diff(grc_unemployment$Value, differences = 2)

highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Unemployment Rates") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Value")) %>%
    hc_tooltip(shared = TRUE, valueDecimals = 2, valueSuffix = " %") %>%
    hc_add_series(
        data = grc_unemployment_diff2
    )

Examine Stationarity with Statistical tests

The graphical interpretation of stationarity can be beneficial for a quick assessment on topic of stationarity. However it can be considered a subjective metric, which leads on a non consistent decision (someone may consider the second figure as stationary and some others not.

Thankfully, there are some statistical tests which can help us on our decisions. Some commonly used are :

Augmented Dickey-Fuller (ADF) test
Phillips- Perron (PP) test
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test
Zivot-Andrews (ZA) test

But an other problem arises. We are not in a fancy IDE with menus and checkboxes. We use R and its packages. There are many packages that have already write functions to study stationarity with the aforementioned tests. Some of them are:

tseries
urca
CADFtest

We should carefully evaluate our choices, as these packages advertise similar functionality but differ significantly in the scope of their functions and, more importantly, in the level of flexibility they offer for handling time series characteristics. In short, the tseries package is the one I’m most familiar with, but it is also the most limited in terms of configurable parameters. Although many tutorials use this package, I found that I couldn’t set the number of lags for tests or specify characteristics like trends when applicable. On the other hand, the urca package addresses these limitations by allowing users to specify both the trend component and the number of lags in the respective tests. One minor drawback of urca, however, is that it only provides critical values, not p-values.

Summary

Show the code

summary_stationarity_results <- data.frame(
                             "Type" = c("levels", "Diff(GRC)", "Diff2(GRC)"),
                             "ADF test" = c("Non Stationary", "Stationary", "Stationary"),
                              "PP test" = c("Non Stationary", "Stationary", "Stationary"),
                             "KPSS test" =c("Non Stationary", "Non Stationary", "Stationary")
)


summary_stationarity_results  %>% kbl() %>% kable_styling()

Type	ADF.test	PP.test	KPSS.test
levels	Non Stationary	Non Stationary	Non Stationary
Diff(GRC)	Stationary	Stationary	Non Stationary
Diff2(GRC)	Stationary	Stationary	Stationary

ADF test

One of the most

H_0 : \text{The time series has a unit root} \\ H_1 : \text{The time series has not a unit root} Selecting the lags

Show the code

library(vars)

Loading required package: MASS


Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

Loading required package: strucchange

Loading required package: zoo


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: sandwich

Loading required package: urca

Loading required package: lmtest


Attaching package: 'lmtest'

The following object is masked _by_ '.GlobalEnv':

    unemployment

The following object is masked from 'package:highcharter':

    unemployment

Show the code

G = VARselect(grc_unemployment$Value, lag.max = 20, type = "const")

Show the code

get_adf_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    adf_orig = adf.test(variable)
    adf_diff1 = adf.test(diff(variable, differences = 1))
    adf_diff2 = adf.test(diff(variable, differences = 2))
    
    # Extract results
    adf_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "Dickey-Fuller" = c(
            adf_orig$statistic[["Dickey-Fuller"]],
            adf_diff1$statistic[["Dickey-Fuller"]],
            adf_diff2$statistic[["Dickey-Fuller"]]
        ),
        "Lag Order" = c(
            adf_orig$parameter[["Lag order"]],
            adf_diff1$parameter[["Lag order"]],
            adf_diff2$parameter[["Lag order"]]
        ),
        "p-value" = c(
            adf_orig$p.value,
            adf_diff1$p.value,
            adf_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(adf_table)
}

Show the code

get_adf_results(grc_unemployment$Value)

Augmented Dickey-Fuller test results
Transformation	Dickey.Fuller	Lag.Order	p.value
Level	-0.93123	6	0.94846
1st Difference	-3.50738	6	0.04240
2nd Difference	-9.48682	6	0.01000

Συνεπώς, είναι προφανές από τα αποτελέσματα του στατιστικού ελέγχου Dickey Fuller ότι η χρονοσειρά μου δεν είναι στάσιμη. Θα πρέπει να εφαρμόσω τον έλεγχο Dickey-Fuller στις δοαφορές.

PP test

H_0 : \text{The time series has a unit root} \\ H_1 : \text{The time series has not a unit root}

Show the code

get_pp_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    pp_orig = pp.test(variable)
    pp_diff1 = pp.test(diff(variable, differences = 1))
    pp_diff2 = pp.test(diff(variable, differences = 2))
    
    # Extract results
    pp_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "Phillips-Perron" = c(
            pp_orig$statistic[["Dickey-Fuller Z(alpha)"]],
            pp_diff1$statistic[["Dickey-Fuller Z(alpha)"]],
            pp_diff2$statistic[["Dickey-Fuller Z(alpha)"]]
        ),
        "Lag Order" = c(
            pp_orig[["parameter"]][["Truncation lag parameter"]],
            pp_diff1[["parameter"]][["Truncation lag parameter"]],
            pp_diff2[["parameter"]][["Truncation lag parameter"]]
        ),
        "p-value" = c(
            pp_orig$p.value,
            pp_diff1$p.value,
            pp_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(pp_table)
}

Show the code

get_pp_results(grc_unemployment$Value)

Phillips-Perron test results
Transformation	Phillips.Perron	Lag.Order	p.value
Level	0.19401	5	0.99
1st Difference	-279.81456	5	0.01
2nd Difference	-337.13053	5	0.01

KPSS test

H_0 : \text{The time series has not a unit root} \\ H_1 : \text{Alternatively}

Show the code

get_kpss_results = function(variable){

    # Ensure variable is numeric (e.g., a time series or numeric vector)
    if (!is.numeric(variable)) {
        stop("Input variable must be numeric.")
    }
    
    # Perform ADF tests
    kpss_orig = kpss.test(variable)
    kpss_diff1 = kpss.test(diff(variable, differences = 1))
    kpss_diff2 = kpss.test(diff(variable, differences = 2))
    
    # Extract results
    pp_table = data.frame(
        "Transformation" = c("Level", "1st Difference", "2nd Difference"),
        "KPSS" = c(
            kpss_orig[["statistic"]][["KPSS Level"]],
            kpss_diff1[["statistic"]][["KPSS Level"]],
            kpss_diff2[["statistic"]][["KPSS Level"]]
        ),
        "Lag Order" = c(
            kpss_orig[["parameter"]][["Truncation lag parameter"]],
            kpss_diff1[["parameter"]][["Truncation lag parameter"]],
            kpss_diff2[["parameter"]][["Truncation lag parameter"]]
        ),
        "p-value" = c(
            kpss_orig$p.value,
            kpss_diff1$p.value,
            kpss_diff2$p.value
        )
    ) %>%
        kbl(digits = 5) %>%
        kable_styling(full_width = FALSE)
    
    return(pp_table)
}

Show the code

get_kpss_results(grc_unemployment$Value)

KPSS test results
Transformation	KPSS	Lag.Order	p.value
Level	2.53719	5	0.01000
1st Difference	0.59510	5	0.02308
2nd Difference	0.01426	5	0.10000

Identify Model

Show the code

acf_plot_function(unemployment$Value, min=-0.2, max=1)

Show the code

pacf_plot_function(unemployment$Value, min=-0.2, max=1)

Show the code

acf_plot_function(grc_unemployment_diff1, min=-0.1, max = 0.6)

Show the code

pacf_plot_function(grc_unemployment_diff1, min=-0.2, max=0.5)

Show the code

acf_plot_function(grc_unemployment_diff2, min=-0.5, max = 0.5)

Show the code

pacf_plot_function(grc_unemployment_diff2, min=-0.5, max = 0.5)

Build Time Series Model

Show the code

auto_model <- auto.arima(grc_unemployment$Value, trace = T,seasonal = TRUE)


 Fitting models using approximations to speed things up...

 ARIMA(2,2,2)                    : 302.1399
 ARIMA(0,2,0)                    : 473.9345
 ARIMA(1,2,0)                    : 405.2316
 ARIMA(0,2,1)                    : 297.5442
 ARIMA(1,2,1)                    : 301.0253
 ARIMA(0,2,2)                    : 299.5392
 ARIMA(1,2,2)                    : 302.154

 Now re-fitting the best model(s) without approximations...

 ARIMA(0,2,1)                    : 296.71

 Best model: ARIMA(0,2,1)

Show the code

auto_model

Series: grc_unemployment$Value 
ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.9075
s.e.   0.0229

sigma^2 = 0.1597:  log likelihood = -146.33
AIC=296.67   AICc=296.71   BIC=304.02

Show the code

arimaModel_1=arima(grc_unemployment$Value, order=c(0,1,2))
arimaModel_1


Call:
arima(x = grc_unemployment$Value, order = c(0, 1, 2))

Coefficients:
         ma1     ma2
      0.1445  0.0760
s.e.  0.0588  0.0571

sigma^2 estimated as 0.1714:  log likelihood = -156.83,  aic = 319.67

Show the code

arimaModel_2=arima(grc_unemployment$Value, order=c(1,1,2))
arimaModel_2


Call:
arima(x = grc_unemployment$Value, order = c(1, 1, 2))

Coefficients:
         ar1      ma1     ma2
      0.9710  -0.9048  0.0168
s.e.  0.0185   0.0666  0.0629

sigma^2 estimated as 0.1569:  log likelihood = -144.19,  aic = 296.37

Show the code

arimaModel_3=arima(grc_unemployment$Value, order=c(9,2,1))
arimaModel_3


Call:
arima(x = grc_unemployment$Value, order = c(9, 2, 1))

Coefficients:
          ar1      ar2      ar3      ar4      ar5      ar6      ar7      ar8
      -0.7879  -0.7083  -0.6715  -0.6944  -0.5922  -0.3838  -0.3160  -0.4457
s.e.   0.1713   0.1539   0.1493   0.1434   0.1379   0.1172   0.0883   0.0718
          ar9      ma1
      -0.2704  -0.1456
s.e.   0.0690   0.1767

sigma^2 estimated as 0.1388:  log likelihood = -126.92,  aic = 275.83

Compare Models

After building some ARIMA models, I should decide which is the best one to make my estimations. One metric to evaluate those models is AIC (Akaike Information Criterion). The lower the value of AIC, the better my model.

Show the code

accuracy_table <- data.frame(
                             "Name of Model" = c("Auto Model", "ARIMA Candidate #1", "ARIMA Candidate #2", "ARIMA Candidate #3"),
                             Model = c("ARIMA(0,2,1)", "ARIMA(0,1,2)", "ARIMA(1,1,2)", "ARIMA(9,2,1)"),
                             AIC =c(auto_model$aic, arimaModel_1$aic, arimaModel_2$aic, arimaModel_3$aic)
)


accuracy_table %>% kbl() %>% kable_styling()

Name.of.Model	Model	AIC
Auto Model	ARIMA(0,2,1)	296.6684
ARIMA Candidate #1	ARIMA(0,1,2)	319.6669
ARIMA Candidate #2	ARIMA(1,1,2)	296.3705
ARIMA Candidate #3	ARIMA(9,2,1)	275.8314

So, the best model is the Auto Model (ARIMA(9,2,1)), which has the lowest AIC value.

Checking best models

Forecast Future Unemployment

Previously, I identify which is the best model. Now, I will use this model in order to predict unemployment for the next 6 months. It should be recalled that the last available value was from August of 2022 (12.2%). Therefore, I will make a prediction for unemployment in Greece until February 2023.

ARIMA (0,2,1) forecasts

Show the code

s = forecast(auto_model,20)

month = seq(max(grc_unemployment$TIME) %m+% months(1),  # start at next month
              by = "1 month",
              length.out = 20)

highchart() %>%
    hc_title(text = "Prediction of Unemployment with ARIMA(0,2,1)") %>%
    hc_xAxis(type = "datetime")  %>%
    hc_yAxis(title = list(text="Unemployment (%)")) %>%
  hc_add_series(name="Historic Data",
    data = list_parse2(data.frame(x = datetime_to_timestamp(grc_unemployment$TIME), y = grc_unemployment$Value)),
    type = "line"
  )  %>%
  hc_add_series(name = "Forecast",
                  data = list_parse2(data.frame(
                      x = datetime_to_timestamp(month),
                      y = round(s$mean, digits=2)
                  )),
                  type = "line",
                  color = "#d62728") %>%
    hc_add_series(name = "Confidence Interval",
                data = list_parse2(data.frame(
                  x = datetime_to_timestamp(month),
                  low = round(s$lower[, "80%"], digits=2),
                  high = round(s$upper[, "80%"], digits=2)
                )),
                type = "arearange",
                color = hex_to_rgba("#d62728", 0.2),
                linkedTo = ":previous")

Warning: `unite_()` was deprecated in tidyr 1.2.0.
ℹ Please use `unite()` instead.
ℹ The deprecated feature was likely used in the highcharter package.
  Please report the issue at <https://github.com/jbkunst/highcharter/issues>.

ARIMA (9,2,1) forecasts

Show the code

s1 = forecast(arimaModel_3,20)

month = seq(max(grc_unemployment$TIME) %m+% months(1),  # start at next month
              by = "1 month",
              length.out = 20)

highchart() %>%
    hc_title(text = "Prediction of Unemployment with ARIMA(0,2,1)") %>%
    hc_xAxis(type = "datetime")  %>%
    hc_yAxis(title = list(text="Unemployment (%)")) %>%
  hc_add_series(name="Historic Data",
    data = list_parse2(data.frame(x = datetime_to_timestamp(grc_unemployment$TIME), y = grc_unemployment$Value)),
    type = "line"
  )  %>%
  hc_add_series(name = "Forecast",
                  data = list_parse2(data.frame(
                      x = datetime_to_timestamp(month),
                      y = round(s1$mean, digits=2)
                  )),
                  type = "line",
                  color = "#d62728") %>%
    hc_add_series(name = "Confidence Interval",
                data = list_parse2(data.frame(
                  x = datetime_to_timestamp(month),
                  low = round(s1$lower[, "80%"], digits=2),
                  high = round(s1$upper[, "80%"], digits=2)
                )),
                type = "arearange",
                color = hex_to_rgba("#d62728", 0.2),
                linkedTo = ":previous")

Results

Given the diagram as well as the forecast table (of the best performing model, candidate #3), I conclude that a reduction in unemployment in Greece is expected in the next period of time (in the next six months). More specifically, Greece’s unemployment in February 2023 will range between 10% (9.4%) and 13% (14%) with an 80% (95%) probability (based on ARIMA(9,2,1) model).

Acknowledgements

Image by Rosy / Bad Homburg / Germany from Pixabay

References

Applied time series analysis. (n.d.). Retrieved October 19, 2022, from https://online.stat.psu.edu/stat510/

Gohel, D., & Skintzos, P. (2024). Flextable: Functions for tabular reporting. Retrieved from https://ardata-fr.github.io/flextable-book/

Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. Retrieved from https://www.jstatsoft.org/v40/i03/

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.

Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22. https://doi.org/10.18637/jss.v027.i03

Hyndman, R., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., Kuroptev, K., … Yasmeen, F. (2025). Forecast: Forecasting functions for time series and linear models. Retrieved from https://pkg.robjhyndman.com/forecast/

Kunst, J. (2022). Highcharter: A wrapper for the highcharts library. Retrieved from https://jkunst.com/highcharter/

OECD. (2022). Unemployment rate (indicator). Retrieved October 14, 2022, from doi: 10.1787/52570002-en

R Core Team. (2025). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

R Graph Gallery. (2018). Time series. URL Https://r-Graph-Gallery.com/Time-Series.html.

Spinu, V., Grolemund, G., & Wickham, H. (2024). Lubridate: Make dealing with dates a little easier. Retrieved from https://lubridate.tidyverse.org

Trapletti, A., & Hornik, K. (2024). Tseries: Time series analysis and computational finance. https://doi.org/10.32614/CRAN.package.tseries

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). Dplyr: A grammar of data manipulation. Retrieved from https://dplyr.tidyverse.org

Wickham, H., Hester, J., & Bryan, J. (2024). Readr: Read rectangular text data. Retrieved from https://readr.tidyverse.org

Zhu, H. (2024). kableExtra: Construct complex table with kable and pipe syntax. Retrieved from http://haozhu233.github.io/kableExtra/

Citation

BibTeX citation:

@online{2022,
  author = {, stesiam},
  title = {Forecasting {Unemployment} in {Greece}},
  date = {2022-10-22},
  url = {https://stesiam.com/posts/forecasting-greek-unemployment/},
  langid = {en}
}

For attribution, please cite this work as:

stesiam. (2022, October 22). Forecasting Unemployment in Greece. Retrieved from https://stesiam.com/posts/forecasting-greek-unemployment/