Introduction

Background

Unemployment has been a chronic problem in our country, as it has historically been at higher levels than the European average and OECD countries over the last 25 years. The phenomenon worsened during the years of the economic crisis, when at its worst moments one quarter of the workforce was unable to find employment. Even worse was the situation for young people in the country, with youth unemployment reaching 46%, a figure that represents the worst performance in the EU.

Before proceeding with the analysis, I think it is important to clarify what exactly the term “unemployed” means. The definition is stricter than it appears, as it is not enough for someone simply not to be working. To be classified as unemployed, a person must simultaneously meet three criteria: not be working, be available for work, and be actively seeking it. Students, retirees, and anyone not participating in the labour force are thus excluded from the calculation. The unemployment rate is derived as follows:

Of course, calculating the number of unemployed is a fairly complex process for various reasons, such as the existence of underemployment or seasonal work. There are two dominant methodologies for recording unemployment. The first is based on administrative data, that is, the number of registered unemployed at DYPA (formerly OAED). This is essentially a count of those receiving or applying for unemployment benefits. The problem is that the eligibility criteria are strict:

  • Dismissal, not resignation, from the last employment
  • 125 days of work in the last 14 months (the last two months are not counted)
  • No benefit entitlement for more than 400 days per four-year period of unemployment

With these criteria, registered unemployment significantly underestimates the actual figure, as it excludes those who resigned, those who do not meet the minimum days of work requirement, and those working in undeclared or temporary arrangements. The second and more reliable methodology is the Labour Force Survey (LFS). This is a sample survey conducted by national statistical agencies (in Greece, ELSTAT) based on the International Labour Organization (ILO) definition of unemployment: a person is considered unemployed if they are not working, are actively seeking work, and are available to take up employment within two weeks. This methodology is also used by Eurostat and the OECD, making the data comparable across countries.

Summary Answer

In this article, my goal is to forecast the trajectory of unemployment over the coming months. I therefore obtained some historical data on unemployment in the EU, the OECD, and our country. I will use a simple (S)ARIMA model to make an estimate of its magnitude over the coming months. The data I am using covers the period from 1998 to 2022. If you want a quick answer, in this particular analysis I forecast that the downward trend in unemployment is expected to continue over the coming months. In February 2023, it is expected to range between 10% and 13%.

Prerequisites

For this analysis I used standard libraries for data import and processing (readr, dplyr), the kableExtra and gt packages for table formatting, and the highcharter library for interactive charts. For time series analysis the following packages were used: lubridate, tseries, forecast, tsibble, feasts, fable, strucchange, and urca. The ACF and pACF diagrams as well as the unemployment trend forecasts were built with Highcharter.

Data Structure

The dataset contains unemployment data for various countries or country entities such as the European Union and the OECD countries (Organisation for Economic Co-operation and Development), which allows us to compare unemployment in Greece with that of the other developed economies.

The dataset preview (first 5 rows):

LOCATIONTIMEValue
GRC1998-04-0110.9
GRC1998-05-0111.0
GRC1998-06-0110.9
GRC1998-07-0111.0
GRC1998-08-0111.2
GRC1998-09-0111.1

My data consists of 3 variables (columns). More specifically, my columns are as follows:

VariableVariable TypeDescription
LOCATIONQualitative (categorical)Country or country entity
TIMEQualitative (ordinal)Month-Year to which the measurement refers
ValueQuantitative (continuous)Unemployment level (by area and month)

Consequently, my sample consists of 3 variables, two of which are qualitative and one quantitative, which is also the value I want to forecast (unemployment). It is worth pointing out at this stage that the LOCATION variable has three values: Greece, the OECD countries, and the 27 European countries. Finally, regarding the classification of the time variable, since my values take the form of month-year (mm/year), its type is not immediately clear. We could split it into two additional variables where one would be the year, classified as a quantitative variable, and the month would be a qualitative ordinal variable.

Time Series Preprocessing

The variable indicating the month to which the corresponding unemployment figure refers (TIME) is automatically recognised as a character vector. One of the first things we need to do when handling data that indicate a time interval is to convert them to the appropriate variable type, which in R is called Date. The table below shows the existing variables, as well as the type automatically assigned to them based on the values they contain. R did a good job and identified that the Value variable is a numeric vector since it represents the unemployment level. The mapping of the three entities is also correct, since we are referring to their names and they will therefore be a character vector.

Variable NameVariable Type
LOCATIONcharacter
TIMEcharacter
Valuenumeric

It was noted above that the dates take the form “YYYY-MM” (Year-Month) and were automatically recognised by the software as characters. With the help of the lubridate package I will convert the time variable to the Date type. And once we have made the conversion, if we check again we will see that the change was successful, with the variable now having the Date type.

Variable NameVariable Type
LOCATIONcharacter
TIMEDate
Valuenumeric

Missing Values

We have some good news. In this dataset there are a total of 0 missing values. In the event that observations were missing from my dataset, I would first need to investigate which of these variables were affected. In a second step, and depending on the type of the variable, I would either need to drop those rows entirely or attempt to impute their values using various methods.

Descriptive Statistics

The unemployment data for Greece covers the period from April 1998 to August 2022. As for the EU data, it runs from January 2000 to August 2022. Over the last 20 years, shockingly large changes have occurred in our country’s unemployment rate, with the most abrupt change recorded during the periods of the economic crisis (after 2009). As an illustration, we can observe the difference in the unemployment level between September 2010, when it stood at 10%, and three years later, in September 2013, when it reached 28.1%. For completeness, tables are attached below showing the 5 months with the highest and lowest unemployment in Greece and in Europe over the last 20 years.

Table: Top 5 highest unemployment months — Greece (left) and EU27 (right)

On the other hand, it is also interesting to look at the periods with the lowest observed unemployment. In Greece this period was just before the economic crisis, in 2008, while the EU27 is going through one of its best periods in terms of unemployment, with a 20-year historical low at 6%.

Table: Top 5 lowest unemployment months — Greece (left) and EU27 (right)

All of this can be summarised in the chart below, where the enormous changes in our country stand out. We can see that the 2008 crisis affected unemployment in the EU and across the developed world, as there is an upward trajectory over the same period. The EU has since recovered, but Greece has not managed to return to pre-crisis levels, although the trend is downward.

Chart: Unemployment rate comparison — GRC, OECD, EU27 (annual averages 1998–2022)

Examining Trend and Seasonality

In time series analysis it is important to separate the sources of variation in a time series and determine where they originate from. Time series have three basic components: trend (), seasonality (), and randomness (). The trend describes the general direction the time series follows over time, whether upward, downward, or horizontal. In our case, for example, the sharp rise in unemployment after 2009 is a characteristic example of a strong upward trend. Seasonality refers to patterns that repeat with a fixed periodicity; for instance, if unemployment tends to increase every winter and decrease in summer, we are talking about a seasonal component. Finally, randomness (or the remainder) is what is left after removing the trend and seasonality, that is, the unpredictable fluctuations that cannot be attributed to any systematic pattern, such as the sudden increase in unemployment during the pandemic in March 2020.

Where:

  • denotes the data we have available,
  • is the seasonal component,
  • is the trend component,
  • is the random component.

Similarly, the multiplicative model:

where the components composing the time series are multiplied rather than added.

Chart: STL decomposition of Greek unemployment — observed, trend, seasonality, remainder (4-panel stock chart)

From the chart above, it is very important to distinguish the seasonal component, because I need to know whether I am dealing with a model without seasonality using ARIMA, an autoregressive AR model, or a moving average MA model. If I detect seasonality I will need to use a model that incorporates it, such as Seasonal ARIMA (SARIMA), a seasonal autoregressive model (SAR), or a seasonal MA. The seasonality does not follow the same pattern throughout the entire time series. Up to 2004, seasonality is negligible; from then until 2014, there are signs of weak seasonality. From 2014 onwards, seasonality is stronger than at any other period since 1998. It is worth noting that most peaks occur, approximately, in the months of February and March.

I obtained an ambiguous picture with historically weak seasonality, which has been more pronounced in recent years. A simple method to get a quick answer is through the nsdiffs command from the {forecast} package. In this case we received a response of zero, which leads me to believe that any seasonality present has generally been weak.

TestValueResult
Canova-Hansen (CH)No seasonal differencing required
OCSBNo seasonal differencing required
Seasonal Influence (STL)0.153Weak seasonality

Testing for Structural Breaks

Time series are not a measure that can be reliably interpreted, and consequently forecast, without taking into account various exogenous factors. In our case, we are studying and seeking to forecast unemployment in our country over the coming months. Our task becomes even more difficult when we consider that we cannot do this satisfactorily, as the model cannot understand the patterns of changes in the series and how they came about. Many changes in the time series may have arisen from external factors that influence the specification of our model. It is therefore important to determine whether there are structural breaks, that is, defining events that may have affected the movement of the time series. The case of Greece is one such complex case. These time points could be various dates on which significant events occurred that may have affected the behaviour of the time series. In our case, we are analysing Greek unemployment, which skyrocketed after the 2009 economic crisis, reaching a historical high at its peak. In addition, the series includes the period of the pandemic, which affected the unemployment index.

Chart: RSS and BIC by number of structural breaks (spline lines)

There is a fairly large reduction in the Bayesian Information Criterion (BIC) when moving from a model with no structural break to one with two structural breaks, with a smaller reduction for three breaks. This is an important indication that my unemployment series was indeed affected by sudden factors. Of course, we suspected this, as we have the crisis period that contributed to high unemployment rates. Having determined the number of breaks, it is time to identify which segments need to be analysed; in short, to determine the date ranges during which a significant change in the behaviour of the time series was detected. According to the results I have:

Table: Breakpoint indices by number of breaks (Break A, Break B, … columns; rows = number of breaks 1–5)

and the dates of the corresponding breaks:

Table: Breakpoint dates by number of breaks

Based on the error results, I will choose either 2 or 3 structural breaks, given that there is a significant reduction in the BIC criterion at that number. At the third break there is a small reduction, while at the fourth it increases. Let us examine the proposed breaks individually. On the one hand, the 2-break model proposes breaks in June 2011 and March 2018; on the other hand, the 3-break model proposes breaks also in June 2011, June 2015, and January 2019. I deliberated at length over whether I could decide on my own which was a defining moment of the crisis, as this is potentially a somewhat subjective judgement, which is why I prefer to let the model calculate it. Furthermore, the entire period was quite turbulent and full of negative developments, meaning that in reality one cannot pinpoint a single clear break.

Looking at the dates, the three-break model may be the one that makes the most sense to observers. 2011 saw problems that were already showing up in the unemployment index; 2015 was a period of uncertainty; and January 2019 marks the country’s recovery, with the exit from the memoranda announced a few months earlier, in August 2018.

Stationarity Testing

Definition of Stationarity

An important concept in time series is stationarity. A time series is called stationary if:

Examining Stationarity Graphically

From the unemployment chart above it is very evident that our series does not move around any specific value, violating the first condition for a time series to be considered stationary. This indicates the need to use first-order differences for Greek unemployment. From the first difference () I observe a great improvement, as we no longer have the enormous deviations seen in the previous chart. The values mostly show no trend and move around values relatively close to zero. This is a good sign, though I have a mild concern as there are two points in the time series with relatively large deviations from zero. The first is between points 120 and 170, where the fluctuation around zero has deviated slightly, referring to the period between 2008 and 2012 (the crisis and deterioration of economic indicators). Another slightly problematic point is the 266th, referring to March-April 2020 and the imposition of movement restrictions to curb the spread of coronavirus when the first cases were detected in our country.

Chart: First-order differences of Greek unemployment — line chart, hovering around 0

Taking into account the above concerns, I also computed the second differences () and visualised them. The chart is almost identical, with values fluctuating around zero even at the problematic point near the 150th observation. In the second differences I observe a more consistent fluctuation around zero, but at the same time there is an even larger deviation at the onset of COVID-19 measures in our country (3.6% versus 2.6% for the first differences).

Chart: Second-order differences of Greek unemployment — line chart

Examining Stationarity with Statistical Tests

Graphical examination of stationarity is a fairly easy way to identify the presence of trends or whether our series has generally stable behaviour. Except in some very clear-cut cases, there will be times when many people may disagree on the stationarity of a series simply from its trajectory. This is understandable given that as a measure of evaluation it is somewhat subjective, being based on the personal opinion and interpretation each person gives to the movement of the time series. The use of this method alone may therefore lead to inconsistent or unstable decisions, since one person may consider a slight upward trend that reverts as stationary, while another may interpret the pattern as a faint but real trend. By analogy with normality tests, which include both graphical (quantile-quantile plot) and statistical tests (Kolmogorov-Smirnov test, Shapiro-Wilk test), we have similar alternatives for testing stationarity. This way we can have a more objective criterion for whether our series are stationary or not. Some of the best-known stationarity tests, which are also known as unit root tests, are as follows:

  • The DF test (Dickey-Fuller)
  • The ADF test (Augmented Dickey-Fuller)
  • The ADF-GLS test
  • The PP test (Phillips-Perron)
  • The KPSS test (Kwiatkowski-Phillips-Schmidt-Shin) and
  • The ZA test (Zivot-Andrews)

And this is where the chaos begins, which I started to realise when I began learning R. When using a programming language we do not have nice menus and tick boxes with options for each test. In R, as in other languages such as Python, there are packages that add functions and capabilities to each language. For stationarity tests, quite a few R packages have been built that serve a similar purpose. Some fairly well-known packages offering stationarity tests are tseries and urca. In brief, the tseries package is quite restrictive — despite the fact that many guides use it, I realised that I could not set the lags for the tests or set characteristics of the time series. The urca package addresses these limitations by allowing users to set the number of lags as well as time series characteristics. The only drawback of urca is that it does not provide a p-value in the test results. For the hypothesis test, the test statistic is computed and compared against the corresponding critical value; if the test statistic exceeds the critical value, the test is rejected at the corresponding significance level.

Summary of Results

Summarising the results of the statistical stationarity tests, I conclude that the unemployment observations in Greece cannot be characterised as stationary. Furthermore, all classical tests agree on the presence of stationarity in the first- and second-order differences.

TestResultStationarity achieved at…
ADFI(1)first difference
PPI(1)first difference
KPSSI(1)first difference
ZAI(1)first difference
LSI(1)first difference

DF Test

The Dickey-Fuller test is one of the simplest unit root tests for determining the stationarity of a time series. This test is based on the first-order autoregressive model, :

Where is the value of the time series and is the error term. That is, it is a time series whose values are influenced by, and depend on, previous values of the series.

Misplaced &y_t & = \rho y_{t-1} + e_t \\ y_t - y_{t-1} & = \rho y_{t-1} - y_{t-1} + e_t \\ \Delta y_t & = (\rho - 1) y_{t-1} + e_t \\ \Delta y_t & = \gamma y_{t-1} + e_t \end{aligned}$$ The test has specific variants based on the behaviour of the time series in question. More precisely, there are three variants: - Without a constant and without a trend, where the series moves around zero: $$\Delta y_t = \gamma y_{t-1} + e_t$$ - With a constant, when the series moves around a fixed value (different from zero): $$\Delta y_t = \alpha + \gamma y_{t-1} + e_t$$ - With a constant and trend, when the series appears to follow a downward or upward trajectory over time: $$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + e_t$$ This particular test has some problems in application, as it makes a series of assumptions: our series follows an autoregressive model with $p=1$, the errors are homoskedastic, and the errors are uncorrelated. The series is clearly non-stationary in the original observations, while the trend disappears in the first-order differences. I will therefore check the assumptions of the Dickey-Fuller test. From the Ljung-Box and Durbin-Watson tests, I reject the hypothesis that the errors are uncorrelated. Finally, the ARCH test shows that the errors are not homoskedastic (they have varying variance). From these results it is clear that the DF test is not reliable and is generally not used, as its assumptions are quite restrictive and are typically violated. Tests such as the ADF and PP address this error behaviour in various ways to obtain valid results when assessing the stationarity of a series. <div id="tbl-check_autocorrlation" class="chart-container" style="min-height:200px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: DF assumption tests — Ljung-Box, Durbin-Watson, ARCH (test statistic, df, p-value)</p> </div> #### ADF Test One of the most widely used unit root and stationarity tests is the **Augmented Dickey-Fuller (ADF) test**, which is a generalisation of the simple Dickey-Fuller test, as it is based on a higher-order autoregressive model ($\rho > 1$). That is, the value of the time series does not depend only on its previous value but also on other $\rho$ preceding ones. An autoregressive model of order $\rho$ takes the following form: $$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t$$ Analogously with the simple Dickey-Fuller test, variants are defined based on the behaviour of the time series in question: - Without a constant and without a trend: $$\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + u_t$$ - With a constant: $$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + u_t$$ - With a constant and trend: $$\Delta y_t = \alpha + \beta_t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + u_t$$ The **ADF test** has the following hypothesis structure: $$H_0 : \gamma = 0 \quad \text{(the series is not stationary)}$$ $$H_1 : \gamma \neq 0 \quad \text{(the series is stationary)}$$ That is, if our results reject the null hypothesis, this indicates that the time series is stationary. Before doing so I need to determine how many time lags to include. Using Schwert's rule based on the number of observations ($T = 293$): $$p_{max} = \left\lfloor 12 \cdot \left(\frac{293}{100}\right)^{0.25} \right\rfloor = \lfloor 15.7 \rfloor = 15 \text{ lags}$$ <div id="tbl-adf-results" class="chart-container" style="min-height:200px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: ADF test on levels — 3 rows (none, drift, trend) with test statistic, lag, critical values 1% and 5%</p> </div> Given that the results are not statistically significant, I can conclude that my time series is not stationary. The use of differenced observations therefore makes sense, and they should be re-tested for stationarity. <div id="tbl-adf-results-diff1" class="chart-container" style="min-height:200px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: ADF test on first differences — 3 rows (none, drift, trend)</p> </div> Since the test statistic (−2.119) is smaller than the corresponding critical value (−1.95), we can reject the null hypothesis of non-stationarity for the differences of the original Greek unemployment observations, at the 5% significance level. <div id="tbl-adf-final" class="chart-container" style="min-height:280px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: Full ADF results — 6 rows (3 levels + 3 first differences), with series form, specification, statistic, lag, critical values</p> </div> #### PP Test Another stationarity test is the Phillips-Perron test, which is based on the simple Dickey-Fuller. The logic is the same as the ADF and the hypothesis structure is identical, but they differ significantly in how they handle autocorrelation of the errors. The PP test does not add lags to reduce error autocorrelation (as the ADF does); instead, it attempts to modify the test statistic by estimating the long-run variance $\hat{\lambda}$. The only term that cannot be determined in advance is the long-run variance, as its summation depends on a parameter $q$. For the bandwidth parameter $q$ of the long-run variance, there are calculation rules that coincide with those for the maximum lag number in the ADF test. For small samples: $$q_{\text{small}} = \left\lfloor 4 \cdot \left(\frac{T}{100}\right)^{0.25} \right\rfloor$$ and for larger samples: $$q_{\text{large}} = \left\lfloor 12 \cdot \left(\frac{T}{100}\right)^{0.25} \right\rfloor$$ The hypothesis structure takes the same form as the tests mentioned above, with the null hypothesis assuming non-stationarity of our time series: $$H_0 : \text{The series is not stationary}$$ $$H_1 : \text{The series is stationary}$$ <div id="tbl-pp-results" class="chart-container" style="min-height:220px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: PP test results — 4 rows (levels: constant/trend + diffs: constant/trend), test statistic, lag, critical values</p> </div> Consequently, we have a failure to reject the null hypothesis ($H_0$) for the original data and a rejection for the first differences. This means that the Greek unemployment observations were not stationary, but their differences were. The PP test thus confirms the findings of the ADF test. #### KPSS Test Another unit root test is the KPSS test. Although the aim of the test is the same, it has the opposite logic compared to the previous tests: its null hypothesis is stationarity, as opposed to non-stationarity. $$H_0 : \text{The series is stationary}$$ $$H_1 : \text{The series is not stationary}$$ <div id="tbl-kpss-results" class="chart-container" style="min-height:200px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: KPSS test results — 3 rows (in levels, first differences, second differences): KPSS statistic, lag, critical values</p> </div> My null hypothesis assumed stationarity, which is rejected by the KPSS test at the 1% significance level. Furthermore, the test cannot reject stationarity for the differenced observations. These results confirm both the PP test and the ADF test. #### ZA and LS Tests Finally, another stationarity test is the Zivot-Andrews (ZA) test. This particular test statistic differs from the previous ones in that it takes into account certain points at which the time series changes behaviour. The best-known tests for stationarity in time series with structural breaks are: - The Zivot-Andrews (ZA) test, if there is only one structural break, and - The Lee-Strazicich (LS) test, if there are two structural breaks. For completeness I will include the Zivot-Andrews test, which is not optimal since it assumes only one structural break. Although for classical tests there are packages and commands that compute the test statistic, no equivalent commands exist for the Lee-Strazicich test. Fortunately, instead of relying on a statistical package (e.g. EViews), I found a relevant [repository](https://github.com/hannes101/LeeStrazicichUnitRoot) on GitHub, where user [hannes101](https://github.com/hannes101/) has written a series of functions for this specific test that follow a logic similar to those in the `urca` package. <div id="tbl-results-struc-change-test" class="chart-container" style="min-height:200px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: ZA and LS stationarity tests — 4 rows (ZA levels, ZA diffs, LS levels: −4.71, LS diffs: −7.72); critical value at 5%: −5.65</p> </div> ## Model Identification Above I concluded that the unemployment observations are not stationary; however, their first differences constitute a stationary time series. At this point I would like to investigate which (S)ARIMA(p, d, q) model is appropriate for my case. For this reason I will construct autocorrelation and partial autocorrelation diagrams in order to identify the ideal values of $p$ and $q$. Finally, it is worth recalling that in an ARIMA model, $d$ represents the order of differencing, which in our case is 1. ### Original Observations <aside class="margin-note"> <div id="tbl-results-acf-pacf-values-lv" class="chart-container" style="min-height:240px"> <p style="text-align:center;padding:1rem;color:var(--color-text-muted)">Table: ACF/PACF values for original series — Lag 0–11, ACF, pACF</p> </div> </aside> For completeness, we will start with the given unemployment observations. We already know that they are not stationary, but there are specific patterns that we need to observe in the autocorrelation diagrams in order to confirm this. The autocorrelation diagram decreases at an extremely slow rate, which is a strong indication of non-stationarity of the series. <div style="display:grid;grid-template-columns:1fr 1fr;gap:1rem;margin:1.5rem 0"> <div id="fig-acf-hc-lv" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:1.5rem;color:var(--color-text-muted)">Chart: ACF of original series (bar chart, 20 lags, red confidence bands)</p> </div> <div id="fig-pacf-hc-lv" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:1.5rem;color:var(--color-text-muted)">Chart: PACF of original series (bar chart, 20 lags)</p> </div> </div> ### First Difference <aside class="margin-note"> <div id="tbl-results-acf-pacf-values-d1" class="chart-container" style="min-height:240px"> <p style="text-align:center;padding:1rem;color:var(--color-text-muted)">Table: ACF/PACF values for first differences — Lag 0–11, ACF, pACF</p> </div> </aside> The non-stationarity of my data, confirmed countless times above, compels us to take the first differences and obtain the corresponding autocorrelation diagrams. I observe a rather difficult-to-interpret situation from the diagrams, as there are significant fluctuations in both diagrams that make it hard to decide on the appropriate ARIMA model. Normally, to determine the right lag orders I look for a sharp drop within the red region, but there are quite a few statistically significant exceedances. After the 10th lag this tendency diminishes or disappears significantly, so my model likely has parameters $p$ and $q$ somewhere between 1 and 10. <div style="display:grid;grid-template-columns:1fr 1fr;gap:1rem;margin:1.5rem 0"> <div id="fig-acf-hc-first-diffs" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:1.5rem;color:var(--color-text-muted)">Chart: ACF of first differences (bar chart, 20 lags)</p> </div> <div id="fig-pacf-hc-first-diffs" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:1.5rem;color:var(--color-text-muted)">Chart: PACF of first differences (bar chart, 20 lags)</p> </div> </div> ## Model Building ### Splitting the Time Series Forecasting future unemployment requires us to build a model. Established methods include ARIMA models and Exponential Smoothing (ETS) models. Simply running a model says very little unless we evaluate its parameters. More generally, in work where we try to build predictive models we want to evaluate the model's power, and for this reason we typically split our data into two parts. One part is used for training the model (train set) and the remainder for evaluating it (test set). The usual split is 70/30 or 80/20 respectively. This approach generally works well for classical machine learning problems such as classification and regression. ``` Full Dataset (1998–2022, 293 obs) ↙ ↘ Training (1998–2015) Evaluation (2015–2022) ``` However, if we have time series problems, things are not so straightforward. Let us first suppose we want to apply the same logic to these. We would then end up with a model that ignores one fifth of the observations, specifically the most recent ones, which may well influence the variability of the time series. For this reason, other methods have been proposed for time series. ``` Full Time Series (293 observations) ├─ 1st Fold: Training 1998–2010 / Forecast 2011 ├─ 2nd Fold: Training 1998–2011 / Forecast 2012 ├─ 3rd Fold: Training 1998–2012 / Forecast 2013 ├─ … └─ N-th Fold: Training 1998–2021 / Forecast 2022 ``` All of this may sound somewhat complicated. I have included an interactive application below for you to try for yourselves, so that the difference is clear and you can see how the various options affect the result. <style> @import url('https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;600&family=Sora:wght@300;500;700&display=swap'); :root, [data-bs-theme="light"] { --dropped: #dde1e7; --training: #2d3a4a; --forecast: #f59e0b; --cv-bg: transparent; --cv-text: #1a1f2e; --cv-subtext: #6b7280; --cv-border: #e0ddd8; --avail-bg: #e8e5e0; --avail-text: #9ca3af; --btn-active-bg: #1a1f2e; --btn-active-fg: #ffffff; --tooltip-bg: #1a1f2e; --tooltip-fg: #ffffff; --shadow-fc: rgba(245,158,11,.35); } [data-bs-theme="dark"] { --dropped: #2e3a4e; --training: #7fa8cc; --forecast: #f59e0b; --cv-bg: transparent; --cv-text: #e2e8f0; --cv-subtext: #94a3b8; --cv-border: #334155; --avail-bg: #1e2a3a; --avail-text: #475569; --btn-active-bg: #e2e8f0; --btn-active-fg: #1a1f2e; --tooltip-bg: #e2e8f0; --tooltip-fg: #1a1f2e; --shadow-fc: rgba(245,158,11,.45); } .cv-widget { font-family: 'Sora', sans-serif; color: var(--cv-text); background: var(--cv-bg); max-width: 100%; padding: 1.4rem 0 1rem; transition: color .3s, background .3s; } .cv-widget .mode-toggle { display: flex; gap: .5rem; margin-bottom: 1.3rem; } .cv-widget .mode-btn { padding: .32rem .8rem; border-radius: 20px; border: 1.5px solid var(--cv-border); background: transparent; font-family: 'Sora', sans-serif; font-size: .71rem; font-weight: 500; color: var(--cv-subtext); cursor: pointer; transition: all .2s; } .cv-widget .mode-btn.active { background: var(--btn-active-bg); border-color: var(--btn-active-bg); color: var(--btn-active-fg); } .cv-widget .controls { display: flex; gap: 1.2rem; margin-bottom: 1.6rem; flex-wrap: wrap; align-items: flex-end; } .cv-widget .ctrl-group { display: flex; flex-direction: column; gap: .28rem; } .cv-widget .ctrl-group label { font-size: .65rem; font-family: 'JetBrains Mono', monospace; font-weight: 600; color: var(--cv-subtext); text-transform: uppercase; letter-spacing: .08em; } .cv-widget .ctrl-group input[type=range] { -webkit-appearance: none; width: 130px; height: 4px; border-radius: 2px; background: var(--cv-border); outline: none; cursor: pointer; transition: background .3s; } .cv-widget .ctrl-group input[type=range]::-webkit-slider-thumb { -webkit-appearance: none; width: 14px; height: 14px; border-radius: 50%; background: var(--forecast); cursor: pointer; transition: transform .15s; } .cv-widget .ctrl-group input[type=range]::-webkit-slider-thumb:hover { transform: scale(1.25); } .cv-widget .ctrl-val { font-family: 'JetBrains Mono', monospace; font-size: .72rem; color: var(--forecast); font-weight: 600; } .cv-widget .axis-row { display: flex; align-items: center; margin-bottom: .45rem; padding-left: 62px; } .cv-widget .axis-label { font-size: .63rem; font-family: 'JetBrains Mono', monospace; color: var(--cv-subtext); text-transform: uppercase; letter-spacing: .07em; } .cv-widget .axis-arrow { flex: 1; height: 1px; background: linear-gradient(to right, var(--cv-border), var(--cv-text)); position: relative; margin: 0 .5rem; } .cv-widget .axis-arrow::after { content: ''; position: absolute; right: 0; top: -3px; border: 4px solid transparent; border-left: 7px solid var(--cv-text); transition: border-color .3s; } .cv-widget .axis-end { font-size: .63rem; font-family: 'JetBrains Mono', monospace; font-weight: 600; color: var(--cv-text); text-transform: uppercase; letter-spacing: .07em; } .cv-widget .folds { display: flex; flex-direction: column; gap: 8px; } .cv-widget .fold-row { display: flex; align-items: center; gap: .6rem; opacity: 0; transform: translateY(6px); animation: cvFadeUp .38s ease forwards; } @keyframes cvFadeUp { to { opacity: 1; transform: none; } } .cv-widget .fold-label { width: 54px; text-align: right; font-size: .67rem; font-family: 'JetBrains Mono', monospace; font-weight: 600; color: var(--cv-subtext); flex-shrink: 0; } .cv-widget .fold-bar { flex: 1; height: 38px; position: relative; } .cv-widget .seg { position: absolute; top: 0; height: 100%; border-radius: 3px; min-width: 3px; transition: left .42s cubic-bezier(.4,0,.2,1), width .42s cubic-bezier(.4,0,.2,1), background .3s; } .cv-widget .seg-dropped { background: var(--dropped); } .cv-widget .seg-training { background: var(--training); } .cv-widget .seg-forecast { background: var(--forecast); box-shadow: 2px 0 10px var(--shadow-fc); } .cv-widget .fold-bar:hover .fold-tooltip { opacity: 1; transform: translateX(-50%) translateY(-2px); } .cv-widget .fold-tooltip { position: absolute; bottom: calc(100% + 6px); left: 50%; transform: translateX(-50%) translateY(5px); background: var(--tooltip-bg); color: var(--tooltip-fg); padding: .28rem .55rem; border-radius: 4px; font-size: .62rem; font-family: 'JetBrains Mono', monospace; white-space: nowrap; pointer-events: none; opacity: 0; transition: opacity .18s, transform .18s, background .3s, color .3s; z-index: 20; } .cv-widget .avail-row { display: flex; align-items: center; gap: .6rem; margin-top: .7rem; } .cv-widget .avail-spacer { width: 54px; flex-shrink: 0; } .cv-widget .avail-bar { flex: 1; height: 20px; background: var(--avail-bg); border-radius: 3px; display: flex; align-items: center; justify-content: center; transition: background .3s; } .cv-widget .avail-bar span { font-size: .61rem; font-family: 'JetBrains Mono', monospace; font-weight: 600; color: var(--avail-text); text-transform: uppercase; letter-spacing: .07em; } .cv-widget .legend { display: flex; gap: 1.4rem; margin-top: 1.2rem; flex-wrap: wrap; } .cv-widget .legend-item { display: flex; align-items: center; gap: .38rem; font-size: .68rem; color: var(--cv-subtext); font-weight: 300; } .cv-widget .legend-swatch { width: 13px; height: 13px; border-radius: 2px; transition: background .3s; } </style> <div class="cv-widget" id="cvWidget"> <div class="mode-toggle"> <button class="mode-btn active" onclick="cvSetMode('expanding',this)">Expanding Window</button> <button class="mode-btn" onclick="cvSetMode('sliding',this)">Sliding Window</button> </div> <div class="controls"> <div class="ctrl-group"> <label>Folds</label> <input type="range" id="cvFolds" min="3" max="8" value="5" oninput="cvUpdate()" /> <span class="ctrl-val" id="cvFoldsVal">5</span> </div> <div class="ctrl-group"> <label>Forecast Horizon</label> <input type="range" id="cvHorizon" min="1" max="5" value="2" oninput="cvUpdate()" /> <span class="ctrl-val" id="cvHorizonVal">2</span> </div> <div class="ctrl-group" id="cvWindowWrap" style="display:none"> <label>Window Size</label> <input type="range" id="cvWinSize" min="3" max="10" value="6" oninput="cvUpdate()" /> <span class="ctrl-val" id="cvWinSizeVal">6</span> </div> </div> <div> <div class="axis-row"> <span class="axis-label">Time</span> <div class="axis-arrow"></div> <span class="axis-end">Today</span> </div> <div class="folds" id="cvFoldsContainer"></div> <div class="avail-row"> <div class="avail-spacer"></div> <div class="avail-bar"><span>Available Historical Data</span></div> </div> </div> <div class="legend"> <div class="legend-item" id="cvLegendDropped" style="display:none"> <div class="legend-swatch" style="background:var(--dropped)"></div> <span>Dropped</span> </div> <div class="legend-item"> <div class="legend-swatch" style="background:var(--training)"></div> <span>Model Training</span> </div> <div class="legend-item"> <div class="legend-swatch" style="background:var(--forecast)"></div> <span>Forecast</span> </div> </div> </div> <script> (function () { let cvMode = 'expanding'; function syncTheme() { const root = document.documentElement; const isDark = root.classList.contains('dark'); document.getElementById('cvWidget') .setAttribute('data-bs-theme', isDark ? 'dark' : 'light'); } syncTheme(); new MutationObserver(syncTheme).observe(document.documentElement, { attributes: true, attributeFilter: ['class'] }); window.cvSetMode = function(m, btn) { cvMode = m; document.querySelectorAll('.cv-widget .mode-btn').forEach(b => b.classList.remove('active')); btn.classList.add('active'); document.getElementById('cvWindowWrap').style.display = m === 'sliding' ? '' : 'none'; document.getElementById('cvLegendDropped').style.display = m === 'sliding' ? '' : 'none'; cvUpdate(); }; window.cvUpdate = function() { const nFolds = +document.getElementById('cvFolds').value; const horizon = +document.getElementById('cvHorizon').value; const winSize = +document.getElementById('cvWinSize').value; document.getElementById('cvFoldsVal').textContent = nFolds; document.getElementById('cvHorizonVal').textContent = horizon; document.getElementById('cvWinSizeVal').textContent = winSize; const MIN_TRAIN = 3; let total = 0; for (let i = 0; i < nFolds; i++) { const dropped = cvMode === 'expanding' ? 0 : i; const trainLen = cvMode === 'expanding' ? MIN_TRAIN + i : winSize; const end = dropped + trainLen + horizon; if (end > total) total = end; } const container = document.getElementById('cvFoldsContainer'); container.innerHTML = ''; for (let i = 0; i < nFolds; i++) { const dropped = cvMode === 'expanding' ? 0 : i; const trainLen = cvMode === 'expanding' ? MIN_TRAIN + i : winSize; const pDrop = (dropped / total) * 100; const pTrain = (trainLen / total) * 100; const pForecast = (horizon / total) * 100; const tip = cvMode === 'expanding' ? `Train ${trainLen} · Forecast ${horizon}` : `Drop ${dropped} · Train ${trainLen} · Forecast ${horizon}`; const row = document.createElement('div'); row.className = 'fold-row'; row.style.animationDelay = (i * 72) + 'ms'; row.innerHTML = ` <div class="fold-label">Fold ${i + 1}</div> <div class="fold-bar"> ${dropped > 0 ? `<div class="seg seg-dropped" style="left:0;width:${pDrop.toFixed(3)}%"></div>` : ''} <div class="seg seg-training" style="left:${pDrop.toFixed(3)}%;width:${pTrain.toFixed(3)}%"></div> <div class="seg seg-forecast" style="left:${(pDrop+pTrain).toFixed(3)}%;width:${pForecast.toFixed(3)}%"></div> <div class="fold-tooltip">${tip}</div> </div>`; container.appendChild(row); } }; cvUpdate(); })(); </script> ### Specifying ARIMA Models <aside class="margin-note"> <img src="/articles/forecasting-greek-unemployment/box.png" alt="George Box" /> **George E.P. Box** (United Kingdom, 1919–2013) was a statistician with very significant contributions to the field of time series. George Box, in collaboration with Gwylim Jenkins (United Kingdom, 1932–1982), created the methodology for ARIMA time series. In many textbooks this methodology is referred to as Box-Jenkins. Beyond this, in collaboration with the Finnish-American statistician Greta M. Ljung (Finland, 1941–2014), he created the test for autocorrelation of errors known as the Ljung-Box test. **Photo source:** Wikimedia Commons, 2011. Licence: CC BY-SA 3.0. </aside> For the simple ARIMA model I need to determine my three parameters. I know that $d = 1$, since I achieved stationarity with the first differences. For the determination of $p$ I suspect 1, 6, and 10, as there was a large sharp drop in the partial autocorrelation diagram of the first differences after these values. Finally, for $q$ I suspect 1, 3, 6, and 10. From the diagram I cannot draw a safe conclusion about their values, as there are quite a few significant autocorrelations at least up to the 10th lag. The graphical determination of $p$ and $q$ is quite subjective and provides uncertain conclusions. For this reason it would be good to evaluate a range of parameter combinations for the ARIMA time series models in order to find the model with the optimal parameters. <div id="tbl-aic-arima-models" class="chart-container" style="min-height:380px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: 15 ARIMA models comparison — Model name, BIC, AIC, AICc (sorted by BIC)</p> </div> Older guides or articles would likely have suggested using `auto.arima` to find the best ARIMA. Well, this article did exactly that until I realised that the `{fable}` library gives us the necessary information in a more organised way, so using the `model` command with various parameter combinations for ARIMA models gives us a ready-made table with the Akaike information criteria (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). But what should we be looking for? This is very important because, as you can see, each criterion gives a different model as the best. For example, based on the AIC and AICc criteria I should choose the ARIMA(3,1,2) model, while BIC proposes the ARIMA(1,1,1) model. In general, we look for the model that gives the smallest numbers on these criteria. The ideal model is ARIMA(1,1,1) as it achieves the best performance on the BIC criterion, has good performance on the AIC and AICc criteria with a small gap from the optimal suggestion on those, and finally the AIC criteria proposed models with a marginal improvement but with 5 parameters, compared to 2-parameter models that may generalise better to the data and the behaviour of the time series. ### Specifying the ETS Model The Exponential Smoothing method is an equally popular method for forecasting time series. Its great advantage is that it does not require the tests we performed for the ARIMA models. Despite its flexible methodology, we need to examine the time series graphically to observe its properties (trend, seasonality, cyclicality) so that I can specify the ETS model that suits our case. $$ETS(\text{Error}, \text{Trend}, \text{Seasonality})$$ The smoothing model I will use is based on these characteristics. <div id="tbl-aic-ets-models" class="chart-container" style="min-height:280px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: 9 ETS models comparison — Model name (A-N-N, A-A-N, …), BIC, AIC, AICc (sorted by BIC)</p> </div> The table shows that the ETS(M,A,N) model, that is, with multiplicative error, linear trend, and no seasonality, has the best performance on the BIC criterion. Very close behind is ETS(M,Ad,N), which differs only in that the trend damps gradually rather than continuing linearly. The absence of a seasonal component (N) from the best-performing models confirms the findings of the seasonality section: seasonality in Greek unemployment is weak and does not improve the models. By contrast, models with a seasonal component (A-A-A, M-Ad-M, A-A-M) sit at the bottom of the table, with the added complexity not offset by better fit. ### Prophet Model The solutions above are potentially quite complex and require deep theoretical knowledge as well as basic statistical programming skills. An easy alternative for quick time series forecasting is the Prophet algorithm. It was created by Facebook in 2017 and is a fairly popular forecasting method. The eponymous R package, with the `prophet()` command, gives us an estimate in a single line of code. One point worth noting is that it requires data in data frame format. For smooth application, you should rename your columns appropriately so that it understands which columns it needs to analyse. Unlike ARIMA and ETS models, Prophet does not require explicit stationarity testing or parameter specification because it automatically decomposes the time series into trend, seasonality, and holidays. This ease of use makes it popular in industry applications, but it does not guarantee better performance. In the comparison table that follows we will see how it performs relative to the classical models. ## Model Comparison Up to now I used the AIC and BIC criteria to select the best parameters within each model type, for example, which ARIMA is the best among the other ARIMAs. These criteria cannot, however, tell me whether an ARIMA is better than an ETS or Prophet, because each model type computes them in a different way and they are not directly comparable across types. For this reason I need a common basis for comparison. The logic is simple: each model makes forecasts for months for which I already know the actual unemployment figure, and I measure how far off it was. The smaller the error, the better the model. The metrics I will use are three: <aside class="margin-note"> $y_i$ : actual value, that is, the unemployment figure for the given month $\hat{y}_i$ : estimated value, that is, the unemployment figure forecast by my model $$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$$ </aside> $$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$ $$MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$ <div id="tbl-rmse-models-2mths" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: Model comparison for 2-month forecast — ~11 models (Auto ARIMA, ARIMAX, ARIMA, ETS, Prophet, Naive, SNaive), RMSE and MAE, sorted by RMSE</p> </div> <div id="tbl-rmse-models-6mths" class="chart-container" style="min-height:300px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: Model comparison for 6-month forecast — same models, RMSE and MAE, sorted by RMSE</p> </div> From the tables above it emerges that for short-term forecasts ($h \leq 6$ months) the ARIMAX model, which includes two exogenous variables representing the structural changes (breaks), performs best. The ARIMA(1,1,1) model also performs well for both the 2-month and 6-month forecasts. What stands out in our results is the significant difference in errors between the ARIMA models and the Naive model. This potentially indicates that our models identify the various patterns of the time series better, and therefore justify their existence. One point that requires particular attention is the differences in errors: while differences clearly exist, it should be investigated whether they are statistically significant. To study this, the Diebold-Mariano test will be used, which tests whether the errors of the compared models are statistically significant: <div id="tbl-db-test" class="chart-container" style="min-height:260px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: Diebold-Mariano tests — 6 model pair comparisons: DM statistic, p-value, conclusion (significant / no difference)</p> </div> According to the test, I found that the selected ARIMA models have statistically significant differences from the Naive model, demonstrating their value for forecasting future unemployment, at least in the short term. A further finding from the combinations of tests is that the selected ARIMA models do not have statistically significant differences among themselves. That is, the ARIMAX model using one or two breaks does not have significant differences at the 5% significance level from a simple ARIMA(1,1,1) model. ## Wait, Where Are the Assumptions? > Hey, where are the assumption tests? We want our money back. Well, I have some good news and some bad news. The good news is that not only has the article not ended, but I also need to check whether the best models satisfy the assumptions we discussed, namely that the errors must be: - uncorrelated ($Cov(\epsilon_t, \epsilon_{t-1}) = 0$), - homoskedastic ($Var(\epsilon_t) = \sigma^2$), and - follow the normal distribution. <div id="tbl-assumptions" class="chart-container" style="min-height:220px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: Model assumption tests — Ljung-Box, ARCH, Jarque-Bera for each of the 4 models, plus AIC and BIC</p> </div> The results are not ideal for any of the models. The Ljung-Box test rejects the null hypothesis of uncorrelated errors for all models ($p < 0.01$), indicating that the errors retain some structure that the models fail to fully explain. The ARCH test is also rejected everywhere, indicating heteroskedasticity (the variance of the errors is not constant over time), something expected in a time series that passed through a debt crisis and a pandemic. Finally, the Jarque-Bera test rejects normality, with extremely high test statistics suggesting heavy tails in the error distribution. These findings mean that the confidence intervals produced by the models should be interpreted with caution, as they may be narrower or wider than the true underlying uncertainty. However, the violation of these assumptions does not automatically invalidate the point forecasts. ARIMA models are known to produce reliable point estimates even when the distributional assumptions are not fully satisfied, particularly over a short-term horizon. The main implication concerns the quantification of uncertainty, not the direction of the forecast. ## Forecasting Future Unemployment All of this was somewhat tiring, but the moment has finally arrived for all of it to make sense. I found the best model and, using it, I will forecast the level of unemployment in Greece for the next 12 months. My data end in August 2022 (unemployment at 12.2%), so any estimate will be made for the months from September 2022 through to August 2023. <div id="fig-forecasting" class="chart-container" style="min-height:440px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Chart: Unemployment forecast — historical line (dark) + 6-month forecast (red dashed) with 80% and 95% confidence bands</p> </div> And the corresponding table with forecasts: <div id="tbl-results-predictions" class="chart-container" style="min-height:240px"> <p style="text-align:center;padding:2rem;color:var(--color-text-muted)">Table: 6-month unemployment forecasts — Month, Point Forecast, 80% CI, 95% CI</p> </div> Based on the best-performing model, the downward trend in Greek unemployment is expected to continue over the next six months. The point estimate for February 2023 is 11.4%, with an 80% confidence interval between 10.1% and 12.8% (and a 95% interval between 9.3% and 13.5%). If confirmed, this would represent a continuation of the steady deceleration that began after the historical high of 2013. However, it is worth highlighting certain limitations of this analysis. ARIMA models rest on the assumption that the future structure of the time series will resemble its historical one. An unforeseen event, such as a new geopolitical crisis or an energy shock, could overturn the forecast. In addition, unemployment is influenced by variables not included in the model, such as the trajectory of GDP, inflation, and fiscal policy. Finally, the confidence intervals should be interpreted with caution, given that the assumption tests revealed heteroskedasticity and non-normality in the errors. Despite these limitations, the direction of the forecast (continuation of the downward trend) is consistent across all models examined, which reinforces the credibility of the conclusion. Greece appears to be moving steadily towards lower unemployment, but remains at levels significantly above the European average (approximately 6% in the same month), a reminder that the full recovery from the 2010 crisis has not yet been completed. --- <div class="appendix"> Photo by <a href="https://pixabay.com/el/users/roszie-6000120/">Rosy / Bad Homburg / Germany</a> from <a href="https://pixabay.com/">Pixabay</a>. </div>