Introduction

In this article we build a machine learning model with the goal of predicting which customers of a bank are interested in opening a term deposit account. For this purpose we compare seven classification algorithms, with emphasis on Boosting models (XGBoost and LightGBM). The data come from the “UCI Machine Learning Repository” and specifically from the Bank Marketing dataset.

Before the analysis, let us define some basic concepts.

What is a term deposit?

It is a type of bank account where the customer commits to not making withdrawals for a predetermined period of time (e.g., one year). In return, the bank offers higher interest rates compared to regular savings or current accounts.

For reference:

  • Piraeus Bank offers double the interest rate on its term deposit accounts.
  • Eurobank offers zero interest on current accounts, 0.01%–0.35% on savings accounts, and 0.1%–1% on term deposits, depending on the program and deposit amount.
  • According to a recent report by the Bank of Greece, term deposit interest rates range between 1.2% and 1.4%, compared to just 0.03% for typical household accounts.

The appropriate profile for these products generally involves individuals with significant savings balances and without heavy financial obligations (loans, overdue debts).

Prerequisites

Loading Libraries

For this analysis we will need standard R libraries for importing data, via the {readr} package, and formatting it with the {dplyr} package. The {kableExtra} package is a useful addition for printing results in table format. An important part of the workflow is data visualization. I initially used the {ggplot2} package to create charts, which is limiting for a website since ggplot2 produces static graphics. So for my articles I use the {highcharter} package, which enables responsive charts that work well on all screen sizes. Finally, this analysis aims to classify bank customers based on their interest in a banking product, so the {tidymodels} package is essential, as we will need a classification model.

# General data processing
library(readr)
library(dplyr)
library(forcats)
library(tidyr)
library(glue)

# Results presentation
library(kableExtra)
library(reactable)
library(gt)

# Interactive charts
library(highcharter)

# Machine learning models
library(tidymodels)
library(bonsai)    # LightGBM via tidymodels
library(themis)    # SMOTE for imbalanced data
library(stacks)    # Ensemble stacking
library(probably)
library(discrim)

# Individual algorithms
library(kknn)
library(ranger)
library(naivebayes)
library(kernlab)
library(vip)       # Variable importance

Importing Data

After loading the necessary libraries, we need to import our data. The dataset is available in a larger and a more compact version, which differ only in the number of observations. For this article I will choose the more compact version, since fitting Boosting models is particularly time-consuming compared to simpler classification models (e.g., Logistic Regression, k-Nearest Neighbors).

bank_dataset <- read_delim("bank_dataset_files/bank.csv",
                           delim = ";",
                           escape_double = FALSE,
                           trim_ws = TRUE)

bank_dataset <- bank_dataset %>% tibble::rowid_to_column("ID")

Data Preview

Below we present a small sample of the dataset (the first 6 observations) to understand its structure and variable types.

Before performing any analysis it is good to identify the type of data we have available. In general, variables can be classified based on the values they take as follows:

  • Quantitative: Discrete or Continuous
  • Qualitative: Categorical or Ordinal

: Variable summary

| Variable | Type | Description |
|---|---|---|
| Age | quantitative (continuous) | Individual’s age |
| Job | qualitative (categorical) | Employment sector |
| Marital | qualitative (categorical) | Marital status |
| Education | qualitative (ordinal) | Highest level of education |
| Default | qualitative (categorical) | Has credit in default? |
| Balance | quantitative (continuous) | Average annual account balance (in euros) |
| Housing | qualitative (categorical) | Has a housing loan? |
| Loan | qualitative (categorical) | Has a personal loan? |
| Contact | qualitative (categorical) | Contact method |
| Day | quantitative (discrete) | Day of the month of most recent contact |
| Month | qualitative (ordinal) | Month of most recent contact |
| Duration | quantitative (continuous) | Duration (in seconds) of last call |
| Campaign | quantitative (discrete) | Number of contacts during this campaign |
| pdays | quantitative (discrete) | Days since last contact from a previous campaign |
| previous | quantitative (discrete) | Number of contacts before this campaign |
| poutcome | qualitative (categorical) | Outcome of previous marketing campaign |
| Deposit | qualitative (categorical) | Did the client subscribe to a term deposit? |

The dataset consists of 17 variables (columns), of which 7 are quantitative and the remaining 10 are qualitative. Of the qualitative variables, 8 are categorical and only two are ordinal (month of contact and education level).

Defining Functions

Ok, we have seen some basic characteristics of the data and their structure. Can I now start my analysis?

It depends. If we want a quick analysis to extract a specific result, it is probably fine. However, most of the time a more careful study design is required. A common mistake I have made in the past is repeating the same procedures over and over. Writing reusable functions avoids writing the same things multiple times.

We define two functions. First, univariateQualitativePlot, which is used for creating pie charts and bar charts for the qualitative variables:

univariateQualitativePlot <- function(data, column, title, subtitle,
                                      chart_type = c("bar", "pie")) {

  # Accept only "bar" or "pie" (default "bar"); anything else raises an
  # error instead of silently returning NULL
  chart_type <- match.arg(chart_type)

  # Frequency table: one row per category, sorted by descending count
  freq_table <- data %>%
    count({{ column }}, name = "Frequency") %>%
    arrange(desc(Frequency)) %>%
    rename(Variable = {{ column }}) %>%
    mutate(pct = round((Frequency / sum(Frequency) * 100), digits = 1))

  # Options shared by both chart types; the legend is shown only for pies
  hc <- highchart() %>%
    hc_title(text = title) %>%
    hc_subtitle(text = subtitle) %>%
    hc_tooltip(pointFormat = "{point.name}: {point.y}") %>%
    hc_legend(enabled = chart_type == "pie")

  hc <- if (chart_type == "bar") {
    hc %>%
      hc_chart(type = "bar") %>%
      hc_xAxis(categories = freq_table$Variable, title = list(text = "Category")) %>%
      hc_yAxis(title = list(text = "Frequency")) %>%
      hc_series(list(name = "Frequency", data = freq_table$Frequency))
  } else {
    # Pie slices need one list(name, y) entry per category
    pie_data <- lapply(seq_len(nrow(freq_table)), function(i) {
      list(name = freq_table$Variable[i], y = freq_table$Frequency[i])
    })
    hc %>%
      hc_chart(type = "pie") %>%
      hc_series(list(name = "Frequency", data = pie_data))
  }

  return(hc)
}

Similarly, we define univariateQuantitativePlot for building histograms for our quantitative variables. Both functions build interactive charts using the {highcharter} package.

Descriptive Analysis

Missing Values

The dataset contains no missing values, which is a rare and ideal situation. Had there been any, we would need to impute them using some estimation method.
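A check of this kind can be sketched in a few lines of base R. The toy data frame below is made up and only stands in for bank_dataset. It also counts the "unknown" labels, on the assumption that some categorical columns (such as contact and poutcome) use that label instead of NA, so a zero NA count does not necessarily mean that every value is informative.

```r
# Toy data frame standing in for bank_dataset (hypothetical values)
toy <- data.frame(
  age      = c(30, 45, NA, 52),
  contact  = c("cellular", "unknown", "telephone", "unknown"),
  poutcome = c("unknown", "success", "unknown", "failure"),
  stringsAsFactors = FALSE
)

# Total number of NA cells and the NA count per column
total_na  <- sum(is.na(toy))
na_by_col <- colSums(is.na(toy))

# Share of "unknown" labels in each character column
is_text       <- sapply(toy, is.character)
unknown_share <- sapply(toy[is_text], function(col) mean(col == "unknown"))

total_na
na_by_col
unknown_share
```

On the real data the same three expressions could be applied to bank_dataset directly.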

Univariate Analysis

Next it is important to study our variables, their values and their distributions. This is a critical step to understand the sample and to take additional context into account when building the model.

Regarding the employment sector, the sample shows significant participation from individuals with jobs likely associated with higher education and consequently higher earnings, such as management executives, administrative employees, and entrepreneurs. About 40% of the bank’s customers work in blue-collar jobs, which are most often associated with a reduced willingness to commit capital. Finally, in the bank’s customer base there is a share of around 10% from population groups who, for various reasons, would not benefit from creating a term deposit account: the unemployed, students who are in a financially vulnerable period with high expenses and limited income, and retirees who may need to cover unexpected healthcare costs. For retirees specifically, who make up 5% of customers, there are other financial instruments available for securing their retirement, such as reverse mortgages.

Another available piece of information is marital status, which may be associated with increased household needs and expenses. A married customer may have higher household expenses (e.g., due to children). Alternatively, married individuals may also have greater financial stability. In general, the interpretation of this indicator is not clear-cut a priori. In any case, in the sample under examination about 60% of individuals are married, one quarter are single, and the rest are divorced.

A variable that, at least intuitively, can be among the most important is the customer’s highest level of education. It is logical that someone with higher-level education can work in jobs requiring specialization that is rewarded accordingly given its reduced supply. This indirectly determines professional opportunities and consequently the salary one can command and the amount left over for savings. In this particular dataset only 30% have university-level education.

Beyond education, which is a strong indicator, the customer’s financial obligations also play an important role. These can be examined through three variables:

  • whether the customer has credit in default
  • whether the customer has a housing loan
  • whether the customer has a personal loan

It is obvious that a customer with outstanding debts is more likely to pay off what they owe than to consider saving. In the sample only 76 individuals, corresponding to 1.7% of customers, have credit in default, which is encouraging for the vast majority.

Furthermore, a significant share of the bank’s customers already carry substantial obligations, both short-term and long-term. More than half have taken out a housing loan, which represents a fixed, long-term commitment. On the other hand, one could argue that having a housing loan means the household has already budgeted for housing costs rather than paying rent. What may be the more important barrier is the presence of a personal or consumer loan. These loans are typically taken out to cover short-term unexpected needs and are known for their high interest rates. In our case about 15% have a personal loan, which is mildly encouraging since it leaves 85% without this particular obstacle.

Another piece of information available is the contact method used to reach customers. More than half (64%) have indicated a mobile phone as their preferred contact method.

Another interesting variable is the last month in which a customer was contacted. Most recent contacts appear to have occurred during the summer months. This detail requires careful interpretation, however, as the data may have been collected in September, which would naturally make summer months the most recent for most customers.

According to the data and its description, the bank had previously run similar campaigns promoting term deposit accounts. As a result, we have records of customers who subscribed in previous campaigns. The previous campaigns resulted in 129 customers having subscribed, while the status of many others is unknown.

The bank’s customers are predominantly younger, with the vast majority under 60. The histogram appears to have a bell shape resembling a normal distribution, but with a slight positive skew.

Another variable of interest is the duration of the last call. Intuitively, a very short call suggests the customer is not interested, while a longer interaction between the agent and the customer likely signals greater interest in the banking product.

Bivariate Analysis

In the previous sub-section we examined some basic descriptive statistics per variable, giving us a sense of the bank’s customer base. Looking at these figures individually may not be sufficient to draw meaningful conclusions, and a bivariate analysis becomes necessary.

An important comparison is the customer’s employment sector against their final decision. The results show that blue-collar workers have the lowest positive response rate, while retirees have the highest.

These results may have been partially expected when considering the likely financial situation of each group. The case of retirees is particularly interesting: while we initially assumed they would not be an ideal target due to potential unexpected expenses, the data show that they respond positively at a higher rate. This can be explained by the fact that many retirees have a stable income with few new financial obligations, making them suitable candidates for committing capital.

There are also variables where the answer is less obvious, such as marital status. The data shows that individuals who are alone (either single or divorced) subscribe proportionally more than married individuals.

A summary of the above can also be visualised with a Sankey diagram. The first column shows the distribution across marital statuses, the second the educational background, and the third the customer’s final response.

custom_df <-
  tibble(
    r = bank_dataset$education,
    t = bank_dataset$marital,
    m = bank_dataset$y
  )

df_sankey <- custom_df %>%
  group_by(r, t, m) %>%
  summarise(weight = n(), .groups = "drop")

links2 <- df_sankey %>%
  group_by(t, r) %>%
  summarise(weight = sum(weight), .groups = "drop") %>%
  rename(from = t, to = r)

links3 <- df_sankey %>%
  group_by(r, m) %>%
  summarise(weight = sum(weight), .groups = "drop") %>%
  rename(from = r, to = m)

links_all <- bind_rows(links2, links3)

highchart() %>%
  hc_chart(type = "sankey") %>%
  hc_title(text = "Marital Status -> Education -> Outcome") %>%
  hc_subtitle(text = "Flow of customers by combination of demographic characteristics and final decision") %>%
  hc_add_series(
    keys = c("from", "to", "weight"),
    data = list_parse(links_all),
    name = "Flow"
  ) %>%
  hc_tooltip(pointFormat = "{point.from} -> {point.to}: <b>{point.weight}</b>")

I also wanted to compare the difference between previous and current campaign outcomes. The bank retains significant loyalty from previous subscribers, with 64% of those who had previously agreed to open a term deposit doing so again in the new campaign. The key figure in this bivariate analysis is the rate at which those who previously declined were converted. This rate approaches 13%, which is a reasonably satisfying result given that they had previously refused.
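Conversion rates like these are row-wise proportions of a two-way table. The following is a minimal base R sketch on made-up data; the toy values are hypothetical and do not reproduce the 64% and 13% figures, which come from the real dataset.

```r
# Hypothetical toy data: previous campaign outcome vs. current response
toy <- data.frame(
  poutcome = c("success", "success", "success",
               "failure", "failure", "failure", "failure",
               "unknown", "unknown", "unknown"),
  y        = c("yes", "yes", "no",
               "no", "no", "no", "yes",
               "no", "no", "no")
)

# Cross-tabulate, then divide each row by its row total (margin = 1)
counts <- table(previous = toy$poutcome, current = toy$y)
rates  <- prop.table(counts, margin = 1)

round(rates, 2)
```

Each row of rates sums to 1, so the "yes" column reads directly as the share of each previous-outcome group that subscribed in the current campaign.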

Building the Model

In R there are two widely used frameworks for building models: caret and tidymodels. On one hand, the caret package is fairly easy to use and has a large body of guides and tutorials. On the other hand, tidymodels is an all-in-one solution, a meta-package that provides a comprehensive workflow, though with somewhat less documentation since it is more recent.

Splitting the Dataset

The first step is to split the original dataset. In this analysis we use a three-way split:

  • Training set (bank_train): used for training all models and cross-validation.
  • Validation set (bank_val): a small portion of the training set held aside exclusively for selecting the optimal classification threshold for the Stack Ensemble.
  • Test set (bank_test): used only for the final evaluation, and not touched at any other stage.

set.seed(123)

# Main split: 75% train, 25% test
bank_dataset_split <- initial_split(bank_dataset,
                                    prop   = 0.75,
                                    strata = y)
bank_trainval <- training(bank_dataset_split)
bank_test     <- testing(bank_dataset_split)

# Secondary split: from trainval we produce
# bank_train (80%) and bank_val (20%)
set.seed(123)
trainval_split <- initial_split(bank_trainval,
                                prop   = 0.80,
                                strata = y)
bank_train <- training(trainval_split)
bank_val   <- testing(trainval_split)

The split results in:

| Subset | Observations | Purpose |
|---|---|---|
| Full dataset | 4,521 | |
| bank_trainval | 3,390 | Train + validation |
| bank_test | 1,131 | Final evaluation only |
| bank_train | 2,712 | Model training & cross-validation |
| bank_val | 678 | Stack Ensemble threshold selection |

The bank_val set contains 678 observations, enough to give a reliable threshold estimate without removing a significant portion of the training data.

Data Preprocessing

Building models is not straightforward. Between splitting the dataset and fitting the models comes the data preprocessing stage. The steps involved are not fixed and vary depending on the type of problem and the model chosen. Fortunately, the {tidymodels} package provides ready-made tools to make this stage easier. There are also additional packages that address common dataset issues. For example, our data are expected to be imbalanced, as the vast majority of customers did not subscribe to a term deposit. A dataset is considered imbalanced when the target variable has a large disparity between classes (e.g., 90% “No” / 10% “Yes”). In this case we use the step_smote() function from the {themis} package to balance the target variable.
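The core idea of SMOTE is to create synthetic minority-class rows by interpolating between a minority observation and one of its nearest minority neighbours. The following is a minimal base R sketch of that idea only, not the {themis} implementation; the function name smote_sketch, its arguments, and the toy data are all hypothetical, and k = 5 neighbours is an assumed setting.

```r
# Illustrative SMOTE-style oversampling for numeric predictors
smote_sketch <- function(x_minority, n_new, k = 5) {
  x_minority <- as.matrix(x_minority)
  n <- nrow(x_minority)
  k <- min(k, n - 1)

  # Pairwise Euclidean distances; a point is never its own neighbour
  distances <- as.matrix(dist(x_minority))
  diag(distances) <- Inf

  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(x_minority),
                      dimnames = list(NULL, colnames(x_minority)))

  for (i in seq_len(n_new)) {
    base       <- sample.int(n, 1)                         # random minority row
    neighbours <- order(distances[base, ])[seq_len(k)]     # its k nearest rows
    partner    <- neighbours[sample.int(k, 1)]             # pick one of them
    u          <- runif(1)                                 # position on the segment
    synthetic[i, ] <- x_minority[base, ] +
      u * (x_minority[partner, ] - x_minority[base, ])
  }
  synthetic
}

set.seed(123)
minority <- cbind(
  age     = c(25, 31, 40, 47, 52, 60, 63, 70),
  balance = c(500, 1200, 300, 2500, 4000, 800, 1500, 3200)
)
synthetic <- smote_sketch(minority, n_new = 20)
head(synthetic)
```

Because every synthetic row lies on a segment between two real minority rows, it never falls outside the range of the observed minority data. The distances here are computed on unscaled toy values purely for illustration.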

# Recipe for tree-based models (RF, XGBoost, LightGBM)
tree_recipe <- recipe(y ~., data = bank_train) %>%
  step_rm(poutcome, ID, duration) %>%
  step_corr(all_numeric(), threshold = 0.75) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_smote(y)

# Recipe for the remaining models (Logistic Regression, KNN, Naive Bayes, SVM)
linear_recipe <- recipe(y ~., data = bank_train) %>%
  step_rm(poutcome, ID, duration) %>%
  step_corr(all_numeric(), threshold = 0.75) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_smote(y)

It is worth noting that the duration variable (call duration) was deliberately removed from the models. Its value is only known after the call has ended, that is, after the event we are trying to predict. Including it would create a data leakage problem, making the model artificially accurate but practically useless. The poutcome variable was also removed, for a different reason: it concerns previous campaigns for which data are unavailable for the large majority of customers, limiting its usefulness as a predictor.

Cross-Validation

Ok, is it time to build our model now?

Not so fast. Theoretically we could proceed, but the recommended approach is not to rely on a single two-way split, since the evaluation results depend heavily on how that split happened. To get a more reliable estimate of model performance, we build subsets of our data to estimate, on average across 5 or 10 sub-samples, which parameters consistently lead to better accuracy.

Cross-validation is a technique in which the training set is divided into k equal subsets (folds). In each iteration, one fold is used as the validation set and the remaining k-1 folds are used for training. The process is repeated k times and the results are averaged. In our case we used 5-fold cross-validation with stratification, so that the ratio of subscribers to non-subscribers is preserved in each fold.

set.seed(123)
cv_folds <- vfold_cv(bank_train, v = 5, strata = y)

ctrl_grid  <- control_stack_grid()
ctrl_bayes <- control_stack_bayes()
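To make the fold mechanics concrete, here is a minimal base R sketch of stratified fold assignment on a made-up outcome vector. It illustrates the idea only and is not how vfold_cv() is implemented internally; the toy class counts (89 "no", 11 "yes") are hypothetical.

```r
set.seed(123)

# Hypothetical imbalanced outcome: 89 "no" and 11 "yes"
y <- factor(c(rep("no", 89), rep("yes", 11)))
k <- 5

# Assign folds within each class separately so that every fold
# receives roughly the same share of "yes" cases
fold_id <- integer(length(y))
for (cls in levels(y)) {
  idx <- which(y == cls)
  idx <- idx[sample.int(length(idx))]               # shuffle within the class
  fold_id[idx] <- rep_len(seq_len(k), length(idx))  # deal out folds 1..k in turn
}

# Class counts per fold, and the "yes" share in each fold
fold_table <- table(fold = fold_id, outcome = y)
yes_share  <- tapply(y == "yes", fold_id, mean)

fold_table
round(yes_share, 3)
```

In each of the k iterations, the rows with fold_id equal to the current fold would act as the validation set and all other rows as the training set.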

Defining the Models

With the {parsnip} package we can define the characteristics of various models. In this case we evaluate seven different classification models: Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, Naive Bayes, SVM, XGBoost, and LightGBM.

# --- Logistic Regression ---
log_reg_model <- parsnip::logistic_reg(
  penalty = tune(),
  mixture = tune()
) %>%
  set_engine("glmnet") %>%
  set_mode("classification")

# --- K-Nearest Neighbors ---
knn_model <- parsnip::nearest_neighbor(
  neighbors   = tune(),
  weight_func = tune()
) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# --- Random Forest ---
rf_model <- parsnip::rand_forest(
  trees = 200,
  mtry  = tune(),
  min_n = tune()
) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# --- Naive Bayes ---
nb_model <- naive_Bayes(
  smoothness = tune(),
  Laplace    = tune()
) %>%
  set_engine("naivebayes") %>%
  set_mode("classification")

# --- SVM ---
svm_model <- parsnip::svm_linear(
  cost = tune()
) %>%
  set_engine("kernlab") %>%
  set_mode("classification")

# --- XGBoost ---
xgb_model <- parsnip::boost_tree(
  trees      = 200,
  min_n      = tune(),
  learn_rate = tune(),
  tree_depth = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

# --- LightGBM ---
lgbm_model <- parsnip::boost_tree(
  trees      = 200,
  min_n      = tune(),
  learn_rate = tune(),
  tree_depth = tune()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification")

log_wf  <- workflow() %>% add_recipe(linear_recipe) %>% add_model(log_reg_model)
knn_wf  <- workflow() %>% add_recipe(linear_recipe) %>% add_model(knn_model)
rf_wf   <- workflow() %>% add_recipe(tree_recipe)   %>% add_model(rf_model)
nb_wf   <- workflow() %>% add_recipe(linear_recipe) %>% add_model(nb_model)
svm_wf  <- workflow() %>% add_recipe(linear_recipe) %>% add_model(svm_model)
xgb_wf  <- workflow() %>% add_recipe(tree_recipe)   %>% add_model(xgb_model)
lgbm_wf <- workflow() %>% add_recipe(tree_recipe)   %>% add_model(lgbm_model)

Fitting the Models

Having defined the models and their corresponding workflows, we proceed to hyperparameter optimization. For the simpler models (Logistic Regression, KNN, Naive Bayes, SVM, and Random Forest) we use grid search (tune_grid()). For the gradient boosting models (XGBoost and LightGBM), I opt for Bayesian optimization (tune_bayes()). In this approach, the algorithm learns from previous trials and selects the next parameter combination more intelligently, saving time compared to an exhaustive search.

set.seed(123)

log_results <- tune_grid(
  log_wf, resamples = cv_folds, grid = 10,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_grid
)

knn_results <- tune_grid(
  knn_wf, resamples = cv_folds, grid = 10,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_grid
)

rf_results <- tune_grid(
  rf_wf, resamples = cv_folds, grid = 15,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_grid
)

nb_results <- tune_grid(
  nb_wf, resamples = cv_folds, grid = 10,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_grid
)

svm_results <- tune_grid(
  svm_wf, resamples = cv_folds, grid = 10,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_grid
)

# XGBoost & LightGBM -- tune_bayes
xgb_results <- tune_bayes(
  xgb_wf, resamples = cv_folds, initial = 5, iter = 25,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_bayes
)

lgbm_results <- tune_bayes(
  lgbm_wf, resamples = cv_folds, initial = 5, iter = 25,
  metrics = metric_set(roc_auc, accuracy), control = ctrl_bayes
)

Beyond evaluating each model individually, we also apply a technique called stacking. Instead of selecting a single model as the final one, stacking combines the predictions of multiple models in a meta-model that learns how much weight to give to each of them. We implement this using the {stacks} package:

bank_stack_v1 <- stacks() %>%
  add_candidates(log_results)  %>%
  add_candidates(knn_results)  %>%
  add_candidates(rf_results)   %>%
  add_candidates(xgb_results)  %>%
  add_candidates(lgbm_results)

bank_stack_v2 <- bank_stack_v1 %>%
  add_candidates(nb_results) %>%
  add_candidates(svm_results)

set.seed(123)
bank_stack_model_v1 <- bank_stack_v1 %>%
  blend_predictions(penalty = 10^(-2:0), metric = metric_set(roc_auc))

bank_stack_model_v2 <- bank_stack_v2 %>%
  blend_predictions(penalty = 10^(-2:0), metric = metric_set(roc_auc))

bank_stack_fit_v1 <- bank_stack_model_v1 %>% fit_members()
bank_stack_fit_v2 <- bank_stack_model_v2 %>% fit_members()
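The blending step can be illustrated with a small base R sketch. It uses simulated out-of-fold probabilities from two hypothetical base models and an ordinary logistic regression as the meta-model. This is a simplified stand-in: the penalty argument above indicates that {stacks} fits a regularised meta-model, which plain glm() does not do.

```r
set.seed(123)
n <- 500

# Simulated truth (1 = subscribed) and hypothetical out-of-fold
# probabilities from two base models; model B is the noisier one
truth     <- rbinom(n, 1, 0.3)
p_model_a <- plogis(-1 + 2 * truth + rnorm(n, sd = 1))
p_model_b <- plogis(-1 + 2 * truth + rnorm(n, sd = 2))

oof <- data.frame(truth, p_model_a, p_model_b)

# Meta-model: logistic regression on the base-model predictions
meta <- glm(truth ~ p_model_a + p_model_b, data = oof, family = binomial())

# The coefficients act as blending weights; the fitted values are the
# stacked probabilities
coef(meta)
stacked_prob <- predict(meta, newdata = oof, type = "response")
summary(stacked_prob)
```

The meta-model is trained on out-of-fold predictions rather than in-sample ones, so that base models are not rewarded for memorising the training data.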

Note on the Stack Ensemble threshold

Unlike individual models (LightGBM, XGBoost, Logistic Regression) for which we can draw out-of-fold predictions from cross-validation, the Stack Ensemble does not have equivalent predictions. For this reason we created bank_val earlier: a subset of the training set that the model has not seen during training, used exclusively for threshold selection in the next section.

Threshold Selection

Each model produces for each customer a probability of interest, not a direct decision. To move from probability to classification (“Yes” / “No”) we need a threshold: if the probability exceeds it, the customer is classified as interested.

The default value of 0.5 is rarely the optimal choice for imbalanced data. In our case only about 11.5% of customers belong to the “Yes” class. Instead, we select the threshold that maximizes the F1 score, which balances the ability to detect interested customers (recall) with the reliability of positive predictions (precision).

A common mistake is to find the threshold on the same test set used for final evaluation. To avoid this we follow different approaches depending on the model:

  • For individual models (LightGBM, XGBoost, Logistic Regression) we use the out-of-fold (OOF) predictions from cross-validation.
  • For the Stack Ensemble we use bank_val, which was set aside from the beginning exclusively for this purpose.

It is worth noting that thresholds differ considerably across models. This does not mean that any model is “wrong”: each model simply calibrates its probabilities differently. What matters is that every threshold was found on data that the corresponding model had not used during training.
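The F1-maximising search described in this section can be sketched in base R as follows. The probabilities and labels are made up, and f1_at_threshold is a hypothetical helper name; in the article's workflow the inputs would be the out-of-fold predictions or the predictions on bank_val.

```r
# F1 score of the "interested" class at a given probability threshold
f1_at_threshold <- function(prob, truth, threshold) {
  pred <- prob >= threshold
  tp <- sum(pred & truth)
  fp <- sum(pred & !truth)
  fn <- sum(!pred & truth)
  if (tp == 0) return(0)          # no true positives: define F1 as 0
  2 * tp / (2 * tp + fp + fn)
}

# Hypothetical predicted probabilities and true labels (TRUE = subscribed)
prob  <- c(0.03, 0.08, 0.12, 0.22, 0.31, 0.37, 0.44, 0.58, 0.72, 0.91)
truth <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

# Evaluate a grid of candidate thresholds and keep the best one
thresholds <- seq(0.05, 0.95, by = 0.05)
f1_scores  <- vapply(thresholds,
                     function(t) f1_at_threshold(prob, truth, t),
                     numeric(1))
best_threshold <- thresholds[which.max(f1_scores)]

data.frame(threshold = thresholds, f1 = round(f1_scores, 3))
best_threshold
```

In this toy example the best F1 (0.8) is reached at a threshold well below 0.5, which mirrors the point made above about imbalanced data. When several thresholds tie, which.max() keeps the lowest one.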

Results

Variable Importance

The variable importance analysis from the LightGBM model reveals that the most decisive factors for predicting interest in a term deposit are the presence of a housing loan (housing_Yes, 15.3%) and an unknown contact method (contact_Unknown, 14.0%), followed by married marital status (marital_Married, 11.7%). Noteworthy is also the contribution of the contact month, with May appearing as the most important month (8.2%), while education level also carries meaningful weight. These results largely align with the findings of the descriptive analysis.

| Variable | Importance (%) |
|---|---|
| Housing Loan | 15.3 |
| Unknown Contact Method | 14.0 |
| Married | 11.7 |
| Month: May | 8.2 |
| Tertiary Education | 7.1 |
| Secondary Education | 6.4 |
| Month: July | 5.8 |
| Month: August | 5.3 |
| Occupation: Blue Collar | 4.9 |
| Month: November | 4.6 |

Model Comparison

Before moving on to the final evaluation, it is worth comparing graphically the predictive ability of the four selected models using ROC curves (Receiver Operating Characteristic curves). A ROC curve plots sensitivity against the false positive rate across all possible thresholds. The closer the curve is to the upper left corner, the better the model’s overall predictive ability. The diagram clearly shows the superiority of the Stack Ensemble models over the other two.
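How a ROC curve and its area (AUC) are obtained from predicted probabilities can be sketched in base R. The data are made up and the helper names roc_points and auc_trapezoid are hypothetical; in practice {yardstick} functions such as roc_curve() and roc_auc() do this work.

```r
# One (FPR, TPR) point per distinct threshold, from strictest to loosest
roc_points <- function(prob, truth) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  tpr <- vapply(thresholds,
                function(t) sum(prob >= t & truth) / sum(truth),
                numeric(1))
  fpr <- vapply(thresholds,
                function(t) sum(prob >= t & !truth) / sum(!truth),
                numeric(1))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoid rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Hypothetical predicted probabilities and true labels (TRUE = subscribed)
prob  <- c(0.03, 0.08, 0.12, 0.22, 0.31, 0.37, 0.44, 0.58, 0.72, 0.91)
truth <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

roc <- roc_points(prob, truth)
auc <- auc_trapezoid(roc)

roc
auc
```

For this toy input the AUC is 0.875: of the 24 possible (subscriber, non-subscriber) pairs, 21 are ranked correctly.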

Overall, we observe that accuracy alone is misleading. XGBoost, which achieves one of the highest accuracy scores, has a near-zero F1. This happens because accuracy primarily rewards correct negative predictions, and in our imbalanced dataset, predicting “No” for the vast majority is easy. F1 is the more honest criterion here, penalizing false positives and false negatives equally. It is also worth noting the sensitivity of LightGBM: it identifies 51% of truly interested customers, a rate that the Stack Ensemble models do not match.

ModelAccuracyF1SensitivitySpecificityPPV
Stack Ensemble (v1)highest
Stack Ensemble (v2)
LightGBM51%
Random Forest
Naive Bayes
Logistic Regression
SVM
XGBoost~0

Conclusions

The analysis reveals a different ranking from what the F1 score might have suggested. Stack Ensemble v1 emerges as the most efficient option: with only 127 calls it achieves a success rate per call of 40.2%, meaning 4 in every 10 contacts result in a subscription. If the constrained resource is call center time, this model is the most respectful of it. The trade-off, however, is 80 missed subscribers. Of the 131 actually interested customers, we identified 51.

Interestingly, the second ensemble model (Stack Ensemble v2) does not outperform the first, despite being a combination of seven individual models compared to the five in v1. It does produce a smaller total number of contacts, but at a lower efficiency rate relative to v1. Its main advantage is that it generates the fewest incorrect contacts among the top-performing models.

LightGBM, on the other hand, identifies 67 subscribers, the most of any model, but requires 204 calls to do so, at a success rate of 32.8%. That is about 60% more calls than Stack Ensemble v1 to gain 16 additional true positives. Also notable are the results of Naive Bayes, which comes surprisingly close to the performance of the ensemble models. Finally, the most striking result is XGBoost, which correctly predicted only 3 out of 131 actual positive cases.

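| Model | True Positives (TP) | False Positives (FP) | Missed Subscribers (FN) | Total Calls | Success Rate (%) |
|---|---|---|---|---|---|
| Stack Ensemble (v1) | 51 | 76 | 80 | 127 | 40.2 |
| Stack Ensemble (v2) | | | | | |
| LightGBM | 67 | 137 | 64 | 204 | 32.8 |
| Naive Bayes | | | | | |
| Random Forest | | | | | |
| Logistic Regression | | | | | |
| SVM | | | | | |
| XGBoost | 3 | | | | |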
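The evaluation metrics used in this article can all be derived from these confusion counts. The sketch below is a minimal base R illustration; classification_metrics is a hypothetical helper name, and it assumes the counts refer to the 1,131-observation test set, so that the true negatives can be obtained by subtraction.

```r
# Metrics from confusion counts; true negatives are derived from the total
classification_metrics <- function(tp, fp, fn, n_total) {
  tn <- n_total - tp - fp - fn
  c(
    accuracy    = (tp + tn) / n_total,
    sensitivity = tp / (tp + fn),        # share of real subscribers found
    specificity = tn / (tn + fp),        # share of non-subscribers left alone
    ppv         = tp / (tp + fp),        # success rate per call
    f1          = 2 * tp / (2 * tp + fp + fn),
    total_calls = tp + fp
  )
}

# Counts reported for the two models with complete rows
# (assumed test set size: 1,131)
stack_v1 <- classification_metrics(tp = 51, fp = 76,  fn = 80, n_total = 1131)
lightgbm <- classification_metrics(tp = 67, fp = 137, fn = 64, n_total = 1131)

round(rbind(stack_v1, lightgbm), 3)
```

Under these assumptions the ppv column reproduces the success rates of 40.2% and 32.8%, and the LightGBM sensitivity reproduces the 51% quoted earlier.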

From the analysis above it becomes clear that there is no single “correct” or universally best model, but rather one that best fits our goal. If the bank’s objective is to maximize revenue by promoting a new product, the answer is clear: LightGBM, as it identifies the most interested customers. If the goals are more conservative (for instance, expanding services among existing customers while disturbing as few uninterested ones as possible), then Stack Ensemble v2 becomes ideal since it generates 21 fewer incorrect contacts. The same model is also ideal when campaign time is limited, as it leads to the lowest total number of calls. If the goal is a balanced approach between the number of contacts required and the expected campaign results, the winner is clearly Stack Ensemble v1.

In conclusion, the ideal model is not determined solely by optimal parameters, but by the question being asked and the goals of the organization.