Disclaimer and Dataset Limitations:
During the development of the models and the EUMigraTool, the researchers encountered several limitations stemming from the available datasets.
- Given the limited availability of datasets and, most importantly, the absence of accurate and frequently updated datasets for migration, the Large Scale model cannot provide accurate predictions of migration flows; instead, it focuses on the flow of asylum applications. The model is trained on historical asylum application data from Eurostat and provides predictions on asylum seekers / unrecognized refugees.
- As in the previous point, the availability of quality datasets is very low, even for asylum applicants.
- The Large Scale model relies on a Topic Modeler (LDA) that monitors the national press. This creates limitations for countries that lack free speech and/or whose national press is rife with false information or subject to censorship. Eritrea, for example, currently has no independent mass media. This also creates barriers in the case of Ukraine, since in times of war the media and national press are flooded with misinformation.
- The available datasets from the sources mentioned above are not frequently updated. Even the datasets that do receive updates follow a very irregular schedule: the same dataset may be updated monthly for specific countries while other countries are not updated at all. This may affect the predictions of the Large Scale model, mainly the predictions for the early weeks.
- The Large Scale model is trained on specific datasets and can only predict asylum seekers / unrecognized refugees within the categories those datasets include. For example, the Eurostat dataset includes five age categories and three gender categories, so a model trained on it can only predict those categories. One of the most common requests is to add predictions for “unaccompanied minors”, but if this category does not exist in the training data, the model is unaware that the category exists and therefore cannot predict it (see the sketch below).
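As a minimal illustration of this constraint, the sketch below inspects a hypothetical excerpt of the training data: the set of predictable categories is exactly the set of category values present in it. The specific SEX/AGE codes shown are illustrative assumptions about the dataset's schema, not a guaranteed Eurostat layout.

```python
import pandas as pd

# Hypothetical excerpt of the Eurostat training data; the SEX/AGE codes
# below are illustrative assumptions, not a guaranteed schema.
train = pd.DataFrame({
    "SEX": ["M", "F", "UNK", "M", "F"],
    "AGE": ["Y_LT14", "Y14-17", "Y18-34", "Y35-64", "Y_GE65"],
})

# The model can only ever emit categories observed during training.
print(sorted(train["SEX"].unique()))  # three gender categories
print(sorted(train["AGE"].unique()))  # five age categories

# A category absent from the training data, e.g. "unaccompanied minors",
# has no labels to learn from and therefore cannot be predicted.
assert "UNACCOMPANIED_MINORS" not in set(train["AGE"].unique())
```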
Mitigation of different types of bias:
- Administrative Bias (Asylum Applications Bias): The number of asylum applications recorded by organizations such as Eurostat depends heavily on the current number of employees. Therefore, an administrative bias (seasonal and otherwise) is induced in the datasets. It is mitigated by including the time-lagged dependent variable as an input to the prediction models; in this way, the model develops an intuition of historical fluctuations of the dependent variable that are due to administrative reasons (see the first sketch after this list).
- Topic Modeling Bias: The LDA topic modeler, used to classify GDELT’s headlines, is trained on a specific text corpus consisting only of American publications. Therefore, the topic distribution extracted during training is heavily influenced by American culture, which is not representative of the global one. This dataset bias is not mitigated by the EMT, since no other corpora of sufficient size for training are currently available.
- GDELT Headlines Bias: GDELT’s API allows downloading up to 250 headlines per day. To keep the scope limited to national news, the model selects headlines from articles that mention the country of origin at least three times in their text (see the second sketch after this list). This selection criterion helps focus on relevant headlines but introduces a bias toward national news.
- Prediction Evaluation Bias: A point forecast provides no intuition about the certainty or uncertainty of the forecast. Although there are error metrics, such as the Mean Relative Error, that can quantify the accuracy of past forecasts, there is no such metric that can quantify the uncertainty of a single forecast data point. This is therefore mitigated by providing prediction intervals (lower bound and upper bound) with 95% statistical confidence. This means that, regardless of how ‘accurate’ a forecasting model is, the prediction interval will include the true value 95% of the time. The difference between a certain and an uncertain forecast is the size of the prediction interval: the wider the interval, the less certain the forecast.
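The administrative-bias mitigation above can be sketched as follows: lagged copies of the target series are added as input features, letting the model learn recurring (e.g. seasonal) administrative fluctuations from its own history. The column names, lag orders, and values are illustrative assumptions, not the exact EMT pipeline configuration.

```python
import pandas as pd

# Hypothetical monthly asylum application counts (the dependent variable).
df = pd.DataFrame(
    {"applications": [1200, 1350, 1100, 980, 1500, 1620]},
    index=pd.period_range("2022-01", periods=6, freq="M"),
)

# Time-lagged copies of the target become input features, so the model
# sees the variable's own recent history at prediction time.
for lag in (1, 2, 3):
    df[f"applications_lag{lag}"] = df["applications"].shift(lag)

# Rows whose lags fall before the start of the series are dropped.
train = df.dropna()
print(train)
```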
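The GDELT selection rule can likewise be sketched in a few lines. The article structure and field names here are assumptions for illustration; only the filtering criterion (at least three mentions of the origin country in the article text) comes from the description above.

```python
MIN_MENTIONS = 3  # threshold from the selection rule described above

def select_national_headlines(articles, country):
    """articles: iterable of dicts with 'headline' and 'text' keys (assumed schema)."""
    return [
        a["headline"]
        for a in articles
        if a["text"].lower().count(country.lower()) >= MIN_MENTIONS
    ]

# Toy data: only the first article mentions the origin country three times.
articles = [
    {"headline": "Border crossings rise", "text": "Eritrea ... Eritrea ... Eritrea ..."},
    {"headline": "Local sports roundup", "text": "One passing mention of Eritrea."},
]
print(select_national_headlines(articles, "Eritrea"))  # ['Border crossings rise']
```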
Recommendations/guidance to end users:
- Asylum Application Bias: A legend is proposed that will be clearly visualized on the EMT website, containing important warnings about the asylum application predictions, such as their bias and limitations.
- As for the datasets’ bias, elaborating on technical details would only confuse end users rather than guide them. Therefore, no warning should be provided about it.
- Prediction Evaluation Bias: This is by far the most important guidance that should be provided to end users. Every prediction dashboard needs clearly visualized legends with information on the concept and statistical confidence of the prediction intervals. End users need to know exactly what the predictions represent and how to use them.
- The prediction pipelines are trained to minimize the Mean Relative Error. More information on this error metric and how it is used can be found in Deliverables 6.2 and 6.3; a sketch of a common formulation follows.
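For orientation, the snippet below shows one common definition of the Mean Relative Error (the mean absolute error relative to the actual values). The exact formulation used by the pipelines is the one specified in Deliverables 6.2 and 6.3; the values here are made up.

```python
import numpy as np

def mean_relative_error(actual, predicted):
    """Common MRE definition: mean of |actual - predicted| / |actual| (assumed form)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / np.abs(actual))

# Toy example: three months of actual vs. forecasted application counts.
print(mean_relative_error([1000, 1200, 800], [950, 1300, 780]))  # ~0.053
```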
AI Fairness:
AI Fairness 360 (aif360) is a software toolkit developed by IBM that aims to help developers, researchers, and data scientists identify and mitigate potential biases in artificial intelligence (AI) models. The toolkit includes a set of algorithms and metrics for evaluating the fairness of AI models, as well as algorithms for post-processing and debiasing models. It aims to provide a comprehensive resource for understanding and addressing fairness issues in AI and can be used in a variety of contexts, including natural language processing, computer vision, and predictive modelling.
In the developer’s approach, this toolkit was selected to identify and mitigate potential biases in the LSM model. First, a dedicated dataset was created in the format required by aif360. The dataset included the actual numbers of asylum applicants from Eurostat and the predictions made by the LSM model for each month. The timeframe studied was from 04-2018 to 05-2022. The metric used in this dataset was the Absolute Percentage Error (APE): the APE column (METRIC) represents the percentage difference between the actual and forecasted value for the corresponding month. Experiments were conducted by setting the categories [‘SEX’, ‘AGE’] as protected attributes. In this case, men aged 18 to 34 ([‘M’], [‘Y18-34’]) are considered privileged and the remaining characteristics are considered unprivileged. Label values were assigned to 1 (favourable) and 0 (unfavourable). The columns selected from the dataset for this bias study were [‘Origin Country’, ‘Destination Country’, ‘SEX’, ‘AGE’, ‘EUROSTAT’, ‘METRIC’]. The observed difference in mean scores between the non-privileged and privileged groups was -0.039, indicating no measurable bias in the dataset, so no bias mitigation was needed. A sketch of this check appears below.
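The following sketch reconstructs the shape of that check with the aif360 API. The toy data, the binary encodings (1 marking the privileged SEX/AGE values), and the label construction are assumptions for illustration; the real study used the full 04-2018 to 05-2022 dataset described above.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# aif360 requires a numeric frame, so SEX ('M' -> 1) and AGE ('Y18-34' -> 1)
# are binary-encoded here, with 1 marking the privileged value. Toy data only.
df = pd.DataFrame({
    "SEX":   [1, 1, 0, 0, 1, 0],
    "AGE":   [1, 0, 1, 0, 1, 0],
    "LABEL": [1, 0, 1, 0, 1, 1],  # 1 = favourable outcome (e.g. low APE)
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["LABEL"],
    protected_attribute_names=["SEX", "AGE"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"SEX": 1, "AGE": 1}],      # men aged 18-34
    unprivileged_groups=[{"SEX": 0}, {"AGE": 0}],  # everyone else
)

# Difference in favourable-outcome rates (unprivileged minus privileged);
# values near 0 indicate no measurable bias. The study reported -0.039.
print(metric.mean_difference())
```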
Statistical Confidence for LSM:
To justify the approach taken for the LSM: prediction intervals are valuable because they express the uncertainty in the forecasts. If only point forecasts are produced, there is no way of telling how accurate those forecasts are. If prediction intervals are also produced, the amount of uncertainty associated with each forecast becomes clear.
Specifically, the wider the interval, the less confident the forecast. For that reason, the LSM provides an additional 95% prediction interval (meaning that 95 out of 100 true values of recorded asylum applications will fall within the interval) for every point forecast it produces. Because the sample size is large and the residuals follow a normal distribution, the interval bounds are computed from the 95% level of the standard normal distribution, as sketched below.
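A minimal sketch of this construction, assuming (as stated above) approximately normal residuals: the bounds are the point forecast plus or minus 1.96 (the two-sided 95% quantile of the standard normal distribution) times the residual standard deviation. All numbers and variable names are illustrative.

```python
import numpy as np

# Hypothetical historical forecast errors (actual minus forecast).
residuals = np.array([120, -80, 45, -150, 60, 30, -40, 95])
sigma = residuals.std(ddof=1)  # sample standard deviation of the residuals

point_forecast = 1400.0  # hypothetical monthly point forecast
z = 1.96                 # two-sided 95% quantile of the standard normal

lower, upper = point_forecast - z * sigma, point_forecast + z * sigma
print(f"95% prediction interval: [{lower:.0f}, {upper:.0f}]")
```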