- Short Name: #VaccRiskAssess
Estimating risk is a complicated process, and there are many different methods and techniques for assessing it. This Global Vaccination Risk Assessment [GVRA] tool tries to create an overview of risk factors by country while keeping the more comprehensive [global] picture in view, by combining different data sources in a structured process. The idea is derived from Fuzzy Operators, which are typically used in other data domains but whose principles are applicable in [almost] every field.
The GVRA tool can ingest all kinds of data, e.g. from poverty rates and vaccination levels to the GDP and literacy rate of a country, and all this data comes in different shapes and forms. This is well suited to Fuzzy Logic [FL], which was developed for this kind of problem: in FL the “real” values are not important, and all operations are done on their relationship to the user [or Expert]. The first step is to define this relationship, which is a value between 0 and 1. In this proof of concept, the relationship is expressed as a linear function between the minimum and maximum value of each processed data source.
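The linear relationship described above can be sketched as follows. This is a minimal illustration and not the project's own code: the function name `make_linear_membership`, the `inverse` flag [for indicators where a high raw value would mean a low risk] and the sample poverty rates are all assumptions made for the example.

```python
def make_linear_membership(values, inverse=False):
    """Build a function that maps a raw value linearly onto [0, 1].

    The minimum of `values` maps to 0 and the maximum to 1.
    `inverse=True` flips the scale (assumption: useful when a high
    raw value should give a low relationship value).
    """
    lo, hi = min(values), max(values)
    span = hi - lo

    def membership(x):
        if span == 0:
            # Assumption: a constant data source carries no information.
            return 0.5
        score = (x - lo) / span
        score = min(1.0, max(0.0, score))  # clamp values outside min/max
        return 1.0 - score if inverse else score

    return membership


# Made-up poverty rates [%] for four hypothetical countries.
poverty_rates = [2.0, 10.0, 18.0, 42.0]
to_score = make_linear_membership(poverty_rates)
scores = [to_score(v) for v in poverty_rates]
```

Because only the position between minimum and maximum is kept, data sources with very different units end up on the same 0 to 1 scale.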
Nine sources are processed in this project, but hopefully more data sources can be added to the risk library in the future. Each data source is assigned to one of the sides of the risk triangle, and the average value is used to calculate the filled area of this side of the triangle. The sum of the three filled triangles gives the risk ratio, which is converted to a Risk Score for each country [or region]. This is the final score mentioned in Figure 1.
Figure 1: The filled risk triangle
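One possible reading of the filled triangle is sketched below. The exact geometry is not specified in this text, so the sketch assumes an equilateral triangle split from its centre into three equal sub-triangles [one per side], each filled in proportion to the average value of its side. The function name `fill_ratio` and the side names used as parameters are assumptions for the example.

```python
import math


def fill_ratio(exposure, hazard, vulnerability, side=1.0):
    """Return the filled fraction of the risk triangle (0 to 1).

    Assumption: each of the three sub-triangles is filled in
    proportion to its group value, so a value of 0.5 fills half
    of that sub-triangle's area.
    """
    total_area = math.sqrt(3) / 4 * side ** 2  # area of an equilateral triangle
    sub_area = total_area / 3                  # one sub-triangle per side
    filled = sum(v * sub_area for v in (exposure, hazard, vulnerability))
    return filled / total_area


ratio = fill_ratio(0.2, 0.5, 0.8)
```

Under this assumption the fill ratio reduces to the mean of the three group values; a different filling rule would give a different ratio.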
The scope of the PoC was to find out whether it is possible to combine all these different data sources and compute a risk score for each country. Figure 2 shows the frequency distribution of the global vaccination risk score. The next step is to make the system dynamic and allow expert opinions on both the configuration of the data on the sides of the risk triangle and the rating of each data source. This will change the frequency distribution, but the result should be more reliable than the initial configuration [without the expert knowledge].
Figure 2: Global distribution of Vaccination Risk
The final product is a web-based tool/page where everyone can change, add, and rate data and visualise the effects of each combination. When all operations/combinations are tracked, a global image will emerge, and a risk consensus could be reached. However, this falls outside the scope of this project, and more research is needed to set realistic benchmarks and targets.
Elevator pitch / Abstract
One of the aims of this proof of concept is to make data usable for decision making and to inform about the global risk of non-vaccination.
In the proof of concept, we focused on how the data can be used, what an expert system could look like, and what would be the best way to visualise the risk score.
The method behind the tool is based on Fuzzy Logic, where all data is redefined in intervals between 0 and 1 to describe the importance, or relationship, to the user.
The data is placed in a Risk Triangle, on one of the three sides of the triangle, and the average relationship value is used to compute the area of this facet. The user, or expert, can modify the configuration, e.g. which data set goes to which side, and place different weights on the sources used in the analysis. This unique combination is used to calculate the Vaccination Risk by combining the filled areas of the risk triangle.
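A user configuration of this kind could look like the sketch below. It is an illustration only: the function name `side_values`, the example scores, the assignment of sources to sides and the weights are all made up, and the side names follow the three groups [Exposure, Hazard, Vulnerability] used in the methodology.

```python
def side_values(scores, assignment, weights):
    """Weighted average of the fuzzified scores on each side.

    scores:     source name -> relationship value between 0 and 1
    assignment: side name -> list of source names placed on that side
    weights:    source name -> expert weight (missing names default to 1.0)
    """
    result = {}
    for side, sources in assignment.items():
        total_weight = sum(weights.get(name, 1.0) for name in sources)
        if not sources or total_weight == 0:
            # Assumption: a side without (weighted) data contributes nothing.
            result[side] = 0.0
            continue
        weighted = sum(scores[name] * weights.get(name, 1.0) for name in sources)
        result[side] = weighted / total_weight
    return result


# Hypothetical fuzzified scores for one country.
scores = {"poverty_rate": 0.8, "literacy": 0.4, "gdp": 0.6, "vaccination_level": 0.3}
# Hypothetical expert configuration: which source goes to which side.
assignment = {
    "Exposure": ["vaccination_level"],
    "Hazard": ["gdp"],
    "Vulnerability": ["poverty_rate", "literacy"],
}
# The expert rates the poverty rate as twice as important as the default.
weights = {"poverty_rate": 2.0}

sides = side_values(scores, assignment, weights)
```

Moving a source to another side, or changing its weight, only changes the configuration and not the underlying data, which is what makes each combination easy to compare.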
The tool can be used for any area, like country or region, for which data is available, creating a unique risk signature which can be easily compared to the risk value of other countries or regions.
How to contribute
This project is a Proof of Concept [PoC] and is still open for participation. The PoC is very data-intensive and could use more global data sources to create a more complete picture of the driving parameters behind the risk of not getting vaccinated. Help is also welcome with data exploration and cleaning, as well as with some coding to visualise the risk by country.
At the moment, nine possible factors are included in the risk calculation. Six or seven datasets have not been processed, and an important dataset [actual vaccination split by disease and country] is not used at all, because of a lack of time and of ideas on how to process it. More datasets might be found, downloaded and incorporated into the system, but again the available time constrains the growth of the database.
If you want to contribute, just send us a message; all help is welcome.
Vaccination-associated problems, for example identifying the risk of not getting a vaccination, are not easily understood or solved. Correctly assessing the risk [even at country level] depends on the accurate interpretation of many different datasets, each holding information collected and formatted for the specific needs of its own data users.
The GVRA tool tries to solve this problem by ingesting as many data sources as possible, transforming the data to a uniform format and, with the help of experts, combining the standardised data to calculate the contribution of each data source to the risk triangle [Figure 1]. The fill ratio of the triangle determines the risk level, e.g. countries with a high risk of non-vaccination have a more filled triangle than countries with a lower risk.
Visualising the risk by filled triangles will make it easier to identify the countries with high risks and in which aspect these countries are lacking resilience.
Figure 1: Risk Triangle
Objectives & Methodology
The PoC is set up as a simple data processing problem, in which each data source is processed in the same way:
1) Find the correct ISO3 country code* for each row in the data
2) Use only the rows in the data which match the country list
3) Mark the rows with no data, and calculate basic statistics for the other rows
4) Create a fuzzy function over the min and max values
5) Apply the function, so that each value gets a fuzzy score between 0 and 1
6) Assign a default value, either close to 0 or 1, to the rows with no data
7) Export the fuzzified score to the Multi-Criteria Analysis calculation
8) Assign the data to one of the three groups [Exposure, Hazard, Vulnerability]
9) Apply a scaling factor to the data [column]
10) Calculate the group value [for now set as the average]
11) Combine the group values as the area under the triangle and calculate the fill ratio
Visualisation [to be implemented]
12) Create a dynamic map
13) Create a widget showing the risk triangle and the country-specific fill rates
*ISO3: the three-letter ISO 3166-1 alpha-3 code, a unique identifier for each country, used to link the country risk values to the countries on the world map
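Steps 2 to 6 of the list above can be sketched for a single data source as follows. This is a minimal sketch under stated assumptions, not the PoC's actual pipeline [which is kept in an Excel spreadsheet]: the function name `process_source`, the short country list, the default score of 0.9 for missing data and the sample rows are all hypothetical.

```python
from statistics import mean

KNOWN_ISO3 = {"NLD", "KEN", "BRA", "IND"}  # hypothetical, shortened country list
DEFAULT_SCORE = 0.9                        # assumed default for rows with no data


def process_source(rows, known_iso3=KNOWN_ISO3, default=DEFAULT_SCORE):
    """Fuzzify one data source; returns (scores by ISO3 code, basic statistics)."""
    # Step 2: use only the rows which match the country list.
    matched = [r for r in rows if r["iso3"] in known_iso3]

    # Step 3: rows with value None have no data; statistics use the other rows.
    observed = [r["value"] for r in matched if r["value"] is not None]
    if not observed:
        raise ValueError("data source has no usable rows")
    stats = {"min": min(observed), "max": max(observed), "mean": mean(observed)}

    # Steps 4 and 5: linear fuzzy function between the min and max values.
    span = stats["max"] - stats["min"]
    scores = {}
    for r in matched:
        if r["value"] is None:
            scores[r["iso3"]] = default  # Step 6: default for missing data
        elif span == 0:
            scores[r["iso3"]] = 0.5      # assumption: constant source is neutral
        else:
            scores[r["iso3"]] = (r["value"] - stats["min"]) / span
    return scores, stats


# Made-up rows; "XXX" stands for a code that is not in the country list.
rows = [
    {"iso3": "NLD", "value": 10.0},
    {"iso3": "KEN", "value": 30.0},
    {"iso3": "BRA", "value": None},
    {"iso3": "XXX", "value": 99.0},
    {"iso3": "IND", "value": 20.0},
]
scores, stats = process_source(rows)
```

The unmatched row is dropped before the statistics are calculated, so it cannot stretch the min and max values of the fuzzy function.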
State of the art
The methodology of this project is based on three known techniques:
a) Fuzzy Logic
b) Multi-Criteria Analysis
c) Risk Triangles
However, to my knowledge, these three techniques have not yet been [successfully] implemented together for a global vaccination risk assessment. There are projects gathering and combining [public] datasets, even in the Co-Vaccine program, but a universally applicable method to visualise the risk is missing.
The aim of this project is to combine the three methods and make them available either through a Python Jupyter Notebook [passed through a Voila/Docker/Git solution] or directly as a widget on any site which could benefit from a risk visualisation.
The first phase of this project was a Proof of Concept [PoC], in which the feasibility of the solution was tried out. Currently, there are nine data sources converted [fuzzified] and placed in an equal-weighted multi-criteria analysis [MCA] to calculate the contribution to one of the three sides of the risk triangle [see figure 1]. The PoC process is documented in the procedure document, see the documentation section of the project, outlining the few steps needed in the data processing pipeline. The GitHub* repository is set up to hold the raw and processed data, and all data is combined in one single Excel spreadsheet.
In the coming weeks, the focus will be on the visualisation of the calculated risk, moving the methodology from calculations in Excel tabs to a Python Jupyter Notebook [Need 1].
Starting with the static [converted] data in the Excel file, a dynamic world map will be set up to link the calculated risk to each country and to visualise the contribution to the risk on the fly, e.g. by filling the risk triangle with the country-specific information. Because the contribution is based on the MCA, making the calculation dynamic [as an expert system] is the next step [Need 2].
The final step is to allow users to include their own data, meaning that the fuzzification has to be programmed in the Notebook or in the landing place.
No stakeholders have been identified; possible stakeholders are information gatherers who have difficulty communicating complex data stories to their audience. More research is needed to identify the stakeholders for this project.
The GVRA tool should be linked to informative pages about vaccination to give an overview of the global vaccination risk level. No direct impact has been identified, other than that the input for the tool is very flexible and could be combined with other initiatives to calculate an indicator [for risk].
The tool tries to give an informative safety message about the global risk of non-vaccination. It tries to highlight the country-specific areas where an increase in effort would lead to a reduction of this risk. As seen in figure 1, not all areas are easy to change [like corruption as a function of GDP], but the tool will give insight into which areas contribute most to the [high] vaccination risk in each country.
Sustainability and scalability
The tool has the potential to be informative to a large population, but no communication strategy has been developed to engage with either the scientific community or the general public.
One of the objectives is to try to solve the complex problem of combining different data sources and assessing the global risk of lack of vaccination through the merged information. More research is needed to establish the need for such a tool or information platform, because the future users of the system are not yet defined.
Communication and dissemination strategy
For the moment the project’s visibility is limited to the JOGL platform [https://app.jogl.io/project/73]. There is no plan in place to broaden the scope or increase the project’s visibility, because too little effort has been put into the design of the tool and a landing page to direct visitors to the tool is missing. The next phase of this project should focus on:
1) Conducting a study to confirm that a project like this is relevant and useful
2) Designing the data pipeline and constructing a demo landing place
As of this date [01/12/2019], no funding has been secured to upscale the project. As in the first phase, no external financing is needed for the next step; most effort should go into finding a landing place to embed the methodology, if this methodology is deemed useful for processing and visualising complex data questions.