Project

Status:

Active/Ongoing

Project maturity:

Proof of concept

Community Vulnerability Index

The Community Vulnerability Index is an open-source, science-backed needs assessment tool. Using data-driven metrics and AI, CVI evaluates a community’s well-being and delivers actionable information to those in a position to effect change.

1.0 Introduction

1.1 Problem and Background

Community-focused organizations and nonprofit groups often have the seemingly impossible task of doing more with less. They are expected to transform limited resources into a notable impact for their communities. Unfortunately, in addition to illness, the COVID-19 pandemic brought financial burdens that many were unprepared to face. The economic harm caused by the pandemic forced many nonprofits and other community organizations to make budget cuts. Consequently, the communities these groups serve are sometimes going without much-needed resources.

As community needs vastly increase due to the pandemic, efficient and effective resource planning and distribution become crucial. However, this process is not straightforward as the effects of COVID can interact in complicated ways. For example, two communities with similar infection rates may experience different hospitalization rates due to a higher prevalence of other complicating factors such as obesity or diabetes.

Untangling how the effects of COVID are interacting in different communities has become a priority as we begin the process of recovering from the pandemic. There is no avoiding the economic harm resulting from budget cuts and their increased burden on nonprofits. However, we can help mitigate the consequences by making smarter, data-driven decisions about resource distribution.

1.2 Solution summary in simple terms

We propose a solution that leverages data science and machine learning to help nonprofits and other organizations obtain insight into their communities.

The Community Vulnerability Index (CVI) is a science-backed, open-source needs assessment tool intended mainly for nonprofits. By using data-driven metrics and machine learning, CVI evaluates a community’s well-being and needs. It then delivers actionable information that nonprofits and community-focused organizations can use to effectively target their resources. Through a web application, users can access several indicators representing health, socioeconomics, policy, and more. They can also view the potential vulnerabilities within their communities, as predicted by our machine learning models.

1.3 Solution summary in technical terms

In technical terms, CVI can be described by its front-end and back-end components. The front end of CVI is a dashboard that is being built in Python with the Plotly Dash framework. We are developing the app using Flask, a lightweight web application framework that is an industry standard in Python. The dashboard is styled with HTML, CSS, Bootstrap, and JavaScript, which keeps the presentation clean and visually appealing.

CVI’s back end is where data is processed and machine learning models are built to identify and project vulnerabilities within communities. To quantify community vulnerability, we derived a series of indices that measure community needs along different axes. Predictive modeling is then used to determine where resources are best used.

1.4 State of advancement of the project

CVI has undergone user testing with various nonprofit organizations. Early feedback has indicated they find the product useful and wish to collaborate in developing customized vulnerability indices, including an Access to Sexual Health Education Score for Planned Parenthood.

1.5 Project Timeline

The Community Vulnerability Index was started in 2020 and is an ongoing project. Short-term milestones are listed below:

February 2021

Definition of new Economic Harm vulnerability metric
Evaluation of clustering schemes

March 2021

Data cleaning and merging for Economic Harm metric
Assessment of predictive power of current COVID severity metric

April 2021

Refinement of COVID severity metric based on results
Implementation and analyses of clustering schemes

May 2021

Analysis of model accuracies based on a retrospective longitudinal study
Finalize Economic Harm metric

June 2021

Refinement of Machine Learning models based on results
Incorporate Economic Harm Vulnerability metric into the dashboard

July 2021

Beginning of user testing outside of partner organizations
Intensification of public outreach efforts through non-academic publications and increased promotion

2.0 Project Implementation

2.1 Solution

This project is taking place in the United States and will be made freely accessible to any community-focused organization. We expect the stakeholders for this project to be healthcare professionals, policymakers, researchers, nonprofit groups, philanthropists, and anyone in the market for social good.

The main deliverable of this project is an interactive web application. Using the dashboard, stakeholders can visualize data on health, infrastructure, policy, demographics and more, for each county in the United States. The platform also provides a variety of derived metrics that quantify community vulnerabilities across different axes. To date, users can access eight vulnerability indices:

risk of severe case complications
risk of economic harm
need for mobile health resources
need for food resources
need for mental health resources
risk of overwhelming health care infrastructure
community connectedness
information deserts

Figure 1: Community Vulnerability Index Dashboard

A mock-up of the application is shown in Figure 1. Upon accessing the web application, users are presented with a full map of the United States as well as national statistics regarding COVID-19. To obtain statistics relevant to their county, they can interact with the map by zooming in on the area, by clicking the Magnifying Glass icon and searching for the location, or by selecting their state from the dropdown menu.

Users can choose what data is presented to them by adding layers to the map using the Layers icon in the top right corner of the map. Each layer represents a variable (e.g., % adults 65 or older, % smokers, etc.) or a calculated vulnerability index. In the module on the right of the map, users can select a state to view county vulnerability scores and their rankings within the state. The arrows at the bottom toggle between the scores for different vulnerability indices.
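The within-state ranking shown in the side module can be illustrated with a short sketch. This is not the dashboard's actual code: the function name, the tuple layout, and the sample values are hypothetical, and the sketch assumes that a higher score means a more vulnerable county and is therefore ranked first.

```python
from collections import defaultdict


def rank_within_state(scores):
    """Rank counties by score within each state; rank 1 is the highest score.

    scores: iterable of (state, county, score) tuples.
    """
    by_state = defaultdict(list)
    for state, county, score in scores:
        by_state[state].append((county, score))
    rankings = {}
    for state, rows in by_state.items():
        ordered = sorted(rows, key=lambda row: row[1], reverse=True)
        rankings[state] = [
            (rank, county, score)
            for rank, (county, score) in enumerate(ordered, start=1)
        ]
    return rankings


# Hypothetical scores, not real CVI output.
sample = [
    ("NJ", "Mercer", 61.0),
    ("NJ", "Essex", 78.5),
    ("NY", "Kings", 83.2),
    ("NJ", "Ocean", 55.3),
]
rankings = rank_within_state(sample)
```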

Although this project was conceived to help our stakeholders address the impacts of COVID-19 in their communities, we expect its utility to outlive the pandemic. Our vulnerability indices do not pertain exclusively to COVID-19, and we are planning to add more metrics, translating stakeholders’ insights into community needs.

2.2 Methodology

Data

The Community Vulnerability Index leverages data from the following open-access sources:

The New York Times COVID-19 Data Repository
The New York City Health Department COVID-19 Data Repository
United States Census Bureau County Population by Characteristics: 2010-2019, American Community Survey, and Small Area Health Insurance Estimates
United States Diabetes Surveillance System (2017)
Centers for Disease Control and Prevention Interactive Atlas of Heart Disease and Stroke and Wonder Multiple Cause of Death database
Institute for Health Metrics and Evaluation Chronic Respiratory Disease Mortality Rates by County 1980-2014
County Health Rankings Model Adult smoking
Centers for Medicare & Medicaid Services COVID-19 Nursing Home Dataset
United States Department of Agriculture Economic Research Service
The Robert Wood Johnson Foundation County Health Rankings and Roadmaps Program
Homeland Infrastructure Foundation-Level Data Open Data
National Center for Education Statistics Number and percentage of households with computer and internet access, by state: 2015

Adding new variables to the CVI dataset involves a carefully crafted pipeline. The process starts with the identification of a new vulnerability index through an extensive literature review. Once identified, we locate open-access data sources for each of the variables that comprise the index. To be included, the data must be race- and age-adjusted, cover the entire adult population, and present the information at the county level. We then clean the data and address any missing values. Finally, the new variables are merged into the full dataset.
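The cleaning and merging steps can be sketched as follows. This is a minimal illustration rather than the project's actual pipeline: the function names, the use of five-digit county FIPS codes as the join key, the variable names, and median imputation for missing values are all assumptions made for the example.

```python
from statistics import median


def impute_missing(column):
    """Fill None values with the median of the observed values (assumed strategy)."""
    observed = [v for v in column.values() if v is not None]
    fill = median(observed)
    return {fips: (fill if v is None else v) for fips, v in column.items()}


def merge_variable(dataset, name, column):
    """Add a cleaned county-level variable to the full dataset, keyed by FIPS code.

    Returns a new dataset; counties absent from the new column receive None
    so that the gap stays visible instead of being silently dropped.
    """
    merged = {fips: dict(row) for fips, row in dataset.items()}
    for fips, row in merged.items():
        row[name] = column.get(fips)
    return merged


# Hypothetical example values, not real CVI data.
dataset = {
    "01001": {"pct_65_plus": 15.6},
    "01003": {"pct_65_plus": 20.4},
    "01005": {"pct_65_plus": 19.4},
}
new_column = {"01001": 18.1, "01003": None, "01005": 21.9}

cleaned = impute_missing(new_column)
full = merge_variable(dataset, "pct_smokers", cleaned)
```

Returning a new dictionary instead of modifying the dataset in place makes it easier to compare the data before and after a variable is added.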

Construction of the dataset is an ongoing activity as vulnerability indices are continuously reviewed and added. The full dataset, along with definitions for all included variables, is available for download on our GitHub.

Metrics and Modeling

Vulnerability indices are calculated by quantile normalizing each comprising variable to a Gaussian distribution based on the full set of U.S. counties. The scaled variables are combined in a weighted linear combination, and the result is divided by the number of factors multiplied by weights to get a value between 0 and 100. Weights for each variable are determined through a review of published CDC information and other peer-reviewed literature.
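One possible reading of this calculation is sketched below. The text does not fully specify the scaling step, so two choices here are assumptions rather than the project's confirmed formula: the weighted sum is divided by the total weight, and the result is mapped onto 0 to 100 with the standard normal CDF. The variable names, weights, and values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def quantile_normalize(values):
    """Map values to standard-normal scores by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the quantile strictly inside (0, 1).
        scores[i] = _STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def vulnerability_index(variables, weights):
    """Weighted average of normalized variables, mapped onto a 0-100 scale.

    variables: {name: [value per county]}, weights: {name: weight}.
    Dividing by the total weight and applying the normal CDF are assumptions.
    """
    normalized = {name: quantile_normalize(vals) for name, vals in variables.items()}
    total_weight = sum(weights.values())
    n_counties = len(next(iter(variables.values())))
    index = []
    for c in range(n_counties):
        combined = sum(weights[name] * normalized[name][c] for name in variables)
        index.append(100 * _STD_NORMAL.cdf(combined / total_weight))
    return index


# Hypothetical variables and weights for four counties.
variables = {
    "pct_65_plus": [12.0, 18.0, 25.0, 15.0],
    "pct_diabetes": [8.0, 11.0, 14.0, 9.0],
}
weights = {"pct_65_plus": 2.0, "pct_diabetes": 1.0}
scores = vulnerability_index(variables, weights)
```

In this example the third county has the highest value on both variables and therefore receives the highest score.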

We are currently exploring several strategies to improve our vulnerability scoring. To better understand the similarities between communities, we are examining the use of unsupervised learning methods such as clustering. We expect the resulting clusters to provide insight into additional axes on which we can evaluate communities.
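As an illustration of the kind of clustering under consideration, the sketch below runs plain k-means on a handful of county feature vectors. The choice of k-means, the seeding of centroids with the first k points, and the sample features are assumptions for the example; the project has not committed to a specific clustering scheme.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on feature vectors; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical normalized features (e.g. two vulnerability scores per county).
counties = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.25)]
labels, centroids = kmeans(counties, k=2)
```

Counties that share a label have similar profiles across the chosen features, which is the sense in which clusters could suggest new axes for evaluation.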

We will also use time series forecasting, through LSTM or similar models, to improve the accuracy of scores that rely on COVID-19 case counts. We expect this to produce better results than simply using the current case count.
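The sketch below does not implement an LSTM. It shows two supporting pieces under stated assumptions: how a case-count series might be split into fixed-length windows for training a sequence model, and the "current case count" baseline that such a model would need to beat. The function names, window length, and case counts are hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Split a case-count series into (input window, future target) pairs."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        history = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((history, target))
    return samples


def naive_forecast(history):
    """Baseline: predict that the next value equals the latest observed count."""
    return history[-1]


# Hypothetical daily case counts for one county.
cases = [10, 12, 15, 19, 24, 30, 37]
samples = make_windows(cases, window=3)
baseline_errors = [abs(naive_forecast(h) - y) for h, y in samples]
```

When cases are rising steadily, as in this example, the baseline underestimates every target, which is the gap a forecasting model is meant to close.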

Finally, we are researching neural networks and other supervised learning methods for predicting severe COVID-19 cases. Interpretability methods applied to the trained model will allow us to refine the weights associated with the current metric, thus improving accuracy.

2.3 Results/Expected results

At the end of this project, we expect to deliver a tool that can assist nonprofits and other organizations with resource planning and distribution. Early feedback has indicated that stakeholders find CVI useful, and some have expressed interest in collaborating on developing customized vulnerability indices. Another expectation is that users of CVI will be able to leverage the insights obtained from the product to bolster their grant applications. As such, we believe CVI could also help secure funding.

3.0 Safety, quality assurance and regulation

3.1 What steps have you taken to ensure your solution’s safety? How advanced are you in this process (if applicable)? Please check the Biosafety and Biosecurity guideline of OpenCovid19

N/A - We do not collect any clinical samples. The data we use has been aggregated at a county level, and as such contains no identifying information for individuals.

3.2 Have you planned the conduct of your manufacturing process that ensures quality, what are the steps you have taken? How advanced are you in this (if applicable)?

N/A

3.3 Will you need assistance with the regulation system? If not, which regulatory system do you plan on using to distribute the product? Please elaborate (please see: Regulatory-Strategies) (if applicable)

N/A

3.4 Have you talked to medical staff about the feasibility of your project? What did they say?

N/A

3.5 Have you planned the testing, verification and validation of your solution? How advanced are you? (if applicable)

The accuracy of the vulnerability indices will be validated through a retrospective longitudinal study. We will start by identifying a list of past events that may have impacted a community’s vulnerability (e.g., policy changes). Subsequently, we will plot the vulnerability scores over time and check whether there are any significant deviations related to these events.
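One simple way to check for such a deviation is sketched below. The text does not define what counts as significant, so the approach here is an assumption: the score at the time of an event is compared with the mean and standard deviation of a preceding baseline period, and flagged if it differs by more than two standard deviations. The function name, baseline length, threshold, and scores are hypothetical.

```python
from statistics import mean, stdev


def deviation_after_event(scores, event_index, baseline=4, threshold=2.0):
    """Compare the score at an event to the preceding baseline period.

    Returns (z_score, flagged), where flagged means the score moved more than
    `threshold` baseline standard deviations away from the baseline mean.
    """
    if event_index < baseline or event_index >= len(scores):
        raise ValueError("need `baseline` observations before the event")
    history = scores[event_index - baseline:event_index]
    sigma = stdev(history)
    if sigma == 0:
        raise ValueError("baseline period has no variation")
    z = (scores[event_index] - mean(history)) / sigma
    return z, abs(z) > threshold


# Hypothetical monthly scores; an event (e.g. a policy change) occurs at index 4.
monthly_scores = [50, 51, 49, 50, 58, 59]
z_score, flagged = deviation_after_event(monthly_scores, event_index=4)
```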

4.0 Impact, issues and risks

4.1 What impact do you feel your project could have?

There is growing evidence showing that Black and Latinx communities have been disproportionately affected by the pandemic, adding COVID-19 to a long list of health disparities suffered more acutely by minorities. In the United States, President Joe Biden and his administration have made equitable pandemic recovery a top priority. CVI can help governments and nonprofits with this task by ensuring that aid is going where it would be most effective. We can help navigate the long road to recovery by giving a clear and detailed picture of a community’s needs, allowing stakeholders to make smarter, more effective data-driven decisions.

4.2 What do you think would make your project a success?

We measure CVI’s success mainly by two criteria:

The accuracy of the models
User adoption

Having only one is not sufficient. For this project to be considered successful, both need to be attained. After all, basing decisions on the results of an inaccurate model would be no better (and potentially worse) than using nothing at all. By the same logic, even the most accurate model will be ineffective if no one is willing to use it.

4.3 Please list the known issues, potential risks, grey-areas, etc in your project

Reluctance to Adopt

We recognize that for some industries, incorporating data science and machine learning into their decision-making represents a major paradigm shift. As such, there is a risk that users will be hesitant to use CVI. We can mitigate this risk by providing users with convincing demos showcasing what CVI can help them achieve, as well as delivering a clear, easy-to-use UX.

Hidden Variables, Insufficient Data

Model accuracy depends heavily on having enough data to learn patterns. Likewise, any variable absent from the dataset contributes no information, and if that variable is important to the process being modeled, performance will be impaired. We rely mostly on open-access data sources to build our models, and we enforce strict guidelines for including variables: data must be race- and age-adjusted, cover the entire adult population, and present the information at the county level. As such, access to the data necessary to build accurate and reliable models represents another potential risk.

5.0 Originality

5.1 What other projects on JOGL are like yours? Search for them and Link them!

As of March 10, 2021, there are no projects for performing needs assessments. EPI-CENTER is similar in that it is focused on building epidemiological forecasting models.

5.2 Is this an innovative project? What makes this project different if it’s unique on JOGL?

To our knowledge, there are no other projects listed on JOGL for performing community needs assessments. What makes our project unique compared to others is that CVI is meant to help communities recover from the consequences of the pandemic by quantifying their needs across multiple axes and identifying areas of vulnerability.

5.3 Is there already an open-source version of this project?

We have not come across another open-source project like CVI.

6.0 Team experience

6.1 Please cite your team members and their roles in the project.

Savannah Thais - Project Founder and Machine Learning Team Lead

Shaine Leibowitz - Machine Learning Co-Lead

Stephanie Santo - Data Lead

Diep Hoang - Dashboard Lead

Alexandra Passarelli - Literature Review

Alex Rios - Literature Review

Lindsey Fiedler - Funding Lead

Annina Christensen - Design Strategist

Sahil Saxena - Project Manager

7.0 Funding and Costs

7.1 How is your project being funded so far?

We are currently applying for other grants and setting up crowdfunding.

7.2 How much funding do you need and how do you plan to use that funding?

We have included an itemized budget in the Documents section. Although we recognize that this award would not cover all of our expenses, it would be of immeasurable value in helping us grow the Community Vulnerability Index.

Edit: It seems the budget document we uploaded is not showing to other users. We include an image of the budget below.

Additional information

Short Name: #CVI
Created on: March 3, 2021
Last update: July 12, 2021
Looking for collaborators: ✅
Grant information: Received $1,600.00€ from the OpenCOVID19 Grant Round 5

Keywords

Data science

Public health

machine learning

Artificial intelligence

Front-end development

Associated SDGs

Good Health and Well-being

Reduced Inequalities
