Data Science Competition 2020 | Global Association for Research Methods and Data Science

Overview

Welcome, challengers! The COVID Computational Challenge seeks an innovative solution for determining the risk of exposure to COVID-19 in locations in and around the City of Los Angeles. This two-week challenge will provide ideas and concepts that deepen our understanding of the factors that may increase or decrease COVID-19 exposure risks and of how to calculate these risks while respecting data privacy. Projects will be reviewed by a panel of judges from the City of LA, LA County Department of Public Health, Chamber of Commerce, and academia.


Winning Solutions (see Award Ceremony)

First Place

Team USC_ANRG with members Mehrdad Kiamari, Dr. Gowri Sankar Ramachandran, Dr. Quynh Nguyen and Professor Bhaskar Krishnamachari from USC Viterbi School of Engineering.

Second Place, in no particular order

Team DSO with team members Greg Faletto and Mohammad Mehrabi from USC Department of Data Sciences and Operations.

Team RPI Solver with members Tong Chen, Junxiong Tang, Yueyang Li, and Feng Wang. Their mentor is Dorit Nevo from Rensselaer Polytechnic Institute.

Team Contemporary Li Shi Jen (当代李时珍) with members Wanying (Joy) Qian and Jessie (Ge) Qu from University of Michigan and Pengyue Jia and Yi'an Wang from Zhejiang University. Their mentor is Professor Feng Zhang from Zhejiang University.

Special Prizes, in no particular order

Best Application

Team The Padron Peppers with team members Jeev Prayaga, Rena Brar Prayaga, Ram Prayaga, and Gyan Prayaga. Jeev and Gyan are attending Grinnell College.

Rising Star in Data Science

Team HDMA with member Daniel Kao and mentor Professor Ming-Hsiang Tsou from The Center for Human Dynamics in the Mobile Age.

Timeline

Training Webinars

June 2 at 11 AM PST

Risk Scoring Solutions Discussions led by RMDS

June 3 at 3:00 PM PST

Public health perspective on COVID-19 data issues

June 1 at 2 PM PST

Training by SafeGraph on Social Distancing and Mobility Data

May 27 at 10 AM PST

Training by SafeGraph on Social Distancing and Mobility Data

May 27 at 2 PM PST

Training by the City of LA and RMDS on the problem statement, data, evaluation, and resources

May 28 at 3:30 PM PST

Training by UCLA Computational Medicine on analyzing the trajectory of COVID

May 29 at 10 AM PST

Training by ESRI

May 29 at 2:00 PM PST

Training by Gartner on Data Bias and Ethics

The Problem

In the next two weeks, you will determine the risk of exposure to COVID-19 in locations in and around the City of Los Angeles.

Features that may increase or decrease COVID-19 exposure risks
Assist with the transition to re-open by predicting location-based risk scores
Proposed methodology to implement risk score assessment
Actionable steps to mitigate risk and improve the risk score
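
As an illustration of what a location-based risk score could look like, here is a minimal sketch. It is not the organizers' method or any winning team's approach. The feature names, weights, thresholds, and input values are all hypothetical; it assumes a simple weighted sum of min-max normalized features, scaled to 0-100.

```python
# Hypothetical features and weights, chosen only for illustration.
WEIGHTS = {"case_rate": 0.5, "foot_traffic": 0.3, "population_density": 0.2}


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All locations share the same value, so the feature carries no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def risk_scores(locations):
    """Map each location name to a weighted risk score between 0 and 100.

    `locations` maps a location name to a dict of raw feature values.
    """
    names = list(locations)
    totals = {name: 0.0 for name in names}
    for feature, weight in WEIGHTS.items():
        normalized = min_max_normalize([locations[n][feature] for n in names])
        for name, value in zip(names, normalized):
            totals[name] += weight * value
    return {name: round(100 * total, 1) for name, total in totals.items()}


def risk_level(score, thresholds=(33.0, 66.0)):
    """Bucket a 0-100 score into a label using hypothetical cut-offs."""
    low, high = thresholds
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


# Made-up input values, not real data.
example_locations = {
    "Location A": {"case_rate": 10.0, "foot_traffic": 200.0, "population_density": 1000.0},
    "Location B": {"case_rate": 20.0, "foot_traffic": 300.0, "population_density": 3000.0},
    "Location C": {"case_rate": 30.0, "foot_traffic": 400.0, "population_density": 5000.0},
}
scores = risk_scores(example_locations)
for name, score in scores.items():
    print(name, score, risk_level(score))
```

A weighted sum is easy to explain to the general public, but the choice of weights and thresholds would need to be justified in the methodology section of the report.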

Data

Participants are highly encouraged to use the open data resources highlighted below. If proprietary data is used, it must be documented so that our judges can understand and reproduce your work. The datasets below contain both static and time-varying spatiotemporal features related to COVID-19. Documentation is included on the site.

Open Data Portals:

City of Los Angeles open data

City of Los Angeles geospatial data

State of California open data

Los Angeles County open data

Los Angeles County COVID Dashboards

Mobility Data:

Waze data

Google mobility data

SafeGraph cell phone data

Foot traffic data

Descartes Labs mobility data

Other Data:

1Point3Acres data

Daily numbers neighborhood-level

To get started, you can begin with these datasets.

Free training on epidemiology, spatial analytics, data science, and more:

Submission Deliverables

Source code required
README file explaining how to run your code. If you use Java or C++, please also include the commands you use to compile your code (we should be able to compile and, if necessary, run your code and see the output files generated).
Technical Report in PDF with names of all team members and team name required
- Your report should include the following sections: Introduction, Data, Methodology, Result, Implementation Proposal, Risk Mitigation Recommendations, Acknowledgement, and Reference. Please refer to the Problem Statement to check that your solution answers the prompt.
CSV of your results with the location and location-based risk score
Optional: presentation and demo recording
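
The results CSV could be produced along the following lines. This is a minimal sketch only: the column names `location` and `risk_score`, the one-decimal formatting, and the sample rows are assumptions, since the deliverables list does not specify an exact schema.

```python
import csv
import io


def write_risk_scores(rows, out):
    """Write (location, score) pairs as CSV to a file-like object.

    The header names are hypothetical; adjust them to match your report.
    """
    writer = csv.DictWriter(out, fieldnames=["location", "risk_score"])
    writer.writeheader()
    for location, score in rows:
        writer.writerow({"location": location, "risk_score": f"{score:.1f}"})


# Made-up rows, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_risk_scores([("Location A", 12.5), ("Location B", 87.0)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("results.csv", "w", newline="")`; the `newline=""` argument is what the `csv` module documentation recommends to avoid extra blank lines on some platforms.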

Evaluation

Impact: what useful business insights are acquired from the proposal? Do the score and implementation proposal have a meaningful impact on businesses and the LA community? What actionable steps are recommended to improve a location's score?

Methodology Validity: are the methodology, mathematics, and epidemiology principles behind the proposal reasonable and documented? How are the risk score and thresholds defined, and are the ways that risk is quantified and factors are weighted sensible? Are the assumptions and limitations of the methodology clearly outlined with suggestions to improve the proposal? Are the quantitative steps of data ingestion, feature engineering, model architecture, and performance optimization valid? How robust is the model?

Reproducibility: do the solution and script use best practices with workflows and documentation to reproduce their work? For example, are the data ingress and egress pipelines reproducible? Is there a clear presentation of data science work in the documentation?

Usability: is the information presented in a way that is actionable? Would a member of the general public understand the score, what it means and what actions to take?

Ability to Deploy: is getting access to the data realistic with reasonable computation time? Is the proposal a good fit within the existing system? Is the system scalable and robust to take into account new data sources, maintenance and perhaps even applications to other cities?

Fair and Ethical Use of Data: does the solution take into account biases in data related to underserved communities? Is the data from open and trusted sources?

Innovation: will the idea have a big impact? How innovative is the approach, selection and weighting of risk factors, or how information is displayed and communicated?

Inclusiveness/Diversity: the team working on this should represent diverse views across gender, ethnicity, and age. Does the solution provide insights that factor in demographic variables and their relationship to risk? Does the solution provide context, specifically focused on intended outcomes towards equitably assessing locations (e.g. an inclusiveness/diversity methodology section in the report)?

Guidelines

Stage 1: Registration

Each team member will register on GRMDS. Be sure to check the box stating your intent to register for the data science competition. We will send out a confirmation email to all candidates upon successful registration. Add info about your team in the Team Registration Form. For any questions, please email: [email protected].

Stage 2: Team work and submission

Submissions must include all code, a CSV file of your risk score predictions, and a technical report, and are due Monday, June 8 at 11:59 PDT. Please upload all deliverables to GRMDS. Place the names of all team members and the team name on the technical report. A submission by any individual team member will represent the whole team.

Stage 3: Evaluation and Final Presentation

Our expert committee from the Chamber of Commerce, City and County of LA, RMDS, and academia will evaluate all project deliverables and select the finalist teams. The evaluation criteria will be disclosed in a future announcement. The City of LA may work with partners to deploy and use the winning models to score risks and guide our communities in the form of alerts accessed via map, website, or app.

Prizes

Cash prizes, internship opportunities, and certificates of participation will be awarded for first and second place. Teams will also be awarded for most ethical consideration and most reusable code or algorithm.

Cash prizes of over $3K
Considerations for internship positions at the City of Los Angeles, UCLA Computational Medicine, and other partner organizations
One-on-one mentorship with data executives
Recommendation of winners’ technical reports for publication in the Harvard Data Science Review
Certificates for winners and contestants who make a complete submission
Invitation to present at IM Data 2020

Code of Conduct

The use of data will adhere to ethical use and protection of individual data privacy. Find the Code of Conduct here

Frequently Asked Questions

How do I register?

The registration form can be found here. (You must be signed in to view the form.)

How do I form a team?

Participants are welcome either as individuals or as teams. In the case of teams, one person must be designated as the team leader and will be solely responsible for communications with the organizers.

How do I register as a Mentor?

For Mentors, we’re seeking members of the business, academic, and research community. We ask our Mentors to hold office hours for 2 hours per week for the two-week duration. To register as a Mentor for the competition, please go to the link here.

How do I make submissions and what are the deliverables?

Submissions can be made here. See above section “Submission Deliverables” to see what must be included in your submission.

What is the deadline to register?

There is no registration deadline, but we strongly recommend registering no later than 05/29 so that you have time to prepare your work.

Is there a minimum and maximum number of members permitted on a team?

No minimum or maximum number.

How does the number of team members impact potential cash prize offerings?

The number of team members will not impact potential prize offerings. The prize offerings will remain the same.

Can teams comprise members from different cities/countries?

Yes. We welcome people from different cities or countries to join our competition. This competition is open to the global community.

How do I get in contact with the organizers?

If you have any questions, please email: [email protected]. We’ll get back to you as soon as we can.

I already registered my team but need to update my team info. What should I do?

If you need to update your team roster, fill out the form here.

What training material do you have to help my team get started?

Please see resources listed on this page, including recordings of competition training sessions. There is also a dataset starter list and further reading material here.

How do I find a mentor for my team?

You may find mentors here and email them with any questions or requests for feedback on your work.

How do I join another team?

Please fill out the form here. If you have any further questions, you can email us at [email protected].