Expert Interviews

As part of its mission to empower researchers, data scientists, and analysts worldwide, RMDS Lab is growing its cohort of experts through its expert portal. The portal brings together experts and practitioners in data science and AI, including specialists from some of the world’s top universities and companies, to foster further interaction and collaboration. To highlight the research and accomplishments of its members, RMDS Lab interviewed a select group of them. Read on to learn more about their work.

1. Tell us a little about yourself as well as your past and current research interests.

I’m the Director of the China Data Institute. I have also been serving as the Director of the China Data Center, and I am a faculty member of ICPSR. The China Data Center provides data series on China to libraries, faculty, students, and anyone interested in China studies. We also offer a range of training webinars.

2. What inspired you to pursue your research interests?

Before I came to the US for my PhD, I saw many issues in China’s development that called for research. I had earned a Bachelor’s degree in Computer Science and a Master’s in Statistics, and I realized I needed training in economics to better understand regional economic development. My PhD was therefore in spatial economics and the methodology of spatial statistics, which expanded my research interests from regional economics to spatial analysis and then to data science.

3. You’ve recently published a lot of research on Covid-19. What are some of the growing trends you’ve seen in terms of how data science has approached Covid-19?

It’s very interesting to see new kinds of data science emerge during Covid-19. Traditionally, we relied more on government-provided data or data compiled by research institutions. During Covid-19, a lot of data came from unofficial and non-professional sources; many nonprofit organizations compiled Covid-19 data. That changed how and where we get data. Apps like Twitter and other new data sources are becoming mainstream for data science. Second, there is code sharing on websites. People started to share their research methodology and code, which opens another area of public sharing and knowledge transfer in data science. Third is how data science results are delivered. Deploying those results involves not only data but also methodology, code, and authority, and that work doesn’t always go through peer review. In these ways, Covid-19 changed the traditional practice of data science.

4. What data science applications have been most useful in tracking Covid-19?

We’re seeing more applications, papers, and research focused on practice, not just clinical research, aimed at solving the problems of our world.

5. What more can data scientists do when researching Covid-19?

Data science is a broad concept, so there are different ways data scientists can research Covid-19 or similar areas. First, they can provide data analysis and data visualizations for a better understanding of trends and what’s happening. Those are the most popular contributions. Second, they can apply different methodologies to help people better understand how the virus spreads over time and space. Third, data scientists can’t just work on data. They have to work on how data science can solve real problems for policy and action.

6. What advice would you want to give to someone interested in pursuing research/a career in data science?

Data science is a very broad concept. Many universities offer data science programs, and their content may differ. If you want to be a data scientist, be prepared for change. It’s better to build diversified knowledge and try everything, so you are ready for changing demands.

7. What resources would you recommend to people wanting to know more about this field?

I recommend Harvard Dataverse, which is good for both data and code. As for tools, I suggest data scientists pay more attention not only to traditional data analysis tools but also to new tools for workflow-based data analysis. When methodologies become more complicated, traditional approaches no longer work. Making data analysis replicable and extensible will be a challenging new direction. Workflow tools like KNIME, data and code repositories like Dataverse, and many online computing platforms could be great resources.

8. What discussion about data science do you think we need to shed more light on?

We face one of the most challenging aspects of data science. Work is done piece by piece, and everyone is constrained by limited resources, skills, and information. Because everyone works on one piece, few of us can see a more complete picture of what we are trying to find, so the biggest challenge for data scientists is how to put our information together to complete the picture. It’s like a data puzzle: how can we put the framework together? Integrating different methodologies is what we need to pay more attention to in the future.

9. Is there anything else you want to mention that we haven’t discussed?

I would like to encourage data scientists, especially young ones, to join different communities. Once you have a job, your work may be quite specific, but the market changes all the time, so keep increasing your abilities: find communities, and volunteer on other work with different types of people in different areas. That will help your future career.

Dr. Bao’s work on Covid-19 is also part of RMDS Lab’s course, Tackling Covid-19. To access this FREE course, log in and enroll using the self-enrollment code: COVID19.

1. What are the main biases that exist in AI?

Almost any data that is created by humans, collected about humans, or selected by humans is biased by the very nature of humans and society. AI specialists need to acknowledge that bias in their systems can reinforce and amplify the inequalities and discrimination that exist in society. “I am just an engineer” or “I don’t make the final decision” is not enough. We all have a responsibility to reimagine our world and make it better. The biases that can leak into a system are too many to name here, but a few to kick off the thinking process are sunk cost bias, automation bias, representation bias, measurement and selection biases, the framing effect, stereotyping, and availability bias.

2. How can these biases best be addressed?

We need more applied ethics training as part of our education, and also digital literacy across mainstream society. Without understanding the implications of big data and how it is processed, how bias can be built into an AI system, or, just as seriously, how AI can be used to exploit human biases, we cannot effectively address the issues it creates. The developers and users of AI systems need to understand their intentional or unintentional contribution to creating further social justice issues.

3. What are some of the greatest ethical challenges facing AI specialists?

The greatest challenge is how to avoid being part of systems, tools, or services that exploit human dignity, autonomy, or well-being. There is always a decision to be made. Should I prioritize revenue or deadlines over responsible debate and development of the AI tool? Is AI even the best solution to this problem? Do I know my dataset, and the context of that data, well enough? Have I voiced my concerns about the issues in the system? Have I empowered the team enough to voice concerns and take responsible action?

4. What steps can companies take to ensure they are using AI ethically?

• Ethical and responsible work starts with C-level commitment, with leaders modeling the expected actions and priorities for the rest of the employees. This is not specific to AI but applies to all work associated with the organization.

• It is then about embedding diversity and the organization’s values and principles into its culture, its recruitment practices and incentive mechanisms, its project management process, the full lifecycle of product development and deployment, and its policies and procedures.

• Another critical step is creating space for people to raise their ideas and concerns constructively, so that everyone is expected to think about how to improve the product or service for the consumer, the company, and society, and is able to bring those thoughts forward.

5. What ethical aspects of AI are overlooked and need to be considered more?

Ethics and values are culture- and context-dependent. We need to be aware that we are not forcing our own values and priorities upon others, especially in a world without digital borders. An AI product you launch today can be available worldwide immediately. What we are overlooking are non-Western ethical values and perspectives, and also how the use of AI is affecting power relationships within a society (between individuals, corporations, and governments) and between countries.

6. What ethical challenges do you face when applying AI to social justice issues?

There is an enthusiasm to apply AI tools to every single problem around us without diving deeper into the root causes and structural issues behind that problem (for example, policing or welfare benefit eligibility). We need to move away from that techno-solutionist mindset first. If, after examining the context and history of the issue and deliberating with the stakeholders who have been fighting that particular social justice issue, we decide that AI might be a solution, we need to be extra diligent. The outcomes of an AI solution affect human lives in substantial ways. Because any data about humans is inherently biased, AI systems might magnify and accelerate inequality in society and create further obstacles for people trying to access resources and opportunities.

7. What ethical challenges do you face when applying AI to helping those with intellectual disabilities?

Following up on the previous comment about techno-solutionism, we need to ensure that we are not falling into the ableism trap. In certain cases, when AI solutions are used with people with intellectual disabilities (or any disability, for that matter), there is a tendency to treat the able or typical body as the norm and to try to move the person with a disability towards that norm. Everyone has their own skills and brings diversity to the world. The tools created and offered should draw on the insights of the people who are affected and work towards making their lives easier, rather than assuming that developers have the solutions. The inclusive design approach was born out of disability justice work (Nothing About Us Without Us), and AI design should be no different. AI practitioners also need to ensure that their solutions are not biased, do not create extra hurdles for people with disabilities, and do not treat them as “outliers” in data or results. One billion people, or 15% of the world’s population, experience some form of disability. So we need to move away from treating the typical body and mind as the norm.

8. What is the best way to alleviate these challenges?

The best way is to understand that ethics is not a constraint on your innovation, an extra expense, or a delay to your products. If fully and consistently integrated, it is actually a way to differentiate your product and gain better insight into risks and opportunities. To alleviate these challenges, those in decision-making roles need to ensure that inclusive design and an ethical, auditable framework are built into their development and implementation processes and policies. An ethical framework consistently requires teams to ask crucial questions about inclusion, outcomes, accuracy, metrics, and feedback.

9. What are some of the most interesting developments in the relationship between AI and intellectual disabilities?

Nearly 6.5 million people in the United States have some level of intellectual disability. AI can make it possible to create individualized applications for people with intellectual disabilities, helping them acquire and maintain adaptive behavior and enhancing their linguistic diversity. There should always be human oversight to ensure that the person’s well-being is protected and that they are not being exploited or abused through these technologies. I do want to flip this question a bit, though: for all our talk about artificial “intelligence,” we do not know what intelligence is or how humans learn. So we need to be diligent.

10. What resources would you recommend to people who want to learn more about ethics in AI and data science?

I have actually created a large repository of resources for exactly that reason, and I keep it current. I hope you enjoy everything in it, and please let me know if I am missing any major work.

1. Tell us a little about yourself as well as your past and current research interests.

I am an atmospheric scientist and have been doing research on extreme weather and climate change for over 20 years. My primary research interests are tropical convective storms and global cloud and precipitation variability.

2. In 2010, you received the NASA Exceptional Achievement Medal for “major advances in the understanding of water vapor and cloud feedback on climate change through quantitative analysis of observations from multiple NASA satellites.” Can you tell us more about your research in this area and how it led to this award?

With my colleagues at JPL, I analyzed NASA satellite observations of clouds and water vapor from the Aura Microwave Limb Sounder and the Atmospheric Infrared Sounder. My team found that variations in upper-tropospheric water vapor concentration are closely associated with the amount of ice cloud. In regions with abundant ice clouds, water vapor concentration is high as a result of thunderstorms transporting moisture upward from the ocean surface. A higher concentration of water vapor traps more thermal radiation from the surface, which warms the surface. This forms a positive feedback on surface warming: surface warming causes more thunderstorms, which increase upper-tropospheric water vapor concentration, which traps more thermal emission and leads to further warming of the surface. We quantified this positive feedback for the first time using satellite data.

3. What new methodological approaches to predicting and evaluating hurricane forecast models have you seen emerge?

Machine learning and artificial intelligence are definitely emerging as new techniques in weather forecasting.

4. How effective do you think these approaches will be in the future?

ML/AI have shown tremendous potential in improving forecast accuracy and computational efficiency.

5. What specific challenges do researchers face when applying their research to forecasting?

The lack of real-time observations to use as inputs to forecast models.

6. What more needs to be done to improve the application of data science to forecasting?

Collaborations between data scientists and science domain experts.

7. What resources would you recommend to people wanting to know more about hurricane forecasting?

Attend annual hurricane conferences organized by the American Meteorological Society.

8. What discussion about data science do you think we need to shed more light on?

Check out NASA and NOAA websites for satellite observations.

9. What is the best career advice you can give to someone looking to enter your field?

Get good training in mathematics, physics, and computer programming, and be passionate about solving real-world problems.

10. Is there anything else you want to mention that we haven’t discussed?

I was quite impressed by the students in my Deep Dive class. They did a great job exploring the hurricane data and building machine learning models for hurricane forecasting. Thanks to the RMDS team for organizing this class.