Oct 26, 2021 (PST)
Virtual Data Science Learnathon with KNIME
Data Scientists, KNIME Team
Data Scientist at KNIME
9:00 am - 11:00 am
Current data science and machine learning applications in the medical and aerospace industry
Dr. Kyongsik Yun
Technologist, NASA/Jet Propulsion Lab
12:00 pm - 2:00 pm
Getting insights from text data (collecting, cleaning, and analyzing text data from the web)
Founder & Data Scientist at Opening Data
Intermediate - advanced
3:00 pm - 5:00 pm
Date: Oct 26th, 2021
Time: 9:00 am – 11:00 am (PST)
Difficulty Level: Beginner
Name: Dr. Satoru Hayasaka – Data Scientist, KNIME; Wali Khan – Solution Engineer, KNIME; Corey Weisinger – Data Scientist, KNIME
Presented by KNIME
This learnathon is a mix between a hackathon and a workshop. It's like a workshop because we'll learn more about the data science cycle: data access, data blending, data preparation, model training, optimization, testing, and deployment. It's like a hackathon because we'll work in groups to hack a workflow-based solution to guided exercises.
The tool of choice is the open-source, GUI-driven KNIME Analytics Platform. Because KNIME is open, it offers great integrations with an IDE environment for R, Python, SQL, and Spark.
We'll start with an introduction to KNIME Analytics Platform, followed by a short presentation about the data science cycle. After this presentation we split into three groups. Each group focuses on one of the three aspects of the data science cycle.
Three Zoom breakout rooms will be activated for this purpose. You go into the room for the group you sign up for (below) to attend the specific tutorial and exercises.
There will be a KNIME data scientist in each breakout room to help you while you work on the exercises.
Group 1 - Working on the raw data. Data access and data preparation.
Group 2 - Machine Learning. Which model shall I use? Which parameters?
Group 3 - I have a great model. Now what? The model deployment phase.
Dr. Satoru Hayasaka was trained in statistical analysis of various types of biomedical data. Since his doctoral training, he has taught several courses on data analysis geared toward non-experts and beginners. In recent years, he taught introductory machine learning courses to graduate students from different disciplines. Recently he joined KNIME as part of the evangelism team, and he continues teaching machine learning and data mining using KNIME Analytics Platform.
Wali Khan is a Solution Engineer at KNIME based out of Austin, Texas. His main focus is to help people operationalize their machine learning models and analytics pipelines. Before KNIME, Wali worked as a consultant at Oracle. He holds a Master's degree in Biomedical Engineering from the University of Texas at Arlington and a Chemistry degree from Texas A&M University.
Corey Weisinger is a Data Scientist with KNIME in Austin, Texas. He studied Mathematics at Michigan State University, focusing on Actuarial Techniques and Functional Analysis. Before coming to work for KNIME, he worked as an Analytics Consultant for the auto industry in Detroit, Michigan. He currently focuses on Signal Processing and Numeric Prediction techniques and is the author of the Alteryx to KNIME guidebook.
- the data science cycle
- how to hack a workflow-based solution
- data access and data preparation
- model selection
- model deployment
- Professionals who are looking for more practice with data preparation, model selection, and model deployment
- Students and beginners who are interested in learning about the data science cycle
- Expert instruction on data access, data blending, data preparation, model training, optimization, testing, and deployment
- Hands-on data science work
- Guided exercises
- More practice with the GUI-driven KNIME Analytics Platform
Date: Oct 26th, 2021
Time: 12:00 pm – 2:00 pm (PST)
Difficulty Level: Beginner
Prerequisites: Basic Statistics
Name: Kyongsik Yun, Ph.D. – Technologist, NASA/Jet Propulsion Lab
Kyongsik Yun, Ph.D., is a technologist at the Jet Propulsion Laboratory, California Institute of Technology. His research focuses on building brain-inspired technologies and systems, including deep learning computer vision, natural language processing, brain-computer interfaces, and noninvasive remote neuromodulation. He received the JPL Explorer Award (2019) for scientific and technical excellence in machine learning applications. In addition to his research, Kyongsik co-founded two biotechnology companies, Ybrain and BBB Technologies, that have raised $25 million in investment funding.
What data science and machine learning topics and techniques are actually used in industry? How do you apply the specific deep learning skills you are learning now to solving real-world problems? What are some recent topics people are interested in addressing in your industry? If you have any of these questions, this workshop is for you. This workshop covers specific use cases of data science and machine learning technologies in the medical and aerospace industries. Topics include computationally efficient, physically constrained neural networks; combined convolutional and recurrent neural networks for explainable AI; and multivariate data fusion and time series prediction. These technologies can be applied to a variety of use cases in medicine, aerospace, earth science, and financial forecasting.
- Computationally-efficient, physically-constrained neural networks (transforming nonlinear physical/mathematical problems into data-driven deep learning models)
- Combining convolutional and recurrent neural networks for explainable and trustworthy machine learning solutions
- Multivariate time series prediction using LSTM and Transformer models
- 3D convolutional neural networks for medical image classification and segmentation
- Beginner and intermediate software developers, research fellows, and students in data science and machine learning
- Learn which deep learning techniques are being used to solve real problems
- Understand the essentials of computational efficiency and explainability in deep learning
- Gain industry insights through practical examples
Date: Oct 26th, 2021
Time: 3:00 pm – 5:00 pm (PST)
Tools: R or Python
Difficulty Level: Intermediate - advanced
Prerequisites: Basic Statistics, Python for beginners
Name: Dr. Adriana Summerow – Founder & Data Scientist at Opening Data
Dr. Adriana Summerow is the founder and data scientist at Opening Data. She has 7+ years of professional experience applying predictive modeling, data pre-processing, and Natural Language Processing (NLP) algorithms. She has worked for companies such as Deloitte and Lockheed Martin as a senior consultant specialist and industrial engineer, solving challenging business problems. Her business acumen includes the application of data engineering, machine learning, and data visualization solutions for enterprises located in North and South America.
This is a hands-on workshop on collecting text data and processing it so that it is ready for analysis and visualization. Content generated on the web has become increasingly crucial to running a business successfully. In this workshop we will use that content to implement machine learning classifiers, with hyperparameter tuning, that predict sentiment and categorize entities.
- Text preprocessing & lemmatization
- Word vectorization
- Implementation of machine learning classifiers
- Evaluation of graphs
- Hyperparameter tuning
- Industry best practices & insights
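To give a feel for how the topics above fit together, here is a minimal sketch in plain Python, using only the standard library. It is not the workshop's material: the toy reviews, the stopword list, and the choice of a multinomial Naive Bayes classifier are all assumptions made for illustration, and lemmatization is left out. It preprocesses text, builds bag-of-words counts, trains a classifier, and tries a few values of one hyperparameter (the smoothing strength `alpha`) on a held-out set.

```python
import math
import re
from collections import Counter

# Toy, made-up reviews used only for illustration (not workshop data).
TRAIN = [
    ("great product, works well", "pos"),
    ("I love this phone", "pos"),
    ("excellent service and great support", "pos"),
    ("terrible quality, broke quickly", "neg"),
    ("I hate the slow delivery", "neg"),
    ("awful support and terrible service", "neg"),
]
VALID = [
    ("great service", "pos"),
    ("terrible phone", "neg"),
]

# Hypothetical, tiny stopword list chosen for this example.
STOPWORDS = {"i", "the", "and", "this", "a"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train_nb(examples):
    """Collect bag-of-words counts per class and document counts per class."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text, alpha):
    """Return the class with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in preprocess(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples, alpha):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(model, text, alpha) == label for text, label in examples)
    return hits / len(examples)


model = train_nb(TRAIN)
# Simple grid search over the smoothing hyperparameter on the held-out set.
scores = {alpha: accuracy(model, VALID, alpha) for alpha in (0.1, 1.0, 10.0)}
best_alpha = max(scores, key=scores.get)
```

On real data the validation set would be far larger, and a library such as scikit-learn would normally replace the hand-written counting and grid search. The sketch only shows the order of the steps.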
Beginner to intermediate data analysts, data scientists, data engineers, software developers, and students of data analytics
- Make sense of text data and improve data-driven decision making by integrating Natural Language Processing into the analysis of documents, social media, online reviews, and more.
- Streamline processes and reduce cost by automating the analysis of text data with automated and scalable machine learning models.
- Understand the language of your customer base, learn to perform market segmentations, and get the tools to impact performance in Finance, Healthcare or Marketing.