Competition Training: Workflow Guidebook

Overview

The data science workflow is a structured procedure for analyzing data. A workflow usually follows these steps: data ingestion, data cleaning, data visualization and analysis, feature engineering, modeling and tuning, and deployment and maintenance. Modeling can itself be divided into several approaches, such as supervised and unsupervised machine learning, transfer learning, and statistical modeling. One framework for organizing this workflow is the RM4E approach, named for its four Es (Equation, Estimation, Evaluation and Execution), which are supported by the Ecosystem Platform.
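The workflow steps listed above can be sketched as a chain of small functions. This is only a minimal illustration using the Python standard library: the toy data in `RAW_CSV`, the column names `hours` and `score`, and every function name are assumptions made for this example, not part of any prescribed toolchain. Visualization, tuning, deployment and maintenance are left out to keep the sketch short.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a real data source.
# Two rows are deliberately dirty (a missing value and a non-numeric value).
RAW_CSV = """hours,score
1,52
2,55
,60
4,61
5,abc
6,68
"""


def ingest(text):
    """Data ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data cleaning: keep only rows where both fields parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["hours"]), float(row["score"])))
        except (TypeError, ValueError):
            continue  # drop rows with missing or malformed values
    return cleaned


def summarize(pairs):
    """Analysis: basic descriptive statistics for each column."""
    return {
        "n": len(pairs),
        "mean_hours": statistics.mean(h for h, _ in pairs),
        "mean_score": statistics.mean(s for _, s in pairs),
    }


def engineer_features(pairs):
    """Feature engineering: center hours on its mean."""
    mean_hours = statistics.mean(h for h, _ in pairs)
    return [(h - mean_hours, s) for h, s in pairs]


def fit_line(centered_pairs):
    """Modeling: least-squares line score = intercept + slope * x.

    Because x is centered (it sums to zero), the least-squares solution
    reduces to slope = sum(x*y) / sum(x*x) and intercept = mean(y).
    """
    slope = (sum(x * y for x, y in centered_pairs)
             / sum(x * x for x, _ in centered_pairs))
    intercept = statistics.mean(y for _, y in centered_pairs)
    return slope, intercept


rows = ingest(RAW_CSV)
pairs = clean(rows)
summary = summarize(pairs)
features = engineer_features(pairs)
slope, intercept = fit_line(features)
print(summary, slope, intercept)
```

Keeping each step as its own function makes it easy to replace one stage, for example a different cleaning rule, without touching the others.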

Equation represents the models and frameworks used in our research. It serves as the link between the data and our research ideas or designs.
Estimation is the link between equations (models) and the data used for our research. It covers the algorithms used to compute the parameters of our models.
Evaluation assesses the fit between models and data. Its metrics measure the performance of our estimation methods and of the models they produce.
Execution, or Explanation, is the link between equations (models) and our intended research purposes. How we explain our results depends on those purposes and on the subject we are studying. During this step, the model is deployed and then used for decision making.
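To make the four Es concrete, the sketch below maps each one to a small piece of Python. It is an illustration under stated assumptions rather than a reference implementation: the straight-line model, ordinary least squares as the estimation algorithm, RMSE as the evaluation metric, the "approve"/"review" decision rule, and all names and numbers are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Equation: the assumed model form y = intercept + slope * x."""
    slope: float
    intercept: float

    def predict(self, x):
        return self.intercept + self.slope * x


def estimate(xs, ys):
    """Estimation: ordinary least squares computes the parameters from data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


def evaluate(model, xs, ys):
    """Evaluation: root mean squared error on data not used for fitting."""
    squared_errors = [(model.predict(x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def execute(model, x, threshold):
    """Execution: turn a prediction into a decision (hypothetical rule)."""
    return "approve" if model.predict(x) >= threshold else "review"


# Made-up training data that lies exactly on y = 1 + 2x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
# Made-up held-out data with some noise.
test_x = [5.0, 6.0]
test_y = [11.5, 12.5]

model = estimate(train_x, train_y)
rmse = evaluate(model, test_x, test_y)
decision = execute(model, 7.0, threshold=14.0)
print(model, rmse, decision)
```

Separating the model class from `estimate` mirrors the distinction drawn above: the equation stays the same while the estimation algorithm or the evaluation metric can be swapped independently.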