Data Science Life Cycle
The OSEMN framework
Figure 1: Data Science Process (a.k.a. the O.S.E.M.N. framework)
Image source: https://towardsdatascience.com/5-steps-of-a-data-science-project-lifecycle-26c50372b492
- The OSEMN framework comprises five major steps that help us focus on and prioritize the right data science tasks at each stage:
- Obtaining Data
- Scrubbing Data
- Exploring Data
- Modelling Data
  - Model parameter estimation
  - Hyper-parameter tuning
    - Hyperparameters are the settings that define the model architecture (for example, the maximum depth of a decision tree).
    - Hyperparameters are external to the model: they are fixed before training rather than estimated from the training data.
    - Hyperparameter optimization, or tuning, is the process of searching for the ideal model architecture (a set of optimal hyperparameters), typically judged by performance on validation data.
Figure 2: Model Data
Image source: https://towardsdatascience.com/5-steps-of-a-data-science-project-lifecycle-26c50372b492
- Interpreting Data
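The distinction in the Modelling step between estimating model parameters and tuning hyperparameters can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses its bundled iris dataset, a decision tree, and a small grid over `max_depth`; none of these choices come from the original text, and any other model or search strategy would serve equally well.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Obtain a small example dataset (assumption: iris stands in for real project data)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameter: max_depth is fixed before training, so we search over candidates
param_grid = {"max_depth": [1, 2, 3, 4, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)

# Each candidate depth is scored with 5-fold cross-validation on the training data;
# the best candidate is then refit on the full training set
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]

# Model parameters: the split rules inside the fitted tree are estimated from data
best_tree = search.best_estimator_
test_accuracy = search.score(X_test, y_test)

print("best max_depth:", best_depth)
print("nodes in fitted tree:", best_tree.tree_.node_count)
print("held-out accuracy:", round(test_accuracy, 3))
```

Here `max_depth` is chosen by the search, while the splits inside the tree are learned by `fit`. The held-out test set is used only once, after tuning, to estimate performance.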
Tidy workflow in Data Science
- The tidy workflow in R for Data Science (Wickham and Grolemund 2017) describes the tools needed in a typical data science project.
Figure 3: Tidy Workflow in data science
Image source: https://r4ds.had.co.nz/introduction.html
R
| Import | Tidy | Transform | Visualize | Model | Communicate |
|---|---|---|---|---|---|
| readr | tidyr | dplyr | ggplot2 | broom | rmarkdown |
| haven | tibble | lubridate | | tidymodels | bookdown |
| readxl | | forcats | | modelr | knitr |
| httr | | stringr | | | shiny |
| rvest | | | | | |
| xml2 | | | | | |
Python
| Import | Tidy | Transform | Visualize | Model | Communicate |
|---|---|---|---|---|---|
| pandas (tabular data) | pandas | pandas | matplotlib | Scikit-Learn | Jupyter Notebook |
| numpy (numerical data) | | | seaborn | statsmodels | JupyterLab |
| | | | plotnine (GoG) | TensorFlow | Dash |
| | | | plotly | keras | streamlit |
| | | | | | Flask |
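As a rough illustration of how the Python tools in the table fit together, the sketch below walks through the Import, Tidy, Transform and Model stages with pandas and scikit-learn. The tiny sales dataset, its column names, and the choice of a pooled linear trend are all hypothetical and exist only to keep the example self-contained. The Visualize and Communicate stages are left out so that the sketch does not depend on a plotting backend or a notebook.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Import: read tabular data (an in-memory CSV stands in for a real file)
raw_csv = io.StringIO(
    "city,2019,2020,2021\n"
    "A,10,12,14\n"
    "B,20,,26\n"
)
wide = pd.read_csv(raw_csv)

# Tidy: reshape to one row per (city, year) observation and drop missing values
tidy = wide.melt(id_vars="city", var_name="year", value_name="sales")
tidy["year"] = tidy["year"].astype(int)
tidy = tidy.dropna(subset=["sales"])

# Transform: summarise mean sales per city
mean_sales = tidy.groupby("city")["sales"].mean()

# Model: fit one linear trend of sales on year, pooled across cities
model = LinearRegression().fit(tidy[["year"]], tidy["sales"])
slope = model.coef_[0]

print(mean_sales)
print("estimated yearly change in sales:", round(slope, 2))
```

In a real project the tidy data frame would typically be passed to a plotting library such as seaborn or plotnine before modelling, and the results would be shared through a notebook or a dashboard.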
Jupyter Notebook vs JupyterLab
- Jupyter Notebook is a web-based interactive computational environment for creating Jupyter notebook documents.
- JupyterLab is the next-generation user interface for Jupyter, and it includes notebooks. It has a modular structure in which several notebooks or other files (e.g. HTML, text, Markdown) can be opened as tabs in the same window. It offers a more IDE-like experience.