Data Science Fundamentals
Data exploration, cleaning, and preparation.
Data Cleaning and Preprocessing
Handling noise, inconsistencies, and formatting issues: garbage in, garbage out is the first law of ML.
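As a rough illustration of fixing formatting inconsistencies, the sketch below normalizes category labels and parses numeric strings. The helper names (`clean_label`, `clean_number`) and the sample values are hypothetical, chosen only for this example; real cleaning rules depend on the dataset.

```python
def clean_label(raw):
    """Trim, collapse internal whitespace, and lowercase a category label."""
    return " ".join(raw.split()).lower()


def clean_number(raw):
    """Parse a numeric string with thousands separators; None if unparseable."""
    try:
        return float(raw.replace(",", "").strip())
    except ValueError:
        return None


# Hypothetical raw values showing typical inconsistencies.
raw_cities = ["  New York", "new  york ", "NEW YORK", "Boston"]
raw_amounts = ["1,200", " 35.5 ", "n/a"]

cleaned_cities = [clean_label(c) for c in raw_cities]
amounts = [clean_number(a) for a in raw_amounts]
```

Returning `None` for unparseable values, rather than guessing, keeps the missing-data decision separate from the formatting fix.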
Data Splitting and Sampling
Train/validation/test splits, stratification, and handling class imbalance: the foundation of honest evaluation.
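A minimal sketch of a stratified train/test split, assuming labels fit in memory as a plain list. The function name `stratified_split`, the 20% test fraction, and the 90/10 class ratio are illustrative assumptions, not a prescribed method.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one test example per class; a class with a single
        # example therefore ends up only in the test set.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced dataset: 90 negatives, 10 positives.
labels = ["neg"] * 90 + ["pos"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_pos = sum(1 for i in test_idx if labels[i] == "pos")
```

Because each class is sampled separately, the minority class keeps its share in the test set, which a purely random split of a small dataset does not guarantee.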
Data Types and Structures
Numerical, categorical, ordinal, text, time series: understanding your data's nature determines every downstream decision.
Encoding Categorical Variables
One-hot, label, target, and embedding-based encoding: translating categories into numbers without introducing false relationships.
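To make the "false relationships" point concrete, here is a small sketch comparing label encoding with one-hot encoding. The helpers `label_encode` and `one_hot_encode` and the color values are hypothetical; categories are ordered alphabetically only for reproducibility.

```python
def label_encode(values):
    """Map each category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colors)
one_hot, columns = one_hot_encode(colors)
# codes   -> [2, 1, 0, 1]
# columns -> ['blue', 'green', 'red']
```

The integer codes imply an ordering (blue < green < red) that the colors do not have, which a model sensitive to numeric magnitude may treat as real. The one-hot rows avoid that at the cost of one column per category.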
Exploratory Data Analysis
Visualizing distributions, correlations, and anomalies before modeling: the most undervalued step in the ML pipeline.
Feature Scaling and Normalization
Standardization, min-max scaling, and robust scaling: putting features on a comparable scale so that none dominates merely because of its original units.
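The three scalers can be sketched with the standard library alone. The function names and sample values are illustrative assumptions; the sketch assumes a non-constant feature, since a zero spread would divide by zero.

```python
import statistics


def standardize(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def min_max_scale(xs):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def robust_scale(xs):
    """Subtract the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]


# Hypothetical feature with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
z_scores = standardize(values)
min_max = min_max_scale(values)
robust = robust_scale(values)
# robust -> [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With the outlier present, min-max scaling squeezes the four ordinary values into the bottom few percent of the range, while robust scaling keeps them evenly spread because the median and interquartile range barely react to the extreme value.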
Handling Missing Data
Deletion, imputation, and model-based approaches: the strategy depends on why data is missing, not just how much.
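A minimal sketch contrasting deletion with simple imputation, assuming missing values are represented as `None`. The helper names and the `ages` column are hypothetical, and median imputation stands in here for the broader family of imputation methods.

```python
import statistics


def drop_missing(values):
    """Deletion: keep only the observed values."""
    return [v for v in values if v is not None]


def impute_median(values):
    """Imputation: fill gaps with the median and record where they were."""
    fill = statistics.median(drop_missing(values))
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Hypothetical column with two missing entries.
ages = [34, None, 29, 41, None, 38]
observed = drop_missing(ages)
imputed, was_missing = impute_median(ages)
# imputed -> [34, 36.0, 29, 41, 36.0, 38]
```

Keeping the `was_missing` flags alongside the filled values preserves the fact that a value was absent, which matters when missingness itself carries information about the record.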