Daily Task Overview
- Cleaned and formatted NetFlow V1-based network traffic dataset (NF-UNSW-NB15)
- Dropped identifier features (e.g., IP addresses) to prevent data leakage
- Engineered new features to improve classification performance
- Built and trained baseline Random Forest models using Python and scikit-learn
- Evaluated models using precision, recall, F1-score, and ROC-AUC metrics
- Addressed class imbalance by downsampling benign traffic
- Performed cross-validation to test model stability and detect overfitting
- Visualized feature importances and model metrics using Matplotlib and Seaborn
- Created flow-based models using reconstructed IP address groupings
- Designed layout and structure of an educational prototype web interface
- Delivered weekly presentations outlining progress, challenges, and upcoming goals
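
The modeling tasks above (dropping IP columns, downsampling benign traffic, training a Random Forest baseline, cross-validating, and scoring) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`IPV4_SRC_ADDR`, `IN_BYTES`, `Label`, and so on) are assumed stand-ins for the dataset's schema, `BYTES_PER_PKT` is a hypothetical engineered feature, and splitting before downsampling is a choice made for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the flow table (assumed column names, not real data).
n = 2000
label = (rng.random(n) < 0.1).astype(int)  # roughly 10% attack, 90% benign
flows = pd.DataFrame({
    "IPV4_SRC_ADDR": rng.integers(0, 50, n).astype(str),
    "IPV4_DST_ADDR": rng.integers(0, 50, n).astype(str),
    "IN_BYTES": rng.exponential(500, n) + 400 * label,
    "OUT_BYTES": rng.exponential(300, n),
    "IN_PKTS": rng.integers(1, 50, n),
    "FLOW_DURATION_MILLISECONDS": rng.integers(1, 10_000, n),
    "Label": label,
})

# Drop identifier columns so the model cannot memorize specific hosts.
flows = flows.drop(columns=["IPV4_SRC_ADDR", "IPV4_DST_ADDR"])

# Hypothetical engineered feature: average inbound bytes per packet.
flows["BYTES_PER_PKT"] = flows["IN_BYTES"] / flows["IN_PKTS"]

# Split first so the held-out set keeps the original class ratio.
train, test = train_test_split(
    flows, test_size=0.25, stratify=flows["Label"], random_state=0
)

# Downsample benign training flows to match the number of attack flows.
attack = train[train["Label"] == 1]
benign = train[train["Label"] == 0].sample(n=len(attack), random_state=0)
balanced = pd.concat([attack, benign]).sample(frac=1.0, random_state=0)

X_train, y_train = balanced.drop(columns="Label"), balanced["Label"]
X_test, y_test = test.drop(columns="Label"), test["Label"]

# Baseline Random Forest with 5-fold stratified cross-validation on F1.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_f1 = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")

# Fit on the balanced training set and score on the untouched test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred, digits=3))
print("ROC-AUC:", round(float(roc_auc_score(y_test, y_score)), 3))
print("CV F1 mean/std:", round(float(cv_f1.mean()), 3), round(float(cv_f1.std()), 3))

# Feature importances, sorted for plotting (e.g., with Matplotlib or Seaborn).
importances = pd.Series(
    model.feature_importances_, index=X_train.columns
).sort_values(ascending=False)
print(importances)
```

A large gap between the cross-validated F1 and the test-set F1, or a high spread across folds, would be one sign of the overfitting or instability that the cross-validation step is meant to catch.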