SHAP-Guided Feature Selection for Cross-Dataset Generalization in Network Intrusion Detection Systems

Şengül, Gökhan; Kılıç, Can

doi:10.1109/ACCESS.2026.3703481

Files

SHAP-Guided_Feature_Selection_for_Cross-Dataset_Generalization_in_Network_Intrusion_Detection_Systems_IEEE_ACCESS_2026.pdf (1.79 MB)

Date

2026

Authors

Şengül, Gökhan

Kılıç, Can

Publisher

IEEE

Abstract

Flow-based machine learning intrusion detection systems (IDS) often achieve near-perfect performance when trained and tested on a single benchmark dataset; nonetheless, their ability to generalize across datasets is a crucial and mostly unresolved challenge. This study analyzes the cross-dataset generalization behavior of an explainable, flow-based IDS trained on CICIDS2017 and externally evaluated on the CSE-CIC-IDS2018 dataset, which represents a more realistic network environment with varying attack implementations, traffic compositions, and background services. Two frequently used ensemble models, Random Forest and XGBoost, are trained solely on flow-level metadata without packet payload examination. After removing non-behavioral identifiers (Flow ID, Source IP, Destination IP, and Timestamp) and harmonizing feature schemas, the datasets are aligned into a unified 80-dimensional feature space extracted with CICFlowMeter. SHAP (TreeSHAP) is used to calculate global feature importance and create multiple explainability-driven feature subsets, such as model-specific Top-20 sets, a COMMON-10 intersection, and a UNION-30 superset. Although both models attain near-perfect accuracy and weighted F1-scores on CICIDS2017 (macro-F1 ≈ 0.90), when evaluated on CSE-CIC-IDS2018, macro-F1 drops to 0.127 for Random Forest and 0.119 for XGBoost, despite high overall accuracy, indicating a strong bias toward majority classes under domain shift conditions. SHAP-guided feature reduction provides a measurable but limited improvement for Random Forest, increasing macro-F1 from 0.127 to 0.166, while an additional port-removal ablation further improves macro-F1 to 0.207. In contrast, no significant cross-dataset improvement is observed for XGBoost.
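The gap between high accuracy and low macro-F1 reported above follows from how the metrics are averaged: accuracy and weighted F1 are dominated by the majority class, whereas macro-F1 gives every class equal weight. The following is a minimal sketch of that effect in plain Python. The labels and the toy predictions are hypothetical and are not taken from the paper's data or results.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    # Unweighted mean: each class counts equally, however rare it is.
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    # Mean weighted by class support: the majority class dominates.
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical toy data: 95 benign flows and 5 attack flows in two classes.
y_true = ["benign"] * 95 + ["dos"] * 3 + ["botnet"] * 2
# A degenerate classifier that labels every flow as benign.
y_pred = ["benign"] * 100

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} weighted_f1={weighted:.3f} macro_f1={macro:.3f}")
```

In this toy case the classifier misses every attack flow, yet accuracy and weighted F1 stay above 0.9 while macro-F1 falls to roughly one third, which is the same qualitative pattern the abstract attributes to majority-class bias.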
An additional practical observation is that SHAP-guided feature rankings remain highly stable across sample sizes: class-balanced subsets of approximately 400 flows (50 samples per class) produce highly similar Top-20 rankings to those obtained from 10,000 flows (1,250 samples per class), supporting the feasibility of computationally efficient explainability. Overall, the results show that explainability-driven feature analysis improves transparency, compactness, and feature prioritization; however, it does not fully resolve the broader distributional shift challenges that limit cross-dataset generalization in flow-based intrusion detection systems.
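The subset construction and the ranking-stability comparison described in the abstract can be sketched with ordinary set operations. This is an illustrative sketch only: the feature names, the importance values (standing in for mean absolute SHAP values), the helper names `top_k` and `jaccard`, and the use of Jaccard overlap as the similarity measure are all assumptions, since the abstract does not state how ranking similarity was quantified. A small k of 3 replaces the paper's Top-20 to keep the example readable.

```python
def top_k(importance, k):
    """Return the k feature names with the largest importance, in rank order.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two feature lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical global importances (placeholders for mean |SHAP| per feature).
rf_importance = {"feat_a": 0.30, "feat_b": 0.25, "feat_c": 0.20,
                 "feat_d": 0.10, "feat_e": 0.05}
xgb_importance = {"feat_a": 0.28, "feat_c": 0.22, "feat_e": 0.18,
                  "feat_b": 0.09, "feat_d": 0.02}

K = 3  # the paper uses Top-20; 3 keeps the toy example small
rf_top = top_k(rf_importance, K)
xgb_top = top_k(xgb_importance, K)

# Features both models rank highly (analogue of the COMMON intersection).
common = [f for f in rf_top if f in set(xgb_top)]
# Features either model ranks highly (analogue of the UNION superset).
union = sorted(set(rf_top) | set(xgb_top))

# Stability check: compare the ranking from a small sample with the one
# from a large sample (hypothetical values for the small-sample run).
rf_importance_small = {"feat_a": 0.31, "feat_c": 0.24, "feat_b": 0.21,
                       "feat_d": 0.08, "feat_e": 0.06}
stability = jaccard(top_k(rf_importance_small, K), rf_top)

print("common:", common)
print("union:", union)
print("top-k overlap:", stability)
```

If the names COMMON-10 and UNION-30 denote set sizes, they are consistent with two Top-20 sets under inclusion-exclusion: 20 + 20 - 10 = 30. Note that Jaccard overlap ignores order within the Top-k set; a rank correlation would be needed to compare orderings as well.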

Keywords

Cross-dataset generalization, explainable artificial intelligence, flow-based traffic analysis, network intrusion detection, random forest, SHAP, XGBoost