SHAP-Guided Feature Selection for Cross-Dataset Generalization in Network Intrusion Detection Systems

Şengül, Gökhan; Kılıç, Can

doi:10.1109/ACCESS.2026.3703481

dc.contributor.author	Şengül, Gökhan
dc.contributor.author	Kılıç, Can
dc.date.accessioned	2026-06-23T11:10:30Z
dc.date.issued	2026
dc.description.abstract	Flow-based machine learning intrusion detection systems (IDS) often achieve near-perfect performance when trained and tested on a single benchmark dataset; however, their ability to generalize across datasets remains a crucial and largely unresolved challenge. This study analyzes the cross-dataset generalization behavior of an explainable, flow-based IDS trained on CICIDS2017 and externally evaluated on CSE-CIC-IDS2018, a dataset that represents a more realistic network environment with varying attack implementations, traffic compositions, and background services. Two widely used ensemble models, Random Forest and XGBoost, are trained solely on flow-level metadata, without inspecting packet payloads. After removing non-behavioral identifiers (Flow ID, Source IP, Destination IP, and Timestamp) and harmonizing the feature schemas, the two datasets are aligned into a unified 80-dimensional feature space extracted with CICFlowMeter. SHAP (TreeSHAP) is used to compute global feature importance and to build several explainability-driven feature subsets, including model-specific Top-20 sets, a COMMON-10 intersection, and a UNION-30 superset. Although both models attain near-perfect accuracy and weighted F1-scores on CICIDS2017 (macro-F1 ≈ 0.90), on CSE-CIC-IDS2018 their macro-F1 drops to 0.127 for Random Forest and 0.119 for XGBoost despite high overall accuracy, indicating a strong bias toward majority classes under domain shift. SHAP-guided feature reduction yields a measurable but limited improvement for Random Forest, raising macro-F1 from 0.127 to 0.166, and an additional port-removal ablation raises it further to 0.207. In contrast, no significant cross-dataset improvement is observed for XGBoost.
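The feature-subset construction and the accuracy versus macro-F1 gap described above can be sketched as follows. This is a minimal, standard-library sketch under stated assumptions, not the authors' code: it assumes global importance is the mean absolute SHAP value per feature over a flows × features matrix (for a multi-class model one would typically also aggregate over classes), treats COMMON as the intersection and UNION as the union of two per-model Top-k lists, and uses a hypothetical `drop_port_features` helper that simply removes features whose names contain "Port" to imitate the port-removal ablation. All function names and the toy SHAP values are illustrative. By inclusion-exclusion, two Top-20 lists sharing 10 features have a union of 20 + 20 - 10 = 30 features, which is consistent with the COMMON-10 and UNION-30 names.

```python
from typing import Dict, List, Sequence


def global_importance(shap_values: Sequence[Sequence[float]],
                      feature_names: Sequence[str]) -> Dict[str, float]:
    """Mean absolute SHAP value per feature (rows = flows, columns = features)."""
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    return {name: totals[j] / len(shap_values) for j, name in enumerate(feature_names)}


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """The k most important features, highest first (ties broken alphabetically)."""
    return sorted(importance, key=lambda f: (-importance[f], f))[:k]


def build_subsets(rank_a: List[str], rank_b: List[str]) -> Dict[str, List[str]]:
    """Per-model Top-k lists plus their intersection (COMMON) and union (UNION)."""
    in_a, in_b = set(rank_a), set(rank_b)
    common = [f for f in rank_a if f in in_b]               # kept in model-A rank order
    union = rank_a + [f for f in rank_b if f not in in_a]   # model-A order, then new model-B features
    return {"A": list(rank_a), "B": list(rank_b), "COMMON": common, "UNION": union}


def drop_port_features(features: List[str]) -> List[str]:
    """Hypothetical port-removal ablation: drop any feature whose name mentions 'port'."""
    return [f for f in features if "port" not in f.lower()]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of flows whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # denominator > 0 for any seen label
    return sum(scores) / len(scores)


if __name__ == "__main__":
    names = ["Flow Duration", "Destination Port", "Fwd Packet Length Max", "Flow IAT Mean"]
    shap_rf = [[0.5, -0.4, 0.1, 0.0], [0.3, 0.2, -0.1, 0.05]]   # toy values
    shap_xgb = [[0.1, 0.6, 0.0, 0.2], [-0.1, 0.4, 0.1, 0.2]]    # toy values
    subsets = build_subsets(top_k(global_importance(shap_rf, names), 2),
                            top_k(global_importance(shap_xgb, names), 2))
    print(subsets["COMMON"])                     # ['Destination Port']
    print(subsets["UNION"])                      # ['Flow Duration', 'Destination Port', 'Flow IAT Mean']
    print(drop_port_features(subsets["UNION"]))  # ['Flow Duration', 'Flow IAT Mean']

    # A classifier that always predicts the majority class.
    y_true = ["BENIGN"] * 95 + ["DoS"] * 5
    y_pred = ["BENIGN"] * 100
    print(accuracy(y_true, y_pred))              # 0.95
    print(round(macro_f1(y_true, y_pred), 3))    # 0.487
```

The last example shows why high accuracy can coexist with a low macro-F1: macro-F1 weights every class equally, so never predicting a rare class costs half the score here even though accuracy stays at 0.95. Replacing the toy matrices with real TreeSHAP outputs (for example from the `shap` package's `TreeExplainer`) and setting k = 20 would give the general shape of such a pipeline, although the paper's exact aggregation and tie-breaking are not specified in this record.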
An additional practical observation is that SHAP-guided feature rankings remain highly stable across sample sizes: class-balanced subsets of approximately 400 flows (50 samples per class) yield Top-20 rankings that closely match those obtained from 10,000 flows (1250 samples per class), supporting the feasibility of computationally efficient explainability. Overall, the results show that explainability-driven feature analysis improves transparency, compactness, and feature prioritization, but it does not fully resolve the broader distributional shift that limits cross-dataset generalization in flow-based intrusion detection systems.
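The sample sizes quoted above imply eight classes (400 / 50 = 10,000 / 1250 = 8). A minimal standard-library sketch of how such a stability check could be set up follows; it is an assumption, not the authors' protocol. The hypothetical `class_balanced_indices` draws a fixed number of rows per class with a seeded random generator, and Top-k similarity is measured here with overlap@k and Jaccard similarity, two common choices, because the abstract does not name its similarity metric. In practice each ranking would come from running TreeSHAP on the rows selected for that sample size; the rankings in the example are placeholders.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def class_balanced_indices(labels: Sequence[str], per_class: int,
                           seed: int = 0) -> List[int]:
    """Sample up to `per_class` row indices from each class without replacement."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # seeded so the subset is reproducible
    chosen: List[int] = []
    for label in sorted(by_class):
        pool = by_class[label]
        chosen.extend(rng.sample(pool, min(per_class, len(pool))))
    return sorted(chosen)


def overlap_at_k(rank_a: Sequence[str], rank_b: Sequence[str], k: int) -> float:
    """Fraction of the k top-ranked features that both rankings share."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


def jaccard_at_k(rank_a: Sequence[str], rank_b: Sequence[str], k: int) -> float:
    """Jaccard similarity of the two Top-k feature sets."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy label column: 8 classes with 2,000 flows each.
    labels = [f"class{c}" for c in range(8) for _ in range(2000)]
    small = class_balanced_indices(labels, per_class=50)    # 8 * 50 = 400 rows
    large = class_balanced_indices(labels, per_class=1250)  # 8 * 1250 = 10,000 rows
    print(len(small), len(large))                            # 400 10000

    # Placeholder rankings; real ones would come from TreeSHAP on each subset.
    rank_small = ["a", "b", "c", "d"]
    rank_large = ["b", "a", "c", "e"]
    print(overlap_at_k(rank_small, rank_large, 4))  # 0.75
    print(jaccard_at_k(rank_small, rank_large, 4))  # 0.6
```

Both measures ignore the order of features inside the Top-k; a rank-correlation measure over the shared features would additionally capture reordering, at the cost of a slightly more involved comparison.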
dc.identifier.doi	10.1109/ACCESS.2026.3703481
dc.identifier.issn	2169-3536
dc.identifier.uri	https://hdl.handle.net/20.500.14411/11624
dc.identifier.uri	https://doi.org/10.1109/ACCESS.2026.3703481
dc.language.iso	en
dc.publisher	IEEE
dc.relation.ispartof	IEEE Access
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Cross-dataset generalization
dc.subject	explainable artificial intelligence
dc.subject	flow-based traffic analysis
dc.subject	network intrusion detection
dc.subject	random forest
dc.subject	SHAP
dc.subject	XGBoost
dc.title	SHAP-Guided Feature Selection for Cross-Dataset Generalization in Network Intrusion Detection Systems
dc.type	Article
dspace.entity.type	Publication
gdc.description.department	Computer Engineering
gdc.description.publicationcategory	Article - National Peer-Reviewed Journal - Institutional Faculty Member
gdc.description.scopusquality	Q1
gdc.description.volume	14
gdc.description.wosquality	Q2
relation.isAuthorOfPublication.latestForDiscovery	f291b4ce-c625-4e8e-b2b7-b8cddbac6c7b
relation.isOrgUnitOfPublication.latestForDiscovery	50be38c5-40c4-4d5f-b8e6-463e9514c6dd

Files

Original bundle

Name:: SHAP-Guided_Feature_Selection_for_Cross-Dataset_Generalization_in_Network_Intrusion_Detection_Systems_IEEE_ACCESS_2026.pdf
Size:: 1.79 MB
Format:: Adobe Portable Document Format

Collections

WoS