Paper Number
1740
Paper Type
Complete
Description
Timely patching has proven to be one of the most effective ways to protect enterprise information systems from cyberattacks. However, because patching operations are not cost-free, enterprises typically delay patch deployments. Balancing operational expenses against system security risks to determine the optimal patch timing remains an ongoing challenge. This study addresses the patching decision problem with a deep reinforcement learning-based approach. Specifically, we model the patching problem as a Markov decision process that accounts for various costs, dynamics, and uncertainties. To avoid the curse of dimensionality and obtain an effective patching policy, we develop a novel reinforcement learning method called Action-Decomposed Proximal Policy Optimization (ADPPO). Experimental results indicate that the proposed approach significantly outperforms benchmarks. This study contributes to both the cybersecurity management and the reinforcement learning communities.
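To make the patch-timing trade-off concrete, the following is a minimal sketch of a toy Markov decision process for a single pending patch, solved by backward induction. It is not the paper's model and does not implement ADPPO. The function name `solve_patch_timing`, the cost figures, the attack probabilities, and the assumption that a breach ends the episode are all hypothetical choices made for illustration.

```python
def solve_patch_timing(patch_cost, attack_prob, breach_loss, discount=1.0):
    """Backward induction for a toy single-patch timing problem (hypothetical).

    patch_cost[t]  : cost of deploying the patch when it is t periods old
                     (patching is forced at the last index).
    attack_prob[t] : probability of a breach during period t if still unpatched.
    breach_loss    : loss incurred if a breach happens (episode then ends).
    Returns (values, policy): expected cost-to-go and best action per patch age.
    """
    horizon = len(patch_cost) - 1
    assert len(attack_prob) == horizon, "one attack probability per waiting period"

    values = [0.0] * (horizon + 1)
    policy = ["patch"] * (horizon + 1)
    values[horizon] = patch_cost[horizon]  # deadline reached: must patch now

    for age in range(horizon - 1, -1, -1):
        p = attack_prob[age]
        # Waiting: breach with probability p, otherwise continue to age + 1.
        cost_wait = p * breach_loss + (1.0 - p) * discount * values[age + 1]
        cost_patch = patch_cost[age]
        if cost_wait < cost_patch:
            values[age], policy[age] = cost_wait, "wait"
        else:
            values[age], policy[age] = cost_patch, "patch"
    return values, policy


# Hypothetical numbers: early patching is disruptive, risk grows with patch age.
values, policy = solve_patch_timing(
    patch_cost=[10.0, 6.0, 4.0, 4.0],
    attack_prob=[0.01, 0.05, 0.20],
    breach_loss=50.0,
)
print(policy)
```

With these illustrative inputs, waiting one period is cheaper than patching immediately, after which the growing breach risk makes patching the better action. A realistic enterprise setting with many systems and patches would have a far larger state and action space, which is the curse of dimensionality the abstract refers to.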
Recommended Citation
Jia, Qian; Qu, Xinxue; Jiang, Zhengrui; and Wang, Chengjun, "Enterprise Security Patch Management with Deep Reinforcement Learning" (2024). ICIS 2024 Proceedings. 6.
https://aisel.aisnet.org/icis2024/security/security/6
Comments
06-Security