Location

Online

Event Website

https://hicss.hawaii.edu/

Start Date

January 3, 2022 12:00 AM

End Date

January 7, 2022 12:00 AM

Description

Capabilities of deep reinforcement learning (DRL) in obtaining fast decision policies in high-dimensional and stochastic environments have led to its extensive use in operational research, including the operation of distribution grids with high penetration of distributed energy resources (DER). However, the feasibility and robustness of DRL solutions are not guaranteed for the system operator, and hence those solutions may be of limited practical value. This paper proposes an analytical method to find feasibility ellipsoids that represent the range of multi-dimensional system states in which the DRL solution is guaranteed to be feasible. Empirical studies and stochastic sampling determine the ratio of the discovered to the actual feasible space as a function of the sample size. In addition, the performance of logarithmic, linear, and exponential penalization of infeasibility during DRL training is studied and compared in order to reduce the number of infeasible solutions.
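The two ideas in the abstract can be illustrated with a minimal sketch (not the authors' code; all function names and the scaling parameter `k` are illustrative assumptions): the three penalty shapes compared during training, and a Monte Carlo estimate of how much of the true feasible region a candidate set such as an ellipsoid covers.

```python
import math
import random

# Hypothetical penalty functions for a non-negative constraint
# violation v, as the three shapes compared in the paper:
def log_penalty(v, k=1.0):
    return k * math.log1p(v)      # grows slowly for large violations

def linear_penalty(v, k=1.0):
    return k * v                  # proportional to violation magnitude

def exp_penalty(v, k=1.0):
    return math.expm1(k * v)      # grows rapidly, strongly discourages violations

# Monte Carlo estimate of the ratio of a discovered region (e.g. a
# feasibility ellipsoid) to the actual feasible space: draw sample
# states and count membership in both sets.
def coverage_ratio(in_discovered, is_feasible, sample_state, n=10_000):
    covered = feasible = 0
    for _ in range(n):
        x = sample_state()
        if is_feasible(x):
            feasible += 1
            if in_discovered(x):
                covered += 1
    return covered / feasible if feasible else 0.0
```

For instance, with the unit disk as the discovered region inside a feasible unit square, `coverage_ratio` converges to roughly pi/4 as the sample size grows, matching the paper's observation that the estimated ratio is a function of sample size.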


Title

On the Verification of Deep Reinforcement Learning Solution for Intelligent Operation of Distribution Grids


Permalink

https://aisel.aisnet.org/hicss-55/es/monitoring/5