Abstract

Deep neural networks (DNNs) have expanded their applicability to fields such as large language models, generative artificial intelligence, computer vision, and autonomous driving. Inspired by biological neural networks, DNNs process input signals through interconnected neurons across different layers, update these neurons' weights, and generate responses as output signals. Activation functions (AFs) play a crucial role in shaping these responses by influencing weight updates, thereby affecting DNN performance. The DNN literature highlights non-linearity as a fundamental design principle for AFs in order to learn complex patterns in datasets (Dubey et al., 2022). For example, the Sigmoid AF introduces an S-shaped non-linearity constrained between 0 and 1, while the Hyperbolic Tangent AF extends this S-shape to range between -1 and 1. Interestingly, however, the Leaky ReLU AF, which operates within the range of -∞ to ∞, remains the most accurate, despite being a piecewise linear function with two linear segments. Leaky ReLU's non-linearity arises only from the transition between its two linear segments, making it much weaker than the non-linearity introduced by the Sigmoid and Hyperbolic Tangent AFs. According to recent research conducted by Son and Lee (2025), the Leaky ReLU AF was sufficient to fit complex patterns (e.g., a U-shaped pattern) perfectly by varying the number of neurons, hidden layers, and epochs. This suggests that the non-linearity defined by AFs may play only a minimal role in enabling DNNs to learn complex patterns in datasets. Instead, the output range of each AF appears to be more important (e.g., 0 to 1 for the Sigmoid AF; -1 to 1 for the Hyperbolic Tangent AF). In this research, we propose incorporating the maximum standardized value of the dependent variable y into AFs. By doing so, each AF covers the full range of standardized values of the dependent variable, thereby mitigating the vanishing gradient problem and consequently improving performance. The preliminary results are promising. DNN regressors utilizing three neurons, one hidden layer, and 20 epochs demonstrated significant improvement in learning a U-shaped pattern in the same dataset across diverse AFs, as shown in Table 1. For example, DNNs with the Leaky ReLU AF achieved an R² value of 0.90 through the proposed range expansion, an average improvement of 0.174. Moving forward, we aim to gather empirical evidence to validate and generalize the findings of this study.
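
To illustrate the proposed range expansion, the following minimal sketch (not the authors' implementation) scales a Hyperbolic Tangent activation by the maximum absolute standardized value of the dependent variable, so the AF's output spans the full range of standardized targets. The ScaledTanh class, the U-shaped toy data (y = x²), and all hyperparameters other than the three neurons, one hidden layer, and 20 epochs mentioned above are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn

# U-shaped toy pattern: y = x^2, with the dependent variable standardized
x = np.linspace(-3.0, 3.0, 200, dtype=np.float32).reshape(-1, 1)
y = x ** 2
y_std = ((y - y.mean()) / y.std()).astype(np.float32)   # standardized dependent variable
y_max = float(np.abs(y_std).max())                       # maximum |standardized y| (assumed scale factor)

class ScaledTanh(nn.Module):
    """Tanh whose output range is expanded from (-1, 1) to (-y_max, y_max)."""
    def __init__(self, scale: float):
        super().__init__()
        self.scale = scale
    def forward(self, t):
        return self.scale * torch.tanh(t)

model = nn.Sequential(
    nn.Linear(1, 3),      # one hidden layer with three neurons, as in the abstract
    ScaledTanh(y_max),    # range-expanded activation
    nn.Linear(3, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
X, T = torch.from_numpy(x), torch.from_numpy(y_std)

for epoch in range(20):   # 20 epochs, as in the abstract
    opt.zero_grad()
    loss = loss_fn(model(X), T)
    loss.backward()
    opt.step()

# Report R^2 on the training data
pred = model(X).detach().numpy()
ss_res = float(((y_std - pred) ** 2).sum())
ss_tot = float(((y_std - y_std.mean()) ** 2).sum())
print("R^2:", 1.0 - ss_res / ss_tot)

The same scaling could be applied to other bounded AFs (e.g., Sigmoid); the key design choice is that the activation's output range is tied to the standardized dependent variable rather than fixed at (0, 1) or (-1, 1).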
