A Review of Lithium-Ion Battery State of Charge Estimation Methods Based on Machine Learning


4.2. Deep Neural Network

The ongoing advancement of deep-learning algorithms has led many researchers to focus on deeper neural networks. Traditional feedforward neural networks typically fit the nonlinear mapping between input and output using only the input features of the current moment. In contrast, deep neural networks, a subset of deep-learning algorithms, employ multiple hidden layers and can approximate complex nonlinear functions. They typically achieve higher accuracy than traditional neural networks [52,53].
Common deep-learning models include Convolutional Neural Networks (CNNs) [54] and Recurrent Neural Networks (RNNs) [55]. CNNs are well-suited for tasks involving grid-like data such as images, as they can automatically learn hierarchical features. On the other hand, RNNs are particularly useful for sequences of data, as they can capture temporal dependencies and context information. The use of deep neural networks, like CNNs and RNNs, has significantly contributed to the success of various machine-learning and artificial intelligence applications.
The typical architecture of a CNN consists of multiple layers, including convolutional layers, pooling layers, fully connected layers, and an output layer [56]. The structure of a CNN is shown in Figure 4. Weight sharing in the convolutional layers allows for faster learning and reduces memory requirements. This design is particularly effective for tasks involving grid-like data, such as images. RNNs, in contrast, have a short-term "memory" property: the information in each hidden layer at a given time step depends on the input at that time step and on the hidden layer from the previous time step [57]. A structure diagram of the RNN is shown in Figure 5. The corresponding formulas are (1) and (2).

$$h^{(t)} = f\left(W_{hh} h^{(t-1)} + W_{ih} x^{(t)} + b_h\right) \quad (1)$$

$$y^{(t)} = g\left(W_{ho} h^{(t)} + b_y\right) \quad (2)$$

$x^{(t)}$ is the input at the current time step $t$; $h^{(t)}$ is the hidden state at the current time step; $h^{(t-1)}$ is the hidden state from the previous time step $t-1$; $W_{ih}$, $W_{hh}$, and $W_{ho}$ are the connection weight matrices; $y^{(t)}$ is the output generated by the output layer at the current time step; $b_h$ and $b_y$ are the bias terms; and $f$ and $g$ are the activation functions. In simple terms, the hidden-state update in an RNN is determined by the input at the current time step, $x^{(t)}$, and the hidden state from the previous time step, $h^{(t-1)}$.
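For illustration, the following minimal NumPy sketch rolls one recurrent update per Eqs. (1) and (2) over a sequence. The dimensions (three input features, e.g., voltage, current, and temperature, with a 16-unit hidden state) and the choice of $f = \tanh$ with $g$ as the identity are assumptions made for the sketch, not settings from the reviewed papers.

```python
import numpy as np

# Illustrative sizes: 3 input features, 16 hidden units, 1 output (SOC).
n_in, n_hidden, n_out = 3, 16, 1

rng = np.random.default_rng(0)
W_ih = rng.standard_normal((n_hidden, n_in)) * 0.1      # input-to-hidden weights
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden-to-hidden weights
W_ho = rng.standard_normal((n_out, n_hidden)) * 0.1     # hidden-to-output weights
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

def rnn_step(x_t, h_prev):
    """One recurrent update: Eq. (1) with f = tanh, Eq. (2) with g = identity."""
    h_t = np.tanh(W_hh @ h_prev + W_ih @ x_t + b_h)  # Eq. (1)
    y_t = W_ho @ h_t + b_y                           # Eq. (2)
    return h_t, y_t

# Roll the cell over a window of T = 10 measurement vectors.
h = np.zeros(n_hidden)
for x_t in rng.standard_normal((10, n_in)):
    h, y = rnn_step(x_t, h)
```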

However, RNNs suffer from the vanishing or exploding gradient problem when dealing with sequences of data over many time steps, which limits their performance. To address this issue, researchers have developed several variants of RNNs. The three mainstream variants are Long Short-Term Memory (LSTM) [58], Bi-directional Long Short-Term Memory (Bi-LSTM) [59], and the Gated Recurrent Unit (GRU) [60]. LSTM introduces forget gates, input gates, and output gates to mitigate the vanishing gradient problem of traditional RNNs. The structure of the LSTM is shown in Figure 6, and the corresponding formulas are given in (3)–(8). While this improves gradient flow, LSTM has a more complex internal structure and higher computational cost than a traditional RNN. The GRU is a modification of LSTM that simplifies the architecture: instead of separate gates for forgetting and updating the state, the GRU uses a single gate unit to control both processes, which yields fewer parameters and lower computational complexity than LSTM. Both the LSTM and GRU variants have been widely adopted in sequence-modeling tasks, offering solutions to the vanishing gradient problem and improving the performance of RNNs [61].

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \quad (3)$$

$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \quad (4)$$

$$\tilde{C}_t = \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) \quad (5)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (6)$$

$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \quad (7)$$

$$h_t = o_t \odot \tanh(C_t) \quad (8)$$

$f_t$, $i_t$, and $o_t$ are the activation values of the forget, input, and output gates, respectively; $\sigma$ is the sigmoid function; $W_f$, $W_i$, $W_C$, and $W_o$ are the weight matrices; and $b_f$, $b_i$, $b_C$, and $b_o$ are the bias terms. $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$, and $\odot$ denotes the element-wise product. $C_t$ is the cell state, $C_{t-1}$ is the cell state from the previous time step, and $\tilde{C}_t$ is the new candidate state vector created by a tanh layer.
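A minimal NumPy sketch of one LSTM step, implementing Eqs. (3)–(8) directly, is given below. The layer sizes are illustrative assumptions; as in the formulas, the weight matrices act on the concatenation $[h_{t-1}, x_t]$.

```python
import numpy as np

# Illustrative sizes: 3 input features, 16 hidden units.
n_in, n_hidden = 3, 16
n_cat = n_hidden + n_in  # size of the concatenation [h_{t-1}, x_t]

rng = np.random.default_rng(0)
W_f, W_i, W_C, W_o = (rng.standard_normal((n_hidden, n_cat)) * 0.1 for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(n_hidden) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate, Eq. (3)
    i_t = sigmoid(W_i @ z + b_i)         # input gate, Eq. (4)
    C_tilde = np.tanh(W_C @ z + b_C)     # candidate state, Eq. (5)
    C_t = f_t * C_prev + i_t * C_tilde   # cell state update, Eq. (6)
    o_t = sigmoid(W_o @ z + b_o)         # output gate, Eq. (7)
    h_t = o_t * np.tanh(C_t)             # hidden state, Eq. (8)
    return h_t, C_t
```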

Hannan et al. [62] introduced a Fully Convolutional Network (FCN) composed of four temporal convolutions for estimating the SOC. The FCN differs from a standard CNN in that the fully connected layers typically found at the end of a CNN are replaced by convolutional layers, and global average pooling is used to prevent overfitting. The results showed that this model achieved an RMSE of 0.85% and an MAE of 0.7% at room temperature. The FCN thus combines the automatic feature-extraction capability of CNNs with the sequence-modeling ability usually associated with RNNs.
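As a rough illustration of this style of architecture, the PyTorch sketch below stacks four temporal (1-D) convolutions and replaces fully connected layers with global average pooling. The channel widths, kernel sizes, and three-feature input are assumptions made here, not the configuration published in [62].

```python
import torch
import torch.nn as nn

class TemporalFCN(nn.Module):
    """Sketch of a temporal FCN: four 1-D convolutions + global average pooling."""
    def __init__(self, n_features=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)  # global average pooling, no FC layers

    def forward(self, x):  # x: (batch, n_features, seq_len)
        # Pool over time to get one SOC estimate per input window.
        return self.pool(self.conv(x)).squeeze(-1).squeeze(-1)

soc = TemporalFCN()(torch.randn(8, 3, 100))  # 8 windows of 100 samples each
```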
Zhang et al. [63] combined a CNN and LSTM into a CNN-LSTM model augmented with an attention mechanism. They used one-dimensional convolution to capture the spatial features within the measurement variables and LSTM to capture the dependence of the current output on past inputs. The attention mechanism allows the model to focus on the key parts of the input data, improving the overall performance and accuracy; the average prediction error across different temperatures was 0.89%. This model balances the treatment of temporal information with improved estimation accuracy.
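The following hedged sketch shows one common way to assemble such a CNN-LSTM with attention: a 1-D convolution extracts local features, an LSTM models temporal dependence, and a learned attention weighting pools the hidden states. The layer sizes and the specific attention form are assumptions for illustration, not the authors' published settings.

```python
import torch
import torch.nn as nn

class CNNLSTMAttention(nn.Module):
    """Sketch: 1-D conv features -> LSTM -> attention pooling -> SOC."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)  # scores each time step
        self.head = nn.Linear(hidden, 1)  # maps pooled state to SOC

    def forward(self, x):  # x: (batch, seq_len, n_features)
        z = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(z)                     # (batch, seq_len, hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        ctx = (w * h).sum(dim=1)                # weighted sum of hidden states
        return self.head(ctx).squeeze(-1)

soc = CNNLSTMAttention()(torch.randn(8, 100, 3))
```

Swapping `nn.LSTM` for `nn.GRU` in this sketch would give a CNN-GRU hybrid of the general kind discussed next.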
To improve the robustness and adaptability of their model, Liu et al. [28] introduced a CNN-GRU hybrid that integrates a CNN with a GRU. It demonstrated good performance under unknown operating conditions, and when the initial SOC value is unknown, the CNN-GRU network converges rapidly to the reference values. The network requires neither weight adjustment for specific test conditions nor complex feature extraction such as averaging or integration, and it does not rely on battery models, filters, or other algorithms. This approach streamlines SOC estimation while maintaining accuracy and adaptability.

Because the estimation of the SOC of lithium batteries can be viewed as a time-series problem, the SOC is related not only to the current input features but also to the input features of previous time steps, which makes RNNs particularly suitable for SOC estimation. However, a standalone RNN can suffer from vanishing or exploding gradients over many time steps. Researchers have therefore focused mostly on variants of RNNs or on models that combine RNNs with other algorithms.

Ma et al. [64] trained and validated an LSTM model for SOC estimation using the publicly available dataset provided by Phillip Kollmeyer; the estimation RMSE ranged from 1% to 1.8%. Li et al. [65] were the first to propose using a GRU alone for battery SOC estimation. They achieved an MAE of 0.32% at 25 °C under LA92 conditions and compared the errors of the GRU and LSTM models under the same experimental conditions. The results showed that the GRU outperformed the LSTM in both estimation error and training time.
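A minimal sketch of such a standalone recurrent SOC estimator is shown below: a GRU maps a window of (voltage, current, temperature) samples to the SOC at the final time step, trained with a mean-squared-error loss. All hyperparameters and the placeholder batch are assumptions for illustration, not values from [64] or [65].

```python
import torch
import torch.nn as nn

class GRUSOC(nn.Module):
    """Sketch: GRU over a measurement window -> SOC at the last time step."""
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, seq_len, n_features)
        h, _ = self.gru(x)
        return self.head(h[:, -1]).squeeze(-1)  # SOC from last hidden state

model = GRUSOC()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative training step on a placeholder batch.
x, y = torch.randn(8, 100, 3), torch.rand(8)
optim.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optim.step()
```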
Researchers have also explored combinations of RNN variants with other algorithms. Li et al. [66] improved the RNN and introduced the LSTM-RNN neural network, which was validated on lithium-ion battery models under six high-rate pulse conditions; this approach addressed the vanishing and exploding gradient problems. Because the GRU is a simplified variant of LSTM, it can likewise address these problems. To further optimize the GRU, Han et al. [67] proposed a GRU-RNN estimation model, which was trained and tested on battery data under different conditions and temperatures, yielding improved accuracy and robustness.
Bi-LSTM is an extension of traditional LSTM that can enhance the performance of sequence models. Bian et al. [68] used a stacked bidirectional LSTM (SBi-LSTM) model to estimate the battery SOC and compared it with LSTM and GRU models. The results showed that at three temperatures (0 °C, 10 °C, and 25 °C), SBi-LSTM achieved the highest estimation accuracy, with an MAE as low as 0.46%. There is also related research on the Bi-GRU: Zhou et al. [69] proposed a lithium battery SOC estimation model based on a bidirectional GRU (Bi-GRU). In addition, some scholars have studied the key problems of joint multi-state estimation.
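To make the bidirectional idea concrete, the sketch below stacks two bidirectional LSTM layers so that each hidden state summarizes both past and future samples in the window; the depth, widths, and mean-over-time pooling are our assumptions, not the SBi-LSTM configuration in [68].

```python
import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    """Sketch: two bidirectional LSTM layers, mean-pooled over time -> SOC."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # forward + backward states

    def forward(self, x):  # x: (batch, seq_len, n_features)
        h, _ = self.rnn(x)         # (batch, seq_len, 2 * hidden)
        return self.head(h.mean(dim=1)).squeeze(-1)

soc = StackedBiLSTM()(torch.randn(8, 100, 3))
```

Because the backward direction reads future samples, a bidirectional model of this kind needs the full measurement window before producing an estimate, which suits offline or windowed estimation rather than strictly sample-by-sample use.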
