Deep Deterministic Policy Gradient (DDPG) Agent-Based Sliding Mode Control for Quadrotor Attitudes


1. Introduction

As an unmanned flight platform, a quadrotor UAV has the advantages of a simple structure, lightweight fuselage, and low cost. It is widely used in various tasks such as cargo transportation, aerial photography, agricultural plant protection, rescue and relief operations, remote-sensing mapping, and reconnaissance [1,2,3,4]. A wide range of application scenarios also impose strict requirements on its flight control capability, particularly the attitude control during UAV flight [4,5,6]. However, the lightweight fuselage of a quadrotor leads to its poor ability to resist external disturbances, which reduces the accuracy of its attitude control.
There have been many studies on attitude control methods for quadrotors. Linear control methods such as proportional integral derivative (PID) control [7,8,9] and linear quadratic (LQ) regulation [10] have been widely used in engineering practice because of their simple structure and easy implementation. The PID and LQ methods were applied to the attitude angle control of a micro quadrotor, and the control laws were validated through autonomous flight experiments in the presence of external disturbances [11]. A robust PID control methodology was proposed for quadrotor UAV regulation, which could reduce power consumption and perform well under parameter uncertainties and aerodynamic interference [12]. Twelve PID coefficients of a quadrotor controller were optimized using each of four classical evolutionary algorithms, and the simulation results indicated that the coefficients optimized by the differential evolution (DE) algorithm minimized the energy consumption compared with the other algorithms [7]. While linear or coefficient-optimized linear controllers may be suitable for some of the above scenarios, the nonlinear effects of the quadrotor dynamics are often found to be non-negligible [13], and linear control methodologies become inadequate because they rely on approximately linearized dynamical models. Various control approaches that consider the nonlinear dynamics model have therefore been applied to quadrotors. One of these is nonlinear dynamic inversion (NDI), which can theoretically eliminate the nonlinearities of the control system [14], but this method depends heavily on model accuracy [15]. The incremental nonlinear dynamic inversion (INDI) methodology was used to improve the robustness against model inaccuracies, and it could achieve stable attitude control even when the change in pitch angle was up to 90° [16].
The adaptive control algorithm has also been widely used in quadrotor systems [17,18]. Two adaptive control laws were designed for the attitude stabilization of a quadrotor in order to deal with the problem of parametric uncertainty and external disturbance [18]. A robust adaptive control strategy was developed for tracking the attitude of foldable quadrotors, which were modeled as switched systems [19].
Due to the advantages of fast response times and strong robustness, the sliding mode control (SMC) methodology has been widely applied in the attitude tracking of quadrotors [20,21]. However, the problem of control input chattering is apparent in the traditional reaching law designed in SMC. A fuzzy logic system was developed to adaptively schedule the control gains of the sign function, effectively suppressing the control signal chattering [22]. A novel discrete-time sliding mode control (DSMC) reaching law was proposed based on theoretical analysis, which could significantly reduce chattering [23]. An adaptive fast nonsingular terminal sliding mode (AFNTSM) controller was introduced to achieve attitude regulation and suppress the chattering phenomenon. The effectiveness of this controller was verified through experiments [24]. A fractional-order sliding mode surface was designed to adaptively adjust the parameters of SMC for the fault-tolerant control of a quadrotor model with mismatched disturbances [25].
The above studies serve as valuable references. However, control signal chattering still needs further attention when the SMC method is applied to attitude regulation with external disturbances. There are many strategies for reducing the chattering phenomenon of an SMC algorithm [26,27,28,29,30,31], such as the super-twisting algorithm and sigmoid approximation. The first strategy mainly introduces an integral term into the switching function term [28,29,30], which is equivalent to low-pass filtering and makes the control signal continuous. Its disadvantage is that an appropriate integral term coefficient must be selected, and the coefficient needs to be adjusted as the motion state of the system changes. The second strategy usually approximates the switching function with a constructed function [31], removing the dependence on the switching function and making the state change of the system smoother. Its disadvantage is that the lack of the switching function reduces the robustness of the control system, and eventually the steady-state error becomes larger.
With the development of artificial intelligence technology, more and more reinforcement learning algorithms have been applied to traditional control methodologies [32,33]. Inspired by these studies, a deep deterministic policy gradient (DDPG) [34] agent was introduced to the SMC in this paper. The parameters linked to the sign function can be adaptively regulated by the trained DDPG agent. This adaptive regulation helps to suppress the control input chattering in attitude control, especially in the presence of external disturbances.

The primary contribution of our work can be summarized as follows: a reinforcement learning agent, based on DDPG, is trained to adaptively adjust the switching control gain in the traditional SMC method. This adaptation effectively suppresses the chattering phenomenon in attitude control.

The remainder of this paper is organized as follows: Section 2 introduces the attitude dynamics modeling for a quadrotor UAV. In Section 3, the traditional SMC and the proposed DDPG-SMC are designed for solving attitude control problems. In Section 4, the robustness and effectiveness of the proposed control approach are validated through simulation results, followed by key conclusions in Section 5.

2. Attitude Dynamics Modeling for Quadrotor UAV

The quadrotor is considered a rigid body, and its attitude motion can be described by two coordinate frames: an inertial reference frame (frame I) $O_i x_i y_i z_i$ and a body reference frame (frame B) $O_b x_b y_b z_b$, as shown in Figure 1. The attitude motion of the quadrotor can be achieved by rotating each propeller. The attitude angles can be described as $\eta = [\phi, \theta, \psi]^T$ in frame B, where $\phi$, $\theta$, and $\psi$ are the roll angle (rotation around the x-axis), pitch angle (rotation around the y-axis), and yaw angle (rotation around the z-axis), respectively. The attitude angular velocities are expressed as $\zeta = [p, q, r]^T$, where $p$, $q$, and $r$ are the angular velocities in the roll, pitch, and yaw directions, respectively.
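According to the relationship between the angular velocities and the attitude rate, the attitude kinematics equation of the quadrotor can be expressed as follows [35]:

$$\dot{\eta} = \Phi(\eta)\,\zeta \tag{1}$$

where

$$\Phi(\eta) = \begin{bmatrix} 1 & \tan\theta \sin\phi & \tan\theta \cos\phi \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sec\theta \sin\phi & \sec\theta \cos\phi \end{bmatrix} \tag{2}$$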
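As a minimal sketch of the kinematics relation, the following Python functions build the matrix Φ(η) and map body angular velocities to Euler angle rates. It uses the standard form of this matrix for roll-pitch-yaw angles, in which the second row is [0, cos ϕ, −sin ϕ]. The function names are illustrative assumptions and do not come from the paper.

```python
import math


def euler_rate_matrix(phi, theta):
    """Return Phi(eta) as a 3x3 nested list (the yaw angle does not appear in it).

    The matrix is singular at theta = +/- pi/2, where tan and sec are unbounded.
    """
    s, c = math.sin(phi), math.cos(phi)
    t, sec = math.tan(theta), 1.0 / math.cos(theta)
    return [
        [1.0, t * s, t * c],
        [0.0, c, -s],
        [0.0, sec * s, sec * c],
    ]


def attitude_rates(eta, zeta):
    """Compute eta_dot = Phi(eta) * zeta for eta = (phi, theta, psi), zeta = (p, q, r)."""
    phi, theta, _psi = eta
    Phi = euler_rate_matrix(phi, theta)
    return [sum(Phi[i][j] * zeta[j] for j in range(3)) for i in range(3)]
```

At zero roll and pitch the matrix reduces to the identity, so the Euler angle rates equal the body rates, which is a quick sanity check on any implementation.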

The attitude dynamics equation of the quadrotor can be written as follows [36]:

$$J\dot{\zeta} + \zeta \times J\zeta = \tau \tag{3}$$

where $J = \mathrm{diag}(J_x, J_y, J_z)$; $J_x$, $J_y$, and $J_z$ are the moments of inertia about the $O_b x_b$, $O_b y_b$, and $O_b z_b$ axes, respectively; $\tau = [L, M, N]^T$ denotes the control inputs; $L$, $M$, and $N$ are the control torques in the roll, pitch, and yaw directions, respectively. When external disturbances are taken into account, the attitude dynamics Equation (3) can be rewritten as

$$J\dot{\zeta} = -\zeta \times J\zeta + \tau + \tau_d, \tag{4}$$

where $\tau_d$ denotes the external disturbances.
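The disturbed dynamics can be sketched in code by solving Equation (3), with the disturbance torque added to the right-hand side, for the angular acceleration. Because J is diagonal, the cross product ζ × Jζ can be written out component by component. The function name and the inertia values used in the test call are hypothetical; the paper's actual parameters are listed in its Table 2, which is not reproduced here.

```python
def angular_acceleration(zeta, tau, tau_d, inertia):
    """Return zeta_dot = J^{-1} (-zeta x (J zeta) + tau + tau_d) for diagonal J.

    zeta    = (p, q, r)     body angular velocities, rad/s
    tau     = (L, M, N)     control torques, N m
    tau_d                   external disturbance torques, N m
    inertia = (Jx, Jy, Jz)  principal moments of inertia, kg m^2
    """
    p, q, r = zeta
    jx, jy, jz = inertia
    # Components of zeta x (J zeta) when J = diag(Jx, Jy, Jz).
    gyro = (
        (jz - jy) * q * r,
        (jx - jz) * p * r,
        (jy - jx) * p * q,
    )
    return tuple(
        (-g + u + d) / j
        for g, u, d, j in zip(gyro, tau, tau_d, inertia)
    )
```

With zero angular velocity the gyroscopic term vanishes and each axis behaves like a double integrator scaled by its moment of inertia.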

4. Simulation Results

The robustness and effectiveness of the proposed control approach can be verified via flight simulations. The quadrotor used in our study is modified and designed on the basis of a DJI F450 UAV, and its specific technical parameters are shown in Table 2.
The basic simulation conditions are described as follows. Initial attitude angles and angular velocities of the quadrotor are set as $\eta_0 = [0.1, 0.2, 0.1]^T$ rad and $\zeta_0 = [0, 0, 0]^T$ rad/s, respectively. The desired attitude angles are selected as $\eta_d = [0, 0.1, 0]^T$ rad. The external disturbances are assumed to act on the system in the form of torques:

$$\tau_d = 0.005 \times \left[ \sin(\pi t / 100),\ \cos(\pi t / 100),\ \sin(\pi t / 100) \right]^T \ \mathrm{N\,m}.$$

Three control approaches, including SMC, the AFGS-SMC proposed in reference [22], and the DDPG-SMC designed in this paper, are used in the flight simulation, respectively.
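For readers who want to reproduce the test scenario, the simulation conditions above can be written down as a short Python sketch. It assumes the argument of each trigonometric term is read as πt/100 with t in seconds; the constant and function names are illustrative and not taken from the paper.

```python
import math

# Initial and desired states as stated in the text (roll, pitch, yaw).
ETA_0 = (0.1, 0.2, 0.1)   # initial attitude angles, rad
ZETA_0 = (0.0, 0.0, 0.0)  # initial angular velocities, rad/s
ETA_D = (0.0, 0.1, 0.0)   # desired attitude angles, rad


def disturbance_torque(t):
    """External disturbance torque tau_d(t) in N m, assuming the phase is pi * t / 100."""
    phase = math.pi * t / 100.0
    amplitude = 0.005
    return (
        amplitude * math.sin(phase),
        amplitude * math.cos(phase),
        amplitude * math.sin(phase),
    )
```

Under this reading the disturbance has a period of 200 s, so over a simulation of a few seconds it acts almost like a constant bias of about 0.005 N·m on the pitch axis and a slowly growing torque on the roll and yaw axes.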

4.1. Simulation Results of SMC

The relevant control parameters of the sign function in SMC are designed as follows: $k = \mathrm{diag}(0.2, 0.2, 0.2)$ and $\lambda = \mathrm{diag}(1.5, 1.5, 1.5)$. The numerical simulation results are depicted in Figure 8.
As shown in Figure 8a, the dashed and solid lines represent the desired and actual attitude angles, respectively. The convergence times of attitude angles in three directions (roll, pitch, and yaw) are 1.8 s, 2.1 s, and 1.8 s, respectively, from the initial value to the desired value. This indicates that the quadrotor attitude can be regulated into the desired attitude using the SMC algorithm, in the presence of disturbances.
The time histories of angular velocities and control inputs are depicted in Figure 8b,c, respectively. It can be seen that the attitude angular velocities in three directions approach 0 rad/s during time periods of 2.2 s, 2.4 s, and 2.2 s, respectively. The angular velocities oscillate slightly around 0 rad/s to maintain balance in the quadrotor system.

However, the chattering of the control inputs is severe once the control system has stabilized. The control input signal oscillates between −0.012 N·m and 0.014 N·m in the roll direction, between −0.008 N·m and 0.016 N·m in the pitch direction, and between −0.018 N·m and 0.024 N·m in the yaw direction. Since the control input signals denote the torques generated by the quadrotor propellers, such high-frequency chattering is unacceptable for the quadrotor’s actuators.

Figure 8d,e represent the time evolutions of control gains related to the reaching law and sliding mode surfaces, respectively. It can be seen that sliding mode surfaces converge to zero asymptotically, and the control gains remain constant throughout the entire simulation process. These constant gains lead to the chattering phenomenon of control signals.
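The link between a constant switching gain and chattering can be illustrated with a toy example. The sketch below is not the paper's quadrotor controller: it assumes a scalar sliding variable s driven by a reaching law of the form ṡ = −k·sign(s) + d, integrated with a fixed Euler step. The step size, initial value, and constant disturbance d are hypothetical values chosen only for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def simulate_constant_gain(k=0.2, s0=0.1, d=0.005, dt=0.001, steps=2000):
    """Euler-integrate s_dot = -k * sign(s) + d and record (s, u) at each step."""
    s = s0
    history = []
    for _ in range(steps):
        u = -k * sign(s)   # switching control with a constant gain k
        s += dt * (u + d)  # the bounded disturbance d is dominated by k
        history.append((s, u))
    return history


if __name__ == "__main__":
    tail = simulate_constant_gain()[-500:]
    print("max |s| after reaching:", max(abs(s) for s, _ in tail))
    print("control values seen:", sorted({u for _, u in tail}))
```

Once s reaches the neighbourhood of zero it stays inside a band of width about (k + d)·dt, but the control keeps jumping between +k and −k at every few steps. The band shrinks with a smaller k, which is the motivation for scheduling the gain instead of fixing it.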

4.2. Simulation Results of AFGS-SMC

The chattering phenomenon is caused by high-frequency switching around the sliding mode surface, attributed to the term k·sign(s) in SMC. The adaptive fuzzy gain-scheduling sliding mode control (AFGS-SMC) method in reference [22] was proposed by the authors’ team in 2016. This method can effectively suppress the control signal chattering, and it is compared here with the method proposed in this paper in terms of control performance. The simulation results for this method are depicted in Figure 9.
As presented in Figure 9a, the dashed and solid lines represent the desired and actual attitude angles, respectively. The convergence times of attitude angles in three directions (roll, pitch, and yaw) are 2.2 s, 2.5 s, and 2.2 s, respectively, from the initial value to the desired value. This demonstrates that the quadrotor attitude can be regulated into the desired attitude using the AFGS-SMC algorithm, in the presence of disturbances.
The time evolutions of the quadrotor’s angular velocities and control inputs are depicted in Figure 9b,c, respectively. It can be seen that the attitude angular velocities in the three directions approach 0 rad/s during time periods of 3.1 s, 2.8 s, and 3.1 s, respectively, and the oscillation is significantly reduced. In contrast to the results for SMC, the chattering phenomenon of the control input is significantly reduced.
Figure 9d,e represent the time evolutions of control gains related to the reaching law and sliding mode surfaces, respectively. It can be seen that the sliding mode surfaces converge to zero asymptotically, and the control gains are adjusted adaptively via the associated fuzzy rules in AFGS-SMC. This adaptive adjustment helps reduce the chattering phenomenon of control signals.

4.3. Simulation Results of DDPG-SMC

Similar to the AFGS-SMC method mentioned above, the control gains of DDPG-SMC are time-varying and can be adaptively scheduled through the DDPG-based parameter regulator. The simulation results for DDPG-SMC are depicted in Figure 10.
As depicted in Figure 10a, the dashed and solid lines represent the desired and actual attitude angles, respectively. The convergence times of attitude angles in three directions (roll, pitch, and yaw) are 2.0 s, 1.9 s, and 2.0 s, respectively, from the initial value to the desired value. This demonstrates that the quadrotor’s attitude can be regulated into the desired attitude using the designed DDPG-SMC algorithm, in the presence of disturbances.
The time evolutions of the quadrotor’s angular velocities and control inputs are presented in Figure 10b,c, respectively. It can be seen that the attitude angular velocities in the three directions approach 0 rad/s within 2.1 s, 2.4 s, and 2.6 s, respectively, and the oscillation is much smaller.
Figure 10d,e represent the time evolutions of control gains related to the reaching law and sliding mode surfaces, respectively. It can be seen that the sliding mode surfaces converge to zero asymptotically, and the control gains related to the reaching law are adjusted adaptively via the trained DDPG agent. This adjustment can help reduce the chattering phenomenon of control signals.

4.4. Comparative Analysis of Simulation Results

To compare the control performance of the above three methods, the convergence time and steady-state errors of the attitude angles and the chattering amplitudes of the control signals are selected as performance indicators, which are listed in Table 3.

It can be seen that both AFGS-SMC and DDPG-SMC can greatly reduce the chattering of control signals. However, in the DDPG-SMC method, the convergence time is shorter and the steady-state error is smaller than those of the AFGS-SMC method, indicating that the DDPG-SMC method exhibits better control performance.
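The three performance indicators can be computed from logged time histories along the following lines. This is only a sketch: the text does not state the tolerance band or averaging window behind Table 3, so the `tol` argument and the choice of peak-to-peak amplitude are assumptions, as are the function names.

```python
def convergence_time(times, values, target, tol):
    """Return the first time after which |value - target| stays within tol.

    Returns None if the signal is outside the band at the final sample.
    """
    settled = None
    for t, v in zip(times, values):
        if abs(v - target) <= tol:
            if settled is None:
                settled = t
        else:
            settled = None  # left the band, so restart the search
    return settled


def steady_state_error(values, target, tail):
    """Return the largest absolute error over the last `tail` samples."""
    return max(abs(v - target) for v in values[-tail:])


def peak_to_peak(values):
    """Return the chattering amplitude as the max-minus-min of a control signal segment."""
    return max(values) - min(values)
```

For example, a roll torque that oscillates between −0.012 N·m and 0.014 N·m, as reported for SMC, has a peak-to-peak amplitude of 0.026 N·m under this definition.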

Remark 3. 

(1) The traditional SMC, referenced AFGS-SMC, and designed DDPG-SMC methods all perform effectively and robustly in attitude control in the presence of continuous external disturbances. (2) The disadvantage of the traditional SMC is that a high-frequency chattering phenomenon exists in the control input signals. (3) The control gains related to the reaching law in DDPG-SMC can be adjusted adaptively via the trained reinforcement learning agent, so that the chattering phenomenon is effectively reduced.
