A Deep-Learning Approach to Driver Drowsiness Detection


1. Introduction

Drowsiness, defined as a feeling of sleepiness, can manifest as reduced response time, intermittent lapses in awareness, or microsleeps (blinks lasting more than 500 milliseconds). Insufficient sleep affects thousands of drivers on highways every day, including taxi drivers, truck drivers, and people traveling long distances. Drowsiness reduces a driver's attention and creates hazardous conditions: it significantly increases the likelihood of missing road signs or exits, drifting into other lanes, or being involved in a collision, and it is one of the major contributing factors to road accidents. Globally, fatalities and injuries attributable to drowsy driving have increased year after year. Artificial intelligence (AI) has become a significant tool for addressing many such problems; one example is driver drowsiness detection technology, which can help prevent accidents caused by drivers falling asleep at the wheel. Sleep disturbances have been linked to a multitude of behavioral and overall health issues, including impaired driving performance, and thousands of accidents worldwide are caused by insufficient sleep, exhaustion, poor road conditions, and fatigue [1]. Public health administrations are therefore concerned about the rising number of traffic accidents, deaths, and injuries linked to impaired and drowsy driving. Table 1 shows the proportion of accidents and the percentage of fatalities and injuries attributable to drowsy driving in the Kingdom of Saudi Arabia [2], the United Kingdom [3], the United States [4], and Pakistan [5].

The main contribution of this study is a drowsiness detection system that uses computer vision techniques to locate the driver's face in an image and deep-learning techniques to predict, in a real-time environment, whether the driver is drowsy. Moreover, this is a first-of-its-kind study in Saudi Arabia conducted on a public and diversified dataset that is well aligned with regional aspects such as facial features and gender-based features. In most studies in the literature, accuracy is treated as the sole figure of merit, while other metrics such as precision, recall, and F1-score are omitted, despite their ability to characterize a model's effectiveness from different perspectives. In this study, all four metrics are reported: precision, recall, and F1-score reach 99%, while accuracy is 97%, which distinguishes the proposed model from others. Finally, the study primarily investigates two models, a custom-designed CNN and a pretrained model, and contrasts their effectiveness, finding that the CNN outperforms the alternative.

To accomplish this, a deep-learning model is developed and trained on a dataset obtained from Kaggle, a web-based data science platform on which data scientists and machine learning researchers can discover and share datasets for analysis and model development. This study also contributes to Saudi Vision 2030 goals for smart cities and road and public safety while driving, especially on highways, where higher speed limits increase the potential for road accidents.
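For illustration, a minimal sketch of such a two-stage, real-time pipeline (face detection followed by deep-learning classification of the cropped face) is given below; the model file name, input resolution, and class ordering are illustrative assumptions rather than the exact configuration used in this study.

# Sketch of the two-stage pipeline: OpenCV face detection followed by
# CNN classification of the cropped face region. File name, input size,
# and class order are assumptions for illustration only.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("drowsiness_cnn.h5")          # hypothetical trained model
CLASSES = ["closed", "open", "no_yawn", "yawn"]  # assumed class order

cap = cv2.VideoCapture(0)                        # in-cabin camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(frame[y:y + h, x:x + w], (145, 145)) / 255.0
        probs = model.predict(face[np.newaxis], verbose=0)[0]
        label = CLASSES[int(np.argmax(probs))]
        if label in ("closed", "yawn"):          # possible drowsiness cue
            print("Drowsiness warning:", label)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()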

In terms of theoretical contribution, this study provides a comprehensive review of related studies in the literature, identifies a research gap, and describes the motivation behind this work, especially from a KSA perspective. As far as practical contributions are concerned, the proposed approach offers practices that administrations and road safety departments can implement to monitor drivers' conditions in real time and prevent fatal road accidents. Overall, this study is a valuable contribution to the existing body of knowledge.

The rest of this paper is structured as follows: Section 2 provides the related work in the literature, while Section 3 highlights the dataset and its potential features used in this study. The proposed model’s description and deployment are provided in Section 4, and an evaluation is performed in Section 5. Section 6 concludes this paper.

2. Related Work

The study in [6] proposed detecting driver drowsiness based on eye state. A dataset of 2850 images, separated into different classes, was created for that purpose. The authors developed a novel deep-learning framework to identify driver fatigue while driving a car: the Viola–Jones face detection method is used to locate the eye region, a stacked deep convolutional neural network identifies important frames in the camera sequence, and a SoftMax layer in the CNN classifier labels the driver as sleeping or not sleeping. The model achieved an improved accuracy of 96.42% compared with a traditional CNN. In [7], the authors utilized a feed-forward deep-learning CNN to identify driver sleepiness, using two datasets: the Closed Eyes in the Wild (CEW) dataset and the Yawning Detection Dataset (YawDD). The proposed model achieved an accuracy of 96%. Similarly, another study [8] proposed a video-based model using an ensemble CNN (ECNN) composed of four different CNN architectures to measure the degree of sleepiness. The authors used the YawDD dataset, which consists of 107 images, and achieved an F1-score of 93% with the proposed ECNN; they aim to investigate a larger and more balanced dataset in future work. The authors of [9] used recurrent neural networks (RNNs) and CNNs to detect drowsiness, as well as a fuzzy logic-based approach to extract numeric data from the images. The experiments were carried out on the UTA Real-Life Drowsiness Dataset (UTA-RLDD), which includes 60 videos. The RNN and CNN achieved an accuracy of 65%, whereas the fuzzy logic approach obtained 93%.
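As an illustration of the Viola–Jones-style eye-region extraction step referenced in [6], the following sketch uses the Haar cascades bundled with OpenCV; the cascade file names are the standard ones shipped with OpenCV, and the patch size is an arbitrary choice rather than a value from that study.

# Sketch of Haar-cascade (Viola-Jones style) eye-region extraction.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_eye_patches(frame, patch_size=(64, 64)):
    """Return resized eye crops found inside detected faces."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    patches = []
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[fy:fy + fh, fx:fx + fw]       # restrict search to the face
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi):
            patches.append(cv2.resize(roi[ey:ey + eh, ex:ex + ew], patch_size))
    return patches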
Florez et al. [10] proposed a drowsy driving detection system via real-time eye status identification using three deep-learning algorithms, namely InceptionV3, VGG16, and ResNet50V2. In this regard, they used the dataset named NITYMED, containing drivers’ videos with diverse drowsiness states. The technique was promising in terms of detection accuracy.
Utaminingrum et al. [11] conducted research on rapid eye recognition using image-processing techniques based on a robust Haar sliding window, utilizing a private dataset collected in Malang City. The proposed approach achieved 92.40% accuracy. The technique was not robust against variable lighting conditions, and the authors aim to make it more robust, faster, and more precise in a future study.
Budiyanto et al. [12] conducted a study on a private dataset to develop an image-processing-based eye detection system for vehicle safety. They achieved 84.72% accuracy when the face is upright or tilted by no more than 45 degrees. The major shortcoming of the study is that eye identification was only effective at particular light intensities and facial positions. Li et al. [13] carried out a study on detecting fatigue while driving to improve traffic safety. They proposed a new detection method based on facial multi-feature fusion and applied it to an open-source dataset named WIDER_FACE [14]. The proposed method obtained good results with 95.10% accuracy; however, some areas still need improvement, such as its high intrusiveness and its detection performance in complicated surroundings. Hazirah et al. [15] used a computer vision measure named PERCLOS together with a support vector machine (SVM) to categorize eye closure for monitoring driver concentration and tiredness, and compared the performance of the approach on RGB and grayscale images. The approach achieved an accuracy of 91% on images with lenses and 93% on images without lenses. Furthermore, the trials revealed that RGB images outperform grayscale images in classification accuracy, whereas grayscale images outperform RGB images in processing time. One limitation of the study is that it employed an unpublished, private dataset. In a recent study [16], an innovative real-time model was developed using computer vision techniques to identify instances of driver fatigue or inattention; its primary objective is to enhance driving safety by alerting drivers when signs of inattention or fatigue appear. A large dataset of videos was collected and analyzed using the Viola–Jones algorithm, which consists of four stages: Haar feature selection, construction of an integral image, AdaBoost training, and cascade classifiers for face detection. With this methodology, the authors achieved an accuracy exceeding 95%.
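PERCLOS is commonly computed as the fraction of time within a sliding window during which the eyes are judged closed; the following sketch illustrates that idea with assumed window length and threshold values, not the exact parameters of [15].

# Illustrative PERCLOS monitor: fraction of "eye closed" frames in a window.
from collections import deque

class PerclosMonitor:
    def __init__(self, window_frames=900, threshold=0.15):  # e.g. 30 s at 30 fps (assumed)
        self.window = deque(maxlen=window_frames)
        self.threshold = threshold

    def update(self, eye_closed: bool) -> bool:
        """Record one frame's eye state; return True if PERCLOS exceeds the threshold."""
        self.window.append(1 if eye_closed else 0)
        perclos = sum(self.window) / len(self.window)
        return perclos > self.threshold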
A recent study [17] employed SVM to detect drowsiness by conducting image segmentation and emotion detection, specifically tracking facial expressions such as eye and mouth movements, using a private dataset. The model was also robust to changes in illumination, enabling it to perform effectively in varying lighting conditions with an accuracy of 93%; to further optimize performance, the researchers intend to enhance its adaptability to various environmental conditions. The authors of [18] introduced an image-processing method that identifies sleepiness by assessing the state of the mouth, eyes, and head, presenting a new and effective methodology influenced by the human visual system (HVS) [19]. In the proposed algorithm, a private dataset was pre-processed to reduce noise and guarantee illumination invariance; the behavior of the mouth, eyes, and head was then extracted to aid in detecting the driver's drowsiness. Based on these three features, a new algorithm determines whether the driver is drowsy from head drooping, yawning, and closed eyes. The proposed model yielded an accuracy of 90%. Another study [20] proposed a blink and drowsiness detector using a pre-trained CNN based on Dlib features. The detector computes the Euclidean distances between detected eye landmark coordinates to estimate the eye aspect ratio (EAR), and a Haar cascade algorithm is used to detect facial features for training the CNN. The dataset employed in this study consisted of 17,000 images. The model's performance was evaluated at varying facial angles and in low-light conditions using an infrared camera, and it achieved an accuracy of 99.83%. In a study by the authors of [21], a vision-based system for driver drowsiness detection was developed that employs the histogram of oriented gradients (HOG) technique for feature extraction and the Naïve Bayes (NB) algorithm for classification. A dataset named NTHU-DDD, consisting of 376 videos, was used to train and evaluate the proposed model, which achieved an accuracy of 85.62%. To enhance the model's generalization capability, the authors plan to use different datasets in future research.
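The EAR used in [20] and related work is typically computed from six landmarks per eye as the ratio of the two vertical eye distances to the horizontal eye distance; the sketch below assumes the standard 68-point Dlib landmark layout and an illustrative closed-eye threshold, which are not necessarily the values used in that study.

# EAR computation from 68-point facial landmarks (Dlib layout assumed).
import numpy as np

def eye_aspect_ratio(eye):
    """eye: array of six (x, y) landmarks ordered p1..p6 around one eye."""
    a = np.linalg.norm(eye[1] - eye[5])   # vertical distance p2-p6
    b = np.linalg.norm(eye[2] - eye[4])   # vertical distance p3-p5
    c = np.linalg.norm(eye[0] - eye[3])   # horizontal distance p1-p4
    return (a + b) / (2.0 * c)

RIGHT_EYE = list(range(36, 42))           # Dlib 68-landmark indices
LEFT_EYE = list(range(42, 48))
EAR_THRESHOLD = 0.25                      # assumed closed-eye threshold

def is_eye_closed(landmarks):
    """landmarks: (68, 2) array from a Dlib shape predictor."""
    ear = (eye_aspect_ratio(landmarks[RIGHT_EYE]) +
           eye_aspect_ratio(landmarks[LEFT_EYE])) / 2.0
    return ear < EAR_THRESHOLD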
In another study [22], the objective was to reduce the number of accidents caused by tired and sleepy drivers. Shape prediction techniques are applied to identify significant facial characteristics, and face detection is performed with OpenCV's built-in Haar cascades. A dataset named iBUG-300W, containing 300 indoor and outdoor images, was used. When the face is properly aligned and there are no worn obstructions, the accuracy is almost 100%. In [23], the authors aimed to create a system that determines a driver's level of weariness from a series of images in which the subject's face is visible. Two approaches, both focused on reducing false positives, are developed to determine whether the driver shows sleepiness symptoms. The first uses a recurrent CNN (RCNN), whereas the second uses deep learning to extract numerical information from the images, which is then fed into a fuzzy logic-based system. The UTA Real-Life Drowsiness Dataset (UTA-RLDD), containing videos of 60 distinct individuals in two states, awake and drowsy, is used; this dataset is realistic. Both alternatives achieved comparable accuracy levels: roughly 65% on training data and 55–65% on test data. In [24], the authors proposed a machine learning approach to identify sleepiness from images, using a CNN to categorize eyes as open or closed. The Media Research Lab (MRL) eye dataset is used, which includes eye images of males and females with eyes open or closed, with and without glasses, and under different light-reflection intensities. The approach obtained training and testing accuracies of 98.1% and 94%, respectively.
In [25], the main goal was to create a system that accurately assesses a driver's level of drowsiness based on the angle of their eyelids. The system was dependable enough to issue the appropriate notifications and to email emergency contacts. OpenCV is used for face detection together with the EAR function. The study reports that the eyes cannot be detected if the person is not facing the camera. In [26], the authors aimed to build a computer vision-based model that observes the condition of the eyes and mouth to identify the driver's weariness and provide a useful safety tool. The dataset comprised 16,600 images with eleven features. The authors compared four distinct algorithms: random forest, k-nearest neighbor (kNN), general regression neural network, and genetic algorithm-based RNN (GA-RNN). The best-performing algorithm in terms of generalization and stability was the GA-RNN, with an accuracy of 93.3%. A recent study by Chand and Karthikeyan [27] presents a deep-learning model that detects drowsiness and analyzes emotions to predict the driver's status and prevent car accidents. The authors used an image dataset of 17,243 images covering four classes (normal, fatigue, drunk, reckless) to build the system and employed the SVM, kNN, and CNN algorithms. The CNN performed best, with an accuracy of 93%.
A study by Phan et al. [28] utilized deep-learning algorithms to build a system that recognizes the driver's fatigue status and triggers an alarm to wake the driver. The authors used a mixed dataset of 16,577 images and videos for binary classification (drowsiness vs. non-drowsiness) and applied two deep-learning algorithms, MobileNet-V2 and ResNet-50V2. The best-performing model was ResNet-50V2, with an accuracy of 97%. A limitation of this work is that it treats the problem as binary classification, whereas in real life detecting yawning is also important for preventing future accidents. The study by Zhao et al. [29] proposed a driver drowsiness detection system using facial dynamic fusion information and a deep belief network (DBN) with a private dataset. The system achieved an accuracy of 96.70% in detecting driver drowsiness using dynamic landmark and texture features of the facial region. It has significant potential for improving road safety and could also have applications in sleep medicine. The authors compared their approach with state-of-the-art methods and found that it outperformed them in accuracy, robustness, and efficiency; the only limitation is the use of a private dataset. Overall, this study represents an important step toward reliable and accurate driver drowsiness detection systems.
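A hedged sketch of a transfer-learning setup along the lines of [28], with a frozen ResNet-50V2 backbone and a small binary head (drowsy vs. non-drowsy), is shown below; the input size, head layers, and optimizer settings are assumptions rather than the authors' exact configuration.

# Transfer learning with a frozen ResNet50V2 backbone and a binary head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50V2

base = ResNet50V2(weights="imagenet", include_top=False,
                  input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained backbone

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # drowsy / non-drowsy
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])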
A study by Alhaddad et al. [30] proposed an image-processing-based system for detecting driver drowsiness using EAR and blinking analysis. The study used a private dataset and achieved a detection accuracy of 92.10%. The system used the Dlib library for facial landmark detection and EAR calculation. The study's contribution lies in its ability to accurately detect drowsiness regardless of eye size, demonstrating the effectiveness of image-processing methods for driver drowsiness detection. Guede-Fernández et al. [31] aimed to develop a novel algorithm for monitoring a driver's state of alertness by analyzing respiratory signals. The researchers used a signal-quality classification algorithm and nested leave-one-subject-out cross-validation (LOSOCV) for model selection and assessment. The novel algorithm, called TEDD, was validated on a private dataset, achieving an accuracy of 96.6%. The techniques include signal processing, feature extraction, and machine learning. The results suggest that respiratory signal analysis can be an effective approach to drowsiness detection in drivers.
Vishesh et al. [32] developed a computer vision-based system to detect driver drowsiness in real time using eye blink detection. The authors used a CNN and OpenCV for image processing and feature extraction, along with a new method called horizontal and vertical gradient features (HVGFs) to improve accuracy. The study used an eye blink dataset consisting of eye images from 22 participants; the CNN was trained on 80% of the dataset and tested on the remaining 20%, achieving an accuracy of 92.86% in detecting eye blinks, while the reported experimental outcome of the full proposed method reaches 97%. The relationship between the rate of eye movement and the level of driver drowsiness was also analyzed, and the authors found a correlation that could help detect and prevent accidents caused by driver fatigue. The study concluded that the proposed system can effectively detect driver drowsiness and can be integrated with existing driver assistance systems to improve road safety. The developed prototype serves as a basis for further development and potential implementation in vehicles to reduce the risk of accidents caused by drowsy driving.
Mehta et al. [33] developed a real-time driver drowsiness detection system using non-intrusive methods based on the EAR and the eye closure ratio (ECR). The system uses a webcam to capture images of the driver's face and extracts EAR and ECR features from the eyes. The study used a dataset comprising facial images of 10 subjects recorded while driving, which the authors manually annotated as drowsy or not drowsy. The dataset was split into a training set (80%) and a testing set (20%), and a random forest (RF) was used to classify the drowsy and non-drowsy states based on the EAR and ECR features. The proposed model achieved an accuracy of 84% in detecting driver drowsiness. The study concluded that the proposed system could be used as part of a driver monitoring system to improve road safety, and that its performance could be further improved with a larger dataset and more robust classification algorithms.
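A minimal sketch of this kind of feature-based classification, with per-frame EAR and ECR values fed to a random forest, is given below; the synthetic arrays stand in for the annotated features of [33], which are not publicly available, and the 80/20 split mirrors the one reported.

# Random forest over [EAR, ECR] features; placeholder synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 2))                 # columns: [EAR, ECR] per frame (placeholder)
y = (X[:, 0] < 0.25).astype(int)          # toy label: low EAR -> drowsy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80/20 split as in the study
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))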
Another study [34] aimed to classify drowsy and non-drowsy driver states based on respiration rate measured with a non-invasive, non-contact impulse radio ultra-wideband (IR-UWB) radar. A dataset was acquired consisting of age, label (drowsy/non-drowsy), and respirations per minute. Several machine learning models were used, namely SVM, decision tree, logistic regression, gradient boosting machine (GBM), extra trees classifier, and multilayer perceptron (MLP); SVM achieved the best accuracy of 87%. A study by the authors of [35] aimed to develop a system to reduce accidents caused by driver drowsiness, using a dataset developed and generated by the authors. In that study, images are preprocessed using Haar cascade classifiers, and the CNN model's hyperparameters are tuned methodically. The model's performance is measured using a variety of metrics, including accuracy, precision, recall, F1-score, and the confusion matrix, and it classified the input data with 97.98% accuracy, 98.06% precision, 97.903% recall, and a 97.981% F1-score.
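The evaluation metrics listed above can be computed with scikit-learn, as in the generic sketch below; the label and prediction vectors are placeholders, not results from [35].

# Standard classification metrics with scikit-learn (placeholder data).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]          # placeholder labels (1 = drowsy)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]          # placeholder model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))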
In [36], the objective of the study was to develop a system that can recognize drowsy driving and warn the driver to prevent accidents. Images were gathered from the online public dataset titled “Driver drowsiness”, available on the Kaggle website. The Naïve Bayes region of interest (NB-RoI) algorithm is used to detect the eyes, and a single-layer artificial neural network (ANN) algorithm is utilized for labeling the eyes as “drowsy” or “alert” based on the detection of eye closure. Accuracy and miss rate are the performance measures used in the study. The ANN model achieved 81.62% accuracy and a miss rate of 18.38%.
A comprehensive summary of the reviewed literature is presented in Table 2, which highlights the type of dataset, the methods and algorithms used, and the best results obtained in each study. From the table, it is evident that driver drowsiness detection is one of the most active and emerging areas of research in public and road safety, and that more research is needed to improve the performance of classification algorithms for observing driver behavior, especially in real-time environments [37].

6. Conclusions and Future Work

In conclusion, this research investigates deep learning for detecting driver drowsiness and accurately classifying it into four groups: closed, open, no yawn, and yawn. To achieve accurate results, a drowsiness dataset consisting of 2900 images was used for training. The CNN technique proved effective at classifying the four drowsiness categories, and the model structure of Conv2D, MaxPooling2D, Flatten, and Dropout layers helped enhance detection performance. The CNN model achieved the best results among all the benchmark studies, with an accuracy of 97% and a precision, recall, and F1-score of 99%. Compared with state-of-the-art approaches, the proposed study exhibits comparable accuracy and outperforms them in precision, recall, and F1-score. The study is a potential contribution to road and public safety, especially in metropolitan areas, on highways, and in smart cities. Public administration and governmental agencies can be potential stakeholders, especially in the Kingdom of Saudi Arabia, and the idea can be implemented via smart surveillance and integrated into traffic monitoring systems. From Saudi Arabia's perspective, this study can in the future be extended to observe the conditions of female drivers wearing veils by integrating more diverse datasets; a new dataset could add features such as gender, age group, years of driving experience, veils, makeup, and eyelashes. Drivers' psychological conditions could also be added to the current features. Further, we aim to improve the efficiency of drowsiness detection systems with the help of deep-learning techniques and supportive models that can be integrated with the CNN to further increase accuracy and reduce computation time.
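For reference, a minimal Keras sketch consistent with the layer types named above (Conv2D, MaxPooling2D, Flatten, and Dropout) is given below; the filter counts, input resolution, and training settings are illustrative assumptions rather than the exact hyperparameters of the proposed model.

# Illustrative four-class CNN using the layer types named in the conclusion.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(145, 145, 3)),          # assumed input resolution
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(4, activation="softmax"),       # closed, open, no yawn, yawn
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])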
