Superpixel-Based Graph Convolutional Network for UAV Forest Fire Image Segmentation


1. Introduction

Forest fires, resulting from both natural and anthropogenic factors, pose a significant threat to global biodiversity and the ecological balance [1]. Globally, an average of over 200,000 forest fires occur annually, resulting in not only substantial economic losses but also long-term damage to ecosystems [2]. Against this backdrop, the development of efficient fire monitoring and imaging technologies has become key to addressing this challenge. Traditional monitoring methods, such as manual surveillance from watchtowers and detection via infrared instruments on helicopters, have played a role in early fire identification but face issues of high costs, low efficiency, and limited coverage. Consequently, utilizing imaging technology for fire surveillance and early warning not only provides real-time data on fire behavior but also aids in disaster assessment and the formulation of response strategies.
However, in the context of forest fire monitoring and management, merely detecting fires is often insufficient. Fire detection can quickly identify the presence of a fire, but for disaster response and management decisions, it is crucial to accurately determine the fire’s specific location, extent, and intensity. Fire segmentation, which precisely delineates the fire area from non-fire areas through image analysis techniques, is a key step in achieving this goal. It not only helps assess the actual impact range of the fire but also provides important information on the fire’s spread trend, thereby supporting the formulation of more effective firefighting strategies and resource allocation. Therefore, in forest fire monitoring and management, combining fire detection with further fire segmentation is indispensable for achieving rapid and accurate disaster response. But the processing of forest fire images presents unique challenges. The complex backgrounds, inadequate contrast, visual obstructions, and irregular shapes of fires all increase the difficulty of image segmentation [3]. These factors lead to the inefficiency of traditional image processing methods in accurately identifying and segmenting fire areas, failing to meet the needs for rapid response [4,5]. Therefore, exploring new technologies and methods capable of overcoming these challenges is crucial for enhancing the accuracy and efficiency of forest fire monitoring and imaging.
Traditional image segmentation methods typically rely on pixel-level processing, treating each pixel as an independent unit. Such coarse-grained representation can increase computational complexity and lower algorithm efficiency when processing forest fire images. Furthermore, these methods fail to adequately capture the contextual information and spatial relationships within the image, resulting in inaccurate segmentation outcomes [6]. In recent years, deep learning has achieved significant success in various fields, including computer vision [7], machine translation [8,9], image recognition [10,11], and speech recognition [12], and has been widely employed in image segmentation tasks [13]. Among these models, the graph convolutional network can effectively learn the spatial relationships and semantic information in images. Early research on graph convolutional networks followed a recursive approach, in which vertex representations were learned by iteratively propagating information among neighbors until a stable state was reached [14,15]. More recently, Liu et al. [16] introduced a novel multifeature fusion network that combines multiscale graph convolutional networks (GCNs) and multiscale convolutional neural networks (CNNs) for the classification of hyperspectral images (HSIs), and demonstrated that their approach achieves accurate classification results. Wang et al. [17] developed a UNet-like transformer architecture, UNetFormer, for real-time urban scene segmentation. In this model, a lightweight ResNet18 encoder captures global and local information, and a sophisticated global–local attention mechanism in the decoder models the integration of global and local information, enhancing segmentation performance. In the work by Wu et al. [18], a graph neural network (GNN) model was proposed based on the feature similarity of multiview images. They established correlation nodes between multiview images and library images, enabling the transformation of graph node features into correlation features between images. They also designed an image-based region feature extraction method, which simplified image preprocessing and better extracted important image characteristics.
However, in traditional GCN models [19], each pixel is treated as a node, an approach that can lead to information loss and increased computational complexity. To overcome these limitations and enhance the accuracy and efficiency of forest fire image segmentation, we propose a graph convolutional network based on superpixels in this study. Superpixels partition an image into contiguous regions exhibiting similar texture and color characteristics; this representation not only provides precise edge information but also captures object details effectively [2,20]. Commonly used superpixel segmentation algorithms include simple linear iterative clustering (SLIC) [21], superpixels extracted via energy-driven sampling (SEEDS) [22], and superpixel segmentation using Gaussian mixture models (GMMSP) [23]. Researchers often utilize superpixel segmentation as a preprocessing step for image segmentation tasks. For instance, Belizario et al. [24] employed superpixels for pre-segmentation, derived feature matrices based on color information, and proposed an automatic image segmentation method based on weighted recursive label propagation (WRLP), which quantifies the similarity between superpixels through edge weights. Xiong and Yan [25] developed a novel superpixel merging technique to address the over-segmentation problem in single-frame video sequence images and employed support vector machines (SVMs) for spectral-based superpixel classification. Other studies combine CNNs with SLIC for image segmentation [26]; for example, superpixel segmentation has been paired with convolutional neural networks to segment trees in a forest environment. However, CNNs are primarily designed for Euclidean data and may encounter limitations when applied to graph-structured data [27]. This is because CNNs typically assume that data are uniformly distributed in space, an assumption that may not hold for graph-structured data with complex spatial distributions, such as forest fires. The dynamic, uncontrolled expansion of forest fires and their potential to damage ecosystems necessitate real-time monitoring and grading to achieve accurate forecasting and control, an objective that traditional CNNs, which cannot handle non-uniformly distributed spatial data effectively, may fail to meet.
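To give intuition for how a SLIC-style algorithm groups pixels by color and spatial proximity, the following is a deliberately simplified NumPy sketch, not the SLIC implementation used in this work; the function `simple_slic` and its parameter names are our own illustrative choices:

```python
import numpy as np

def simple_slic(image, n_segments=9, compactness=10.0, n_iters=5):
    """Cluster pixels in joint (position, color) space, SLIC-style:
    grid-initialized centers, then iterative nearest-center assignment."""
    h, w, _ = image.shape
    step = int(np.sqrt(h * w / n_segments))
    # Initialize cluster centers on a regular grid as (y, x, r, g, b).
    centers = np.array(
        [[y, x, *image[y, x]] for y in range(step // 2, h, step)
         for x in range(step // 2, w, step)],
        dtype=float,
    )
    yy, xx = np.mgrid[0:h, 0:w]
    pixels = np.concatenate([yy[..., None], xx[..., None], image], axis=-1)
    pixels = pixels.reshape(-1, 5).astype(float)

    for _ in range(n_iters):
        # Scale the spatial distance so color and position are comparable.
        d_xy = np.linalg.norm(pixels[:, None, :2] - centers[None, :, :2], axis=-1)
        d_rgb = np.linalg.norm(pixels[:, None, 2:] - centers[None, :, 2:], axis=-1)
        labels = (d_rgb + (compactness / step) * d_xy).argmin(axis=1)
        for k in range(len(centers)):  # move each center to its members' mean
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(h, w)
```

The `compactness` parameter trades spatial regularity against color coherence: larger values yield more grid-like blocks, smaller values let blocks follow color boundaries such as flame edges more closely.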

To overcome the aforementioned issues, we propose an algorithm based on graph convolutional networks (GCNs) utilizing superpixels. Initially, the algorithm utilizes the color and spatial proximity information of superpixels to partition the image into multiple superpixel blocks. Subsequently, each superpixel block is considered a graph node, and edges between these nodes are constructed based on the regional color space characteristics of the image. Then, features are extracted from each superpixel block using a convolutional neural network. Through iterative training of the GCN, graph node classification is performed, and node class labels are associated with the corresponding superpixel blocks, resulting in the final segmentation outcome. The novelty of our proposed algorithm includes the following aspects:
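The first two steps of this pipeline, treating each superpixel block as a graph node and connecting spatially adjacent blocks with edges, can be sketched as follows. This is an illustrative NumPy sketch under our own simplifying assumptions (mean color as the node feature, adjacency derived from touching pixels); the function and variable names are hypothetical, not the paper's code:

```python
import numpy as np

def superpixel_graph(image, labels):
    """Build node features (mean color per superpixel) and an edge list
    connecting superpixels that touch in the label map."""
    n = int(labels.max()) + 1
    feats = np.zeros((n, image.shape[-1]))
    for k in range(n):
        feats[k] = image[labels == k].mean(axis=0)

    edges = set()
    # Horizontally or vertically adjacent pixels with different labels
    # imply an edge between their superpixel blocks.
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return feats, sorted(edges)
```

The resulting node feature matrix and edge list are exactly the inputs a GCN expects, so the image segmentation task reduces to graph node classification.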

(1) We have developed a superpixel-based graph convolutional network model specifically for forest fire image segmentation. To address the boundary information that is inevitably lost when encoder outputs are resized back to the original image dimensions, we propose a preprocessing step that converts grid-structured images into graph-structured representations using superpixels. Specifically, our model performs node prediction on each image converted into a superpixel graph. This preprocessing step preserves essential boundary information, thereby enhancing the overall performance of the segmentation process.

(2) We introduce a novel forest fire image segmentation approach based on both convolutional neural networks and graph convolutional networks. We enhance the GCN by replacing its graph convolutional operator with GraphSAGE's aggregation operator. Specifically, the CNN extracts features from superpixel blocks, while the GCN predicts node labels within the graph.

(3) To address the issues of class imbalance and varying pixel sizes within superpixel blocks, we introduce a novel loss function. This function imposes varying degrees of penalties on superpixel blocks of different classes and sizes, thereby effectively managing imbalanced data.
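Regarding contribution (2), the distinguishing feature of GraphSAGE's operator is that it aggregates a node's neighborhood (here by mean) separately from the node's own representation, rather than mixing them through a single normalized adjacency as a standard GCN does. A minimal NumPy sketch of one such propagation step, our own illustrative formulation rather than the paper's implementation:

```python
import numpy as np

def sage_layer(x, adj, w_self, w_neigh):
    """One GraphSAGE-style step with mean aggregation:
    h_v = ReLU(W_self @ x_v + W_neigh @ mean_{u in N(v)} x_u)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = adj @ x / np.maximum(deg, 1)  # avoid divide-by-zero
    h = x @ w_self.T + neigh_mean @ w_neigh.T
    return np.maximum(h, 0.0)  # ReLU
```

Graph learning libraries typically provide this as a ready-made layer (e.g., a SAGEConv-style layer), with `adj` supplied as a sparse edge index rather than a dense matrix.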
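The idea behind contribution (3), penalizing superpixel blocks according to both class rarity and pixel count, can be illustrated with a weighted cross-entropy over graph nodes. This is a hedged sketch of one plausible formulation; the exact loss is defined in Section 2, and the weighting scheme below (class weight times normalized block size) is our own assumption:

```python
import numpy as np

def weighted_node_loss(logits, targets, sizes, class_weights):
    """Cross-entropy over superpixel nodes, with each node's term scaled
    by its class weight (rarity) and its pixel count (block size)."""
    z = logits - logits.max(axis=1, keepdims=True)       # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    w = class_weights[targets] * (sizes / sizes.sum())   # per-node weights
    return (w * nll).sum() / w.sum()
```

Under this scheme, misclassifying a large block of a rare class (e.g., fire pixels in a mostly background scene) costs more than misclassifying a small block of a common class.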

The rest of this paper is organized as follows. In Section 2, the forest fire dataset and the methods and modules used in the experiments are introduced. The forest fire segmentation model presented in this paper is also elaborated upon in this section. Section 3 presents the experimental results of each part of the improvements. Section 4 describes the discussion and analysis of the model, as well as the outlook for future work. A summary of the entire work is presented in Section 5.

