Individual Tree Species Identification for Complex Coniferous and Broad-Leaved Mixed Forests Based on Deep Learning Combined with UAV LiDAR Data and RGB Images
Building on the current state of research, this study proposes an object detection method that combines LiDAR point clouds and ultra-high-spatial-resolution RGB imagery to achieve highly automated, high-precision identification of individual tree species in natural coniferous and broad-leaved mixed forests under complex conditions. The specific objectives of this study are to:
(1) Explore the individual tree species identification ability of the YOLO v8 model in natural coniferous and broad-leaved mixed forests under complex conditions, compare YOLO v8 with current mainstream object detection models (RetinaNet, Faster R-CNN, SSD, and YOLO v5), and reveal the impact of different image spatial resolutions and YOLO v8 model scales on identification results.
(2) Evaluate the effectiveness of current multi-source remote sensing band combination methods for identifying individual tree species in natural coniferous and broad-leaved mixed forests under complex conditions, compared with single data sources.
(3) Propose an improved YOLO v8 model tailored to the characteristics of multi-source remote sensing forest data to achieve more precise individual tree species identification in natural coniferous and broad-leaved mixed forests under complex conditions.
3.1. Data Preprocessing
The raw data were processed using DJI Terra 3.4 software. The RGB images were registered and stitched by the software; combined with the UAV flight altitude and RGB camera parameters, an orthophoto with a maximum resolution of 2.7 cm was produced. To explore the impact of image spatial resolution on tree species identification, the 2.7 cm RGB orthophoto was selected for subsequent analyses. The original point cloud was filtered and denoised using LiDAR360 4.1.3 software, normalized against the ground points, and cropped to the study area. CHM images were then generated using an inverse distance weighted (IDW) interpolation method. To facilitate data fusion, the CHM spatial resolution was kept consistent with that of the RGB images at 2.7 cm.
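The IDW interpolation step can be sketched as follows; this is a minimal brute-force illustration with hypothetical function and variable names (production tools such as LiDAR360 use optimized neighbor searches rather than this exhaustive loop):

```python
import numpy as np

def idw_chm(points, heights, grid_x, grid_y, power=2.0, eps=1e-12):
    """Interpolate normalized point heights onto a CHM raster grid
    using inverse distance weighting (IDW)."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    chm = np.zeros(gx.shape)
    for i in range(gx.shape[0]):
        for j in range(gx.shape[1]):
            # Distance from every LiDAR point to this grid cell
            d = np.hypot(points[:, 0] - gx[i, j], points[:, 1] - gy[i, j])
            w = 1.0 / (d ** power + eps)  # closer points dominate
            chm[i, j] = np.sum(w * heights) / np.sum(w)
    return chm

# Toy example: three normalized canopy points on a 2.7 cm grid spacing
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
h = np.array([10.0, 20.0, 15.0])
grid = np.arange(0.0, 1.0, 0.027)  # 2.7 cm cell size, as in the study
chm = idw_chm(pts, h, grid, grid)
```

Because IDW produces a weighted average of the input heights, every interpolated cell stays within the range of the observed heights, and cells coinciding with a LiDAR point take (almost exactly) that point's height.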
3.2. Dataset Creation and Data Fusion
Since the input to an object detection model is generally a 3-channel RGB image, depth information can be fused by replacing one channel of the RGB image with the CHM, thereby generating RG-D, R-D-B, and D-GB images. However, this approach directly discards one-third of the RGB data, whereas a PCA transform concentrates most of the information into the leading components. Therefore, ENVI 5.3 software was used to perform a principal component transformation on the RGB images, and the first two components were fused with the depth band to obtain PCA-D images. Because the image size and spatial resolution were identical to those of the RGB images, the corresponding image datasets were obtained by segmenting them in the same way as the RGB images.
3.3. Performance Comparison of Different Object Detection Models
In order to compare the individual tree species identification capabilities of different models, the RGB dataset was trained using several one-stage detectors (RetinaNet, SSD, YOLO v5, and YOLO v8). In addition, the classic two-stage object detection algorithm Faster R-CNN was also trained to explore its capability for identifying individual tree species in complex natural coniferous and broad-leaved mixed forests.
3.4. Tree Species Identification Effectiveness of Different Scales and Spatial Resolutions in YOLO v8
YOLO v8 is available at five scales defined by different scaling factors, namely n, s, m, l, and x, each with an increasing number of parameters. With more parameters, model accuracy rises, but the model also becomes larger, more complex, and slower to run. This study obtained RGB datasets with spatial resolutions of 2.7, 3.6, 5.4, 8.1, 10, 15, 20, 30, 40, 50, and 80 cm through resampling, and trained on each to explore the impact of spatial resolution and model scale on the accuracy of individual tree species identification.
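The multi-resolution datasets can be approximated with a simple nearest-neighbour resampler, sketched below with illustrative names (actual orthophoto resampling would typically use area-weighted or bilinear methods in GIS software):

```python
import numpy as np

def resample(img, src_res_cm, dst_res_cm):
    """Nearest-neighbour resampling from a source ground resolution
    (cm per pixel) to a coarser target resolution."""
    scale = src_res_cm / dst_res_cm
    h = max(1, int(round(img.shape[0] * scale)))
    w = max(1, int(round(img.shape[1] * scale)))
    # Map each output pixel back to its nearest source pixel
    rows = (np.arange(h) / scale).astype(int).clip(0, img.shape[0] - 1)
    cols = (np.arange(w) / scale).astype(int).clip(0, img.shape[1] - 1)
    return img[np.ix_(rows, cols)]

# From the native 2.7 cm orthophoto, derive the coarser test resolutions
resolutions = [2.7, 3.6, 5.4, 8.1, 10, 15, 20, 30, 40, 50, 80]
tile = np.random.rand(200, 200, 3)  # toy 200 x 200 tile at 2.7 cm
pyramid = {r: resample(tile, 2.7, r) for r in resolutions}
```

Halving the resolution (2.7 cm to 5.4 cm) halves each image dimension, so dataset size shrinks quadratically as resolution coarsens.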
3.5. Tree Species Identification Performance of Different Data Fusion Methods
Currently, there is limited research on the optimal fusion method for RGB and CHM image data in the field of forestry. Therefore, this study used the YOLO v8 model to train on the RG-D, R-D-B, D-GB, and PCA-D datasets and compare the impact of different fusion methods on the accuracy of individual tree species identification.
3.6. AMF GD YOLO v8 Model
3.6.2. Gather-and-Distribute Mechanism
The AMF GD YOLO v8 model is an improvement based on the characteristics of the different modalities of multi-source forest remote sensing data. Compared to the original YOLO v8, the improved model can simultaneously take RGB and CHM images as input for tree species identification and achieve multi-level fusion of RGB and CHM features through a feature fusion module. Because trees are mostly small- to medium-sized targets, the gather-and-distribute mechanism was introduced to comprehensively utilize the features extracted by the backbone. Compared to PAN-FPN, a P2 detection layer was added to enhance small-target detection, thereby better supporting individual tree species identification in complex natural coniferous and broad-leaved mixed forests.
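As a conceptual, heavily simplified sketch of the two ideas (this is not the actual AMF GD YOLO v8 architecture, which uses learned convolutional modules; all names here are illustrative): a per-position gate blends RGB and CHM feature maps, and gather-and-distribute pools multi-scale features into one global map that is handed back to every pyramid level:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(rgb_feat, chm_feat, w_gate):
    """Blend same-shape RGB and CHM feature maps: a (here fixed,
    in practice learned) gate decides per position how much each
    modality contributes."""
    gate = sigmoid(w_gate)                  # values in (0, 1)
    return gate * rgb_feat + (1.0 - gate) * chm_feat

def gather_and_distribute(feats):
    """Upsample all pyramid levels to the finest resolution, average
    them into one global map (gather), then hand that map back to
    every level (distribute). Assumes power-of-two level sizes."""
    target = feats[0].shape[0]
    gathered = np.mean(
        [np.kron(f, np.ones((target // f.shape[0], target // f.shape[0])))
         for f in feats], axis=0)
    return [gathered for _ in feats]  # each level now sees global context

# Toy feature pyramid with constant levels: gathered map = their mean
feats = [np.ones((8, 8)), 2 * np.ones((4, 4)), 3 * np.ones((2, 2))]
distributed = gather_and_distribute(feats)
fused = adaptive_fuse(np.ones((4, 4)), np.zeros((4, 4)), np.zeros((4, 4)))
```

The sketch only conveys the information flow: in the real model, gathering, distribution, and the modality gate are all trainable layers rather than fixed averaging.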
3.7. Accuracy Evaluation and Experimental Environment
The experiments in this study used PyTorch as the deep learning framework and were conducted on a desktop computer running Windows 11. The hardware included an NVIDIA GeForce RTX 3090 GPU with 24 GB of VRAM, 32 GB of DDR4 RAM, and an Intel Core(TM) i7-12700 CPU. This setup provided a robust platform for the deep learning experiments and evaluations required by this study.
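Detection accuracy in studies like this is typically scored by IoU-based matching of predicted crowns against reference crowns; the following is a minimal, hypothetical sketch of precision and recall at a fixed IoU threshold (the study's exact matching protocol and metrics may differ):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, iou_thr=0.5):
    """Greedily match predicted crowns to unmatched ground-truth crowns;
    return (precision, recall) at the given IoU threshold."""
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for k, g in enumerate(gts):
            if k in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = k, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall

# One correct detection, one false positive, one missed crown
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(0, 0, 10, 10), (100, 100, 110, 110)]
p, r = precision_recall(preds, gts)
```

Averaging precision over recall levels per species, then over species, yields the mAP commonly reported for detectors such as YOLO v8.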
YOLO v8 provides five scales for researchers to use. As the number of parameters increases, accuracy improves, but the model becomes larger and more complex, and computational efficiency decreases. This study trained each scale on datasets at different spatial resolutions to explore the resulting differences in accuracy and speed. The x scale of YOLO v8 achieved the highest individual tree species identification accuracy but had the lowest speed. In contrast, the n and s scales had lower accuracy but faster speeds, while the m and l scales came close to the x scale in detection accuracy at noticeably faster detection speeds. Thus, in practical applications, it is important to choose an appropriate scale and spatial resolution based on the research objectives to achieve an efficient balance between accuracy and detection speed. This consideration is vital for optimizing tree species identification performance and data collection efficiency in UAV remote sensing.
Based on deep learning, this study achieved automated and precise identification of individual tree species in natural coniferous and broad-leaved mixed forests. It explored the impacts of different models, spatial resolutions, and data fusion methods on individual tree species identification, and the proposed AMF GD YOLO v8 model achieved encouraging results. However, some limitations remain worthy of further research. The dual-branch feature extraction and fusion structure within the AMF module improved accuracy, but it also increased the computational complexity of the model. Future research could focus on developing more lightweight models that can be deployed on small-scale devices for real-time acquisition of tree species information, or on more advanced model architectures to further improve the accuracy of tree species identification.
While the multi-source remote sensing forest dataset created here validated the efficacy of the AMF GD YOLO v8 model, its generalizability across forest types under varying geographical, climatic, or ecological conditions still requires further verification. Moreover, current research on individual tree species identification using deep learning is hindered by the lack of comprehensive, large-scale public datasets encompassing a wide variety of tree species, which are crucial for enhancing model performance and universality.