We compared YOLOv4 performance at a detection threshold (probability of detection) of 0.5 (@0.5) using the two visual datasets. For the testing images, the mean average precision (mAP@0.5) using visual dataset 2, built from on-site images from a local fire department in Taiwan, reached 91%, far higher than the 27% mAP@0.5 using visual dataset 1, built from Google Images. The four model evaluation metrics (accuracy, precision, recall, and F1-score) are summarized in Table 1. For both the training and testing datasets, model performance with visual dataset 2 was much higher than with visual dataset 1.
Table 1. Comparison of model performances using the two datasets.
| Metric | Training: visual dataset 1 (Internet) | Training: visual dataset 2 (Taiwan on-site) | Testing: visual dataset 1 (Internet) | Testing: visual dataset 2 (Taiwan on-site) |
|---|---|---|---|---|
| Accuracy | 0.78 | 0.96 | 0.37 | 0.71 |
| Precision | 0.77 | 0.97 | 0.51 | 0.83 |
| Recall | 0.83 | 0.97 | 0.38 | 0.77 |
| F1-score | 0.80 | 0.97 | 0.44 | 0.80 |

We also compared the average precision for each class using the two datasets, as shown in Table 2. The results show that the model trained on the on-site images from the local Taiwan fire department differentiated firefighters from non-firefighters far better, indicating that local fireground images are preferable for capturing local features.
Table 2. Comparison of the average precision for each class using the two datasets.
| Class | Training: visual dataset 1 (Internet) | Training: visual dataset 2 (Taiwan on-site) | Testing: visual dataset 1 (Internet) | Testing: visual dataset 2 (Taiwan on-site) |
|---|---|---|---|---|
| Firefighter | 0.88 | 0.99 | 0.40 | 0.75 |
| Non-firefighter | 0.73 | 0.98 | 0.07 | 0.98 |
| Fire truck | 0.57 | 1.00 | 0.35 | 1.00 |
Figure 2 summarizes the flowchart of this study. We explored two methods of collecting images for training the model: (1) downloading images from online sources; and (2) obtaining images from the scenes of fires. Deng et al. showed that, with a cleanly annotated set of full-resolution (minimum 400 × 350 pixels) images, object recognition can be more accurate, especially by exploiting more feature-level information[25]. We therefore downloaded full-resolution images from Google Images by searching for firefighters and fire trucks, using only copyright-free images. In total, 612 images were obtained from online sources for the first visual dataset. In addition, we obtained on-site images from fire events in Taiwan; a total of 152 images formed the second visual dataset. After collecting the images, we annotated each image with three classes: firefighters, non-firefighters, and fire trucks. For each dataset, the images were partitioned into training and testing sets with an 80%−20% split. The annotated images were then used to train the YOLOv4 model, with each visual dataset trained separately. YOLOv4 was compiled in Microsoft Visual Studio 2019 to run on the Windows operating system with a GPU (GeForce GTX 1660 Ti, 16 GB VRAM), CUDNN_HALF, and OpenCV. Finally, we compared the model performances using the two visual datasets and discussed the implications of this study.
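A minimal sketch of the 80%−20% partition for a Darknet-style YOLOv4 setup is shown below. The directory and file names are illustrative assumptions; Darknet consumes plain-text lists of image paths, with one YOLO-format .txt annotation file alongside each image.

```python
# Sketch of an 80%-20% train/test partition for Darknet-based
# YOLOv4. The directory name "dataset2_taiwan/" is an assumption,
# not the paper's actual layout.
import glob
import random

random.seed(42)  # fixed seed so the split is reproducible

images = glob.glob("dataset2_taiwan/*.jpg")
random.shuffle(images)

split = int(0.8 * len(images))
train_images, test_images = images[:split], images[split:]

# Darknet reads newline-separated lists of image paths; each image
# has a matching .txt file with one "class cx cy w h" line per
# labeled object (classes: firefighter, non-firefighter, fire truck).
with open("train.txt", "w") as f:
    f.write("\n".join(train_images))
with open("test.txt", "w") as f:
    f.write("\n".join(test_images))
```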
YOLOv4 network architecture and model performance evaluation metrics
YOLOv4 consists of three main blocks: the 'backbone', 'neck', and 'head'[19]. The model uses the Cross Stage Partial Network (CSPNet) approach in its backbone to extract features[26]; with 53 convolutional layers for accurate image classification, this backbone is known as CSPDarknet53. CSPDarknet53 largely reduces the complexity of the target problem while maintaining accuracy. The 'neck' is a layer between the 'backbone' and 'head' that acts as a feature aggregator. YOLOv4 uses the Path Aggregation Network (PANet)[27] and Spatial Pyramid Pooling (SPP) to separate out the important features obtained from the 'backbone'[28]. PANet uses bottom-up path augmentation to aggregate features for image segmentation, while SPP enables YOLOv4 to accept input images of any size. The 'head' performs dense prediction for anchor-based detection: it divides the image into multiple cells and inspects each cell for the probability of containing an object, followed by post-processing techniques[29].
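Because the study compiled YOLOv4 with OpenCV support, a compact way to run a trained detector is through OpenCV's DNN module. The sketch below assumes the standard Darknet artifacts (a yolov4.cfg and yolov4.weights retrained on the three classes) and an example image name; none of these file names come from the paper.

```python
# Sketch of running a trained YOLOv4 detector via OpenCV's DNN
# module (OpenCV >= 4.4). The .cfg/.weights/image names are assumed.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
# SPP tolerates varying image sizes, but inference resizes to the
# network input resolution set in the .cfg (e.g., 416 x 416).
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

class_names = ["firefighter", "non-firefighter", "fire truck"]

image = cv2.imread("fireground.jpg")  # assumed example image
class_ids, confidences, boxes = model.detect(
    image, confThreshold=0.5, nmsThreshold=0.4
)
for cls, conf, box in zip(class_ids, confidences, boxes):
    print(class_names[int(cls)], f"{float(conf):.2f}", box)
```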
Model performance was analyzed using four metrics: accuracy, precision, recall, and F1-score[30]. A confusion matrix for binary classification includes four possible outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP is the number of objects successfully detected by the algorithm; TN is the number of non-objects correctly identified as not an object; FP is the number of non-objects falsely identified as objects; and FN is the number of objects falsely identified as non-objects. Accuracy represents the overall performance of the model: the proportion of TP and TN over all data points. Precision represents the model's ability to identify relevant data points: the proportion of data points classified as positive that are actually positive. Recall describes the model's ability to find all relevant data points: the proportion of actual positives that are correctly identified. The F1-score balances precision and recall; since maximizing precision often comes at the expense of recall, and vice versa, the F1-score is useful for ensuring both remain acceptable. The metrics are calculated as follows:
$ Accuracy=\frac{TP+TN}{TP+TN+FP+FN} $

$ Precision=\frac{TP}{TP+FP} $

$ Recall=\frac{TP}{TP+FN} $

$ {F}_{1}=\frac{2\times Recall\times Precision}{Recall+Precision} $
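These four formulas translate directly into code. In the sketch below, the TP/TN/FP/FN counts are placeholders, not values from the study.

```python
# Direct translation of the four evaluation metrics. The counts
# passed in at the bottom are placeholders for illustration only.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

print(evaluate(tp=83, tn=10, fp=25, fn=17))
```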
About this article
Cite this article
Chang RH, Peng YT, Choi S, Cai C. 2022. Applying Artificial Intelligence (AI) to improve fire response activities. Emergency Management Science and Technology 2:7 doi: 10.48130/EMST-2022-0007
- Received: 29 January 2022
- Accepted: 21 June 2022
- Published online: 19 July 2022
Abstract: This research discusses how to use a real-time Artificial Intelligence (AI) object detection model to improve on-site incident command and personnel accountability in fire response. We utilized images of firegrounds obtained from an online resource and a local fire department to train the AI object detector, YOLOv4. The resulting real-time AI object detector reached more than ninety percent accuracy when counting the number of fire trucks and firefighters on the ground using images from local fire departments. Our initial results indicate AI provides an innovative method to maintain fireground personnel accountability at the scenes of fires. By connecting cameras to additional emergency management equipment (e.g., cameras in fire trucks and ambulances or drones), this research highlights how this technology can be broadly applied to various scenarios of disaster response, thus improving on-site fire incident command and enhancing personnel accountability on the fireground.