ARTICLE   Open Access    

Fire detection methods based on an optimized YOLOv5 algorithm

  • Computer vision technology has broad application prospects in intelligent fire detection, offering accuracy, timeliness, visibility, adjustability and multi-scene adaptability. Traditional computer vision algorithms suffer from false detections, missed detections, poor precision and slow detection speed. In this paper, the efficient and lightweight YOLOv5s model is used to detect fire flames and smoke. An attention mechanism is embedded into the C3 module to strengthen the backbone network and maximize the algorithm's suppression of invalid feature data. Alpha CIOU is adopted to improve the localization function and target detection. In addition, transfer learning is used to achieve semi-automatic data annotation, which reduces the manpower and time needed for training. Comparative experiments are carried out on six fire detection algorithms (YOLOv5 and five optimized variants). The results indicate that the self-attention mechanism based on the Transformer structure substantially improves target detection precision, and that the Alpha CIOU-based localization function helps improve the detection recall rate. The YOLOv5+TR+αCIOU algorithm achieves the highest average fire detection recall rate, 68.5%, clearly outperforming the other algorithms. Applied to surveillance video of a factory fire, this optimized algorithm detects the fire in the 9th second after it first appears. The results demonstrate the algorithm's viability for real-time fire detection.
  • Soil and life have been a central theme of soil system science, as soils protect the physical habitats that enable the biodiversity of underground biota (Shen & Pan, 2022). The capacity of soil to protect biodiversity has been recognized as a critical and unique natural contribution of global soils and as a driver of soil health and, beyond that, of One Health (Banerjee & van der Heijden, 2022; Lehmann et al., 2020). As a vital part of the Earth's surface, topsoil plays a critical role in regulating plant growth and sustaining biodiversity, thereby supporting ecosystem functions and services (Janzen et al., 2021), and in the renewable use of natural resources for sustainable land use (Shen & Pan, 2022). The abundant microbial communities of topsoil are well known to be pivotal players in biogeochemical cycling (Bahram et al., 2018), mediating the storage and turnover of organic matter and its associated nutrients (Schmidt et al., 2011), which is key to the ecosystem services and biodiversity provided by global soils (Smith et al., 2015). Maintaining the richness and diversity of soil microbial communities has been widely considered in addressing soil quality and ecosystem health at regional and global levels (Guerra et al., 2021; Coban et al., 2022).

    Soil microbial biomass changes with organic inputs from crop production on a farm scale (Anderson & Domsch, 1989; Powlson, 1994) and with plant litter from aboveground vegetation biomass on a regional scale (Zak et al., 1994). In other words, the size of the soil microbial biomass could represent the availability of carbon substrate in soil, as governed by the interaction between the carbon sources incorporated into the soil and their allocation and protection from microbial access (Sollins et al., 1996). By operational definition, microbial biomass itself is considered an indispensable part of soil organic matter (SOM), stabilized through biological decay under given ecosystem conditions (Zhu et al., 2020). Therefore, the biomass and abundance of soil microbes could indicate the microbial contribution to SOM build-up and to ecosystem function and health.

    The soil microbial biomass, routinely measured with a fumigation protocol, has been adopted as an integrative measure of the overall size of all microbial communities in soil (Anderson & Domsch, 1989; Jenkinson & Ladd, 1981). It is, of course, included in the total pool of soil organic matter measured routinely by wet digestion or combustion (Black et al., 1965; Matejovic, 1997). Soil organic matter is increasingly viewed as a complex of a wide range of organic compounds differing in microbial decomposability, mineral association and allocation among aggregates within a soil (Lehmann & Kleber, 2015; Kallenbach et al., 2016; Pan et al., 2019). In turn, the proportion of microbial biomass carbon in the total organic carbon of a soil could represent the size of the living microbial community within the developed soil organic matter (Miltner et al., 2012; Cotrufo et al., 2013) and even the potential for medium-term sequestration of soil organic carbon (SOC) (Lehmann & Kleber, 2015; Matejovic, 1997; Six & Paustian, 2014). As a widely accepted soil health indicator (Anderson & Domsch, 1989; Powlson, 1994; Sparling, 1992), the microbial quotient (MQ), i.e. the ratio of microbial biomass carbon to SOC or of microbial biomass N to soil total N, could represent the biologically active fraction of SOM (Bachar, 2010; Zhou et al., 2017). MQ could thus be further linked to organic carbon and nitrogen turnover mediated by carbon inputs, by the quality of the carbon substrate, or by both (Coban et al., 2022; Paul, 2016). Although MQ varies greatly across ecosystems (Bachar et al., 2010; Paul, 2016; Martiny et al., 2006), quantitative comparisons among land use types on a regional scale have not yet been widely reported.

    Changes in the topsoil MBC pool and the MQ could be driven either by climate conditions affecting plant/crop biomass production on a regional scale or by edaphic factors such as mineralogy, soil texture and structure on a site scale. Microbial abundance and diversity indices vary widely with biotic factors (e.g. plants and soil biota), abiotic factors (e.g. soil nutrients and texture) and other environmental attributes (Martiny et al., 2006). Abiotic variables such as soil temperature and moisture, in particular, are increasingly recognized as drivers of the spatial variation in carbon inputs and turnover as well as in soil formation (Conant et al., 2011; Seneviratne et al., 2010). With growing demands for climate change mitigation, soil protection and food health, it has become urgent to quantify the pool size of microbial biomass carbon and the microbial share of SOM at regional and global scales (Wieder et al., 2013). Serna-Chavez et al. (2013) and Xu et al. (2013) reported the first quantitative estimates of global topsoil MBC and MQ using high-resolution global geographical data. To address the microbial role in SOM stabilization under various ecosystem conditions (Leifeld & Kögel-Knabner, 2005), quantification of the topsoil MBC pool and the MQ should be developed further on a regional scale, accounting for land use changes.

    As a large country with diverse soil cover and ecosystems, China has been threatened by rapidly shifting land use patterns and, in turn, by land degradation (Pan et al., 2015; Song & Deng, 2017). The total SOC stock of China to 1 m depth was estimated at a relatively small 90 Gt, of which 15 Gt C was in topsoil SOM (Pan, 2009). While SOC storage varies greatly with vegetation biomass across terrestrial ecosystems in China (Fang et al., 2001), the size and variability of the MBC pool and MQ values have been shown to be profoundly affected by land use changes (Mao et al., 1992) and by soil contamination (Bian et al., 2015). In particular, long-term rice cultivation promoted SOC accumulation while increasing microbial abundance (elevated MQ) (Liu et al., 2016). In contrast, conversion of natural grasslands to croplands could reduce MBC and thus MQ in the semiarid Loess Plateau of Northwest China (Wang et al., 2009). Ma et al. (2015) reported significantly lower topsoil MBC in soils from northeastern China stressed by waterlogging or drought than in soils without water stress. Using national forest inventory data, Zhou & Wang (2015) quantified a mean topsoil MBC content of 390.2 mg·kg−1 and an MQ of 1.92% for China's forest ecosystems. For land use patterns across China, however, the size and variation of soil microbial biomass and the MQ have not yet been reported.

    In this study, we first hypothesize that the topsoil microbial biomass pool varies more than the microbial quotient across land use types. We further hypothesize that soil factors have a strong impact on the MBC pool, whereas climate factors have a strong impact on MQ. Once the variation with land use types is quantified, the topsoil MBC pool of China as a whole could be estimated through spatial extrapolation. To test these hypotheses, we conducted a literature survey to build a database of topsoil MBC under varying ecosystem conditions. Both MBC and MQ were compared among land use patterns and climatic regions across mainland China, and their variations were explored to identify the major drivers of change across sites. Through data synthesis, the total topsoil MBC pool was predicted from the established MBC and MQ values. Finally, a statistical model was tested for predicting topsoil MBC of China's soils under different land uses.

    We searched the literature published from January 2000 to December 2022 in the bibliographic databases available in China. A basic archive of 1700 papers was first created by searching with the keywords 'soil organic carbon' and 'microbial biomass carbon'. The archive was then filtered to retain field studies reporting measurements of both SOC and MBC (and MBN) of untreated topsoil, with MBC measured by chloroform fumigation-extraction (CFE) (Vance et al., 1987). Data of MBC (or MBN) not measured with the fumigation method, or measured under any experimental treatment (land management, farming practices, vegetation restoration, pollution remediation, etc.), were excluded. To avoid potentially abnormal values, the retrieved SOC and MBC data were filtered using the 95% confidence principle. The resulting database consisted of 454 observations covering a wide range of climate and soil conditions across mainland China. The land use types analyzed included forest land (117 sites), grassland (67 sites), wetland (12 sites) and cropland (164 sites, including 125 sites of dry cropland and 39 sites of rice paddy). All soil and geographical information for the observations is provided in Supplemental Table S1, and the site distribution is shown in Fig. 1.
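    The text does not define the "95% confidence principle" used to screen abnormal values. One common reading, assumed here rather than taken from the source, is to drop observations whose log10-transformed value lies outside the mean ± 1.96 standard deviations, i.e. outside the central 95% of a fitted log-normal distribution. A minimal single-pass sketch under that assumption, with hypothetical MBC values:

```python
import math
import statistics


def filter_95(values):
    """Keep values whose log10 lies within mean +/- 1.96 SD of the log10 data.

    Assumption (not stated in the source): the "95% confidence principle" is read
    as trimming to the central 95% of a fitted log-normal distribution, in one pass.
    """
    logs = [math.log10(v) for v in values]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation of the log values
    lo, hi = mu - 1.96 * sd, mu + 1.96 * sd
    return [v for v, lv in zip(values, logs) if lo <= lv <= hi]


# Hypothetical MBC contents (mg/kg); the last value is an implausible outlier.
mbc = [210.0, 350.0, 420.0, 300.0, 500.0, 280.0, 390.0, 330.0, 260.0, 450.0, 95000.0]
kept = filter_95(mbc)
print(len(kept), max(kept))
```

    The log transform is chosen because MBC and MQ are reported to follow log-normal distributions; the same function could be applied to the SOC data.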

    Figure 1.  Geographical distribution of observations in China that were used in this study.

    Climatic conditions were categorized as plateau/mountain climate (PMC), temperate continental climate (TCC), temperate monsoon climate (TMC) and subtropical monsoon climate (SMC) (http://geodata.pku.edu.cn). Site data on mean annual air temperature (MAT) and mean annual precipitation (MAP) were collected from the reported studies or, when not provided, extracted from the China Meteorological Data Network (http://data.cma.cn/) for the station nearest to the reported site. Topsoil data on SOC, total N (TN), microbial biomass carbon (MBC) and nitrogen (MBN), pH and bulk density (BD) were also retrieved from the published studies and archived in the database.
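    The text states only that climate data were taken from the station nearest to each site. A sketch of such a lookup using great-circle (haversine) distance follows; the station IDs, coordinates and climate values are hypothetical placeholders, not records from the China Meteorological Data Network, and the mean Earth radius of 6371 km is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(site_lat, site_lon, stations):
    """Return the station record with the smallest great-circle distance to the site."""
    return min(stations, key=lambda s: haversine_km(site_lat, site_lon, s["lat"], s["lon"]))


# Hypothetical station records (MAT in deg C, MAP in mm); placeholders only.
stations = [
    {"id": "ST_A", "lat": 32.0, "lon": 118.8, "mat": 15.8, "map": 1100.0},
    {"id": "ST_B", "lat": 39.9, "lon": 116.4, "mat": 12.5, "map": 570.0},
    {"id": "ST_C", "lat": 30.6, "lon": 104.1, "mat": 16.3, "map": 870.0},
]

site = (31.5, 119.2)  # hypothetical site without reported climate data
st = nearest_station(*site, stations)
print(st["id"], st["mat"], st["map"])
```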

    The microbial quotient (MQ, %), defined as the MBC content of the topsoil expressed as a percentage of its SOC content, was calculated with Eqn (1):

    $$\mathrm{MQ} = \frac{C_{MB}}{C_{org} \times 1000} \times 100\% \tag{1}$$

    where $C_{MB}$ and $C_{org}$ represent the topsoil contents of MBC (mg·kg−1) and SOC (g·kg−1), respectively; multiplying $C_{org}$ by 1000 expresses both contents in mg·kg−1.
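    Eqn (1) can be sketched as a one-line function; the unit conversion of SOC is the only subtlety. The input values in the usage line are hypothetical.

```python
def microbial_quotient(mbc_mg_per_kg, soc_g_per_kg):
    """Microbial quotient (%) following Eqn (1).

    MBC is in mg/kg and SOC in g/kg, so SOC is multiplied by 1000
    to express both in mg/kg before taking the percentage ratio.
    """
    return mbc_mg_per_kg / (soc_g_per_kg * 1000.0) * 100.0


# Hypothetical topsoil: 400 mg/kg MBC and 20 g/kg SOC.
print(microbial_quotient(400.0, 20.0))
```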

    In the direct approach (Approach I), the mean MBC values were combined with data on bulk density and soil area (Xie et al., 2007; Zheng et al., 2013) to estimate the topsoil microbial biomass carbon pool (Mp, Tg C) under each land use type. The topsoil MBC pool of the whole of China was then obtained by summing the pools of the individual land use types, each calculated as:

    $$M_p = \frac{\mathrm{MBC}_i \times \mathrm{BD}_i \times D \times A_i}{10000} \tag{2}$$

    where $\mathrm{MBC}_i$ and $\mathrm{BD}_i$ are the mean topsoil MBC (mg·kg−1) and bulk density (g·cm−3) for land use type i, and $A_i$ is the total area (Mha) of land use type i. D is the topsoil depth (cm), set by default to 15 cm for rice paddies and 20 cm for other, non-paddy land uses (18−21 cm as reported by Xie et al. (2007)). The soil areas of the different land use types were also taken from Xie et al. (2007). The factor of 10,000 converts the product to Tg C.
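    A sketch of Eqn (2) for each land use type, followed by the summation over land use types used for the national total. The docstring traces where the factor of 10,000 comes from; the land use inputs are hypothetical, not the study's means or the areas of Xie et al. (2007).

```python
def mbc_pool_tg(mbc_mg_per_kg, bd_g_per_cm3, depth_cm, area_mha):
    """Topsoil MBC pool (Tg C) of one land use type, following Eqn (2).

    BD (g/cm3) x depth (cm) gives soil mass per area in g/cm2 (1 g/cm2 = 10 kg/m2).
    Multiplying by MBC (mg/kg) and area (1 Mha = 1e10 m2) and converting mg to Tg
    (1 Tg = 1e15 mg) leaves a net factor of 1e-4, i.e. division by 10,000.
    """
    return mbc_mg_per_kg * bd_g_per_cm3 * depth_cm * area_mha / 10000.0


def default_depth_cm(land_use):
    """Default topsoil depth from the text: 15 cm for rice paddy, 20 cm otherwise."""
    return 15.0 if land_use == "rice_paddy" else 20.0


# Hypothetical inputs per land use: (mean MBC mg/kg, mean BD g/cm3, area Mha).
inputs = {
    "forest": (500.0, 1.0, 100.0),
    "rice_paddy": (450.0, 1.2, 20.0),
}
pools = {
    lu: mbc_pool_tg(mbc, bd, default_depth_cm(lu), area)
    for lu, (mbc, bd, area) in inputs.items()
}
total = sum(pools.values())  # national pool = sum over land use types
print(pools, round(total, 1))
```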

    In the alternative, indirect approach (Approach II), the mean MQ derived in this study for each land use type was combined with the corresponding topsoil SOC pool to estimate the MBC pool of that land use type. A potential microbial biomass carbon pool of China's topsoil was then predicted by summing the values for the individual land use types, each calculated as:

    $$M_p = \mathrm{MQ}_i \times \mathrm{SOC}_{p,i} \tag{3}$$

    where $\mathrm{MQ}_i$ is the mean topsoil microbial quotient (%) obtained herein and $\mathrm{SOC}_{p,i}$ is the topsoil SOC pool (Tg C) under land use type i. The topsoil SOC pool of each land use type was taken from Xie et al. (2007), except for wetlands, whose topsoil SOC pool was estimated from the data in this study together with the wetland area of China reported by Zhang et al. (2008).
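    Eqn (3) is a simple product once MQ is converted from a percentage to a fraction. In the sketch below the first usage line is hypothetical; the second applies the area-weighted mean MQ (1.89%) to the total topsoil SOC stock of 32.94 Pg quoted in the Discussion, giving about 622.6 Tg C, close to the 622.7 Tg C reported there (the small gap presumably reflects rounding of MQ).

```python
def mq_pool_tg(mq_percent, soc_pool_tg):
    """MBC pool (Tg C) from the microbial quotient (%) and topsoil SOC pool (Tg C), Eqn (3).

    MQ is a percentage, so it is divided by 100 before multiplying by the SOC pool.
    """
    return mq_percent / 100.0 * soc_pool_tg


# Hypothetical land use: MQ of 2.0% applied to a topsoil SOC pool of 1000 Tg C.
print(mq_pool_tg(2.0, 1000.0))

# Area-weighted mean MQ (1.89%) and total topsoil SOC stock (32.94 Pg = 32,940 Tg),
# both as quoted in the Discussion.
print(round(mq_pool_tg(1.89, 32940.0), 1))
```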

    In this study, all measurement data of soil organic carbon, microbial biomass carbon and nitrogen, soil nitrogen and the related ratios were log-transformed prior to statistical analysis (Supplemental Figs S1−S8). Analysis of variance (ANOVA) followed by a least significant difference (LSD) test was then performed to evaluate the differences among land use types and climatic categories. Multivariate correlation analysis was conducted to explore environmental influences on the MBC pool. Stepwise multivariate regression analysis was further performed to identify the environmental drivers and to model MBC as a linear function of multiple explanatory variables. The forward method was used: starting with no variables in the model, the addition of each candidate variable was tested using Akaike's Information Criterion (AIC), the variable that improved the model most was added, and this process was repeated until the best multivariate model was selected. Significance of a difference or a correlation was set at p < 0.05. All statistical analyses were carried out using JMP software (version 11, SAS Institute Inc., Cary, NC, USA).
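    The forward selection procedure described above can be sketched in Python with NumPy. This is not the JMP implementation used in the study: it assumes ordinary least squares with an intercept and the Gaussian form of AIC up to an additive constant, and it runs on synthetic data whose column names merely mirror the predictors of Eqn (4).

```python
import numpy as np


def ols_aic(X, y):
    """Fit OLS with an intercept; return the Gaussian AIC up to an additive constant.

    AIC = n * ln(RSS / n) + 2k, with k the number of regression coefficients.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # X may have zero columns (intercept-only model)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def forward_select(X, y, names):
    """Repeatedly add the candidate giving the lowest AIC; stop when AIC no longer drops."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_aic(X[:, selected], y)  # start from the intercept-only model
    while remaining:
        aic, j = min((ols_aic(X[:, selected + [k]], y), k) for k in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best_aic


# Synthetic data: only the first two columns carry signal.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 - 0.6 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)
chosen, aic = forward_select(X, y, ["BD", "logSOC", "logTN", "MAP"])
print(chosen, round(aic, 1))
```

    With small samples the corrected criterion AICc, which adds 2k(k+1)/(n−k−1), is often preferred; the text names AIC, so the sketch uses that.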

    The mean topsoil MBC values across land use types and climate regions in our database are plotted in Fig. 2. Following a log-normal distribution, MBC content ranged from 28.7 to 1608.2 mg·kg−1 across the soils studied, with an average of 353.7 mg·kg−1 and a 95% confidence interval of 323.2−384.3 mg·kg−1 (Supplemental Fig. S4). Topsoil MBC content was highly variable across sites within a land use group, but the group means differed among land use types (Fig. 2a). Mean topsoil MBC content was significantly higher under forest (470.8 mg·kg−1), rice paddy (454.9 mg·kg−1) and wetland (634.8 mg·kg−1) than under dry cropland (179.9 mg·kg−1). Owing to high site variability, the mean MBC content under grassland (349.9 mg·kg−1) did not differ significantly from that under forest and rice paddy or under dry cropland.
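    The reported interval is almost symmetric about the mean (±30.5 mg·kg−1), which suggests a normal-approximation interval for the arithmetic mean; this reading is an assumption. Under it, the interval implies a sample standard deviation of about 332 mg·kg−1, i.e. a coefficient of variation of about 94%, in line with the site-level CV of 95% given in the Discussion. A sketch of both directions of the calculation:

```python
import math
import statistics

Z95 = 1.96  # normal approximation; with n = 454 the t quantile (about 1.965) is almost identical


def mean_ci95(values):
    """Arithmetic mean with a normal-approximation 95% confidence interval."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - Z95 * se, m + Z95 * se


def implied_sd(lower, upper, n):
    """Sample SD implied by a symmetric 95% CI of the mean: half-width / 1.96 * sqrt(n)."""
    return (upper - lower) / (2 * Z95) * math.sqrt(n)


# Reported summary for all observations: mean 353.7 mg/kg, 95% CI 323.2-384.3, n = 454.
sd = implied_sd(323.2, 384.3, 454)
cv = 100.0 * sd / 353.7
print(round(sd, 1), round(cv, 1))
```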

    Figure 2.  Differences in soil microbial biomass carbon concentrations (a) among land use types and (b) in different climatic regions. Different letters indicate significant differences of soil microbial biomass carbon concentrations and microbial quotients between land use types at p < 0.05.

    The mean topsoil MBC contents ranged from 280.9 mg·kg−1 to 400.8 mg·kg−1 across the climatic regions (Fig. 2b), a much narrower range than across land use types. ANOVA showed that the mean topsoil MBC was almost 25% higher under subtropical monsoon climate (SMC) than under temperate continental climate (TCC), which in turn did not differ significantly from the plateau/mountain climate (PMC) and temperate monsoon climate (TMC) regions.

    Like MBC, the microbial quotient (MBC as a percentage of SOC) fitted a log-normal distribution well, ranging from 0.2% to 12.8% (Supplemental Fig. S5). Across all observations and land use types, MQ averaged 2.05%, with a 95% confidence interval of 1.87%−2.22%. As shown in Fig. 3a, the mean MQ was lowest in grassland (1.63%) and highest in rice paddy (2.69%). The mean MQ was moderate in wetland (2.16%), which did not differ significantly from forest (2.08%) and dry cropland (1.82%). Compared with MBC, MQ showed relatively stronger variation among the land use types.

    Figure 3.  (a) Soil microbial quotient among land use types and (b) microbial quotient in different climatic regions. TMC, temperate monsoon climate; PMC, plateau/mountain climate; SMC, subtropical monsoon climate and TCC, temperate continental climate. Different letters after mean values indicate significant differences of soil microbial biomass carbon concentrations and microbial quotients between climatic regions at p < 0.05.

    Moreover, the variation of mean topsoil MQ among climatic regions did not follow that of MBC (Fig. 3b). The mean MQ values were broadly similar among the TMC (2.04%), SMC (2.21%) and TCC (2.31%) regions, whereas the mean MQ in the PMC region was significantly lower, at only 0.73%. Overall, the variation among land uses and climate regions appeared small compared with the variation among sites.

    Topsoil MBC pools estimated for different land use types with the direct (Approach I) and indirect (Approach II) approaches are shown in Fig. 4a and 4b, respectively. For each land use type, the direct MBC-based estimate was close to the indirect MQ-based estimate. Approaches I and II gave, respectively, topsoil MBC pools of 246.5 (95% CI 213.0−279.9) and 284.6 (238.1−331.0) Tg C for forest land, 233.9 (192.3−275.5) and 250.5 (203.4−297.6) Tg C for grassland, 56.6 (50.3−62.9) and 55.8 (49.7−61.9) Tg C for dry cropland, 50.3 (25.3−75.4) and 44.3 (32.7−55.8) Tg C for wetland, and 27.1 (21.2−32.9) and 22.0 (18.3−25.8) Tg C for rice paddy.

    Figure 4.  Topsoil microbial biomass carbon pool (Tg C) of different land uses of China, estimated from (a) mean MBC (Approach I) and (b) mean MQ combined with the SOC stock (Approach II). The size of the set is comparable to the total pool size for the whole stock.

    Summing the MBC pools of the individual land use types, the topsoil MBC pool of the whole of China was estimated at 614.4 Tg C (95% CI 502.2−726.7 Tg C) using the mean MBC values and 657.1 Tg C (95% CI 542.1−772.1 Tg C) using the mean MQ values. Using the area-weighted mean MQ and the whole topsoil SOC stock (Xie et al., 2007), the potential topsoil microbial biomass C pool would be 622.7 Tg C for the whole of mainland China. The topsoil MBC pool of China can therefore be put at about 0.6 Pg C. Of this pool, ca 40% is in forest, 38% in grassland, ca 9% in dry cropland, 7%−8% in wetland and 3%−4% in rice paddy.
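    Summing the land use pools reported above reproduces the national totals (614.4 Tg C for Approach I; 657.2 Tg C for Approach II, against the 657.1 Tg C reported, a rounding difference) and the approximate shares. The sketch only re-aggregates the published point estimates; it does not recompute the confidence intervals.

```python
# Land use MBC pools (Tg C) as reported in the Results: (Approach I, Approach II).
POOLS = {
    "forest": (246.5, 284.6),
    "grassland": (233.9, 250.5),
    "dry_cropland": (56.6, 55.8),
    "wetland": (50.3, 44.3),
    "rice_paddy": (27.1, 22.0),
}


def total_and_shares(pools, approach):
    """Sum the land use pools for one approach (0 = I, 1 = II) and return percentage shares."""
    total = sum(v[approach] for v in pools.values())
    shares = {lu: 100.0 * v[approach] / total for lu, v in pools.items()}
    return total, shares


for idx, label in enumerate(("Approach I", "Approach II")):
    total, shares = total_and_shares(POOLS, idx)
    print(label, round(total, 1), {lu: round(s, 1) for lu, s in shares.items()})
```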

    Results of the multivariate correlation analysis exploring environmental drivers of topsoil MBC are presented in Fig. 5. Overall, MBC was strongly and positively correlated with total N and MBN (r > 0.60, p < 0.0001) but only slightly (r < 0.30, p < 0.05) with MAP, soil C/N ratio and MBC/MBN ratio. MBC was strongly and negatively correlated with bulk density (r = −0.57, p < 0.0001) and slightly and negatively with soil pH (r = −0.30, p < 0.01 when pH > 7.5). The correlations of MBC with environmental attributes varied with land use type. The negative correlation of MBC with bulk density was strong and highly significant for natural soils (forest, grassland and wetland), but not significant for croplands (both dry cropland and rice paddy). MBC was positively correlated with soil pH under grassland but negatively under rice paddy and forest. MBC was moderately to strongly and positively correlated with MAP under wetland, grassland and forest, but negatively under rice paddy. MBC was also very significantly and strongly correlated with MAT in wetland, but not under the other land use types. In addition, soil N level was significantly and positively correlated with MBC (r > 0.5, p < 0.05) only for wetland and rice paddy.

    Figure 5.  Environmental drivers of soil microbial biomass carbon (* p < 0.05; ** p <0.01).

    Unlike MBC, topsoil MQ across all observations showed moderate but very significant (p < 0.0001) negative correlations with SOC (r = −0.48), total N (r = −0.32) and soil C/N ratio (r = −0.37), and no correlation with soil pH or bulk density (Fig. 5). The correlations of MQ with soil and environmental attributes also varied greatly with land use type. For example, MQ was moderately and positively correlated with MAP under forest and rice paddy but negatively in dry croplands. No correlation with MAT was visible, except for a weak positive correlation under forest. In addition, MQ was significantly correlated with soil pH, negatively under rice paddy but positively under grassland.

    Forward stepwise multivariate regression produced an optimal multivariate model for predicting topsoil MBC content. With a total adjusted explanatory power of 48%, the model is expressed as:

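    $$\log_{10}\mathrm{MBC} = 2.41 - 0.60 \times \mathrm{BD} + 0.46 \times \log_{10}\mathrm{SOC} - 0.27 \times \log_{10}\mathrm{TN} + 0.0002 \times \mathrm{MAP} \tag{4}$$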

    where MBC is the topsoil microbial biomass carbon content (mg·kg−1), BD the bulk density (g·cm−3), SOC the organic carbon content (g·kg−1) and TN the total N content (g·kg−1) of the topsoil, and MAP the mean annual precipitation (mm) at the site.
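    Eqn (4) can be transcribed directly as a prediction function. In this sketch MAP is taken in millimetres (an assumption about units), and the site values in the usage line are hypothetical.

```python
import math


def predict_mbc(bd, soc, tn, map_mm):
    """Topsoil MBC (mg/kg) predicted with Eqn (4).

    bd: bulk density (g/cm3); soc, tn: topsoil SOC and total N (g/kg);
    map_mm: mean annual precipitation (unit assumed to be mm).
    """
    log_mbc = (2.41 - 0.60 * bd + 0.46 * math.log10(soc)
               - 0.27 * math.log10(tn) + 0.0002 * map_mm)
    return 10 ** log_mbc


# Hypothetical site: BD 1.2 g/cm3, SOC 15 g/kg, TN 1.5 g/kg, MAP 800 mm.
print(round(predict_mbc(1.2, 15.0, 1.5, 800.0), 1))
```

    Because the model was fitted to log10 MBC, back-transforming with `10 **` estimates the median rather than the mean MBC at the given predictor values, and predictions are only meaningful within the range covered by the database.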

    In this study, differences in both mean topsoil MBC and MQ were greater among land use types than among climate zones (Figs 2 & 3), despite large heterogeneity across sites (28.7−1608.2 mg·kg−1). As shown in Figs 2a & 3a, mean MBC and MQ were both relatively high under wetland (635 mg·kg−1 and 2.2%), rice paddy (455 mg·kg−1 and 2.7%) and forest (471 mg·kg−1 and 2.1%), and low under dry cropland (180 mg·kg−1 and 1.8%) and grassland (350 mg·kg−1 and 1.6%). Compared with the global-scale estimates of Xu et al. (2013), the topsoil MBC estimated in this study was similar for forest and cropland (area-weighted over dry cropland and rice paddy) but significantly lower for wetland and grassland, for which Xu et al. (2013) reported 1336.8 mg·kg−1 and 520.8 mg·kg−1, respectively. From synthesized soil survey and monitoring data, Xie et al. (2007) reported significant soil carbon accumulation in forest and cropland but a decline in grassland since the 1980s. Soil organic carbon loss has also been extensive in wetlands across China (Zhang et al., 2008). For forest soil in particular, the mean topsoil MBC estimated in this study (470.8 mg·kg−1) was higher than the 390.2 mg·kg−1 reported by Zhou & Wang (2015) in a study specific to the MBC of forest soils in China. In this study, forest soil MBC data came largely from measurements conducted after 2015, following the national ecological civilization strategy (Supplemental Table S1). Following ecological restoration of vegetation in China, Li et al. (2023) observed a much greater increase in soil microbial necromass than in SOC. The relatively higher MBC in forest and cropland could point to an enhancement of soil microbial communities through increased carbon substrate supply under improved management and restoration (Zhang et al., 2022). Singh & Gupta (2018) argued that ecological restoration could reduce the unpredictability and turnover rates of soil microbial biomass by alleviating soil stresses on microbial communities.

    Our study demonstrated a larger variation of MBC than of MQ among land use types. Across climate zones, by contrast, MBC was broadly similar, though lower in the TCC zone, whereas MQ varied more than MBC, being much lower in the PMC zone (0.73%) (Figs 2b & 3b). Serna-Chavez et al. (2013) demonstrated very large differences in both soil MBC and topsoil MQ across global biomes, whereby under forest MQ rather than MBC varied more widely with climate conditions. MQ reflects microbial assimilation of soil organic carbon (Serna-Chavez et al., 2013) and microbial activity in relation to environmental stresses (Zhou & Wang, 2015). In this study, the mean MQ values across land use types (1.6%−2.2%) were generally higher, except for grassland, than those across global biomes (1.0%−2.1%) quantified by Xu et al. (2013). In the quantification by Serna-Chavez et al. (2013), which geo-referenced their MBC database to gridded SOC, the MQ of grasslands exceeded 3.0%, compared with 2.0% for temperate broadleaf forest, 3.0% for temperate coniferous forest and 3.6% for tropical forest. In particular, our mean MQ for forest (2.1%) was close to the 1.92% reported by Zhou & Wang (2015) with experimental data under intended treatments. For cropland, the mean MQ of 2.0% in this study was comparable to the range of 2.3%−2.9% reported under long-term experiments in Central Europe (Coban et al., 2022). The mean MQ of 2.2% for wetland, though based on far fewer cases, was markedly higher than the global mean of 1.20% for natural wetlands (Serna-Chavez et al., 2013). Furthermore, the area-weighted mean MQ was estimated at 1.89% (95% confidence interval 1.7%−2.3%), compared with 1.20% reported by both Serna-Chavez et al. (2013) and Xu et al. (2013). Although topsoil is sampled to a depth of 0−20 cm (0−15 cm for rice paddy) under the Chinese soil sampling protocol (Song & Deng, 2017), whereas 0−30 cm was the default sampling depth in those global studies (Serna-Chavez et al., 2013; Xu et al., 2013), the discrepancy above could instead be explained by differences in soil resources and environmental conditions (Zhou & Wang, 2015), as explored below.

    The data obtained here allow an estimate of the total MBC pool of China's topsoil. Summed over land use types, the total topsoil MBC pool was 614.4 Tg C with Approach I, using the mean measured MBC concentrations and bulk densities in the database, and 657.1 Tg C with Approach II, using the mean MQ of each land use type together with its topsoil SOC stock from Xie et al. (2007). Using the area-weighted mean MQ and the total topsoil SOC stock of 32.94 Pg across all land use types (Xie et al., 2007), a total topsoil MBC pool of 622.7 Tg C was obtained for the whole of mainland China. Thus, the total topsoil microbial biomass carbon pool of China is very likely about 657.1 Tg C. This pool would account for 4.4% and 3.8% of the global pools of 14.6 Pg C (Serna-Chavez et al., 2013) and 16.7 Pg C (Xu et al., 2013), respectively. By comparison, China's share of the global SOC stock is about 6% for the whole soil and 4.8% for topsoil (Pan et al., 2015; Xie et al., 2007). The relatively lower share of topsoil microbial biomass could indicate an intensive impact of land use activities and environmental changes (Zhou & Wang, 2015) on the preservation of soil microbial communities.

    The main factors regulating MBC and MQ could differ in China's soils, although both are subject to changes in edaphic and biogeographic factors (Paul, 2016). There was clearly high site variability in both MBC (CV of 95%) and MQ (CV of 98%). Using global data, Xu et al. (2013) reported that topsoil MBC varied over three orders of magnitude but topsoil MQ over only one. Moreover, MBC varied more strongly (CV of 40%) than MQ (CV of 19%) across land use types, whereas MQ varied more strongly (CV of over 40%) than MBC (CV of 14%) across climate regimes (Figs 2 & 3). A similar pattern, with vegetation zone exerting a stronger influence on soil MBC and climate on MQ, was also reported at the global scale (Serna-Chavez et al., 2013). Land use change thus appears to affect topsoil MBC more, and climate (mainly MAP) to affect MQ more (Fig. 5). As shown in recent studies (Serna-Chavez et al., 2013; Xu et al., 2013), MQ could be linked to soil stresses such as moisture, organic carbon loss and N limitation. The variation of MQ with land use could represent the extent to which ecosystems have been altered by human disturbance (Zhou & Wang, 2015). The estimated mean MQ values were generally almost 2-fold the global mean of 1.2%, reflecting stresses on Chinese soils under long human use and climate change (Pan, 2009; Song et al., 2005).
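    The among-group CVs quoted above can be checked against the group means reported in the Results, assuming they were computed from those means with the sample standard deviation. The sketch reproduces about 40% for MBC and 19% for MQ across land use types, and just over 40% for MQ across climate regions; the 14% for MBC across climate regions cannot be checked because only the range of those means is reported.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Group means reported in the Results.
mbc_by_land_use = [470.8, 454.9, 634.8, 179.9, 349.9]  # forest, rice paddy, wetland, dry cropland, grassland
mq_by_land_use = [2.08, 2.69, 2.16, 1.82, 1.63]         # same order
mq_by_climate = [2.04, 2.21, 2.31, 0.73]                # TMC, SMC, TCC, PMC

for label, vals in [("MBC, land use", mbc_by_land_use),
                    ("MQ, land use", mq_by_land_use),
                    ("MQ, climate", mq_by_climate)]:
    print(label, round(cv_percent(vals), 1))
```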

    Serna-Chavez et al. (2013) suggested that soil microbial biomass is driven not by temperature but by factors affecting soil moisture availability and soil nutrients such as N status. Specifically for forest biomes across China, Zhou & Wang (2015) noted that both MBC and MQ were controlled by soil conditions rather than by climate, with up to 40% of the total variation explained by the soil factors SOC, total N and their interaction but less than 10% by climate conditions. At the regional scale, therefore, land use acts as a major driver of the soil microbial biomass pool, modified by climate through changes in soil resource conditions such as soil moisture availability, carbon input through vegetation shifts and N levels through human activities.

    Among edaphic factors, SOC, total N, bulk density and pH are important for soil MBC across the land use types (Fig. 5; Supplemental Fig. S9). SOC is well known to be controlled by ecosystem productivity and to vary spatially with soil attributes on a regional scale (Fierer et al., 2009). This in turn affects the size and community structure of soil microbes through variations in SOC quality, plant C inputs and rates of C turnover (Wardle, 1992). Serna-Chavez et al. (2013) reported a strong impact of SOC and total N on the MBC pool of a wide range of soils across the globe. Notably, there was a significant but very slight (r² < 0.1) negative correlation of MBC with soil pH across China (Supplemental Fig. S10), despite strong negative correlations under both forest and rice paddy (Fig. 5). At the global scale, by contrast, soil pH was either significantly but slightly (Serna-Chavez et al., 2013) or not significantly (Xu et al., 2013) correlated with microbial biomass carbon. Interestingly, we found a significant and strong negative correlation (r = −0.57, p < 0.0001) of MBC with soil bulk density for all soils other than croplands across China. Although bulk density was either not included (Xu et al., 2013) or not correlated with MBC in existing global syntheses, our finding highlights the prominent effect of soil structure on the preservation of microbial communities and their ecosystem services (Gupta & Germida, 2015).

    The narrower range of MQ variation suggests a weaker impact of soil factors on the intensity of microbial carbon assimilation in topsoil. In relation to N limitation effects (Xu et al., 2013), microbial assimilation could be stimulated under N-limited conditions, while microbial growth, and thus microbial abundance, could be stressed in soils with a high C:N ratio (Paul, 2016; Wang et al., 2009; Dequiedt et al., 2011). Accordingly, MQ was strongly negatively correlated with total N and less strongly with C/N ratio across global biomes (Xu et al., 2013). Unlike the findings for forest lands across China (Zhou & Wang, 2015) and for global biomes (Xu et al., 2013), MQ in this study was negatively correlated with soil C/N ratio and, to a lesser extent, with total N across all observations (Fig. 5). In addition, MQ was very significantly and strongly positively correlated with microbial biomass nitrogen (MBN) across all land use types. This suggests that the variation of MQ could partly be attributed to variation in microbial N assimilation in response to soil nutrient status (Dequiedt et al., 2011). Low N availability has already been shown to boost microbial N assimilation and thus increase MQ through enhanced organic matter decomposition in disturbed topsoil (Lejon et al., 2007).

    For all observations across China, MBC was positively and strongly correlated with MAP but not with MAT, supporting moisture rather than temperature as the major factor controlling soil microbial biomass (Xu et al., 2013). By contrast, MQ was correlated with neither MAP nor MAT across all observations (Fig. 5). In terms of climate zones, MBC was higher in the SMC region than in the TCC regions in this study (Fig. 3), indicating a critical role of soil moisture in soil microbial growth (Wieder et al., 2013; Ma et al., 2015). Although microbial growth is well known to be strongly temperature-dependent (Grisi et al., 1998), MQ, but not MBC, was lower in the PMC than in the TCC, SMC and TMC regions (Fig. 3). This is inconsistent with the finding that MQ values were higher in tropical and subtropical climate zones than in boreal and tundra regions (Xu et al., 2013), which could be attributed to differences in microbial carbon decomposition and nutrient assimilation between these climate zones. Likewise, Franzluebbers et al. (2001) argued that topsoil MBC, but not SOC, was controlled by macro-climate conditions across the continental USA. Overall, in this study, soil MBC was strongly controlled by soil factors and less strongly by climate variables, whereas the variation of MQ was rather narrow and less responsive to changes in soil and climate.

    For predicting the MBC of a given soil, a linear regression model (Eqn 4) was established following a stepwise regression analysis. The model comprises three edaphic attributes (bulk density, SOC and total N) and one climate attribute (MAP). Xu et al. (2013) proposed a logarithmic model for predicting MBC mainly from climate variables (mean annual precipitation and temperature) plus SOC, with fitted parameters that differed among the climate zones of the globe. By contrast, Serna-Chavez et al. (2013) developed an MBC model with six attributes, comprising climatic parameters and the edaphic factors pH and total nitrogen, and an MQ model with eight attributes, comprising climate parameters and soil total nitrogen, pH, C:N ratio and CEC. Our MBC model explained 48% of the total variance (adjusted R²), compared with 39% for the MBC model of Serna-Chavez et al. (2013), which accounted for more attributes. In previous work (Frey et al., 2013), macroclimate attributes had been considered the predominant environmental driver of the MBC pool of soils across major biomes on a global scale. In this study, however, edaphic parameters such as soil carbon and total N, together with the climate parameter of mean annual precipitation, played determinant roles in the MBC pool of topsoil across land use types in China. With key attributes of both soil and climate and good explanatory power, the proposed model could provide a simple tool for a basic estimation of topsoil MBC at any given site in mainland China. The model could also guide practices for enhancing soil microbial biomass, and thus soil microbial biodiversity, of China's soils through the management of soil organic carbon and nitrogen. Indeed, nature-based solutions such as biochar for soil management could safeguard the soil microbial biomass pool and, in turn, soil health for One Health of the Earth system (UNEP, 2022).
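      The 48% figure for the MBC model is an adjusted value. A short Python sketch of the standard adjustment may therefore help when comparing models with different numbers of attributes. For illustration only, assume the adjustment used n = 77 observations and p = 4 predictors, the sizes mentioned in this section; the exact values used are not stated. Under that assumption, an adjusted value of 0.48 corresponds to an unadjusted R² of roughly 0.51.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def r2_from_adjusted(adj_r2, n, p):
    """Invert the adjustment to recover the unadjusted R^2."""
    return 1 - (1 - adj_r2) * (n - p - 1) / (n - 1)


# Illustrative sizes only (n and p are assumptions, see the text above).
print(round(r2_from_adjusted(0.48, n=77, p=4), 3))
```

The penalty term (n − 1)/(n − p − 1) grows with p. Adding attributes therefore does not automatically raise the adjusted share of explained variance.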

    Uncertainty remains in this basic multivariate estimation of microbial biomass in China's soils. Firstly, the MBC data used in this study were measured using the chloroform fumigation-extraction method (Vance et al., 1987), and sampling in different seasons across individual studies could affect the MBC levels determined. Secondly, the unbalanced number of observations among land use types could be a major source of uncertainty. This was particularly the case for wetlands, which had only 11 observations in the dataset and showed high but variable MQ values compared with those reported by Serna-Chavez et al. (2013). Thirdly, the lack of available soil data could limit model efficiency, as only 77 data sets were used to develop the multivariate model, out of a total of 648 MBC measurements. We reached a consistent estimate of the topsoil MBC pool of China's soils with different predictive approaches. However, the topsoil depth was set to a default of 20 cm for soils other than rice paddy, for which a default of 15 cm was used. This could, of course, lead to a smaller topsoil MBC pool for China's soils than would be obtained with the default topsoil depth of 30 cm used in global estimations (Serna-Chavez et al., 2013; Xu et al., 2013). In addition, for non-cropland soils, edaphic parameters such as bulk density, soil N, microbial biomass N and soil texture were often not reported. Further work is needed to obtain a more robust, high-resolution estimate of microbial biomass in topsoil by soil type and soil region (Gong, 1999), as well as for the whole soil profile. To better understand MQ status, future work should explore the linkage of microbiome structure and activity to soil organic matter status, with special reference to soil structure at the aggregate level.
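      The sensitivity of the pool estimate to the default depth follows from the usual areal-stock conversion, sketched below in Python. Soil mass per square metre is bulk density × depth × 10⁴ cm². With MBC in mg kg⁻¹, bulk density in g cm⁻³ and depth in cm, the stock in g C m⁻² is therefore 0.01 × MBC × BD × depth. The input values are hypothetical, and the sketch assumes MBC is uniform over the layer. Because microbial biomass usually declines with depth, extrapolating a surface concentration to 30 cm would tend to overestimate.

```python
def mbc_stock_g_per_m2(mbc_mg_per_kg, bulk_density_g_cm3, depth_cm):
    """Areal MBC stock (g C per m^2) of a layer with uniform MBC concentration.

    Soil mass per m^2 = BD * depth * 1e4 cm^2 (g) = BD * depth * 10 (kg), so
    stock (mg) = MBC * BD * depth * 10 and stock (g) = 0.01 * MBC * BD * depth.
    """
    return 0.01 * mbc_mg_per_kg * bulk_density_g_cm3 * depth_cm


# Hypothetical soil: 300 mg C/kg MBC, bulk density 1.3 g/cm^3.
for depth in (15, 20, 30):
    print(depth, "cm:", round(mbc_stock_g_per_m2(300, 1.3, depth), 1), "g C m^-2")
```

For the same concentration, the 30 cm default gives 1.5 times the 20 cm stock and twice the 15 cm stock. This is why the depth convention matters when comparing pool sizes.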

    The effects of environmental variables on topsoil microbial biomass and its relation to SOC were quantified based on a database of published field studies across mainland China. MBC varied widely, mainly with land use, whereas MQ varied with climate conditions. For individual soils, SOC (and TN) exerted a strong positive impact on MBC and a moderate negative impact on MQ. In contrast, precipitation had a positive but moderate impact on MBC, and temperature had a positive but moderate impact on MQ. Among the land use types, rice paddy had a higher MQ than forest and grassland soils despite lower SOC and soil C/N ratio, indicating a larger active biological carbon pool. A multivariable model was developed to allow a general prediction of topsoil microbial biomass carbon for China's soils. The topsoil MBC pool of China's soils was estimated at 614.4–657.1 Tg. The topsoil MBC pool varied more than the microbial quotient across land use types; soil factors strongly impacted the MBC pool, while climate factors had a great influence on the microbial quotient.

    This work was financially supported by the National Natural Science Foundation of China under grant numbers 41501569, 41371298 and U1612441. The international cooperation was funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) and by the Double First-Class discipline construction plan of the Ministry of Education, China. This work contributes to the N-Circle project, a China-UK Virtual Joint Centre on Nitrogen funded by the Newton Fund via the UK BBSRC (BB/N013484/1).

  • The authors declare that they have no conflict of interest. Pan Genxing is an Editorial Board member of Soil Science and Environment. He was blinded from reviewing or making decisions on the manuscript. The article was subject to the journal's standard procedures, with peer review handled independently of this Editorial Board member and his research groups.

  • [1] Kobes M, Helsloot I, De Vries B, Post JG. 2010. Building safety and human behaviour in fire: A literature review. Fire Safety Journal 45:1−11. doi: 10.1016/j.firesaf.2009.08.005

    [2] Wang Z, Li T. 2022. A lightweight CNN model based on GhostNet. Computational Intelligence and Neuroscience 2022:8396550. doi: 10.1155/2022/8396550

    [3] Drysdale D. 2011. An Introduction to Fire Dynamics. 3rd Edition. UK: John Wiley & Sons. 576 pp. doi: 10.1002/9781119975465

    [4] Liu Z, Kim AK. 2003. Review of recent developments in fire detection technologies. Journal of Fire Protection Engineering 13:129−51. doi: 10.1177/1042391503013002003

    [5] Gaur A, Singh A, Kumar A, Kulkarni KS, Lala S, et al. 2019. Fire sensing technologies: A review. IEEE Sensors Journal 19:3191−202. doi: 10.1109/JSEN.2019.2894665

    [6] Röck F, Barsan N, Weimar U. 2008. Electronic nose: current status and future trends. Chemical Reviews 108:705−25. doi: 10.1021/cr068121q

    [7] Davies ER. 2004. Machine vision: theory, algorithms, practicalities. 3rd Edition. San Francisco, USA: Academic Press, Elsevier. doi: 10.1016/C2013-0-10565-X

    [8] Ma J, Sun DW, Qu JH, Liu D, Pu H, et al. 2016. Applications of computer vision for assessing quality of agri-food products: a review of recent research advances. Critical Reviews in Food Science and Nutrition 56:113−27. doi: 10.1080/10408398.2013.873885

    [9] Szeliski R. 2022. Computer Vision: Algorithms and Applications. Cham, Switzerland: Springer Nature. 925 pp. doi: 10.1007/978-3-030-34372-9

    [10] Zhong Z, Wang M, Shi Y, Gao W. 2018. A convolutional neural network-based flame detection method in video sequence. Signal, Image and Video Processing 12:1619−27. doi: 10.1007/s11760-018-1319-4

    [11] Zhang L, Wang M, Ding Y, Bu X. 2023. MS-FRCNN: A Multi-Scale Faster RCNN Model for Small Target Forest Fire Detection. Forests 14:616. doi: 10.3390/f14030616

    [12] Yu L, Liu J. 2020. Flame image recognition algorithm based on improved Mask R-CNN. Computer Engineering and Applications 56:194−98. doi: 10.3778/j.issn.1002-8331.2006-0194

    [13] Abdusalomov A, Baratov N, Kutlimuratov A, Whangbo TK. 2021. An improvement of the fire detection and classification method using YOLOv3 for surveillance systems. Sensors 21:6519. doi: 10.3390/s21196519

    [14] Zheng H, Duan J, Dong Y, Liu Y. 2023. Real-time fire detection algorithms running on small embedded devices based on MobileNetV3 and YOLOv4. Fire Ecology 19:31. doi: 10.1186/s42408-023-00189-0

    [15] Hou Q, Zhou D, Feng J. 2021. Coordinate Attention for Efficient Mobile Network Design. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20−25 June 2021. USA: IEEE. pp. 13708−17. doi: 10.1109/CVPR46437.2021.01350

    [16] Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. 2020. End-to-end object detection with transformers. In Computer Vision – ECCV 2020, eds. Vedaldi A, Bischof H, Brox T, Frahm JM. Cham, Switzerland: Springer. pp. 213−29. doi: 10.1007/978-3-030-58452-8_13

    [17] He J, Erfani S, Ma X, Bailey J, Chi Y, et al. 2021. α-IoU: A family of power intersection over union losses for bounding box regression. 35th Conference on Neural Information Processing Systems (NeurIPS 2021). pp. 1−19. doi: 10.48550/arXiv.2110.13675

    [18] Chino DYT, Avalhais LPS, Rodrigues JF, Traina AJM. 2015. BoWFire: detection of fire in still images by integrating pixel color and texture analysis. 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, Salvador, Brazil, 26−29 August 2015. USA: IEEE. pp. 95−102. doi: 10.1109/SIBGRAPI.2015.19

    [19] Zeng G. 2020. On the confusion matrix in credit scoring and its analytical properties. Communications in Statistics - Theory and Methods 49:2080−93. doi: 10.1080/03610926.2019.1568485

    [20] Wang L, Qu JJ, Hao X. 2008. Forest fire detection using the normalized multi-band drought index (NMDI) with satellite measurements. Agricultural and Forest Meteorology 148:1767−76. doi: 10.1016/j.agrformet.2008.06.005

    [21] Majid S, Alenezi F, Masood S, Ahmad M, Gündüz ES, et al. 2022. Attention based CNN model for fire detection and localization in real-world images. Expert Systems with Applications 189:116114. doi: 10.1016/j.eswa.2021.116114

    [22] Solovyev R, Wang W, Gabruseva T. 2021. Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing 107:104117. doi: 10.1016/j.imavis.2021.104117

    [23] Qu Z, Gao L, Wang S, Yin H, Yi T. 2022. An improved YOLOv5 method for large objects detection with multi-scale feature cross-layer fusion network. Image and Vision Computing 125:104518. doi: 10.1016/j.imavis.2022.104518

    [24] Song C, Zhang F, Li J, Xie J, Chen Y, Zhou H, et al. 2022. Detection of maize tassels for UAV remote sensing image with an improved YOLOX model. Journal of Integrative Agriculture 22:1671−83. doi: 10.1016/j.jia.2022.09.021

  • Cite this article

    Shao Z, Lu S, Shi X, Yang D, Wang Z. 2023. Fire detection methods based on an optimized YOLOv5 algorithm. Emergency Management Science and Technology 3:11 doi: 10.48130/EMST-2023-0011

Figures(15)  /  Tables(2)




Emergency Management Science and Technology 3, Article number: 11 (2023)


    • Fire is one of the most frequent natural and social disasters, posing a grave threat to human life and to the stable development of society and the economy[1]. China is one of the nations hardest hit by fire worldwide. According to statistics released by the Fire and Rescue Department of the Ministry of Emergency Management, 2021 saw the largest number of alarm calls received by fire rescue teams, 38.1% of which involved fire-fighting tasks. A total of 748 thousand fires were reported throughout the year, causing 1,987 deaths, 2,225 injuries, and direct property losses of 6.75 billion yuan[2].

      The development of a fire is generally divided into four stages: the slow growth stage, the rapid growth stage, the fully developed stage, and the decay stage[3]. The initial slow growth stage is characterized by low burning intensity, a small burning area, low temperature, and low radiant heat. This is the optimal time for firefighting: if the fire is noticed and dealt with promptly, it can be contained with few human and material resources. Once the fire enters the second stage without external control, it grows rapidly, roughly in proportion to the square of time, and soon reaches the fully developed stage. During this stage, the fire extends rapidly and the temperature peaks, making it difficult to extinguish. In addition, fires break out unpredictably, so they are hard for people to notice in the initial slow growth stage. Hence, it is of the utmost importance to detect the initial fire precisely and swiftly and to issue an alarm. Doing so can considerably increase the efficiency of firefighting and rescue and thereby reduce the loss of life and property.
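      The quadratic growth mentioned above is usually written as the t-squared fire model, Q(t) = α t², where Q is the heat release rate. The minimal Python sketch below evaluates it. The growth coefficients are the commonly tabulated slow, medium, fast and ultrafast values, defined by the time a fire takes to reach 1055 kW. They are assumptions for illustration, not values from this study. This α is unrelated to the α of the Alpha-CIoU loss.

```python
import math

# Commonly tabulated t-squared growth classes: time (s) to reach 1055 kW.
# These reference values are assumptions for illustration, not data from this study.
TIME_TO_1055_KW = {"slow": 600, "medium": 300, "fast": 150, "ultrafast": 75}


def growth_coefficient(growth_class: str) -> float:
    """Return the growth coefficient alpha (kW/s^2) for a named growth class."""
    return 1055.0 / TIME_TO_1055_KW[growth_class] ** 2


def heat_release_rate(t_s: float, alpha: float) -> float:
    """Heat release rate Q (kW) at t seconds after ignition: Q = alpha * t^2."""
    return alpha * t_s ** 2


def time_to_reach(q_kw: float, alpha: float) -> float:
    """Time (s) for a t-squared fire to reach the heat release rate q_kw."""
    return math.sqrt(q_kw / alpha)


if __name__ == "__main__":
    a = growth_coefficient("fast")
    for t in (30, 60, 120):
        print(f"t = {t:3d} s -> Q = {heat_release_rate(t, a):6.1f} kW")
    print(f"time to 1 MW: {time_to_reach(1000, a):.0f} s")
```

Doubling the elapsed time quadruples Q. This is why detection during the slow growth stage leaves far more margin than detection after the rapid growth stage has begun.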

      At present, the vast majority of conventional fire detection systems are equipped with sensitive electronic sensors that detect fire-related characteristics such as smoke, temperature, light, and gas concentration[4−6]. Because these detectors are so sensitive, changes in the surrounding environment can alter their detection performance and cause false alarms. Installing sensor-type fire detectors in a large-space environment requires a large number of detectors, which means high cost, laborious installation, and potential circuit safety issues. Such detectors are therefore not ideal for large-space fire detection. In addition, fire signatures take time to reach the sensors and be detected, giving the fire ample time to spread and develop into a catastrophe.

      With the advancement of computer hardware and software, computer vision has begun to be used in various industries[7−9]. Using computer vision to recognize objects in videos and photos precisely and quickly offers numerous potential applications. For fire detection, computer vision analyzes image characteristics to determine whether a fire is present. It also performs tasks such as recognizing and locating the fire and evaluating the burning combustibles. Vision-based fire detection offers fast recognition and a quick response time. It has evident advantages in scenarios involving fast-moving flows, large spaces, and unknown environments. It is also inexpensive, because existing monitoring equipment can collect video and image data in real time without additional hardware. Vision-based fire detection systems can deliver intuitive and comprehensive fire data, such as the location and severity of the fire and its surrounding environment. Machine-learning-based visual recognition algorithms can be updated and retrained, which may significantly increase fire recognition accuracy and reduce false alarms. In addition, they can detect smoke and fire simultaneously.

      Fire detection is an application of computer vision that requires a rapid detection response while maintaining accuracy. It primarily targets smoke and flames, which makes it a multi-target detection problem, and training and detection are typically time-consuming. Researchers around the world have carried out a large amount of work in this direction. Zhong et al.[10] implemented CNN-based video flame detection. Zhang et al.[11] proposed a multi-scale Faster R-CNN model, which effectively improves fire detection accuracy. Yu & Liu[12] added a bottom-up feature pyramid to Mask R-CNN to improve flame detection accuracy. Abdusalomov et al.[13] proposed a fire detection method based on YOLOv3. Zheng et al.[14] proposed a fire detection method based on MobileNetV3 and YOLOv4. These methods can achieve good performance on a specific image dataset. However, because the algorithms are not robust, their performance tends to degrade on other datasets, and they struggle to eliminate the complex interference found in real applications. There is therefore considerable room for improving both the accuracy of vision-based fire recognition and the sample training time. This study employs the YOLOv5 algorithm as the base fire recognition algorithm. Coordinate attention and multi-head self-attention based on the Transformer structure are introduced and embedded into the last C3 module of the backbone network to improve its feature extraction capability. Simultaneously, the Alpha-CIoU loss function is employed to improve the localization loss and target identification accuracy. Moreover, transfer learning is used to reduce the training cost for fire targets (fire and smoke). Finally, a factory fire detection test is carried out to validate the proposed optimization technique and its precision in fire identification.

    • YOLO (short for You Only Look Once) is one of the most classic and advanced single-stage deep convolutional object detection algorithms in computer vision. Using a network trained on labeled targets, the algorithm extracts features directly from the original input image and predicts object categories and positions, achieving end-to-end real-time object detection. Its fundamental concept is to divide an input image into S × S grid cells, with the cell containing an object's center being responsible for predicting that object. Each grid cell predicts B bounding boxes, each with a position (x, y, w, h) and a confidence score, together with category information C. Classification and position regression are merged into a single regression problem. The loss function of each candidate box is calculated at each iteration, and the parameters are learned iteratively through backpropagation. Finally, the output image displays the predicted target boxes, their confidence, and the predicted category probabilities. The YOLO series continues to be optimized iteratively through network model modifications and technology integration.
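      The grid assignment can be illustrated with a short, hypothetical helper. It maps the box center to one of the S × S cells and returns the cell-relative offset that the responsible cell regresses. The output size S × S × (B·5 + C) follows the original YOLO formulation, with 4 box coordinates plus 1 confidence per box. YOLOv5 itself predicts on several anchor-based feature maps, so this sketch illustrates the concept rather than the YOLOv5 head.

```python
def responsible_cell(center_x, center_y, img_w, img_h, s):
    """Return (row, col, dx, dy): the grid cell containing the box center and the
    center's offset inside that cell, with both offsets in [0, 1]."""
    gx = center_x / img_w * s  # center expressed in grid units
    gy = center_y / img_h * s
    col = min(int(gx), s - 1)  # clamp a center lying exactly on the right/bottom edge
    row = min(int(gy), s - 1)
    return row, col, gx - col, gy - row


def output_size(s, b, c):
    """Number of outputs in the original YOLO formulation: S * S * (B * 5 + C)."""
    return s * s * (b * 5 + c)


# Example: a 640 x 640 image, a 7 x 7 grid, and a flame centered at (320, 100).
print(responsible_cell(320, 100, 640, 640, 7))
# Two classes (fire, smoke) and two boxes per cell:
print(output_size(7, 2, 2))
```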

      The YOLOv5 algorithm was released by Glenn Jocher and numerous Ultralytics contributors in 2020 as an improvement on YOLOv4. It is one of the most powerful object detection algorithms available today and offers one of the fastest inference processes. This paper uses the YOLOv5s model from the v6.0 release to detect fires. The model's architecture consists of four components: Input, Backbone, Neck, and Head. The Input performs adaptive anchor box computation and adaptive scaling of the image (to 640 × 640 pixels). It also employs mosaic data augmentation to increase the training speed of the model and the precision of the network. The Backbone is a convolutional neural network that extracts fine-grained image features. The Neck consists of a sequence of network layers that aggregate and fuse these features before passing them to the prediction layer. The Head makes predictions from the image features, generating bounding boxes and predicting categories.
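      The adaptive scaling step can be sketched as aspect-preserving resizing plus padding, often called letterboxing. The helper below is a simplified, hypothetical version that pads to a full 640 × 640 square. The actual YOLOv5 implementation has additional options, for example padding only up to a multiple of the network stride at inference, so treat this as an illustration of the idea.

```python
def letterbox_params(img_w, img_h, target=640):
    """Return the scale ratio, resized (w, h), and (left, right, top, bottom) padding
    needed to fit an image into a target x target canvas without distorting it."""
    r = min(target / img_w, target / img_h)  # largest scale that fits both sides
    new_w, new_h = round(img_w * r), round(img_h * r)
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2  # split padding between opposite sides
    return r, (new_w, new_h), (left, pad_w - left, top, pad_h - top)


def to_letterbox_coords(x, y, r, pad):
    """Map a pixel (x, y) of the original image into letterboxed coordinates."""
    left, _, top, _ = pad
    return x * r + left, y * r + top


# A 1920 x 1080 surveillance frame scaled into the 640 x 640 network input.
r, size, pad = letterbox_params(1920, 1080)
print(r, size, pad)
```

Box predictions made on the letterboxed image are mapped back by reversing these two steps: subtract the padding, then divide by r.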

    • The attention mechanism in deep learning for computer vision is comparable to the selective visual attention mechanism in humans. Its core goal is to select, from a large quantity of information, the data most relevant to the current task. The attention mechanism is now used extensively in computer vision and has produced outstanding results. It enriches target features by improving the extraction of target information from specific areas of the image, and thereby improves detection accuracy to a certain extent.

    • Coordinate Attention (CA) is a novel channel attention mechanism designed by Hou et al.[15]. CA embeds position information into channel attention by factorizing it into two one-dimensional feature encoding processes that aggregate features along the height and width directions, respectively. This avoids collapsing the feature tensor into a single feature vector through two-dimensional global pooling of spatial information, as is done in the squeeze-and-excitation (SE) channel attention mechanism. In light of CA's superior performance and plug-and-play adaptability in object recognition studies, this study introduces CA for feature extraction in the backbone.

      CA considers not only channel information but also location-based spatial information. The horizontal and vertical attention weights it produces represent the presence or absence of regions of interest in the respective rows and columns of the feature maps. This encoding locates the target more precisely and hence enhances the recognition ability of the model.
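      The difference between SE-style global pooling and CA's directional pooling can be shown on a toy feature map in plain Python. This is a deliberately simplified sketch. The learned 1 × 1 convolutions, batch normalization and non-linearity of the real CA block are omitted, and a sigmoid is applied directly to the pooled descriptors as a stand-in. Only the pooling and reweighting pattern is illustrated.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_pool(feature):
    """SE-style global average pooling: one scalar per channel (position is lost)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def ca_pool(feature):
    """CA-style directional pooling: per channel, average each row (pooling along
    the width) and each column (pooling along the height)."""
    pooled_h, pooled_w = [], []
    for ch in feature:
        h, w = len(ch), len(ch[0])
        pooled_h.append([sum(row) / w for row in ch])
        pooled_w.append([sum(ch[i][j] for i in range(h)) / h for j in range(w)])
    return pooled_h, pooled_w


def ca_reweight(feature):
    """Reweight x[c][i][j] by a row weight a_h[c][i] and a column weight a_w[c][j].
    The sigmoid of the pooled values stands in for CA's learned transforms."""
    pooled_h, pooled_w = ca_pool(feature)
    out = []
    for c, ch in enumerate(feature):
        a_h = [sigmoid(v) for v in pooled_h[c]]
        a_w = [sigmoid(v) for v in pooled_w[c]]
        out.append([[ch[i][j] * a_h[i] * a_w[j] for j in range(len(ch[0]))]
                    for i in range(len(ch))])
    return out


# One channel, 3 x 3, with a strong response in the top-right corner.
fmap = [[[0.0, 0.0, 9.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]]
print(se_pool(fmap))   # a single number per channel
print(ca_pool(fmap))   # row and column profiles keep the location
```

SE reduces this map to one value per channel, which no longer says where the response is. The row and column profiles still point to row 0 and column 2, which is the positional cue CA turns into attention weights.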

    • Since it was proposed, the Transformer module with its self-attention mechanism has produced remarkable results in natural language processing (NLP). In 2020, Carion et al. at Facebook AI proposed DETR (DEtection TRansformer), the pioneering network model that applies the Transformer structure to object detection[16]. The Transformer encoder block improves the capacity to capture diverse local information. Positional embedding, Encoder, and Decoder are the three components of the Transformer model. Following the Encoder structure of DETR, this paper introduces multi-head attention into the YOLO backbone. The Transformer block replaces the bottleneck blocks of the C3 structure in the final C3 module of the New CSP-Darknet53 convolutional network. This layer has the largest number of channels and the richest semantic features, allowing it to capture a wealth of global and contextual information.

      The self-attention mechanism of the Transformer structure used in this paper is adapted from the Encoder layer of the Transformer module designed for DETR. The two sub-layers, Multi-Head Attention and the Feed-Forward Network (FFN), and their residual connections are preserved, but the normalization layers (Layer Normalization) of the original structure are omitted.
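      The core of the multi-head attention sub-layer is scaled dot-product attention, softmax(QKᵀ/√d)V. Here it is applied to a feature map flattened into H·W tokens, each of dimension C. The plain-Python sketch below shows a single head with Q = K = V = X, followed by a residual connection and no normalization layer. The learned projections, head splitting and FFN are omitted for brevity. It illustrates the mechanism and is not the authors' TRC3 code.

```python
import math


def flatten_tokens(feature):
    """Turn a C x H x W feature map into H*W tokens, each a length-C vector."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    return [[feature[k][i][j] for k in range(c)] for i in range(h) for j in range(w)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    weights = [softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
               for q in tokens]
    out = [[sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
           for row in weights]
    return out, weights


def attention_with_residual(tokens):
    """Attention sub-layer plus residual connection, without layer normalization."""
    attn, _ = self_attention(tokens)
    return [[a + t for a, t in zip(ar, tr)] for ar, tr in zip(attn, tokens)]


# A 2-channel, 2 x 2 feature map -> 4 tokens of dimension 2.
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
tokens = flatten_tokens(fmap)
out, weights = self_attention(tokens)
print(tokens)
print([round(sum(row), 6) for row in weights])  # each attention row sums to 1
```

Every token attends to every spatial position. This is the source of the global context that a convolutional bottleneck, with its local receptive field, does not provide in a single layer.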

    • The localization loss function is crucial in backpropagation because it drives the iterative update of the bounding box regression parameters that encode target location. Intersection over Union (IoU) loss is the most classic loss function for bounding box regression. However, the IoU loss function has obvious drawbacks. For instance, IoU measures only the overlap ratio between the candidate box and the ground-truth box (GT). Boxes with the same IoU can be aligned with the target in very different ways, so the value does not fully reflect how well the candidate box encloses the target. More severely, when IoU = 0, the candidate box and GT do not intersect and Loss(IoU) = 1 regardless of how far apart the boxes are. The gradient of the IoU loss then vanishes, and multiple random matchings are needed to produce an intersection. All of these factors slow model convergence and reduce detection precision. To increase accuracy, the loss function is therefore modified on the basis of IoU by introducing terms for the overlap area, the distance between center points, and the width and height of the candidate box, together with normalizing terms.

      In this paper, following He et al.[17], CIoU is improved by introducing the parameter α to adjust the power of the IoU term and by adding a power regularization term to the general form of α-IoU. α-CIoU is obtained by raising the terms of CIoU to the power α (Eqn 1), and the corresponding loss function is given in Eqn 2.

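      $$\alpha\text{-CIoU} = \mathrm{IoU}^{\alpha} - \frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}} - (\beta v)^{\alpha} \qquad (1)$$
      $$\mathcal{L}_{\alpha\text{-CIoU}} = 1 - \mathrm{IoU}^{\alpha} + \frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}} + (\beta v)^{\alpha} \qquad (2)$$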

      Here ρ(b, b^gt) is the Euclidean distance between the center point b of the candidate (predicted) box and the center point b^gt of the ground-truth box. c is the diagonal length of the smallest enclosing rectangle that covers both boxes. The aspect-ratio term compares the aspect ratios of the candidate box and the ground-truth box: β is a positive trade-off parameter, and v measures the consistency of the aspect ratios. The α-CIoU loss function keeps the fundamental properties of IoU-type loss functions, including non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. During training, the α-CIoU localization loss is driven continuously towards 0. The loss adaptively reweights the relative loss and gradient of each target, so the emphasis placed on individual targets changes as training proceeds and easy targets are learned faster. When challenging targets are learned at a later stage, training is accelerated by increasing the loss and gradient weights of high-IoU targets.
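      Eqn 2 can be evaluated for a pair of axis-aligned boxes with the minimal Python sketch below. Boxes are given as (x1, y1, x2, y2), and v is the CIoU aspect-ratio term (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))². The text only describes β as a positive trade-off parameter, so β = v/((1 − IoU) + v) is assumed to follow the original CIoU definition. The function name and the small eps guard are illustrative choices.

```python
import math


def alpha_ciou_loss(box, gt, alpha=3.0, eps=1e-9):
    """alpha-CIoU loss (Eqn 2) for two boxes given as (x1, y1, x2, y2)."""
    # Overlap area and IoU
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + wg * hg - inter + eps)

    # Squared center distance rho^2 and squared diagonal c^2 of the enclosing box
    rho2 = ((box[0] + box[2] - gt[0] - gt[2]) ** 2
            + (box[1] + box[3] - gt[1] - gt[3]) ** 2) / 4
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency v and trade-off beta (CIoU definitions, assumed)
    v = (4 / math.pi ** 2) * (math.atan(wg / (hg + eps)) - math.atan(w / (h + eps))) ** 2
    beta = v / ((1 - iou) + v + eps)

    # rho^(2*alpha) / c^(2*alpha) equals (rho^2 / c^2) ** alpha
    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha


gt_box = (0, 0, 10, 10)
print(alpha_ciou_loss((20, 0, 30, 10), gt_box))  # no overlap, near
print(alpha_ciou_loss((40, 0, 50, 10), gt_box))  # no overlap, farther away
```

With α = 1 the function reduces to the ordinary CIoU loss. Unlike the plain IoU loss, which equals 1 for both non-overlapping predictions above, it still ranks them by their distance from the target.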

      As an adjustable parameter, α provides flexibility for achieving different levels of bounding box (BBox) regression accuracy when training the target model. According to our previous experiments, the best value of α is not overly sensitive to the choice of model or dataset. When 0 < α < 1, the loss and gradient weights of high-IoU targets are reduced, and the final localization is poor. When α > 1, the relative loss and gradient weights of high-IoU targets increase, so these targets attract more attention and receive a larger regression gradient. This speeds up training and improves BBox regression accuracy. Experiments indicate that α = 3 performs consistently well across multiple datasets[17]; therefore, α is set to 3 in this paper.
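      The effect of α on high-IoU targets can be checked numerically. Differentiating the power term of Eqn 2 gives |∂(1 − IoU^α)/∂IoU| = α·IoU^(α−1), whereas the plain IoU loss has a constant gradient magnitude of 1. The short sketch below tabulates this factor. It looks only at the IoU term and ignores the distance and aspect-ratio penalties.

```python
def iou_grad_scale(iou, alpha):
    """Magnitude of d(1 - IoU**alpha)/dIoU, i.e. alpha * IoU**(alpha - 1)."""
    return alpha * iou ** (alpha - 1)


for alpha in (0.5, 1.0, 3.0):
    scales = [round(iou_grad_scale(u, alpha), 3) for u in (0.3, 0.6, 0.9)]
    print(f"alpha = {alpha}: gradient scale at IoU 0.3/0.6/0.9 = {scales}")
```

For α = 3 the factor rises from 0.27 at IoU = 0.3 to 2.43 at IoU = 0.9, so well-localized boxes receive larger updates. For α = 0.5 the trend reverses, matching the behavior described above for 0 < α < 1.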

    • In this paper, the initial training weights are those of the YOLOv5s model pre-trained on the MS COCO (Microsoft Common Objects in Context) dataset. Although MS COCO contains no smoke or fire samples, both tasks identify targets in images by learning from target labels, and the early image-processing stages are similar. From the perspective of the source and target domains, this approach is homogeneous transfer learning. From the perspective of label-based classification, it is inductive transfer learning, and from the perspective of the transfer method, it is feature-based transfer learning. This strategy can carry the generalization learned from the 80 MS COCO categories over to the new task and drastically reduce the time and labor cost of data labeling. It also produces rough models for fire target detection and lays the groundwork for automatic labeling.

      At the same time, we use the rough model to run inference on a large number of unlabeled photos and record the types and locations of the detected targets. Each bounding box is then corrected to obtain new fire and smoke training data for the rough model. The rough model is retrained, and this cycle is repeated to obtain increasingly precise fire target recognition models. Using instance-based transfer learning to improve the identification of fire and smoke gradually enlarges the training dataset while reducing the cost of each training round. Similarly, instance-based training can further strengthen fire recognition in a specialized scenario according to that scenario's characteristics.
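      One iteration of the semi-automatic annotation loop can be sketched as turning the rough model's detections into label files for retraining. The helper below is hypothetical. It assumes detections arrive as (class name, confidence, x1, y1, x2, y2) in pixels, a fire = 0 / smoke = 1 class mapping, and a 0.5 confidence cut-off. It writes lines in the normalized "class x_center y_center width height" text format used for YOLOv5 training labels. In the workflow described above, these boxes would still be reviewed and corrected before retraining.

```python
CLASS_IDS = {"fire": 0, "smoke": 1}  # hypothetical class-index mapping


def to_yolo_lines(detections, img_w, img_h, conf_thres=0.5):
    """Convert pixel-space detections into normalized YOLO label lines.

    detections: iterable of (class_name, confidence, x1, y1, x2, y2).
    Boxes below conf_thres are skipped in this sketch (left for manual labeling).
    """
    lines = []
    for name, conf, x1, y1, x2, y2 in detections:
        if conf < conf_thres:
            continue
        xc = (x1 + x2) / 2 / img_w  # box center and size, normalized to [0, 1]
        yc = (y1 + y2) / 2 / img_h
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        lines.append(f"{CLASS_IDS[name]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


dets = [("fire", 0.91, 100, 200, 300, 400), ("smoke", 0.32, 0, 0, 64, 64)]
print("\n".join(to_yolo_lines(dets, 640, 640)))
```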

    • All experiments in this study are conducted on a Dell Precision 7920 Tower server (Desktop-9PVCQ4). The server is equipped with an Intel Xeon(R) Gold 5218 processor, an NVIDIA GeForce RTX 1650 graphics processing unit, and 32 GB of memory. The Python 3.9, PyTorch 1.9.0, and CUDA 1.8.0 environments are set up using Anaconda and the PyCharm IDE, and the neural network models are developed with the PyTorch framework.

      Six algorithms are compared in terms of detection accuracy and speed. These are the baseline YOLOv5 and five optimized variants: YOLOv5 + CAC3, YOLOv5 + TRC3, YOLOv5 + αCIOU, YOLOv5 + CA + αCIOU, and YOLOv5 + TR + αCIOU.
      - YOLOv5 is the lightweight basic YOLOv5 model.
      - YOLOv5 + CAC3 enhances the backbone network of the basic model by embedding the coordinate attention (CA) mechanism into the C3 module.
      - YOLOv5 + TRC3 enhances the backbone network of the basic model by embedding a self-attention mechanism into the C3 module.
      - YOLOv5 + αCIOU uses Alpha-CIOU as the localization loss function of the basic model.
      - YOLOv5 + CA + αCIOU optimizes the basic model with both the CAC3 module (with coordinate attention) and the Alpha-CIOU localization loss.
      - YOLOv5 + TR + αCIOU optimizes the basic model with both the TRC3 module (with the Transformer-based self-attention mechanism) and the Alpha-CIOU localization loss.
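This section does not spell out the Alpha-CIOU loss itself. As a rough guide, the sketch below follows the general α-CIoU form from the Alpha-IoU family of losses, L = 1 − IoU^α + (ρ²/c²)^α + (βv)^α, for a single pair of boxes. The (x1, y1, x2, y2) box format, the default α = 3, the `eps` guard and the function name are all assumptions. The authors' YOLOv5 implementation works on tensors and may differ in detail.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def alpha_ciou_loss(pred: Box, gt: Box, alpha: float = 3.0, eps: float = 1e-9) -> float:
    """Alpha-CIoU loss for one predicted box and one ground-truth box (sketch)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared distance between box centers (rho^2).
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2).
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight beta.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    beta = v / (1 - iou + v + eps)

    # Each CIoU term is raised to the power alpha.
    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha
```

With `alpha=1.0` the expression reduces to the ordinary CIoU loss. A perfectly matching box gives a loss near 0, while non-overlapping boxes give a loss above 1.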

    • According to the objectives of the experiments, the data are divided into three parts: a training dataset (train), a validation dataset (val), and a test dataset (test). The training dataset is used to train the model, and the validation dataset is used to evaluate the model's performance. The test dataset is used to assess the model's effectiveness, precision, and generalizability.

      The 1,971 training images used in this article were collected from experiments, public datasets, and fire and smoke photographs and videos from the Internet. All images are randomly divided into training and validation sets at a ratio of 8:2, giving 1,577 training images and 394 validation images. The dataset contains fire and smoke images with various shapes, distances, and interfering objects, covering various burning objects (buildings, automobiles) and scenarios. The images include indoor scenes, such as homes, offices, and factories, as well as outdoor scenes, such as mountains, forests, and roads.
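The 8:2 random split described above can be reproduced in a few lines. This is a minimal sketch that assumes a fixed random seed for repeatability. The function name and seed are hypothetical, and the paper does not state which tool performed the split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train_ratio: float = 0.8, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split items into (train, val) lists at the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]
```

For the 1,971 images, `round(1971 * 0.8)` gives 1,577 training images, leaving 394 for validation, which matches the counts above.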

      The test dataset is Bowfire from Chino et al.[18], one of the most reputable open-source datasets for fire detection. Bowfire contains 226 images, comprising 119 fire images and 107 non-fire images, in which 325 fire targets and 153 smoke targets are labeled.

    • Target detection with a deep learning network requires the training dataset to be annotated in advance so that ground-truth (GT) boxes are available. The image dataset is labeled manually with the annotation tool LabelImg. Labels must be kept consistent and precise throughout this process.

      In the YOLO dataset format, each target is marked with a horizontal bounding box (HBB). The upper-left corner of the original image is (0, 0), the horizontal direction is the X-axis, and the vertical direction is the Y-axis. After normalization, the lower-right corner is (1, 1) (as shown in Fig. 1). Each label contains five values (class, bx, by, W, H):
      - bx and by are the normalized coordinates of the center point of the HBB in the original image.
      - W is the normalized width of the box, i.e. the distance between its left and right edges divided by the image width.
      - H is the normalized height of the box, i.e. the distance between its top and bottom edges divided by the image height.
      After labeling, a .txt file is created to store the category and location information of the targets in each image.
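The normalization described above can be illustrated by converting a pixel-space box into one YOLO label line. This sketch assumes class IDs 0 = fire and 1 = smoke and a hypothetical function name. LabelImg performs the equivalent conversion itself when it saves labels in YOLO format.

```python
def to_yolo_label(class_id: int, x_min: float, y_min: float, x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel-space HBB to a YOLO label line 'class bx by W H' (all normalized)."""
    bx = (x_min + x_max) / 2 / img_w   # normalized center x
    by = (y_min + y_max) / 2 / img_h   # normalized center y
    w = (x_max - x_min) / img_w        # normalized box width
    h = (y_max - y_min) / img_h        # normalized box height
    return f"{class_id} {bx:.6f} {by:.6f} {w:.6f} {h:.6f}"
```

For example, a fire box spanning pixels (100, 50) to (300, 250) in a 640 × 480 image becomes `0 0.312500 0.312500 0.312500 0.416667`.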

      Figure 1. 

      Relationship between labeling parameters and image.

    • Precision (P) is the proportion of predicted positive cases that are actually positive. It reflects the model's ability to avoid labeling negative samples as positive. Recall (R) is the proportion of real targets that are correctly predicted, i.e. the ratio of correctly detected targets to the total number of real targets. It reflects the model's ability to find the real targets.

$$P = \frac{TP}{TP + FP} \tag{3}$$
$$R = \frac{TP}{TP + FN} \tag{4}$$

      where TP denotes true positives, FP false positives, FN false negatives, and TN true negatives (as shown in Table 1). TP + FP is the number of all predicted boxes, and TP + FN is the number of all real targets. The confusion matrix relates the predicted label of each sample (positive or negative) to its true label (positive or negative)[19-21].

      Table 1.  Confusion matrix.

| | True value: Positive (real target) | True value: Negative (non-target) |
|---|---|---|
| Predicted value: Positive | True Positive (TP) | False Positive (FP) |
| Predicted value: Negative | False Negative (FN) | True Negative (TN) |
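Equations (3) and (4) translate directly into code once the confusion-matrix counts are known. In object detection, a predicted box is usually counted as a TP only if it matches a ground-truth box above an IoU threshold such as 0.5. This sketch assumes that matching has already been done. The counts in the example are hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted boxes that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real targets that are found
    return precision, recall
```

With 80 correct boxes, 20 false alarms and 40 missed targets, P = 80/100 = 0.8 and R = 80/120 ≈ 0.667. TN does not appear in either metric.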

      The F-measure is the weighted harmonic mean of P and R, providing a single score that balances precision and recall. In general there is a trade-off between R and P, so it is difficult to achieve a high R and a high P at the same time. The higher the F value, the more effective the detection.

$$F_\beta = \frac{(1 + \beta^2) \times P \times R}{\beta^2 \times P + R} \tag{5}$$

      β = 1 is normally used, i.e.,

$$F_1 = \frac{2 \times P \times R}{P + R} \tag{6}$$
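Equations (5) and (6) can be checked numerically. The sketch below implements the general F-β score, and the function name is hypothetical. Applied to the baseline YOLOv5 "All" row of Table 2 (P = 0.778, R = 0.540), it gives F1 ≈ 0.6375, consistent with the reported 0.64.

```python
def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (Eq. 5); beta=1 gives F1 (Eq. 6)."""
    denom = beta ** 2 * p + r
    if denom == 0:  # both P and R are zero: define F as 0
        return 0.0
    return (1 + beta ** 2) * p * r / denom
```

The F1 values in Table 2 may be read from the F1-confidence curves (the maximum over confidence thresholds). They therefore need not match this formula exactly when it is applied to P and R measured at a different threshold.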

      Average Precision (AP) is the area under the Precision-Recall (P-R) curve. It reflects the overall performance of the detection model and removes the single-point limitation of P, R, and F-measure. The better the detector, the closer AP is to 1.

$$AP = \int_0^1 P(r)\,dr \tag{7}$$
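In practice, Eq. (7) is evaluated numerically from discrete (recall, precision) points. The sketch below uses all-point interpolation, one common scheme (used in PASCAL VOC), in which precision is replaced by its running maximum from the right before integrating. This is an assumption: YOLOv5's own evaluation code may use a different interpolation.

```python
from typing import List, Sequence


def average_precision(recall: Sequence[float], precision: Sequence[float]) -> float:
    """Area under the P-R curve with all-point interpolation.

    `recall` must be sorted in increasing order; `precision[i]` pairs with `recall[i]`.
    """
    mrec: List[float] = [0.0] + list(recall) + [1.0]   # sentinel recall values
    mpre: List[float] = [0.0] + list(precision) + [0.0]
    # Make precision monotonically non-increasing (the "envelope" of the curve).
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas where recall changes.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

For the two points (R = 0.5, P = 1.0) and (R = 1.0, P = 0.5), the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75.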

      Mean average precision (mAP) is the mean of the APs over all classes and reflects the overall performance of a multi-category detection model.

$$mAP = \frac{\sum AP}{Num(class)} \tag{8}$$

      mAP@0.5 is the mAP at an IoU threshold of 0.5. mAP@0.5:0.95 is the average mAP over IoU thresholds from 0.5 to 0.95 in increments of 0.05[22-24].
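Equation (8) and the mAP@0.5:0.95 definition can be sketched as follows. The function names are hypothetical, and per-class AP values are assumed to be already computed for each IoU threshold. For the baseline YOLOv5 in Table 2, averaging the fire (0.764) and smoke (0.518) AP@0.5 values gives 0.641, which matches the reported "All" value.

```python
from typing import List, Sequence

# IoU thresholds 0.50, 0.55, ..., 0.95 used for mAP@0.5:0.95.
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap(ap_per_class: Sequence[float]) -> float:
    """Eq. (8): mean of the per-class APs at a single IoU threshold."""
    return sum(ap_per_class) / len(ap_per_class)


def map_50_95(ap_by_threshold: Sequence[Sequence[float]]) -> float:
    """Average of mAP over the ten IoU thresholds (one per-class AP list per threshold)."""
    return sum(mean_ap(aps) for aps in ap_by_threshold) / len(ap_by_threshold)
```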

      The detection rate is measured in frames per second (FPS), which shows the number of frames (images) that the target recognition model identifies each second. When the detection rate is greater than 30 FPS, it is deemed to have reached the real-time detection level, as the video processing rate is typically at least 30 FPS.
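A simple way to estimate FPS is to time inference over a batch of frames after a few warm-up calls. This is a minimal sketch in which `infer` is a hypothetical stand-in for the detector. When the detector runs on a GPU with PyTorch, kernels execute asynchronously, so `torch.cuda.synchronize()` would be needed before reading the timer.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(infer: Callable[[Any], Any], frames: Iterable[Any], warmup: int = 5) -> float:
    """Frames processed per second by `infer`, excluding a few warm-up calls."""
    frames = list(frames)
    for frame in frames[:warmup]:  # warm-up: caches, lazy initialization, etc.
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def is_real_time(fps: float, threshold: float = 30.0) -> bool:
    """Real-time criterion used in the text: detection rate above 30 FPS."""
    return fps > threshold
```

By this criterion, all six algorithms in Table 2 (54.6 to 72.5 FPS) reach the real-time level.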


    • The overall detection capability of each algorithm is measured using the five indices described above together with the model size. Table 2 shows the results. To compare the algorithms intuitively, histograms of the precision and recall statistics in Table 2 are plotted (Figs 2 & 3), and the Precision-Recall curves of the six fire detection algorithms are shown in Fig. 4.
      - The YOLOv5 + TRC3 algorithm achieves the highest precision for fire and for the overall average, and the second-highest precision for smoke.
      - YOLOv5 + TR + αCIOU achieves the highest precision for smoke, and the second-highest precision for fire and for the overall average.
      This indicates that the self-attention mechanism based on the Transformer structure can improve detection precision to some degree. The overall average recall of the YOLOv5 + TR + αCIOU algorithm is the highest, at 68.5%, clearly outperforming the other algorithms. In addition, the improved Alpha-CIOU loss function noticeably increases the average recall of the optimized algorithms.

      Table 2.  Experimental results based on YOLOv5 and 5 optimization algorithms.

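| Model | Class | P | R | mAP@0.5 | F1 | FPS (frames per second) | Weight (MB) |
|---|---|---|---|---|---|---|---|
| YOLOv5 | All | 0.778 | 0.540 | 0.641 | 0.64 | 64.1 | 14.5 |
| | Fire | 0.859 | 0.655 | 0.764 | | | |
| | Smoke | 0.696 | 0.426 | 0.518 | | | |
| YOLOv5 + CAC3 | All | 0.776 | 0.585 | 0.653 | 0.66 | 72.5 | 13.8 |
| | Fire | 0.829 | 0.688 | 0.774 | | | |
| | Smoke | 0.722 | 0.481 | 0.531 | | | |
| YOLOv5 + TRC3 | All | 0.855 | 0.581 | 0.697 | 0.69 | 54.6 | 14.5 |
| | Fire | 0.903 | 0.699 | 0.797 | | | |
| | Smoke | 0.806 | 0.463 | 0.597 | | | |
| YOLOv5 + αCIOU | All | 0.774 | 0.583 | 0.651 | 0.66 | 61.3 | 14.5 |
| | Fire | 0.818 | 0.667 | 0.765 | | | |
| | Smoke | 0.729 | 0.500 | 0.538 | | | |
| YOLOv5 + CA + αCIOU | All | 0.727 | 0.614 | 0.673 | 0.67 | 60.6 | 13.8 |
| | Fire | 0.832 | 0.710 | 0.794 | | | |
| | Smoke | 0.622 | 0.519 | 0.553 | | | |
| YOLOv5 + TR + αCIOU | All | 0.839 | 0.685 | 0.724 | 0.70 | 58.8 | 14.5 |
| | Fire | 0.860 | 0.710 | 0.806 | | | |
| | Smoke | 0.818 | 0.500 | 0.641 | | | |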

      Figure 2. 

      Precision of fire and smoke detection in the six algorithms.

      Figure 3. 

      Recall rate of fire and smoke detection in the six algorithms.

      Figure 4. 

      Precision-Recall curves of six fire detection algorithms.

      Figure 5 shows that YOLOv5 + TR + αCIOU achieves the highest mAP (0.724), clearly higher than the other algorithms. In addition, every algorithm that uses the Alpha-CIOU localization loss achieves a higher mAP than its counterpart without it. This suggests that the Alpha-CIOU function helps the algorithm balance precision and recall, thereby improving its detection capability.

      Figure 5. 

      mAP of the six fire detection algorithms.

      As illustrated in Figs 6 & 7, each of the five optimized algorithms improves the F1 score of fire detection. The comparison shows that the improvement is modest for all algorithms except those using the TRC3 module. The two algorithms with the largest F1 both employ the attention mechanism based on the Transformer structure, demonstrating the advantage of this attention mechanism.

      Figure 6. 

      F1 curves of six fire detection algorithms.

      Figure 7. 

      F1 of six fire detection algorithms.

      The maximum mAP and F1 are achieved by the YOLOv5 + TR + αCIOU algorithm. This algorithm uses the Transformer-based attention mechanism to improve the backbone network and Alpha-CIOU to improve the localization loss. The self-attention module based on the Transformer structure significantly improves precision P and recall R. In addition, the results suggest that the Alpha-CIOU localization loss can improve the recall rate.

      Figure 8 visualizes the fire detection results of the various algorithms, so that their detection effects on different kinds of targets can be compared directly. The baseline YOLOv5 shows a markedly weaker ability to recognize smoke. Adding the attention mechanism increases the number of recalled smoke targets and improves the detection effect.

      Figure 8. 

      Detection effect of the six algorithms for a building fire.

      Figure 9 shows the ability of the six algorithms to recognize a single overturned flame. All algorithms can effectively recognize a single, simple flame target. However, both the YOLOv5 and YOLOv5 + CAC3 algorithms misidentify clouds as smoke. The comparison also suggests that enlarging the receptive field of features and supplementing feature information through the attention mechanism can further improve the discrimination of target objects.

      Figure 9. 

      Detection effect of the six algorithms for a single overturned flame.

      Figure 10 shows the ability of the six algorithms to recognize multiple fire and smoke targets. The optimized algorithms using Alpha-CIOU effectively improve the target recall rate, consistent with the conclusions of the experimental comparison. Both the YOLOv5 and YOLOv5 + CAC3 algorithms fail to detect the smoke in the middle of the image. With Alpha-CIOU, however, all targets are recognized regardless of their size and type, and the bounding boxes enclose the fire targets more completely.

      Figure 10. 

      Detection effect of the six algorithms for multiple fire and smoke targets.

      As the preceding experiments show, the precision and recall for smoke are much lower than those for flame, because smoke has less distinctive pixel-level features. Moreover, smoke pixel information varies widely and smoke has no fixed shape. Smoke concentration and the type of combustible can also significantly change its color characteristics. Smoke detection can therefore serve only as an early (a priori) cue for rapid fire detection. Detected smoke cannot by itself be used as a criterion for the occurrence of a fire; confirming a fire requires additional manual investigation and verification.

      The experimental findings above show that the self-attention mechanism based on the Transformer structure considerably improves target detection precision. Replacing the localization loss with Alpha-CIOU increases the recall rate and helps ensure that a variety of targets are detected. By combining the Transformer self-attention mechanism with the Alpha-CIOU localization loss, the algorithm proposed in this research improves detection capability. This gain comes at the expense of computational cost and speed (58.8 FPS versus 64.1 FPS for the baseline YOLOv5). By comparison, the coordinate attention (CA) module makes the model lighter (13.8 MB versus 14.5 MB), and adding Alpha-CIOU still retains a relatively high detection capability. To balance detection accuracy and speed, the most suitable algorithm can be selected according to the requirements of each real application.

    • After comparing YOLOv5 with the five optimized algorithms, the YOLOv5 + TR + αCIOU algorithm shows the best performance. On the Bowfire test dataset, the precision P reaches 83.9% and the recall R reaches 68.5%. The mAP@0.5 is 0.724, and F1 reaches its maximum value of 0.70 at a confidence threshold of 0.214. YOLOv5 + TR + αCIOU is therefore adopted in this paper. Figure 11 shows the detection ability of the YOLOv5 + TR + αCIOU algorithm on difficult images from the Bowfire dataset. The difficulties include objects at different distances in the same image, overturned objects, and smoke-like interfering substances such as clouds, fog, and water mist. Figure 12 shows the fire detection results in various scenarios.

      Figure 11. 

      Detection effect of YOLOv5 + TR + αCIOU algorithm in Bowfire dataset.

      Figure 12. 

      Fire detection effects of the YOLOv5 + TR + αCIOU algorithm for different types of scenarios.

    • To further investigate the algorithm's ability to detect fires in real time, a surveillance video of a factory fire was obtained from the Internet. Forty-eight seconds of video were converted into 1,476 frames using the Python OpenCV package. The fire and smoke detection results are shown in Figs 13-15. Manual examination of the video reveals that the fire starts in the 8th second (around frame 229), when the tank begins to release a small amount of smoke. However, the surveillance video shows virtually no visible change at this point. Neither the human eye nor the YOLOv5 + TR + αCIOU algorithm can detect the leaking smoke at this moment.
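The frame extraction step and the frame-to-time conversion used in this analysis can be sketched with OpenCV's `VideoCapture`. The output file naming and function names are hypothetical. OpenCV is imported inside the extraction function, so the timing helper works without it installed. At the average rate implied by the text (1,476 frames / 48 s = 30.75 frames per second), frame 229 corresponds to about 7.45 s and frame 279 to about 9.07 s.

```python
from pathlib import Path
from typing import Tuple


def frame_to_seconds(frame_index: int, fps: float) -> float:
    """Convert a frame index into a timestamp in seconds."""
    return frame_index / fps


def extract_frames(video_path: str, out_dir: str) -> Tuple[int, float]:
    """Save every frame of a video as a JPEG; return (frame count, FPS reported by the file)."""
    import cv2  # imported here so frame_to_seconds can be used without OpenCV

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)  # may be 0 if the container lacks this metadata
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream (or a read error)
            break
        cv2.imwrite(str(out / f"frame_{count:05d}.jpg"), frame)
        count += 1
    cap.release()
    return count, fps
```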

      Figure 13. 

      Incipient stage of the factory fire.

      The flame starts to appear in the 9th second, and the algorithm also detects the fire in the 9th second (frame 279, as shown in Fig. 14). This shows that the YOLOv5 + TR + αCIOU algorithm can detect the fire in time. From the 9th second, the fire gradually intensifies, and by the 14th second the exterior layer of the tank is completely engulfed in flames. At this stage the fire has reached full combustion and no longer spreads as quickly as before. The video shows that at the 23rd second, an employee discovers the fire and begins to extinguish it with a fire extinguisher.

      Figure 14. 

      Developing stage of the factory fire.

      In the fully developed stage of the factory fire shown in Fig. 15, the employee is unable to extinguish the fire with conventional fire extinguishers. Black smoke appears at the 26th second, indicating the production of unburned solid particles. Because of the excessive fuel loss and insufficient oxygen supply, the fire changes from oxygen-rich combustion to fuel-rich combustion.

      Figure 15. 

      Fully developed stage of the factory fire.

      To sum up, the YOLOv5 + TR + αCIOU algorithm recognizes smoke in the 8th second and fire in the 9th second. The confidence of the fire target reaches 95% in the 9th second, and the prediction box fully encloses the fire. The results show the feasibility and promise of real-time fire detection based on the YOLOv5 + TR + αCIOU algorithm.
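The timing analysis above amounts to finding the first frame in which a target class is detected with sufficient confidence. This is a minimal sketch that assumes per-frame detections are already available as (class name, confidence) pairs. The function name and the default threshold of 0.5 are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, Optional, Tuple

Detection = Tuple[str, float]  # (class name, confidence)


def first_alarm_frame(frames: Iterable[List[Detection]], target: str = "fire",
                      conf_threshold: float = 0.5) -> Optional[int]:
    """Index of the first frame containing `target` at or above `conf_threshold`, else None."""
    for index, detections in enumerate(frames):
        if any(cls == target and conf >= conf_threshold for cls, conf in detections):
            return index
    return None
```

Dividing the returned index by the video frame rate gives the detection time in seconds. Lowering the threshold trades earlier alarms for more false alarms.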

    • This paper investigates the viability of the attention mechanism, loss function, and transfer learning in further optimizing the fire detection effect using the YOLOv5 algorithm. The key findings are as follows:

      (1) The coordinate attention mechanism CA and the self-attention mechanism based on the Transformer structure are embedded into the C3 module to create a new backbone network. Feature weighting focuses on the desired feature points and extracts useful feature information. A power parameter α is introduced to exponentiate the original CIOU localization loss, which yields more accurate prediction boxes. The pretrained yolov5s.pt weights are used to train a rough fire detection model, which enables semi-automatic dataset annotation and reduces training costs.

      (2) Fire identification experiments under varying conditions are carried out for YOLOv5 and five optimized algorithms. The experimental results show that embedding attention mechanisms and modifying the localization loss function significantly improve detection precision and recall.

      (3) The YOLOv5 + TR + αCIOU algorithm is used to detect fire in the factory surveillance video and achieves an excellent balance between detection precision and speed.

      (4) The YOLO algorithm used in this paper cannot recognize the motion features of the target. In the future, the separation of moving and static objects can be achieved by further introducing time information and considering the relationship between consecutive frames.

      (5) In the future, target detection results could be reproduced and localized in 3D space. Using digital twin technology, real-time simulated 3D scenes of the site could be obtained, providing rich visualization and operational information.

    • The authors confirm contribution to the paper as follows: study conception and design: Shao Z, Lu S, Shi X; data collection: Yang D, Wang Z; analysis and interpretation of results: Lu S, Shi X, Yang D; draft manuscript preparation: Lu S, Wang Z. All authors reviewed the results and approved the final version of the manuscript.

    • The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

      • The project is funded by the National Natural Science Foundation of China (Grant Nos 52274236, 52174230), the Xinjiang Key Research and Development Special Task (Grant No. 2022B03003-2), and the China Postdoctoral Science Foundation (Grant No. 2023M733765).

      • The authors declare that they have no conflict of interest.

      • Copyright: © 2023 by the author(s). Published by Maximum Academic Press on behalf of Nanjing Tech University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
  • About this article
    Cite this article
    Shao Z, Lu S, Shi X, Yang D, Wang Z. 2023. Fire detection methods based on an optimized YOLOv5 algorithm. Emergency Management Science and Technology 3:11 doi: 10.48130/EMST-2023-0011
