CN117876874A - Forest fire detection and positioning method and system based on high-point monitoring video - Google Patents

Forest fire detection and positioning method and system based on high-point monitoring video

Info

Publication number
CN117876874A
Authority
CN
China
Prior art keywords
smoke
flame
forest fire
forest
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410055348.4A
Other languages
Chinese (zh)
Inventor
谢亚坤
朱庆
朱军
冯德俊
刘子琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202410055348.4A
Publication of CN117876874A
Legal status: Pending


Landscapes

  • Fire-Detection Mechanisms (AREA)

Abstract

The invention discloses a forest fire detection and positioning method and system based on high-point monitoring video, belongs to the technical field of forest fire detection and positioning, and solves the problems of low accuracy and low efficiency in extracting and positioning the smoke of early forest fires under complex background conditions. The method constructs a forest fire detection data set from the acquired high-point monitoring video and trains a constructed multi-scale and multi-dimensional feature extraction network that fuses video spatial features and time sequence features to identify smoke and flame in the high-point monitoring video to be identified, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image; the forest fire, driven by a video stereoscopic grid, is then accurately positioned based on these detection frames and the central two-dimensional pixel coordinates of the flame object in each frame image. The method is used for accurately detecting early forest fires from high-point monitoring video.

Description

Forest fire detection and positioning method and system based on high-point monitoring video
Technical Field
The invention relates to a forest fire detection and positioning method and system based on high-point monitoring video, which is used for accurately detecting early forest fires from high-point monitoring video and belongs to the technical field of forest fire detection and positioning.
Background
Forests are a precious natural resource, and the forest ecosystem is one of the most important ecosystems on earth; it plays an irreplaceable role in human survival and development, biodiversity, the maintenance and improvement of the ecological environment, and the response to global climate warming.
Among the many natural disasters affecting forests, forest fires are the most damaging to the forest ecosystem. Forest fires destroy forest structure, affect the living environments of wild animals and plants, cause soil erosion, and can trigger other natural disasters such as debris flows. Forest fires are characterized by long latency, large burned areas, fast spread and severe damage; they not only destroy forestry resources but also greatly harm the national economy, personal safety and the ecological environment. Effectively monitoring and positioning forest fires so that they are discovered in time therefore plays an important role in rescue and fire-extinguishing work and in reducing the losses caused by forest fires.
At present, with the continuous development of high-point video monitoring, computer vision and artificial intelligence technology, monitoring forest fires with high-point video has become one of the important means of forest fire prevention. This fire monitoring mode has unique advantages such as strong timeliness, stability and reliability, and flexible and diverse information, and occupies an important position in forest fire detection. It aims to control fire from the source, enabling early discovery, early prevention, early treatment and early extinguishing so that fires do not occur or are contained soon after they occur, and it is an effective way to improve forest fire prevention. In recent years, with the successful application of artificial intelligence technologies represented by big data, machine learning and deep learning in fields such as speech recognition, computer vision and recommendation systems, artificial intelligence has made great progress in algorithms, models and architectures. The artificial intelligence technology represented by deep learning also provides new ideas for forest fire identification, and forest fire identification algorithms, as the core component of high-point video monitoring systems for forest fires, have attracted wide attention from researchers in recent years. Moreover, compared with traditional computer vision methods, deep learning avoids complex manual feature engineering and can learn complex representations from large image data sets.
In addition, since the fire source is difficult to observe directly in an early forest fire, the large amount of smoke generated by smoldering branches and leaves is the main feature of an early fire. Early warning of forest fires by detecting smoke in high-point monitoring video images is therefore a hot spot of current research. However, the prior art has the following technical problems:
1. Under complex background conditions, the extraction and positioning of early forest fire smoke suffer from low accuracy and low efficiency;
2. Under severe weather conditions, particularly heavy fog and haze, detection performance degrades, resulting in poor positioning accuracy and low efficiency;
3. Existing forest fire detection methods perform poorly on small, long-distance targets;
4. Most existing forest fire detection methods target daytime scenes and cannot achieve efficient, real-time monitoring at night.
Disclosure of Invention
In view of the above problems, the invention aims to provide a forest fire detection and positioning method and system based on high-point monitoring video, which solve the prior-art problems of low accuracy and low efficiency in extracting and positioning the smoke of early forest fires under complex background conditions.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a forest fire detection and positioning method based on high-point monitoring video comprises the following steps:
step 1, constructing a forest fire detection data set based on the acquired high-point monitoring video, and training a constructed multi-scale and multi-dimensional feature extraction network that fuses video spatial features and time sequence features to identify the detection frames of smoke and flame in the high-point monitoring video to be identified, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image;
step 2, accurately positioning the forest fire, driven by the video stereoscopic grid, based on the detection frames of the smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image.
Further, the multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features constructed in step 1 comprises an input layer, a Focus module, a first Conv module, a global-local feature extraction module, a second Conv module, a deep-shallow feature extraction module, a third Conv module, a time sequence neural unit, a fourth Conv module, a pooling module, a depth and receptive field enhancement module and an output layer which are sequentially connected, wherein the first to fourth Conv modules are convolution modules each consisting of a sequentially connected Conv2d, BN layer and SiLU.
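By way of illustration, the Conv modules named above can be sketched in PyTorch as follows; the channel counts, kernel size and stride are illustrative assumptions rather than the patent's configuration, and the other named modules (Focus, the feature extraction modules, the time sequence neural unit, the pooling and C3 modules) are defined separately.

```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Conv2d -> BN layer -> SiLU, the structure stated for the first
    to fourth Conv modules; c_in, c_out, k and s are illustrative."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Claimed backbone order (each named module defined elsewhere):
# input -> Focus -> Conv -> global-local -> Conv -> deep-shallow
#   -> Conv -> time sequence unit -> Conv -> pooling -> C3 -> output
```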
Further, the global-local feature extraction module comprises a local feature extraction module and a global feature extraction module each connected to the first Conv module, a 3×3 convolution layer that receives and adds the outputs of the local and global feature extraction modules, and a 1×1 convolution layer sequentially connected after the 3×3 convolution layer. The local feature extraction module comprises a 3×3 convolution layer and a 1×1 convolution layer that each convolve the input feature map, and a batch normalization layer connected after each of them; the outputs of the two batch normalization layers are added to obtain the local features. The global feature extraction module sequentially performs size transformation, a linearization operation and a projection layer mapping on the input feature map to obtain feature maps Q, K and V, multiplies the depth-normalized Q with the depth-normalized K, applies a size transformation to the product, multiplies the result with the size-transformed feature map V, and applies a scaling operation in which each pixel of the resulting feature map corresponds to several pixels of the original input image, obtaining the global features in the final feature map;
The deep-shallow feature extraction module comprises a fifth Conv module connected to the second Conv module, and two parallel chains each consisting of a 3×3 convolution layer with step length 1, a 3×3 convolution layer with step length 2 and a 3×3 convolution layer with step length 5, sequentially connected after the fifth Conv module; the output of each chain's step-length-5 layer is multiplied with the output of the fifth Conv module, each product is then added to the output of the fifth Conv module, and the two addition results are concatenated in series to give the output, wherein the fifth Conv module is a convolution module consisting of a sequentially connected Conv2d, BN layer and SiLU;
the time sequence neural unit receives the activated hidden layer state vector a_{t-1} at time t-1 output by the third Conv module and the input vector x_t at time t, multiplies a_{t-1} with V_ah and x_t with V_xh, and adds the two products to b_h to obtain the hidden layer state h_t, i.e. h_t = V_xh·x_t + V_ah·a_{t-1} + b_h; the hidden layer state is then processed by the hyperbolic tangent function tanh to obtain the activated hidden layer state vector a_t = tanh(h_t) at time t, which is also output; a_t is multiplied with V_ao and b_o is added to obtain the state vector of the output node, c_t = V_ao·a_t + b_o, and c_t is converted into the output label vector by a softmax calculation; wherein V_xh denotes the weight matrix from the K input nodes to the N hidden nodes, V_ah denotes the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t, b_h denotes the hidden layer bias before activation, b_o denotes the output bias, and V_ao denotes the weight matrix from the activated hidden nodes to the output nodes.
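Read as equations, the recurrence above is h_t = V_xh·x_t + V_ah·a_{t-1} + b_h, a_t = tanh(h_t) and c_t = V_ao·a_t + b_o, followed by a softmax. A minimal NumPy sketch of such a recurrent cell follows; the random initialization and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class TemporalUnit:
    """Plain recurrent cell following the claimed recurrence with
    K input nodes, N hidden nodes and L output nodes."""
    def __init__(self, K, N, L, rng=None):
        rng = rng or np.random.default_rng(0)
        self.V_xh = rng.normal(0.0, 0.1, (N, K))   # input -> hidden
        self.V_ah = rng.normal(0.0, 0.1, (N, N))   # hidden(t-1) -> hidden(t)
        self.V_ao = rng.normal(0.0, 0.1, (L, N))   # activated hidden -> output
        self.b_h, self.b_o = np.zeros(N), np.zeros(L)

    def step(self, x_t, a_prev):
        h_t = self.V_xh @ x_t + self.V_ah @ a_prev + self.b_h
        a_t = np.tanh(h_t)                          # activated hidden state
        c_t = self.V_ao @ a_t + self.b_o            # output-node state vector
        return a_t, softmax(c_t)                    # a_t feeds the next step
```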
Further, the specific steps of step 1 are as follows:
step 1.1, based on the acquired high-point monitoring video, an all-weather forest fire database based on a forest fire classification system is established through rough labeling, rendering, training, feedback, fine tuning and enhancement, and the method comprises the following specific steps:
step 1.11, manually analyzing the expression in video images of the features of smoke and fire in forest fire scenes in the high-point monitoring video, and preliminarily establishing a coarse annotation database through manual annotation, wherein the features comprise color, shape and texture features; color is analyzed through the corresponding color histogram, color set, color moments and color coherence vector; shape is obtained using the boundary feature method, Fourier shape descriptors, shape geometric parameters and the finite element method; and texture is analyzed using the gray-level co-occurrence matrix, energy spectrum function, random field model, autoregressive texture model and wavelet transform;
step 1.12, manually analyzing the heterogeneous feature expression caused by different types and interference in forest fire scenes in the high-point monitoring video, and performing diversified rendering on the data in the coarse annotation database, wherein the types are divided into coniferous forest fires, mixed coniferous and broadleaf forest fires and broadleaf forest fires by forest land type; into surface fires, crown fires and underground fires by fire position; into forest fires, general forest fires, major forest fires and extra-large forest fires by damaged forest area; and into daytime forest fires and night forest fires by time of occurrence; and the feature heterogeneity comprises light intensity, scale difference, smoke concentration and smoke-like and fire-like objects;
step 1.13, learning the knowledge in the coarse annotation database with a neural network model, detecting unlabeled data, feeding back misclassification cases, finely annotating the data by fine tuning, and enhancing the data with the album tool; after enhancement, if the requirements are met, a forest fire detection data set with diversified features is obtained; otherwise, return to step 1.11;
step 1.2, training the multi-scale and multi-dimensional feature extraction network on the forest fire detection data set, and inputting the high-point monitoring video to be identified into the trained network to obtain the smoke and flame object detection frames; at the same time, the two-dimensional pixel coordinates of the vertices of the flame detection frame are calculated, and the central two-dimensional pixel coordinates of the detection frame are calculated from those vertex coordinates, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object.
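For the center coordinate calculation in step 1.2, a minimal sketch follows, assuming an axis-aligned detection frame given by two opposite vertex pixel coordinates (x1, y1) and (x2, y2):

```python
def box_center(x1, y1, x2, y2):
    """Central two-dimensional pixel coordinate of an axis-aligned
    detection frame from two opposite vertex pixel coordinates."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

# e.g. box_center(100, 40, 180, 120) -> (140.0, 80.0)
```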
Further, the specific steps of step 2 are as follows:
step 2.1, establishing different initial positioning methods of the fire points based on the diffusion characteristics of the smoke and the flame of different forest fires, wherein the specific steps are as follows:
step 2.11, analyzing the diffusion characteristics of smoke and flame in the forest fire detection data set by the optical flow method, that is, dividing the diffusion models of forest fire smoke and flame into a triangular diffusion model, a diffuse diffusion model and a radiation diffusion model by calculating histograms of the optical flow intensity and the optical flow direction angle;
the formulas of the optical flow intensity and the optical flow direction angle are as follows:
L(i,j) = u(i,j)² + v(i,j)²
α(i,j) = arctan(v(i,j) / u(i,j))
wherein L(i,j) denotes the optical flow intensity, α(i,j) denotes the optical flow direction angle, and u(i,j) and v(i,j) denote the transverse and longitudinal optical flow vectors at pixel (i,j), respectively; a concentrated distribution of optical flow intensity and direction angle corresponds to the triangular diffusion model, an irregular distribution to the diffuse diffusion model, and a uniform distribution to the radiation diffusion model (see the computational sketch after step 2.12);
step 2.12, establishing different initial fire point positioning methods for the different diffusion models, including a boundary-centerline feature line positioning method for the triangular diffusion model, a centroid movement offset positioning method for the diffuse diffusion model, and a discrete seed point positioning method for the radiation diffusion model;
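The optical flow statistics of step 2.11 can be sketched as follows; OpenCV's Farneback dense optical flow is used here as a stand-in for whichever optical flow method is actually employed, and mapping the histogram shape (concentrated, irregular or uniform) to the three diffusion models is left as the downstream decision described above.

```python
import cv2
import numpy as np

def flow_statistics(prev_gray, next_gray, bins=36):
    """Compute L(i,j) = u(i,j)^2 + v(i,j)^2 and the direction angle
    alpha(i,j) = arctan(v/u) per pixel, plus a direction histogram
    weighted by intensity, from two consecutive grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    intensity = u ** 2 + v ** 2
    angle = np.arctan2(v, u)                 # direction angle in (-pi, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi),
                           weights=intensity)
    return intensity, angle, hist
```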
step 2.2, providing a forest fire positioning method combining the forest fire smoke diffusion model and the video three-dimensional grid, wherein the method comprises the following specific steps:
step 2.21, establishing a different simulated motion model for each diffusion model, and mapping the diffusion of smoke and flame in the real world onto a digital three-dimensional model in terms of diffusion speed and direction, diffusion mode and spatial topological relation constraints; that is, the diffusion speed and direction of the simulated motion model are obtained by mathematical calculation combined with image processing, and the forest fire smoke and flame in the real world are then mapped onto the digital three-dimensional model under the diffusion mode and spatial topological relation constraints of the different diffusion models;
step 2.22, analyzing the adjacency, association and containment topological relations between smoke and flame and the forest structure in the digital three-dimensional model based on a geospatial topological relation analysis method, and giving a spatial topological semantic description of these relations;
step 2.23, performing camera calibration and distortion correction according to the high-point monitoring camera parameters; on the premise of being constrained by the detection frames of the smoke and flame objects, the boundaries of the smoke and flame objects are extracted with the bwboundaries edge extraction function, the smoke centerline is obtained from the boundary, and the initial positioning of the smoke object in the two-dimensional image space is realized using the diffusion model and the corresponding initial fire point positioning method, while initial positioning is also performed based on the two-dimensional coordinates of the center point of the flame object; the camera parameters comprise the pitch angle, yaw angle and height of the camera; the smoke centerline is obtained by extracting the smoke edge, determining the smoke diffusion motion trend, taking the center point of all edge points, forming a straight line from this center point with the slope of the smoke motion direction, separating the smoke edge points into left and right sets with this line, and fitting the two edges; the final fitted straight line is the centerline of the smoke (see the centerline sketch after step 2.25);
step 2.24, converting the image coordinate system by using the imaging mechanism of the high-point monitoring camera from three-dimensional space to the two-dimensional plane, combined with a digital elevation model, to obtain position information in the camera and world coordinate systems; setting a reference point in physical space, establishing a pixel-image-camera-world coordinate back-calculation model through the reference coordinates, and combining the constraints of the spatial topological semantic description with the preliminary positioning of the smoke and flame objects, thereby realizing the three-dimensional spatial positioning of the forest fire smoke and flame objects;
step 2.25, based on the three-dimensional spatial positioning of the forest fire smoke and flame objects, establishing a mapping from the longitude, latitude and height of the smoke and flame objects to three-dimensional grid position codes, that is, realizing the correspondence and mutual conversion from video pixel coordinates to three-dimensional position codes; after conversion, the forest fire driven by the video three-dimensional grid is accurately positioned.
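The smoke centerline construction of step 2.23 can be sketched as below; cv2.findContours stands in for MATLAB's bwboundaries, the smoke motion direction is assumed to be known from the diffusion analysis, and averaging the two fitted edge lines is one simple reading of the "final straight line".

```python
import cv2
import numpy as np

def smoke_centerline(mask, motion_dir):
    """Extract smoke boundary points, split them into left/right sets
    about the line through their centroid along motion_dir, fit each
    set as y = a*x + b, and return the averaged (a, b) centerline."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = np.vstack([c.reshape(-1, 2) for c in contours]).astype(float)
    centroid = pts.mean(axis=0)
    d = np.asarray(motion_dir, float)
    d /= np.linalg.norm(d)
    n = np.array([-d[1], d[0]])              # normal to the motion direction
    side = (pts - centroid) @ n              # signed side of the split line
    left, right = pts[side < 0], pts[side >= 0]
    fit_l = np.polyfit(left[:, 0], left[:, 1], 1)   # assumes non-vertical edges
    fit_r = np.polyfit(right[:, 0], right[:, 1], 1)
    return (fit_l + fit_r) / 2.0             # slope and intercept of centerline
```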
A forest fire detection and positioning system based on high-point monitoring video comprises the following modules:
A network construction and detection module: constructing a forest fire detection data set based on the acquired high-point monitoring video, and training a constructed multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features to identify the detection frames of smoke and flame in the high-point monitoring video to be identified, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image;
A forest fire accurate positioning module: accurately positioning the forest fire, driven by the video stereoscopic grid, based on the detection frames of the smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image.
Further, the multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features constructed in the network construction and detection module comprises an input layer, a Focus module, a first Conv module, a global-local feature extraction module, a second Conv module, a deep-shallow feature extraction module, a third Conv module, a time sequence neural unit, a fourth Conv module, a pooling module, a depth and receptive field enhancement module and an output layer which are sequentially connected, wherein the first to fourth Conv modules are convolution modules each consisting of a sequentially connected Conv2d, BN layer and SiLU.
Further, the global-local feature extraction module comprises a local feature extraction module and a global feature extraction module each connected to the first Conv module, a 3×3 convolution layer that receives and adds the outputs of the local and global feature extraction modules, and a 1×1 convolution layer sequentially connected after the 3×3 convolution layer. The local feature extraction module comprises a 3×3 convolution layer and a 1×1 convolution layer that each convolve the input feature map, and a batch normalization layer connected after each of them; the outputs of the two batch normalization layers are added to obtain the local features. The global feature extraction module sequentially performs size transformation, a linearization operation and a projection layer mapping on the input feature map to obtain feature maps Q, K and V, multiplies the depth-normalized Q with the depth-normalized K, applies a size transformation to the product, multiplies the result with the size-transformed feature map V, and applies a scaling operation in which each pixel of the resulting feature map corresponds to several pixels of the original input image, obtaining the global features in the final feature map;
The deep-shallow feature extraction module comprises a fifth Conv module connected to the second Conv module, and two parallel chains each consisting of a 3×3 convolution layer with step length 1, a 3×3 convolution layer with step length 2 and a 3×3 convolution layer with step length 5, sequentially connected after the fifth Conv module; the output of each chain's step-length-5 layer is multiplied with the output of the fifth Conv module, each product is then added to the output of the fifth Conv module, and the two addition results are concatenated in series to give the output, wherein the fifth Conv module is a convolution module consisting of a sequentially connected Conv2d, BN layer and SiLU;
the time sequence neural unit receives the activated hidden layer state vector a_{t-1} at time t-1 output by the third Conv module and the input vector x_t at time t, multiplies a_{t-1} with V_ah and x_t with V_xh, and adds the two products to b_h to obtain the hidden layer state h_t, i.e. h_t = V_xh·x_t + V_ah·a_{t-1} + b_h; the hidden layer state is then processed by the hyperbolic tangent function tanh to obtain the activated hidden layer state vector a_t = tanh(h_t) at time t, which is also output; a_t is multiplied with V_ao and b_o is added to obtain the state vector of the output node, c_t = V_ao·a_t + b_o, and c_t is converted into the output label vector by a softmax calculation; wherein V_xh denotes the weight matrix from the K input nodes to the N hidden nodes, V_ah denotes the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t, b_h denotes the hidden layer bias before activation, b_o denotes the output bias, and V_ao denotes the weight matrix from the activated hidden nodes to the output nodes.
Further, the specific implementation steps of the network construction and detection module are as follows:
step 1.1, based on the acquired high-point monitoring video, an all-weather forest fire database based on a forest fire classification system is established through rough labeling, rendering, training, feedback, fine tuning and enhancement, and the method comprises the following specific steps:
step 1.11, manually analyzing the expression in video images of the features of smoke and fire in forest fire scenes in the high-point monitoring video, and preliminarily establishing a coarse annotation database through manual annotation, wherein the features comprise color, shape and texture features; color is analyzed through the corresponding color histogram, color set, color moments and color coherence vector; shape is obtained using the boundary feature method, Fourier shape descriptors, shape geometric parameters and the finite element method; and texture is analyzed using the gray-level co-occurrence matrix, energy spectrum function, random field model, autoregressive texture model and wavelet transform;
step 1.12, manually analyzing the heterogeneous feature expression caused by different types and interference in forest fire scenes in the high-point monitoring video, and performing diversified rendering on the data in the coarse annotation database, wherein the types are divided into coniferous forest fires, mixed coniferous and broadleaf forest fires and broadleaf forest fires by forest land type; into surface fires, crown fires and underground fires by fire position; into forest fires, general forest fires, major forest fires and extra-large forest fires by damaged forest area; and into daytime forest fires and night forest fires by time of occurrence; and the feature heterogeneity comprises light intensity, scale difference, smoke concentration and smoke-like and fire-like objects;
step 1.13, learning the knowledge in the coarse annotation database with a neural network model, detecting unlabeled data, feeding back misclassification cases, finely annotating the data by fine tuning, and enhancing the data with the album tool; after enhancement, if the requirements are met, a forest fire detection data set with diversified features is obtained; otherwise, return to step 1.11;
step 1.2, training the multi-scale and multi-dimensional feature extraction network on the forest fire detection data set, and inputting the high-point monitoring video to be identified into the trained network to obtain the smoke and flame object detection frames; at the same time, the two-dimensional pixel coordinates of the vertices of the flame detection frame are calculated, and the central two-dimensional pixel coordinates of the detection frame are calculated from those vertex coordinates, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object.
Further, the specific implementation steps of the forest fire accurate positioning module are as follows:
step 2.1, establishing different initial positioning methods of the fire points based on the diffusion characteristics of the smoke and the flame of different forest fires, wherein the specific steps are as follows:
step 2.11, analyzing the diffusion characteristics of smoke and flame in the forest fire detection data set by the optical flow method, that is, dividing the diffusion models of forest fire smoke and flame into a triangular diffusion model, a diffuse diffusion model and a radiation diffusion model by calculating histograms of the optical flow intensity and the optical flow direction angle;
the formulas of the optical flow intensity and the optical flow direction angle are as follows:
L(i,j) = u(i,j)² + v(i,j)²
α(i,j) = arctan(v(i,j) / u(i,j))
wherein L(i,j) denotes the optical flow intensity, α(i,j) denotes the optical flow direction angle, and u(i,j) and v(i,j) denote the transverse and longitudinal optical flow vectors at pixel (i,j), respectively; a concentrated distribution of optical flow intensity and direction angle corresponds to the triangular diffusion model, an irregular distribution to the diffuse diffusion model, and a uniform distribution to the radiation diffusion model;
step 2.12, establishing different initial fire point positioning methods for the different diffusion models, including a boundary-centerline feature line positioning method for the triangular diffusion model, a centroid movement offset positioning method for the diffuse diffusion model, and a discrete seed point positioning method for the radiation diffusion model;
step 2.2, providing a forest fire positioning method combining the forest fire smoke diffusion model and the video three-dimensional grid, wherein the method comprises the following specific steps:
step 2.21, establishing a different simulated motion model for each diffusion model, and mapping the diffusion of smoke and flame in the real world onto a digital three-dimensional model in terms of diffusion speed and direction, diffusion mode and spatial topological relation constraints; that is, the diffusion speed and direction of the simulated motion model are obtained by mathematical calculation combined with image processing, and the forest fire smoke and flame in the real world are then mapped onto the digital three-dimensional model under the diffusion mode and spatial topological relation constraints of the different diffusion models;
step 2.22, analyzing the adjacency, association and containment topological relations between smoke and flame and the forest structure in the digital three-dimensional model based on a geospatial topological relation analysis method, and giving a spatial topological semantic description of these relations;
step 2.23, performing camera calibration and distortion correction according to the high-point monitoring camera parameters; on the premise of being constrained by the detection frames of the smoke and flame objects, the boundaries of the smoke and flame objects are extracted with the bwboundaries edge extraction function, the smoke centerline is obtained from the boundary, and the initial positioning of the smoke object in the two-dimensional image space is realized using the diffusion model and the corresponding initial fire point positioning method, while initial positioning is also performed based on the two-dimensional coordinates of the center point of the flame object; the camera parameters comprise the pitch angle, yaw angle and height of the camera; the smoke centerline is obtained by extracting the smoke edge, determining the smoke diffusion motion trend, taking the center point of all edge points, forming a straight line from this center point with the slope of the smoke motion direction, separating the smoke edge points into left and right sets with this line, and fitting the two edges; the final fitted straight line is the centerline of the smoke;
step 2.24, converting the image coordinate system by using the imaging mechanism of the high-point monitoring camera from three-dimensional space to the two-dimensional plane, combined with a digital elevation model, to obtain position information in the camera and world coordinate systems; setting a reference point in physical space, establishing a pixel-image-camera-world coordinate back-calculation model through the reference coordinates, and combining the constraints of the spatial topological semantic description with the preliminary positioning of the smoke and flame objects, thereby realizing the three-dimensional spatial positioning of the forest fire smoke and flame objects;
step 2.25, based on the three-dimensional spatial positioning of the forest fire smoke and flame objects, establishing a mapping from the longitude, latitude and height of the smoke and flame objects to three-dimensional grid position codes, that is, realizing the correspondence and mutual conversion from video pixel coordinates to three-dimensional position codes; after conversion, the forest fire driven by the video three-dimensional grid is accurately positioned.
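For the pixel-image-camera-world back-calculation of step 2.24, the following is a minimal pinhole-model sketch; the intrinsic matrix K and pose (R, t) are assumed to come from the camera calibration of step 2.23, and dem_height is a hypothetical interface to the digital elevation model.

```python
import numpy as np

def pixel_to_world_ray(u, v, K, R, t):
    """Back-project pixel (u, v) to a world-space viewing ray under the
    pinhole model s*[u, v, 1]^T = K (R X_w + t)."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    d_world = R.T @ d_cam                             # rotate to world frame
    origin = -R.T @ t                                 # camera center in world
    return origin, d_world / np.linalg.norm(d_world)

def intersect_dem(origin, direction, dem_height, step=1.0, max_range=5000.0):
    """March along the ray until it reaches the terrain surface;
    dem_height(x, y) -> elevation is an assumed DEM lookup."""
    s = 0.0
    while s < max_range:
        p = origin + s * direction
        if p[2] <= dem_height(p[0], p[1]):
            return p                                  # ground position
        s += step
    return None
```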
Compared with the prior art, the invention has the beneficial effects that:
1. The invention fully considers the dimensional difference and scale change characteristics of forest fire smoke from the three dimensions of depth, width and resolution, and establishes a target recognition network model combining multi-dimensional feature extraction and multi-scale feature fusion to realize high-precision detection of early forest fire smoke under complex interference conditions, specifically:
(1) The all-weather forest fire database (i.e. the forest fire detection data set) based on a forest fire classification system is established with a cyclic iterative optimization mechanism of rough labeling, rendering, training, feedback, fine tuning and enhancement; this mechanism is used to build a large-scale forest fire disaster information detection database covering all-weather scenes with diversified features;
(2) The establishment of a multi-modal target detection network model fusing multi-scale features (i.e. the multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features) is the core algorithm for accurately identifying forest fires driven by video data. Its core is to comprehensively consider global-local, deep-shallow and video time sequence features, realizing multi-modal information extraction for forest fire smoke and flame objects in multiple scenes. The model establishes a global-local feature extraction module, a deep-shallow feature extraction module and a time sequence neural unit from the perspectives of video global-local features, deep-shallow features and video time sequence. Compared with other methods, it comprehensively acquires the multi-modal features of forest fire smoke and flame objects in video and comprehensively explores the saliency characterization of smoke and flame objects in complex forest fire scenes, so that the objects can be clearly described and rapidly and accurately detected when a forest fire occurs. On the constructed data set, the model improves detection accuracy over prior state-of-the-art methods while remaining computationally efficient, making it well suited for real-time deployment on edge devices.
2. The invention analyzes the spatial topological relation between geographic position and camera pose and considers the diffusion characteristics of early forest fire smoke to construct a forest fire positioning method combining smoke diffusion characteristics with the video three-dimensional grid, breaking through the limitations of terrain conditions and observation point visibility and realizing accurate positioning of early forest fires in video images, specifically:
(1) Preliminary positioning methods suitable for different conditions are formulated according to the type of smoke and flame diffusion model, and different diffusion models of smoke and flame are developed to adapt to various terrain and climate conditions, for example models designed for the influence of variables such as wind speed, humidity and temperature;
(2) Three-dimensional accurate positioning is performed by combining the smoke and flame diffusion model with the video three-dimensional grid: a high-precision three-dimensional spatial model is constructed using video three-dimensional grid technology to more accurately simulate and analyze the diffusion of fire smoke and flame, and the diffusion model is combined with the grid to realize more accurate three-dimensional positioning.
Drawings
FIG. 1 illustrates the general idea of the invention;
FIG. 2 is a diagram of the logical relationship in the present invention;
FIG. 3 is a schematic diagram of the precise positioning of a forest fire driven by a video stereoscopic grid based on the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame objects in each frame of image in the invention;
FIG. 4 is a schematic structural diagram of the multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features, wherein the Focus module performs a slicing operation on the picture: an original 640×640×3 image input to the Focus module first becomes a 320×320×12 feature map through slicing; the Conv module is a convolution module consisting of Conv2d (a two-dimensional convolution layer), a BN layer (batch normalization) and SiLU (an activation function); SPP is the pooling module; and C3 is the depth and receptive field enhancement module;
FIG. 5 is a schematic diagram of the global-local feature extraction module, where D and D' denote the depth before and after transformation, H and H' denote the height before and after transformation, and W denotes the width; N denotes the batch size, i.e. the number of pictures in one batch; DepthNorm denotes depth normalization; Reshape transforms the size of the feature map; Linear denotes the linearization operation; Project denotes the projection layer mapping; Scale indicates how many pixels of the original input image each pixel of the layer's feature map corresponds to; Conv1×1 denotes a convolution kernel of size 1×1, Conv3×3 a convolution kernel of size 3×3, and Batch Norm batch normalization;
FIG. 6 is a schematic structural diagram of the deep-shallow feature extraction module, where C denotes the number of channels, H the height, W the width, and BN batch normalization; ReLU is a common activation function in deep learning; s denotes the step size; and Concate denotes the serial concatenation realizing multi-feature extraction and fusion;
FIG. 7 is a schematic diagram of the time sequence neural unit, where the recurrent neural network has K input nodes, N hidden layer nodes and L output nodes; for time t, the input vector is x_t and the hidden layer state is h_t; after transformation by the activation function and the fully connected layer, the output is obtained through the softmax function; V_xh denotes the weight matrix between the K input nodes and the N hidden nodes, and V_ah denotes the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t; a_t is the activated hidden layer state vector, and c_t is the state vector of the output node, which is converted into the output label vector after softmax calculation;
FIG. 8 is an exemplary diagram of the present invention employing boundary-centerline feature line localization;
FIG. 9 is an exemplary diagram of an application of centroid shifting in the present invention;
FIG. 10 is an exemplary diagram of the application of the discrete seed point method in the present invention.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
Aiming at the requirements of automatic detection and positioning of early forest fires, and at the problems of poor extraction and positioning accuracy and low efficiency for forest fire smoke under complex background conditions, an efficient forest fire smoke recognition algorithm combining multi-dimensional feature extraction and multi-scale feature fusion is designed, and a video three-dimensional grid method for accurately positioning the forest fire point based on forest fire smoke diffusion characteristics is provided, realizing accurate recognition and positioning of early forest fires and supporting forest fire emergency response and disaster relief.
As shown in fig. 1 and 2, the overall scheme comprises: 1) constructing a forest fire detection data set based on the acquired high-point monitoring video, and training a constructed multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features to identify the detection frames of smoke and flame in the high-point monitoring video to be identified, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image; 2) accurately positioning the forest fire, driven by the video stereoscopic grid, based on these detection frames and central coordinates. In the data set construction stage, various complex background conditions are fully considered: the visual similarity of forest fire smoke to clouds, fog and other objects, small long-distance targets in the early stage, different illumination conditions, weather conditions, dynamic backgrounds, and so on. In the forest fire smoke feature analysis stage, the triangular diffusion characteristics and the motion direction and speed characteristics of forest fire smoke are fully considered and analyzed. In the stage of forest fire smoke recognition under complex background conditions, considering that high-point monitoring video is shot at a distance, that the detection precision of existing target detection networks for small long-distance targets needs improvement, and that clouds and fog are easily misidentified as fire smoke, a multi-scale and multi-dimensional feature extraction network is constructed for these problems. In the forest fire accurate positioning stage, aiming at the problems that forest fire positioning based on infrared images is affected by season and climate and that single-point and double-point positioning methods have low accuracy, a forest fire detection and positioning method based on high-point monitoring video is provided, improving forest fire positioning accuracy.
The background of forest fire video data is complex, and high-point monitoring video is affected by cloud, rain, fog and illumination, which seriously interferes with the smoke features of early forest fires. How to reveal the dimensional difference and scale change characteristics of early forest fire smoke in video images and realize high-precision recognition of forest fire smoke is therefore a key problem. The scheme fully considers these characteristics from the three dimensions of depth, width and resolution, and establishes a target recognition network model combining multi-dimensional feature extraction and multi-scale feature fusion to realize high-precision detection of early forest fire smoke under complex interference conditions. A forest fire detection data set is constructed based on the acquired high-point monitoring video, and the constructed multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features is trained to identify the detection frames of smoke and flame in the high-point monitoring video to be identified, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image. The specific steps are as follows:
step 1.1, based on the acquired high-point monitoring video, an all-weather forest fire database based on a forest fire classification system is established through rough labeling, rendering, training, feedback, fine tuning and enhancement, and the method comprises the following specific steps:
step 1.11, manually analyzing the expression in video images of the features of smoke and fire in forest fire scenes in the high-point monitoring video, and preliminarily establishing a coarse annotation database through manual annotation, wherein the features comprise color, shape and texture features; color is analyzed through the corresponding color histogram, color set, color moments and color coherence vector; shape is obtained using the boundary feature method, Fourier shape descriptors, shape geometric parameters and the finite element method; and texture is analyzed using the gray-level co-occurrence matrix, energy spectrum function, random field model, autoregressive texture model and wavelet transform;
step 1.12, manually analyzing the heterogeneous feature expression caused by different types and interference in forest fire scenes in the high-point monitoring video, and performing diversified rendering on the data in the coarse annotation database, wherein the types are divided into coniferous forest fires, mixed coniferous and broadleaf forest fires and broadleaf forest fires by forest land type; into surface fires, crown fires and underground fires by fire position; into forest fires, general forest fires, major forest fires and extra-large forest fires by damaged forest area; and into daytime forest fires and night forest fires by time of occurrence; and the feature heterogeneity comprises light intensity, scale difference, smoke concentration and smoke-like and fire-like objects;
step 1.13, learning the knowledge in the coarse annotation database with a neural network model, detecting unlabeled data, feeding back misclassification cases, finely annotating the data by fine tuning, and enhancing the data with the album tool; after enhancement, if the requirements are met, a forest fire detection data set with diversified features is obtained; otherwise, return to step 1.11;
step 1.2, training the multi-scale and multi-dimensional feature extraction network on the forest fire detection data set, and inputting the high-point monitoring video to be identified into the trained network to obtain the smoke and flame object detection frames; at the same time, the two-dimensional pixel coordinates of the vertices of the flame detection frame are calculated, and the central two-dimensional pixel coordinates of the detection frame are calculated from those vertex coordinates, finally obtaining the detection frames of smoke objects and flame objects and the central two-dimensional pixel coordinates of the flame object.
As shown in fig. 4, the constructed multi-scale and multi-dimensional feature extraction network fusing video spatial features and time sequence features comprises an input layer, a Focus module, a first Conv module, a global-local feature extraction module, a second Conv module, a deep-shallow feature extraction module, a third Conv module, a time sequence neural unit, a fourth Conv module, a pooling module, a depth and receptive field enhancement module and an output layer which are sequentially connected, wherein the first to fourth Conv modules are convolution modules each consisting of a sequentially connected Conv2d, BN layer and SiLU.
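The Focus slicing can be written down directly; the sketch below reproduces the stated behaviour (an N×3×640×640 input becomes N×12×320×320 by stacking the four pixel-parity sub-images along the channel axis) and is not the patent's own code.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice a feature map into its four pixel-parity sub-images and
    concatenate them along the channel axis: (C, H, W) -> (4C, H/2, W/2)."""
    def forward(self, x):
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

# Focus()(torch.zeros(1, 3, 640, 640)).shape -> torch.Size([1, 12, 320, 320])
```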
As shown in fig. 5, the global-local feature extraction module comprises a local feature extraction module and a global feature extraction module each connected to the first Conv module, a 3×3 convolution layer that receives and adds the outputs of the local and global feature extraction modules, and a 1×1 convolution layer sequentially connected after the 3×3 convolution layer. The local feature extraction module comprises a 3×3 convolution layer and a 1×1 convolution layer that each convolve the input feature map, and a batch normalization layer connected after each of them; the outputs of the two batch normalization layers are added to obtain the local features. The global feature extraction module sequentially performs size transformation, a linearization operation and a projection layer mapping on the input feature map to obtain feature maps Q, K and V, multiplies the depth-normalized Q with the depth-normalized K, applies a size transformation to the product, multiplies the result with the size-transformed feature map V, and applies a scaling operation in which each pixel of the resulting feature map corresponds to several pixels of the original input image, obtaining the global features in the final feature map;
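A hedged sketch of this module follows: the local branch uses parallel 3×3 and 1×1 convolutions with batch normalization, the global branch a simplified single-head self-attention over flattened pixels, fused by 3×3 and 1×1 convolutions. The head count, the exact normalization (DepthNorm) and the fusion order are assumptions where the description is ambiguous.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Local conv branch plus a simplified Q/K/V attention branch,
    summed and fused by 3x3 then 1x1 convolutions."""
    def __init__(self, c):
        super().__init__()
        self.local3 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                    nn.BatchNorm2d(c))
        self.local1 = nn.Sequential(nn.Conv2d(c, c, 1), nn.BatchNorm2d(c))
        self.qkv = nn.Linear(c, 3 * c)       # joint projection to Q, K, V
        self.fuse = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.Conv2d(c, c, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        local = self.local3(x) + self.local1(x)
        seq = x.flatten(2).transpose(1, 2)   # N x HW x C token sequence
        q, k, v = self.qkv(seq).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        glob = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return self.fuse(local + glob)       # add, then 3x3 and 1x1 conv
```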
As shown in fig. 6, the deep-shallow feature extraction module comprises a fifth Conv module connected to the second Conv module, and two parallel chains each consisting of a 3×3 convolution layer with step length 1, a 3×3 convolution layer with step length 2 and a 3×3 convolution layer with step length 5, sequentially connected after the fifth Conv module; the output of each chain's step-length-5 layer is multiplied with the output of the fifth Conv module, each product is then added to the output of the fifth Conv module, and the two addition results are concatenated in series to give the output, wherein the fifth Conv module is a convolution module consisting of a sequentially connected Conv2d, BN layer and SiLU;
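A hedged sketch of this module follows. The claim's "step length" values 1, 2 and 5 are read here as dilation rates so that all feature maps keep the same size and the elementwise multiplication and addition are well defined; that reading, and the channel counts, are assumptions.

```python
import torch
import torch.nn as nn

class DeepShallowBlock(nn.Module):
    """Two parallel 3x3 convolution chains (rates 1, 2, 5) gate the
    fifth Conv output by elementwise multiplication, a residual
    addition follows, and the two branches are concatenated."""
    def __init__(self, c):
        super().__init__()
        def chain():
            return nn.Sequential(
                nn.Conv2d(c, c, 3, padding=1, dilation=1),
                nn.Conv2d(c, c, 3, padding=2, dilation=2),
                nn.Conv2d(c, c, 3, padding=5, dilation=5))
        self.branch_a, self.branch_b = chain(), chain()

    def forward(self, x5):                   # x5: output of the fifth Conv module
        a = self.branch_a(x5) * x5 + x5      # multiply, then add
        b = self.branch_b(x5) * x5 + x5
        return torch.cat([a, b], dim=1)      # series (channel) concatenation
```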
As shown in FIG. 7, the time sequence neural unit receives the activated hidden layer state vector a_{t-1} at time t-1 output by the third Conv module and the input vector x_t at time t, multiplies a_{t-1} with V_ah and x_t with V_xh, and adds the two products to b_h to obtain the hidden layer state h_t, i.e. h_t = V_xh·x_t + V_ah·a_{t-1} + b_h; the hidden layer state is then processed by the hyperbolic tangent function tanh to obtain the activated hidden layer state vector a_t = tanh(h_t) at time t, which is also output; a_t is multiplied with V_ao and b_o is added to obtain the state vector of the output node, c_t = V_ao·a_t + b_o, and c_t is converted into the output label vector by a softmax calculation; wherein V_xh denotes the weight matrix from the K input nodes to the N hidden nodes, V_ah denotes the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t, b_h denotes the hidden layer bias before activation, b_o denotes the output bias, and V_ao denotes the weight matrix from the activated hidden nodes to the output nodes.
Finally, efficient identification of early forest fires under complex background conditions is realized, information about the smoke and flame objects is obtained at the same time, and a comprehensive description of the early forest fire situation is finally obtained.
Forest fire positioning based on single-point or double-point mobile shooting has low accuracy, and forest fire positioning from infrared images is affected by season and climate. How to ascertain the spatial topological relation between geographic position and camera pose, and to construct a monitoring video stereoscopic grid to improve forest fire positioning accuracy, is therefore a key problem. By analyzing the spatial topological relation between geographic position and camera pose and considering the diffusion characteristics of early forest fire smoke, a forest fire positioning method combining smoke diffusion characteristics with the video three-dimensional grid is constructed, breaking through the limitations of terrain conditions and observation point visibility and realizing accurate positioning of early forest fires in video images. The forest fire driven by the video stereoscopic grid is accurately positioned based on the detection frames of the smoke and flame objects and the central two-dimensional pixel coordinates of the flame object in each frame image. As shown in fig. 3, the specific steps are as follows:
Step 2.1, establishing different initial positioning methods of the fire points based on the diffusion characteristics of the smoke and the flame of different forest fires, wherein the specific steps are as follows:
step 2.11, analyzing the diffusion characteristics of smoke and flame in the forest fire detection data set by an optical flow method: by computing histograms of optical flow intensity and optical flow direction angle, the diffusion models of forest fire smoke and flame are divided into a triangular diffusion model, a diffuse diffusion model and a radiation diffusion model;
the optical flow intensity and the optical flow direction angle are calculated as follows:

L(i,j) = u(i,j)² + v(i,j)²,  α(i,j) = arctan(v(i,j)/u(i,j))

wherein L(i,j) represents the optical flow intensity, α(i,j) the optical flow direction angle, and u(i,j) and v(i,j) the transverse and longitudinal optical flow vectors at pixel (i,j); a concentrated distribution of optical flow intensity and direction angle corresponds to the triangular diffusion model, an irregular distribution to the diffuse diffusion model, and a uniform distribution to the radiation diffusion model;
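As an illustration of this step, the sketch below computes dense Farneback optical flow with OpenCV and builds the two histograms. The Farneback parameters and bin count are arbitrary choices, and the model assignment in the closing comment paraphrases the rule above rather than implementing a calibrated classifier.

```python
import cv2
import numpy as np

def flow_histograms(prev_gray, curr_gray, bins=36):
    """Dense Farneback optical flow between two 8-bit grayscale frames,
    then histograms of L = u^2 + v^2 and alpha = arctan(v/u)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    hist_l, _ = np.histogram(u ** 2 + v ** 2, bins=bins)
    hist_a, _ = np.histogram(np.arctan2(v, u), bins=bins, range=(-np.pi, np.pi))
    return hist_l, hist_a

# Paraphrasing the rule above: a sharply concentrated direction-angle histogram
# suggests the triangular model, an irregular one the diffuse model, and a
# near-uniform one the radiation model.
```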
step 2.12, establishing a different initial fire-point positioning method for each diffusion model: a boundary-centerline characteristic-line positioning method for the triangular diffusion model, a centroid-movement-offset positioning method for the diffuse diffusion model, and a discrete-seed-point positioning method for the radiation diffusion model, as shown in figures 8-10 (a minimal sketch of the centroid-offset idea follows below);
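The following is a hedged sketch of the centroid-movement-offset idea for the diffuse model: it estimates the per-frame drift of the smoke-mask centroid and extrapolates backwards against the drift towards the source. The function names, the binary-mask input format and the steps_back parameter are hypothetical.

```python
import numpy as np

def centroid(mask):
    """Centroid (x, y) of a binary smoke mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])

def centroid_offset_fire_point(masks, steps_back=5):
    """Estimate the per-frame drift of the smoke-mask centroid over a short
    clip and extrapolate backwards against the drift towards the source."""
    cs = np.stack([centroid(m) for m in masks])   # centroids, frame by frame
    drift = (cs[-1] - cs[0]) / (len(cs) - 1)      # mean per-frame motion
    return cs[-1] - steps_back * drift            # offset opposite the drift
```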
Step 2.2, providing a forest fire positioning method combining a forest fire smoke diffusion model and a video three-dimensional grid, wherein the method comprises the following specific steps of:
step 2.21, building a different simulated motion model for each diffusion model and mapping the real-world diffusion of smoke and flame onto a digital three-dimensional model in terms of diffusion speed and direction, diffusion mode, and spatial topological relation constraints; that is, the diffusion speed and direction of the simulated motion model are obtained by mathematical calculation combined with image processing, after which the real-world forest fire smoke and flame are mapped onto the digital three-dimensional model under the diffusion-mode and spatial topological constraints of each diffusion model;
step 2.22, analyzing the adjacency, association and containment spatial topological relations between the smoke/flame and the forest structure in the digital three-dimensional model by a geospatial topological analysis method, and giving these relations a spatial topological semantic description;
Step 2.23, performing camera calibration and distortion correction according to the high-point monitoring camera parameters (pitch angle, yaw angle and camera height); with the detection frames of the smoke and flame objects as constraints, extracting the boundaries of the smoke and flame objects with the bwboundaries edge-extraction function and deriving the smoke centerline from them; then, using the diffusion model and the corresponding initial fire-point positioning method, realizing initial positioning of the smoke object in the two-dimensional image space, while the flame object is initially positioned from its center-point two-dimensional coordinates. Here the smoke centerline is obtained as follows: after the smoke edge is extracted and the smoke diffusion trend determined, the midpoint of all fire edge points is taken; a straight line is formed through this midpoint with the slope of the smoke motion direction, the smoke edge points are separated by it into left and right fitting edges, and the final fitted line is the smoke centerline (a minimal sketch follows below);
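A minimal NumPy sketch of this centerline construction, under the assumption that the smoke drift direction has already been estimated (e.g., from the optical flow of step 2.11) and that edge points come as an (N, 2) array of pixel coordinates:

```python
import numpy as np

def smoke_centerline(edge_points, drift):
    """edge_points: (N, 2) array of (x, y) smoke edge pixels; drift: (2,)
    smoke motion direction. Returns a point on the centerline and its
    unit direction."""
    mid = edge_points.mean(axis=0)              # midpoint of all edge points
    d = drift / np.linalg.norm(drift)           # unit smoke-motion direction
    rel = edge_points - mid
    side = rel[:, 0] * d[1] - rel[:, 1] * d[0]  # signed side of the split line
    left, right = edge_points[side > 0], edge_points[side <= 0]
    anchor = 0.5 * (left.mean(axis=0) + right.mean(axis=0))  # balance the two edges
    return anchor, d                            # final fitted centerline
```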
Step 2.24, using the three-dimensional-space-to-two-dimensional-plane imaging mechanism of the high-point monitoring camera together with a digital elevation model, converting the image coordinate system to obtain position information in the camera and world coordinate systems; setting reference points in physical space and, from these reference coordinates, establishing a coordinate back-calculation model over the pixel-image-camera-world coordinate systems; combining the constraints of the spatial topological semantic description with the preliminary positioning of the smoke and flame objects then realizes three-dimensional spatial positioning of the forest fire smoke and flame objects (a simplified back-projection sketch follows below);
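The back-calculation chain can be illustrated with a simplified pinhole model. The sketch below intersects a pixel ray with a single horizontal plane as a stand-in for the DEM lookup (a real implementation would march the ray across the elevation model); K, R and cam_pos are assumed to come from the calibration of step 2.23.

```python
import numpy as np

def pixel_to_world(px, py, K, R, cam_pos, ground_z):
    """Back-project pixel (px, py) through a pinhole camera with intrinsics K,
    world-to-camera rotation R and camera position cam_pos, intersecting the
    ray with the horizontal plane z = ground_z (stand-in for a DEM lookup)."""
    ray_cam = np.linalg.inv(K) @ np.array([px, py, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                             # rotate into world frame
    t = (ground_z - cam_pos[2]) / ray_world[2]            # plane-intersection depth
    return cam_pos + t * ray_world                        # world coordinates
```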
Step 2.25, based on the three-dimensional spatial positioning of the smoke and flame objects, establishing a mapping from their longitude, latitude and altitude to three-dimensional grid position codes, i.e. realizing the correspondence and mutual conversion between video pixel coordinates and stereoscopic position codes (after the mapping is completed, the longitude, latitude and altitude information can be converted into video stereoscopic grid codes that the high-point monitoring equipment can identify and easily store); after the conversion, the video-stereoscopic-grid-driven forest fire is accurately positioned.
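The patent does not spell out its grid-code format, so the following is an illustrative encoding only: latitude and longitude quantised into a 2^level × 2^level grid and altitude into fixed-height layers, packed into a single integer key.

```python
def grid_code(lat, lon, alt, level=16, alt_step=10.0):
    """Quantise longitude/latitude into a 2^level x 2^level grid and altitude
    into alt_step-metre layers, then pack the three indices into one key."""
    n = 1 << level
    i = int((lon + 180.0) / 360.0 * n) % n        # column index
    j = int((lat + 90.0) / 180.0 * n) % n         # row index
    k = max(int(alt / alt_step), 0)               # altitude layer (clamped)
    return (k << (2 * level)) | (j << level) | i  # packed integer code

code = grid_code(30.6, 103.9, 1520.0)  # arbitrary example point
```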
In conclusion, the method is applicable under complex backgrounds and severe weather conditions and improves the accuracy and efficiency of smoke extraction and positioning for early forest fires; it is also well suited to detecting small, long-distance targets, with good detection performance; and the invention supports not only daytime scenes but also efficient, real-time monitoring at night.
To further improve detection and positioning stability, the system also integrates a real-time monitoring data set of the surrounding environment (wind speed, temperature, wind direction, humidity, air pressure, etc.), dynamically adjusts model parameters with these readings to reach the best monitoring and positioning effect, and updates the three-dimensional visualization of the forest fire in the system in time. The method comprises the following steps (a hedged sketch of the adjustment loop follows the list below):
real-time data integration: environmental data (such as wind speed, humidity, etc.) are monitored in real time, and model parameters are adjusted according to the data so as to improve the accuracy of fire detection.
Dynamic tracking and analysis: by tracking the dynamic changes of smoke and flame in real time and dynamically adjusting the three-dimensional grid, continuous and accurate fire source positioning is realized.
Visualization and aid decision making: an intuitive three-dimensional visual interface is provided, so that a decision maker is helped to quickly understand the development of fire and to formulate an effective coping strategy.
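A toy sketch of such environment-driven tuning is given below. The specific coefficients, the clamp range and the idea of adjusting the detector's confidence threshold are all hypothetical illustrations of the adjustment loop, not values from the patent.

```python
def adjusted_confidence_threshold(base=0.5, wind_speed=0.0, humidity=50.0):
    """Lower the detection confidence threshold in dry, windy conditions
    (favour recall when fires spread fast); raise it when humid and calm."""
    thr = base - 0.02 * min(wind_speed, 10.0)   # up to -0.2 for strong wind
    thr += 0.002 * max(humidity - 50.0, 0.0)    # up to +0.1 for high humidity
    return min(max(thr, 0.1), 0.9)              # clamp to a sane range

print(adjusted_confidence_threshold(wind_speed=8.0, humidity=20.0))  # 0.34
```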
The above are merely representative examples of the numerous specific applications of the present invention and should not be construed as limiting its scope in any way. All technical schemes formed by transformation or equivalent substitution fall within the protection scope of the invention.

Claims (10)

1. A forest fire detection and positioning method based on a high-point monitoring video is characterized by comprising the following steps:
Step 1, constructing a forest fire detection data set based on an acquired high-point monitoring video, and training a constructed multi-scale and multi-dimensional feature extraction network which fuses video space features and time sequence features to identify a detection frame of smoke and flame in the high-point monitoring video to be identified, so as to finally obtain a detection frame of a smoke object and a flame object and a central two-dimensional pixel coordinate of the flame object in each frame image;
and 2, accurately positioning the forest fire driven by the video stereoscopic grid based on the detection frames of the smoke object and the flame object and the central two-dimensional pixel coordinates of the flame object in each frame of image.
2. The forest fire detection and positioning method based on the high-point surveillance video according to claim 1, wherein the multi-scale and multi-dimensional feature extraction network for fusing the spatial features and the time sequence features of the video constructed in the step 1 comprises an input layer, a Focus module, a first Conv module, a global-local feature extraction module, a second Conv module, a deep-shallow feature extraction module, a third Conv module, a time sequence neural unit, a fourth Conv module, a pooling module, a depth and receptive field enhancement module and an output layer which are sequentially connected, wherein the first Conv module, the second Conv module, the third Conv module and the fourth Conv module are convolution modules consisting of sequentially connected Conv2d, BN layer and SiLU.
3. The forest fire detection and positioning method based on the high-point monitoring video according to claim 2, characterized in that the global-local feature extraction module comprises a local feature extraction module and a global feature extraction module which are each connected with the first Conv module, a 3×3 convolution layer which receives and adds the outputs of the local feature extraction module and the global feature extraction module, and a 1×1 convolution layer connected after the 3×3 convolution layer, wherein the local feature extraction module comprises a 3×3 convolution layer and a 1×1 convolution layer which each convolve the input feature map, with a batch normalization layer connected to each of them, the outputs of the two batch normalization layers being added to obtain the local features; the global feature extraction module sequentially performs size conversion, linearization and projection-layer mapping on the input feature map to obtain feature maps Q, K and V, multiplies the depth-normalized feature maps Q and K together, multiplies the result with the feature map V, and converts the product back to the original feature-map size, so that each pixel of the resulting feature map aggregates global context, yielding the global features;
The deep-shallow feature extraction module comprises a fifth Conv module connected with the second Conv module and two parallel branches, each consisting of 3×3 convolution layers with step lengths of 1, 2 and 5 sequentially connected to the fifth Conv module; the output of each branch's final (step length 5) 3×3 convolution layer is multiplied with the output of the fifth Conv module, the product is added to the output of the fifth Conv module, and the two resulting sums are connected in series to form the output, wherein the fifth Conv module is a convolution module composed of sequentially connected Conv2d, BN layer and SiLU;
the time sequence neural unit receives the activated hidden-layer state vector a_{t-1} at time t-1 output by the third Conv module and the input vector x_t at time t, multiplies a_{t-1} by V_{ah} and x_t by V_{xh}, and adds the two products to b_h to obtain the hidden-layer state h_t = V_{ah} a_{t-1} + V_{xh} x_t + b_h; the hidden-layer state is processed by the hyperbolic tangent function to obtain the activated hidden-layer state vector a_t = tanh(h_t); a_t is multiplied by V_{ao} and added to b_o to obtain the output-node state vector c_t = V_{ao} a_t + b_o, and c_t is converted by softmax into the output label vector ŷ_t = softmax(c_t), wherein V_{xh} is the weight matrix from the K input nodes to the N hidden nodes, V_{ah} is the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t, b_h is the hidden-layer bias before activation, b_o is the output-node bias, and V_{ao} is the weight matrix from the activated hidden nodes to the output nodes.
4. The forest fire detection and positioning method based on the high-point monitoring video according to claim 3, wherein the specific steps of the step 1 are as follows:
step 1.1, based on the acquired high-point monitoring video, an all-weather forest fire database based on a forest fire classification system is established through rough labeling, rendering, training, feedback, fine tuning and enhancement, and the method comprises the following specific steps:
step 1.11, manually analyzing how smoke and fire in forest fire scenes of the high-point monitoring video are expressed in video-image features, and preliminarily establishing a coarse annotation database through manual annotation, wherein the features comprise color, shape and texture features; color is obtained by analyzing the corresponding color histograms, color sets, color moments and color coherence vectors; shape is obtained by the boundary feature method, the Fourier shape descriptor method, the shape geometric parameter method and the finite element method; and texture is obtained by gray-level co-occurrence matrix, energy spectrum function, random field model, autoregressive texture model and wavelet transform analysis;
Step 1.12, manually analyzing the feature heterogeneity caused by different types of interference in forest fire scenes of the high-point monitoring video, and performing diversified rendering of the data in the coarse annotation database, wherein the fires are classified by forest land type into coniferous forest fires, mixed coniferous-broadleaf forest fires and broadleaf forest fires; by fire position into surface fires, crown fires and underground fires; by damaged forest area into general, large, major and extraordinarily large forest fires; and by time of occurrence into daytime and nighttime forest fires, and wherein the feature heterogeneity comprises light intensity, scale difference, smoke concentration and smoke-like or fire-like objects;
step 1.13, learning the coarse annotation database knowledge with a neural network model, detecting unlabeled data, feeding back misclassification cases, finely annotating the data by fine tuning, and enhancing the data with an image augmentation tool; if the enhanced data meet the requirements, a forest fire detection data set with diversified characteristics is obtained, otherwise execution returns to step 1.11;
Step 1.2, training a multi-scale and multi-dimensional feature extraction network based on a forest fire detection data set, inputting a high-point monitoring video to be identified into the trained multi-scale and multi-dimensional feature extraction network to obtain a smoke and flame object detection frame, simultaneously calculating the vertex two-dimensional pixel coordinates of the flame detection frame, calculating the center two-dimensional pixel coordinates of the detection frame based on the vertex two-dimensional pixel coordinates of the flame, and finally obtaining the detection frame of the smoke object and the flame object and the center two-dimensional pixel coordinates of the flame object.
5. The forest fire detection and positioning method based on the high-point monitoring video according to claim 4, wherein the specific steps of the step 2 are as follows:
step 2.1, establishing different initial positioning methods of the fire points based on the diffusion characteristics of the smoke and the flame of different forest fires, wherein the specific steps are as follows:
step 2.11, analyzing the diffusion characteristics of smoke and flame in the forest fire detection data set by an optical flow method: by computing histograms of optical flow intensity and optical flow direction angle, the diffusion models of forest fire smoke and flame are divided into a triangular diffusion model, a diffuse diffusion model and a radiation diffusion model;
The optical flow intensity and the optical flow direction angle are calculated as follows:

L(i,j) = u(i,j)² + v(i,j)²,  α(i,j) = arctan(v(i,j)/u(i,j))

wherein L(i,j) represents the optical flow intensity, α(i,j) the optical flow direction angle, and u(i,j) and v(i,j) the transverse and longitudinal optical flow vectors at pixel (i,j); a concentrated distribution of optical flow intensity and direction angle corresponds to the triangular diffusion model, an irregular distribution to the diffuse diffusion model, and a uniform distribution to the radiation diffusion model;
step 2.12, establishing a different initial fire-point positioning method for each diffusion model: a boundary-centerline characteristic-line positioning method for the triangular diffusion model, a centroid-movement-offset positioning method for the diffuse diffusion model, and a discrete-seed-point positioning method for the radiation diffusion model;
step 2.2, providing a forest fire positioning method combining a forest fire smoke diffusion model and a video three-dimensional grid, wherein the method comprises the following specific steps of:
step 2.21, building a different simulated motion model for each diffusion model and mapping the real-world diffusion of smoke and flame onto a digital three-dimensional model in terms of diffusion speed and direction, diffusion mode, and spatial topological relation constraints; that is, the diffusion speed and direction of the simulated motion model are obtained by mathematical calculation combined with image processing, after which the real-world forest fire smoke and flame are mapped onto the digital three-dimensional model under the diffusion-mode and spatial topological constraints of each diffusion model;
Step 2.22, analyzing the adjacency, association and containment spatial topological relations between the smoke/flame and the forest structure in the digital three-dimensional model by a geospatial topological analysis method, and giving these relations a spatial topological semantic description;
Step 2.23, performing camera calibration and distortion correction according to the high-point monitoring camera parameters, the camera parameters comprising the pitch angle, yaw angle and height of the camera; with the detection frames of the smoke and flame objects as constraints, extracting the boundaries of the smoke and flame objects with the bwboundaries edge-extraction function and deriving the smoke centerline from them; then, using the diffusion model and the corresponding initial fire-point positioning method, realizing initial positioning of the smoke object in the two-dimensional image space, while the flame object is initially positioned from its center-point two-dimensional coordinates, wherein the smoke centerline is obtained as follows: after the smoke edge is extracted and the smoke diffusion trend determined, the midpoint of all fire edge points is taken, a straight line is formed through this midpoint with the slope of the smoke motion direction, the smoke edge points are separated by it into left and right fitting edges, and the final fitted line is the smoke centerline;
Step 2.24, using the three-dimensional-space-to-two-dimensional-plane imaging mechanism of the high-point monitoring camera together with a digital elevation model, converting the image coordinate system to obtain position information in the camera and world coordinate systems; setting reference points in physical space and, from these reference coordinates, establishing a coordinate back-calculation model over the pixel-image-camera-world coordinate systems; combining the constraints of the spatial topological semantic description with the preliminary positioning of the smoke and flame objects then realizes three-dimensional spatial positioning of the forest fire smoke and flame objects;
Step 2.25, based on the three-dimensional spatial positioning of the forest fire smoke and flame objects, establishing a mapping from their longitude, latitude and altitude to three-dimensional grid position codes, i.e. realizing the correspondence and mutual conversion between video pixel coordinates and stereoscopic position codes; after the conversion, the video-stereoscopic-grid-driven forest fire is accurately positioned.
6. A forest fire detection and positioning system based on a high-point monitoring video is characterized by comprising the following steps:
and a network construction and detection module: constructing a forest fire detection data set based on the acquired high-point monitoring video, and training a constructed multi-scale and multi-dimensional feature extraction network integrating the video spatial features and the time sequence features to identify a detection frame of smoke and flame in the high-point monitoring video to be identified, so as to finally obtain a detection frame of a smoke object and a flame object and a central two-dimensional pixel coordinate of the flame object in each frame of image;
Accurate positioning module of forest fire: and (3) accurately positioning the forest fire driven by the video stereoscopic grid based on the detection frames of the smoke object and the flame object and the central two-dimensional pixel coordinates of the flame object in each frame of image.
7. The forest fire detection and positioning system based on the high-point surveillance video according to claim 6, wherein the multi-scale and multi-dimensional feature extraction network for fusing the spatial features and the time sequence features of the video constructed by the network construction and detection module comprises an input layer, a Focus module, a first Conv module, a global-local feature extraction module, a second Conv module, a deep-shallow feature extraction module, a third Conv module, a time sequence neural unit, a fourth Conv module, a pooling module, a depth and receptive field enhancement module and an output layer which are sequentially connected, wherein the first Conv module, the second Conv module, the third Conv module and the fourth Conv module are convolution modules consisting of sequentially connected Conv2d, BN layer and SiLU.
8. The forest fire detection and positioning system based on the high-point monitoring video according to claim 7, wherein the global-local feature extraction module comprises a local feature extraction module and a global feature extraction module which are each connected with the first Conv module, a 3×3 convolution layer which receives and adds the outputs of the local feature extraction module and the global feature extraction module, and a 1×1 convolution layer connected after the 3×3 convolution layer, wherein the local feature extraction module comprises a 3×3 convolution layer and a 1×1 convolution layer which each convolve the input feature map, with a batch normalization layer connected to each of them, the outputs of the two batch normalization layers being added to obtain the local features; the global feature extraction module sequentially performs size conversion, linearization and projection-layer mapping on the input feature map to obtain feature maps Q, K and V, multiplies the depth-normalized feature maps Q and K together, multiplies the result with the feature map V, and converts the product back to the original feature-map size, so that each pixel of the resulting feature map aggregates global context, yielding the global features;
The deep-shallow feature extraction module comprises a fifth Conv module connected with the second Conv module and two parallel branches, each consisting of 3×3 convolution layers with step lengths of 1, 2 and 5 sequentially connected to the fifth Conv module; the output of each branch's final (step length 5) 3×3 convolution layer is multiplied with the output of the fifth Conv module, the product is added to the output of the fifth Conv module, and the two resulting sums are connected in series to form the output, wherein the fifth Conv module is a convolution module composed of sequentially connected Conv2d, BN layer and SiLU;
the time sequence neural unit receives the activated hidden-layer state vector a_{t-1} at time t-1 output by the third Conv module and the input vector x_t at time t, multiplies a_{t-1} by V_{ah} and x_t by V_{xh}, and adds the two products to b_h to obtain the hidden-layer state h_t = V_{ah} a_{t-1} + V_{xh} x_t + b_h; the hidden-layer state is processed by the hyperbolic tangent function to obtain the activated hidden-layer state vector a_t = tanh(h_t); a_t is multiplied by V_{ao} and added to b_o to obtain the output-node state vector c_t = V_{ao} a_t + b_o, and c_t is converted by softmax into the output label vector ŷ_t = softmax(c_t), wherein V_{xh} is the weight matrix from the K input nodes to the N hidden nodes, V_{ah} is the weight matrix connecting the N hidden nodes at time t-1 to the N hidden nodes at time t, b_h is the hidden-layer bias before activation, b_o is the output-node bias, and V_{ao} is the weight matrix from the activated hidden nodes to the output nodes.
9. The forest fire detection and positioning system based on the high-point monitoring video according to claim 8, wherein the specific implementation steps of the network construction and detection module are as follows:
step 1.1, based on the acquired high-point monitoring video, an all-weather forest fire database based on a forest fire classification system is established through rough labeling, rendering, training, feedback, fine tuning and enhancement, and the method comprises the following specific steps:
step 1.11, manually analyzing how smoke and fire in forest fire scenes of the high-point monitoring video are expressed in video-image features, and preliminarily establishing a coarse annotation database through manual annotation, wherein the features comprise color, shape and texture features; color is obtained by analyzing the corresponding color histograms, color sets, color moments and color coherence vectors; shape is obtained by the boundary feature method, the Fourier shape descriptor method, the shape geometric parameter method and the finite element method; and texture is obtained by gray-level co-occurrence matrix, energy spectrum function, random field model, autoregressive texture model and wavelet transform analysis;
Step 1.12, manually analyzing the feature heterogeneity caused by different types of interference in forest fire scenes of the high-point monitoring video, and performing diversified rendering of the data in the coarse annotation database, wherein the fires are classified by forest land type into coniferous forest fires, mixed coniferous-broadleaf forest fires and broadleaf forest fires; by fire position into surface fires, crown fires and underground fires; by damaged forest area into general, large, major and extraordinarily large forest fires; and by time of occurrence into daytime and nighttime forest fires, and wherein the feature heterogeneity comprises light intensity, scale difference, smoke concentration and smoke-like or fire-like objects;
step 1.13, learning the coarse annotation database knowledge with a neural network model, detecting unlabeled data, feeding back misclassification cases, finely annotating the data by fine tuning, and enhancing the data with an image augmentation tool; if the enhanced data meet the requirements, a forest fire detection data set with diversified characteristics is obtained, otherwise execution returns to step 1.11;
Step 1.2, training a multi-scale and multi-dimensional feature extraction network based on a forest fire detection data set, inputting a high-point monitoring video to be identified into the trained multi-scale and multi-dimensional feature extraction network to obtain a smoke and flame object detection frame, simultaneously calculating the vertex two-dimensional pixel coordinates of the flame detection frame, calculating the center two-dimensional pixel coordinates of the detection frame based on the vertex two-dimensional pixel coordinates of the flame, and finally obtaining the detection frame of the smoke object and the flame object and the center two-dimensional pixel coordinates of the flame object.
10. The forest fire detection and positioning system based on high-point monitoring video according to claim 9, wherein the specific implementation steps of the forest fire accurate positioning module are as follows:
step 2.1, establishing different initial positioning methods of the fire points based on the diffusion characteristics of the smoke and the flame of different forest fires, wherein the specific steps are as follows:
step 2.11, analyzing the diffusion characteristics of smoke and flame in the forest fire detection data set by an optical flow method: by computing histograms of optical flow intensity and optical flow direction angle, the diffusion models of forest fire smoke and flame are divided into a triangular diffusion model, a diffuse diffusion model and a radiation diffusion model;
The optical flow intensity and the optical flow direction angle are calculated as follows:

L(i,j) = u(i,j)² + v(i,j)²,  α(i,j) = arctan(v(i,j)/u(i,j))

wherein L(i,j) represents the optical flow intensity, α(i,j) the optical flow direction angle, and u(i,j) and v(i,j) the transverse and longitudinal optical flow vectors at pixel (i,j); a concentrated distribution of optical flow intensity and direction angle corresponds to the triangular diffusion model, an irregular distribution to the diffuse diffusion model, and a uniform distribution to the radiation diffusion model;
step 2.12, establishing a different initial fire-point positioning method for each diffusion model: a boundary-centerline characteristic-line positioning method for the triangular diffusion model, a centroid-movement-offset positioning method for the diffuse diffusion model, and a discrete-seed-point positioning method for the radiation diffusion model;
step 2.2, providing a forest fire positioning method combining a forest fire smoke diffusion model and a video three-dimensional grid, wherein the method comprises the following specific steps of:
step 2.21, building a different simulated motion model for each diffusion model and mapping the real-world diffusion of smoke and flame onto a digital three-dimensional model in terms of diffusion speed and direction, diffusion mode, and spatial topological relation constraints; that is, the diffusion speed and direction of the simulated motion model are obtained by mathematical calculation combined with image processing, after which the real-world forest fire smoke and flame are mapped onto the digital three-dimensional model under the diffusion-mode and spatial topological constraints of each diffusion model;
Step 2.22, analyzing the adjacency, association and containment spatial topological relations between the smoke/flame and the forest structure in the digital three-dimensional model by a geospatial topological analysis method, and giving these relations a spatial topological semantic description;
Step 2.23, performing camera calibration and distortion correction according to the high-point monitoring camera parameters, the camera parameters comprising the pitch angle, yaw angle and height of the camera; with the detection frames of the smoke and flame objects as constraints, extracting the boundaries of the smoke and flame objects with the bwboundaries edge-extraction function and deriving the smoke centerline from them; then, using the diffusion model and the corresponding initial fire-point positioning method, realizing initial positioning of the smoke object in the two-dimensional image space, while the flame object is initially positioned from its center-point two-dimensional coordinates, wherein the smoke centerline is obtained as follows: after the smoke edge is extracted and the smoke diffusion trend determined, the midpoint of all fire edge points is taken, a straight line is formed through this midpoint with the slope of the smoke motion direction, the smoke edge points are separated by it into left and right fitting edges, and the final fitted line is the smoke centerline;
Step 2.24, using the three-dimensional-space-to-two-dimensional-plane imaging mechanism of the high-point monitoring camera together with a digital elevation model, converting the image coordinate system to obtain position information in the camera and world coordinate systems; setting reference points in physical space and, from these reference coordinates, establishing a coordinate back-calculation model over the pixel-image-camera-world coordinate systems; combining the constraints of the spatial topological semantic description with the preliminary positioning of the smoke and flame objects then realizes three-dimensional spatial positioning of the forest fire smoke and flame objects;
Step 2.25, based on the three-dimensional spatial positioning of the forest fire smoke and flame objects, establishing a mapping from their longitude, latitude and altitude to three-dimensional grid position codes, i.e. realizing the correspondence and mutual conversion between video pixel coordinates and stereoscopic position codes; after the conversion, the video-stereoscopic-grid-driven forest fire is accurately positioned.
CN202410055348.4A 2024-01-15 2024-01-15 Forest fire detection and positioning method and system based on high-point monitoring video Pending CN117876874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410055348.4A CN117876874A (en) 2024-01-15 2024-01-15 Forest fire detection and positioning method and system based on high-point monitoring video

Publications (1)

Publication Number Publication Date
CN117876874A (en) 2024-04-12

Family

ID=90576980

Country Status (1)

Country Link
CN (1) CN117876874A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097058A (en) * 2024-04-26 2024-05-28 应急管理部沈阳消防研究所 Fire point positioning method based on digital twinning
CN118135424A (en) * 2024-05-06 2024-06-04 中科星图智慧科技有限公司 Ecological environment supervision method and device based on remote sensing image and GIS
CN118097058B (en) * 2024-04-26 2024-07-09 应急管理部沈阳消防研究所 Fire point positioning method based on digital twinning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399734A (en) * 2022-01-17 2022-04-26 三峡大学 Forest fire early warning method based on visual information
CN114399719A (en) * 2022-03-25 2022-04-26 合肥中科融道智能科技有限公司 Transformer substation fire video monitoring method
CN116563699A (en) * 2023-03-28 2023-08-08 西南交通大学 Forest fire positioning method combining sky map and mobile phone image
CN116824335A (en) * 2023-06-26 2023-09-29 中国科学院上海微系统与信息技术研究所 YOLOv5 improved algorithm-based fire disaster early warning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination