WO2023184350A1 - Fire video image recognition method and system, computer device, and storage medium - Google Patents

Fire video image recognition method and system, computer device, and storage medium Download PDF

Info

Publication number
WO2023184350A1
WO2023184350A1 PCT/CN2022/084441
Authority
WO
WIPO (PCT)
Prior art keywords
layer
convolution block
video image
input
module
Prior art date
Application number
PCT/CN2022/084441
Other languages
French (fr)
Chinese (zh)
Inventor
柯峰
方恩权
杨利萍
庄泽升
彭东亮
马跃
何冬冬
Original Assignee
华南理工大学
广州地铁集团有限公司
深圳市朗驰欣创科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华南理工大学, 广州地铁集团有限公司, 深圳市朗驰欣创科技股份有限公司
Publication of WO2023184350A1 publication Critical patent/WO2023184350A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the invention relates to a fire video image recognition method, system, computer equipment and storage medium, and belongs to the field of computer vision.
  • Traditional fire detection technologies mainly include smoke, temperature, light and gas detection. They identify a fire mainly from its physical characteristics, such as the concentration of the smoke produced, the ambient temperature, the light intensity of the flame, and the concentrations of the O2 consumed by combustion and of the CO, CO2 and other gases produced.
  • Traditional fire detection technology has certain limitations. First, it is restricted to enclosed environments: in a large area the changes in physical characteristics are not obvious, sensor detection efficiency drops, and the time for gas, particles and other physical signals to reach the sensor grows with distance, so detection takes longer and timely reporting is impossible. Second, it is susceptible to the environment: changes in environmental factors such as rain, snow and wind speed alter the physical characteristics of the fire scene and thus reduce the accuracy of sensor detection. Third, the cost is high: sensors are expensive and prone to corrosion, aging and even damage.
  • the present invention provides a fire video image recognition method, system, computer equipment and storage medium, which constitutes a new module by fusing multi-scale feature information, network residual structure and depth separable convolution operation.
  • this network not only reduces the number of network model parameters, but also improves the detection efficiency and accuracy of the network model.
  • the first object of the present invention is to provide a fire video image recognition method.
  • the second object of the present invention is to provide a fire video image recognition system.
  • a third object of the present invention is to provide a computer device.
  • the fourth object of the present invention is to provide a storage medium.
  • a fire video image recognition method includes:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the three modules B are respectively the first module B, the second module B and the third module B;
  • the two modules C are respectively the first module C and the second module C;
  • the two 1×1 convolution blocks A are respectively the first 1×1 convolution block A and the second 1×1 convolution block A;
  • the four maximum pooling layers are respectively the first pooling layer, the second pooling layer, the third pooling layer and the fourth pooling layer.
  • the module A includes an input layer, a first feature extraction layer and an output layer;
  • the module B includes an input layer, a second feature extraction layer and an output layer;
  • the module C includes an input layer, a third feature extraction layer and an output layer.
  • the first feature extraction layer includes a first input channel, a first output channel, a second output channel and a third output channel;
  • the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;
  • the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
  • the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
  • the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  • the second feature extraction layer includes a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
  • the second input channel is the third 1×1 convolution block A;
  • the third input channel is specifically: the first 3×3 convolution block B and the second 3×3 convolution block B are connected in sequence; the feature information matrix output by the first 3×3 convolution block B is added to the feature information matrix output by the second 3×3 convolution block B, and the sum is then connected to the first activation layer and the third 3×3 convolution block B in sequence;
  • the fourth input channel is the fifth maximum pooling layer and the fourth 1×1 convolution block A connected in sequence;
  • the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
  • the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
  • the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
  • the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
  • the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  • the third feature extraction layer includes a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;
  • the first input-output channel is the fifth 1×1 convolution block A;
  • the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence;
  • the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence;
  • the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence;
  • the fifth input-output channel is the sixth maximum pooling layer and the ninth 1×1 convolution block A connected in sequence.
  • the convolution block B includes a convolution layer, a batch normalization layer and a second activation layer connected in sequence;
  • the convolution layer in the convolution block B uses a depthwise separable convolution operation.
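The parameter saving from the depthwise separable convolution in block B can be illustrated with a quick count (a sketch; the channel sizes here are hypothetical and not taken from the patent):

```python
def standard_conv_params(k, c_in, c_out):
    # a k x k standard convolution mixes all input channels for every output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # one k x k filter per input channel, then a 1x1 pointwise convolution across channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768 parameters
print(std, sep, round(sep / std, 3))
```

For these illustrative sizes the separable form uses roughly 12% of the parameters of an ordinary 3×3 convolution, which is consistent with the patent's claim that block B reduces the model's parameter count.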
  • the input of the output layer is the depth concatenation (depth splicing) of all the feature information matrices output in the corresponding feature extraction layer.
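The "depth splicing" of the output layer is a concatenation along the channel (depth) axis; for example, in module A the three tapped feature maps would be stacked as follows (a minimal numpy sketch with made-up channel counts):

```python
import numpy as np

# three feature maps tapped from the three 3x3 convolution blocks A,
# each shaped (channels, height, width); channel counts are illustrative
f1 = np.zeros((16, 56, 56))
f2 = np.zeros((32, 56, 56))
f3 = np.zeros((64, 56, 56))

# depth splicing: concatenate along the channel (depth) axis
out = np.concatenate([f1, f2, f3], axis=0)
print(out.shape)  # (112, 56, 56)
```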
  • a fire video image recognition system includes:
  • the first acquisition unit is used to acquire a data set, where the data set is a fire and non-fire video image data set;
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the training unit is used to train the convolutional neural network using the data set to obtain the fire video image recognition model
  • the second acquisition unit is used to acquire the video to be recognized and perform frame processing on the video to be recognized to obtain the video image to be recognized;
  • the recognition unit is used to input the video image to be recognized into the fire video image recognition model to realize fire video image recognition.
  • a computer device includes a processor and a memory for storing a program executable by the processor;
  • when the processor executes the program stored in the memory, it implements the above fire video image recognition method.
  • a storage medium stores a program;
  • when the program is executed by a processor, the above fire video image recognition method is implemented.
  • the present invention has the following beneficial effects:
  • the fire video image recognition model built by the present invention can not only reduce the number of network model parameters but also improve the detection efficiency and accuracy of the network model, thereby realizing rapid recognition of fire video images so that fire hazards can be discovered in time and personal and property safety ensured.
  • the present invention obtains a video image data set by performing frame processing on the collected video, and then preprocesses the video image data, thereby effectively solving problems such as insufficient lighting and shadows that exist during the collection process of monitoring equipment.
  • Figure 1 is a flow chart of the fire video image recognition method in Embodiment 1 of the present invention.
  • Figure 2 is a frame diagram of the fire video image recognition model in Embodiment 1 of the present invention.
  • FIG. 3 is a frame diagram of module A according to Embodiment 1 of the present invention.
  • FIG. 4 is a frame diagram of module B in Embodiment 1 of the present invention.
  • FIG. 5 is a frame diagram of module C in Embodiment 1 of the present invention.
  • Figure 6 is a frame diagram of convolution blocks A and B in Embodiment 1 of the present invention.
  • Figure 7 is a bar chart showing the parameters of each network model in Embodiment 1 of the present invention.
  • Figure 8 is a statistical graph showing the fire identification accuracy of each network model in Embodiment 1 of the present invention.
  • Figure 9 is a flow chart of the fire video image recognition system in Embodiment 2 of the present invention.
  • Figure 10 is a structural block diagram of a computer device according to Embodiment 3 of the present invention.
  • this embodiment provides a fire video image recognition method, which includes the following steps:
  • This embodiment collects flame and non-flame videos through the network, and then uses the opencv library to perform frame processing on the collected flame and non-flame videos (in units of 12 frames), thereby obtaining a labeled fire and non-fire video image data set.
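Taking "in units of 12 frames" to mean that one image is kept every 12 frames (an assumption about the sampling scheme; in practice the frames would be read with OpenCV's `cv2.VideoCapture`), the indices of the extracted frames can be computed as:

```python
def sampled_frame_indices(total_frames, unit=12):
    """Indices of the frames kept when sampling one frame per `unit` frames."""
    return list(range(0, total_frames, unit))

# e.g. a 5-second clip at 24 fps has 120 frames -> 10 sampled images
print(sampled_frame_indices(120))  # [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]
```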
  • this embodiment uses a script to divide the above data set into a training set and a test set, and perform data enhancement on the training set, where the data enhancement includes random rotation, mirroring, and random cropping.
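The mirroring and random-cropping augmentations mentioned above can be sketched directly on numpy arrays (image and crop sizes are illustrative; the embodiment does not fix them):

```python
import numpy as np

rng = np.random.default_rng(0)

def mirror(img):
    # horizontal mirroring: flip the width axis of an (H, W, C) image
    return img[:, ::-1, :]

def random_crop(img, out_h, out_w):
    # pick a random top-left corner and cut an out_h x out_w patch
    h, w, _ = img.shape
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w, :]

img = rng.random((224, 224, 3))
print(mirror(img).shape, random_crop(img, 200, 200).shape)
```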
  • the convolutional neural network in this embodiment includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the three modules B are respectively the first module B, the second module B and the third module B; the two modules C are respectively the first module C and the second module C; the two 1×1 convolution blocks A are respectively the first 1×1 convolution block A and the second 1×1 convolution block A; the four maximum pooling layers are respectively the first pooling layer, the second pooling layer, the third pooling layer and the fourth pooling layer.
  • This embodiment connects the input layer, module A, the first maximum pooling layer, the first module B, the first 1×1 convolution block A, the second maximum pooling layer, the first module C, the third maximum pooling layer, the second 1×1 convolution block A, the second module B, the fourth maximum pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer in sequence to construct the convolutional neural network.
  • the convolution layers used by the first 1×1 convolution block A and the second 1×1 convolution block A in this embodiment have a stride of 1 and a padding of 0.
  • module A in this embodiment includes an input layer, a first feature extraction layer and an output layer, where the first feature extraction layer includes a first input channel, a first output channel, a second output channel and a third output channel.
  • the first input channel is the first 3×3 convolution block A, the second 3×3 convolution block A and the third 3×3 convolution block A connected in sequence; the first output channel outputs the feature information matrix of the first 3×3 convolution block A; the second output channel outputs the feature information matrix of the second 3×3 convolution block A; the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  • the convolution layer used by the first 3×3 convolution block A has a stride of 2 and a padding of 1; the convolution layers used by the second 3×3 convolution block A and the third 3×3 convolution block A have a stride of 1 and a padding of 1.
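These strides and paddings determine module A's spatial sizes via the usual convolution output-size formula; a quick sketch (the 224-pixel input size is an assumption for illustration, the patent does not state one):

```python
def conv_out_size(n, k=3, stride=1, padding=1):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - k) // stride + 1

# first 3x3 convolution block A: stride 2, padding 1 -> roughly halves the input
print(conv_out_size(224, k=3, stride=2, padding=1))  # 112
# second and third blocks A: stride 1, padding 1 -> spatial size preserved
print(conv_out_size(112, k=3, stride=1, padding=1))  # 112
```

Because the stride-1, padding-1 blocks preserve the spatial size, the three tapped feature maps of module A can be depth-concatenated without resizing.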
  • module B in this embodiment includes an input layer, a second feature extraction layer and an output layer; the second feature extraction layer includes a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel.
  • the second input channel is the third 1×1 convolution block A;
  • the third input channel is specifically: the first 3×3 convolution block B and the second 3×3 convolution block B are connected in sequence; the feature information matrix output by the first 3×3 convolution block B is added to the feature information matrix output by the second 3×3 convolution block B, and the sum is then connected to the first activation layer and the third 3×3 convolution block B in sequence;
  • the fourth input channel is the fifth maximum pooling layer and the fourth 1×1 convolution block A connected in sequence;
  • the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A in the second input channel;
  • the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
  • the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
  • the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
  • the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  • the convolution layers used by the first 3×3 convolution block B, the second 3×3 convolution block B and the third 3×3 convolution block B have a stride of 1 and a padding of 1;
  • the convolution layers used by the third 1×1 convolution block A and the fourth 1×1 convolution block A have a stride of 1 and no padding.
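The residual connection in module B's third input channel (the outputs of two 3×3 blocks added, then activated, then fed to a third block) can be sketched as follows. This is a shape-level sketch: the stand-in `conv_block_b` only preserves shape, and ReLU is assumed for the activation, which the patent does not name explicitly:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_block_b(x):
    # stand-in for a 3x3 convolution block B (stride 1, padding 1 keeps the shape);
    # a real block would convolve, batch-normalize and activate
    return x * 0.5

x = np.random.default_rng(1).standard_normal((8, 14, 14))
f1 = conv_block_b(x)               # first 3x3 convolution block B
f2 = conv_block_b(f1)              # second 3x3 convolution block B
summed = f1 + f2                   # residual-style addition of the two outputs
out = conv_block_b(relu(summed))   # first activation layer, then third block B
print(out.shape)  # (8, 14, 14)
```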
  • module C in this embodiment includes an input layer, a third feature extraction layer and an output layer; the third feature extraction layer includes a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel.
  • the first input-output channel is the fifth 1×1 convolution block A;
  • the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence;
  • the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence;
  • the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence;
  • the fifth input-output channel is the sixth maximum pooling layer and the ninth 1×1 convolution block A connected in sequence.
  • the convolution layers used by the fourth 3×3 convolution block B through the ninth 3×3 convolution block B have a stride of 1 and a padding of 1; the convolution layers used by the fifth 1×1 convolution block A, the sixth 1×1 convolution block A, the seventh 1×1 convolution block A, the eighth 1×1 convolution block A and the ninth 1×1 convolution block A have a stride of 1 and no padding.
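A 1×1 convolution with stride 1 and no padding, as used throughout module C, is simply a per-pixel linear map over channels; equivalently a matrix multiply along the channel axis (a numpy sketch with illustrative channel counts):

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1x1(x, weight):
    # x: (c_in, H, W); weight: (c_out, c_in)
    # a 1x1 convolution with stride 1 and no padding is a linear map over channels
    return np.einsum('oc,chw->ohw', weight, x)

x = rng.standard_normal((64, 28, 28))
w = rng.standard_normal((16, 64))   # reduce 64 channels to 16
y = conv1x1(x, w)
print(y.shape)  # (16, 28, 28)
```

This channel-reduction role is why the 1×1 blocks help keep the model's parameter count small after each multi-branch module.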
  • the first activation layer in this embodiment is the activation layer in module B;
  • the second activation layer is the activation layer in convolution block A and convolution block B.
  • the input layer in this embodiment is used to receive the output of the previous layer; the input of the output layer is the depth concatenation of all the feature information matrices output in the corresponding feature extraction layer. Specifically, in module A the input of the output layer is the depth concatenation of the feature information matrices output by its three output channels; in module B it is the depth concatenation of the feature information matrices output by its five output channels; the same applies in module C and is not repeated here.
  • both convolution block A and convolution block B include a convolution layer, a batch normalization (BN) layer and a second activation layer connected in sequence; the convolution layer in convolution block A uses an ordinary convolution operation.
  • the convolution layer in convolution block B uses a depthwise separable convolution operation.
  • the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer, the fourth maximum pooling layer, the fifth maximum pooling layer and the sixth maximum pooling layer are all of size 3×3 with a stride of 1 and a padding of 1; the dropout layer randomly deactivates 40% of the neurons; the fully connected layer has 2 neurons.
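The 2-neuron fully connected layer followed by the softmax classification layer maps the pooled features to fire / non-fire probabilities; a minimal numpy sketch of that final step (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# two logits from the 2-neuron fully connected layer: [fire, non-fire]
logits = np.array([2.0, -1.0])
probs = softmax(logits)
print(probs, probs.sum())  # two class probabilities summing to 1
```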
  • Table 1 shows the specific parameters of the convolutional neural network.
  • 3×3-1A, 3×3-2A and 3×3-3A respectively denote the first, second and third 3×3 convolution blocks A in module A; 1×1-1B and 1×1-2B respectively denote the third 1×1 convolution block A in the fourth output channel of module B and the fourth 1×1 convolution block A in the eighth output channel; 1×1-1C, 1×1-2C, 1×1-3C, 1×1-4C and 1×1-5C respectively denote the fifth 1×1 convolution block A in the first input-output channel of module C, the sixth 1×1 convolution block A in the second input-output channel, the seventh 1×1 convolution block A in the third input-output channel, the eighth 1×1 convolution block A in the fourth input-output channel and the ninth 1×1 convolution block A in the fifth input-output channel; 1×1 denotes a 1×1 convolution block.
  • the training set obtained in step S101 is input into the fire video image recognition model for training, and the network parameters are adjusted to obtain a pre-trained model (the trained fire video image recognition model).
  • the test set obtained in step S101 is input into the pre-trained model to obtain the recognition accuracy.
  • the parameter count of the fire video image recognition model is far smaller than that of other classic convolutional neural network models: it is 1.02% of the VGG19 parameter count, 23.80% of the GoogleNet parameter count, and 6.68% of the resnet34 parameter count.
  • the performance of the fire video image recognition model on the test set is much better than that of other classic convolutional neural network models. Specifically, over the same 300 training epochs, the highest fire recognition accuracy of the fire video image recognition model is 97.06%, which is 2.31% higher than the classic convolutional neural network model GoogleNet and 0.85% higher than the classic convolutional neural network model resnet34.
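Using commonly cited parameter counts for the baseline networks (approximate published figures, in millions; these are assumptions, as the patent states only ratios), the quoted percentages consistently imply a model of roughly 1.4 to 1.6 million parameters:

```python
# approximate published parameter counts in millions (assumed baseline figures)
baselines = {'VGG19': 143.7, 'GoogleNet': 6.6, 'resnet34': 21.8}
# ratios quoted in the patent
ratios = {'VGG19': 0.0102, 'GoogleNet': 0.2380, 'resnet34': 0.0668}

# implied size of the fire recognition model from each comparison
implied = {name: baselines[name] * ratios[name] for name in baselines}
print(implied)  # each entry is the implied model size in millions of parameters
```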
  • this embodiment provides a fire video image recognition system.
  • the system includes a first acquisition unit 901, a construction unit 902, a training unit 903, a second acquisition unit 904 and a recognition unit 905.
  • the specific functions of each unit are as follows:
  • the first acquisition unit 901 is used to acquire a data set, which is a fire and non-fire video image data set;
  • Construction unit 902 is used to construct a convolutional neural network.
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the training unit 903 is used to train the convolutional neural network using the data set to obtain a fire video image recognition model
  • the second acquisition unit 904 is used to acquire the video to be recognized and perform frame processing on the video to be recognized to obtain the video image to be recognized;
  • the recognition unit 905 is used to input the video image to be recognized into the fire video image recognition model to realize fire video image recognition.
  • for the specific implementation of each unit in this embodiment, refer to Embodiment 1 above; it is not repeated here. It should be noted that the system provided by this embodiment is only illustrated by the division of the above functional units; in practical applications, the above functions can be assigned to different functional units as needed, that is, the internal structure can be divided into different functional units to complete all or part of the functions described above.
  • this embodiment provides a computer device, which includes a processor 1002 , a memory, an input device 1003 , a display device 1004 and a network interface 1005 connected through a system bus 1001 .
  • the processor 1002 is used to provide computing and control capabilities.
  • the memory includes a non-volatile storage medium 1006 and an internal memory 1007.
  • the non-volatile storage medium 1006 stores an operating system, computer programs and databases.
  • the internal memory 1007 provides an environment for running the operating system and computer programs in the non-volatile storage medium 1006.
  • when the processor 1002 executes the program stored in the memory, the fire video image recognition method of the above-mentioned Embodiment 1 is implemented, as follows:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • This embodiment provides a storage medium, which is a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, the fire video image recognition method of the above-mentioned Embodiment 1 is implemented, as follows:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the computer-readable storage medium in this embodiment may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable program is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
  • Computer programs embodied on computer-readable storage media may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer program carried on the above-mentioned computer-readable storage medium may be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Python and C++, as well as conventional procedural programming languages such as C or similar programming languages.
  • the program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the present invention obtains a video image data set by performing frame processing on the collected video, and then preprocesses the video image data, thereby effectively solving problems such as insufficient lighting and shadows that exist during the collection process of monitoring equipment.
  • the built fire video image recognition model can not only reduce the number of network model parameters but also improve the detection efficiency and accuracy of the network model, thereby realizing rapid recognition of fire video images, so as to detect fire hazards in time and ensure personal and property safety.

Abstract

Disclosed in the present invention are a fire video image recognition method and system, a computer device, and a storage medium. The fire video image recognition method comprises: obtaining a data set, the data set being a video image data set of fire and non-fire; constructing a convolutional neural network; training the convolutional neural network by using the data set so as to obtain a fire video image recognition model; obtaining a video to be recognized, and performing framing processing on said video to obtain a video image to be recognized; and inputting said video image into the fire video image recognition model to implement fire video image recognition. According to the present invention, the detection efficiency and accuracy of a network model can be improved while the quantity of parameters of the network model is reduced, the quick recognition of the fire video image is implemented, and thus fire hazards can be discovered in time, and the personal and property safety can be ensured.

Description

Fire video image recognition method, system, computer device and storage medium

Technical Field

The present invention relates to a fire video image recognition method, system, computer device and storage medium, and belongs to the field of computer vision.

Background Art

With the improvement of China's economic and technological level, the population continues to grow and buildings are becoming ever more numerous and dense. The continued use of electricity and fuel increases the risk of fire, and the damage caused by fires has grown accordingly. Fires not only cause economic losses to society but also endanger public safety. It is therefore necessary to conduct dedicated research on fire detection technology so that fires can be identified at the moment of ignition, minimizing the losses they cause and protecting people's safety.

Traditional fire detection technologies mainly comprise smoke-sensing, temperature-sensing, light-sensing and gas-sensing detection. They identify the occurrence of fire from its physical signatures, such as the concentration of the smoke produced, the ambient temperature, the light intensity of the flame, and the concentrations of the O2 consumed by combustion and of the CO, CO2 and other gases it produces. Traditional fire detection has certain limitations. First, it is restricted to enclosed environments: in large spaces the changes in these physical quantities are not pronounced, sensor detection efficiency drops, and the time for gases, particles and other physical signals to reach the sensor grows with distance, lengthening detection time and preventing timely alarms. Second, it is easily affected by the environment: changes in environmental factors such as rain, snow and wind speed alter the physical characteristics of the fire scene and thus reduce the accuracy of sensor detection. Third, the cost is high: sensors are expensive and prone to corrosion, aging and even damage.

With the development of the information age, fire detection technology has moved toward intelligent approaches that use image processing, artificial intelligence and other techniques to detect and identify extracted flame features. At the same time, video surveillance technology has continued to advance, and most areas now have full surveillance coverage. Images reveal the fire source and the spread of a fire very intuitively, so video-based fire detection has received growing attention. However, current artificial-intelligence-based fire detection models are complex, have too many parameters and detect inefficiently, which hinders rapid fire detection. Finding a fire recognition model that is structurally simple, has few parameters and detects efficiently has therefore become a key focus for researchers.
Summary of the Invention

In view of this, the present invention provides a fire video image recognition method, system, computer device and storage medium, in which multi-scale feature information fusion, a network residual structure and depthwise separable convolution operations are combined into new modules for building a fire video image recognition model. This network not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model.

The first object of the present invention is to provide a fire video image recognition method.

The second object of the present invention is to provide a fire video image recognition system.

The third object of the present invention is to provide a computer device.

The fourth object of the present invention is to provide a storage medium.
The first object of the present invention can be achieved by adopting the following technical solution:

A fire video image recognition method, the method comprising:

obtaining a data set, the data set being a video image data set of fire and non-fire;

constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;

training the convolutional neural network with the data set to obtain a fire video image recognition model;

obtaining a video to be recognized and performing framing processing on it to obtain video images to be recognized; and

inputting the video images to be recognized into the fire video image recognition model to implement fire video image recognition.
Further, the three module-B layers are respectively a first module B, a second module B and a third module B; the two module-C layers are respectively a first module C and a second module C; the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A; and the four max pooling layers are respectively a first max pooling layer, a second max pooling layer, a third max pooling layer and a fourth max pooling layer.

The construction of the convolutional neural network is specifically as follows:

connecting in sequence the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer, thereby constructing the convolutional neural network.
Further, module A comprises an input layer, a first feature extraction layer and an output layer; module B comprises an input layer, a second feature extraction layer and an output layer; and module C comprises an input layer, a third feature extraction layer and an output layer.

Further, the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;

the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;

the first output channel outputs the feature information matrix of the first 3×3 convolution block A;

the second output channel outputs the feature information matrix of the second 3×3 convolution block A; and

the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
Further, the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;

the second input channel is a third 1×1 convolution block A;

the third input channel is formed by first connecting a first 3×3 convolution block B and a second 3×3 convolution block B in sequence, adding the feature information matrix outputs of the first 3×3 convolution block B and the second 3×3 convolution block B, and then connecting the sum to a first activation layer and a third 3×3 convolution block B in sequence;

the fourth input channel is a fifth max pooling layer and a fourth 1×1 convolution block A connected in sequence;

the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;

the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;

the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;

the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B; and

the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
Further, the third feature extraction layer comprises a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;

the first input-output channel is a fifth 1×1 convolution block A;

the second input-output channel is a fourth 3×3 convolution block B and a sixth 1×1 convolution block A connected in sequence;

the third input-output channel is a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A connected in sequence;

the fourth input-output channel is a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A connected in sequence; and

the fifth input-output channel is a sixth max pooling layer and a ninth 1×1 convolution block A connected in sequence.
Further, convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer connected in sequence;

the activation function used by the second activation layer is RELU6, where RELU6(x) = min(max(x, 0), 6); and

the convolution layer in convolution block B uses a depthwise separable convolution operation.
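For illustration, the RELU6 activation defined above follows directly from the formula RELU6(x) = min(max(x, 0), 6); a minimal scalar sketch (the helper name is ours, not from the patent):

```python
def relu6(x: float) -> float:
    # RELU6(x) = min(max(x, 0), 6): zero for negative inputs, linear on
    # [0, 6], saturated at 6 above that.
    return min(max(x, 0.0), 6.0)

# The three regimes: clipped below, pass-through in the middle, clipped above.
print(relu6(-3.0), relu6(2.5), relu6(10.0))  # 0.0 2.5 6.0
```

The upper clip at 6 keeps activations in a bounded range, which is why RELU6 is commonly paired with lightweight, depthwise-separable architectures.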
Further, the input of the output layer is the depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer.
The second object of the present invention can be achieved by adopting the following technical solution:

A fire video image recognition system, the system comprising:

a first acquisition unit for obtaining a data set, the data set being a video image data set of fire and non-fire;

a construction unit for constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;

a training unit for training the convolutional neural network with the data set to obtain a fire video image recognition model;

a second acquisition unit for obtaining a video to be recognized and performing framing processing on it to obtain video images to be recognized; and

a recognition unit for inputting the video images to be recognized into the fire video image recognition model to implement fire video image recognition.
The third object of the present invention can be achieved by adopting the following technical solution:

A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the above fire video image recognition method when executing the program stored in the memory.

The fourth object of the present invention can be achieved by adopting the following technical solution:

A storage medium storing a program which, when executed by a processor, implements the above fire video image recognition method.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The fire video image recognition model built by the present invention not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model, thereby enabling rapid recognition of fire video images so that fire hazards can be discovered in time and personal and property safety can be ensured.

(2) The present invention obtains a video image data set by splitting the collected video into frames and then preprocesses the video image data, thereby effectively mitigating problems such as insufficient lighting and shadows that arise during capture by monitoring equipment.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.

Fig. 1 is a flow chart of the fire video image recognition method of Embodiment 1 of the present invention.

Fig. 2 is a framework diagram of the fire video image recognition model of Embodiment 1 of the present invention.

Fig. 3 is a framework diagram of module A of Embodiment 1 of the present invention.

Fig. 4 is a framework diagram of module B of Embodiment 1 of the present invention.

Fig. 5 is a framework diagram of module C of Embodiment 1 of the present invention.

Fig. 6 is a framework diagram of convolution blocks A and B of Embodiment 1 of the present invention.

Fig. 7 is a bar chart of the parameter counts of the network models compared in Embodiment 1 of the present invention.

Fig. 8 is a curve chart of the fire recognition accuracy of the network models compared in Embodiment 1 of the present invention.

Fig. 9 is a flow chart of the fire video image recognition system of Embodiment 2 of the present invention.

Fig. 10 is a structural block diagram of the computer device of Embodiment 3 of the present invention.
Detailed Description of the Embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Embodiment 1:

As shown in Fig. 1, this embodiment provides a fire video image recognition method comprising the following steps:
S101. Obtain a data set.

In this embodiment, flame and non-flame videos are collected from the Internet, and the collected videos are split into frames with the OpenCV library (taking 12 frames as one unit), thereby obtaining a labeled video image data set of fire and non-fire.
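The frame-splitting step above can be sketched as follows; the function and variable names are illustrative (not from the patent), one frame is kept per 12-frame unit, and the OpenCV call is isolated in its own function so the sampling rule itself stays dependency-free:

```python
def keep_frame(index: int, unit: int = 12) -> bool:
    # Keep the first frame of every `unit`-frame group
    # (12 frames per unit, as in this embodiment).
    return index % unit == 0

def split_video(video_path: str, unit: int = 12):
    # Hypothetical helper: read a video and return the sampled frames.
    # Requires the opencv-python package; imported here so the sampling
    # rule above can be used without OpenCV installed.
    import cv2
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if keep_frame(index, unit):
            frames.append(frame)
        index += 1
    cap.release()
    return frames

# Sampling rule: frames 0, 12, 24, ... are retained.
print([i for i in range(30) if keep_frame(i)])  # [0, 12, 24]
```

A fixed sampling step like this trades temporal resolution for data-set size; other intervals would work the same way by changing `unit`.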
Further, this embodiment uses a script to divide the above data set into a training set and a test set and applies data augmentation to the training set, the augmentation including random rotation, mirroring and random cropping.
S102. Construct a convolutional neural network.

As shown in Fig. 2, the convolutional neural network in this embodiment comprises one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer, wherein the three module-B layers are respectively a first module B, a second module B and a third module B, the two module-C layers are respectively a first module C and a second module C, the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A, and the four max pooling layers are respectively a first max pooling layer, a second max pooling layer, a third max pooling layer and a fourth max pooling layer.

In this embodiment, the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer are connected in sequence to construct the convolutional neural network.

The convolution layers used in the first 1×1 convolution block A and the second 1×1 convolution block A both have a stride of 1 and a padding of 0.
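The top-level layer order described above can be written down as an ordered list and checked against the stated layer counts (one module A, three module B, two module C, two 1×1 convolution blocks A, four max pooling layers); this is only a bookkeeping sketch with illustrative names, not a runnable network:

```python
# Top-level layer order of the network in this embodiment, as named in the text.
LAYERS = [
    "input", "module_A", "maxpool_1", "module_B_1", "conv1x1_A_1",
    "maxpool_2", "module_C_1", "maxpool_3", "conv1x1_A_2", "module_B_2",
    "maxpool_4", "module_C_2", "module_B_3", "adaptive_avgpool",
    "dropout", "flatten", "fc", "softmax",
]

def count(prefix: str) -> int:
    # Number of top-level layers whose name starts with `prefix`.
    return sum(1 for name in LAYERS if name.startswith(prefix))

# Counts match the composition stated in the text: 1, 3, 2, 2, 4.
print(count("module_A"), count("module_B"), count("module_C"),
      count("conv1x1_A"), count("maxpool"))  # 1 3 2 2 4
```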
Further, as shown in Fig. 3, module A in this embodiment comprises an input layer, a first feature extraction layer and an output layer, the first feature extraction layer comprising a first input channel, a first output channel, a second output channel and a third output channel. Specifically, the first input channel is the first 3×3 convolution block A, the second 3×3 convolution block A and the third 3×3 convolution block A connected in sequence; the first output channel outputs the feature information matrix of the first 3×3 convolution block A; the second output channel outputs the feature information matrix of the second 3×3 convolution block A; and the third output channel outputs the feature information matrix of the third 3×3 convolution block A.

In module A, the convolution layer used in the first 3×3 convolution block A has a stride of 2 and a padding of 1; the convolution layers used in the second and third 3×3 convolution blocks A both have a stride of 1 and a padding of 1.
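The stride and padding values given for module A fix the spatial sizes through the standard convolution output formula out = ⌊(in + 2·padding − kernel)/stride⌋ + 1; a small sketch (the 224×224 input side length is an illustrative assumption, not stated in the patent):

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # Standard convolution / pooling output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

# First 3x3 convolution block A: stride 2, padding 1 -> halves the side length.
print(conv_out(224, kernel=3, stride=2, padding=1))  # 112
# Second and third 3x3 convolution blocks A: stride 1, padding 1 -> size kept.
print(conv_out(112, kernel=3, stride=1, padding=1))  # 112
```

So the first block of module A downsamples while the following two refine features at constant resolution.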
Further, as shown in Fig. 4, module B in this embodiment comprises an input layer, a second feature extraction layer and an output layer, the second feature extraction layer comprising a second input channel, a third input channel, a fourth input channel, and fourth to eighth output channels. Specifically, the second input channel is the third 1×1 convolution block A. The third input channel is formed by first connecting the first 3×3 convolution block B and the second 3×3 convolution block B in sequence, adding the feature information matrix outputs of the first and second 3×3 convolution blocks B, and then connecting the sum to the first activation layer and the third 3×3 convolution block B in sequence. The fourth input channel is the fifth max pooling layer and the fourth 1×1 convolution block A connected in sequence. The fourth output channel outputs the feature information matrix of the third 1×1 convolution block A in the second input channel; the fifth, sixth and seventh output channels output the feature information matrices of the first, second and third 3×3 convolution blocks B, respectively; and the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A in the fourth input channel.

In module B, the convolution layers used in the first, second and third 3×3 convolution blocks B all have a stride of 1 and a padding of 1; the convolution layers used in the third and fourth 1×1 convolution blocks A have a stride of 1 and no padding.
Further, as shown in Fig. 5, module C in this embodiment comprises an input layer, a third feature extraction layer and an output layer, the third feature extraction layer comprising first to fifth input-output channels. Specifically, the first input-output channel is the fifth 1×1 convolution block A; the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence; the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence; the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence; and the fifth input-output channel is the sixth max pooling layer and the ninth 1×1 convolution block A connected in sequence.

In module C, the convolution layers used in the fourth to ninth 3×3 convolution blocks B all have a stride of 1 and a padding of 1; the convolution layers used in the fifth to ninth 1×1 convolution blocks A all have a stride of 1 and no padding.
In this embodiment, the first activation layer is the activation layer in module B, and the second activation layer is the activation layer in convolution block A and convolution block B.

Each input layer in this embodiment receives the output of the preceding layer. The input of each output layer is the depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer. Specifically, in module A the input of the output layer is the depth-wise concatenation of the feature information matrices output by its three output channels; in module B it is the depth-wise concatenation of those output by its five output channels; module C is analogous and is not described again.
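Depth-wise concatenation stacks the branch outputs along the channel axis while the spatial dimensions stay unchanged; a minimal numpy sketch (the channel counts 16/32/64 are illustrative, not taken from the patent):

```python
import numpy as np

# Three branch outputs of one module, each of shape (channels, height, width).
branch_1 = np.zeros((16, 28, 28))
branch_2 = np.zeros((32, 28, 28))
branch_3 = np.zeros((64, 28, 28))

# Depth-wise (channel-axis) concatenation: channel counts add up,
# spatial size is unchanged.
merged = np.concatenate([branch_1, branch_2, branch_3], axis=0)
print(merged.shape)  # (112, 28, 28)
```

Because only channel counts add, every branch must produce the same spatial size, which is why the modules' internal strides and paddings are chosen to preserve resolution.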
更进一步地,如图6所示,卷积块A和卷积块B均包括依次连接的卷积层、批量归一化(BN)层和第二激活层;其中,卷积块A中的卷积层采用的是普通卷积操作,卷积块B中的卷积层采用的是深度可分离卷积操作,第二激活层采用的激活函数都是RELU6,RELU6(x)=min(max(x,0),6)。Furthermore, as shown in Figure 6, both convolution block A and convolution block B include a convolution layer, a batch normalization (BN) layer and a second activation layer connected in sequence; where, in convolution block A The convolution layer uses ordinary convolution operations. The convolution layer in convolution block B uses depth-separable convolution operations. The activation functions used in the second activation layer are all RELU6, RELU6(x)=min(max (x,0),6).
本实施例中的深度可分离卷积,具体为:卷积核的通道数为1,同时,输入特征矩阵的通道数=卷积核的个数=输出特征矩阵的通道数。The depth-separable convolution in this embodiment is specifically as follows: the number of channels of the convolution kernel is 1, and at the same time, the number of channels of the input feature matrix = the number of convolution kernels = the number of channels of the output feature matrix.
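The parameter savings of depthwise separable convolution can be checked with simple arithmetic: a standard k×k convolution carries k·k·C_in·C_out weights, while the depthwise step (one single-channel k×k kernel per input channel, as stated above) followed by a 1×1 pointwise convolution carries k·k·C_in + C_in·C_out. A sketch with illustrative channel counts (not taken from Table 1):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # Weight count of an ordinary k x k convolution (biases ignored).
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise step: one k x k kernel per input channel (kernel channel
    # count 1), then a 1 x 1 pointwise convolution mixing channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768
print(std, sep, round(sep / std, 3))  # 73728 8768 0.119
```

At these sizes the separable form uses roughly 12% of the weights of a standard convolution, which is the main source of the parameter reduction claimed for this network.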
In this embodiment, the first to sixth max pooling layers all have a size of 3×3, a stride of 1 and a padding of 1; the dropout layer randomly deactivates 40% of the neurons; and the fully connected layer has 2 neurons.
The specific parameters of the convolutional neural network in this embodiment are shown in Table 1.

Table 1. Specific parameters of the convolutional neural network
Here, 3×3-1A, 3×3-2A and 3×3-3A denote the first, second and third 3×3 convolution blocks A in module A; 1×1-1B and 1×1-2B denote the third 1×1 convolution block A in the fourth output channel and the fourth 1×1 convolution block A in the eighth output channel of module B, respectively; 1×1-1C, 1×1-2C, 1×1-3C, 1×1-4C and 1×1-5C denote, respectively, the fifth 1×1 convolution block A in the first input-output channel, the sixth 1×1 convolution block A in the second input-output channel, the seventh 1×1 convolution block A in the third input-output channel, the eighth 1×1 convolution block A in the fourth input-output channel and the ninth 1×1 convolution block A in the fifth input-output channel of module C; and 1×1 denotes a 1×1 convolution block.
S103. Train the convolutional neural network with the data set to obtain the fire video image recognition model.

The training set obtained in step S101 is input into the fire video image recognition model for training, and the network parameters are adjusted to obtain a pre-trained model (the trained fire video image recognition model); the test set obtained in step S101 is then input into the pre-trained model to obtain the recognition accuracy.

A performance test of the fire video image recognition model gave the following results:

As shown in Fig. 7, the parameter count of the fire video image recognition model is far smaller than that of other classic convolutional neural network models: it is 1.02% of the parameter count of the VGG19 model, 23.80% of that of the GoogleNet model, and 6.68% of that of the resnet34 model.

As shown in Fig. 8, the fire video image recognition model performs far better on the test set than the other classic convolutional neural network models. Specifically, under the same 300 training epochs, its highest fire recognition accuracy is 97.06%, which is 2.31% higher than that of the classic GoogleNet model and 0.85% higher than that of the classic resnet34 model.
S104. Obtain the video to be recognized and split it into frames to obtain the video images to be recognized.
S105. Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
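Steps S104-S105 amount to decoding the video into frames and classifying each sampled frame. A minimal stride-sampling sketch of the frame-splitting step (the function name and stride are illustrative; a real pipeline would decode the frames with a video library such as OpenCV):

```python
def split_into_frames(frame_stream, stride=1):
    """Yield every `stride`-th frame of a decoded video stream.

    `frame_stream` is any iterable of frame images (e.g. produced by a
    video decoder); `stride` subsamples the stream so that the recognition
    step need not classify every single frame.
    """
    for i, frame in enumerate(frame_stream):
        if i % stride == 0:
            yield frame

# A 10-frame dummy "video" sampled every 3rd frame -> frames 0, 3, 6, 9.
frames = list(split_into_frames(range(10), stride=3))
```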
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware, and the corresponding program can be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above embodiments are described in a specific order in the drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the illustrated operations must be performed, to achieve the desired results. On the contrary, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
Embodiment 2:
As shown in Figure 9, this embodiment provides a fire video image recognition system. The system includes a first acquisition unit 901, a construction unit 902, a training unit 903, a second acquisition unit 904 and a recognition unit 905. The specific functions of each unit are as follows:
The first acquisition unit 901 is configured to acquire a data set, the data set being a data set of fire and non-fire video images;
The construction unit 902 is configured to construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
The training unit 903 is configured to train the convolutional neural network on the data set to obtain the fire video image recognition model;
The second acquisition unit 904 is configured to acquire the video to be recognized and split it into frames to obtain the video images to be recognized;
The recognition unit 905 is configured to input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
For the specific implementation of each unit in this embodiment, refer to Embodiment 1 above; it is not repeated here. It should be noted that the system provided by this embodiment is illustrated only with the above division into functional units; in practical applications, the above functions may be assigned to different functional units as needed, that is, the internal structure may be divided into different functional units to complete all or part of the functions described above.
Embodiment 3:
As shown in Figure 10, this embodiment provides a computer device, which includes a processor 1002, a memory, an input device 1003, a display device 1004 and a network interface 1005 connected through a system bus 1001. The processor 1002 provides computing and control capabilities. The memory includes a non-volatile storage medium 1006 and an internal memory 1007: the non-volatile storage medium 1006 stores an operating system, a computer program and a database, and the internal memory 1007 provides an environment for running the operating system and the computer program stored in the non-volatile storage medium 1006. When the computer program is executed by the processor 1002, the fire video image recognition method of Embodiment 1 above is implemented, as follows:
Acquire a data set, the data set being a data set of fire and non-fire video images;
Construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
Train the convolutional neural network on the data set to obtain the fire video image recognition model;
Obtain the video to be recognized and split it into frames to obtain the video images to be recognized;
Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
Embodiment 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the fire video image recognition method of Embodiment 1 above is implemented, as follows:
Acquire a data set, the data set being a data set of fire and non-fire video images;
Construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
Train the convolutional neural network on the data set to obtain the fire video image recognition model;
Obtain the video to be recognized and split it into frames to obtain the video images to be recognized;
Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
It should be noted that the computer-readable storage medium of this embodiment may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In this embodiment, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable program is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. A computer program contained on a computer-readable storage medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The computer program for carrying out this embodiment may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Python and C++, as well as conventional procedural programming languages such as C or similar languages. The program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In summary, the present invention splits the collected video into frames to obtain a video image data set and then preprocesses the video image data, thereby effectively alleviating problems such as insufficient lighting and shadows that arise during acquisition by monitoring equipment. In addition, the constructed fire video image recognition model not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model, thereby enabling rapid recognition of fire video images, so that fire hazards can be discovered in time and personal and property safety can be ensured.
The above are only preferred embodiments of the present invention patent, but the scope of protection of the present invention patent is not limited thereto. Any equivalent substitution or change of the technical solution and inventive concept of the present invention patent, made by any person skilled in the art within the scope disclosed by the present invention patent, falls within the scope of protection of the present invention patent.

Claims (10)

  1. A fire video image recognition method, characterized in that the method comprises:
    acquiring a data set, the data set being a data set of fire and non-fire video images;
    constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
    training the convolutional neural network on the data set to obtain a fire video image recognition model;
    obtaining a video to be recognized and splitting the video to be recognized into frames to obtain video images to be recognized;
    inputting the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
  2. The fire video image recognition method according to claim 1, characterized in that the three modules B are respectively a first module B, a second module B and a third module B, the two modules C are respectively a first module C and a second module C, the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A, and the four max pooling layers are respectively a first pooling layer, a second pooling layer, a third pooling layer and a fourth pooling layer;
    the constructing of the convolutional neural network is specifically as follows:
    connecting, in sequence, the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer, thereby obtaining the convolutional neural network.
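The sequence recited in claim 2 can be sanity-checked against the layer inventory of claim 1 with a few lines of Python (the shorthand labels below are this sketch's own, not terms from the patent):

```python
from collections import Counter

# The connection order recited in claim 2, as shorthand labels.
pipeline = [
    "input", "module_A", "maxpool", "module_B", "conv1x1_A", "maxpool",
    "module_C", "maxpool", "conv1x1_A", "module_B", "maxpool",
    "module_C", "module_B", "adaptive_avgpool", "dropout", "flatten",
    "fc", "softmax",
]

counts = Counter(pipeline)
# The inventory matches claim 1: one module A, three modules B, two
# modules C, two 1x1 convolution blocks A and four max pooling layers.
```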
  3. The fire video image recognition method according to any one of claims 1-2, characterized in that module A comprises an input layer, a first feature extraction layer and an output layer; module B comprises an input layer, a second feature extraction layer and an output layer; and module C comprises an input layer, a third feature extraction layer and an output layer.
  4. The fire video image recognition method according to claim 3, characterized in that the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;
    the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;
    the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
    the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
    the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  5. The fire video image recognition method according to claim 3, characterized in that the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
    the second input channel is a third 1×1 convolution block A;
    the third input channel is specifically: a first 3×3 convolution block B and a second 3×3 convolution block B are first connected in sequence, the feature information matrix output of the first 3×3 convolution block B is added to the feature information matrix output of the second 3×3 convolution block B, and the sum is then connected, in sequence, to a first activation layer and a third 3×3 convolution block B;
    the fourth input channel is a fifth max pooling layer and a fourth 1×1 convolution block A connected in sequence;
    the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
    the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
    the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
    the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
    the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  6. The fire video image recognition method according to claim 3, characterized in that the third feature extraction layer comprises a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;
    the first input-output channel is a fifth 1×1 convolution block A;
    the second input-output channel is a fourth 3×3 convolution block B and a sixth 1×1 convolution block A connected in sequence;
    the third input-output channel is a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A connected in sequence;
    the fourth input-output channel is a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A connected in sequence;
    the fifth input-output channel is a sixth max pooling layer and a ninth 1×1 convolution block A connected in sequence.
  7. The fire video image recognition method according to any one of claims 5-6, characterized in that convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer connected in sequence;
    the activation function used by the second activation layer is ReLU6, where ReLU6(x) = min(max(x, 0), 6);
    the convolution layer in convolution block B uses a depthwise separable convolution operation.
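Claim 7's two ingredients are easy to illustrate: ReLU6 clamps activations to the range [0, 6], and a depthwise separable convolution replaces a standard convolution with a per-channel k×k depthwise step plus a 1×1 pointwise step, which cuts the parameter count sharply. A minimal sketch (the channel sizes are made-up examples, not values from the patent):

```python
def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6), the second activation layer's function."""
    return min(max(x, 0), 6)

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution learns k*k*c_in weights per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel;
    # pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution from 64 to 128 channels.
std = standard_conv_params(3, 64, 128)        # 73728 weights
sep = depthwise_separable_params(3, 64, 128)  # 8768 weights
```

With these example sizes the separable form needs under 12% of the standard convolution's weights, which is one source of the small overall parameter count reported in the description.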
  8. The fire video image recognition method according to claim 3, characterized in that the input of the output layer is obtained by depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer.
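The depth-wise concatenation of claim 8 stacks the feature information matrices of all branches along the channel (depth) dimension. Representing a feature map as a list of equal-sized 2-D channels, a minimal sketch (toy shapes, not the patent's actual tensor sizes):

```python
def concat_depthwise(feature_maps):
    """Concatenate feature maps along the depth (channel) dimension.

    Each feature map is a list of 2-D channels of identical spatial size;
    the result simply stacks all channels from all branches in order.
    """
    out = []
    for fm in feature_maps:
        out.extend(fm)
    return out

# Three branch outputs with 2, 3 and 1 channels of size 2x2 each.
zeros = [[0, 0], [0, 0]]
branches = [[zeros] * 2, [zeros] * 3, [zeros] * 1]
merged = concat_depthwise(branches)  # 2 + 3 + 1 = 6 channels
```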
  9. A fire video image recognition system, characterized in that the system comprises:
    a first acquisition unit, configured to acquire a data set, the data set being a data set of fire and non-fire video images;
    a construction unit, configured to construct a convolutional neural network, the convolutional neural network comprising one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
    a training unit, configured to train the convolutional neural network on the data set to obtain a fire video image recognition model;
    a second acquisition unit, configured to obtain a video to be recognized and split it into frames to obtain video images to be recognized;
    a recognition unit, configured to input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
  10. A computer device, comprising a processor and a memory for storing a program executable by the processor, characterized in that when the processor executes the program stored in the memory, the fire video image recognition method according to any one of claims 1-8 is implemented.
PCT/CN2022/084441 2022-03-31 2022-03-31 Fire video image recognition method and system, computer device, and storage medium WO2023184350A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210327700.6A CN114419558B (en) 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium
CN202210327700.6 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023184350A1 true WO2023184350A1 (en) 2023-10-05

Family

ID=81264231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084441 WO2023184350A1 (en) 2022-03-31 2022-03-31 Fire video image recognition method and system, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114419558B (en)
WO (1) WO2023184350A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593610A (en) * 2024-01-17 2024-02-23 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN112419650A (en) * 2020-11-11 2021-02-26 国网福建省电力有限公司电力科学研究院 Fire detection method and system based on neural network and image recognition technology

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
WO2018116966A1 (en) * 2016-12-21 2018-06-28 ホーチキ株式会社 Fire monitoring system
CN107292298B (en) * 2017-08-09 2018-04-20 北方民族大学 Ox face recognition method based on convolutional neural networks and sorter model
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN110059582B (en) * 2019-03-28 2023-04-07 东南大学 Driver behavior identification method based on multi-scale attention convolution neural network
US11182611B2 (en) * 2019-10-11 2021-11-23 International Business Machines Corporation Fire detection via remote sensing and mobile sensors
CN111553298B (en) * 2020-05-07 2021-02-05 卓源信息科技股份有限公司 Fire disaster identification method and system based on block chain
CN111639571B (en) * 2020-05-20 2023-05-23 浙江工商大学 Video action recognition method based on contour convolution neural network
CN112231974B (en) * 2020-09-30 2022-11-04 山东大学 Deep learning-based method and system for recovering seismic wave field characteristics of rock breaking seismic source of TBM (Tunnel boring machine)
CN113591591A (en) * 2021-07-05 2021-11-02 北京瑞博众成科技有限公司 Artificial intelligence field behavior recognition system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN112419650A (en) * 2020-11-11 2021-02-26 国网福建省电力有限公司电力科学研究院 Fire detection method and system based on neural network and image recognition technology

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117593610A (en) * 2024-01-17 2024-02-23 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media
CN117593610B (en) * 2024-01-17 2024-04-26 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media

Also Published As

Publication number Publication date
CN114419558A (en) 2022-04-29
CN114419558B (en) 2022-07-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934193

Country of ref document: EP

Kind code of ref document: A1