CN114419558B - Fire video image identification method, fire video image identification system, computer equipment and storage medium - Google Patents

Fire video image identification method, fire video image identification system, computer equipment and storage medium

Info

Publication number
CN114419558B
CN114419558B (application CN202210327700.6A)
Authority
CN
China
Prior art keywords
layer
module
video image
block
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210327700.6A
Other languages
Chinese (zh)
Other versions
CN114419558A (en)
Inventor
Ke Feng (柯峰)
Fang Enquan (方恩权)
Yang Liping (杨利萍)
Zhuang Zesheng (庄泽升)
Peng Dongliang (彭东亮)
Ma Yue (马跃)
He Dongdong (何冬冬)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Guangzhou Metro Group Co Ltd
Shenzhen Launch Digital Technology Co Ltd
Original Assignee
South China University of Technology SCUT
Guangzhou Metro Group Co Ltd
Shenzhen Launch Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Guangzhou Metro Group Co Ltd, Shenzhen Launch Digital Technology Co Ltd filed Critical South China University of Technology SCUT
Priority to CN202210327700.6A priority Critical patent/CN114419558B/en
Priority to PCT/CN2022/084441 priority patent/WO2023184350A1/en
Publication of CN114419558A publication Critical patent/CN114419558A/en
Application granted granted Critical
Publication of CN114419558B publication Critical patent/CN114419558B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (pattern recognition)
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 — Neural network architecture: combinations of networks
    • G06N3/047 — Neural network architecture: probabilistic or stochastic networks
    • G06N3/048 — Neural network architecture: activation functions
    • G06N3/08 — Neural networks: learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fire video image identification method, a fire video image identification system, computer equipment and a storage medium, wherein the fire video image identification method comprises the following steps: acquiring a data set, wherein the data set is a video image data set of fire and non-fire; constructing a convolutional neural network; training the convolutional neural network by using a data set to obtain a fire video image recognition model; acquiring a video to be identified, and performing framing processing on the video to be identified to obtain a video image to be identified; and inputting the video image to be identified into a fire video image identification model to realize fire video image identification. The invention can reduce the number of parameters of the network model, improve the detection efficiency and accuracy of the network model, and realize the rapid identification of the fire video image, thereby discovering the fire hazard in time and ensuring the personal and property safety.

Description

Fire video image identification method, fire video image identification system, computer equipment and storage medium
Technical Field
The invention relates to a fire video image identification method, a fire video image identification system, computer equipment and a storage medium, and belongs to the field of computer vision.
Background
With the improvement of China's economic and technological level, the population continues to grow and buildings are becoming ever more numerous and dense. The continued use of electricity and fuel increases the risk of fire, and the damage caused by fire grows accordingly. Since a fire causes economic losses to society and endangers public safety, fire detection technology must be specially studied so that a fire can be identified at its initial ignition, minimizing the losses it causes and protecting people's safety.
Traditional fire detection technologies mainly include smoke detection, temperature detection, light detection and gas detection; they identify the occurrence of a fire from its physical characteristics, such as the concentration of the smoke generated, the ambient temperature, the illumination intensity of the flame, and the concentrations of the O2 consumed by combustion and of generated gases such as CO and CO2. These traditional techniques have certain limitations. First, they are restricted to enclosed environments: in a large space where the physical characteristics change only slightly, the detection efficiency of the sensor decreases, and the time for gas, particles and other physical carriers to reach the sensor grows with distance, lengthening the detection time and preventing a timely alarm. Second, they are easily affected by the environment: changes in factors such as rain, snow and wind speed alter the physical characteristics of a fire scene and thereby degrade the detection accuracy of the sensor. Third, the cost is high: sensors are expensive and prone to corrosion, aging and even damage.
With the development of the information age, fire detection technology has begun to move toward intelligence, using image processing, artificial intelligence and related techniques to detect and identify extracted flame features. Meanwhile, video surveillance has developed continuously, and most areas now enjoy full monitoring coverage. Since images can intuitively reveal the fire source, fire behavior and other conditions, video-based fire detection technology is receiving increasing attention. However, existing artificial-intelligence-based fire detection models are complex, have excessive parameter counts and detect inefficiently, which hinders rapid fire detection. Finding a fire recognition model that is structurally simple, has few parameters and detects efficiently is therefore a problem of great interest to researchers.
Disclosure of Invention
In view of the above, the invention provides a fire video image recognition method, a fire video image recognition system, a computer device and a storage medium, which construct a fire video image recognition model from a new module combining multi-scale feature information, a network residual structure and depthwise separable convolution operations.
The invention aims to provide a fire video image identification method.
The second purpose of the invention is to provide a fire video image recognition system.
It is a third object of the invention to provide a computer device.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a fire video image identification method, the method comprising:
acquiring a data set, wherein the data set is a video image data set of fire and non-fire;
constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
training the convolutional neural network by using a data set to obtain a fire video image recognition model;
acquiring a video to be identified, and performing framing processing on the video to be identified to obtain a video image to be identified;
and inputting the video image to be identified into a fire video image identification model to realize fire video image identification.
Further, the three modules B are respectively a first module B, a second module B and a third module B; the two modules C are respectively a first module C and a second module C; the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A; and the four maximum pooling layers are respectively a first pooling layer, a second pooling layer, a third pooling layer and a fourth pooling layer;
the construction of the convolutional neural network specifically comprises the following steps:
sequentially connecting the input layer, module A, the first maximum pooling layer, the first module B, the first 1×1 convolution block A, the second maximum pooling layer, the first module C, the third maximum pooling layer, the second 1×1 convolution block A, the second module B, the fourth maximum pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer, thereby constructing the convolutional neural network.
Further, the module A comprises an input layer, a first feature extraction layer and an output layer; the module B comprises an input layer, a second feature extraction layer and an output layer; the module C comprises an input layer, a third feature extraction layer and an output layer.
Further, the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;
the first input channel is formed by sequentially connecting a first 3 × 3 volume block A, a second 3 × 3 volume block A and a third 3 × 3 volume block A;
the first output channel outputs a characteristic information matrix of a first 3 x 3 convolution block A;
the second output channel outputs a characteristic information matrix of a second 3 x 3 convolution block a;
the third output channel outputs a feature information matrix of a third 3 x 3 convolution block a.
Further, the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
the second input channel is a third 1×1 convolution block A;
the third input channel specifically comprises: a first 3×3 convolution block B and a second 3×3 convolution block B connected in sequence; the feature information matrices output by the first 3×3 convolution block B and the second 3×3 convolution block B are added, and the sum is connected to a third 3×3 convolution block B;
the fourth input channel is formed by sequentially connecting a fifth maximum pooling layer with a fourth 1×1 convolution block A;
the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
Further, the third feature extraction layer includes a first input/output channel, a second input/output channel, a third input/output channel, a fourth input/output channel, and a fifth input/output channel;
the first input/output channel is a fifth 1×1 convolution block A;
the second input/output channel is formed by sequentially connecting a fourth 3×3 convolution block B and a sixth 1×1 convolution block A;
the third input/output channel is formed by sequentially connecting a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A;
the fourth input/output channel is formed by sequentially connecting a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A;
and the fifth input/output channel is formed by sequentially connecting a sixth maximum pooling layer with a ninth 1×1 convolution block A.
Further, the convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer which are connected in sequence;
the activation function adopted by the second activation layer is ReLU6: ReLU6(x) = min(max(x, 0), 6);
the convolution layers in convolution block B employ a depthwise separable convolution operation.
Further, the input of the output layer is the depth-wise stitching (concatenation) of all feature information matrices output by the corresponding feature extraction layer.
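By way of illustration, depth stitching is concatenation along the channel (depth) dimension of the feature matrices. A minimal sketch in PyTorch (the framework and the channel counts are illustrative assumptions, not part of the claimed invention):

    import torch

    # Three feature information matrices with the same spatial size but
    # different channel counts (32, 64 and 96 -- illustrative values),
    # stitched in depth into a single 192-channel matrix.
    a, b, c = (torch.randn(1, n, 28, 28) for n in (32, 64, 96))
    stitched = torch.cat([a, b, c], dim=1)
    assert stitched.shape == (1, 192, 28, 28)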
The second purpose of the invention can be achieved by adopting the following technical scheme:
a fire video image recognition system, the system comprising:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a data set, and the data set is a video image data set of fire and non-fire;
a construction unit for constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
the training unit is used for training the convolutional neural network by using the data set to obtain a fire video image recognition model;
the second acquisition unit is used for acquiring the video to be identified and performing framing processing on the video to be identified to obtain a video image to be identified;
and the identification unit is used for inputting the video image to be identified into the fire video image identification model to realize fire video image identification.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the fire video image identification method.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the fire video image recognition method described above.
Compared with the prior art, the invention has the following beneficial effects:
(1) The fire video image recognition model built by the invention not only reduces the parameter count of the network model but also improves its detection efficiency and accuracy, realizing rapid recognition of fire video images, so that fire hazards can be discovered in time and personal and property safety ensured.
(2) The invention performs framing processing on the collected video to obtain a video image data set and then preprocesses the video image data, effectively mitigating problems such as insufficient illumination and shadows arising during acquisition by the monitoring equipment.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a fire video image recognition method according to embodiment 1 of the present invention.
Fig. 2 is a frame diagram of a fire video image recognition model according to embodiment 1 of the present invention.
Fig. 3 is a frame diagram of module a according to embodiment 1 of the present invention.
Fig. 4 is a frame diagram of module B of embodiment 1 of the present invention.
Fig. 5 is a frame diagram of module C according to embodiment 1 of the present invention.
FIG. 6 is a block diagram of convolution blocks A and B according to embodiment 1 of the present invention.
Fig. 7 is a bar chart showing the parameter number of each network model in embodiment 1 of the present invention.
Fig. 8 is a statistical graph of fire identification accuracy of each network model according to embodiment 1 of the present invention.
Fig. 9 is a flowchart of a fire video image recognition system according to embodiment 2 of the present invention.
Fig. 10 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort based on these embodiments fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the present embodiment provides a fire video image recognition method, which includes the following steps:
s101, acquiring a data set.
In this embodiment, videos of flames and non-flames are collected from the network, and the collected videos are then framed with the OpenCV library (in units of 12 frames), yielding labeled video image data sets of fire and non-fire.
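For concreteness, the framing step can be sketched with the OpenCV Python binding as follows. This is a minimal illustration assuming one frame is kept out of every 12 (one reading of "in units of 12 frames"); the file paths and naming scheme are hypothetical:

    import cv2
    import os

    def extract_frames(video_path, out_dir, step=12):
        # Save one frame out of every `step` frames of a video as a JPEG image.
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        index, saved = 0, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                cv2.imwrite(os.path.join(out_dir, "frame_%06d.jpg" % saved), frame)
                saved += 1
            index += 1
        cap.release()
        return saved

    # e.g. extract_frames("fire_clip.mp4", "dataset/fire")  # hypothetical paths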
Further, this embodiment divides the data set into a training set and a test set using a script, and performs data enhancement on the training set, including random rotation, mirroring and random cropping.
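A sketch of this training-set enhancement using torchvision transforms; the rotation range, flip probability and crop size are assumptions, as the embodiment does not specify them:

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomRotation(degrees=15),    # random rotation (range assumed)
        transforms.RandomHorizontalFlip(p=0.5),   # mirroring
        transforms.RandomResizedCrop(size=224),   # random cropping (size assumed)
        transforms.ToTensor(),
    ])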
And S102, constructing a convolutional neural network.
As shown in fig. 2, the convolutional neural network in this embodiment includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer. The three modules B are respectively a first module B, a second module B and a third module B; the two modules C are respectively a first module C and a second module C; the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A; and the four maximum pooling layers are respectively a first pooling layer, a second pooling layer, a third pooling layer and a fourth pooling layer.
In this embodiment, the input layer, module A, the first maximum pooling layer, the first module B, the first 1×1 convolution block A, the second maximum pooling layer, the first module C, the third maximum pooling layer, the second 1×1 convolution block A, the second module B, the fourth maximum pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer are connected in sequence, thereby constructing the convolutional neural network.
In this embodiment, the convolution layers used in the first 1×1 convolution block A and the second 1×1 convolution block A have a stride of 1 and padding of 0.
Further, as shown in fig. 3, module A in this embodiment includes an input layer, a first feature extraction layer and an output layer, wherein the first feature extraction layer includes a first input channel, a first output channel, a second output channel and a third output channel. Specifically, the first input channel is formed by sequentially connecting a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A; the first output channel outputs the feature information matrix of the first 3×3 convolution block A; the second output channel outputs the feature information matrix of the second 3×3 convolution block A; and the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
In module A, the convolution layer of the first 3×3 convolution block A has a stride of 2 and padding of 1; the convolution layers of the second 3×3 convolution block A and the third 3×3 convolution block A have a stride of 1 and padding of 1.
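By way of illustration, module A can be sketched in PyTorch as below, with convolution block A implemented as convolution, batch normalization and ReLU6 as described later in this embodiment; the channel widths c1, c2 and c3 are placeholders, not values from Table 1:

    import torch
    import torch.nn as nn

    def conv_block_a(cin, cout, k, s=1, p=0):
        # Convolution block A: standard convolution -> batch normalization -> ReLU6
        return nn.Sequential(nn.Conv2d(cin, cout, k, s, p, bias=False),
                             nn.BatchNorm2d(cout), nn.ReLU6(inplace=True))

    class ModuleA(nn.Module):
        # Three chained 3x3 convolution blocks A; the outputs of all three
        # blocks are stitched (concatenated) along the depth dimension.
        def __init__(self, cin, c1, c2, c3):
            super().__init__()
            self.b1 = conv_block_a(cin, c1, 3, s=2, p=1)  # first 3x3 block A
            self.b2 = conv_block_a(c1, c2, 3, s=1, p=1)   # second 3x3 block A
            self.b3 = conv_block_a(c2, c3, 3, s=1, p=1)   # third 3x3 block A

        def forward(self, x):
            y1 = self.b1(x)
            y2 = self.b2(y1)
            y3 = self.b3(y2)
            return torch.cat([y1, y2, y3], dim=1)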
Further, as shown in fig. 4, module B in this embodiment includes an input layer, a second feature extraction layer and an output layer; the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel. Specifically, the second input channel is a third 1×1 convolution block A; the third input channel comprises a first 3×3 convolution block B and a second 3×3 convolution block B connected in sequence, with the feature information matrices output by the first and second 3×3 convolution blocks B added and the sum connected to a third 3×3 convolution block B; the fourth input channel is formed by sequentially connecting a fifth maximum pooling layer with a fourth 1×1 convolution block A; the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A in the second input channel; the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B; the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B; the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B; and the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A in the fourth input channel.
In module B, the convolution layers used in the first 3×3 convolution block B, the second 3×3 convolution block B and the third 3×3 convolution block B all have a stride of 1 and padding of 1; the convolution layers used in the third 1×1 convolution block A and the fourth 1×1 convolution block A all have a stride of 1 and no padding.
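Module B can likewise be sketched as below. Here convolution block B is a channel-preserving depthwise 3×3 convolution followed by batch normalization and ReLU6, per the later description of the blocks; the 1×1 output widths ca and cb are placeholders:

    import torch
    import torch.nn as nn

    def conv_block_a(cin, cout, k, s=1, p=0):   # standard conv -> BN -> ReLU6
        return nn.Sequential(nn.Conv2d(cin, cout, k, s, p, bias=False),
                             nn.BatchNorm2d(cout), nn.ReLU6(inplace=True))

    def conv_block_b(c):                        # depthwise 3x3 conv -> BN -> ReLU6
        return nn.Sequential(nn.Conv2d(c, c, 3, 1, 1, groups=c, bias=False),
                             nn.BatchNorm2d(c), nn.ReLU6(inplace=True))

    class ModuleB(nn.Module):
        # A 1x1 branch, a residual chain of three depthwise 3x3 blocks, and a
        # max-pool + 1x1 branch; the five feature matrices are depth-stitched.
        def __init__(self, cin, ca, cb):
            super().__init__()
            self.p1 = conv_block_a(cin, ca, 1)   # third 1x1 convolution block A
            self.b1 = conv_block_b(cin)          # first 3x3 convolution block B
            self.b2 = conv_block_b(cin)          # second 3x3 convolution block B
            self.b3 = conv_block_b(cin)          # third 3x3 convolution block B
            self.pool = nn.MaxPool2d(3, 1, 1)    # fifth maximum pooling layer
            self.p2 = conv_block_a(cin, cb, 1)   # fourth 1x1 convolution block A

        def forward(self, x):
            out4 = self.p1(x)
            y1 = self.b1(x)
            y2 = self.b2(y1)
            y3 = self.b3(y1 + y2)   # the added outputs feed the third block
            out8 = self.p2(self.pool(x))
            return torch.cat([out4, y1, y2, y3, out8], dim=1)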
Further, as shown in fig. 5, module C in this embodiment includes an input layer, a third feature extraction layer and an output layer; the third feature extraction layer comprises a first input/output channel, a second input/output channel, a third input/output channel, a fourth input/output channel and a fifth input/output channel. Specifically, the first input/output channel is a fifth 1×1 convolution block A; the second input/output channel is formed by sequentially connecting a fourth 3×3 convolution block B and a sixth 1×1 convolution block A; the third input/output channel is formed by sequentially connecting a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A; the fourth input/output channel is formed by sequentially connecting a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A; and the fifth input/output channel is formed by sequentially connecting a sixth maximum pooling layer with a ninth 1×1 convolution block A.
In module C, the convolution layers used in the fourth 3×3 convolution block B, the fifth 3×3 convolution block B, the sixth 3×3 convolution block B, the seventh 3×3 convolution block B, the eighth 3×3 convolution block B and the ninth 3×3 convolution block B all have a stride of 1 and padding of 1; the convolution layers used in the fifth 1×1 convolution block A, the sixth 1×1 convolution block A, the seventh 1×1 convolution block A, the eighth 1×1 convolution block A and the ninth 1×1 convolution block A all have a stride of 1 and no padding.
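Module C follows an Inception-style multi-branch layout and can be sketched as below, reusing the conv_block_a and conv_block_b helpers from the sketches above; the output widths c1 through c5 are again placeholders:

    import torch
    import torch.nn as nn

    # conv_block_a / conv_block_b are the helpers sketched for modules A and B.

    class ModuleC(nn.Module):
        # Five parallel input/output channels whose outputs are depth-stitched.
        def __init__(self, cin, c1, c2, c3, c4, c5):
            super().__init__()
            self.br1 = conv_block_a(cin, c1, 1)                  # fifth 1x1 A
            self.br2 = nn.Sequential(conv_block_b(cin),
                                     conv_block_a(cin, c2, 1))   # sixth 1x1 A
            self.br3 = nn.Sequential(conv_block_b(cin), conv_block_b(cin),
                                     conv_block_a(cin, c3, 1))   # seventh 1x1 A
            self.br4 = nn.Sequential(conv_block_b(cin), conv_block_b(cin),
                                     conv_block_b(cin),
                                     conv_block_a(cin, c4, 1))   # eighth 1x1 A
            self.br5 = nn.Sequential(nn.MaxPool2d(3, 1, 1),      # sixth max pool
                                     conv_block_a(cin, c5, 1))   # ninth 1x1 A

        def forward(self, x):
            branches = (self.br1, self.br2, self.br3, self.br4, self.br5)
            return torch.cat([br(x) for br in branches], dim=1)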
In this embodiment, the first activation layer is the activation layer in module B, and the second activation layer is the activation layer in convolution block A and convolution block B.
The input layers in this embodiment all receive the output of the preceding layer. The input of each output layer is the depth-wise stitching of all feature information matrices output by the corresponding feature extraction layer, specifically: in module A, the input of the output layer is the depth-wise stitching of the feature information matrices output by the three output channels; in module B, it is the depth-wise stitching of the feature information matrices output by the five output channels; module C is analogous, and the description is not repeated.
Further, as shown in fig. 6, convolution block A and convolution block B each comprise a convolution layer, a batch normalization (BN) layer and a second activation layer connected in sequence. The convolution layer in convolution block A uses a standard convolution operation, the convolution layer in convolution block B uses a depthwise separable convolution operation, and the activation function adopted by the second activation layer is in both cases ReLU6: ReLU6(x) = min(max(x, 0), 6).
The depthwise separable convolution in this embodiment is specifically as follows: each convolution kernel has a single channel, and the number of channels of the input feature matrix = the number of convolution kernels = the number of channels of the output feature matrix.
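The channel-preserving property of this depthwise convolution can be checked directly; in PyTorch such a convolution is expressed by setting groups equal to the channel count (sizes below are illustrative):

    import torch
    import torch.nn as nn

    c = 32
    # Depthwise 3x3 convolution: each kernel has a single channel, and the
    # number of kernels equals the number of input channels.
    depthwise = nn.Conv2d(c, c, kernel_size=3, stride=1, padding=1,
                          groups=c, bias=False)
    x = torch.randn(1, c, 56, 56)
    assert depthwise(x).shape == x.shape                     # channels preserved
    print(sum(p.numel() for p in depthwise.parameters()))    # 32*1*3*3 = 288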
In this embodiment, the first, second, third, fourth, fifth and sixth maximum pooling layers are all of size 3×3 with a stride of 1 and padding of 1; the dropout layer randomly deactivates 40% of the neurons; and the fully connected layer has 2 neurons.
The specific parameter conditions of the convolutional neural network in this embodiment are shown in table 1:
table 1 shows the specific parameter conditions of the convolutional neural network
(Table 1 appears as an image in the original publication; the per-layer parameter values are not reproduced here.)
Wherein: 3×3-1A, 3×3-2A and 3×3-3A denote the first, second and third 3×3 convolution blocks A in module A, respectively; 1×1-1B and 1×1-2B denote, respectively, the third 1×1 convolution block A in the fourth output channel and the fourth 1×1 convolution block A in the eighth output channel of module B; 1×1-1C, 1×1-2C, 1×1-3C, 1×1-4C and 1×1-5C denote, respectively, the fifth 1×1 convolution block A in the first input/output channel, the sixth 1×1 convolution block A in the second input/output channel, the seventh 1×1 convolution block A in the third input/output channel, the eighth 1×1 convolution block A in the fourth input/output channel and the ninth 1×1 convolution block A in the fifth input/output channel of module C; 1×1 denotes a 1×1 convolution block.
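Putting the pieces together, the overall layer order of fig. 2 can be sketched as below, reusing the ModuleA, ModuleB, ModuleC and conv_block_a sketches above. All channel widths are placeholders chosen only for internal consistency; the actual per-layer parameters are those of Table 1, which is available only as an image:

    import torch.nn as nn

    class FireRecognitionNet(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                ModuleA(3, 32, 32, 32),            # module A -> 96 channels
                nn.MaxPool2d(3, 1, 1),             # first maximum pooling layer
                ModuleB(96, 32, 32),               # first module B -> 352
                conv_block_a(352, 128, 1),         # first 1x1 convolution block A
                nn.MaxPool2d(3, 1, 1),             # second maximum pooling layer
                ModuleC(128, 32, 32, 32, 32, 32),  # first module C -> 160
                nn.MaxPool2d(3, 1, 1),             # third maximum pooling layer
                conv_block_a(160, 128, 1),         # second 1x1 convolution block A
                ModuleB(128, 32, 32),              # second module B -> 448
                nn.MaxPool2d(3, 1, 1),             # fourth maximum pooling layer
                ModuleC(448, 32, 32, 32, 32, 32),  # second module C -> 160
                ModuleB(160, 32, 32),              # third module B -> 544
            )
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),           # adaptive average pooling layer
                nn.Dropout(p=0.4),                 # dropout: 40% deactivation
                nn.Flatten(),                      # flatten layer
                nn.Linear(544, num_classes),       # fully connected layer (2 neurons)
                nn.Softmax(dim=1),                 # softmax classification layer
            )

        def forward(self, x):
            return self.head(self.features(x))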
S103, training the convolutional neural network by using the data set to obtain a fire video image recognition model.
The training set obtained in step S101 is input into the fire video image recognition model for training, and the network parameters are adjusted to obtain a pre-trained model (the trained fire video image recognition model); the test set obtained in step S101 is then input into the pre-trained model to obtain the recognition accuracy.
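A minimal training and evaluation sketch for this step, assuming the FireRecognitionNet sketch above. The optimizer, learning rate, batch size and the dummy DataLoaders are assumptions (the embodiment fixes only 300 epochs); because the sketched network ends in a softmax layer, NLLLoss is applied to its logarithm, which is equivalent to cross-entropy on logits:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    model = FireRecognitionNet(num_classes=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.NLLLoss()

    # Dummy stand-ins for the real fire / non-fire loaders (illustrative only).
    train_loader = DataLoader(TensorDataset(torch.randn(8, 3, 224, 224),
                                            torch.randint(0, 2, (8,))),
                              batch_size=4)
    test_loader = train_loader

    for epoch in range(300):                 # 300 epochs as in the experiments
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            probs = model(images)            # softmax outputs
            loss = criterion(torch.log(probs + 1e-8), labels)
            loss.backward()
            optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            pred = model(images).argmax(dim=1)
            correct += (pred == labels).sum().item()
            total += labels.numel()
    print("test accuracy: %.4f" % (correct / total))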
A performance test of the fire video image recognition model gave the following results:
as shown in fig. 7, the fire video image recognition model parameters are much smaller than those of other classical convolutional neural network models, and the model parameters are 1.02% of those of VGG19, 23.80% of GoogleNet, and 6.68% of those of resnet 34.
As shown in fig. 8, the performance of the fire video image recognition model on the test set is far better than that of other classical convolutional neural network models. Specifically, under the same 300 training epochs, the highest fire identification accuracy of the model is 97.06%, which is 2.31% higher than that of the classical GoogLeNet and 0.85% higher than that of ResNet34.
S104, acquiring the video to be identified, and performing framing processing on the video to be identified to obtain the video image to be identified.
S105, inputting the video image to be identified into the fire video image identification model to realize fire video image identification.
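Steps S104 and S105 can be sketched as below, again assuming the model sketched above. The video path, input size, preprocessing and the convention that class index 1 means "fire" are all assumptions; the preprocessing must match whatever was used in training:

    import cv2
    import torch

    model.eval()                                   # trained model from above
    cap = cv2.VideoCapture("surveillance.mp4")     # hypothetical video path
    index = 0
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % 12 == 0:                    # framing: one frame per 12
                rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)),
                                   cv2.COLOR_BGR2RGB)
                x = (torch.from_numpy(rgb).permute(2, 0, 1)
                          .float().div(255).unsqueeze(0))
                prob_fire = model(x)[0, 1].item()  # assumes class 1 = "fire"
                if prob_fire > 0.5:
                    print("frame %d: fire detected (p=%.2f)" % (index, prob_fire))
            index += 1
    cap.release()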
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
Example 2:
as shown in fig. 9, the present embodiment provides a fire video image recognition system, which includes a first acquiring unit 901, a constructing unit 902, a training unit 903, a second acquiring unit 904, and a recognition unit 905, and the specific functions of each unit are as follows:
a first acquiring unit 901, configured to acquire a data set, where the data set is a video image data set of a fire and a non-fire;
a constructing unit 902, configured to construct a convolutional neural network, wherein the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
a training unit 903, configured to train the convolutional neural network by using a data set to obtain a fire video image recognition model;
a second obtaining unit 904, configured to obtain a video to be identified, and perform framing processing on the video to be identified to obtain a video image to be identified;
and the identification unit 905 is used for inputting the video image to be identified into the fire video image identification model to realize fire video image identification.
For the specific implementation of each unit in this embodiment, refer to embodiment 1; it is not repeated here. It should be noted that the system provided in this embodiment is illustrated only by the above division of functional units; in practical applications, the functions may be allocated to different functional units as needed, that is, the internal structure may be divided into different functional units to complete all or part of the functions described above.
Example 3:
as shown in fig. 10, the present embodiment provides a computer apparatus including a processor 1002, a memory, an input device 1003, a display device 1004, and a network interface 1005 connected by a system bus 1001. The processor 1002 is configured to provide computing and control capabilities, the memory includes a nonvolatile storage medium 1006 and an internal memory 1007, the nonvolatile storage medium 1006 stores an operating system, a computer program, and a database, the internal memory 1007 provides an environment for the operating system and the computer program in the nonvolatile storage medium 1006 to run, and when the computer program is executed by the processor 1002, the fire video image recognition method of embodiment 1 is implemented as follows:
acquiring a data set, wherein the data set is a video image data set of fire and non-fire;
constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
training the convolutional neural network by using a data set to obtain a fire video image recognition model;
acquiring a video to be identified, and performing framing processing on the video to be identified to obtain a video image to be identified;
and inputting the video image to be identified into a fire video image identification model to realize fire video image identification.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the computer program is executed by a processor, the fire video image recognition method of embodiment 1 is implemented as follows:
acquiring a data set, wherein the data set is a video image data set of fire and non-fire;
constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
training the convolutional neural network by using a data set to obtain a fire video image recognition model;
acquiring a video to be identified, and performing framing processing on the video to be identified to obtain a video image to be identified;
and inputting the video image to be identified into a fire video image identification model to realize fire video image identification.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In this embodiment, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. A computer readable signal medium, by contrast, may include a propagated data signal with a computer readable program embodied therein, for example in baseband or as part of a carrier wave; such a propagated data signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device. The computer program embodied on the computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer program for performing the present embodiments may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Python and C++, and conventional procedural programming languages such as C. The program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In summary, the invention frames the acquired video to obtain a video image data set and then preprocesses the video image data, effectively mitigating problems such as insufficient illumination and shadows during acquisition by the monitoring equipment. In addition, the constructed fire video image recognition model not only reduces the parameter count of the network model but also improves its detection efficiency and accuracy, realizing rapid recognition of fire video images so that fire hazards can be discovered in time and personal and property safety ensured.
The above description is only for the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto, and any person skilled in the art can substitute or change the technical solution and the inventive concept of the present invention within the scope of the present invention.

Claims (8)

1. A fire video image recognition method, the method comprising:
acquiring a data set, wherein the data set is a video image data set of fire and non-fire;
constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
training the convolutional neural network by using a data set to obtain a fire video image recognition model;
acquiring a video to be identified, and performing framing processing on the video to be identified to obtain a video image to be identified;
inputting a video image to be identified into a fire video image identification model to realize fire video image identification;
the module A comprises an input layer, a first feature extraction layer and an output layer; the module B comprises an input layer, a second feature extraction layer and an output layer; the module C comprises an input layer, a third feature extraction layer and an output layer;
the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;
the first input channel is formed by sequentially connecting a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A;
the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
2. The fire video image recognition method according to claim 1, wherein the three modules B are respectively a first module B, a second module B and a third module B; the two modules C are respectively a first module C and a second module C; the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A; and the four maximum pooling layers are respectively a first pooling layer, a second pooling layer, a third pooling layer and a fourth pooling layer;
the construction of the convolutional neural network is specifically as follows:
the input layer, module A, the first maximum pooling layer, the first module B, the first 1×1 convolution block A, the second maximum pooling layer, the first module C, the third maximum pooling layer, the second 1×1 convolution block A, the second module B, the fourth maximum pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer are connected in sequence, thereby constructing the convolutional neural network.
3. The fire video image recognition method according to claim 1, wherein the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
the second input channel is a third 1×1 convolution block A;
the third input channel specifically comprises: a first 3×3 convolution block B and a second 3×3 convolution block B connected in sequence; the feature information matrices output by the first 3×3 convolution block B and the second 3×3 convolution block B are added, and the sum is connected to a third 3×3 convolution block B;
the fourth input channel is formed by sequentially connecting a fifth maximum pooling layer with a fourth 1×1 convolution block A;
the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
4. The fire video image recognition method according to claim 1, wherein the third feature extraction layer comprises a first input/output channel, a second input/output channel, a third input/output channel, a fourth input/output channel, and a fifth input/output channel;
the first input/output channel is a fifth 1×1 convolution block A;
the second input/output channel is formed by sequentially connecting a fourth 3×3 convolution block B and a sixth 1×1 convolution block A;
the third input/output channel is formed by sequentially connecting a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A;
the fourth input/output channel is formed by sequentially connecting a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A;
and the fifth input/output channel is formed by sequentially connecting a sixth maximum pooling layer with a ninth 1×1 convolution block A.
5. The fire video image recognition method according to any one of claims 3 to 4, wherein the convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer connected in sequence;
the activation function adopted by the second activation layer is ReLU6: ReLU6(x) = min(max(x, 0), 6);
the convolutional layers in the convolutional block B employ a depth separable convolution operation.
6. The fire video image recognition method according to claim 1, wherein the input of the output layer is the depth-wise stitching of all feature information matrices output by the corresponding feature extraction layer.
7. A fire video image recognition system, the system comprising:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a data set, and the data set is a video image data set of fire and non-fire;
a construction unit for constructing a convolutional neural network, wherein the convolutional neural network comprises an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
the training unit is used for training the convolutional neural network by using a data set to obtain a fire video image recognition model;
the second acquisition unit is used for acquiring the video to be identified and performing framing processing on the video to be identified to obtain a video image to be identified;
the identification unit is used for inputting the video image to be identified into a fire video image identification model to realize fire video image identification;
the module A comprises an input layer, a first feature extraction layer and an output layer; the module B comprises an input layer, a second feature extraction layer and an output layer; the module C comprises an input layer, a third feature extraction layer and an output layer;
the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;
the first input channel is formed by sequentially connecting a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A;
the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
8. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the fire video image recognition method according to any one of claims 1 to 6.
CN202210327700.6A 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium Active CN114419558B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210327700.6A CN114419558B (en) 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium
PCT/CN2022/084441 WO2023184350A1 (en) 2022-03-31 2022-03-31 Fire video image recognition method and system, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210327700.6A CN114419558B (en) 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114419558A CN114419558A (en) 2022-04-29
CN114419558B (en) 2022-07-05

Family

ID: 81264231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210327700.6A Active CN114419558B (en) 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114419558B (en)
WO (1) WO2023184350A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593610B (en) * 2024-01-17 2024-04-26 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292298A (en) * 2017-08-09 2017-10-24 北方民族大学 Ox face recognition method based on convolutional neural networks and sorter model
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN111553298A (en) * 2020-05-07 2020-08-18 北京天仪百康科贸有限公司 Fire disaster identification method and system based on block chain
CN112231974A (en) * 2020-09-30 2021-01-15 山东大学 TBM rock breaking seismic source seismic wave field characteristic recovery method and system based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6968681B2 (en) * 2016-12-21 2021-11-17 ホーチキ株式会社 Fire monitoring system
US11182611B2 (en) * 2019-10-11 2021-11-23 International Business Machines Corporation Fire detection via remote sensing and mobile sensors
CN111639571B (en) * 2020-05-20 2023-05-23 浙江工商大学 Video action recognition method based on contour convolution neural network
CN112419650A (en) * 2020-11-11 2021-02-26 国网福建省电力有限公司电力科学研究院 Fire detection method and system based on neural network and image recognition technology
CN113591591A (en) * 2021-07-05 2021-11-02 北京瑞博众成科技有限公司 Artificial intelligence field behavior recognition system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292298A (en) * 2017-08-09 2017-10-24 北方民族大学 Ox face recognition method based on convolutional neural networks and sorter model
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN111553298A (en) * 2020-05-07 2020-08-18 北京天仪百康科贸有限公司 Fire disaster identification method and system based on block chain
CN112231974A (en) * 2020-09-30 2021-01-15 山东大学 TBM rock breaking seismic source seismic wave field characteristic recovery method and system based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
G Ciaburro. Sound event detection in underground parking garage using convolutional neural network. Big Data and Cognitive Computing, 2020, 4(3). *
Shi Haishan et al. Fire image recognition and application based on genetic neural networks. Computer Science, 2006, (11). *
Wu Zhe. Research on detection, recognition and tracking of waterway ships against dynamic backgrounds based on deep learning. China Master's Theses Full-text Database (Engineering Science and Technology II), 2020, (3). *
Wu Xue et al. Fire recognition with convolutional neural networks based on data augmentation. Science Technology and Engineering, 2020, (03). *

Also Published As

Publication number Publication date
WO2023184350A1 (en) 2023-10-05
CN114419558A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN107862270B (en) Face classifier training method, face detection method and device and electronic equipment
CN109492612A (en) Fall detection method and its falling detection device based on skeleton point
WO2021104125A1 (en) Abnormal egg identification method, device and system, storage medium, and electronic device
CN110162462A (en) Test method, system and the computer equipment of face identification system based on scene
CN111931719B (en) High-altitude parabolic detection method and device
Liu et al. Visual smoke detection based on ensemble deep cnns
CN114419558B (en) Fire video image identification method, fire video image identification system, computer equipment and storage medium
CN116343301B (en) Personnel information intelligent verification system based on face recognition
CN111242868A (en) Image enhancement method based on convolutional neural network under dark vision environment
CN117040109A (en) Substation room risk early warning method based on heterogeneous data feature fusion
Zheng et al. A lightweight algorithm capable of accurately identifying forest fires from UAV remote sensing imagery
Chen et al. A novel smoke detection algorithm based on improved mixed Gaussian and YOLOv5 for textile workshop environments
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN114067268A (en) Method and device for detecting safety helmet and identifying identity of electric power operation site
CN112989932A (en) Improved prototype network-based less-sample forest fire smoke identification method and device
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN115358952B (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN116563762A (en) Fire detection method, system, medium, equipment and terminal for oil and gas station
CN113111804B (en) Face detection method and device, electronic equipment and storage medium
CN115760616A (en) Human body point cloud repairing method and device, electronic equipment and storage medium
CN114299475A (en) Method for detecting corrosion of damper and related equipment
CN114881103A (en) Countermeasure sample detection method and device based on universal disturbance sticker
CN114882557A (en) Face recognition method and device
CN111401317A (en) Video classification method, device, equipment and storage medium
Ma et al. Smoke Detection Algorithm based on Negative Sample Mining.

Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant