WO2023184350A1 - Fire video image recognition method and system, computer device, and storage medium - Google Patents

Fire video image recognition method and system, computer device, and storage medium Download PDF

Info

Publication number
WO2023184350A1
WO2023184350A1 PCT/CN2022/084441
Authority
WO
WIPO (PCT)
Prior art keywords
layer
convolution block
video image
input
module
Prior art date
Application number
PCT/CN2022/084441
Other languages
French (fr)
Chinese (zh)
Inventor
柯峰
方恩权
杨利萍
庄泽升
彭东亮
马跃
何冬冬
Original Assignee
华南理工大学
广州地铁集团有限公司
深圳市朗驰欣创科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华南理工大学, 广州地铁集团有限公司, 深圳市朗驰欣创科技股份有限公司
Publication of WO2023184350A1 publication Critical patent/WO2023184350A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the invention relates to a fire video image recognition method, system, computer equipment and storage medium, and belongs to the field of computer vision.
  • Traditional fire detection technologies mainly include smoke, temperature, light and gas detection. They identify a fire mainly from its physical characteristics, such as the concentration of the smoke produced, the ambient temperature, the light intensity of the flame, and the concentrations of the O2 consumed by combustion and of the CO, CO2 and other gases produced.
  • Traditional fire detection technology has certain limitations. First, it is restricted to enclosed environments: in a large area the changes in physical characteristics are not obvious, sensor detection efficiency drops, and the time for gas, particles and other physical signals to reach the sensor grows with distance, so detection takes longer and timely reporting is impossible. Second, it is susceptible to the environment: changes in environmental factors such as rain, snow and wind speed alter the physical characteristics of the fire scene and thus reduce the accuracy of sensor detection. Third, the cost is high: sensors are expensive and prone to corrosion, aging and even damage.
  • the present invention provides a fire video image recognition method, system, computer equipment and storage medium, which constitutes a new module by fusing multi-scale feature information, network residual structure and depth separable convolution operation.
  • this network not only reduces the number of network model parameters, but also improves the detection efficiency and accuracy of the network model.
  • the first object of the present invention is to provide a fire video image recognition method.
  • the second object of the present invention is to provide a fire video image recognition system.
  • a third object of the present invention is to provide a computer device.
  • the fourth object of the present invention is to provide a storage medium.
  • a fire video image recognition method includes:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the three modules B are respectively the first module B, the second module B and the third module B;
  • the two modules C are respectively the first module C and the second module C;
  • the two 1×1 convolution blocks A are respectively the first 1×1 convolution block A and the second 1×1 convolution block A;
  • the four maximum pooling layers are respectively the first pooling layer, the second pooling layer, the third pooling layer and the fourth pooling layer.
  • the module A includes an input layer, a first feature extraction layer and an output layer;
  • the module B includes an input layer, a second feature extraction layer and an output layer;
  • the module C includes an input layer, a third feature extraction layer and an output layer.
  • the first feature extraction layer includes a first input channel, a first output channel, a second output channel and a third output channel;
  • the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;
  • the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
  • the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
  • the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  • the second feature extraction layer includes a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
  • the second input channel is the third 1×1 convolution block A;
  • the third input channel is specifically: the first 3×3 convolution block B and the second 3×3 convolution block B are connected in sequence; the feature information matrix output by the first 3×3 convolution block B is added to the feature information matrix output by the second 3×3 convolution block B, and the sum is then connected to the first activation layer and the third 3×3 convolution block B in sequence;
  • the fourth input channel is the fifth maximum pooling layer and the fourth 1×1 convolution block A connected in sequence;
  • the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
  • the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
  • the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
  • the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
  • the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  • the third feature extraction layer includes a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;
  • the first input-output channel is the fifth 1×1 convolution block A;
  • the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence;
  • the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence;
  • the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence;
  • the fifth input-output channel is the sixth maximum pooling layer and the ninth 1×1 convolution block A connected in sequence.
  • the convolution block B includes a convolution layer, a batch normalization layer and a second activation layer connected in sequence;
  • the convolution layer in the convolution block B uses a depthwise separable convolution operation.
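The parameter saving from the depthwise separable convolution in block B can be illustrated with a quick count (a sketch; the channel sizes here are hypothetical and not taken from the patent):

```python
def standard_conv_params(k, c_in, c_out):
    # a k x k standard convolution mixes all input channels for every output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # one k x k filter per input channel, then a 1x1 pointwise convolution across channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768 parameters
print(std, sep, round(sep / std, 3))
```

For these illustrative sizes the separable form uses roughly 12% of the parameters of an ordinary 3×3 convolution, which is consistent with the patent's claim that block B reduces the model's parameter count.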
  • the input of the output layer is the depth concatenation (depth splicing) of all the feature information matrices output in the corresponding feature extraction layer.
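The "depth splicing" of the output layer is a concatenation along the channel (depth) axis; for example, in module A the three tapped feature maps would be stacked as follows (a minimal numpy sketch with made-up channel counts):

```python
import numpy as np

# three feature maps tapped from the three 3x3 convolution blocks A,
# each shaped (channels, height, width); channel counts are illustrative
f1 = np.zeros((16, 56, 56))
f2 = np.zeros((32, 56, 56))
f3 = np.zeros((64, 56, 56))

# depth splicing: concatenate along the channel (depth) axis
out = np.concatenate([f1, f2, f3], axis=0)
print(out.shape)  # (112, 56, 56)
```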
  • a fire video image recognition system includes:
  • the first acquisition unit is used to acquire a data set, where the data set is a fire and non-fire video image data set;
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the training unit is used to train the convolutional neural network using the data set to obtain the fire video image recognition model
  • the second acquisition unit is used to acquire the video to be recognized and perform frame processing on the video to be recognized to obtain the video image to be recognized;
  • the recognition unit is used to input the video image to be recognized into the fire video image recognition model to realize fire video image recognition.
  • a computer device includes a processor and a memory for storing a program executable by the processor;
  • when the processor executes the program stored in the memory, it implements the above fire video image recognition method.
  • a storage medium stores a program;
  • when the program is executed by a processor, the above fire video image recognition method is implemented.
  • the present invention has the following beneficial effects:
  • the fire video image recognition model built by the present invention can not only reduce the number of network model parameters but also improve the detection efficiency and accuracy of the network model, thereby realizing rapid recognition of fire video images so that fire hazards can be discovered in time and personal and property safety ensured.
  • the present invention obtains a video image data set by performing frame processing on the collected video, and then preprocesses the video image data, thereby effectively solving problems such as insufficient lighting and shadows that exist during the collection process of monitoring equipment.
  • Figure 1 is a flow chart of the fire video image recognition method in Embodiment 1 of the present invention.
  • Figure 2 is a frame diagram of the fire video image recognition model in Embodiment 1 of the present invention.
  • FIG. 3 is a frame diagram of module A according to Embodiment 1 of the present invention.
  • FIG. 4 is a frame diagram of module B in Embodiment 1 of the present invention.
  • FIG. 5 is a frame diagram of module C in Embodiment 1 of the present invention.
  • Figure 6 is a frame diagram of convolution blocks A and B in Embodiment 1 of the present invention.
  • Figure 7 is a bar chart showing the parameters of each network model in Embodiment 1 of the present invention.
  • Figure 8 is a statistical graph showing the fire identification accuracy of each network model in Embodiment 1 of the present invention.
  • Figure 9 is a flow chart of the fire video image recognition system in Embodiment 2 of the present invention.
  • Figure 10 is a structural block diagram of a computer device according to Embodiment 3 of the present invention.
  • this embodiment provides a fire video image recognition method, which includes the following steps:
  • This embodiment collects flame and non-flame videos through the network, and then uses the opencv library to perform frame processing on the collected flame and non-flame videos (in units of 12 frames), thereby obtaining a labeled fire and non-fire video image data set.
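Taking "in units of 12 frames" to mean that one image is kept every 12 frames (an assumption about the sampling scheme; in practice the frames would be read with OpenCV's `cv2.VideoCapture`), the indices of the extracted frames can be computed as:

```python
def sampled_frame_indices(total_frames, unit=12):
    """Indices of the frames kept when sampling one frame per `unit` frames."""
    return list(range(0, total_frames, unit))

# e.g. a 5-second clip at 24 fps has 120 frames -> 10 sampled images
print(sampled_frame_indices(120))  # [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]
```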
  • this embodiment uses a script to divide the above data set into a training set and a test set, and perform data enhancement on the training set, where the data enhancement includes random rotation, mirroring, and random cropping.
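The mirroring and random-cropping augmentations mentioned above can be sketched directly on numpy arrays (image and crop sizes are illustrative; the embodiment does not fix them):

```python
import numpy as np

rng = np.random.default_rng(0)

def mirror(img):
    # horizontal mirroring: flip the width axis of an (H, W, C) image
    return img[:, ::-1, :]

def random_crop(img, out_h, out_w):
    # pick a random top-left corner and cut an out_h x out_w patch
    h, w, _ = img.shape
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w, :]

img = rng.random((224, 224, 3))
print(mirror(img).shape, random_crop(img, 200, 200).shape)
```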
  • the convolutional neural network in this embodiment includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the three modules B are respectively the first module B, the second module B and the third module B; the two modules C are respectively the first module C and the second module C; the two 1×1 convolution blocks A are respectively the first 1×1 convolution block A and the second 1×1 convolution block A; the four maximum pooling layers are respectively the first pooling layer, the second pooling layer, the third pooling layer and the fourth pooling layer.
  • This embodiment connects the input layer, module A, the first maximum pooling layer, the first module B, the first 1×1 convolution block A, the second maximum pooling layer, the first module C, the third maximum pooling layer, the second 1×1 convolution block A, the second module B, the fourth maximum pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer in sequence to construct the convolutional neural network.
  • the convolution layers used by the first 1×1 convolution block A and the second 1×1 convolution block A in this embodiment have a stride of 1 and a padding of 0.
  • module A in this embodiment includes an input layer, a first feature extraction layer and an output layer, where the first feature extraction layer includes a first input channel, a first output channel, a second output channel and a third output channel.
  • the first input channel is the first 3×3 convolution block A, the second 3×3 convolution block A and the third 3×3 convolution block A connected in sequence; the first output channel outputs the feature information matrix of the first 3×3 convolution block A; the second output channel outputs the feature information matrix of the second 3×3 convolution block A; the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  • the convolution layer used by the first 3×3 convolution block A has a stride of 2 and a padding of 1; the convolution layers used by the second 3×3 convolution block A and the third 3×3 convolution block A have a stride of 1 and a padding of 1.
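These strides and paddings determine module A's spatial sizes via the usual convolution output-size formula; a quick sketch (the 224-pixel input size is an assumption for illustration, the patent does not state one):

```python
def conv_out_size(n, k=3, stride=1, padding=1):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - k) // stride + 1

# first 3x3 convolution block A: stride 2, padding 1 -> roughly halves the input
print(conv_out_size(224, k=3, stride=2, padding=1))  # 112
# second and third blocks A: stride 1, padding 1 -> spatial size preserved
print(conv_out_size(112, k=3, stride=1, padding=1))  # 112
```

Because the stride-1, padding-1 blocks preserve the spatial size, the three tapped feature maps of module A can be depth-concatenated without resizing.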
  • module B in this embodiment includes an input layer, a second feature extraction layer and an output layer; the second feature extraction layer includes a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel.
  • the second input channel is the third 1×1 convolution block A;
  • the third input channel is specifically: the first 3×3 convolution block B and the second 3×3 convolution block B are connected in sequence; the feature information matrix output by the first 3×3 convolution block B is added to the feature information matrix output by the second 3×3 convolution block B, and the sum is then connected to the first activation layer and the third 3×3 convolution block B in sequence;
  • the fourth input channel is the fifth maximum pooling layer and the fourth 1×1 convolution block A connected in sequence;
  • the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A in the second input channel;
  • the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
  • the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
  • the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
  • the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  • the convolution layers used by the first 3×3 convolution block B, the second 3×3 convolution block B and the third 3×3 convolution block B have a stride of 1 and a padding of 1;
  • the convolution layers used by the third 1×1 convolution block A and the fourth 1×1 convolution block A have a stride of 1 and no padding.
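The residual connection in module B's third input channel (the outputs of two 3×3 blocks added, then activated, then fed to a third block) can be sketched as follows. This is a shape-level sketch: the stand-in `conv_block_b` only preserves shape, and ReLU is assumed for the activation, which the patent does not name explicitly:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_block_b(x):
    # stand-in for a 3x3 convolution block B (stride 1, padding 1 keeps the shape);
    # a real block would convolve, batch-normalize and activate
    return x * 0.5

x = np.random.default_rng(1).standard_normal((8, 14, 14))
f1 = conv_block_b(x)               # first 3x3 convolution block B
f2 = conv_block_b(f1)              # second 3x3 convolution block B
summed = f1 + f2                   # residual-style addition of the two outputs
out = conv_block_b(relu(summed))   # first activation layer, then third block B
print(out.shape)  # (8, 14, 14)
```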
  • module C in this embodiment includes an input layer, a third feature extraction layer and an output layer; the third feature extraction layer includes a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel.
  • the first input-output channel is the fifth 1×1 convolution block A;
  • the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence;
  • the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence;
  • the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence;
  • the fifth input-output channel is the sixth maximum pooling layer and the ninth 1×1 convolution block A connected in sequence.
  • the convolution layers used by the fourth 3×3 convolution block B through the ninth 3×3 convolution block B have a stride of 1 and a padding of 1; the convolution layers used by the fifth 1×1 convolution block A, the sixth 1×1 convolution block A, the seventh 1×1 convolution block A, the eighth 1×1 convolution block A and the ninth 1×1 convolution block A have a stride of 1 and no padding.
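A 1×1 convolution with stride 1 and no padding, as used throughout module C, is simply a per-pixel linear map over channels; equivalently a matrix multiply along the channel axis (a numpy sketch with illustrative channel counts):

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1x1(x, weight):
    # x: (c_in, H, W); weight: (c_out, c_in)
    # a 1x1 convolution with stride 1 and no padding is a linear map over channels
    return np.einsum('oc,chw->ohw', weight, x)

x = rng.standard_normal((64, 28, 28))
w = rng.standard_normal((16, 64))   # reduce 64 channels to 16
y = conv1x1(x, w)
print(y.shape)  # (16, 28, 28)
```

This channel-reduction role is why the 1×1 blocks help keep the model's parameter count small after each multi-branch module.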
  • the first activation layer in this embodiment is the activation layer in module B;
  • the second activation layer is the activation layer in convolution block A and convolution block B.
  • the input layer in this embodiment is used to receive the output of the previous layer; the input of the output layer is the depth concatenation of all the feature information matrices output in the corresponding feature extraction layer. Specifically, in module A the input of the output layer is the depth concatenation of the feature information matrices output by its three output channels; in module B it is the depth concatenation of the feature information matrices output by its five output channels; the same applies in module C and is not repeated here.
  • both convolution block A and convolution block B include a convolution layer, a batch normalization (BN) layer and a second activation layer connected in sequence; the convolution layer in convolution block A uses an ordinary convolution operation.
  • the convolution layer in convolution block B uses a depthwise separable convolution operation.
  • the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer, the fourth maximum pooling layer, the fifth maximum pooling layer and the sixth maximum pooling layer are all of size 3×3 with a stride of 1 and a padding of 1; the dropout layer randomly deactivates 40% of the neurons; the fully connected layer has 2 neurons.
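The 2-neuron fully connected layer followed by the softmax classification layer maps the pooled features to fire / non-fire probabilities; a minimal numpy sketch of that final step (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# two logits from the 2-neuron fully connected layer: [fire, non-fire]
logits = np.array([2.0, -1.0])
probs = softmax(logits)
print(probs, probs.sum())  # two class probabilities summing to 1
```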
  • Table 1 shows the specific parameters of the convolutional neural network.
  • 3×3-1A, 3×3-2A and 3×3-3A respectively denote the first, second and third 3×3 convolution blocks A in module A; 1×1-1B and 1×1-2B respectively denote the third 1×1 convolution block A in the fourth output channel of module B and the fourth 1×1 convolution block A in the eighth output channel; 1×1-1C, 1×1-2C, 1×1-3C, 1×1-4C and 1×1-5C respectively denote the fifth 1×1 convolution block A in the first input-output channel of module C, the sixth 1×1 convolution block A in the second input-output channel, the seventh 1×1 convolution block A in the third input-output channel, the eighth 1×1 convolution block A in the fourth input-output channel and the ninth 1×1 convolution block A in the fifth input-output channel; 1×1 denotes a 1×1 convolution block.
  • the training set obtained in step S101 is input into the fire video image recognition model for training, and the network parameters are adjusted to obtain a pre-trained model (the trained fire video image recognition model).
  • the test set obtained in step S101 is input into the pre-trained model to obtain the recognition accuracy.
  • the parameter count of the fire video image recognition model is far smaller than that of other classic convolutional neural network models: it is 1.02% of the VGG19 parameter count, 23.80% of the GoogleNet parameter count, and 6.68% of the resnet34 parameter count.
  • the performance of the fire video image recognition model on the test set is much better than that of other classic convolutional neural network models. Specifically, over the same 300 training epochs, the highest fire recognition accuracy of the fire video image recognition model is 97.06%, which is 2.31% higher than the classic convolutional neural network model GoogleNet and 0.85% higher than the classic convolutional neural network model resnet34.
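Using commonly cited parameter counts for the baseline networks (approximate published figures, in millions; these are assumptions, as the patent states only ratios), the quoted percentages consistently imply a model of roughly 1.4 to 1.6 million parameters:

```python
# approximate published parameter counts in millions (assumed baseline figures)
baselines = {'VGG19': 143.7, 'GoogleNet': 6.6, 'resnet34': 21.8}
# ratios quoted in the patent
ratios = {'VGG19': 0.0102, 'GoogleNet': 0.2380, 'resnet34': 0.0668}

# implied size of the fire recognition model from each comparison
implied = {name: baselines[name] * ratios[name] for name in baselines}
print(implied)  # each entry is the implied model size in millions of parameters
```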
  • this embodiment provides a fire video image recognition system.
  • the system includes a first acquisition unit 901, a construction unit 902, a training unit 903, a second acquisition unit 904 and a recognition unit 905.
  • the specific functions of each unit are as follows:
  • the first acquisition unit 901 is used to acquire a data set, which is a fire and non-fire video image data set;
  • Construction unit 902 is used to construct a convolutional neural network.
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the training unit 903 is used to train the convolutional neural network using the data set to obtain a fire video image recognition model
  • the second acquisition unit 904 is used to acquire the video to be recognized and perform frame processing on the video to be recognized to obtain the video image to be recognized;
  • the recognition unit 905 is used to input the video image to be recognized into the fire video image recognition model to realize fire video image recognition.
  • for the specific implementation of each unit in this embodiment, refer to Embodiment 1 above; it is not repeated here. It should be noted that the system provided by this embodiment is only illustrated by the division of the above functional units; in practical applications, the above functions can be assigned to different functional units as needed, that is, the internal structure can be divided into different functional units to complete all or part of the functions described above.
  • this embodiment provides a computer device, which includes a processor 1002 , a memory, an input device 1003 , a display device 1004 and a network interface 1005 connected through a system bus 1001 .
  • the processor 1002 is used to provide computing and control capabilities.
  • the memory includes a non-volatile storage medium 1006 and an internal memory 1007.
  • the non-volatile storage medium 1006 stores an operating system, computer programs and databases.
  • the internal memory 1007 provides an environment for running the operating system and computer programs in the non-volatile storage medium 1006.
  • when the processor 1002 executes the program stored in the memory, the fire video image recognition method of the above-mentioned Embodiment 1 is implemented, as follows:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • This embodiment provides a storage medium, which is a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, the fire video image recognition method of the above-mentioned Embodiment 1 is implemented, as follows:
  • the convolutional neural network includes an input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four maximum pooling layers, an adaptive average pooling layer, a flatten layer, a dropout layer, a fully connected layer and a softmax classification layer;
  • the computer-readable storage medium in this embodiment may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable program is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
  • Computer programs embodied on computer-readable storage media may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer program carried on the above-mentioned computer-readable storage medium may be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Python and C++, as well as conventional procedural programming languages such as C or similar programming languages.
  • the program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the present invention obtains a video image data set by performing frame processing on the collected video, and then preprocesses the video image data, thereby effectively solving problems such as insufficient lighting and shadows that exist during the collection process of monitoring equipment.
  • the built fire video image recognition model can not only reduce the number of network model parameters but also improve the detection efficiency and accuracy of the network model, thereby realizing rapid recognition of fire video images, so as to detect fire hazards in time and ensure personal and property safety.

Abstract

Disclosed in the present invention are a fire video image recognition method and system, a computer device, and a storage medium. The fire video image recognition method comprises: obtaining a data set, the data set being a video image data set of fire and non-fire; constructing a convolutional neural network; training the convolutional neural network by using the data set so as to obtain a fire video image recognition model; obtaining a video to be recognized, and performing framing processing on said video to obtain a video image to be recognized; and inputting said video image into the fire video image recognition model to implement fire video image recognition. According to the present invention, the detection efficiency and accuracy of a network model can be improved while the quantity of parameters of the network model is reduced, the quick recognition of the fire video image is implemented, and thus fire hazards can be discovered in time, and the personal and property safety can be ensured.

Description

Fire video image recognition method, system, computer device and storage medium

Technical Field

The present invention relates to a fire video image recognition method, system, computer device and storage medium, and belongs to the field of computer vision.

Background Art

With the improvement of China's economic and technological level, the population continues to grow and buildings are becoming ever more numerous and dense. The continued use of electricity and fuel increases the risk of fire, and the damage caused by fires has grown accordingly. Fires not only cause economic losses to society but also endanger public safety. It is therefore necessary to conduct dedicated research on fire detection technology so that fires can be identified at the moment of ignition, minimizing the losses they cause and protecting people's safety.

Traditional fire detection technologies mainly comprise smoke-sensing, temperature-sensing, light-sensing and gas-sensing detection. They identify the occurrence of fire from its physical signatures, such as the concentration of the smoke produced, the ambient temperature, the light intensity of the flame, and the concentrations of the O2 consumed by combustion and of the CO, CO2 and other gases it produces. Traditional fire detection has certain limitations. First, it is restricted to enclosed environments: in large spaces the changes in these physical quantities are not pronounced, sensor detection efficiency drops, and the time for gases, particles and other physical signals to reach the sensor grows with distance, lengthening detection time and preventing timely alarms. Second, it is easily affected by the environment: changes in environmental factors such as rain, snow and wind speed alter the physical characteristics of the fire scene and thus reduce the accuracy of sensor detection. Third, the cost is high: sensors are expensive and prone to corrosion, aging and even damage.

With the development of the information age, fire detection technology has moved toward intelligent approaches that use image processing, artificial intelligence and other techniques to detect and identify extracted flame features. At the same time, video surveillance technology has continued to advance, and most areas now have full surveillance coverage. Images reveal the fire source and the spread of a fire very intuitively, so video-based fire detection has received growing attention. However, current artificial-intelligence-based fire detection models are complex, have too many parameters and detect inefficiently, which hinders rapid fire detection. Finding a fire recognition model that is structurally simple, has few parameters and detects efficiently has therefore become a key focus for researchers.
Summary of the Invention

In view of this, the present invention provides a fire video image recognition method, system, computer device and storage medium, in which multi-scale feature information fusion, a network residual structure and depthwise separable convolution operations are combined into new modules for building a fire video image recognition model. This network not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model.

The first object of the present invention is to provide a fire video image recognition method.

The second object of the present invention is to provide a fire video image recognition system.

The third object of the present invention is to provide a computer device.

The fourth object of the present invention is to provide a storage medium.
The first object of the present invention can be achieved by adopting the following technical solution:

A fire video image recognition method, the method comprising:

obtaining a data set, the data set being a video image data set of fire and non-fire;

constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;

training the convolutional neural network with the data set to obtain a fire video image recognition model;

obtaining a video to be recognized and performing framing processing on it to obtain video images to be recognized; and

inputting the video images to be recognized into the fire video image recognition model to implement fire video image recognition.
Further, the three module-B layers are respectively a first module B, a second module B and a third module B; the two module-C layers are respectively a first module C and a second module C; the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A; and the four max pooling layers are respectively a first max pooling layer, a second max pooling layer, a third max pooling layer and a fourth max pooling layer.

The construction of the convolutional neural network is specifically as follows:

connecting in sequence the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer, thereby constructing the convolutional neural network.
Further, module A comprises an input layer, a first feature extraction layer and an output layer; module B comprises an input layer, a second feature extraction layer and an output layer; and module C comprises an input layer, a third feature extraction layer and an output layer.

Further, the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;

the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;

the first output channel outputs the feature information matrix of the first 3×3 convolution block A;

the second output channel outputs the feature information matrix of the second 3×3 convolution block A; and

the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
Further, the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;

the second input channel is a third 1×1 convolution block A;

the third input channel is formed by first connecting a first 3×3 convolution block B and a second 3×3 convolution block B in sequence, adding the feature information matrix outputs of the first 3×3 convolution block B and the second 3×3 convolution block B, and then connecting the sum to a first activation layer and a third 3×3 convolution block B in sequence;

the fourth input channel is a fifth max pooling layer and a fourth 1×1 convolution block A connected in sequence;

the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;

the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;

the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;

the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B; and

the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
Further, the third feature extraction layer comprises a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;

the first input-output channel is a fifth 1×1 convolution block A;

the second input-output channel is a fourth 3×3 convolution block B and a sixth 1×1 convolution block A connected in sequence;

the third input-output channel is a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A connected in sequence;

the fourth input-output channel is a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A connected in sequence; and

the fifth input-output channel is a sixth max pooling layer and a ninth 1×1 convolution block A connected in sequence.
Further, convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer connected in sequence;

the activation function used by the second activation layer is RELU6, where RELU6(x) = min(max(x, 0), 6); and

the convolution layer in convolution block B uses a depthwise separable convolution operation.
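For illustration, the RELU6 activation defined above follows directly from the formula RELU6(x) = min(max(x, 0), 6); a minimal scalar sketch (the helper name is ours, not from the patent):

```python
def relu6(x: float) -> float:
    # RELU6(x) = min(max(x, 0), 6): zero for negative inputs, linear on
    # [0, 6], saturated at 6 above that.
    return min(max(x, 0.0), 6.0)

# The three regimes: clipped below, pass-through in the middle, clipped above.
print(relu6(-3.0), relu6(2.5), relu6(10.0))  # 0.0 2.5 6.0
```

The upper clip at 6 keeps activations in a bounded range, which is why RELU6 is commonly paired with lightweight, depthwise-separable architectures.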
Further, the input of the output layer is the depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer.
The second object of the present invention can be achieved by adopting the following technical solution:

A fire video image recognition system, the system comprising:

a first acquisition unit for obtaining a data set, the data set being a video image data set of fire and non-fire;

a construction unit for constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;

a training unit for training the convolutional neural network with the data set to obtain a fire video image recognition model;

a second acquisition unit for obtaining a video to be recognized and performing framing processing on it to obtain video images to be recognized; and

a recognition unit for inputting the video images to be recognized into the fire video image recognition model to implement fire video image recognition.
The third object of the present invention can be achieved by adopting the following technical solution:

A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the above fire video image recognition method when executing the program stored in the memory.

The fourth object of the present invention can be achieved by adopting the following technical solution:

A storage medium storing a program which, when executed by a processor, implements the above fire video image recognition method.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The fire video image recognition model built by the present invention not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model, thereby enabling rapid recognition of fire video images so that fire hazards can be discovered in time and personal and property safety can be ensured.

(2) The present invention obtains a video image data set by splitting the collected video into frames and then preprocesses the video image data, thereby effectively mitigating problems such as insufficient lighting and shadows that arise during capture by monitoring equipment.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.

Fig. 1 is a flow chart of the fire video image recognition method of Embodiment 1 of the present invention.

Fig. 2 is a framework diagram of the fire video image recognition model of Embodiment 1 of the present invention.

Fig. 3 is a framework diagram of module A of Embodiment 1 of the present invention.

Fig. 4 is a framework diagram of module B of Embodiment 1 of the present invention.

Fig. 5 is a framework diagram of module C of Embodiment 1 of the present invention.

Fig. 6 is a framework diagram of convolution blocks A and B of Embodiment 1 of the present invention.

Fig. 7 is a bar chart of the parameter counts of the network models compared in Embodiment 1 of the present invention.

Fig. 8 is a curve chart of the fire recognition accuracy of the network models compared in Embodiment 1 of the present invention.

Fig. 9 is a flow chart of the fire video image recognition system of Embodiment 2 of the present invention.

Fig. 10 is a structural block diagram of the computer device of Embodiment 3 of the present invention.
Detailed Description of the Embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Embodiment 1:

As shown in Fig. 1, this embodiment provides a fire video image recognition method comprising the following steps:
S101. Obtain a data set.

In this embodiment, flame and non-flame videos are collected from the Internet, and the collected videos are split into frames with the OpenCV library (taking 12 frames as one unit), thereby obtaining a labeled video image data set of fire and non-fire.
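The frame-splitting step above can be sketched as follows; the function and variable names are illustrative (not from the patent), one frame is kept per 12-frame unit, and the OpenCV call is isolated in its own function so the sampling rule itself stays dependency-free:

```python
def keep_frame(index: int, unit: int = 12) -> bool:
    # Keep the first frame of every `unit`-frame group
    # (12 frames per unit, as in this embodiment).
    return index % unit == 0

def split_video(video_path: str, unit: int = 12):
    # Hypothetical helper: read a video and return the sampled frames.
    # Requires the opencv-python package; imported here so the sampling
    # rule above can be used without OpenCV installed.
    import cv2
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if keep_frame(index, unit):
            frames.append(frame)
        index += 1
    cap.release()
    return frames

# Sampling rule: frames 0, 12, 24, ... are retained.
print([i for i in range(30) if keep_frame(i)])  # [0, 12, 24]
```

A fixed sampling step like this trades temporal resolution for data-set size; other intervals would work the same way by changing `unit`.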
Further, this embodiment uses a script to divide the above data set into a training set and a test set and applies data augmentation to the training set, the augmentation including random rotation, mirroring and random cropping.
S102. Construct a convolutional neural network.

As shown in Fig. 2, the convolutional neural network in this embodiment comprises one input layer, one module-A layer, three module-B layers, two module-C layers, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer, wherein the three module-B layers are respectively a first module B, a second module B and a third module B, the two module-C layers are respectively a first module C and a second module C, the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A, and the four max pooling layers are respectively a first max pooling layer, a second max pooling layer, a third max pooling layer and a fourth max pooling layer.

In this embodiment, the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer are connected in sequence to construct the convolutional neural network.

The convolution layers used in the first 1×1 convolution block A and the second 1×1 convolution block A both have a stride of 1 and a padding of 0.
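The top-level layer order described above can be written down as an ordered list and checked against the stated layer counts (one module A, three module B, two module C, two 1×1 convolution blocks A, four max pooling layers); this is only a bookkeeping sketch with illustrative names, not a runnable network:

```python
# Top-level layer order of the network in this embodiment, as named in the text.
LAYERS = [
    "input", "module_A", "maxpool_1", "module_B_1", "conv1x1_A_1",
    "maxpool_2", "module_C_1", "maxpool_3", "conv1x1_A_2", "module_B_2",
    "maxpool_4", "module_C_2", "module_B_3", "adaptive_avgpool",
    "dropout", "flatten", "fc", "softmax",
]

def count(prefix: str) -> int:
    # Number of top-level layers whose name starts with `prefix`.
    return sum(1 for name in LAYERS if name.startswith(prefix))

# Counts match the composition stated in the text: 1, 3, 2, 2, 4.
print(count("module_A"), count("module_B"), count("module_C"),
      count("conv1x1_A"), count("maxpool"))  # 1 3 2 2 4
```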
Further, as shown in Fig. 3, module A in this embodiment comprises an input layer, a first feature extraction layer and an output layer, the first feature extraction layer comprising a first input channel, a first output channel, a second output channel and a third output channel. Specifically, the first input channel is the first 3×3 convolution block A, the second 3×3 convolution block A and the third 3×3 convolution block A connected in sequence; the first output channel outputs the feature information matrix of the first 3×3 convolution block A; the second output channel outputs the feature information matrix of the second 3×3 convolution block A; and the third output channel outputs the feature information matrix of the third 3×3 convolution block A.

In module A, the convolution layer used in the first 3×3 convolution block A has a stride of 2 and a padding of 1; the convolution layers used in the second and third 3×3 convolution blocks A both have a stride of 1 and a padding of 1.
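The stride and padding values given for module A fix the spatial sizes through the standard convolution output formula out = ⌊(in + 2·padding − kernel)/stride⌋ + 1; a small sketch (the 224×224 input side length is an illustrative assumption, not stated in the patent):

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # Standard convolution / pooling output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

# First 3x3 convolution block A: stride 2, padding 1 -> halves the side length.
print(conv_out(224, kernel=3, stride=2, padding=1))  # 112
# Second and third 3x3 convolution blocks A: stride 1, padding 1 -> size kept.
print(conv_out(112, kernel=3, stride=1, padding=1))  # 112
```

So the first block of module A downsamples while the following two refine features at constant resolution.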
Further, as shown in Fig. 4, module B in this embodiment comprises an input layer, a second feature extraction layer and an output layer, the second feature extraction layer comprising a second input channel, a third input channel, a fourth input channel, and fourth to eighth output channels. Specifically, the second input channel is the third 1×1 convolution block A. The third input channel is formed by first connecting the first 3×3 convolution block B and the second 3×3 convolution block B in sequence, adding the feature information matrix outputs of the first and second 3×3 convolution blocks B, and then connecting the sum to the first activation layer and the third 3×3 convolution block B in sequence. The fourth input channel is the fifth max pooling layer and the fourth 1×1 convolution block A connected in sequence. The fourth output channel outputs the feature information matrix of the third 1×1 convolution block A in the second input channel; the fifth, sixth and seventh output channels output the feature information matrices of the first, second and third 3×3 convolution blocks B, respectively; and the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A in the fourth input channel.

In module B, the convolution layers used in the first, second and third 3×3 convolution blocks B all have a stride of 1 and a padding of 1; the convolution layers used in the third and fourth 1×1 convolution blocks A have a stride of 1 and no padding.
Further, as shown in Fig. 5, module C in this embodiment comprises an input layer, a third feature extraction layer and an output layer, the third feature extraction layer comprising first to fifth input-output channels. Specifically, the first input-output channel is the fifth 1×1 convolution block A; the second input-output channel is the fourth 3×3 convolution block B and the sixth 1×1 convolution block A connected in sequence; the third input-output channel is the fifth 3×3 convolution block B, the sixth 3×3 convolution block B and the seventh 1×1 convolution block A connected in sequence; the fourth input-output channel is the seventh 3×3 convolution block B, the eighth 3×3 convolution block B, the ninth 3×3 convolution block B and the eighth 1×1 convolution block A connected in sequence; and the fifth input-output channel is the sixth max pooling layer and the ninth 1×1 convolution block A connected in sequence.

In module C, the convolution layers used in the fourth to ninth 3×3 convolution blocks B all have a stride of 1 and a padding of 1; the convolution layers used in the fifth to ninth 1×1 convolution blocks A all have a stride of 1 and no padding.
In this embodiment, the first activation layer is the activation layer in module B, and the second activation layer is the activation layer in convolution block A and convolution block B.

Each input layer in this embodiment receives the output of the preceding layer. The input of each output layer is the depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer. Specifically, in module A the input of the output layer is the depth-wise concatenation of the feature information matrices output by its three output channels; in module B it is the depth-wise concatenation of those output by its five output channels; module C is analogous and is not described again.
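Depth-wise concatenation stacks the branch outputs along the channel axis while the spatial dimensions stay unchanged; a minimal numpy sketch (the channel counts 16/32/64 are illustrative, not taken from the patent):

```python
import numpy as np

# Three branch outputs of one module, each of shape (channels, height, width).
branch_1 = np.zeros((16, 28, 28))
branch_2 = np.zeros((32, 28, 28))
branch_3 = np.zeros((64, 28, 28))

# Depth-wise (channel-axis) concatenation: channel counts add up,
# spatial size is unchanged.
merged = np.concatenate([branch_1, branch_2, branch_3], axis=0)
print(merged.shape)  # (112, 28, 28)
```

Because only channel counts add, every branch must produce the same spatial size, which is why the modules' internal strides and paddings are chosen to preserve resolution.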
更进一步地,如图6所示,卷积块A和卷积块B均包括依次连接的卷积层、批量归一化(BN)层和第二激活层;其中,卷积块A中的卷积层采用的是普通卷积操作,卷积块B中的卷积层采用的是深度可分离卷积操作,第二激活层采用的激活函数都是RELU6,RELU6(x)=min(max(x,0),6)。Furthermore, as shown in Figure 6, both convolution block A and convolution block B include a convolution layer, a batch normalization (BN) layer and a second activation layer connected in sequence; where, in convolution block A The convolution layer uses ordinary convolution operations. The convolution layer in convolution block B uses depth-separable convolution operations. The activation functions used in the second activation layer are all RELU6, RELU6(x)=min(max (x,0),6).
本实施例中的深度可分离卷积,具体为:卷积核的通道数为1,同时,输入特征矩阵的通道数=卷积核的个数=输出特征矩阵的通道数。The depth-separable convolution in this embodiment is specifically as follows: the number of channels of the convolution kernel is 1, and at the same time, the number of channels of the input feature matrix = the number of convolution kernels = the number of channels of the output feature matrix.
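The parameter savings of depthwise separable convolution can be checked with simple arithmetic: a standard k×k convolution carries k·k·C_in·C_out weights, while the depthwise step (one single-channel k×k kernel per input channel, as stated above) followed by a 1×1 pointwise convolution carries k·k·C_in + C_in·C_out. A sketch with illustrative channel counts (not taken from Table 1):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # Weight count of an ordinary k x k convolution (biases ignored).
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise step: one k x k kernel per input channel (kernel channel
    # count 1), then a 1 x 1 pointwise convolution mixing channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768
print(std, sep, round(sep / std, 3))  # 73728 8768 0.119
```

At these sizes the separable form uses roughly 12% of the weights of a standard convolution, which is the main source of the parameter reduction claimed for this network.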
In this embodiment, the first to sixth max pooling layers all have a size of 3×3, a stride of 1 and a padding of 1; the dropout layer randomly deactivates 40% of the neurons; and the fully connected layer has 2 neurons.
The specific parameters of the convolutional neural network in this embodiment are shown in Table 1.

Table 1. Specific parameters of the convolutional neural network
Here, 3×3-1A, 3×3-2A and 3×3-3A denote the first, second and third 3×3 convolution blocks A in module A; 1×1-1B and 1×1-2B denote the third 1×1 convolution block A in the fourth output channel and the fourth 1×1 convolution block A in the eighth output channel of module B, respectively; 1×1-1C, 1×1-2C, 1×1-3C, 1×1-4C and 1×1-5C denote, respectively, the fifth 1×1 convolution block A in the first input-output channel, the sixth 1×1 convolution block A in the second input-output channel, the seventh 1×1 convolution block A in the third input-output channel, the eighth 1×1 convolution block A in the fourth input-output channel and the ninth 1×1 convolution block A in the fifth input-output channel of module C; and 1×1 denotes a 1×1 convolution block.
S103. Train the convolutional neural network with the data set to obtain the fire video image recognition model.

The training set obtained in step S101 is input into the fire video image recognition model for training, and the network parameters are adjusted to obtain a pre-trained model (the trained fire video image recognition model); the test set obtained in step S101 is then input into the pre-trained model to obtain the recognition accuracy.

A performance test of the fire video image recognition model gave the following results:

As shown in Fig. 7, the parameter count of the fire video image recognition model is far smaller than that of other classic convolutional neural network models: it is 1.02% of the parameter count of the VGG19 model, 23.80% of that of the GoogleNet model, and 6.68% of that of the resnet34 model.

As shown in Fig. 8, the fire video image recognition model performs far better on the test set than the other classic convolutional neural network models. Specifically, under the same 300 training epochs, its highest fire recognition accuracy is 97.06%, which is 2.31% higher than that of the classic GoogleNet model and 0.85% higher than that of the classic resnet34 model.
S104. Obtain the video to be recognized and split it into frames to obtain the video images to be recognized.
S105. Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
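Steps S104-S105 amount to decoding the video into frames and classifying each sampled frame. A minimal stride-sampling sketch of the frame-splitting step (the function name and stride are illustrative; a real pipeline would decode the frames with a video library such as OpenCV):

```python
def split_into_frames(frame_stream, stride=1):
    """Yield every `stride`-th frame of a decoded video stream.

    `frame_stream` is any iterable of frame images (e.g. produced by a
    video decoder); `stride` subsamples the stream so that the recognition
    step need not classify every single frame.
    """
    for i, frame in enumerate(frame_stream):
        if i % stride == 0:
            yield frame

# A 10-frame dummy "video" sampled every 3rd frame -> frames 0, 3, 6, 9.
frames = list(split_into_frames(range(10), stride=3))
```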
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware, and the corresponding program can be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above embodiments are described in a specific order in the drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the illustrated operations must be performed, to achieve the desired results. On the contrary, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
Embodiment 2:
As shown in Figure 9, this embodiment provides a fire video image recognition system. The system includes a first acquisition unit 901, a construction unit 902, a training unit 903, a second acquisition unit 904 and a recognition unit 905. The specific functions of each unit are as follows:
The first acquisition unit 901 is configured to acquire a data set, the data set being a data set of fire and non-fire video images;
The construction unit 902 is configured to construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
The training unit 903 is configured to train the convolutional neural network on the data set to obtain the fire video image recognition model;
The second acquisition unit 904 is configured to acquire the video to be recognized and split it into frames to obtain the video images to be recognized;
The recognition unit 905 is configured to input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
For the specific implementation of each unit in this embodiment, refer to Embodiment 1 above; it is not repeated here. It should be noted that the system provided by this embodiment is illustrated only with the above division into functional units; in practical applications, the above functions may be assigned to different functional units as needed, that is, the internal structure may be divided into different functional units to complete all or part of the functions described above.
Embodiment 3:
As shown in Figure 10, this embodiment provides a computer device, which includes a processor 1002, a memory, an input device 1003, a display device 1004 and a network interface 1005 connected through a system bus 1001. The processor 1002 provides computing and control capabilities. The memory includes a non-volatile storage medium 1006 and an internal memory 1007: the non-volatile storage medium 1006 stores an operating system, a computer program and a database, and the internal memory 1007 provides an environment for running the operating system and the computer program stored in the non-volatile storage medium 1006. When the computer program is executed by the processor 1002, the fire video image recognition method of Embodiment 1 above is implemented, as follows:
Acquire a data set, the data set being a data set of fire and non-fire video images;
Construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
Train the convolutional neural network on the data set to obtain the fire video image recognition model;
Obtain the video to be recognized and split it into frames to obtain the video images to be recognized;
Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
Embodiment 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the fire video image recognition method of Embodiment 1 above is implemented, as follows:
Acquire a data set, the data set being a data set of fire and non-fire video images;
Construct a convolutional neural network, the convolutional neural network including one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
Train the convolutional neural network on the data set to obtain the fire video image recognition model;
Obtain the video to be recognized and split it into frames to obtain the video images to be recognized;
Input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
It should be noted that the computer-readable storage medium of this embodiment may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In this embodiment, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable program is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. A computer program contained on a computer-readable storage medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The computer program for carrying out this embodiment may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Python and C++, as well as conventional procedural programming languages such as C or similar languages. The program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In summary, the present invention splits the collected video into frames to obtain a video image data set and then preprocesses the video image data, thereby effectively alleviating problems such as insufficient lighting and shadows that arise during acquisition by monitoring equipment. In addition, the constructed fire video image recognition model not only reduces the number of network model parameters but also improves the detection efficiency and accuracy of the network model, thereby enabling rapid recognition of fire video images, so that fire hazards can be discovered in time and personal and property safety can be ensured.
The above are only preferred embodiments of the present invention patent, but the scope of protection of the present invention patent is not limited thereto. Any equivalent substitution or change of the technical solution and inventive concept of the present invention patent, made by any person skilled in the art within the scope disclosed by the present invention patent, falls within the scope of protection of the present invention patent.

Claims (10)

  1. A fire video image recognition method, characterized in that the method comprises:
    acquiring a data set, the data set being a data set of fire and non-fire video images;
    constructing a convolutional neural network, the convolutional neural network comprising one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
    training the convolutional neural network on the data set to obtain a fire video image recognition model;
    obtaining a video to be recognized and splitting the video to be recognized into frames to obtain video images to be recognized;
    inputting the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
  2. The fire video image recognition method according to claim 1, characterized in that the three modules B are respectively a first module B, a second module B and a third module B, the two modules C are respectively a first module C and a second module C, the two 1×1 convolution blocks A are respectively a first 1×1 convolution block A and a second 1×1 convolution block A, and the four max pooling layers are respectively a first pooling layer, a second pooling layer, a third pooling layer and a fourth pooling layer;
    the constructing of the convolutional neural network is specifically as follows:
    connecting, in sequence, the input layer, module A, the first max pooling layer, the first module B, the first 1×1 convolution block A, the second max pooling layer, the first module C, the third max pooling layer, the second 1×1 convolution block A, the second module B, the fourth max pooling layer, the second module C, the third module B, the adaptive average pooling layer, the dropout layer, the flatten layer, the fully connected layer and the softmax classification layer, thereby obtaining the convolutional neural network.
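The sequence recited in claim 2 can be sanity-checked against the layer inventory of claim 1 with a few lines of Python (the shorthand labels below are this sketch's own, not terms from the patent):

```python
from collections import Counter

# The connection order recited in claim 2, as shorthand labels.
pipeline = [
    "input", "module_A", "maxpool", "module_B", "conv1x1_A", "maxpool",
    "module_C", "maxpool", "conv1x1_A", "module_B", "maxpool",
    "module_C", "module_B", "adaptive_avgpool", "dropout", "flatten",
    "fc", "softmax",
]

counts = Counter(pipeline)
# The inventory matches claim 1: one module A, three modules B, two
# modules C, two 1x1 convolution blocks A and four max pooling layers.
```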
  3. The fire video image recognition method according to any one of claims 1-2, characterized in that module A comprises an input layer, a first feature extraction layer and an output layer; module B comprises an input layer, a second feature extraction layer and an output layer; and module C comprises an input layer, a third feature extraction layer and an output layer.
  4. The fire video image recognition method according to claim 3, characterized in that the first feature extraction layer comprises a first input channel, a first output channel, a second output channel and a third output channel;
    the first input channel is a first 3×3 convolution block A, a second 3×3 convolution block A and a third 3×3 convolution block A connected in sequence;
    the first output channel outputs the feature information matrix of the first 3×3 convolution block A;
    the second output channel outputs the feature information matrix of the second 3×3 convolution block A;
    the third output channel outputs the feature information matrix of the third 3×3 convolution block A.
  5. The fire video image recognition method according to claim 3, characterized in that the second feature extraction layer comprises a second input channel, a third input channel, a fourth input channel, a fourth output channel, a fifth output channel, a sixth output channel, a seventh output channel and an eighth output channel;
    the second input channel is a third 1×1 convolution block A;
    the third input channel is specifically: a first 3×3 convolution block B and a second 3×3 convolution block B are first connected in sequence, the feature information matrix output of the first 3×3 convolution block B is added to the feature information matrix output of the second 3×3 convolution block B, and the sum is then connected, in sequence, to a first activation layer and a third 3×3 convolution block B;
    the fourth input channel is a fifth max pooling layer and a fourth 1×1 convolution block A connected in sequence;
    the fourth output channel outputs the feature information matrix of the third 1×1 convolution block A;
    the fifth output channel outputs the feature information matrix of the first 3×3 convolution block B;
    the sixth output channel outputs the feature information matrix of the second 3×3 convolution block B;
    the seventh output channel outputs the feature information matrix of the third 3×3 convolution block B;
    the eighth output channel outputs the feature information matrix of the fourth 1×1 convolution block A.
  6. The fire video image recognition method according to claim 3, characterized in that the third feature extraction layer comprises a first input-output channel, a second input-output channel, a third input-output channel, a fourth input-output channel and a fifth input-output channel;
    the first input-output channel is a fifth 1×1 convolution block A;
    the second input-output channel is a fourth 3×3 convolution block B and a sixth 1×1 convolution block A connected in sequence;
    the third input-output channel is a fifth 3×3 convolution block B, a sixth 3×3 convolution block B and a seventh 1×1 convolution block A connected in sequence;
    the fourth input-output channel is a seventh 3×3 convolution block B, an eighth 3×3 convolution block B, a ninth 3×3 convolution block B and an eighth 1×1 convolution block A connected in sequence;
    the fifth input-output channel is a sixth max pooling layer and a ninth 1×1 convolution block A connected in sequence.
  7. The fire video image recognition method according to any one of claims 5-6, characterized in that convolution block B comprises a convolution layer, a batch normalization layer and a second activation layer connected in sequence;
    the activation function used by the second activation layer is ReLU6, where ReLU6(x) = min(max(x, 0), 6);
    the convolution layer in convolution block B uses a depthwise separable convolution operation.
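Claim 7's two ingredients are easy to illustrate: ReLU6 clamps activations to the range [0, 6], and a depthwise separable convolution replaces a standard convolution with a per-channel k×k depthwise step plus a 1×1 pointwise step, which cuts the parameter count sharply. A minimal sketch (the channel sizes are made-up examples, not values from the patent):

```python
def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6), the second activation layer's function."""
    return min(max(x, 0), 6)

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution learns k*k*c_in weights per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel;
    # pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution from 64 to 128 channels.
std = standard_conv_params(3, 64, 128)        # 73728 weights
sep = depthwise_separable_params(3, 64, 128)  # 8768 weights
```

With these example sizes the separable form needs under 12% of the standard convolution's weights, which is one source of the small overall parameter count reported in the description.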
  8. The fire video image recognition method according to claim 3, characterized in that the input of the output layer is obtained by depth-wise concatenation of all the feature information matrices output by the corresponding feature extraction layer.
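The depth-wise concatenation of claim 8 stacks the feature information matrices of all branches along the channel (depth) dimension. Representing a feature map as a list of equal-sized 2-D channels, a minimal sketch (toy shapes, not the patent's actual tensor sizes):

```python
def concat_depthwise(feature_maps):
    """Concatenate feature maps along the depth (channel) dimension.

    Each feature map is a list of 2-D channels of identical spatial size;
    the result simply stacks all channels from all branches in order.
    """
    out = []
    for fm in feature_maps:
        out.extend(fm)
    return out

# Three branch outputs with 2, 3 and 1 channels of size 2x2 each.
zeros = [[0, 0], [0, 0]]
branches = [[zeros] * 2, [zeros] * 3, [zeros] * 1]
merged = concat_depthwise(branches)  # 2 + 3 + 1 = 6 channels
```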
  9. A fire video image recognition system, characterized in that the system comprises:
    a first acquisition unit, configured to acquire a data set, the data set being a data set of fire and non-fire video images;
    a construction unit, configured to construct a convolutional neural network, the convolutional neural network comprising one input layer, one module A, three modules B, two modules C, two 1×1 convolution blocks A, four max pooling layers, one adaptive average pooling layer, one flatten layer, one dropout layer, one fully connected layer and one softmax classification layer;
    a training unit, configured to train the convolutional neural network on the data set to obtain a fire video image recognition model;
    a second acquisition unit, configured to obtain a video to be recognized and split it into frames to obtain video images to be recognized;
    a recognition unit, configured to input the video images to be recognized into the fire video image recognition model to perform fire video image recognition.
  10. A computer device, comprising a processor and a memory for storing a program executable by the processor, characterized in that when the processor executes the program stored in the memory, the fire video image recognition method according to any one of claims 1-8 is implemented.
PCT/CN2022/084441 2022-03-31 2022-03-31 Fire video image recognition method and system, computer device, and storage medium WO2023184350A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210327700.6A CN114419558B (en) 2022-03-31 2022-03-31 Fire video image identification method, fire video image identification system, computer equipment and storage medium
CN202210327700.6 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023184350A1 true WO2023184350A1 (en) 2023-10-05

Family

ID=81264231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084441 WO2023184350A1 (en) 2022-03-31 2022-03-31 Fire video image recognition method and system, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114419558B (en)
WO (1) WO2023184350A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593610A (en) * 2024-01-17 2024-02-23 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN112419650A (en) * 2020-11-11 2021-02-26 国网福建省电力有限公司电力科学研究院 Fire detection method and system based on neural network and image recognition technology

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
WO2018116966A1 (en) * 2016-12-21 2018-06-28 ホーチキ株式会社 Fire monitoring system
CN107292298B (en) * 2017-08-09 2018-04-20 北方民族大学 Ox face recognition method based on convolutional neural networks and sorter model
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN110059582B (en) * 2019-03-28 2023-04-07 东南大学 Driver behavior identification method based on multi-scale attention convolution neural network
US11182611B2 (en) * 2019-10-11 2021-11-23 International Business Machines Corporation Fire detection via remote sensing and mobile sensors
CN111553298B (en) * 2020-05-07 2021-02-05 卓源信息科技股份有限公司 Fire disaster identification method and system based on block chain
CN111639571B (en) * 2020-05-20 2023-05-23 浙江工商大学 Video action recognition method based on contour convolution neural network
CN112231974B (en) * 2020-09-30 2022-11-04 山东大学 Deep learning-based method and system for recovering seismic wave field characteristics of rock breaking seismic source of TBM (Tunnel boring machine)
CN113591591A (en) * 2021-07-05 2021-11-02 北京瑞博众成科技有限公司 Artificial intelligence field behavior recognition system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN111507962A (en) * 2020-04-17 2020-08-07 无锡雪浪数制科技有限公司 Cotton sundry identification system based on depth vision
CN112419650A (en) * 2020-11-11 2021-02-26 国网福建省电力有限公司电力科学研究院 Fire detection method and system based on neural network and image recognition technology

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117593610A (en) * 2024-01-17 2024-02-23 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media
CN117593610B (en) * 2024-01-17 2024-04-26 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media

Also Published As

Publication number Publication date
CN114419558A (en) 2022-04-29
CN114419558B (en) 2022-07-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934193

Country of ref document: EP

Kind code of ref document: A1