CN114580631B - Model training method, smoke and fire detection method, device, electronic equipment and medium - Google Patents
- Publication number: CN114580631B (application CN202210214314.6A)
- Authority: CN (China)
- Prior art keywords: deep learning model, image, image set, target information
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2193 — Pattern recognition; validation; performance evaluation based on specific statistical tests
- G06N3/045 — Neural networks; architecture; combinations of networks
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The disclosure provides a training method for a deep learning model, a smoke and fire detection method, a device, electronic equipment, and a storage medium, and relates to the technical field of artificial intelligence, in particular to deep learning. The specific implementation scheme is as follows: determining an evaluation value characterizing the evaluation performance of a deep learning model, wherein the deep learning model is obtained by training with a first sample image set; in response to detecting that the evaluation value does not reach a predefined range, processing the first sample image set to obtain a second sample image set; and training the deep learning model with the second sample image set.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the field of deep learning, and specifically relates to a training method of a deep learning model, a smoke and fire detection method, a device, electronic equipment and a storage medium.
Background
Deep learning, also known as deep structured learning or hierarchical learning, is part of a broader family of machine learning methods based on artificial neural networks. Deep learning architectures, such as deep neural networks, deep belief networks, recurrent neural networks, and convolutional neural networks, have been applied in fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection, and board game programs. In order to ensure the accuracy of output results in various fields, corresponding model training is indispensable.
Disclosure of Invention
The disclosure provides a training method for a deep learning model, a smoke and fire detection method, a smoke and fire detection device, electronic equipment, and a storage medium.
According to an aspect of the present disclosure, there is provided a training method of a deep learning model, including:
determining an evaluation value for characterizing evaluation performance of a deep learning model, wherein the deep learning model is trained using a first sample image set; in response to detecting that the evaluation value does not reach a predefined range, processing the first sample image set to obtain a second sample image set; and training the deep learning model using the second sample image set.
According to another aspect of the present disclosure, there is provided a smoke and fire detection method comprising: inputting an image to be detected into a deep learning model to obtain a detection result, wherein the deep learning model is trained using the training method described above.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the determining module is used for determining an evaluation value for representing the evaluation performance of a deep learning model, wherein the deep learning model is obtained by training a first sample image set; the processing module is used for processing the first sample image set to obtain a second sample image set in response to detecting that the evaluation value does not reach a predefined range; and a training module for training the deep learning model by using the second sample image set.
According to another aspect of the present disclosure, there is provided a smoke detection device comprising: the acquisition module is used for inputting the image to be detected into the deep learning model to obtain a detection result; the deep learning model is trained based on the training device.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method and the smoke detection method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the training method and the smoke detection method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the training method and the smoke detection method of the deep learning model of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture of training methods and apparatus, smoke detection methods and apparatus, to which deep learning models may be applied, in accordance with embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an overall flowchart of training a deep learning model in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a smoke detection method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a smoke detection device according to an embodiment of the present disclosure; and
FIG. 7 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing, applying and the like of the personal information of the user all conform to the regulations of related laws and regulations, necessary security measures are adopted, and the public order harmony is not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fire is a frequent disaster that endangers lives and property, so detecting fires in time is particularly important. When performing fire detection in fire-prone scenes such as houses, gas stations, roads, and forests, the target detection technology of PaddleX (the PaddlePaddle end-to-end development tool) can be applied to automatically detect smoke and fire within a monitored area, helping relevant personnel respond in time and minimizing casualties and property losses.
Smoke and fire detection technology comprises steps such as image acquisition, image preprocessing, image combination, smoke and fire target detection, and deep learning. One method for detecting smoke and fire targets proceeds as follows: acquire consecutive multi-frame video images; determine candidate smoke and fire regions in each frame according to its color distribution; determine the pixel-motion regions in each frame according to the changes across the consecutive frames; and determine the target smoke and fire region in each frame from the candidate regions and the pixel-motion regions.
In the course of conceiving the present disclosure, the inventors found that smoke and fire detection is prone to false detection because of the many interfering samples involved. For example, many everyday objects, such as clouds and red lights, closely resemble smoke or fire and are difficult to distinguish, easily causing false detection. In addition, the speed of smoke and fire detection is low while fires develop quickly; if a fire cannot be identified at its initial stage so that human intervention can take place, the detection result may not be output until the middle stage of the fire, at which point smoke and fire detection loses its significance.
The disclosure provides a training method of a deep learning model, a smoke detection method, a smoke detection device, electronic equipment and a storage medium. The training method of the deep learning model comprises the following steps: determining an evaluation value for representing evaluation performance of a deep learning model, wherein the deep learning model is obtained by training a first sample image set; in response to detecting that the evaluation value does not reach the predefined range, processing the first sample image set to obtain a second sample image set; and training the deep learning model with the second sample image set.
FIG. 1 schematically illustrates an exemplary system architecture of training methods and apparatus, smoke detection methods and apparatus, to which deep learning models may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the training method and apparatus of the deep learning model and the smoke and fire detection method and apparatus may be applied may include a terminal device, but the terminal device may implement the training method and apparatus of the deep learning model and the smoke and fire detection method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process received data such as user requests, and feed the processing results (e.g., web pages, information, or data obtained or generated according to the user request) back to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") offerings. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that, the training method and the smoke detection method of the deep learning model provided by the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the training device and the smoke detection device of the deep learning model provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the training method and the smoke detection method of the deep learning model provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the training device and the smoke detection device of the deep learning model provided in the embodiments of the present disclosure may be generally disposed in the server 105. The training method of the deep learning model, the smoke detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training apparatus and the smoke detection apparatus of the deep learning model provided in the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when the deep learning model needs to be trained, the terminal devices 101, 102, 103 may acquire a first sample image set and send it to the server 105. The server 105 then determines an evaluation value characterizing the evaluation performance of the deep learning model (which is trained using the first sample image set), processes the first sample image set into a second sample image set in response to detecting that the evaluation value does not reach a predefined range, and trains the deep learning model using the second sample image set. Alternatively, the training of the deep learning model may be carried out by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when smoke detection is required, the terminal devices 101, 102, 103 may acquire an image to be detected, then send the acquired image to be detected to the server 105, and the server 105 inputs the image to be detected into the deep learning model to obtain a detection result. Or the image to be detected is analyzed by a server or a server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, and the detection result is obtained.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, an evaluation value for characterizing an evaluation performance of a deep learning model, which is trained using a first sample image set, is determined.
In response to detecting that the evaluation value does not reach the predefined range, the first sample image set is processed to obtain a second sample image set in operation S220.
In operation S230, the deep learning model is trained using the second sample image set.
According to embodiments of the present disclosure, a deep learning model may be used to detect target information in an image, which may include at least one of smoke information, fire information, and the like. In the case where a deep learning model is used to detect smoke information and fire information in an image, the output of the model may include the category of the detected target information, such as smoke category, fire category, and the like. The output of the model may also include location information of the detected smoke information in the image, location information of the detected fire information in the image, and the like. The location information may include a certain pixel coordinate or a set of certain pixel coordinates covered by the detected object information in the image.
According to an embodiment of the present disclosure, the first sample image set may include images having target information. An image having target information may be one captured by an image acquisition device from a scene containing the target information, or one formed by cropping the target information out of such an image, enlarging or shrinking the crop, and pasting it into a randomly selected background picture. The target information in an image may occupy an area of any size; for example, its footprint may be less than 3% of the image area. Where the deep learning model is to be trained as a model for detecting information such as smoke and fire, the target information may include smoke information and fire information.
According to an embodiment of the present disclosure, for each image in the first sample image set, tag information may be configured, and the tag information may include at least one of category tag information characterizing a category of the target information, position tag information characterizing a position of the target information in the image, and the like. For example, the tag information associated with the image with smoke information may include a smoke category tag and a smoke location tag, and the tag information associated with the image with fire information may include a fire category tag and a fire location tag. The tag information associated with the image having the smoke information and the fire information may include a smoke category tag, a fire category tag, a smoke location tag, and a fire location tag, and the tag information associated with the image having no smoke information and no fire information may include an empty tag, and the like.
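As an illustration, one plausible annotation layout for these tags is shown below. The field names (`image_path`, `labels`, `category`, `bbox`) and file names are hypothetical, chosen only to make the category/position/empty-tag cases concrete; the patent does not prescribe a storage format.

```python
# Hypothetical annotation records for the first sample image set.
# Field names and paths are illustrative, not specified by the patent.
smoke_image_annotation = {
    "image_path": "samples/smoke_001.jpg",
    "labels": [
        # category tag + position tag (bounding box in pixel coordinates)
        {"category": "smoke", "bbox": [120, 40, 260, 180]},
    ],
}

fire_and_smoke_annotation = {
    "image_path": "samples/fire_003.jpg",
    "labels": [
        {"category": "fire", "bbox": [80, 200, 150, 300]},
        {"category": "smoke", "bbox": [60, 20, 220, 190]},
    ],
}

background_annotation = {
    "image_path": "samples/street_002.jpg",
    "labels": [],  # empty tag: neither smoke nor fire information
}
```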
According to embodiments of the present disclosure, the evaluation value used to evaluate the performance of the deep learning model may include at least one of an image-level recall rate, an image-level false detection rate, mAP (mean Average Precision), IoU (Intersection over Union), and the like.
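For reference, the IoU between a predicted box and a ground-truth box can be computed as below. This is a standard sketch, not taken from the patent; the `[x1, y1, x2, y2]` corner format is an assumption, since the disclosure does not specify one.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```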
According to embodiments of the present disclosure, the deep learning model may include a target detection model implemented based on the single-stage PP-YOLO algorithm, an effective and efficient target detector. The deep learning model may be obtained by training the PP-YOLO model using the first sample image set, the second sample image set, and so on, as described above. The PP-YOLO model improves the real-time performance of target detection, and such a target detection model helps effectively detect target information that occupies only a small area of an image.
According to an embodiment of the present disclosure, several kinds of evaluation values for evaluating the current performance of the model may first be calculated for a deep learning model trained using the first sample image set. Then, if at least one of these evaluation values does not reach its predefined range, the deep learning model may be further trained with a second sample image set different from the first. The evaluation values of the further-trained model are recalculated in turn, and the model continues to be trained with new sample image sets whenever an evaluation value falls outside its predefined range, until every evaluation value of the trained model reaches its predefined range, at which point training ends.
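The evaluate-then-retrain loop just described can be sketched as follows. The `evaluate`, `augment`, and `train` callables, the metric names, and the thresholds are all placeholders for the patent's unspecified implementations; the predefined ranges shown (minimum recall, maximum false detection rate) match the description of operation S220.

```python
def train_until_qualified(model, sample_set, evaluate, augment, train,
                          min_recall=0.9, max_false_rate=0.05, max_rounds=10):
    """Retrain on successively processed sample sets until every evaluation
    value reaches its predefined range (or a round limit is hit)."""
    for _ in range(max_rounds):
        metrics = evaluate(model)  # e.g. {"recall": ..., "false_rate": ...}
        if metrics["recall"] >= min_recall and metrics["false_rate"] <= max_false_rate:
            return model  # all evaluation values are within their predefined ranges
        # Process the current (first) sample set into a new (second) sample set,
        # e.g. by data augmentation, and continue training on it.
        sample_set = augment(sample_set)
        model = train(model, sample_set)
    return model
```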
According to an embodiment of the present disclosure, in a case where the evaluation value is a recall, the evaluation value not reaching the predefined range may include the recall being smaller than a first preset value. In the case where the evaluation value is the false detection rate, the evaluation value not reaching the predefined range may include the false detection rate being greater than the second preset value. The first preset value and the second preset value can be set in a self-defined mode, and set values of the first preset value and the second preset value can be different.
It should be noted that, at any point during training, all sample image sets already used may be regarded collectively as the first sample image set, while the second sample image set denotes the sample image set to be used in the next round of training.
Through the embodiment of the disclosure, the deep learning model can be further trained under the condition that the evaluation value of the deep learning model does not reach the predefined range, so that a more optimized deep learning model is obtained, and the accuracy of the detection result can be effectively improved when the optimized deep learning model is used for target detection such as smoke detection.
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, in the case where the evaluation value is the recall rate, determining the evaluation value for characterizing the evaluation performance of the deep learning model may include: detecting each image in a first image set using the deep learning model to obtain the first target images on which target information is detected, wherein the first image set includes first images having target information; and determining the recall rate from the ratio of the number of first target images to the number of first images.
According to an embodiment of the present disclosure, the first image set may include a plurality of randomly selected first images having target information, and may also include images having no target information. In calculating the recall rate, the calculation may be performed based only on the first images having target information and their detection results.
According to an embodiment of the present disclosure, when each image in the first image set is subjected to target detection using the deep learning model, a first image having target information may be considered recalled whenever a target is detected on it. The image-level recall rate of the current deep learning model may be determined by calculating the proportion of recalled first images among all first images having target information.
According to embodiments of the present disclosure, when calculating the recall rate, the first images having target information may also first be divided into a plurality of batches. The proportion of recalled first images among the first images having target information is then calculated per batch, and the image-level recall rate of the current deep learning model is determined from these proportions. Determining the recall rate by batch-wise calculation may proceed as follows: for each batch of first images, calculate the proportion of recalled first images in that batch among the images the batch includes, obtaining a plurality of proportions. The recall rate may then be determined from the smallest of these proportions, from their average, or from the most frequently occurring value. The present disclosure is not limited in this regard.
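A minimal sketch of this image-level recall computation is given below, with the batch-wise variant combining per-batch proportions by the minimum or the mean. The `detect` callable is a placeholder standing in for running the deep learning model and checking whether any target was detected on an image.

```python
def image_level_recall(first_images, detect):
    """Fraction of images with target information on which the model
    detects at least one target (i.e. the image is 'recalled')."""
    recalled = sum(1 for img in first_images if detect(img))
    return recalled / len(first_images)

def batched_recall(batches, detect, reduce="mean"):
    """Batch-wise recall: compute the per-batch proportion of recalled
    images, then combine by the minimum or the mean, as described."""
    ratios = [image_level_recall(batch, detect) for batch in batches]
    return min(ratios) if reduce == "min" else sum(ratios) / len(ratios)
```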
Through the embodiment of the disclosure, the recall rate is introduced as an index for judging the performance of the deep learning model, so that the detection effect of the deep learning model can be correctly reflected, the more optimized deep learning model can be conveniently and efficiently trained, and the accuracy rate of target detection by using the deep learning model is further improved.
According to an embodiment of the present disclosure, in the case where the evaluation value is the false detection rate, determining the evaluation value for characterizing the evaluation performance of the deep learning model may include: detecting each image in a second image set by using the deep learning model to obtain second target images in which target information is detected, wherein the second image set includes second images having no target information; and determining the false detection rate according to the ratio of the number of the second target images to the number of the second images.
According to an embodiment of the present disclosure, the second image set may include a plurality of randomly selected second images having no target information, and may also include images having target information. In calculating the false detection rate, the calculation may be performed based only on the second images having no target information and their detection results.
According to the embodiment of the present disclosure, when each image in the second image set is subjected to target detection using the deep learning model, a second image having no target information may be considered falsely detected whenever a target is detected on it. The image-level false detection rate of the current deep learning model may be determined by calculating the proportion of all falsely detected second images among all second images having no target information.
According to an embodiment of the present disclosure, when calculating the false detection rate, the second images having no target information may also first be divided into a plurality of batches. The proportion of falsely detected second images is then calculated batch by batch to determine the image-level false detection rate of the current deep learning model. Determining the false detection rate by batch-wise calculation may include the following steps: for each batch of second images, calculating the proportion of falsely detected second images in that batch among all second images in the batch, to obtain a plurality of proportions. After the plurality of proportions are obtained, the false detection rate may be determined based on the largest of the plurality of proportions, based on the average of the plurality of proportions, or based on the proportion value that occurs most frequently. The present disclosure is not limited thereto.
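Symmetrically, the batch-wise image-level false detection rate can be sketched as below (illustrative names, not from the disclosure); here `detections` holds the detector output for images known to contain no target, so any non-empty result counts as a false detection.

```python
from statistics import mean

def batched_false_detection_rate(detections, batch_size, reduce="max"):
    """detections: one list of detection boxes per image that contains
    no target; a non-empty list means the image is falsely detected."""
    ratios = [
        sum(1 for boxes in batch if boxes) / len(batch)
        for batch in (detections[i:i + batch_size]
                      for i in range(0, len(detections), batch_size))
    ]
    # the largest ratio gives the most conservative (worst-case) estimate
    return max(ratios) if reduce == "max" else mean(ratios)
```

Note the asymmetry with recall: there the *minimum* per-batch proportion is the pessimistic choice, whereas for false detections the *maximum* is.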
Through the embodiment of the disclosure, the false detection rate is introduced as an index for judging the performance of the deep learning model, so that the detection effect of the deep learning model can be correctly reflected, the more optimized deep learning model can be conveniently and efficiently trained, and the accuracy rate of target detection by using the deep learning model is further improved.
Whether a target is detected is unrelated to the number of detection boxes: as long as the detection result contains at least one detection box, the target may be considered detected. For a detection result containing no detection box, the corresponding image may be considered to have no target detected.
According to an embodiment of the present disclosure, a method of processing the first sample image set to obtain the second sample image set may include at least one of: performing data augmentation processing on the first sample image set to obtain the second sample image set; and adding, to the first sample image set, negative sample images whose similarity with the target information is greater than a preset threshold, to obtain the second sample image set.
According to embodiments of the present disclosure, a data augmentation policy for performing the data augmentation processing may include at least one of: Mixup (a simple, data-independent data augmentation), RandomDistort (random brightness change), RandomExpand (random filling), RandomContrast (random contrast change), RandomColor (random color change), RandomCrop (random cropping), RandomInterp (random interpolation/zoom), RandomFlip (random flipping), Resize (size adjustment), BatchRandomResize (batch-wise random size adjustment), Normalize (normalization), etc., and may not be limited thereto.
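Two of the listed strategies can be illustrated with a minimal, framework-free sketch; images are modeled as nested lists of pixel values, and in practice object-detection frameworks provide these operators ready-made. The function names and fixed mixing weight are illustrative simplifications.

```python
import random

def mixup(img_a, img_b, lam=0.5):
    """Blend two equally sized images pixel by pixel; in practice the
    mixing weight lam is usually drawn from a Beta distribution."""
    return [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def random_flip(img, p=0.5, rng=random):
    """Horizontally flip the image with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img
```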
According to an embodiment of the present disclosure, the preset threshold may be determined as desired. The negative sample image may include an image having at least one of the following information: cloud information, snow mountain information, light information, and sun information, and may not be limited thereto.
It should be noted that the above processing manners are only exemplary embodiments and are not limiting; other processing methods known in the art may also be used. For example, a new sample image set may be acquired as the second sample image set, as long as the second sample image set differs from the first sample image set.
Through the above embodiment of the present disclosure, in the case that it is determined that the evaluation value of the deep learning model does not reach the predefined range, the second sample image set for further training the deep learning model may be determined in combination with at least one of the data augmentation method and the negative sample image, so that after the deep learning model is trained by using the second sample image set, performance of the deep learning model may be effectively improved, and in particular, accuracy in target detection using the deep learning model may be effectively improved.
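Combining the two processing branches above, construction of the second sample image set might be sketched as follows. Here `augment` stands for any augmentation operator from the listed policies and `similarity` for any image-similarity measure against the target information; all names are assumptions for illustration.

```python
def build_second_sample_set(first_set, candidates, similarity,
                            threshold, augment=None):
    """Augment the first sample set and/or add hard negative images
    (e.g. clouds, snow mountains, lights, the sun) whose similarity
    to the target information exceeds the preset threshold."""
    second_set = ([augment(img) for img in first_set]
                  if augment else list(first_set))
    # hard negatives: easily confused with smoke/fire, kept for training
    second_set += [img for img in candidates if similarity(img) > threshold]
    return second_set
```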
According to embodiments of the present disclosure, a deformable convolution sub-model may be included in the deep learning model. Training of the deformable convolution sub-model can be achieved simultaneously with training of the deep learning model.
According to embodiments of the present disclosure, a Deformable Convolution Network (DCN) sub-model may be added on top of the PP-YOLO model. The deformable convolution sub-model can adapt to targets of various shapes and sizes, and is particularly suitable for target detection in scenes with relatively large scale changes, such as smoke and fire.
According to embodiments of the present disclosure, different backbone networks may also be used in the deep learning model. For example, ResNet101 may be selected as the backbone network to obtain a highly accurate deep learning model, and ResNet50 may be selected to obtain a deep learning model with a higher detection speed. ResNet is a residual network.
According to an embodiment of the present disclosure, the deep learning model is trained using a first set of sample images on the basis of a pre-training model.
In accordance with embodiments of the present disclosure, the pre-training model may include a model trained on the COCO public dataset, a model trained on a smoke-and-fire-related dataset, and the like. After the pre-training model is obtained, training may be performed on a smoke and fire dataset to obtain the deep learning model. Obtaining the deep learning model through transfer learning in this way can further improve the detection effect of the model and meet the requirement of high recall and low false detection.
FIG. 3 schematically illustrates an overall flowchart of training a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 3, the method includes operations S310-S340.
In operation S310, for the deep learning model obtained by training, an evaluation value for characterizing the evaluation performance of the deep learning model is determined.
In operation S320, it is determined whether the evaluation value reaches the predefined range. If yes, operation S330 is performed; if not, operation S340 is performed, and operations S310 to S320 are repeated.
In operation S330, the training process ends.
In operation S340, a training sample image for training the deep learning model is updated, and the deep learning model is trained using the new training sample image.
It should be noted that, in the case where the evaluation values include a plurality of evaluation indexes such as the recall rate and the false detection rate, only if the evaluation values of the plurality of evaluation indexes all reach the predefined range, the determination result of operation S320 is yes. Otherwise, as long as one of the evaluation values of the plurality of evaluation indexes does not reach the predefined range, the determination result of operation S320 is no.
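Operations S310 to S340 can be summarized with the following sketch, where `fit`, `evaluate`, and `update_samples` are hypothetical callables standing in for the training, evaluation, and sample-update steps, and the thresholds are illustrative values rather than ones stated in the disclosure:

```python
def train_until_qualified(model, samples, fit, evaluate, update_samples,
                          min_recall=0.95, max_false_rate=0.05,
                          max_rounds=10):
    """Retrain on an updated sample set until every evaluation index
    reaches its predefined range (S320), then stop (S330)."""
    for _ in range(max_rounds):
        model = fit(model, samples)           # train / retrain the model
        recall, false_rate = evaluate(model)  # S310: compute indexes
        if recall >= min_recall and false_rate <= max_false_rate:
            return model                      # S330: training ends
        samples = update_samples(samples)     # S340: update sample images
    return model
```

Consistent with the note above, the loop only exits early when *all* evaluation indexes reach their predefined ranges; a single failing index triggers another round.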
Through the embodiment of the disclosure, the deep learning model can be trained by combining the evaluation value of the deep learning model, so that the optimized deep learning model is obtained, and the accuracy of the model detection result is improved.
Fig. 4 schematically illustrates a flow chart of a smoke detection method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operation S410.
In operation S410, the image to be detected is input into the deep learning model, and a detection result is obtained.
According to an embodiment of the present disclosure, the deep learning model may be a model trained based on the above-described training method. For example, a smoke and fire dataset may first be acquired and a deep learning target detection model trained, so that the location and class of fire and smoke in any image can be obtained based on the model. Once a deep learning model capable of smoke and fire detection is obtained through preliminary training, it may be evaluated in combination with the image-level recall rate and the image-level false detection rate. When the recall rate and the false detection rate of the current model do not reach the predefined ranges, the deep learning model may be further trained and optimized in combination with a new sample image set.
According to embodiments of the present disclosure, the trained or optimized deep learning model may be deployed on a client, such as Jetson NX (a compact artificial intelligence edge computing device). After deployment, a user can input videos, pictures, folders and the like acquired in real time, and the deep learning model can analyze the contents to judge whether fire information or smoke information is contained in the contents. If included, the location and type of fire or smoke may be output.
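After deployment, the inference step might look like the following sketch. The `model.predict` interface, the label names, and the score threshold are illustrative assumptions, not the actual deployed API:

```python
def detect_smoke_fire(model, image, score_threshold=0.5):
    """Return the location and type of each fire/smoke detection.

    model.predict is assumed to yield (label, score, box) tuples, where
    box is (x1, y1, x2, y2); non-pyrotechnic labels are ignored."""
    results = []
    for label, score, box in model.predict(image):
        if label in ("fire", "smoke") and score >= score_threshold:
            results.append((label, box))
    return results
```

An empty result list corresponds to "no fire or smoke information contained", matching the image-level detection criterion used for recall and false detection above.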
Through the embodiments of the present disclosure, the smoke and fire detection method has a high detection speed, so that in smoke and fire detection scenes the model can respond immediately when a fire occurs. In addition, the method has high detection accuracy and can distinguish smoke and fire from visually similar objects encountered in daily life, thereby ensuring high detection accuracy.
Fig. 5 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 5, the training apparatus 500 of the deep learning model includes a determination module 510, a processing module 520, and a training module 530.
A determining module 510 is configured to determine an evaluation value for characterizing an evaluation performance of a deep learning model, the deep learning model being trained using the first set of sample images.
And a processing module 520, configured to process the first sample image set to obtain a second sample image set in response to detecting that the evaluation value does not reach the predefined range.
A training module 530 for training the deep learning model using the second set of sample images.
According to an embodiment of the present disclosure, the evaluation value includes a recall rate. The determining module includes a first detecting unit and a first determining unit.
The first detection unit is used for detecting each image in a first image set by using the deep learning model to obtain first target images in which target information is detected, wherein the first image set includes first images having the target information.
The first determining unit is used for determining the recall rate according to the ratio of the number of the first target images to the number of the first images.
According to an embodiment of the present disclosure, the evaluation value includes a false detection rate. The determining module includes a second detecting unit and a second determining unit.
The second detection unit is used for detecting each image in a second image set by using the deep learning model to obtain second target images in which target information is detected, wherein the second image set includes second images having no target information.
The second determining unit is used for determining the false detection rate according to the ratio of the number of the second target images to the number of the second images.
According to an embodiment of the present disclosure, the processing module includes at least one of: a processing unit and an adding unit.
The processing unit is used for performing data augmentation processing on the first sample image set to obtain the second sample image set.
The adding unit is used for adding, to the first sample image set, negative sample images whose similarity with the target information is greater than a preset threshold, to obtain the second sample image set.
According to an embodiment of the present disclosure, the target information includes at least one of smoke information and fire information.
In accordance with an embodiment of the present disclosure, a deformable convolution sub-model is included in the deep learning model.
According to an embodiment of the present disclosure, the second sample image set includes images having at least one of the following information: cloud information, snow mountain information, light information, and sun information.
According to an embodiment of the present disclosure, the deep learning model is trained using a first set of sample images on the basis of a pre-training model.
Fig. 6 schematically illustrates a block diagram of a smoke detection device according to an embodiment of the present disclosure.
As shown in fig. 6, the smoke detection device 600 includes an acquisition module 610.
The obtaining module 610 is configured to input an image to be detected into the deep learning model to obtain a detection result. The deep learning model is trained based on the training device.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method and the smoke detection method of the deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a training method and a smoke detection method of a deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a computer program product comprises a computer program which, when executed by a processor, implements the training method and the smoke detection method of the deep learning model of the present disclosure.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as a training method of a deep learning model and a smoke detection method. For example, in some embodiments, the training method of the deep learning model and the smoke detection method may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method of the deep learning model and the smoke detection method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the training method of the deep learning model and the smoke detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (13)
1. A training method of a deep learning model, comprising:
determining an evaluation value for representing evaluation performance of a deep learning model, wherein the deep learning model is trained by using a first sample image set, the first sample image set comprises an image with target information, the image with target information comprises an image formed by cutting target information from the image with target information, expanding or shrinking the cut target information and pasting the cut target information into a randomly selected background picture, and the occupation area of the target information is smaller than 3% of the area of the image, and the target information comprises at least one of smoke information and fire information;
In response to detecting that the evaluation value does not reach a predefined range, processing the first sample image set to obtain a second sample image set, the processing the first sample image set to obtain a second sample image set comprising: adding negative sample images with similarity to target information larger than a preset threshold value in the first sample image set to obtain a second sample image set, wherein the second sample image set comprises images with at least one of the following information: cloud information, snow mountain information, light information and sun information; and
training the deep learning model by using the second sample image set;
wherein the evaluation value includes a false detection rate; the determining an evaluation value for characterizing evaluation performance of the deep learning model includes:
detecting each image in a second image set by using the deep learning model to obtain a second target image with target information detected, wherein the second image set comprises a second image without target information; and
determining the false detection rate according to the ratio of the number of the second target images to the number of the second images, wherein the process comprises: dividing the second image without the target information into a plurality of batches, calculating the proportion of the second image which is detected by mistake in each batch in the second image of each batch, obtaining a plurality of proportions, and determining the false detection rate according to the value of the proportion with the largest numerical value in the plurality of proportions, or the average value of the plurality of proportions, or the value of the proportion with the highest occurrence frequency in the plurality of proportions.
2. The method of claim 1, wherein the evaluation value comprises a recall rate;
the determining an evaluation value for characterizing evaluation performance of the deep learning model includes:
detecting each image in a first image set by using the deep learning model to obtain a first target image with target information detected, wherein the first image set comprises a first image with target information; and
and determining the recall rate according to the ratio of the number of the first target images to the number of the first images.
3. The method of any of claims 1-2, wherein the processing the first set of sample images to obtain a second set of sample images comprises:
and carrying out data augmentation processing on the first sample image set to obtain the second sample image set.
4. The method of claim 1, wherein the deep learning model includes a deformable convolution sub-model therein.
5. The method of claim 1, wherein the deep learning model is trained using the first set of sample images on the basis of a pre-training model.
6. A method of smoke detection comprising:
Inputting the image to be detected into a deep learning model to obtain a detection result;
wherein the deep learning model is trained based on the training method of any one of claims 1-5.
7. A training device for a deep learning model, comprising:
a determining module, configured to determine an evaluation value for characterizing an evaluation performance of a deep learning model, where the deep learning model is trained using a first sample image set, the first sample image set including an image with target information, the image with target information including an image formed by clipping target information from the image with target information, expanding or shrinking the clipped target information, and pasting the clipped target information into a randomly selected background picture, and a occupation area of the target information being less than 3% of an area of the image, the target information including at least one of smoke information and fire information;
a processing module, configured to process the first sample image set to obtain a second sample image set in response to detecting that the evaluation value does not reach a predefined range, where the processing module includes: an adding unit, configured to add, in the first sample image set, a negative sample image having a similarity with the target information greater than a preset threshold value, to obtain the second sample image set, where the second sample image set includes an image having at least one of the following information: cloud information, snow mountain information, light information and sun information; and
The training module is used for training the deep learning model by utilizing the second sample image set;
wherein the evaluation value includes a false detection rate; the determining module includes:
the second detection unit is used for detecting each image in a second image set by using the deep learning model to obtain a second target image with target information detected, wherein the second image set comprises a second image without target information; and
a second determining unit configured to determine the false detection rate according to a ratio of the number of the second target images to the number of the second images, the process including: dividing the second image without the target information into a plurality of batches, calculating the proportion of the second image which is detected by mistake in each batch in the second image of each batch, obtaining a plurality of proportions, and determining the false detection rate according to the value of the proportion with the largest numerical value in the plurality of proportions, or the average value of the plurality of proportions, or the value of the proportion with the highest occurrence frequency in the plurality of proportions.
8. The apparatus of claim 7, wherein the evaluation value comprises a recall rate, and the determining module comprises:
a first detection unit configured to detect each image in a first image set using the deep learning model to obtain first target images in which the target information is detected, wherein the first image set comprises first images containing the target information; and
a first determining unit configured to determine the recall rate according to a ratio of the number of the first target images to the number of the first images.
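The recall computation of claim 8 is a straightforward ratio; a minimal sketch (names are illustrative, not from the disclosure):

```python
def recall_rate(detections):
    """detections: one boolean per image known to contain smoke/fire,
    True if the model detected the target information in that image.
    Recall = number of detected positives / total number of positive images."""
    return sum(detections) / len(detections)
```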
9. The apparatus of claim 7 or 8, wherein the processing module comprises:
a processing unit configured to perform data augmentation processing on the first sample image set to obtain the second sample image set.
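Claim 9 recites only "data augmentation processing" without naming transforms; the sketch below shows one common choice (horizontal flips and brightness jitter) as an assumption for illustration, not as the patent's method:

```python
import numpy as np

def augment(images, seed=0):
    """Append a horizontally flipped copy and a brightness-jittered copy of
    each image (H x W arrays) to the original set, enlarging the sample set.
    The specific transforms here are illustrative assumptions only."""
    rng = np.random.default_rng(seed)
    out = list(images)
    for img in images:
        out.append(img[:, ::-1].copy())                 # horizontal flip
        factor = rng.uniform(0.8, 1.2)                  # brightness jitter
        out.append(np.clip(img.astype(float) * factor, 0, 255))
    return out
```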
10. The apparatus of claim 7, wherein the deep learning model comprises a deformable convolution sub-model.
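The deformable convolution sub-model of claim 10 samples each kernel tap at a learned fractional offset from its regular grid position, using bilinear interpolation. A minimal pure-NumPy sketch of that sampling idea (illustrative only; a practical model would use a library layer such as torchvision.ops.DeformConv2d):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a (H, W) feature map at fractional (y, x); zero outside."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < H and 0 <= xx < W:
                val += wy * wx * feat[yy, xx]
    return val

def deformable_conv2d(feat, kernel, offsets):
    """feat: (H, W); kernel: (3, 3); offsets: (H, W, 9, 2) learned (dy, dx)
    per output position and per kernel tap. Each tap samples the input at
    its grid position plus its learned offset, via bilinear interpolation."""
    H, W = feat.shape
    out = np.zeros((H, W))
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                acc += kernel[dy + 1, dx + 1] * bilinear_sample(
                    feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets zero this reduces to an ordinary 3x3 convolution; learned non-zero offsets let the receptive field deform toward irregular shapes such as smoke plumes.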
11. A smoke and fire detection apparatus, comprising:
an obtaining module configured to input an image to be detected into a deep learning model to obtain a detection result;
wherein the deep learning model is trained using the training apparatus of any one of claims 7-10.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or claim 6.
13. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5 or claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210214314.6A CN114580631B (en) | 2022-03-04 | 2022-03-04 | Model training method, smoke and fire detection method, device, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114580631A (en) | 2022-06-03
CN114580631B (en) | 2023-09-08
Family
ID=81773508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210214314.6A Active CN114580631B (en) | 2022-03-04 | 2022-03-04 | Model training method, smoke and fire detection method, device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114580631B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706150A (en) * | 2019-07-12 | 2020-01-17 | 北京达佳互联信息技术有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
WO2021008456A1 (en) * | 2019-07-12 | 2021-01-21 | 北京达佳互联信息技术有限公司 | Image processing method and apparatus, electronic device, and storage medium |
CN111310621A (en) * | 2020-02-04 | 2020-06-19 | 北京百度网讯科技有限公司 | Remote sensing satellite fire point identification method, device, equipment and storage medium |
CN112950570A (en) * | 2021-02-25 | 2021-06-11 | 昆明理工大学 | Crack detection method combining deep learning and dense continuous central point |
CN113704531A (en) * | 2021-03-10 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN112883714A (en) * | 2021-03-17 | 2021-06-01 | 广西师范大学 | ABSC task syntactic constraint method based on dependency graph convolution and transfer learning |
CN113792791A (en) * | 2021-09-14 | 2021-12-14 | 百度在线网络技术(北京)有限公司 | Processing method and device for visual model |
CN113870284A (en) * | 2021-09-29 | 2021-12-31 | 柏意慧心(杭州)网络科技有限公司 | Method, apparatus, and medium for segmenting medical images |
CN114118287A (en) * | 2021-11-30 | 2022-03-01 | 北京百度网讯科技有限公司 | Sample generation method, sample generation device, electronic device and storage medium |
Non-Patent Citations (1)
Title |
---|
"Smoke and Fire Recognition Algorithm Based on Deep Learning"; Lyu Junjie (吕俊杰) et al.; Information Technology and Informatization (《信息技术与信息化》); 2021, No. 12, pp. 220-222 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112801164A (en) | Training method, device and equipment of target detection model and storage medium | |
CN112861885B (en) | Image recognition method, device, electronic equipment and storage medium | |
CN113065614B (en) | Training method of classification model and method for classifying target object | |
CN112989995B (en) | Text detection method and device and electronic equipment | |
CN113205041B (en) | Structured information extraction method, device, equipment and storage medium | |
CN114898266B (en) | Training method, image processing device, electronic equipment and storage medium | |
CN113887615A (en) | Image processing method, apparatus, device and medium | |
CN113011345B (en) | Image quality detection method, image quality detection device, electronic equipment and readable storage medium | |
CN113902899A (en) | Training method, target detection method, device, electronic device and storage medium | |
CN117557777A (en) | Sample image determining method and device, electronic equipment and storage medium | |
CN113643260A (en) | Method, apparatus, device, medium and product for detecting image quality | |
CN113780297A (en) | Image processing method, device, equipment and storage medium | |
CN117351462A (en) | Construction operation detection model training method, device, equipment and storage medium | |
CN114580631B (en) | Model training method, smoke and fire detection method, device, electronic equipment and medium | |
CN114445711B (en) | Image detection method, image detection device, electronic equipment and storage medium | |
CN113139542B (en) | Object detection method, device, equipment and computer readable storage medium | |
CN113554062B (en) | Training method, device and storage medium for multi-classification model | |
CN113032251B (en) | Method, device and storage medium for determining service quality of application program | |
CN114387651A (en) | Face recognition method, device, equipment and storage medium | |
CN113887394A (en) | Image processing method, device, equipment and storage medium | |
CN113128601B (en) | Training method of classification model and method for classifying images | |
CN116883648B (en) | Foreign matter detection method and device, electronic equipment and storage medium | |
CN117746069B (en) | Graph searching model training method and graph searching method | |
CN116301361B (en) | Target selection method and device based on intelligent glasses and electronic equipment | |
CN115131825A (en) | Human body attribute identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||