CN116052053A - Method and device for improving the accuracy of monitoring images in smart museum (Wen Bo) scenarios - Google Patents
Method and device for improving the accuracy of monitoring images in smart museum scenarios
- Publication number
- CN116052053A (application number CN202310089431.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- image
- training
- improved
- attention mechanism
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a method and a device for improving the accuracy of monitoring images in smart museum (Wen Bo) scenarios. Video information collected by smart museum monitoring equipment first undergoes image preprocessing to yield a target image, which is then given style migration based on an improved CycleGAN model; the improved CycleGAN model comprises an encoder integrated with a convolution kernel attention mechanism, a Res2Net converter integrated with an attention mechanism, and a decoder. An S-NanoDet model receives reinforcement training on the migrated images, and target detection is performed on monitoring images in smart museum scenarios with the trained S-NanoDet model. By expanding the data set through style migration with the improved CycleGAN model, the invention improves the robustness of the S-NanoDet model, raises recognition efficiency for each picture style arising in complex scenes, and preserves recognition accuracy.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a method and a device for improving the accuracy of monitoring images in smart museum (Wen Bo) scenarios.
Background
In complex working environments, the style of the video frames collected by a monitoring camera changes frequently: daytime footage shows different picture styles across different time periods, and at night, in scenes lacking illumination, the picture becomes dim, at which point some cameras switch to collecting infrared data instead. These varying picture styles pose a great challenge to the robustness of a deployed detection model, chiefly because the data set used for model training cannot adequately cover all of them. The model's performance therefore degrades on such data, detection results carry low confidence, data errors occur easily, and accuracy suffers.
The foregoing is provided merely to facilitate understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.
Disclosure of Invention
The main purpose of the invention is to provide a method and a device for improving the accuracy of monitoring images in smart museum scenarios, aiming to solve the technical problem in the prior art that, when recognizing complex scenes, images are made non-uniform by various factors, leading to poor recognition by the network model.
To achieve the above objective, the invention provides a method for improving the accuracy of monitoring images in smart museum scenarios, comprising the following steps:
performing image preprocessing on video information collected by smart museum monitoring equipment to obtain a processed target image;
performing style migration on the target image based on an improved CycleGAN model to obtain a migrated image, wherein the improved CycleGAN model comprises a generator and a discriminator, and the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a Res2Net converter integrated with an attention mechanism, and a decoder; the encoder extracts features from the input image according to the convolution kernel attention mechanism and feeds the resulting feature map to the converter; the converter performs feature fusion on the feature map according to the convolution kernel attention mechanism and feeds the fused feature information to the decoder for feature conversion;
and performing reinforcement training on an S-NanoDet model based on the migrated image, and performing target detection on monitoring images in the target smart museum scenario with the trained S-NanoDet model.
Optionally, before the step of performing style migration on the target image based on the improved CycleGAN model to obtain the migrated image, the method further includes:
training the CycleGAN model with an image data set containing normal-hue and infrared images of smart museum scenes as the model training data set, to obtain a pre-training model;
obtaining an improved pre-training model by combining, on the basis of the pre-training model, a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism;
and constructing an improved CycleGAN model based on the improved pre-training model.
Optionally, the step of constructing an improved CycleGAN model includes:
testing the improved pre-training model with an image test set of smart museum scenes that has not undergone image preprocessing, to obtain a test result;
and, when the test result satisfies a preset criterion, taking the improved pre-training model as the improved CycleGAN model.
Optionally, the step of performing reinforcement training on the S-NanoDet model based on the migrated image and performing target detection on monitoring images in the target smart museum scenario with the trained S-NanoDet model includes:
expanding the picture data set with the migrated images, and performing reinforcement training on the S-NanoDet model with the expanded picture data set to obtain a trained S-NanoDet model;
and performing image recognition on monitoring images in the target smart museum scenario with the trained S-NanoDet model.
Optionally, the Res2Net residual structure outputs a combination of feature maps with different receptive field sizes.
Optionally, the convolution kernel attention mechanism is the ECANet attention mechanism, which adopts a local cross-channel interaction strategy without dimensionality reduction, realizing the information interaction through a one-dimensional convolution.
Optionally, the image preprocessing comprises four operations, namely random cropping, rotation, color brightening, and the addition of Gaussian blur noise, and the step of performing image preprocessing on video information collected by smart museum monitoring equipment to obtain a processed target image includes:
performing three of these operations, random cropping, rotation, and color brightening, on the video information collected by the smart museum monitoring equipment, to obtain the processed target image.
In addition, to achieve the above objective, the invention further provides a device for improving the accuracy of monitoring images in smart museum scenarios, the device comprising:
an image preprocessing module for performing image preprocessing on video information collected by smart museum monitoring equipment to obtain a processed target image;
a style migration module for performing style migration on the target image based on an improved CycleGAN model to obtain a migrated image, wherein the improved CycleGAN model comprises a generator and a discriminator, and the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a Res2Net converter integrated with an attention mechanism, and a decoder; the encoder extracts features from the input image according to the convolution kernel attention mechanism and feeds the resulting feature map to the converter; the converter performs feature fusion on the feature map according to the convolution kernel attention mechanism and feeds the fused feature information to the decoder for feature conversion;
and a target detection module for performing reinforcement training on the S-NanoDet model based on the migrated image and performing target detection on monitoring images in the target smart museum scenario with the trained S-NanoDet model.
Optionally, the device for improving the accuracy of monitoring images in smart museum scenarios further includes a model training module;
the model training module is used for training the CycleGAN model with an image data set containing normal-hue and infrared images of smart museum scenes as the model training data set, to obtain a pre-training model;
the model training module is further used for obtaining an improved pre-training model by combining, on the basis of the pre-training model, a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism;
the model training module is further used for constructing the improved CycleGAN model based on the improved pre-training model.
Optionally, the model training module is further configured to test the improved pre-training model with an image test set of smart museum scenes that has not undergone image preprocessing, to obtain a test result;
the model training module is further configured to take the improved pre-training model as the improved CycleGAN model when the test result satisfies a preset criterion.
The method performs image preprocessing on video information collected by smart museum monitoring equipment to obtain a processed target image; performs style migration on the target image based on an improved CycleGAN model to obtain a migrated image, the improved CycleGAN model comprising a generator and a discriminator, the generator comprising, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a Res2Net converter integrated with an attention mechanism, and a decoder, where the encoder extracts features from the input image according to the convolution kernel attention mechanism and feeds the resulting feature map to the converter, and the converter performs feature fusion on the feature map according to the convolution kernel attention mechanism and feeds the fused feature information to the decoder for feature conversion; and performs reinforcement training on the S-NanoDet model based on the migrated image and target detection on monitoring images in the target smart museum scenario with the trained S-NanoDet model. By applying style migration through the improved CycleGAN model and reinforcement-training the S-NanoDet model on the migrated images, the invention expands the data set, improves the robustness of the S-NanoDet model, and thereby improves both recognition efficiency and recognition accuracy for the varied picture styles of complex scenes.
Drawings
FIG. 1 is a flowchart of a first embodiment of the method of the present invention for improving the accuracy of monitoring images in smart museum scenarios;
FIG. 2 is a schematic diagram of the basic principle of a generative adversarial network in the first embodiment;
FIG. 3 is a schematic diagram of the CycleGAN principle in the first embodiment;
FIG. 4 shows the structures of the generator and the discriminator in the first embodiment;
FIG. 5 compares the ResNet and Res2Net residual block structures in the first embodiment;
FIG. 6 compares the data set before and after style migration and expansion in a second embodiment of the method;
FIG. 7 is a block diagram of a first embodiment of the device of the present invention for improving the accuracy of monitoring images in smart museum scenarios.
The objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings and embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of the method of the present invention for improving the accuracy of monitoring images in smart museum scenarios.
In this embodiment, the method for improving the accuracy of monitoring images in smart museum scenarios includes the following steps.
step S10: and performing image preprocessing on the video information acquired by the intelligent text blog monitoring equipment to obtain a processed target image.
It should be noted that the execution subject of this embodiment may be a device with an image recognition function in a complex smart museum scenario, such as a computer, a notebook, or a tablet, or any other image recognition device capable of performing the same or similar functions; this embodiment imposes no limitation. This embodiment and the following embodiments are described taking the computer as an example.
It will be appreciated that, as the cultural heritage (Wen Bo) industry has gone digital, many venues in the industry are monitored by installing smart museum monitoring equipment. Such equipment may be a device installed in a smart museum to monitor exhibits, people, and vehicles, or a similar device installed in a school to monitor students. The monitoring equipment can be connected to a monitoring platform via the Internet of Things, so that the platform can acquire monitoring footage in real time, facilitating image processing.
It should be understood that the video information consists of multiple video frames, generated according to the shooting parameters of the monitoring equipment. To ensure that the frame parameters conform to the input size handled by the improved CycleGAN model, the video information must undergo image preprocessing to obtain the processed target image, i.e. the image that is to undergo style migration.
Further, the image preprocessing includes three operations, random cropping, rotation, and color brightening, and step S10 includes: performing these three operations on the video information collected by the smart museum monitoring equipment to obtain the processed target image.
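The three preprocessing operations named above can be sketched as plain array operations. This is only an illustrative approximation: the crop size, rotation by multiples of 90°, and brightening gain range are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess_frame(frame, crop=256):
    """Apply random cropping, rotation, and color brightening to an
    H x W x 3 uint8 frame, mirroring the embodiment's three-step preprocessing."""
    h, w = frame.shape[:2]
    y = rng.integers(0, h - crop + 1)               # random crop position
    x = rng.integers(0, w - crop + 1)
    patch = frame[y:y + crop, x:x + crop]           # random cropping
    patch = np.rot90(patch, k=rng.integers(0, 4))   # random rotation (0/90/180/270 degrees)
    gain = rng.uniform(1.0, 1.4)                    # color brightening
    return np.clip(patch.astype(np.float32) * gain, 0, 255).astype(np.uint8)

frame = rng.integers(0, 255, size=(480, 640, 3), dtype=np.uint8)
target = preprocess_frame(frame)  # processed 256 x 256 x 3 target image
```

In practice the same three operations would be applied frame by frame before the frames are fed to the improved CycleGAN model.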
It should be noted that, for image recognition tasks, preprocessing can significantly improve the quality of feature extraction and thereby the recognition performance of the model, making image preprocessing an important operation. It adapts the sample images to the recognition requirements of the network model, weakening the interference of negative information in each sample image and highlighting its positive information to match the data input form actually required.
It can be understood that real smart museum scenes are complex and changeable; preprocessing the data set with reasonable methods effectively mitigates this and improves the model's recognition of images under different lighting. Since unevenly distributed data can also degrade recognition to some extent, the image to be recognized is processed with the above preprocessing methods.
Step S20: performing style migration on the target image based on an improved CycleGAN model to obtain a migrated image, wherein the improved CycleGAN model comprises a generator and a discriminator, and the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a Res2Net converter integrated with an attention mechanism, and a decoder; the encoder extracts features from the input image according to the convolution kernel attention mechanism and feeds the resulting feature map to the converter; the converter performs feature fusion on the feature map according to the convolution kernel attention mechanism and feeds the fused feature information to the decoder for feature conversion.
It should be noted that one way to address the technical problem would be to add dedicated structures to the model to preprocess the input data and improve robustness, but this suits only small-scale use: a detection model deployed on edge devices must keep its complexity and computation as low as possible, and such additions tend to increase both. A second way is to collect and produce a corresponding data set for each specific scene and retrain the model on it so that it adapts to the scene; however, with so many deployment scene styles, producing a data set for each would consume a great deal of resources. To obtain data sets of the required styles quickly, a generative adversarial network can instead learn the style of representative pictures and migrate part of the existing data set to that style, expanding the data set rapidly and at low cost. This expansion only requires pictures of the target scene, which the camera can collect in real time, and after style migration the label information of the original pictures remains valid, greatly reducing the workload.
It is understood that the improved CycleGAN model is obtained by improving the CycleGAN model, which itself developed from the basic generative adversarial network (GAN). To explain the principle, refer to the schematic diagram of the basic GAN in fig. 2: the main structure comprises a generator G and a discriminator D. GAN training alternates between two stages. In the first stage the discriminator is fixed and the generator is trained: the generator keeps producing fake data from random noise and submits it to the discriminator for judgment; initially the discriminator easily identifies this data as fake, but as training deepens the generator eventually deceives it. In the second stage the generator is fixed and the discriminator is trained: since the data produced by the stage-one generator now confuses the discriminator, the discriminator is trained on the data set to restore its discriminative ability, until the generator's data can again be identified. Repeating these two stages in a loop finally yields a generator good enough to produce the corresponding picture data. Generative adversarial networks have very wide application; this solution realizes style migration of data set pictures by improving the CycleGAN model.
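The alternating two-stage loop described above can be laid out schematically. The sketch below fixes only the control flow; `g_step` and `d_step` are placeholders for the actual gradient updates of the generator and discriminator, which are not specified in this document.

```python
def train_gan(num_rounds, g_steps, d_steps, g_step, d_step):
    """Alternate the two GAN training stages: first hold D fixed and
    update G, then hold G fixed and update D, repeating in a loop."""
    schedule = []
    for _ in range(num_rounds):
        for _ in range(g_steps):   # stage 1: discriminator fixed, generator trained
            schedule.append(("G", g_step()))
        for _ in range(d_steps):   # stage 2: generator fixed, discriminator trained
            schedule.append(("D", d_step()))
    return schedule

# Wiring check with dummy update functions.
log = train_gan(2, 1, 1, lambda: None, lambda: None)
```

In a real implementation each step function would compute a loss on a minibatch and back-propagate through only the network being trained in that stage.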
Further, the CycleGAN model develops from the basic GAN by introducing a cycle-consistency loss, so that the model can be trained without paired data sets, reducing the requirements on the data. The CycleGAN model consists of 2 generators and 2 discriminators. Given sample spaces X and Y, to make a generated picture match the style of the pictures in Y while keeping the same content as the input picture x, 2 generators G and F are designed and learned simultaneously: a picture converted from the X domain into the Y domain must be convertible back correspondingly, which introduces the cycle-consistency loss. The loss function of CycleGAN consists of two parts:
Loss = Loss_GAN + Loss_cycle
Loss_GAN ensures the mutual optimization of the generator and the discriminator, so that the generator can produce more realistic pictures, while Loss_cycle ensures that the output picture of a generator differs from its input picture only in style, with the same content.
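Numerically, the two loss terms combine as sketched below. The L1 form of the cycle term and the λ weighting follow the original CycleGAN formulation and are assumptions here, with G and F standing in for the two generators:

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """Loss_cycle: F(G(x)) should reconstruct x and G(F(y)) should
    reconstruct y, measured with an L1 penalty."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

def total_loss(loss_gan, G, F, x, y, lam=10.0):
    """Loss = Loss_GAN + lam * Loss_cycle."""
    return loss_gan + lam * cycle_consistency_loss(G, F, x, y)

# With identity generators the cycle term vanishes and only Loss_GAN remains.
x = np.ones((4, 4))
y = np.zeros((4, 4))
ident = lambda a: a
loss = total_loss(0.5, ident, ident, x, y)
```

The weighting `lam` controls how strongly content preservation is enforced relative to the adversarial realism term.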
It will be appreciated that, to further illustrate the CycleGAN principle, reference is made to the CycleGAN schematic diagram shown in fig. 3. (a) shows the model structure of CycleGAN, which contains two mapping functions G: X→Y and F: Y→X; the cycle-consistency loss guarantees that the two learned mappings G and F do not contradict each other. A sample in the X domain generates an image in the Y domain through the generator G, and a sample in the Y domain can likewise generate an image in the X domain through the generator F. (b) shows the first process: a sample x in the X domain generates an image G(x) in the Y domain through the generator G, and the generator F then generates the reconstruction F(G(x)); the model hopes that F(G(x)) is the same as x, i.e. the difference between F(G(x)) and x is taken as a loss. Similarly, (c) is the dual process: for each image y in the Y domain, the model hopes that G(F(y)) is the same as the original y, i.e. G(F(y)) ≈ y. As shown in fig. 3, the CycleGAN model is a ring structure, where X represents an image of the X domain and Y an image of the Y domain. An image in the X domain generates an image in the Y domain through the generator G, after which the generator F reconstructs the original X-domain input; an image in the Y domain generates an image in the X domain through the generator F, after which the generator G reconstructs the original Y-domain input. The discriminators Dx and Dy perform the discriminating role, ensuring the style migration of the image. Referring to the generator and discriminator schematic diagrams shown in fig. 4, it can be seen from fig. 4 (a) that the generator of CycleGAN consists mainly of 3 parts: an encoder, a converter and a decoder. The encoder is built from an ordinary convolutional network, the converter from a residual network, and the decoder from a deconvolution network. The discriminator is an ordinary convolutional network: as shown in fig. 4 (b), feature extraction of the picture is completed by several convolution layers, and a final fully connected layer decides whether the picture is real or fake.
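The two CycleGAN losses described above can be sketched on toy data. This is an illustrative sketch only, not the patent's code: the "generators" G and F are stand-in invertible linear maps so the cycle terms can be evaluated, and the least-squares form of the GAN loss is a common CycleGAN choice that the text does not fix explicitly.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, used for the cycle-consistency terms."""
    return float(np.mean(np.abs(a - b)))

def cycle_loss(G, F, x, y):
    """Loss_cycle = |F(G(x)) - x|_1 + |G(F(y)) - y|_1."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)

def lsgan_loss_D(D, real, fake):
    """Least-squares GAN objective for a discriminator D:
    push D(real) toward 1 and D(fake) toward 0."""
    return float(np.mean((D(real) - 1.0) ** 2) + np.mean(D(fake) ** 2))

# Toy domains: Y is X scaled by 2, so G doubles and F halves.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # batch of X-domain "images"
y = 2.0 * rng.normal(size=(4, 8))    # batch of Y-domain "images"
G = lambda t: 2.0 * t                # X -> Y
F = lambda t: 0.5 * t                # Y -> X

loss_cyc = cycle_loss(G, F, x, y)    # 0 here, since F(G(x)) == x exactly
```

When G and F are perfect inverses the cycle loss vanishes, which is exactly the "content unchanged, only style differs" constraint.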
In a specific implementation, although the above analysis shows that the CycleGAN model can generate pictures of a designated style, its fine granularity is insufficient to support picture style migration in a variety of scenes, so the CycleGAN model needs to be improved. Style migration processing is performed on the target image based on the improved CycleGAN model to obtain a migrated image. The improved CycleGAN model comprises a generator and a discriminator; the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder. The encoder is used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter; the converter is used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion.
Further, before the step S20, the method further includes: training the CycleGAN model by taking an image dataset containing normal-tone and infrared images of the smart cultural museum scene as the model training dataset, to obtain a pre-training model; based on the pre-training model, combining a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism, to obtain an improved pre-training model; and constructing the improved CycleGAN model based on the pre-training model.
It should be noted that, in this scheme, the image dataset of normal-tone and infrared images of the smart cultural museum scene is used as the model training dataset to train the CycleGAN network model and obtain a pre-training model. However, the monitoring image in a complex environment is disturbed by factors such as the environment and lighting; these factors become noise during model recognition, the noise information propagates through the model during training, and the weight it carries keeps growing as the network deepens, which ultimately degrades the recognition performance of the model. Therefore, the pre-trained CycleGAN model needs further improvement, and a convolution kernel attention mechanism is introduced on the residual structure of CycleGAN to improve recognition accuracy. Through this improvement, the improved pre-training model is obtained, and the improved CycleGAN model is built based on it.
It can be understood that the improved CycleGAN model takes the CycleGAN network model as its basis and combines a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism to obtain an improved pre-training model; the improved CycleGAN model is then constructed on the basis of this pre-training model.
Further, the Res2Net residual structure is used to output a feature map combination that covers different receptive field sizes.
It should be noted that, given the structural features of the CycleGAN model, Res2Net is used to replace the residual network in the CycleGAN converter. Fig. 5 compares the residual block structures of ResNet and Res2Net: Res2Net constructs hierarchical residual-like connections of different levels inside a single residual block, representing multi-scale features at the granularity level and increasing the receptive field of each network layer. The main difference from ResNet is that the 3×3 convolution in ResNet is replaced: after the 1×1 convolution, the feature map is divided into 4 parts. The first part x1 is passed directly to y1 without processing; the second part x2, after a 3×3 convolution, is split into two lines, one continuing to y2 and the other passed on to x3, so that the third part obtains the information of the second part, and so on, until the final output is obtained. The output of the Res2Net module thus contains a combination of different receptive field sizes, a structure which facilitates the extraction of both global and local information.
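The split-and-cascade wiring described above can be sketched in a few lines. This is an illustration of the wiring only, not the patent's model code: the 3×3 convolution is replaced by a width-3 moving-average filter so the hierarchy of growing receptive fields is visible without a deep-learning framework.

```python
import numpy as np

def conv3(v):
    """Stand-in for the 3x3 convolution: a width-3 average filter."""
    return np.convolve(v, np.ones(3) / 3.0, mode="same")

def res2net_split(x):
    """x: (4*s,) feature vector split into 4 scale groups x1..x4.
    y1 = x1 (identity); each later group is convolved together with the
    previous group's output, so it accumulates a larger receptive field."""
    x1, x2, x3, x4 = np.split(x, 4)
    y1 = x1                  # first part passes through untouched
    y2 = conv3(x2)           # second part: one conv
    y3 = conv3(x3 + y2)      # third part also receives the second's output
    y4 = conv3(x4 + y3)      # and so on
    return np.concatenate([y1, y2, y3, y4])

feat = np.arange(16, dtype=float)
out = res2net_split(feat)    # same length, later groups see wider context
```

The fourth group has effectively passed through three stacked filters, which is precisely the multi-scale receptive-field combination the text attributes to the Res2Net module.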
Further, the convolution kernel attention mechanism is an ECANet attention mechanism, and the ECANet attention mechanism adopts a local cross-channel interaction strategy without dimension reduction to realize information interaction through one-dimensional convolution.
It should be noted that, in order to further improve the performance of the backbone network of the CycleGAN model, an ECANet attention mechanism is connected to the Res2Net residual structure and the encoder; it has lower complexity than the conventional SENet. ECANet mainly adopts a local cross-channel interaction strategy without dimensionality reduction, which can be implemented simply through a one-dimensional convolution, thereby avoiding the dimensionality-reduction operation in SENet while maintaining the performance of the structure.
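A minimal sketch of the ECANet channel attention just described: global average pooling, a one-dimensional convolution across channels (no dimensionality reduction), a sigmoid gate, then channel-wise rescaling. The fixed width-3 kernel is an assumption for illustration; the real module derives the kernel size adaptively from the channel count.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eca(x, kernel=np.array([0.25, 0.5, 0.25])):
    """x: (C, H, W) feature map -> same-shape, channel-gated feature map."""
    pooled = x.mean(axis=(1, 2))                      # (C,) global avg pool
    mixed = np.convolve(pooled, kernel, mode="same")  # local cross-channel 1-D conv
    gate = sigmoid(mixed)                             # per-channel weight in (0, 1)
    return x * gate[:, None, None]                    # rescale each channel

x = np.random.default_rng(1).normal(size=(8, 4, 4))
y = eca(x)                                            # same shape as x
```

Because the 1-D convolution only mixes each channel with its neighbours, the cost is linear in the number of channels, which is the complexity advantage over SENet's bottleneck of fully connected layers.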
In the specific implementation, the CycleGAN model is trained by taking an image dataset containing normal-tone and infrared images of the smart cultural museum scene as the model training dataset, to obtain a pre-training model; based on the pre-training model, a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism are combined to obtain an improved pre-training model; and the improved CycleGAN model is constructed based on the pre-training model.

Step S30: performing reinforcement training on the S-NanoDet model based on the migrated image, and performing target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
It should be noted that improving the style migration and expansion of the dataset by the CycleGAN model can improve the robustness of the S-NanoDet model to a certain extent. Therefore, when an image migrated by the improved CycleGAN model is obtained, the dataset is expanded on that basis and the S-NanoDet model is given reinforcement training, thereby quickly widening the usage scenarios of the S-NanoDet model at low cost. The S-NanoDet model may be a lightweight model for target detection on monitoring images in complex smart cultural museum scenes, built on the basis of the NanoDet model.
It can be understood that, by performing target detection on the monitoring image in the target smart cultural museum scene through the trained S-NanoDet model, the target information in the monitoring image can be accurately identified.
Further, the step S30 further includes: expanding a picture dataset based on the migrated image, and performing reinforcement training on the S-NanoDet model according to the expanded picture dataset to obtain a trained S-NanoDet model; and performing image recognition on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
It should be noted that the image dataset may be a preset image dataset for training the S-NanoDet model, and the image dataset may be an image dataset expanded by taking an infrared image style common to a monitored scene as a migration target.
It can be understood that the method can be used for carrying out style migration on the data set image so as to rapidly expand the data set, and then carrying out reinforcement training on the target detection model (S-NanoDet model) so as to improve the accuracy of the target detection model under a specific deployment environment.
In the specific implementation, the image dataset is expanded by taking the infrared image style common in monitoring scenes as the migration target, and reinforcement training is performed on the S-NanoDet model with the expanded dataset to obtain a trained S-NanoDet model; image recognition is then performed on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
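The expansion step just described amounts to giving every picture in the training set a style-migrated twin and training on the union. A hedged sketch, where `style_migrate` is a placeholder for the improved CycleGAN generator and the file names are invented for illustration:

```python
def expand_dataset(pictures, style_migrate):
    """Return the original pictures plus their style-migrated copies."""
    migrated = [style_migrate(p) for p in pictures]
    return pictures + migrated

# Toy stand-ins: pictures are file names, migration tags them as infrared-style.
train_set = ["hall_01.jpg", "exhibit_02.jpg"]
expanded = expand_dataset(train_set, lambda p: p.replace(".jpg", "_ir.jpg"))
```

The detector then trains on `expanded`, so it sees both the normal-tone originals and their infrared-style counterparts without any additional data collection.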
According to this method, image preprocessing is performed on the video information collected by the smart cultural museum monitoring equipment to obtain a processed target image; style migration processing is performed on the target image based on an improved CycleGAN model to obtain a migrated image, where the improved CycleGAN model comprises a generator and a discriminator, the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder, the encoder is used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter, and the converter is used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion; reinforcement training is performed on the S-NanoDet model based on the migrated image, and target detection is performed on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model. In this way, the invention performs style migration on the target image through the improved CycleGAN model, performs reinforcement training on the S-NanoDet model based on the migrated image, and performs target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model, thereby improving the accuracy of target detection on the monitoring image.
Based on the first embodiment shown in fig. 1, a second embodiment of the method of the present invention for improving the accuracy of a monitoring image under a smart cultural museum is provided.
In this embodiment, the step of constructing an improved CycleGAN model based on the pre-training model includes: testing the improved pre-training model with an image test set of a smart cultural museum scene without image preprocessing to obtain a test result; and when the test result meets a preset result, taking the improved pre-training model as the improved CycleGAN model.
It should be noted that the image test set of a smart cultural museum scene without image preprocessing may be the KAIST dataset. The dataset contains 95,328 pictures in total, captured in the daytime and at night in various everyday traffic settings including campuses, streets and the countryside; each picture exists in two versions, normal colour and infrared, with a size of 640×480. In the dataset style-migration experiment of the improved CycleGAN model, the batch_size may be set to 1, the learning rate to 0.0002, and 180 epochs may be trained. In the experiment, a combination of 4,946 pictures of the corresponding scenes is extracted to serve as the image test set of the improved CycleGAN model, the improved pre-training model is tested, and a test result is obtained.
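The experimental settings quoted above, collected in one place for reference. Only the batch size, learning rate, epoch count, picture size, dataset total and the 4,946-picture test subset come from the text; the dictionary wrapper itself is a hypothetical convenience, not part of the patent.

```python
# Figures taken from the described KAIST style-migration experiment.
KAIST_TOTAL_PICTURES = 95328

style_migration_config = {
    "dataset": "KAIST",
    "batch_size": 1,                # one picture per step
    "learning_rate": 2e-4,          # 0.0002, as stated in the text
    "epochs": 180,
    "test_subset_pictures": 4946,   # extracted scene-matched test set
    "image_size": (640, 480),
}
```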
It is understood that the test result may refer to a comparison between the training loss of the improved CycleGAN model and that of the original CycleGAN model, where the preset result is that the training loss of the improved CycleGAN is generally lower than that of the original CycleGAN model. Such a result indicates that the improved model learns the style of the real samples more effectively.
In a specific implementation, when the training loss of the improved CycleGAN is generally lower than that of the original CycleGAN model, the improved pre-training model is used as the improved CycleGAN model.
Further, before the model is put into practical application, in order to verify whether the dataset expanded by the improved CycleGAN model can improve the accuracy of the S-NanoDet model on infrared images, 2,000 pictures are randomly extracted from the training set of the PASCAL VOC2012 dataset for style migration and 500 pictures are randomly extracted from the validation set for style migration; the migrated pictures are then added to their respective picture sets to train the S-NanoDet target detection model.
Table 1 data set style migration experimental data comparison
As shown in table 1, when the validation set is not expanded, the detection accuracy of the model drops slightly after the training set is expanded, mainly because the style of the infrared images differs too much from the original images, which affects the detection accuracy to a certain extent; however, since the expanded pictures occupy only a small share of the original dataset, the impact is also small. When the training set is not expanded but the validation set is, the accuracy of the model drops by 4.3%, because the model has not been trained on pictures of the corresponding style, so the accuracy falls considerably. Finally, when both the training set and the validation set are expanded, the detection accuracy of the model is close to the original case and, compared with the case where the training set is not expanded, is 3.9% higher. This shows that style-migration expansion of the dataset can improve the detection accuracy of the target detection model on pictures of a specific style.
The S-NanoDet models trained with the dataset before and after style-migration expansion were tested on pictures outside the dataset; the effect is shown in the before/after comparison diagram of fig. 6. The model trained with the migration-expanded dataset achieves higher accuracy on infrared pictures and detects more target objects, which verifies the effectiveness of the migration-expansion method. It is worth noting that infrared images, whose migration is harder and whose style differs more, were chosen as the migration target in the experiment in order to highlight the effectiveness of the method; because of this greater difficulty, the experimental results obtained are on the conservative side.
According to this method, image preprocessing is performed on the video information collected by the smart cultural museum monitoring equipment to obtain a processed target image; style migration processing is performed on the target image based on an improved CycleGAN model to obtain a migrated image, where the improved CycleGAN model comprises a generator and a discriminator, the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder, the encoder is used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter, and the converter is used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion; reinforcement training is performed on the S-NanoDet model based on the migrated image, and target detection is performed on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model. In this way, the invention performs style migration on the target image through the improved CycleGAN model, performs reinforcement training on the S-NanoDet model based on the migrated image, and performs target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model, thereby improving the accuracy of target detection on the monitoring image.
Referring to fig. 7, fig. 7 is a block diagram illustrating a first embodiment of a device of the present invention for improving the accuracy of a monitoring image under a smart cultural museum.
As shown in fig. 7, the device for improving the accuracy of a monitoring image under a smart cultural museum according to the embodiment of the present invention includes:
the image preprocessing module 10, used for performing image preprocessing on the video information collected by the smart cultural museum monitoring equipment to obtain a processed target image;
the style migration module 20, configured to perform style migration processing on the target image based on an improved CycleGAN model to obtain a migrated image, where the improved CycleGAN model includes a generator and a discriminator, and the generator includes, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder; the encoder is configured to perform feature extraction on the input image according to the convolution kernel attention mechanism and input the obtained feature map to the converter; the converter is configured to perform feature fusion on the feature map according to the convolution kernel attention mechanism and input the fused feature information to the decoder for feature conversion;
the target detection module 30, configured to perform reinforcement training on the S-NanoDet model based on the migrated image, and perform target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
According to this device, image preprocessing is performed on the video information collected by the smart cultural museum monitoring equipment to obtain a processed target image; style migration processing is performed on the target image based on an improved CycleGAN model to obtain a migrated image, where the improved CycleGAN model comprises a generator and a discriminator, the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder, the encoder is used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter, and the converter is used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion; reinforcement training is performed on the S-NanoDet model based on the migrated image, and target detection is performed on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model. In this way, the invention performs style migration on the target image through the improved CycleGAN model, performs reinforcement training on the S-NanoDet model based on the migrated image, and performs target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model, thereby improving the accuracy of target detection on the monitoring image.
Further, the device for improving the accuracy of a monitoring image under a smart cultural museum further comprises: a model training module;
the model training module is used for training the CycleGAN model by taking an image dataset containing normal-tone and infrared images of the smart cultural museum scene as the model training dataset, to obtain a pre-training model;
the model training module is further used for obtaining an improved pre-training model based on the pre-training model by combining a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism;
the model training module is also used for constructing the improved CycleGAN model based on the pre-training model.
Further, the model training module is further configured to test the improved pre-training model with an image test set of a smart cultural museum scene without image preprocessing, to obtain a test result;
the model training module is further configured to use the improved pre-training model as the improved CycleGAN model when the test result meets a preset result.
Further, the target detection module 30 is further configured to expand a picture dataset based on the migrated image and perform reinforcement training on the S-NanoDet model according to the expanded picture dataset, to obtain a trained S-NanoDet model; and to perform image recognition on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
Further, the image preprocessing module 10 is further configured to perform three kinds of preprocessing, namely random cropping, rotation and colour brightening, on the video information collected by the smart cultural museum monitoring equipment, to obtain a processed target image.
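The three preprocessing steps named above can be sketched on an (H, W, 3) image array. The crop size, the restriction of rotation to 90-degree steps and the brightening factor are illustrative assumptions; the text does not specify these parameters.

```python
import numpy as np

def preprocess(img, rng, crop=32, gain=1.2):
    """Random crop, rotation and colour brightening on an (H, W, 3) image
    with values in [0, 1]. Parameters are illustrative assumptions."""
    h, w, _ = img.shape
    top = int(rng.integers(0, h - crop + 1))          # random crop origin
    left = int(rng.integers(0, w - crop + 1))
    img = img[top:top + crop, left:left + crop]
    img = np.rot90(img, k=int(rng.integers(0, 4)))    # rotation in 90° steps
    return np.clip(img * gain, 0.0, 1.0)              # colour brightening

rng = np.random.default_rng(42)
frame = rng.random((64, 64, 3))                       # stand-in video frame
out = preprocess(frame, rng)
```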
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details not described in detail in the present embodiment may refer to the method for improving accuracy of a monitoring image under the intelligent text provided by any embodiment of the present invention, which is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for description and do not represent the relative merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. does not denote any order; these terms may be interpreted simply as names.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. read-only memory (ROM)/random access memory (RAM), magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (10)
1. A method for improving the accuracy of a monitoring image under a smart cultural museum, characterized in that the method comprises the following steps:
performing image preprocessing on video information collected by smart cultural museum monitoring equipment to obtain a processed target image;
performing style migration processing on the target image based on an improved CycleGAN model to obtain a migrated image, wherein the improved CycleGAN model comprises a generator and a discriminator, the generator comprises, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder, the encoder is used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter, and the converter is used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion;
and performing reinforcement training on the S-NanoDet model based on the migrated image, and performing target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
2. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 1, wherein before the step of performing style migration processing on the target image based on the improved CycleGAN model to obtain a migrated image, the method further comprises:
training the CycleGAN model by taking an image dataset containing normal-tone and infrared images of the smart cultural museum scene as the model training dataset, to obtain a pre-training model;
based on the pre-training model, combining a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism, obtaining an improved pre-training model;
and constructing an improved CycleGAN model based on the pre-training model.
3. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 2, wherein the step of constructing an improved CycleGAN model based on the pre-training model comprises:
testing the improved pre-training model with an image test set of a smart cultural museum scene without image preprocessing to obtain a test result;
And when the test result meets a preset result, taking the improved pre-training model as an improved CycleGAN model.
4. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 1, wherein the step of performing reinforcement training on the S-NanoDet model based on the migrated image and performing target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model comprises:
expanding a picture data set based on the migrated image, and performing reinforcement training on the S-NanoDet model according to the expanded picture data set to obtain a trained S-NanoDet model;
and performing image recognition on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
5. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 1, wherein the Res2Net residual structure is used for outputting feature map combinations covering different receptive field sizes.
6. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 1, wherein the convolution kernel attention mechanism is an ECANet attention mechanism, and the ECANet attention mechanism adopts a local cross-channel interaction strategy without dimensionality reduction, realizing information interaction through a one-dimensional convolution.
7. The method for improving the accuracy of a monitoring image under a smart cultural museum according to claim 1, wherein the image preprocessing comprises four kinds of preprocessing, namely random cropping, rotation, colour brightening and Gaussian blur noise addition, and the step of performing image preprocessing on the video information collected by smart cultural museum monitoring equipment to obtain a processed target image comprises:
performing three kinds of preprocessing, namely random cropping, rotation and colour brightening, on the video information collected by the smart cultural museum monitoring equipment to obtain a processed target image.
8. A device for improving the accuracy of a monitoring image under a smart cultural museum, characterized in that the device comprises:
the image preprocessing module, used for performing image preprocessing on the video information collected by smart cultural museum monitoring equipment to obtain a processed target image;
the style migration module, used for performing style migration processing on the target image based on an improved CycleGAN model to obtain a migrated image, the improved CycleGAN model comprising a generator and a discriminator, the generator comprising, connected in sequence, an encoder integrated with a convolution kernel attention mechanism, a converter built from Res2Net integrated with the attention mechanism, and a decoder, the encoder being used for extracting features of the input image according to the convolution kernel attention mechanism and inputting the obtained feature map to the converter; the converter being used for performing feature fusion on the feature map according to the convolution kernel attention mechanism and inputting the fused feature information to the decoder for feature conversion;
and the target detection module, used for performing reinforcement training on the S-NanoDet model based on the migrated image and performing target detection on the monitoring image in the target smart cultural museum scene according to the trained S-NanoDet model.
9. The device for improving the accuracy of a monitoring image under a smart cultural museum according to claim 8, wherein the device further comprises: a model training module;
the model training module is used for training the CycleGAN model by taking an image dataset containing normal-tone and infrared images of the smart cultural museum scene as the model training dataset, to obtain a pre-training model;
the model training module is further used for obtaining an improved pre-training model based on the pre-training model by combining a Res2Net residual structure integrated with an attention mechanism and an encoder integrated with a convolution kernel attention mechanism;
the model training module is also used for constructing the improved CycleGAN model based on the pre-training model.
10. The device for improving the accuracy of a monitoring image under a smart cultural museum according to claim 9, wherein the model training module is further configured to test the improved pre-training model with an image test set of a smart cultural museum scene without image preprocessing, to obtain a test result;
the model training module is further configured to use the improved pre-training model as the improved CycleGAN model when the test result meets a preset result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310089431.9A CN116052053A (en) | 2023-01-17 | 2023-01-17 | Method and device for improving accuracy of monitoring image under intelligent text blog |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116052053A true CN116052053A (en) | 2023-05-02 |
Family
ID=86125359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310089431.9A Pending CN116052053A (en) | 2023-01-17 | 2023-01-17 | Method and device for improving accuracy of monitoring image under intelligent text blog |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116052053A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958468A (en) * | 2023-07-05 | 2023-10-27 | 中国科学院地理科学与资源研究所 | Mountain snow environment simulation method and system based on SCycleGAN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20190311223A1 (en) | Image processing methods and apparatus, and electronic devices | |
Kumar et al. | Detection of disaster-affected cultural heritage sites from social media images using deep learning techniques | |
CN111695430B (en) | Multi-scale face detection method based on feature fusion and visual receptive field network | |
CN111767927A (en) | Lightweight license plate recognition method and system based on full convolution network | |
JP2012226744A (en) | Image quality assessment | |
US20220180624A1 (en) | Method and device for automatic identification of labels of an image | |
CN114170532B (en) | Multi-target classification method and device based on difficult sample migration learning | |
CN114519819B (en) | Remote sensing image target detection method based on global context awareness | |
CN109753962B (en) | Method for processing text region in natural scene image based on hybrid network | |
CN116052053A (en) | Method and device for improving accuracy of monitoring image under intelligent text blog | |
CN115861756A (en) | Earth background small target identification method based on cascade combination network | |
CN116844032A (en) | Target detection and identification method, device, equipment and medium in marine environment | |
CN115294483A (en) | Small target identification method and system for complex scene of power transmission line | |
CN114724258A (en) | Living body detection method, living body detection device, storage medium and computer equipment | |
CN111339919B (en) | Mirror detection method based on multitask cooperation | |
CN117315499A (en) | Satellite remote sensing image target detection method and system | |
CN109753999B (en) | Fine-grained vehicle type identification method for automobile pictures with any visual angles | |
US20200019785A1 (en) | Automatically associating images with other images of the same locations | |
CN111199050B (en) | System for automatically desensitizing medical records and application | |
CN115731621A (en) | Knowledge distillation-based deep synthetic image video counterfeiting detection method and system | |
Kawano et al. | TAG: Guidance-free Open-Vocabulary Semantic Segmentation | |
Nguyen et al. | UIT-ANPR: toward an open framework for automatic number plate recognition on smartphones | |
Yang et al. | A deep learning approach for automated segmentation of magnetic bright points in the solar photosphere | |
CN114708467B (en) | Bad scene identification method, system and equipment based on knowledge distillation | |
Ali et al. | Image Forgery Detection Using Deep Learning by Recompressing Images (Electronics 2022, 11, 403) | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||