CN114155598A - Training method and device of image processing model and electronic equipment


Info

Publication number
CN114155598A
Authority
CN
China
Prior art keywords
image
sample
processing model
target object
image processing
Prior art date
Legal status
Pending
Application number
CN202111340104.3A
Other languages
Chinese (zh)
Inventor
李坡
王原原
郑佳
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111340104.3A
Publication of CN114155598A
Legal status: Pending

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/24: Pattern recognition; classification techniques
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods

Abstract

A training method and apparatus for an image processing model, and an electronic device, are provided. The method comprises: performing multiple rounds of iterative training on a first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each round of iterative training and the loss ratio corresponding to each round. In this way, after the first image processing model is obtained, it is iteratively trained multiple times, and the image processing model with the most accurate prediction results is screened from the iteratively trained image processing models and taken as the target image processing model, which ensures the accuracy of the target image processing model.

Description

Training method and device of image processing model and electronic equipment
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a training method and apparatus for an image processing model, and an electronic device.
Background
At electric power maintenance sites, workers are required to wear safety belts as stipulated, in order to ensure their safety and prevent falls during work at height. To improve the efficiency of safety belt supervision, an intelligent safety belt detection system is needed.
At present, to detect whether workers wear safety belts, a safety helmet and safety belt detection method based on a deep convolutional neural network is adopted: the worker images in a video are detected, and the wearing of the helmet and the belt is judged, by a deep convolutional neural network model together with a spatial correlation model that locks onto the helmet and the belt, and an image processing model is obtained from these two models. However, during the training of the deep convolutional neural network in this image processing model, image features are extracted again by first reducing and then increasing the number of channels. When the number of channels is increased, more image features are obtained, which causes the image processing model to overfit during training. Although overfitting yields high accuracy on predictions made during training, the accuracy of actual predictions drops when the image processing model is deployed; when these image features are used to detect whether a worker wears a safety belt, the worker cannot be accurately locked onto, so the accuracy of the detection results of the image processing model is low.
Disclosure of Invention
The present application provides a training method and apparatus for an image processing model, and an electronic device. The (i+1)-th image processing model is trained iteratively according to the loss ratio, the (i+1)-th loss ratio being obtained after the (i+1)-th image processing model is trained; the image processing model with the highest prediction accuracy is screened from the plurality of iteratively trained image processing models and taken as the target image processing model, which improves the accuracy of the detection results of the target image processing model.
In a first aspect, the present application provides a method for training an image processing model, the method comprising:
performing multiple rounds of iterative training on a first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each round of iterative training and the loss ratio corresponding to each round, wherein the (i+1)-th round of iterative training comprises:
performing sample expansion processing on at least part of the ith sample images in the ith sample image set used by the (i+1)-th round of iterative training, according to the ith loss ratio corresponding to the ith round, to obtain at least one extended sample;
training the ith image processing model obtained by the ith round of iterative training based on the at least one extended sample and the ith sample image set, to obtain the (i+1)-th image processing model corresponding to the (i+1)-th round and the (i+1)-th loss ratio corresponding to the (i+1)-th round, where i is a positive integer.
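The loop below is a minimal Python sketch of this scheme; `expand_samples` and `train_one_round` are hypothetical stand-ins for the expansion and training steps described in this application, not a real API.

```python
# Minimal sketch of the iterative training scheme, assuming hypothetical
# expand_samples(samples, loss_ratio) and train_one_round(model, samples)
# helpers that stand in for the steps described above.
def iterative_training(first_model, first_sample_set, num_rounds):
    model, sample_set = first_model, list(first_sample_set)
    history, loss_ratio = [], None
    for _ in range(num_rounds):
        if loss_ratio is not None:           # no expansion in the first round
            sample_set += expand_samples(sample_set, loss_ratio)
        model, loss_ratio = train_one_round(model, sample_set)
        history.append((model, loss_ratio))
    # the model with the smallest loss ratio becomes the target model
    target_model, _ = min(history, key=lambda pair: pair[1])
    return target_model
```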
In one possible design, determining a target image processing model corresponding to the first image processing model based on an image processing model obtained by each iterative training in a plurality of iterative training and a loss ratio corresponding to each iterative training includes:
obtaining M image processing models and M loss ratios of multiple times of iterative training, wherein M is a positive integer;
and taking the image processing model corresponding to the minimum loss ratio as the target image processing model.
In one possible design, the process of obtaining a first image processing model includes:
and inputting the first sample image into a preset network for training to obtain a first image processing model.
In one possible design, performing, according to the ith loss ratio corresponding to the ith iterative training, sample expansion processing on at least part of the ith sample images in the ith sample image set used by the (i+1)-th iterative training to obtain at least one extended sample includes:
judging whether the received ith loss ratio exceeds a preset threshold value;
if yes, scaling at least part of the ith sample image in the ith sample image set to obtain a plurality of scaled images, splicing the scaled images with the same resolution in the plurality of scaled images to obtain at least one first extended sample, and determining the first extended sample as the ith sample image in the ith sample image set;
if not, performing rotation processing on at least part of the ith sample images in the ith sample image set to obtain a plurality of rotated images, adding pixel points of rotated images with the same size among the plurality of rotated images according to the weight values of the images to obtain at least one second extended sample, and determining the second extended sample as an ith sample image in the ith sample image set.
In one possible design, before training the ith image processing model obtained by the ith iterative training, the method includes:
converting the at least one extended sample and each image in the ith sample image set from RGB space to HSI space;
decomposing each image into a high-frequency component and a low-frequency component according to brightness, wherein the high-frequency component is subjected to linear weighted enhancement and the low-frequency component is subjected to histogram equalization;
and fusing, in each image, the high-frequency component after linear weighted enhancement and the low-frequency component after histogram equalization, and converting each fused image back from HSI space to RGB space.
In one possible design, training an ith image processing model obtained by the ith iterative training based on the at least one extended sample and the ith sample image set to obtain an i +1 th image processing model corresponding to the i +1 th iterative training and an i +1 th loss ratio corresponding to the i +1 th iterative training includes:
inputting the at least one extended sample and the ith sample image set into a prediction network for training, and obtaining an image feature set of the (i + 1) th sample image set, wherein images in the image feature set have different resolutions;
and performing prediction training on the (i + 1) th sample image set based on the image feature set to obtain an (i + 1) th loss ratio.
In one possible design, the predictive training of the i +1 th sample image set based on the image feature set includes:
acquiring a target object image in an i +1 th sample image set, and detecting whether the image area of the target object image exceeds a preset area;
if so, taking the target object image exceeding the preset area as a first target object image;
if not, taking the target object image with the area lower than the preset area as a second target object image;
and performing prediction training on the first target object image and the second target object image based on the feature image set according to a preset rule.
In one possible design, performing prediction training on the first target object image and the second target object image based on the feature image set according to a preset rule includes:
respectively extracting a first positive sample and a first negative sample corresponding to the first target object image and a second positive sample and a second negative sample corresponding to the second target object image, wherein the positive sample is an image corresponding to a specified area in the target object image, and the negative sample is an image corresponding to an area outside the specified area in the target object image;
when the target object image is a first target object image, predicting the first positive sample and the first negative sample by using the characteristic image with the first resolution;
and when the target object image is a second target object image, predicting the second positive sample and the second negative sample by using the characteristic image of the second resolution.
In a second aspect, the present application provides a method of behavior recognition, the method comprising:
performing behavior recognition on an image to be processed containing a target object based on a trained target image processing model, and determining whether the target object has abnormal behavior, wherein the abnormal behavior is that the target object does not wear a safety belt, and the target image processing model is obtained by training a first image processing model based on the training method of the first aspect.
In one possible design, performing behavior recognition on an image to be processed including a target object based on a trained target image processing model, and determining whether the target object has an abnormal behavior includes:
if it is determined that the target object in the current frame has abnormal behavior, outputting alarm information;
and if it is determined that the identification information of the target object in the frame following the current frame is inconsistent with the identification information of the target object in the current frame, outputting alarm information again.
In a third aspect, the present application provides an apparatus for training an image processing model, the apparatus comprising:
the iteration module is used for carrying out multiple times of iterative training on the first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each iterative training in the multiple times of iterative training and the loss ratio corresponding to each iterative training;
the extension module is used for performing sample expansion processing on at least part of the ith sample images in the ith sample image set used by the (i+1)-th iterative training, according to the ith loss ratio corresponding to the ith iterative training, to obtain at least one extended sample;
and the training module is used for training the ith image processing model obtained by the ith iterative training based on the at least one extended sample and the ith sample image set, to obtain the (i+1)-th image processing model corresponding to the (i+1)-th iterative training and the (i+1)-th loss ratio corresponding to the (i+1)-th iterative training.
In one possible design, the iteration module is specifically configured to obtain M image processing models and M loss ratios of multiple iterative training, and use an image processing model corresponding to the minimum loss ratio as a target image processing model.
In a possible design, the iteration module is further configured to input the first sample image into a preset network for training, so as to obtain a first image processing model.
In a possible design, the extension module is specifically configured to: determine whether the received ith loss ratio exceeds a preset threshold; if so, scale at least part of the ith sample images in the ith sample image set to obtain a plurality of scaled images, splice the scaled images with the same resolution among the plurality of scaled images to obtain at least one first extended sample, and determine the first extended sample as an ith sample image in the ith sample image set; if not, rotate at least part of the ith sample images in the ith sample image set to obtain a plurality of rotated images, add pixel points of rotated images with the same size among the plurality of rotated images according to the weight values of the images to obtain at least one second extended sample, and determine the second extended sample as an ith sample image in the ith sample image set.
In one possible design, the extension module is further configured to convert the at least one extended sample and each image in the ith sample image set from RGB space to HSI space, decompose each image into a high-frequency component and a low-frequency component according to brightness, perform linear weighted enhancement on the high-frequency component, perform histogram equalization on the low-frequency component, fuse, in each image, the high-frequency component after linear weighted enhancement and the low-frequency component after histogram equalization, and convert each fused image back from HSI space to RGB space.
In a possible design, the training module is specifically configured to input the at least one extended sample and the ith sample image set into a prediction network for training, obtain an image feature set of the (i+1)-th sample image set, and perform prediction training on the (i+1)-th sample image set based on the image feature set to obtain the (i+1)-th loss ratio.
In a possible design, the training module is further configured to obtain a target object image in an i +1 th sample image set, detect whether an image area of the target object image exceeds a preset area, if so, use the target object image exceeding the preset area as a first target object image, if not, use the target object image lower than the preset area as a second target object image, and perform prediction training on the first target object image and the second target object image based on the feature image set according to a preset rule.
In a possible design, the training module is further configured to extract a first positive sample and a first negative sample corresponding to the first target object image, and a second positive sample and a second negative sample corresponding to the second target object image, respectively, where the positive sample is an image corresponding to a specified region in the target object image, and the negative sample is an image corresponding to a region other than the specified region in the target object image, and when the target object image is the first target object image, the first positive sample and the first negative sample are predicted by using the feature image with the first resolution, and when the target object image is the second target object image, the second positive sample and the second negative sample are predicted by using the feature image with the second resolution.
In a fourth aspect, the present application provides an apparatus for behavior recognition, the apparatus comprising:
and the recognition module is used for performing behavior recognition on the image to be processed containing the target object based on the trained target image processing model and determining whether the target object has abnormal behavior.
In a possible design, the identification module is specifically configured to output alarm information if it is determined that the target object in the current frame has an abnormal behavior, and output alarm information if it is determined that the identification information of the target object in the next frame of the current frame is inconsistent with the identification information of the target object in the current frame.
In a fifth aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
the processor is used for realizing the steps of the training method of the image processing model and the steps of the behavior recognition method when executing the computer program stored in the memory.
In a sixth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above training method of an image processing model and the steps of the above behavior recognition method.
For the technical effects of each of the first to sixth aspects and of their possible designs, refer to the description of the technical effects of the first aspect and its possible solutions above; repeated description is omitted here.
Drawings
FIG. 1 is a flowchart of the steps of a training method of an image processing model provided in the present application;
FIG. 2 is a flowchart of the convolution of a second sample image set provided in the present application;
FIG. 3 is a schematic diagram of the flow of acquiring the feature image set corresponding to a second sample image set provided in the present application;
FIG. 4 is a schematic diagram of the mapping process of a second sample image provided in the present application;
FIG. 5 is a schematic diagram of a prediction network outputting predictions provided in the present application;
FIG. 6 is a schematic diagram of the training of the prediction network during image processing model training provided in the present application;
FIG. 7 is a schematic diagram of a positive sample provided in the present application;
FIG. 8 is a flowchart of the steps of a behavior recognition method provided in the present application;
FIG. 9 is a schematic structural diagram of a training apparatus for an image processing model provided in the present application;
FIG. 10 is a schematic structural diagram of a behavior recognition apparatus provided in the present application;
FIG. 11 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be further described in detail with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to the apparatus or system embodiments. It should be noted that in the description of the present application, "a plurality of" is understood as "at least two". "And/or" describes the association relationship of associated objects and means that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean that A is directly connected with B, or that A is connected with B through C. In addition, in the description of the present application, the terms "first", "second" and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or order.
In the prior art, in order to detect whether a worker wears a safety belt, a safety helmet and safety belt judgment method based on a deep convolutional neural network is adopted: the worker images in a video are detected, and the wearing of the helmet and the belt is judged, by a deep convolutional neural network model together with a spatial correlation model that locks onto the helmet and the belt, and an image processing model is obtained from these two models. However, during the training of the deep convolutional neural network in this image processing model, image features are extracted again by first reducing and then increasing the number of channels. When the number of channels is increased, more image features are obtained, which causes the image processing model to overfit during training; consequently, when these image features are used to detect whether a worker wears a safety belt, the image processing model cannot accurately lock onto the worker, so the accuracy of its detection results is low.
In order to solve the above problem, an embodiment of the present application provides a training method for an image processing model, which solves the problem of low accuracy of the detection results of the image processing model and thereby improves the accuracy of detecting whether a worker wears a safety belt. The method and the apparatus in the embodiments of the present application are based on the same technical concept; since the principles by which they solve the problem are similar, the apparatus and method embodiments can refer to each other, and repeated parts are not described again.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Example one
Referring to fig. 1, the present application provides a training method of an image processing model, which can improve the accuracy of the detection results of the image processing model. The implementation flow of the method is as follows:
step S1: and performing multiple times of iterative training on the first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each iterative training in the multiple times of iterative training and the loss ratio corresponding to each iterative training.
In the embodiment of the application, the accuracy and speed with which the image processing model detects whether a worker wears a safety belt need to be improved, and the number of images in the first sample image set needs to be increased so that the image processing model can obtain more feature images during training. Because the convolutional layers of the image processing model consume computing resources, the embodiment of the application adopts 6 convolutional layers and sets the number of input channels of each convolutional layer to be consistent with its number of output channels; this increases the speed at which the computer reads the image data of the convolutional layers, improves the accuracy of the detection results of the image processing model, and increases its detection speed when detecting whether a worker wears a safety belt.
Here, the functions of the convolution layer of the image processing model include: reducing the resolution of the image, classifying the image, etc.
The target image processing model is the image processing model corresponding to the minimum loss ratio, screened out after the image processing model has been trained M times. A loss ratio can only be obtained through one round of model training, and the number of images in the first sample image set is not increased during the first round of training.
Before describing the second round of model training, a second sample image set for model training needs to be obtained. The second sample image set is obtained by adding at least one first extended sample or at least one second extended sample to the first sample image set; therefore, the first sample image set is obtained first. The process of obtaining the first sample image set is as follows:
the method comprises the steps of obtaining video data of workers, wherein the video data are formed by each frame of image, in order to avoid the problem that high repetition exists between the images of the adjacent frames due to the fact that the images of the adjacent frames are intercepted, the images are intercepted according to a preset period, and after the images are intercepted according to the preset period, the intercepted images are used as an initial sample image set.
After the initial sample image set is obtained, in order to reduce interference of a background image in the initial sample image set, a target object image needs to be determined from each image in the initial sample image set, and the target object image is a human body image, a safety belt image and the like.
After the first time of image processing model training, a first loss ratio of the first sample image set is obtained, wherein the loss ratio represents the accuracy of the image processing model, and the smaller the loss ratio is, the more accurately the image processing model detects whether a worker wears a safety belt.
It should be further noted that the first loss ratio obtained after the first image processing model training is a difference value obtained by subtracting a predicted value from a true value, and the larger the predicted value is, the smaller the first loss ratio is.
The predicted value reflects the accuracy of the image processing model for training and predicting whether the worker wears the safety belt, and the true value represents the accuracy of detecting whether the worker wears the safety belt or not, which is calculated after the first sample image set is labeled.
In the embodiment of the present application, a specific process of labeling the first sample image set and calculating the true value of the first sample image set according to the labeling information is as follows:
The coordinates of the intersection point of the left edge and the upper edge, and the coordinates of the intersection point of the right edge and the lower edge, of each first sample image in the first sample image set are obtained, and the category name of the target object image in each first sample image is recorded. The category names are represented by numbers, and the relationship between category names and numbers is shown in Table 1:
Category name      Number
Person             0
Safety belt        1
......             ......
TABLE 1
According to Table 1, the number corresponding to the category name of a target object image can be obtained. The category names in Table 1 list only the person and the safety belt: the number corresponding to the person is 0 and the number corresponding to the safety belt is 1; numbers for other category names can be assigned by analogy with the person and safety belt entries in Table 1.
All first sample images in the first sample image set are labeled as described above, and the two coordinates of each first sample image and the number corresponding to the category name of its target object image are recorded. For example: the coordinates of image 1 are (0, 0) and (2, 2), and the number corresponding to its category name is 0, so image 1 can be represented by (0, 0), (2, 2) and 0.
After each first sample image in the first sample image set is labeled, in order to calculate the true value of the first sample image set, the annotation information corresponding to each first sample image needs to be recorded. The first sample images and their corresponding annotation information are shown in Table 2:
first sample image set Annotation information corresponding to first sample image
Image 1 (0,0)、(2,2)、1
Image 2 (0,1)、(3,2)、0
Image 3 (1,1)、(3,3)、1
Image 4 (1,2)、(4,3)、3
...... ......
TABLE 2
It should be noted that in Table 2 each first sample image in the first sample image set is associated with its corresponding annotation information. Table 2 lists only 4 first sample images and their annotations; the other first sample images can refer to any one of these 4 images.
After the annotation information of each first sample image in the first sample image set is obtained, prediction training is performed on the target object images based on the first sample image set, and the true value corresponding to the first sample image set is calculated according to the annotation information. After the true value of the first sample image set is obtained, the first loss ratio of the first sample image set is calculated from the true value and the predicted value.
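For illustration, a minimal sketch of the annotation record described above; the class and field names are assumptions, and the values mirror Table 2.

```python
from dataclasses import dataclass

# Sketch of one annotation record: top-left corner, bottom-right corner,
# and the numeric category from Table 1 (0 = person, 1 = safety belt).
# The class name and field names are illustrative assumptions.
@dataclass
class Annotation:
    top_left: tuple        # intersection of left and upper edges
    bottom_right: tuple    # intersection of right and lower edges
    class_id: int          # number corresponding to the category name

labels = {
    "image 1": Annotation((0, 0), (2, 2), 1),
    "image 2": Annotation((0, 1), (3, 2), 0),
}
```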
By the above method, the first sample image set is trained to obtain the first image processing model corresponding to it, the first image processing model is iteratively trained multiple times, and the target image processing model with the highest prediction accuracy is screened from the iteratively trained image processing models, ensuring the accuracy of the target image processing model.
Step S2: and performing sample expansion processing on at least part of the ith sample image in the ith sample image set used by the (i + 1) th iterative training according to the ith loss ratio corresponding to the ith iterative training to obtain at least one expansion sample.
In the first embodiment of the present application, except that the first sample images in the first sample image set are not extended during the training of the first image processing model, the iterative training processes of the other image processing models are consistent: sample expansion processing is performed on at least part of the ith sample images in the ith sample image set used by the (i+1)-th round, according to the ith loss ratio corresponding to the ith round, to obtain at least one extended sample. Taking the training of the second image processing model as an example (the training in the other rounds can refer to it), the specific process is as follows:
after the first loss ratio corresponding to the first sample image set is obtained, in the process of second model training, whether the first loss ratio exceeds a preset threshold value is detected, if the first loss ratio exceeds the preset threshold value, scaling processing needs to be performed on at least part of the first sample images in the first sample image set to obtain a plurality of scaled images, the scaled images with the same resolution in the plurality of scaled images are spliced to obtain at least one first extended sample, the at least one extended sample is used as the first sample image in the first sample image set, the process of splicing processing is to splice 4 images with the same resolution, and after the splicing processing is completed, the size of the images is four times of the original size.
If the first loss ratio is below the preset threshold, the first sample images are rotated to obtain a plurality of rotated images, the pixel points of rotated images with the same size are added according to the weight values of the images to obtain at least one second extended sample, and the at least one second extended sample is determined as a first sample image in the first sample image set. After the pixel points of same-sized images are added according to the image weights, the resolution of the resulting image is higher (resolution denotes the density of pixel points per unit area), while the size of the image does not change.
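A minimal sketch of the two expansion branches, assuming square inputs, an illustrative threshold, and illustrative scaling, rotation and weight values:

```python
import cv2
import numpy as np

# Sketch of sample expansion: above the threshold, scale and stitch four
# same-resolution images into one (four times the original size); below it,
# rotate and blend two same-sized images pixel-wise by weight. All
# parameter values here are illustrative assumptions.
def expand(images, loss_ratio, threshold=0.5):
    if loss_ratio > threshold:
        scaled = [cv2.resize(img, (160, 160)) for img in images[:4]]
        top = np.hstack(scaled[:2])
        bottom = np.hstack(scaled[2:])
        return [np.vstack([top, bottom])]        # first extended sample
    rotated = [cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE) for img in images[:2]]
    return [cv2.addWeighted(rotated[0], 0.6, rotated[1], 0.4, 0)]  # second
```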
After at least one first extended sample or at least one second extended sample has been added to the first sample image set, in order to make the details of all first sample images in the first sample image set clearer, all first sample images need to be converted from RGB space to HSI space. RGB space is a color space; HSI space describes colors, from the viewpoint of human vision, by hue, saturation and intensity (brightness).
After all first sample images are converted from RGB space to HSI space, each first sample image is decomposed according to brightness into a low-frequency component and a high-frequency component. The low-frequency component represents the regions of the image where brightness changes slowly, and the high-frequency component represents the regions where brightness changes sharply. The gray-level range corresponding to the low-frequency component is adjusted so that its gray-level distribution is equalized (histogram equalization), and linear weighted enhancement is performed on the high-frequency component to increase the contrast of the image and make its details clearer. After the high-frequency and low-frequency components of a first sample image are obtained and processed, they need to be fused, and after fusion the first sample image is converted back from HSI space to RGB space.
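The sketch below approximates this enhancement step in Python; OpenCV has no direct HSI conversion, so the HLS color space stands in for it, and the Gaussian-blur frequency split and the gain value are illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch of the enhancement: split brightness into a slowly varying
# (low-frequency) and a sharply varying (high-frequency) part, equalize
# the former, linearly amplify the latter, fuse, and convert back.
def enhance(image_bgr, gain=1.5):
    hls = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HLS)
    h, l, s = cv2.split(hls)
    low = cv2.GaussianBlur(l, (21, 21), 0)     # slowly varying brightness
    high = cv2.subtract(l, low)                # sharply varying brightness
    low = cv2.equalizeHist(low)                # histogram equalization
    high = np.clip(high.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    fused = cv2.add(low, high)                 # fuse the two components
    return cv2.cvtColor(cv2.merge([h, fused, s]), cv2.COLOR_HLS2BGR)
```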
After the first sample image set is processed, the processed image set is used as a second sample image set.
By the method, the number of the first sample images in the first sample image set is increased to obtain at least one extended sample, the at least one extended sample is used as the second sample image in the second sample image set to ensure that the number of the images in the second sample image set is sufficient, and more image features can be obtained in the process of extracting the image features based on the second sample image set.
Step S2: training an ith image processing model obtained by the ith iterative training based on the at least one extended sample and the ith sample image set to obtain an (i + 1) th image processing model corresponding to the (i + 1) th iterative training and an (i + 1) th loss ratio corresponding to the (i + 1) th iterative training.
After the second sample image set is obtained, in order to improve the accuracy with which the image processing model detects whether a worker wears a safety belt, the second sample image set needs to be input into the prediction network for training. The specific process is as follows:
as shown in fig. 2, fig. 2 is a flowchart of convolution of the second sample image, the resolution of the second sample image is reduced after each convolution of the second sample image, in fig. 2, after the first convolution, the resolution of the image is 80 × 80, after the second convolution, the resolution of the image is 40 × 40, and after the third convolution, the resolution of the image is 20 × 20, in this embodiment, the resolution of the second sample image in the second sample image set is 160 × 160, if there are other resolution images, reference may be made to a change of the resolution of the second sample image after each convolution in fig. 2, and since the convolution layer processes the image as a technology well known to those skilled in the art, the description is not repeated here.
After the second sample image set passes through the convolutional layers of fig. 2, the result of fig. 2 is taken as the input of fig. 3, which is a schematic diagram of the flow of acquiring the feature image set corresponding to the second sample image set. The result after the first convolution is taken as input 1, the result after the second convolution as input 2, and the result after the third convolution as input 3. To extract the features of each second sample image in the second sample image set, input 1, input 2 and input 3 need to be processed with a pooling layer and a convolutional layer, respectively; the pooling layer is used to extract the features of the image and compress the thickness of the image. In fig. 3, the number of input channels is consistent with the number of output channels, which saves the computing resources of the computer.
In the embodiment of the application, the feature image output by the second channel is fused with the feature image acquired by the first channel, and the feature image output by the third channel is fused with the feature images acquired by the first and second channels; the feature images acquired by the pooling layer can be adjusted according to actual requirements, which is not elaborated here.
It should be further noted that, after the feature image set of the second sample image set is acquired via fig. 3, a second sample image in the second sample image set can be mapped onto a low-resolution image, as shown in fig. 4. In fig. 4, the 3 × 3 region at the upper left of image a is a human body image, and this region is mapped to image b, which is 2 × 2. At this point, in fig. 4, the human body image is converted from image form into numerical form: each number represents the weight of the human body image in that region, and the larger the number, the higher the possibility that the human body image lies in that region, with 0 meaning the region contains no human body image. Image b can further be mapped to image c; image a is therefore mapped to image c, and whether a human body image is present in image a can be judged from image c.
After the output results in fig. 3 are obtained, a prediction result needs to be obtained, taking output 1, output 2 and output 3 as the inputs of fig. 5. In fig. 5, the feature images output in fig. 3 are each convolved 3 times, yielding the classification branch, regression branch and IOU prediction branch corresponding to each feature image, so that 9 sets of image data are finally obtained, with each input corresponding to 3 kinds of image data. The classification branch is used to obtain the target object frame of the target object image, the regression branch is used to obtain the target object within the target object frame, and the IOU prediction branch is used to obtain the image corresponding to the minimal region containing the target object image. The feature images in the classification, regression and IOU prediction branches are all in numerical form.
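A minimal sketch of the three branches, in PyTorch; single 1 × 1 convolutions stand in for the three convolutions per branch described above, and the channel counts are assumptions.

```python
import torch.nn as nn

# Sketch of the three prediction branches applied to one fused feature map.
# Applying it to the three inputs of fig. 5 yields 9 sets of outputs.
class PredictionHead(nn.Module):
    def __init__(self, channels=64, num_classes=2):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_classes, 1)  # classification branch
        self.reg = nn.Conv2d(channels, 4, 1)            # regression branch
        self.iou = nn.Conv2d(channels, 1, 1)            # IOU prediction branch

    def forward(self, feature):
        return self.cls(feature), self.reg(feature), self.iou(feature)
```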
After the feature image set has been subdivided, in order to obtain the accuracy of the image processing model in detecting whether a worker wears a safety belt, the target object images in each second sample image in the second sample image set first need to be divided into first target object images and second target object images. A first target object image is a larger-sized target object image in the second sample image, and a second target object image is a smaller-sized one. The dividing process is as follows:
after the target object images in the second sample image are determined, the image area of each framed target object image is calculated, the image areas are sorted according to a rule from large to small, if the sorted serial numbers are even numbers, the target object image with the serial number in the first half is used as a first target object image, the rest target object image is used as a second target object image, if the sorted serial numbers are odd numbers, the target object image in the first half with the serial number reduced by one is used as a first target object, and the rest target object image is used as a second target object image.
After the first and second target object images are divided, each target object image needs to be divided into a positive sample and a negative sample. The positive sample is the image corresponding to a specified region near the center point of the target object image, and the negative sample is the image corresponding to the region outside that specified region; in this embodiment of the present application, the specified region is the 3 × 3 region around the center point of the target object image. As shown in fig. 7, a schematic diagram of positive samples, the 3 × 3 region around the center point of the target object image is the positive sample and the region of the image outside it is the negative sample. The specified region can be adjusted according to actual conditions, so it is not described repeatedly here.
In the embodiment of the present application, the positive sample is a target object image, the negative sample is a background image of the target object image, the first target object image corresponds to the first positive sample and the first negative sample, and the second target object image corresponds to the second positive sample and the second negative sample.
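A sketch of this split on a feature grid, with the 3 × 3 window size taken from the embodiment and the grid layout assumed:

```python
import numpy as np

# Sketch of the positive/negative split: the 3 x 3 window around the
# target's centre cell counts as positive; everything else is negative.
def positive_mask(height, width, center_y, center_x):
    mask = np.zeros((height, width), dtype=bool)
    y0, x0 = max(center_y - 1, 0), max(center_x - 1, 0)
    mask[y0:center_y + 2, x0:center_x + 2] = True  # positive region
    return mask                                    # ~mask is the negative region
```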
Before the first positive sample, first negative sample, second positive sample and second negative sample are matched with feature images of different resolutions, the resolution of each of their feature images must be determined; the resolution of a feature image is determined by the channel that outputs it. The feature image of the third channel has a low resolution and fuses a large number of image features, so it is not sharp enough; detecting a target object on it therefore requires a larger target object image, and the first target object image is matched with the feature image of the third channel.
The feature image of the first channel has a high resolution and is therefore sharper; when it is used to detect target object images, a larger target object image is not needed, so the second target object image is matched with the feature image of the first channel.
After it is determined that the first target object image is matched with the feature image of the third channel and the second target object image with the feature image of the first channel, in order to obtain a more accurate matching result, the first positive sample needs to be matched with the feature image corresponding to the regression branch among the maximum-resolution feature images to obtain image A, and the first negative sample with the feature image corresponding to the classification branch among the maximum-resolution feature images to obtain image a; the second positive sample is matched with the feature image corresponding to the regression branch among the minimum-resolution feature images to obtain image B, and the second negative sample with the feature image corresponding to the classification branch among the minimum-resolution feature images to obtain image b.
After image A and image a are obtained, matching needs to be performed in the feature image corresponding to the maximum-resolution IOU prediction branch, and the X image corresponding to the maximum IOU value is screened out (a feature image in the IOU prediction branch is the part of the real image that overlaps the feature image). After image B and image b are obtained in the same way, matching is performed in the feature image corresponding to the minimum-resolution IOU prediction branch, and the Y image corresponding to the maximum IOU value is screened out. At this point, the first target object image corresponds to image A, image a and the X image, and the second target object image corresponds to image B, image b and the Y image; the target object image can be accurately locked onto according to the 3 matched images. The prediction training process described above is repeated to obtain the image processing model.
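For reference, a sketch of the IOU screening step, computing the overlap ratio between boxes and keeping the candidate with the largest value; the box convention is the same assumption as above:

```python
# Sketch of IOU screening: intersection-over-union between two boxes,
# and selection of the candidate with the maximum IOU value.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def best_match(box, candidates):
    return max(candidates, key=lambda c: iou(box, c))
```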
After the image processing model is obtained, the predicted value corresponding to it is calculated; the predicted value represents the accuracy with which the image processing model locks onto the target object image. The true value of the second sample image set is compared with the predicted value, the second loss ratio corresponding to the feature image set of the second sample image set is calculated, and the second loss ratio is fed back to the image processing model, so that the second sample image set is processed according to the second loss ratio during the next round of model training. Fig. 6 is a schematic diagram of the training of the prediction network during image processing model training: in fig. 6, the first loss ratio is obtained after the first sample image set is input into the image processing model; during the second round of training, at least part of the first sample image set is expanded according to the first loss ratio to obtain the second sample image set, which is then trained through the steps in the above embodiment to obtain the second image processing model.
By the above method, the second sample image set is input into the prediction network for training, the feature image set corresponding to the second sample image set is extracted and divided into the 3 classes of classification branch, regression branch and IOU prediction branch, the images in the feature image set are further divided, and the first and second target object images are matched with image feature sets of different resolutions respectively, ensuring the accuracy of the detection results corresponding to the image processing model.
According to the above method, the image processing model is trained M times to obtain M loss ratios. After the M loss ratios are obtained, they are sorted from large to small, the image processing model corresponding to the minimum loss ratio is screened out, and the image processing model corresponding to the minimum loss ratio is taken as the target image processing model.
By this method, the image processing model corresponding to the minimum loss ratio is screened out as the target image processing model according to the sizes of the M loss ratios, which guarantees that the target image processing model is the most accurate of the image processing models and improves the accuracy of the detection results of the target image processing model.
Example two
Referring to fig. 8, the present application provides a behavior recognition method, which can output alarm information when abnormal behavior of a target object is detected. The implementation flow of the method is as follows:
step S81: and performing behavior recognition on the image to be processed containing the target object based on the trained target image processing model, and determining whether the target object has abnormal behavior.
After the target image processing model is obtained, in practical application of the target image processing model, video data of workers first needs to be acquired. The target object in the current frame of the video is identified and taken as identification information, and whether the target object wears a safety belt is detected; if abnormal behavior of the target object in the current frame is detected, alarm information is output. After the alarm information is output, alarm information is output again only when the identification information of the target object in the frame following the current frame is detected to be inconsistent with the identification information of the target object in the current frame.
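A sketch of this run-time logic; `detect` is a hypothetical wrapper around the trained target image processing model, returning a tracked identity and a wearing flag.

```python
# Sketch of the alarm logic: alarm on a frame whose target wears no
# safety belt, and alarm again only when the tracked identity changes
# between consecutive frames. detect() is a hypothetical stand-in for
# inference with the trained target image processing model.
def monitor(frames, model):
    previous_id = None
    for frame in frames:
        target_id, wears_belt = detect(model, frame)
        if not wears_belt and target_id != previous_id:
            print(f"alarm: target {target_id} is not wearing a safety belt")
        previous_id = target_id
```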
By this method, repeatedly outputting alarm information for the same target object that is not wearing a safety belt is avoided, and the efficiency of detecting whether workers wear safety belts is improved.
EXAMPLE III
Based on the same inventive concept, the embodiment of the present application further provides a training apparatus for an image processing model, which is used to implement the functions of the above training method of an image processing model. Referring to fig. 9, the apparatus includes:
an iteration module 901, configured to perform multiple times of iterative training on a first image processing model, and determine a target image processing model corresponding to the first image processing model based on an image processing model obtained through each iterative training in the multiple times of iterative training and a loss ratio corresponding to each iterative training;
an extension module 902, configured to perform sample extension processing on at least part of an ith sample image in an ith sample image set used in an (i + 1) th iterative training according to an ith loss ratio corresponding to the ith iterative training to obtain at least one extension sample;
a training module 903, configured to train the ith image processing model obtained through the ith iterative training based on the at least one extended sample and the ith sample image set, to obtain the (i+1)-th image processing model corresponding to the (i+1)-th iterative training and the (i+1)-th loss ratio corresponding to the (i+1)-th iterative training.
In a possible design, the iteration module 901 is further configured to input the first sample image into a preset network for training, so as to obtain a first image processing model.
In a possible design, the extension module 902 is specifically configured to: determine whether the received ith loss ratio exceeds a preset threshold; if so, scale at least part of the ith sample images in the ith sample image set to obtain a plurality of scaled images, splice the scaled images with the same resolution among the plurality of scaled images to obtain at least one first extended sample, and determine the first extended sample as an ith sample image in the ith sample image set; if not, rotate at least part of the ith sample images in the ith sample image set to obtain a plurality of rotated images, add pixel points of rotated images with the same size among the plurality of rotated images according to the weight values of the images to obtain at least one second extended sample, and determine the second extended sample as an ith sample image in the ith sample image set.
In a possible design, the extension module 902 is further configured to convert the at least one extended sample and each image in the ith sample image set from RGB space to HSI space, decompose each image into a high-frequency component and a low-frequency component according to brightness, perform linear weighted enhancement on the high-frequency component, perform histogram equalization on the low-frequency component, fuse, in each image, the high-frequency component after linear weighted enhancement and the low-frequency component after histogram equalization, and convert each fused image back from HSI space to RGB space.
In a possible design, the training module 903 is specifically configured to input the at least one extended sample and the ith sample image set into a prediction network for training, obtain an image feature set of the (i+1)-th sample image set, and perform prediction training on the (i+1)-th sample image set based on the image feature set to obtain the (i+1)-th loss ratio.
In a possible design, the training module 903 is further configured to obtain a target object image in an i +1 th sample image set, detect whether an image area of the target object image exceeds a preset area, if so, use the target object image exceeding the preset area as a first target object image, if not, use the target object image lower than the preset area as a second target object image, and perform prediction training on the first target object image and the second target object image based on the feature image set according to a preset rule.
In a possible design, the training module 903 is further configured to extract a first positive sample and a first negative sample corresponding to a first target object image, and a second positive sample and a second negative sample corresponding to a second target object image, respectively, where the positive sample is an image corresponding to a specified region in the target object image, and the negative sample is an image corresponding to a region other than the specified region in the target object image, and when the target object image is the first target object image, the first positive sample and the first negative sample are predicted by using the characteristic image of the first resolution, and when the target object image is the second target object image, the second positive sample and the second negative sample are predicted by using the characteristic image of the second resolution.
Example four
Based on the same inventive concept, an embodiment of the present application further provides a behavior recognition device (for example, a seat belt detection device), which is used to implement the functions of the behavior recognition method described above. Referring to fig. 10, the device includes:
the identification module 1001 is configured to perform behavior identification on an image to be processed including a target object based on the trained target image processing model, and determine whether the target object has an abnormal behavior.
In a possible design, the identification module 1001 is specifically configured to output alarm information if it is determined that a target object in the current frame has an abnormal behavior, and to output alarm information again if it is determined that the identification information of the target object in the frame following the current frame is inconsistent with the identification information of the target object in the current frame.
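One plausible reading of this alarm rule is sketched below: an abnormal target triggers an alarm, and a target with a different identification in the following frame is treated as a new target and alarmed separately. The target fields and emit_alarm are assumptions, not part of this embodiment.

```python
def check_alarms(current_target, next_target):
    # Alarm on abnormal behavior in the current frame.
    if current_target.abnormal:
        emit_alarm(current_target)
    # A changed identification between adjacent frames indicates a new target,
    # which is alarmed independently of the current frame's target.
    if next_target.abnormal and next_target.track_id != current_target.track_id:
        emit_alarm(next_target)
```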
Example five
Based on the same inventive concept, an embodiment of the present application further provides an electronic device that can implement the functions of the training apparatus for image processing models and of the behavior recognition apparatus. Referring to fig. 11, the electronic device includes:
at least one processor 1101, and a memory 1102 connected to the at least one processor 1101. The specific connection medium between the processor 1101 and the memory 1102 is not limited in this application; in fig. 11, the processor 1101 and the memory 1102 are connected through a bus 1100 by way of example. The bus 1100 is shown as a thick line in fig. 11; the connection form between other components is merely illustrative and not limiting. The bus 1100 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration it is drawn with only one thick line in fig. 11, but this does not mean that there is only one bus or only one type of bus. Alternatively, the processor 1101 may also be referred to as a controller, and the name is not limited.
In the embodiment of the present application, the memory 1102 stores instructions executable by the at least one processor 1101, and the at least one processor 1101 may execute the instructions stored in the memory 1102 to perform the training method of an image processing model and the behavior recognition method discussed above. The processor 1101 may implement the functions of the respective modules of the apparatuses shown in fig. 9 and fig. 10.
The processor 1101 is the control center of the apparatus; it may connect various parts of the entire control device through various interfaces and lines, and it performs the various functions of the apparatus and processes data by running or executing the instructions stored in the memory 1102 and invoking the data stored in the memory 1102, thereby monitoring the apparatus as a whole.
In one possible design, the processor 1101 may include one or more processing units, and the processor 1101 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, and the like, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1101. In some embodiments, the processor 1101 and the memory 1102 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 1101 may be a general purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method for training an image processing model and a method for behavior recognition disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 1102, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1102 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1102 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1102 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 1101, the code corresponding to the training method of an image processing model and the behavior recognition method described in the foregoing embodiments can be solidified into the chip, so that the chip can execute the steps of the training method of the embodiment shown in fig. 1 and the steps of the behavior recognition method of the embodiment shown in fig. 8 when running. How to program the processor 1101 is well known to those skilled in the art and will not be described in detail here.
Based on the same inventive concept, the present application further provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the training method of the image processing model and the behavior recognition method discussed above.
In some possible embodiments, aspects of the training method of an image processing model and the behavior recognition method provided by the present application may also be implemented in the form of a program product comprising program code; when the program product is run on an apparatus, the program code causes the control apparatus to perform the steps of the training method of an image processing model and the behavior recognition method according to the various exemplary embodiments of the present application described above in this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A method for training an image processing model, comprising:
performing multiple iterative training on a first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each iterative training in the multiple iterative training and the loss ratio corresponding to each iterative training, wherein the (i + 1) th iterative training in the multiple iterative training comprises:
according to the ith loss ratio corresponding to the ith iterative training, performing sample expansion processing on at least part of ith sample images in an ith sample image set used by the (i + 1) th iterative training to obtain at least one expansion sample;
training an ith image processing model obtained by the ith iterative training based on the at least one extended sample and the ith sample image set to obtain an (i + 1) th image processing model corresponding to the (i + 1) th iterative training and an (i + 1) th loss ratio corresponding to the (i + 1) th iterative training, wherein i is a positive integer.
2. The method of claim 1, wherein determining the target image processing model corresponding to the first image processing model based on the image processing model obtained by each iterative training in the multiple iterative training and the loss ratio corresponding to each iterative training comprises:
obtaining M image processing models and M loss ratios of multiple times of iterative training, wherein M is a positive integer;
and taking the image processing model corresponding to the minimum loss ratio as the target image processing model.
3. The method of claim 1, wherein obtaining the first image processing model comprises:
and inputting the first sample image into a preset network for training to obtain a first image processing model.
4. The method of claim 1, wherein performing sample expansion processing on at least part of the ith sample image in the ith sample image set used in the (i + 1) th iterative training according to the ith loss ratio corresponding to the ith iterative training to obtain at least one expanded sample comprises:
judging whether the received ith loss ratio exceeds a preset threshold value;
if yes, scaling at least part of the ith sample image in the ith sample image set to obtain a plurality of scaled images, splicing the scaled images with the same resolution in the plurality of scaled images to obtain at least one first extended sample, and determining the first extended sample as the ith sample image in the ith sample image set;
if not, performing rotation processing on at least part of the ith sample image in the ith sample image set to obtain a plurality of rotation images, adding pixel points of the rotation images with the same size in the plurality of rotation images according to the weight values of the images to obtain at least one second expansion sample, and determining the second expansion sample as the ith sample image in the ith sample image set.
5. The method of claim 1, wherein before training the ith image processing model obtained by the ith iterative training, the method comprises:
converting the at least one extended sample and each image in the ith sample image set from the RGB space to the HSI space;
decomposing each image into a high-frequency component and a low-frequency component according to the brightness, wherein the high-frequency component is subjected to linear weighting enhancement processing, and the low-frequency component is subjected to histogram equalization processing;
and fusing the high-frequency component processed by linear weighting enhancement and the low-frequency component processed by histogram equalization in each image, and converting each fused image from the HSI space back to the RGB space.
6. The method of claim 1, wherein training an ith image processing model obtained by an ith iterative training based on the at least one extended sample and the ith sample image set to obtain an i +1 th image processing model corresponding to the i +1 th iterative training and an i +1 th loss ratio corresponding to the i +1 th iterative training comprises:
inputting the at least one extended sample and the ith sample image set into a prediction network for training, and obtaining an image feature set of the (i + 1) th sample image set, wherein images in the image feature set have different resolutions;
and performing prediction training on the (i + 1) th sample image set based on the image feature set to obtain an (i + 1) th loss ratio.
7. The method of claim 6, wherein the predictive training of the i +1 th sample image set based on the image feature set comprises:
acquiring a target object image in an i +1 th sample image set, and detecting whether the image area of the target object image exceeds a preset area;
if so, taking the target object image exceeding the preset area as a first target object image;
if not, taking the target object image with the area lower than the preset area as a second target object image;
and performing prediction training on the first target object image and the second target object image based on the image feature set according to a preset rule.
8. The method according to claim 7, wherein performing prediction training on the first target object image and the second target object image based on the image feature set according to a preset rule comprises:
respectively extracting a first positive sample and a first negative sample corresponding to the first target object image and a second positive sample and a second negative sample corresponding to the second target object image, wherein the positive sample is an image corresponding to a specified area in the target object image, and the negative sample is an image corresponding to an area outside the specified area in the target object image;
when the target object image is the first target object image, predicting the first positive sample and the first negative sample by using the feature image of the first resolution;
and when the target object image is the second target object image, predicting the second positive sample and the second negative sample by using the feature image of the second resolution.
9. A method of behavior recognition, comprising:
performing behavior recognition on an image to be processed containing a target object based on a trained target image processing model, and determining whether the target object has an abnormal behavior, wherein the abnormal behavior is that the target object is not wearing a safety belt, and the target image processing model is obtained by training a first image processing model based on the method of any one of claims 1 to 8.
10. The method of claim 9, wherein performing behavior recognition on the image to be processed containing the target object based on the trained target image processing model, and determining whether the target object has abnormal behavior comprises:
if the target object in the current frame has abnormal behavior, outputting alarm information;
and if it is determined that the identification information of the target object in the next frame of the current frame is inconsistent with the identification information of the target object in the current frame, outputting alarm information.
11. An apparatus for training an image processing model, the apparatus comprising:
the iteration module is used for carrying out multiple times of iterative training on the first image processing model, and determining a target image processing model corresponding to the first image processing model based on the image processing model obtained by each iterative training in the multiple times of iterative training and the loss ratio corresponding to each iterative training;
the extension module is used for carrying out sample extension processing on at least part of ith sample images in an ith sample image set used by the ith +1 th iterative training according to the ith loss ratio corresponding to the ith iterative training to obtain at least one extension sample;
and the training module is used for training the ith image processing model obtained by the ith iterative training based on the at least one extended sample and the ith sample image set, to obtain an (i + 1)th image processing model corresponding to the (i + 1)th iterative training and an (i + 1)th loss ratio corresponding to the (i + 1)th iterative training.
12. An apparatus for behavior recognition, the apparatus comprising:
and the recognition module is used for performing behavior recognition on the image to be processed containing the target object based on the trained target image processing model and determining whether the target object has abnormal behavior.
13. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-10 when executing the computer program stored on the memory.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-10.
CN202111340104.3A 2021-11-12 2021-11-12 Training method and device of image processing model and electronic equipment Pending CN114155598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111340104.3A CN114155598A (en) 2021-11-12 2021-11-12 Training method and device of image processing model and electronic equipment

Publications (1)

Publication Number Publication Date
CN114155598A true CN114155598A (en) 2022-03-08

Family

ID=80460297


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269844A (en) * 2022-08-01 2022-11-01 腾讯科技(深圳)有限公司 Model processing method and device, electronic equipment and storage medium
CN115269844B (en) * 2022-08-01 2024-03-29 腾讯科技(深圳)有限公司 Model processing method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination