CN113869211B - Automatic image labeling and labeling quality automatic evaluation method and system - Google Patents

Info

Publication number
CN113869211B
CN113869211B (application CN202111145155.0A)
Authority
CN
China
Prior art keywords
image
frame
labeling
commodity
target detection
Prior art date
Legal status
Active
Application number
CN202111145155.0A
Other languages
Chinese (zh)
Other versions
CN113869211A (en)
Inventor
庞明锋
李攀星
庞楼阳
Current Assignee
Hangzhou Fortune Ark Technology Co ltd
Original Assignee
Hangzhou Fortune Ark Technology Co ltd
Filing date
Publication date
Application filed by Hangzhou Fortune Ark Technology Co ltd filed Critical Hangzhou Fortune Ark Technology Co ltd
Priority to CN202111145155.0A priority Critical patent/CN113869211B/en
Publication of CN113869211A publication Critical patent/CN113869211A/en
Application granted granted Critical
Publication of CN113869211B publication Critical patent/CN113869211B/en

Abstract

The invention discloses an automatic image labeling and labeling quality automatic evaluation method and system. The method comprises: extracting a valid frame from video frame images of a commodity being taken, and cropping a commodity-taking area image from the valid frame; inputting the original image of the valid frame and the cropped image into target detection models to obtain target detection results for the two images; calculating, from the target detection results, the average probability that the contents of the two annotation frames selected in the original image and the cropped image are objects, and the intersection ratio of the two framed areas; correcting the annotation frame in the original image according to the intersection ratio; performing, with a fuzzy detection and classification recognition model, blur detection and object classification recognition on the region selected by the corrected annotation frame to obtain blur detection and object classification recognition results; and evaluating the labeling quality of the image with a labeling quality evaluation model that takes the probability average, the intersection ratio and the blur detection and classification recognition results as inputs. The invention realizes automatic labeling of commodity images.

Description

Automatic image labeling and labeling quality automatic evaluation method and system
Technical Field
The invention relates to the technical field of image recognition, in particular to an automatic image labeling and labeling quality evaluation method and an automatic image labeling and labeling quality evaluation system.
Background
An unmanned vending cabinet sells goods as follows: images of the commodity taken by the consumer are captured, a pre-trained commodity recognition model identifies the type and quantity of the commodity taken from the captured commodity images, and billing is settled according to the identified type and quantity. Training the commodity recognition model requires a large number of annotated images as training samples. At present, commodity category labeling of commodity images mainly uses the following two methods:
1. Full manual labeling
Commodity categories are labeled manually on historically collected commodity images. However, the accuracy of manual labeling depends largely on the annotators' experience; different annotators often understand the image content differently, so labeling accuracy is limited. Most critically, a large number of training samples is needed to guarantee the recognition accuracy of the commodity recognition model, and the commodity image data used as training samples can reach several terabytes. Manually labeling such a huge volume of data is time-consuming, labor-intensive and error-prone, and the accuracy of the labeling results is difficult to guarantee.
2. Semi-supervised automatic labeling
In the semi-supervised labeling method, after commodity images are labeled automatically, the machine-labeled images are verified manually and the inaccurately labeled images are filtered out by hand. Although this relieves the manual labeling burden to some extent, the labeling precision of existing semi-supervised methods is not high, and the quality of the machine labeling still has to be checked manually frame by frame afterwards, so the problems of low manual labeling efficiency and low accuracy are not fundamentally solved.
Disclosure of Invention
The invention aims to improve the labeling quality and the labeling efficiency for images of goods taken from unmanned sales counters, and provides an automatic image labeling and labeling quality evaluation method and an automatic image labeling and labeling quality evaluation system.
To achieve the purpose, the invention adopts the following technical scheme:
The method for automatically labeling the image and automatically evaluating the labeling quality comprises the following steps:
Step S1, extracting a valid frame to be labeled from the continuous video frame images captured while a consumer takes goods from the unmanned sales counter, and cropping a commodity-taking area image of a specified size from the valid frame to obtain a cropped image;
Step S2, inputting the original image of the valid frame and the cropped image into a pre-trained first target detection model and second target detection model respectively, the first target detection model outputting a first target detection result for the original image and the second target detection model outputting a second target detection result for the cropped image;
Step S3, according to the first target detection result and the second target detection result, calculating the probability average value P_mean of the probabilities that the contents of the annotation frames selected by the first and second target detection models in the original image and the cropped image are objects, and calculating the intersection ratio P_IOU of the areas framed by the two models in the original image and the cropped image;
Step S4, correcting the annotation frame of the original image according to the intersection ratio P_IOU, and cropping the commodity area image to be labeled from the original image of the valid frame, taking the area selected by the corrected annotation frame as the cropping object;
Step S5, inputting the commodity area image into a pre-trained fuzzy detection and classification recognition model, the model outputting the class probability P_class that the object in the commodity area image belongs to the corresponding commodity class, the probability P_bg that it is image background, and the probability P_blur that the image is blurred;
Step S6, inputting the intersection ratio P_IOU and the probability average value P_mean calculated in step S3, together with the class probability P_class, the background probability P_bg and the image blur probability P_blur of the commodity area image, into a pre-trained labeling quality evaluation model, the model outputting a quality evaluation result for the image labeling of the valid frame.
As a preferred embodiment of the present invention, in step S1, the method for extracting the valid frame from the continuous video frame images comprises:
Step S11a, converting two consecutive video frame images from RGB images to grayscale images, and obtaining the difference image between the current frame and the frame preceding it by the inter-frame difference method, denoted D(x, y);
Step S12a, performing erosion and dilation on the image D(x, y) to remove noise, obtaining the image D(x, y)';
Step S13a, framing the motion change areas in the image D(x, y)' with circumscribed rectangles;
Step S14a, calculating the area of each motion change region and filtering out motion change regions with abnormal area;
Step S15a, judging whether the number of motion change regions retained in the image D(x, y)' after filtering is greater than a preset number threshold;
if yes, judging the current frame to be a valid frame;
if not, judging the current frame to be an invalid frame.
As a preferred embodiment of the present invention, the number threshold is 4.
As a preferred embodiment of the present invention, the method for cropping the cropped image from the valid frame comprises:
Step S11b, calculating the center coordinates of the circumscribed rectangle of each motion change region retained after filtering, denoted (x_i, y_i), where x_i and y_i are the horizontal and vertical coordinates of the center of the i-th motion change region;
Step S12b, averaging the center coordinates of the circumscribed rectangles of all the motion change regions to obtain the center coordinates of the cropping region, denoted (x_center, y_center);
Step S13b, cropping the cropped image of the specified size from the valid frame with the coordinates (x_center, y_center) as the center position of the cropped image.
In step S2, multi-resolution target detection result fusion is adopted: the valid frame is resized from its original 1280 × 720 resolution to 746 × 448 and then input into the first target detection model;
and the cropped image is resized to 704 × 704 and input into the second target detection model.
As a preferred embodiment of the present invention, the probability average value P_mean is calculated by the following formula (1):
P_mean = (P_join0 + P_join1) / 2    formula (1)
In formula (1), P_join0 represents the probability, determined by the first target detection model, that the content framed in the original image is an object;
P_join1 represents the probability, determined by the second target detection model, that the content framed in the cropped image is an object;
P_join0 is calculated by the following formula (2):
P_join0 = P_class0 × P_obj0    formula (2)
In formula (2), P_class0 represents the probability that the content framed by the first target detection model in the original image is the corresponding object class;
P_obj0 represents the first confidence of the first target detection model in its detection result for the original image;
P_join1 is calculated by the following formula (3):
P_join1 = P_class1 × P_obj1    formula (3)
In formula (3), P_class1 represents the probability that the content framed by the second target detection model in the cropped image is the corresponding object class;
P_obj1 represents the second confidence of the second target detection model in its detection result for the cropped image.
As a preferable embodiment of the present invention, in step S4, the method for correcting the annotation frame of the original image according to the intersection ratio P_IOU comprises:
judging whether the intersection ratio P_IOU is smaller than 0.7;
if yes, taking whichever of the annotation frame corresponding to probability P_join0 and the annotation frame corresponding to probability P_join1 has the larger probability as the corrected annotation frame;
if not, recalculating the annotation frame by the following formula (4) to correct the annotation frame of the original image:
x = (x_0 + x_1) / 2,  y = (y_0 + y_1) / 2,  w = (w_0 + w_1) / 2,  h = (h_0 + h_1) / 2    formula (4)
In formula (4), x represents the horizontal coordinate, in the original image, of the center point of the recalculated annotation frame;
x_0 represents the horizontal coordinate, in the original image, of the center point of the first annotation frame before correction;
x_1 represents the horizontal coordinate, converted into the original image coordinate system, of the center point of the second annotation frame in the cropped image;
y represents the vertical coordinate, in the original image, of the center point of the recalculated annotation frame;
y_0 represents the vertical coordinate, in the original image, of the center point of the first annotation frame before correction;
y_1 represents the vertical coordinate, converted into the original image coordinate system, of the center point of the second annotation frame in the cropped image;
w represents the width of the recalculated annotation frame in the original image;
w_0 represents the width of the first annotation frame in the original image before correction;
w_1 represents the width of the second annotation frame in the cropped image;
h represents the height of the recalculated annotation frame in the original image;
h_0 represents the height of the first annotation frame in the original image before correction;
h_1 represents the height of the second annotation frame in the cropped image.
In a preferred embodiment of the present invention, in step S5 the cropped commodity area image is resized to 256 × 256 resolution and then input into the fuzzy detection and classification recognition model.
As a preferred embodiment of the present invention, the method for training the target detection models comprises:
Step S21, classifying retail commodities into 10 categories, namely bottles, strip-shaped bags, sheet bags, square bags, vacuum packaging, strip-shaped boxes, square boxes, cans, barrels and fruit packaging, and acquiring at least 500 commodity images for each category, each commodity image having an original resolution of 1280 × 720;
Step S22, manually framing the area where the commodity is located in each commodity image with the labelImg image annotation tool, using rectangular frame selection, and attaching a commodity category label;
Step S23, cropping a cropped image of 704 × 704 resolution from each commodity image with the center of the annotation frame as the center of the cropped image;
Step S24, scaling the at least 5000 commodity images of original resolution 1280 × 720 to 746 × 448, and inputting the at least 5000 commodity images of 746 × 448 resolution and the at least 5000 cropped images of 704 × 704 resolution cropped from the original commodity images centered on the annotation frame into a YOLO-v4 neural network for training, so as to obtain the first target detection model and the second target detection model.
As a preferred embodiment of the present invention, the fuzzy detection and classification recognition model in step S5 is trained by the following method steps:
Step S51, inputting at least 1000 commodity images manually labeled as blurred or clear into an improved parallel resnet neural network, and training a fuzzy-clear classification model through the first training branch of the parallel resnet neural network;
photographing the commodity to be labeled with a mobile phone, with the camera at 90 degrees (front view), 60 degrees (overhead) and 30 degrees (overhead) to the commodity, and photographing 3 distinctive trademark parts of the commodity, for 15 images in total; scaling the images to 320 × 320 resolution, cropping them to 256 × 256 at the center, top-left, bottom-left, top-right and bottom-right, rotating the cropped images by 60 degrees and 30 degrees, and adding random noise and color disturbance (these are common image data enhancement operations and are not described in detail here); then randomly drawing 1000 images from the enhanced data and, together with 1000 interference images, inputting them into the parallel resnet neural network, and training a category classification plus interference model through the second training branch of the parallel resnet neural network;
Step S52, fusing the fuzzy-clear classification model and the category classification plus interference model into the fuzzy detection and classification recognition model.
As a preferable scheme of the invention, the parallel resnet neural network comprises a feature extraction layer shared by the first training branch and the second training branch, and a fuzzy detection layer and a commodity classification recognition layer that take the output of the feature extraction layer as input,
the feature extraction layer comprises convolution layers conv1, conv2_x, conv3_x and conv4_x cascaded in sequence, and the fuzzy detection layer and the commodity classification recognition layer each comprise a convolution layer conv5_x, an average pooling layer and a logistic regression softmax layer cascaded in sequence; the output of the convolution layer conv4_x in the feature extraction layer serves as the input of the convolution layer conv5_x in the fuzzy detection layer and in the commodity classification recognition layer.
The invention also provides an automatic image labeling and labeling quality automatic evaluation system, which can implement the automatic image labeling and labeling quality automatic evaluation method, and the system comprises:
the valid frame extraction module is used for extracting valid frames to be labeled from the video frame images captured while a consumer takes goods from the unmanned sales counter;
the image cropping module is connected with the valid frame extraction module and is used for cropping the commodity-taking area image of the specified size from the valid frame to obtain a cropped image;
the image input module is connected with the valid frame extraction module and the image cropping module respectively and is used for inputting the original image of the valid frame and the cropped image into the target detection module for target commodity area detection;
the target detection module is connected with the image input module and is used for performing target commodity area detection on the input original image of the valid frame and the cropped image through a pre-trained target detection model, obtaining a first target detection result for the original image and a second target detection result for the cropped image;
the probability average value calculation module is connected with the target detection module and is used for calculating, according to the first target detection result and the second target detection result, the probability average value P_mean that the contents of the annotation frames selected by the target detection model in the original image and the cropped image are objects;
the intersection ratio calculation module is connected with the target detection module and is used for calculating, according to the first target detection result and the second target detection result, the intersection ratio P_IOU of the areas framed by the target detection model in the original image and the cropped image;
the annotation frame correction module is connected with the intersection ratio calculation module and is used for correcting the annotation frame in the original image according to the intersection ratio P_IOU;
the image cutting module is connected with the annotation frame correction module and the valid frame extraction module respectively and is used for cutting the commodity area image to be labeled from the original image of the valid frame, taking the area selected by the corrected annotation frame as the cutting object;
the fuzzy detection and classification recognition module is connected with the image cutting module and is used for inputting the commodity area image into a pre-trained fuzzy detection and classification recognition model, the model outputting the class probability P_class that the object in the commodity area image belongs to the corresponding commodity class, the probability P_bg that it is image background, and the probability P_blur that the image is blurred;
the labeling quality evaluation module is connected with the probability average value calculation module, the intersection ratio calculation module and the fuzzy detection and classification recognition module respectively and is used for taking the calculated intersection ratio P_IOU, the probability average value P_mean, and the class probability P_class, background probability P_bg and image blur probability P_blur of the commodity area image as inputs of a pre-trained labeling quality evaluation model, and outputting through the labeling quality evaluation model a quality evaluation result for the image labeling of the valid frame.
The invention has the following beneficial effects:
1. The invention trains the target detection models with a YOLO-v4 neural network and performs target detection separately, at different resolutions, on the original image of the valid frame and on the cropped image cropped from it. From the first target detection result for the original image and the second target detection result for the cropped image output by the models, it calculates the probability P_join0 that the content selected by the first annotation frame in the original image is an object and the probability P_join1 that the content selected by the second annotation frame in the cropped image is an object, as well as the intersection ratio P_IOU of the areas selected by the first and second annotation frames, and corrects the first annotation frame according to the intersection ratio P_IOU, the probability P_join0 and the probability P_join1, thereby improving the detection precision of the target detection models.
2. The invention trains the fuzzy detection and classification recognition model with an improved parallel resnet neural network and performs blur detection and commodity classification recognition on the commodity area image selected by the corrected annotation frame in the original image of the valid frame, improving the precision of blur detection and commodity classification recognition. In addition, the fuzzy-clear classification model is trained through the first training branch of the parallel resnet neural network, the category classification plus interference model is trained through the second training branch, and the two branches share the same feature extraction layer, which speeds up training of the fuzzy detection and classification recognition model.
3. The invention further uses the intersection ratio P_IOU, the probability average value P_mean, and the class probability P_class, background probability P_bg and image blur probability P_blur of the commodity area image as inputs of the pre-trained labeling quality evaluation model, improving the accuracy of the labeling quality evaluation.
Drawings
In order to more clearly describe the technical solution of the embodiments of the present invention, the following will briefly describe the attached drawings that are required to be used in the embodiments of the present invention. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic block diagram of an implementation of an automatic image labeling and labeling quality evaluation method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of extracting a valid frame;
FIG. 3 is a functional block diagram of model input data for acquiring images of different resolutions;
FIG. 4 is a schematic diagram of a network architecture of a resnet neural network modified in accordance with an embodiment of the present invention;
FIG. 5 is a step diagram of implementing the method for automatically labeling images and evaluating the labeling quality according to the embodiment of the invention;
FIG. 6 is a schematic diagram of a conventional target detection model for detecting a target commodity from a video frame image of a consumer picking up the commodity from an unmanned sales counter;
FIG. 7 is a diagram of steps in a method for extracting valid frames from successive video frame images according to an embodiment of the present invention;
FIG. 8 is a diagram of method steps for cropping a cropped image from a valid frame;
FIG. 9 is a diagram of method steps for training a target detection model according to an embodiment of the present invention;
FIG. 10 is a diagram of method steps for training a fuzzy detection and classification recognition model in accordance with an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an automatic image labeling and labeling quality evaluation system according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a multi-layer perceptron model constructed in accordance with an embodiment of the present invention;
fig. 13 is a schematic diagram of the structure of each neuron in the multi-layer perceptron model.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
The drawings are for illustrative purposes only, are schematic rather than physical, and are not to be construed as limiting this patent; to better illustrate the embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments of the invention correspond to the same or similar components. In the description of the invention, it should be understood that terms such as "upper", "lower", "left", "right", "inner" and "outer" indicating orientations or positional relationships are based on the orientations or positional relationships shown in the drawings, are used merely for convenience of description and simplification, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; such terms are exemplary only and should not be construed as limiting this patent, and their specific meaning can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the invention, unless explicitly stated and limited otherwise, terms such as "coupled" should be interpreted broadly: the connection between components may be fixed, detachable or integral; mechanical or electrical; direct or indirect through an intermediate medium; and may be a communication between, or an interaction relationship between, two components. The specific meaning of these terms in the invention can be understood by those of ordinary skill in the art in specific cases.
Fig. 1 is a schematic block diagram of an implementation of the automatic image labeling and labeling quality evaluation method according to an embodiment of the present invention. It should be noted that the "automatic labeling model" shown in fig. 1 comprises the target detection models, the fuzzy detection and classification recognition model and the labeling quality evaluation model. Video frame images of a consumer taking goods from the unmanned sales counter are input into the target detection model, which frames the suspected commodity-taking areas in the video frame images with rectangular annotation frames. The fuzzy detection and classification recognition model then performs commodity category labeling and image blur detection on the suspected commodity-taking areas framed by the target detection model, and the labeling quality evaluation model evaluates the labeling quality according to the outputs of the fuzzy detection and classification recognition model.
FIG. 6 is a schematic diagram showing a conventional target detection model detecting the target commodity from a video frame image of a consumer taking the commodity from an unmanned sales counter. As can be seen from fig. 6, the suspected commodity-taking area framed by an existing target detection model may actually be image background, or may be too blurred owing to interference from the image background, human body movement and similar factors. In addition, the annotation frame of the suspected commodity-taking area framed by the target detection model may be too small or too large; if the annotation frame is not precise enough, the accuracy of the subsequent commodity category labeling is directly affected. Therefore, to guarantee the accuracy of commodity category labeling, the accuracy of the annotation frame must be addressed first.
The automatic image labeling and labeling quality evaluation method provided by the embodiment of the invention solves, through steps S1 to S4, the problem that the suspected commodity-taking area framed by existing target detection models is not precise enough. As shown in fig. 5, the method comprises:
Step S1, extracting a valid frame for commodity labeling from the continuous video frame images captured while a consumer takes goods from the unmanned sales counter, and cropping a commodity-taking area image of a specified size from the valid frame to obtain a cropped image. For example, when a consumer merely stands in front of the unmanned vending cabinet without opening its door, no commodity is being taken and no settlement can occur, so the video frames captured at that moment are invalid and have no value for commodity category labeling. Before labeling commodity categories, the valid frames must first be extracted from the continuous video frame images. As shown in fig. 2 and fig. 7, the method for extracting valid frames in this embodiment comprises:
Step S11a, converting two consecutive video frame images from RGB images to grayscale images, and obtaining the difference image between the current frame and the preceding video frame image by the inter-frame difference method (subtracting the two frames and taking the absolute value of the pixel value difference at each image position), denoted D(x, y). Obtaining the difference image D(x, y) can be expressed by the following formula (1):
D(x, y) = 1 if |I(t) - I(t-1)| > T, and D(x, y) = 0 otherwise    formula (1)
In formula (1), I(t) represents the video frame image at the current time t (the current frame image);
I(t-1) represents the video frame image at time t-1 (the video frame image preceding the current frame);
T represents the absolute value threshold on the pixel value difference; in this embodiment, T = 128;
D(x, y) = 1 represents the image foreground;
D(x, y) = 0 represents the image background;
Step S12a, performing erosion and dilation on the image D(x, y) to remove noise, obtaining the image D(x, y)';
Step S13a, framing the motion change areas in the image D(x, y)' with circumscribed rectangles; there are many existing methods for framing motion change areas with circumscribed rectangles, so the specific framing method is not described here;
Step S14a, calculating the area of each motion change region and filtering out motion change regions with abnormal area; an example of the method for judging whether the area of a region is abnormal is:
if the area of a motion change region is greater than 50% or less than 1% of the whole area of the captured video frame image, the area of that motion change region is judged to be abnormal.
Step S15a, judging whether the number of motion change regions retained in the filtered image D(x, y)' is greater than a preset number threshold (preferably 4: we find that when the number of retained motion change regions is greater than or equal to 4, the current frame is judged to be a valid frame with high accuracy, so the number threshold is set to 4);
if yes, the current frame is judged to be a valid frame;
if not, the current frame is judged to be an invalid frame.
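The extraction of valid frames described in steps S11a to S15a can be sketched with OpenCV as follows. This is a minimal illustration rather than the claimed implementation: the difference threshold T = 128, the area-abnormality bounds of 1% and 50% of the frame area and the number threshold of 4 follow this embodiment, the binary difference map uses 255 instead of 1 for the foreground, treating four or more retained regions as indicating a valid frame follows the preferred reading above, and all function and variable names are illustrative.

import cv2
import numpy as np

def is_valid_frame(prev_bgr, curr_bgr, diff_thresh=128, num_thresh=4):
    # Step S11a: grayscale conversion and inter-frame difference D(x, y)
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, d = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    # Step S12a: erosion and dilation to remove noise, giving D(x, y)'
    kernel = np.ones((3, 3), np.uint8)
    d = cv2.dilate(cv2.erode(d, kernel), kernel)
    # Step S13a: circumscribed rectangles of the motion change regions
    contours, _ = cv2.findContours(d, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Step S14a: filter out regions whose area is abnormal (over 50% or under 1% of the frame)
    frame_area = d.shape[0] * d.shape[1]
    kept = [(x, y, w, h) for (x, y, w, h) in boxes
            if 0.01 * frame_area <= w * h <= 0.5 * frame_area]
    # Step S15a: the frame is treated as valid when at least num_thresh regions remain
    return len(kept) >= num_thresh, kept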
After the valid frame is extracted, the commodity-taking area image of the specified size is cropped from the valid frame. This enlarges the commodity-taking area within the valid frame, and with a multi-model fusion method the valid frame and the cropped image cropped from it, at different resolutions, are used simultaneously as inputs of the automatic labeling model, improving the precision of commodity category labeling.
Specifically, the method for cropping the commodity-taking area image of the specified size from the valid frame is shown in fig. 8 and comprises:
Step S11b, calculating the center coordinates of the circumscribed rectangle of each motion change region retained after filtering, denoted (x_i, y_i), where x_i and y_i are the horizontal and vertical coordinates of the center of the i-th motion change region;
Step S12b, averaging the center coordinates of the circumscribed rectangles of all the motion change regions to obtain the center coordinates of the cropping region, denoted (x_center, y_center). For example, if 5 motion change regions remain after filtering and the centers of their circumscribed rectangles are denoted (x_0, y_0), (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), then x_center = (x_0 + x_1 + x_2 + x_3 + x_4) / 5 and y_center = (y_0 + y_1 + y_2 + y_3 + y_4) / 5.
Step S13b, cropping the cropped image of the specified size from the valid frame with the coordinates (x_center, y_center) as the center position of the cropped image.
In order to increase the speed of commodity category labeling, the invention preferably resizes the valid frame from its original 1280 × 720 resolution to 746 × 448 before inputting it into the automatic labeling model for commodity category labeling.
The invention preferably sets the resolution of the cropped image cropped from the valid frame to 704 × 704. Cropping a 704 × 704 image from the valid frame locally enlarges the motion change area within the frame, so that the detection model focuses more on the effective area, improving the precision of the subsequent commodity category labeling.
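Steps S11b to S13b can be sketched as follows, where boxes is the list of circumscribed rectangles (x, y, w, h) retained in step S14a and crop_size is the specified size (704 in this embodiment); clamping the crop window to the frame borders is an added assumption, and the function name is illustrative.

def crop_pickup_region(frame, boxes, crop_size=704):
    # Steps S11b and S12b: average the center points of the retained rectangles
    centers = [(x + w / 2.0, y + h / 2.0) for (x, y, w, h) in boxes]
    x_center = sum(c[0] for c in centers) / len(centers)
    y_center = sum(c[1] for c in centers) / len(centers)
    # Step S13b: crop a crop_size x crop_size patch centered on (x_center, y_center),
    # clamped so that the window stays inside the 1280 x 720 frame
    h_img, w_img = frame.shape[:2]
    half = crop_size // 2
    left = int(min(max(x_center - half, 0), max(w_img - crop_size, 0)))
    top = int(min(max(y_center - half, 0), max(h_img - crop_size, 0)))
    return frame[top:top + crop_size, left:left + crop_size]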
Referring to fig. 5, to solve the problem that the suspected commodity-taking area framed by existing target detection models is not precise enough, the automatic image labeling and labeling quality evaluation method of this embodiment further comprises:
Step S2, inputting the original image of the valid frame (preferably the 746 × 448 image) and the 704 × 704 cropped image into the pre-trained first target detection model and second target detection model respectively; the first target detection model outputs the first target detection result for the original image and the second target detection model outputs the second target detection result for the cropped image, expressed respectively as (x_0, y_0, w_0, h_0, label_0, P_class0, P_obj0) and (x_1, y_1, w_1, h_1, label_1, P_class1, P_obj1), where:
(x_0, y_0) represents the coordinates, in the XY coordinate system of the original image, of the center point of the first annotation frame selected by the first target detection model in the original image;
w_0 represents the width of the first annotation frame;
h_0 represents the height of the first annotation frame;
label_0 represents the object classification label of the area framed by the first annotation frame;
P_class0 represents the probability that the area framed by the first annotation frame is the corresponding object class;
P_obj0 represents the first confidence in the content framed by the first annotation frame;
(x_1, y_1) represents the coordinates of the center point of the second annotation frame selected by the second target detection model in the cropped image, converted into the XY coordinate system of the original image;
w_1 represents the width of the second annotation frame;
h_1 represents the height of the second annotation frame;
label_1 represents the object classification label of the area framed by the second annotation frame;
P_class1 represents the class probability that the area framed by the second annotation frame is the corresponding object class;
P_obj1 represents the second confidence in the content framed by the second annotation frame;
Step S3, calculating the product of the class probability P_class0 and the first confidence P_obj0 as the probability P_join0 that the area framed by the first annotation frame is an object, calculating the product of the class probability P_class1 and the second confidence P_obj1 as the probability P_join1 that the area framed by the second annotation frame is an object, and then summing and averaging the probability P_join0 and the probability P_join1 to obtain the probability average value P_mean that the content framed by the first or second annotation frame is an object, i.e. P_join0 = P_class0 × P_obj0, P_join1 = P_class1 × P_obj1 and P_mean = (P_join0 + P_join1) / 2;
and calculating the intersection ratio P_IOU of the areas framed by the first annotation frame and the second annotation frame, where the intersection ratio P_IOU is calculated by the following formula (2):
P_IOU = S_I / (S_0 + S_1 - S_I)    formula (2)
In formula (2), S_I represents the area of the overlap between the regions framed by the two annotation frames; S_0 represents the area of the region framed by the first annotation frame, S_0 = w_0 * h_0;
S_1 represents the area of the region framed by the second annotation frame, S_1 = w_1 * h_1.
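A minimal Python sketch of the step S3 computations is given below for clarity; the function names are illustrative, and each box is assumed to be an (x, y, w, h) tuple with (x, y) the center point in the original image coordinate system, the second box having already been converted as described above.

def object_probability(p_class, p_obj):
    # P_join = P_class * P_obj (used for both P_join0 and P_join1)
    return p_class * p_obj

def mean_probability(p_join0, p_join1):
    # P_mean = (P_join0 + P_join1) / 2
    return (p_join0 + p_join1) / 2.0

def box_iou(box0, box1):
    # Intersection ratio P_IOU of the two framed regions; S_0 = w0*h0, S_1 = w1*h1
    (x0, y0, w0, h0), (x1, y1, w1, h1) = box0, box1
    ax0, ay0, ax1, ay1 = x0 - w0 / 2, y0 - h0 / 2, x0 + w0 / 2, y0 + h0 / 2
    bx0, by0, bx1, by1 = x1 - w1 / 2, y1 - h1 / 2, x1 + w1 / 2, y1 + h1 / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    s_i = inter_w * inter_h
    union = w0 * h0 + w1 * h1 - s_i
    return s_i / union if union > 0 else 0.0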
Step S4, correcting the annotation frame of the original image according to the intersection ratio P_IOU, and cropping the commodity area image of the specified size to be labeled from the valid frame according to the corrected annotation frame;
the method for correcting the annotation frame of the original image comprises:
judging whether the intersection ratio P_IOU is smaller than 0.7;
if yes, taking whichever of the annotation frame corresponding to probability P_join0 and the annotation frame corresponding to probability P_join1 has the larger probability as the corrected annotation frame. For example, probability P_join0 corresponds to the first annotation frame and probability P_join1 corresponds to the second annotation frame: if the intersection ratio P_IOU of the areas framed by the first and second annotation frames is less than 0.7 and P_join0 is greater than P_join1, the first annotation frame is not corrected and the content of the area it frames is used directly as the object of the subsequent commodity category labeling; if the intersection ratio P_IOU is less than 0.7 and P_join0 is less than or equal to P_join1, the second annotation frame is taken as the corrected annotation frame in place of the first, and the suspected commodity-taking area it frames at the corresponding position of the original image is used as the object of the subsequent commodity category labeling;
if not, recalculating the annotation frame by the following formula (4) to correct the annotation frame of the original image:
x = (x_0 + x_1) / 2,  y = (y_0 + y_1) / 2,  w = (w_0 + w_1) / 2,  h = (h_0 + h_1) / 2    formula (4)
In formula (4), x represents the horizontal coordinate, in the original image, of the center point of the recalculated annotation frame;
x_0 represents the horizontal coordinate, in the original image, of the center point of the first annotation frame before correction;
x_1 represents the horizontal coordinate, converted into the original image coordinate system, of the center point of the second annotation frame in the cropped image;
y represents the vertical coordinate, in the original image, of the center point of the recalculated annotation frame;
y_0 represents the vertical coordinate, in the original image, of the center point of the first annotation frame before correction;
y_1 represents the vertical coordinate, converted into the original image coordinate system, of the center point of the second annotation frame in the cropped image;
w represents the width of the recalculated annotation frame in the original image;
w_0 represents the width of the first annotation frame in the original image before correction;
w_1 represents the width of the second annotation frame in the cropped image;
h represents the height of the recalculated annotation frame in the original image;
h_0 represents the height of the first annotation frame in the original image before correction;
h_1 represents the height of the second annotation frame in the cropped image.
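Under the same box representation, the correction rule of step S4 can be sketched as below; the averaging branch follows the reading of formula (4) adopted above and is an assumption, as is the function name.

def correct_box(box0, p_join0, box1, p_join1, p_iou, iou_thresh=0.7):
    # box0: first annotation frame in the original image, (x, y, w, h)
    # box1: second annotation frame converted to original-image coordinates
    if p_iou < iou_thresh:
        # The two detections disagree: keep whichever frame is more likely an object
        return box0 if p_join0 > p_join1 else box1
    # The two detections agree: recalculate the frame as the element-wise mean
    return tuple((a + b) / 2.0 for a, b in zip(box0, box1))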
In order to further increase the speed of the subsequent automatic commodity category labeling, in step S4 the cropped commodity area image is resized to 256 × 256 resolution before being input into the fuzzy detection and classification recognition model for further blur detection and commodity classification recognition labeling.
Step S5, inputting the commodity area image into the pre-trained fuzzy detection and classification recognition model, the model outputting the class probability P_class of the commodity class to which the object in the commodity area image is judged to belong, the probability P_bg that it is image background, and the probability P_blur that the image is blurred;
Step S6, inputting the intersection ratio P_IOU and the probability average value P_mean calculated in step S3, together with the class probability P_class, the background probability P_bg and the image blur probability P_blur calculated in step S5, into the pre-trained labeling quality evaluation model, the model outputting the commodity category labeling quality evaluation result for the valid frame.
The process by which the labeling quality evaluation model outputs the labeling quality evaluation result from the input data can be represented by the following formula (5):
(N_0, N_1, N_2) = f(M_1, M_2, M_3, M_4, M_5)    formula (5)
In formula (5), M_1, M_2, M_3, M_4, M_5 represent, respectively, the intersection ratio P_IOU, the probability average value P_mean, the class probability P_class, the background probability P_bg and the image blur probability P_blur used as model input data;
N_0, N_1, N_2 represent, respectively, the quality evaluation grades "excellent", "medium" and "poor" that the labeling quality evaluation model assigns to the image labeling result produced by the fuzzy detection and classification recognition model.
The method for training the target detection model according to the present invention is briefly described below:
As shown in fig. 9, the method for training the target detection model according to the present invention includes:
Step S21, classifying retail commodities into 10 categories, namely bottles, strip-shaped bags, sheet bags, square bags, vacuum packaging, strip-shaped boxes, square boxes, cans, barrels and fruit packaging, and acquiring at least 500 commodity images for each category, each commodity image having an original resolution of 1280 × 720;
Step S22, manually framing the area where the commodity is located in each commodity image with the labelImg image annotation tool, using rectangular frame selection, and attaching a commodity category label;
Step S23, cropping a cropped image of 704 × 704 resolution from each commodity image with the center of the annotation frame as the center of the cropped image;
Step S24, scaling the at least 5000 commodity images of original resolution 1280 × 720 to 746 × 448 resolution, and inputting the at least 5000 commodity images of 746 × 448 resolution and the at least 5000 cropped images of 704 × 704 resolution cropped from the original commodity images into the YOLO-v4 neural network for model training, to obtain the first target detection model and the second target detection model. The invention trains the target detection models with the YOLO-v4 neural network because darknet-based YOLO-v4 gives very good target detection accuracy, is highly customizable, and its built-in data enhancement makes maximum use of the training data set to obtain a high-accuracy target detection model.
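A sketch of preparing the two training inputs of step S24 with OpenCV follows; it assumes each annotation provides the center (cx, cy) of the labeled frame in pixel coordinates of the 1280 × 720 source image, and clamping the 704 × 704 crop to the image border is an added assumption, as are the function and variable names.

import cv2

def make_training_pair(img_path, cx, cy):
    img = cv2.imread(img_path)                 # 1280 x 720 source commodity image
    sample_full = cv2.resize(img, (746, 448))  # training sample for the first model
    # 704 x 704 crop centered on the annotation frame (sample for the second model)
    h, w = img.shape[:2]
    half = 352
    left = int(min(max(cx - half, 0), max(w - 704, 0)))
    top = int(min(max(cy - half, 0), max(h - 704, 0)))
    sample_crop = img[top:top + 704, left:left + 704]
    return sample_full, sample_crop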
The following are the parameter configurations for the training of the object detection model:
[ first target detection configuration parameters ]
classes=1;
batch=64,subdivisions=16;
width=746,height=448;
max_batches=3000;
learning_rate=0.001,steps=2400,2700,scales=0.1,0.1;
classes=1 and filters=18 for the [yolo] layer;
enable data enhancement: mosaic=1.
[ Second target detection configuration parameters ]
classes=1;
batch=64,subdivisions=16;
width=704,height=704;
max_batches=3000;
learning_rate=0.001,steps=2400,2700,scales=0.1,0.1;
classes=1 and filters=18 for the [yolo] layer;
enable data enhancement: mosaic=1.
In model training, the 1280 × 720 commodity images are preferably resized to 746 × 448 before being used as training samples of the first target detection model; reducing the resolution from 1280 × 720 to 746 × 448 speeds up the processing of the model. The 704 × 704 cropped images cropped from the original commodity images are used as training samples of the second target detection model; the 704 × 704 crop locally enlarges the image area, so the model focuses more on the effective area. Fusing the first target detection model and the second target detection model therefore further improves the detection precision of the target detection.
The method for training the fuzzy detection and classification recognition model is briefly described below:
As shown in FIG. 10, the method for training the fuzzy detection and classification recognition model comprises:
Step S51, inputting at least 1000 commodity images manually labeled as blurred or clear into the improved parallel resnet neural network, and training a fuzzy-clear classification model through the first training branch of the parallel resnet neural network;
photographing the commodity to be labeled with a mobile phone, with the camera at 90 degrees (front view), 60 degrees (overhead) and 30 degrees (overhead) to the commodity, and photographing 3 distinctive trademark parts of the commodity, for 15 images in total; scaling the images to 320 × 320 resolution, cropping them to 256 × 256 at the center, top-left, bottom-left, top-right and bottom-right, rotating the cropped images by 60 degrees and 30 degrees, and adding random noise and color disturbance (these are common image data enhancement operations and are not described in detail here); then randomly drawing 1000 images from the enhanced data and, together with 1000 interference images, inputting them into the parallel resnet neural network, and training a category classification plus interference model through the second training branch of the parallel resnet neural network;
Step S52, fusing the fuzzy-clear classification model and the category classification plus interference model into the fuzzy detection and classification recognition model.
Fig. 4 shows the network structure of the improved parallel resnet neural network of the invention. As shown in fig. 4, the parallel resnet neural network comprises a feature extraction layer shared by the first training branch and the second training branch, and a fuzzy detection layer and a commodity classification recognition layer that take the output of the feature extraction layer as input;
the feature extraction layer comprises convolution layers conv1, conv2_x, conv3_x and conv4_x cascaded in sequence; the fuzzy detection layer and the commodity classification recognition layer each comprise a convolution layer conv5_x, an average pooling layer and a logistic regression softmax layer cascaded in sequence; and the output of the convolution layer conv4_x serves as the input of the convolution layer conv5_x in the fuzzy detection layer and in the commodity classification recognition layer.
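Since the exact layer widths of conv1 through conv5_x are not given in the text, the following PyTorch sketch builds the parallel structure of fig. 4 on a standard ResNet-18 split: a backbone shared by both training branches (conv1 to conv4_x) feeding a fuzzy detection branch and a category classification plus interference branch, each with its own conv5_x, average pooling and classifier; all channel counts, class counts and names are illustrative assumptions.

import copy
import torch.nn as nn
from torchvision.models import resnet18

class ParallelResNet(nn.Module):
    # Shared conv1..conv4_x backbone feeding a fuzzy (blur) detection branch and a
    # commodity classification plus interference branch, as in fig. 4.
    def __init__(self, num_classes):
        super().__init__()
        base = resnet18()  # randomly initialised; layer sizes are illustrative
        # Feature extraction layer shared by the first and second training branches
        self.backbone = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool,
                                      base.layer1, base.layer2, base.layer3)
        # First branch: conv5_x + average pooling + classifier (blurred / clear)
        self.blur_branch = nn.Sequential(base.layer4, nn.AdaptiveAvgPool2d(1),
                                         nn.Flatten(), nn.Linear(512, 2))
        # Second branch: conv5_x + average pooling + classifier (classes + interference)
        self.cls_branch = nn.Sequential(copy.deepcopy(base.layer4),
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(512, num_classes + 1))

    def forward(self, x):
        features = self.backbone(x)  # shared features for both branches
        return self.blur_branch(features), self.cls_branch(features)

At inference, a softmax over each branch output yields the blur probability P_blur on one branch and the class probability P_class together with the background probability P_bg on the other.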
The process by which the invention trains the fuzzy detection and classification recognition model using the parallel resnet is briefly described as follows:
An initial fuzzy detection and classification recognition model is pre-trained on 500 SKU commodity categories using the resnet neural network, with the following training parameters:
the SGD optimizer is selected, the loss function is cross-entropy loss, the learning rate is initialized to 0.1 with a decay coefficient of 0.1, and the trained model is saved after training.
The conventional resnet neural network is then modified into the parallel resnet; the structure of the modified parallel resnet neural network is shown in fig. 4 above.
Data preparation: 1000 commodity images (images of commodities being taken by consumers) manually labeled as blurred or clear are acquired, where blurred and clear are distinguished by eye. The commodity images to be labeled and 1000 interference images (background images that are often mistaken for objects and interfere with commodity classification and recognition) are acquired by photographing with a mobile phone; the photographing procedure is described below for beverages, bags and boxes respectively.
Assume the tabletop on which the commodity to be labeled is placed is the XY plane and the axis perpendicular to the tabletop is the Z axis, with upward as the positive Z direction. Place the beverage to be labeled on the XY plane, photograph it at 90, 30 and 60 degrees to the positive Z direction along the positive and negative directions of the Z axis respectively, photograph the top and the bottom of the beverage, and take 1 to 3 photographs of the main characteristic parts of the beverage commodity (such as where the trademark is).
For mainly bag-packaged commodities, place the front and back of the bag in the XY plane, photograph along the positive and negative directions of the XY axes at 90, 30 and 60 degrees to the positive Z direction respectively, and then take 1 to 3 photographs of the main trademark part of the bagged commodity.
For box-packaged commodities to be labeled, photograph the six faces of the box: take each face as an XY plane, with the direction perpendicular to it as the positive Z direction, and photograph along the positive and negative directions of the XY axes at 90, 30 and 60 degrees to the positive Z direction.
Then the images taken with the mobile phone are enhanced: each commodity image is scaled to 320 × 320 resolution and cropped to 256 × 256 at the image center and at the top-left, bottom-left, top-right and bottom-right vertices of the 320 × 320 image; the cropped images are rotated by 45, 90, 135, 180, 225 and 270 degrees and horizontally flipped; Gaussian random noise with mean 0.2 and variance 0.3 is added; the saturation and sharpness of the images are randomly enhanced by a proportion of 0 to 0.3, and the brightness and contrast by a proportion of 0.1 to 0.2.
Then 1000 images are randomly taken from the enhanced images and scaled to 256 × 256. The 1000 commodity images to be labeled and the 1000 background interference images are input into the parallel resnet neural network, and the category classification plus interference model is formed through training of the second training branch of the parallel resnet neural network.
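The enhancement described above can be sketched with Pillow and NumPy as follows; the crop positions, rotation angles, horizontal flip and noise parameters follow the text, while applying the saturation, sharpness, brightness and contrast gains through PIL's ImageEnhance (as factors of 1 plus the stated proportion) is an approximation, and the function name is illustrative.

import random
import numpy as np
from PIL import Image, ImageEnhance, ImageOps

def enhance_commodity_image(img_path):
    img = Image.open(img_path).convert("RGB").resize((320, 320))
    # 256 x 256 crops at the center and the four corners of the 320 x 320 image
    offsets = [(32, 32), (0, 0), (0, 64), (64, 0), (64, 64)]
    crops = [img.crop((l, t, l + 256, t + 256)) for (l, t) in offsets]
    out = []
    for c in crops:
        for angle in (45, 90, 135, 180, 225, 270):
            r = ImageOps.mirror(c.rotate(angle))               # rotate, then horizontal flip
            a = np.asarray(r, dtype=np.float32) / 255.0
            a += np.random.normal(0.2, np.sqrt(0.3), a.shape)  # Gaussian noise, mean 0.2, var 0.3
            r = Image.fromarray((np.clip(a, 0.0, 1.0) * 255).astype(np.uint8))
            r = ImageEnhance.Color(r).enhance(1 + random.uniform(0.0, 0.3))       # saturation
            r = ImageEnhance.Sharpness(r).enhance(1 + random.uniform(0.0, 0.3))   # sharpness
            r = ImageEnhance.Brightness(r).enhance(1 + random.uniform(0.1, 0.2))  # brightness
            r = ImageEnhance.Contrast(r).enhance(1 + random.uniform(0.1, 0.2))    # contrast
            out.append(r)
    return out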
Training the fuzzy-clear classification model: using the first training branch shown in fig. 4, 1000 blurred commodity images and 1000 clear commodity images are used as training samples, and the fuzzy-clear classification model is formed by training.
Category classification + interference model training: using the second training branch shown in fig. 4, 1000 commodity images to be labeled and 1000 background interference images are used as training samples; the weight file from the fuzzy-clear classification model training is loaded, and the category classification plus interference model is formed by training.
The training parameters are: the SGD optimizer is selected, the loss function is cross-entropy loss, the initial learning rate is 0.1 with a decay coefficient of 0.1, and the fuzzy-clear classification model and the category classification plus interference model are saved after training.
Finally, the fuzzy-clear classification model and the category classification plus interference model are fused into the fuzzy detection and classification recognition model.
The method for evaluating the labeling quality by the labeling quality evaluation model provided by the invention is briefly described as follows:
The quality evaluation model is realized by constructing a multi-layer perceptron model. The multi-layer perceptron feeds the feature values into the input layer in a manner that imitates biological neurons; each node of the hidden layer and of the output layer computes a linear transformation and then applies an activation function, which improves the nonlinear fitting capacity of the model and thus its classification accuracy.
FIG. 12 is a schematic diagram of the multi-layer perceptron model constructed in the embodiment of the present invention, and fig. 13 is a schematic structural diagram of each neuron in the multi-layer perceptron model. As shown in fig. 12, the multi-layer perceptron model is divided into an input layer, a hidden layer and an output layer. In fig. 13, M_i is an input value, e_i is the weight of the corresponding input value, b is the bias and f(·) is the Sigmoid activation function, and the expression of any neuron in the network is as follows:
u = Σ_i e_i · M_i + b
N = f(u)
where M_i is the input data, e_i is the weight of the corresponding input value, and f(u) is the Sigmoid activation function. The output layer has three neurons, corresponding respectively to the quality evaluation grades "excellent", "medium" and "poor", and the corresponding evaluation values are then calculated through a softmax function.
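A minimal PyTorch sketch of this perceptron is given below; the five inputs and three softmax outputs follow fig. 12 and formula (5), while the hidden layer width is not specified in the text and is an arbitrary assumption, as is the class name.

import torch
import torch.nn as nn

class QualityMLP(nn.Module):
    # Inputs [M_1..M_5] = [P_IOU, P_mean, P_class, P_bg, P_blur]; outputs [N_0, N_1, N_2]
    def __init__(self, hidden=16):  # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, hidden), nn.Sigmoid(),  # hidden layer with Sigmoid activation f(u)
            nn.Linear(hidden, 3))                # output layer: "excellent", "medium", "poor"

    def forward(self, m):
        return torch.softmax(self.net(m), dim=-1)  # evaluation values via softmax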
The data used for training the multi-layer perceptron model are 500 samples each of the "excellent", "medium" and "poor" grades drawn from the historical sample library, 1500 samples in total.
The input data is [M 1, M 2, M 3, M 4, M 5], where M 1 to M 5 respectively represent the intersection ratio P IOU, the probability average value P mean, the category probability P class, the background probability P bg and the image blurring probability P blur.
The output result is [N 0, N 1, N 2], where N 0 is the evaluation value for the labeling quality grade "excellent", N 1 the value for "medium", and N 2 the value for "poor". It is then judged whether the maximum of N 0, N 1 and N 2 is N 2; if so, the effective frame labeled by the machine is directly discarded.
If the maximum of N 0, N 1 and N 2 is N 0 and N 0 > 0.8, the image labeling quality of the machine is judged to be "excellent", and the current effective frame is stored directly;
if the maximum of N 0, N 1 and N 2 is N 0 but N 0 ≤ 0.8, or the maximum is N 1, the image labeling quality of the machine is judged to be "medium", and the labeling of the current effective frame needs to be checked manually.
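The evaluation and decision logic can be sketched as follows, assuming a single hidden layer whose width (16) is an illustrative choice not fixed by the text; the feature values in the usage line are likewise invented for demonstration.

```python
import torch
import torch.nn as nn

class LabelQualityMLP(nn.Module):
    """Multi-layer perceptron: 5 input features -> Sigmoid hidden layer -> 3 quality grades."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(5, hidden), nn.Sigmoid(),
                                 nn.Linear(hidden, 3))

    def forward(self, x):
        # softmax turns the three outputs into evaluation values N0, N1, N2
        return torch.softmax(self.net(x), dim=-1)

def decide(scores):
    """scores = [N0 ('excellent'), N1 ('medium'), N2 ('poor')] for one effective frame."""
    n0, n1, n2 = scores.tolist()
    top = max(n0, n1, n2)
    if top == n2:
        return "discard"          # poor: drop the machine-labeled frame
    if top == n0 and n0 > 0.8:
        return "keep"             # excellent: store the frame directly
    return "manual_review"        # medium: send to a human annotator

# usage with the five features P_IOU, P_mean, P_class, P_bg, P_blur (illustrative values)
features = torch.tensor([[0.85, 0.90, 0.95, 0.02, 0.05]])
model = LabelQualityMLP()
print(decide(model(features)[0]))
```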
The invention also provides an automatic image labeling and labeling quality evaluation system, which can realize the above automatic image labeling and labeling quality evaluation method. As shown in fig. 11, the system comprises:
the effective frame extraction module is used for extracting effective frames to be subjected to image marking from the acquired video frame images of the consumer taking goods from the unmanned sales counter;
the image clipping module is connected with the effective frame extraction module and is used for clipping the commodity taking area image with the specified size in the effective frame to obtain a clipping image;
the image input module is respectively connected with the effective frame extraction module and the image clipping module and is used for inputting an original image and a clipping image of the effective frame into the target detection module for target commodity area detection;
the target detection module is connected with the image input module and is used for detecting target commodity areas of the original image and the cut image of the input effective frame through the first target detection model and the second target detection model which are trained in advance, so that a first target detection result of the associated original image and a second target detection result of the associated cut image are obtained;
The probability average value calculation module is connected with the target detection module and is used for calculating a probability average value P mean of the target detection model, which is used for taking the content of the labeling frame selected by the frame in the original image and the clipping image as an object, according to the first target detection result and the second target detection result;
The intersection ratio calculation module is connected with the target detection module and is used for calculating the intersection ratio P IOU of the area of the target detection model selected by the frames in the original image and the cut image according to the first target detection result and the second target detection result;
the marking frame correction module is connected with the cross-over ratio calculation module and is used for correcting the marking frame in the original image according to the cross-over ratio P IOU;
the image cutting module is respectively connected with the marking frame correction module and the effective frame extraction module and is used for cutting a commodity area image to be subjected to image marking from an original image of the effective frame by taking the area selected by the corrected marking frame as a cutting object;
The fuzzy detection and classification recognition module is connected with the image cutting module and is used for inputting the commodity area image into a pre-trained fuzzy detection and classification recognition model, and the model outputs the class probability P class that the object in the commodity area image corresponds to the commodity class to which the object belongs, the probability P bg that the object is an image background and the probability P blur that the image is fuzzy;
The labeling quality evaluation module is respectively connected with the probability average value calculation module, the intersection ratio calculation module and the fuzzy detection and classification recognition module, and is used for taking the calculated intersection ratio P IOU, the probability average value P mean, and the category probability P class, background probability P bg and image blurring probability P blur associated with the commodity area image as the inputs of a pre-trained labeling quality evaluation model, through which a quality evaluation result of the image labeling of the effective frame is output (the labeling content being the class label of the commodity category to which the object in the commodity area image belongs).
In summary, the target detection models are trained through the YOLO-v4 neural network, and target detection is performed separately, at different resolutions, on the original image of the effective frame and on the clipping image cut from it. From the first target detection result associated with the original image and the second target detection result associated with the clipping image, the probability P join0 that the content selected by the first labeling frame in the original image is an object and the probability P join1 that the content selected by the second labeling frame in the clipping image is an object are calculated, as well as the intersection ratio P IOU of the areas selected by the first and second labeling frames; the size of the first labeling frame is then corrected according to the intersection ratio P IOU and the probabilities P join0 and P join1, which improves the target detection precision of the target detection model.
According to the invention, the improved parallel type resnet neural network is used for training the fuzzy detection and classification recognition model, the fuzzy detection and commodity classification recognition are carried out on the commodity area image selected by the corrected labeling frame in the original image of the effective frame, and the precision of the fuzzy detection and commodity classification recognition is improved. In addition, a first training branch of the parallel type resnet neural network is used for training a fuzzy clear classification model, a second training branch of the parallel type resnet neural network is used for training a classification and interference model, and the first training branch and the second training branch share the same feature extraction layer, so that the training speed of the fuzzy detection and classification recognition model is improved.
The method also takes the intersection ratio P IOU, the probability average value P mean, and the category probability P class, background probability P bg and image blurring probability P blur associated with the commodity region image as the inputs of the pre-trained labeling quality evaluation model, thereby improving the accuracy of the labeling quality evaluation.
The automatic image labeling and labeling quality evaluation method provided by the invention greatly reduces the workload of manually labeling the image data of the commodity, and greatly improves the image labeling quality and the labeling efficiency.
It should be understood that the above description is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be apparent to those skilled in the art that various modifications, equivalents, variations, and the like can be made to the present application. Such variations are intended to be within the scope of the application without departing from the spirit thereof. In addition, some terms used in the description and claims of the present application are not limiting, but are merely for convenience of description.

Claims (9)

1. An automatic image labeling and labeling quality automatic evaluation method is characterized by comprising the following steps:
Step S1, extracting an effective frame to be subjected to image marking from the acquired continuous video frame images of a consumer taking goods from the unmanned sales counter, and cutting out a goods taking area image of a specified size from the effective frame to obtain a clipping image;
s2, inputting an original image of the effective frame and the clipping image into a first target detection model and a second target detection model which are trained in advance respectively, wherein the first target detection model outputs a first target detection result related to the original image, and the second target detection model outputs a second target detection result related to the clipping image;
Step S3, calculating probability mean values P mean of probabilities that the contents of the first target detection model and the second target detection model in the labeling frames selected by the frames in the original image and the clipping image are objects respectively according to the first target detection result and the second target detection result, and calculating intersection ratio P IOU of areas of the first target detection model and the second target detection model in the areas selected by the frames in the original image and the clipping image respectively;
S4, correcting a labeling frame of the original image according to the intersection ratio P IoU, and cutting a commodity area image to be subjected to image labeling from the original image of the effective frame by taking the area selected by the corrected labeling frame as a cutting object;
step S5, inputting the commodity area image into a pre-trained fuzzy detection and classification recognition model, and outputting the class probability P class, the probability P bg and the probability P blur of image blurring of the commodity class corresponding to the object in the commodity area image by the model;
Step S6, inputting the intersection ratio P IOU, the probability average value P mean and the category probability P class, the background probability P bg and the image blurring probability P blur which are calculated in the step S3 and are related to the commodity region image into a pre-trained labeling quality evaluation model, and outputting a quality evaluation result of image labeling of the effective frame by the model;
In the step S4, the method for correcting the labeling frame of the original image according to the intersection ratio P IOU includes:
Judging whether the intersection ratio P IOU is smaller than 0.7;
if yes, taking, of the labeling frame corresponding to the probability P join0 and the labeling frame corresponding to the probability P join1, the one with the larger probability as the corrected labeling frame;
if not, recalculating the labeling frame by the following formula (4) to correct the labeling frame of the original image:
in the formula (4), x represents the horizontal axis coordinate, in the original image, of the central site of the recalculated labeling frame;
x 0 represents the horizontal axis coordinate, in the original image, of the central site of the first labeling frame before correction;
x 1 represents the horizontal axis coordinate, converted into the original image coordinate system, of the central site of the second labeling frame in the clipping image;
y represents the vertical axis coordinate, in the original image, of the central site of the recalculated labeling frame;
y 0 represents the vertical axis coordinate, in the original image, of the central site of the first labeling frame before correction;
y 1 represents the vertical axis coordinate, converted into the original image coordinate system, of the central site of the second labeling frame in the clipping image;
w represents the width of the recalculated labeling frame in the original image;
w 0 represents the width of the first labeling frame in the original image before correction;
w 1 represents the width of the second labeling frame in the clipping image;
h represents the height of the recalculated labeling frame in the original image;
h 0 represents the height of the first labeling frame in the original image before correction;
h 1 represents the height of the second labeling frame in the clipping image;
The fuzzy detection and classification recognition model in the step S5 is trained by the following method steps:
Step S51, inputting at least 1000 commodity images artificially marked as fuzzy and clear into an improved parallel type resnet neural network, and training and forming a fuzzy clear classification model through a first training branch in the parallel type resnet neural network;
The category classification + interference model is trained by: using a mobile phone to shoot the commodity to be labeled with the camera at a 90-degree front view perpendicular to the commodity, a 60-degree overhead view and a 30-degree overhead view, shooting 3 distinctive trademark parts of the commodity, 15 images in total; scaling each image to 320×320 resolution and cutting 256×256 crops along the center, the upper left, the lower left, the upper right and the lower right; horizontally flipping the cut images and rotating them by 60 degrees and 30 degrees; adding random noise and color disturbance; randomly extracting 1000 images from the enhanced data and, together with 1000 interference images, inputting them into the parallel type resnet neural network; and training to form the category classification + interference model through the second training branch in the parallel type resnet neural network;
step S52, fusing the fuzzy clear classification model and the classification and interference model into the fuzzy detection and classification recognition model;
The parallel resnet neural network comprises a feature extraction layer shared by the first training branch and the second training branch, a fuzzy detection layer taking the output of the feature extraction layer as input and a commodity classification recognition layer,
The feature extraction layer comprises a convolution layer conv1, conv2_x, conv3_x and conv4_x which are sequentially cascaded, and the fuzzy detection layer and the commodity classification identification layer each comprise a convolution layer conv5_x, an average pooling layer average pool and a logistic regression softmax layer which are sequentially cascaded; the output of the convolution layer conv4_x in the feature extraction layer serves as the input to the convolution layer conv5_x in the blur detection layer and in the commodity classification identification layer.
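As an illustration of the correction rule in step S4, a minimal sketch follows. The box format (x_center, y_center, w, h), the assumption that the second frame has already been converted into original-image coordinates, and the recombine callback standing in for formula (4) (which is not reproduced in the text) are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_center, y_center, w, h)."""
    ax0, ay0 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax1, ay1 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx0, by0 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx1, by1 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def correct_label_box(box0, p_join0, box1, p_join1, recombine):
    """box0: first labeling frame (original image); box1: second labeling frame in
    original-image coordinates; recombine stands in for formula (4)."""
    if iou(box0, box1) < 0.7:
        # keep whichever frame has the larger joint probability
        return box0 if p_join0 >= p_join1 else box1
    return recombine(box0, box1)
```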
2. The method for automatically labeling images and automatically evaluating the labeling quality according to claim 1, wherein in the step S1, the method for extracting the effective frames from the continuous video frame images comprises:
Step S11a, converting the video frame images of two continuous frames from RGB images to gray images, and obtaining a difference image between the video frame images of the current frame and the frame previous to the current frame by utilizing an inter-frame difference method, and marking the difference image as D (x, y);
step S12a, performing erosion and dilation processing on the image D (x, y) to remove noise, obtaining a denoised image D (x, y);
step S13a, selecting a motion change area in the image D (x, y) in a circumscribed rectangular mode;
Step S14a, calculating the area of each motion change area, and filtering out the motion change areas with abnormal area;
step S15a, judging whether the number of the motion change regions in the filtered and retained image D (x, y) is larger than a preset number threshold,
If yes, judging the current frame as the effective frame;
if not, judging the current frame as the non-valid frame.
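A minimal OpenCV sketch of the effective-frame judgment in claim 2 might look as follows; the binarization threshold, the 3×3 morphology kernel and the area bounds used to filter "abnormal" regions are assumptions, since the claim does not fix them (OpenCV 4 API assumed).

```python
import cv2
import numpy as np

def is_valid_frame(prev_bgr, curr_bgr, count_threshold=4,
                   min_area=500, max_area=200000):
    """Inter-frame difference check sketched from claim 2; area bounds are assumptions."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)                      # difference image D(x, y)
    _, diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)     # binarize (assumed step)
    kernel = np.ones((3, 3), np.uint8)
    diff = cv2.dilate(cv2.erode(diff, kernel), kernel)            # denoise: erosion then dilation
    contours, _ = cv2.findContours(diff, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rects = [cv2.boundingRect(c) for c in contours]               # circumscribed rectangles
    kept = [(x, y, w, h) for x, y, w, h in rects if min_area <= w * h <= max_area]
    return len(kept) > count_threshold, kept
```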
3. The method for automatic labeling and automatic evaluation of labeling quality of images according to claim 2, wherein the number threshold is 4.
4. The method for automatically labeling and automatically evaluating the labeling quality of an image according to claim 1, wherein the method for clipping the clipped image from the effective frame comprises:
Step S11b, calculating the central locus coordinates of the circumscribed rectangle of each motion change region retained after filtering, recorded as (x i, y i), where x i and y i respectively represent the horizontal axis coordinate and the vertical axis coordinate of the central locus of the i-th motion change region;
Step S12b, averaging the central locus coordinates of the circumscribed rectangles of all the motion change regions to obtain the central locus coordinates of the cutting region, recorded as (x center, y center);
And step S13b, cutting out the clipped image with a specified size by taking the coordinates (x center, y center) as the center position of the clipped image in the effective frame.
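Continuing the previous sketch, the clipping center of claim 4 can be computed by averaging the centers of the kept rectangles; the 704×704 crop size is taken from claim 8, and the clamping of the crop window to the frame border is an assumption.

```python
def crop_center_region(frame, kept_rects, crop_size=704):
    """Average the centers of the kept motion rectangles and crop around that point.
    frame is a NumPy image (e.g. 1280x720 BGR); kept_rects comes from is_valid_frame."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in kept_rects]
    x_center = sum(c[0] for c in centers) / len(centers)
    y_center = sum(c[1] for c in centers) / len(centers)
    half = crop_size // 2
    h_img, w_img = frame.shape[:2]
    # clamp so the crop window stays inside the frame (assumed behaviour)
    x0 = int(min(max(x_center - half, 0), w_img - crop_size))
    y0 = int(min(max(y_center - half, 0), h_img - crop_size))
    return frame[y0:y0 + crop_size, x0:x0 + crop_size]
```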
5. The method for automatic labeling and automatic evaluation of labeling quality according to claim 1, wherein in the step S2, multi-resolution target detection result fusion is adopted: the effective frame is adjusted from its original 1280×720 resolution to 746×448 resolution and then input into the first target detection model;
and the clipping image is adjusted to 704×704 resolution and then input into the second target detection model.
6. The method for automatic labeling and automatic evaluation of labeling quality of images according to claim 1, wherein the probability average P mean is calculated by the following formula (1):
in the formula (1), P join0 represents a probability that the first object detection model determines that the content selected by the frame in the original image is an object;
P join1 represents the probability that the second object detection model determines that the content selected by the frame in the clipping image is an object;
P join0 is calculated by the following formula (2):
P join0=P class0×P obj0 formula (2)
In the formula (2), P class0 represents the probability that the content selected by the first target detection model in the frame in the original image is the corresponding object class;
p obj0 represents a first confidence of the first target detection model to the target detection result of the original image;
P join1 is calculated by the following formula (3):
P join1=P class1×P obj1 formula (3)
In the formula (3), P class1 represents the probability that the content selected by the second target detection model in the frame in the clipping image is the corresponding object class;
P obj1 represents a second confidence of the second object detection model to the object detection result of the cropped image.
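Formulas (2) and (3) reduce to two multiplications; a small sketch follows, in which P mean is taken as the arithmetic mean of the two joint probabilities — an assumption made because formula (1) itself is not reproduced in the text.

```python
def probability_mean(p_class0, p_obj0, p_class1, p_obj1):
    # formula (2): joint probability for the labeling frame in the original image
    p_join0 = p_class0 * p_obj0
    # formula (3): joint probability for the labeling frame in the clipping image
    p_join1 = p_class1 * p_obj1
    # P_mean assumed to be the arithmetic mean of the two joint probabilities
    return (p_join0 + p_join1) / 2.0
```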
7. The method according to claim 1, wherein in the step S5, the commodity area image obtained by cutting is adjusted to 256×256 resolution and then input into the fuzzy detection and classification recognition model.
8. The method for automatically labeling images and automatically evaluating the labeling quality according to claim 1, wherein the method for training the object detection model comprises the following steps:
Step S21, classifying retail commodities into 10 categories, namely bottling, strip-shaped bags, sheet bags, square bags, vacuum packaging, strip-shaped boxes, square boxes, canning, barreling and fruit packaging, and acquiring at least 500 commodity images of each category of commodities, wherein the original resolution of each commodity image is 1280 x 720;
s22, selecting an area where the commodity is located from an artificial frame in each commodity image by using labelImg image marking tools in a rectangular frame selection mode, and marking commodity category labels;
Step S23, cutting out the cutting image with the resolution of 704 x 704 from each commodity image by taking the central site of the marking frame as the center of the cutting image;
step S24, scaling the at least 5000 commodity images with an original resolution of 1280×720 to 746×448 resolution, and inputting the at least 5000 commodity images at 746×448 resolution, together with the at least 5000 clipping images of 704×704 resolution cut out from the original commodity images with the labeling frame as the center, into the YOLO-v4 neural network for training, so as to obtain the first target detection model and the second target detection model.
9. An automatic image labeling and labeling quality evaluation system capable of realizing the automatic image labeling and labeling quality evaluation method according to any one of claims 1-8, characterized in that the system comprises:
The effective frame extraction module is used for extracting effective frames to be subjected to image marking from the acquired video frame images of the consumer taking goods from the unmanned sales counter;
The image clipping module is connected with the effective frame extraction module and is used for clipping the commodity taking area image with the specified size from the effective frame to obtain a clipping image;
The image input module is respectively connected with the effective frame extraction module and the image clipping module and is used for inputting the original image and the clipping image of the effective frame into the target detection module for target commodity area detection;
the target detection module is connected with the image input module and is used for detecting target commodity areas of the original image and the cut image of the input effective frame through a pre-trained target detection model to obtain a first target detection result related to the original image and a second target detection result related to the cut image;
The probability average value calculation module is connected with the target detection module and is used for calculating a probability average value P mean of the target detection model, which is used for taking the content of a labeling frame selected by a frame in the original image and the clipping image as an object, according to the first target detection result and the second target detection result;
the intersection ratio calculation module is connected with the target detection module and is used for calculating an intersection ratio P IOU of the area of the target detection model selected by the frame in the original image and the clipping image according to the first target detection result and the second target detection result;
The marking frame correction module is connected with the cross ratio calculation module and is used for correcting the marking frame in the original image according to the cross ratio P IOU;
the image cutting module is respectively connected with the marking frame correction module and the effective frame extraction module and is used for cutting a commodity area image to be subjected to image marking from the original image of the effective frame by taking the area selected by the corrected marking frame as a cutting object;
The fuzzy detection and classification recognition module is connected with the image cutting module and is used for inputting the commodity region image into a pre-trained fuzzy detection and classification recognition model, and the model outputs the class probability P class that the object in the commodity region image corresponds to the commodity class to which the object belongs, the probability P bg that the object is an image background and the probability P blur that the image is fuzzy;
The labeling quality evaluation module is respectively connected with the probability average value calculation module, the intersection ratio calculation module and the fuzzy detection and classification recognition module and is used for outputting a quality evaluation result of image labeling of the effective frame through the labeling quality evaluation module by taking the calculated intersection ratio P IOU, the probability average value P mean and the calculated class probability P class, the probability P bg and the probability P blur of image blurring which are associated with the commodity region image as inputs of a pre-trained labeling quality evaluation model.
CN202111145155.0A 2021-09-28 Automatic image labeling and labeling quality automatic evaluation method and system Active CN113869211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111145155.0A CN113869211B (en) 2021-09-28 Automatic image labeling and labeling quality automatic evaluation method and system

Publications (2)

Publication Number Publication Date
CN113869211A CN113869211A (en) 2021-12-31
CN113869211B true CN113869211B (en) 2024-07-02

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875750A (en) * 2017-08-25 2018-11-23 北京旷视科技有限公司 object detecting method, device and system and storage medium
CN112836697A (en) * 2020-12-24 2021-05-25 南京泓图人工智能技术研究院有限公司 Yoov 5-based vehicle type detection and quality estimation method

Similar Documents

Publication Publication Date Title
CN109829893B (en) Defect target detection method based on attention mechanism
EP3745296A1 (en) Image monitoring-based commodity sensing system and commodity sensing method
US20190220692A1 (en) Method and apparatus for checkout based on image identification technique of convolutional neural network
CN108320404A (en) Commodity recognition method, device, self-service cashier based on neural network
CN110245663A (en) One kind knowing method for distinguishing for coil of strip information
CN113591795B (en) Lightweight face detection method and system based on mixed attention characteristic pyramid structure
CN109559453A (en) Human-computer interaction device and its application for Automatic-settlement
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN114782311A (en) Improved multi-scale defect target detection method and system based on CenterNet
CN114282816A (en) Give birth to bright district large screen electronic tags automatic management system
CN114140844A (en) Face silence living body detection method and device, electronic equipment and storage medium
CN115050021A (en) Grape identification method in non-structural environment based on improved YOLOv4
US11748787B2 (en) Analysis method and system for the item on the supermarket shelf
CN116092179A (en) Improved Yolox fall detection system
CN113034497A (en) Vision-based thermos cup weld positioning detection method and system
CN116703919A (en) Surface impurity detection method based on optimal transmission distance loss model
CN113468914A (en) Method, device and equipment for determining purity of commodities
CN113869211B (en) Automatic image labeling and labeling quality automatic evaluation method and system
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN111242094A (en) Commodity identification method, intelligent container and intelligent container system
CN113869211A (en) Automatic image annotation and automatic annotation quality evaluation method and system
CN114792300B (en) X-ray broken needle detection method based on multi-scale attention
CN115761552A (en) Target detection method, system, equipment and medium for airborne platform of unmanned aerial vehicle
CN115205793A (en) Electric power machine room smoke detection method and device based on deep learning secondary confirmation
CN115187800A (en) Artificial intelligence commodity inspection method, device and medium based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant