CN115205553A - Image data cleaning method and device, electronic equipment and storage medium

Image data cleaning method and device, electronic equipment and storage medium

Info

Publication number
CN115205553A
CN115205553A (Application CN202210829416.9A)
Authority
CN
China
Prior art keywords
image
frame
prediction
labeling
prediction frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210829416.9A
Other languages
Chinese (zh)
Inventor
孟哲令
刘诗男
侯军
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202210829416.9A priority Critical patent/CN115205553A/en
Publication of CN115205553A publication Critical patent/CN115205553A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/72: Data preparation, e.g. statistical preprocessing of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a data cleaning method and apparatus for images, an electronic device, and a storage medium. The method includes the following steps: acquiring at least one image; screening out a first image from the at least one image according to the object labeling frame corresponding to each image; generating an object prediction frame corresponding to the first image according to the first image; and performing data cleaning on the first image according to the object labeling frame corresponding to the first image and the corresponding object prediction frame. Because images bearing labeled object labeling frames are screened multiple times, labeling differences between different images are reduced, so a machine learning model trained on the images after data cleaning can better extract object features, which is favorable for improving the training effect of the machine learning model.

Description

Image data cleaning method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and an apparatus for cleaning image data, an electronic device, and a storage medium.
Background
Object detection has a wide range of application scenarios in both production and daily life, for example object identity detection, object quality detection, and object type detection, which are typically performed by machine learning models. A machine learning model needs to be trained on a large number of labeled images, and only after reaching a certain accuracy does it acquire practical application value. Therefore, how data cleaning is performed on the labeled images directly influences the model parameters of the trained machine learning model, and in turn the recognition accuracy of the various detection functions.
Disclosure of Invention
The present disclosure provides a data cleaning technical solution for images.
According to an aspect of the present disclosure, there is provided a data cleaning method for images, the method including: acquiring at least one image; screening out a first image from the at least one image according to the object labeling frame corresponding to each image; generating an object prediction frame corresponding to the first image according to the first image; and performing data cleaning on the first image according to the object labeling frame corresponding to the first image and the corresponding object prediction frame.
In a possible implementation manner, the screening out a first image of the at least one image according to the object labeling box corresponding to each image includes: screening out first images which meet at least one of the following conditions from the at least one image: the number of the object labeling frames corresponding to the first image is smaller than a preset maximum value, the number of the object labeling frames corresponding to the first image is larger than a preset minimum value, and any two object labeling frames corresponding to the first image do not belong to an inclusion relationship.
In a possible implementation manner, the performing data cleansing on the first image according to the object labeling box and the corresponding object prediction box corresponding to the first image includes: determining a confidence level of each object prediction box in the first image; screening out a first prediction frame with the confidence level higher than a preset confidence level in the object prediction frame; and performing data cleaning on the first image according to the first prediction frame and the object labeling frame.
In a possible implementation, the performing data cleansing on the first image according to the first prediction box and the object labeling box includes: for any first image, determining a second prediction frame matched with the object labeling frame in the first prediction frame according to the positions of the first prediction frame and the object labeling frame; determining a first quantity ratio of the second prediction frame to the object labeling frame in any first image; and screening any first image under the condition that the first number ratio is determined to be smaller than or equal to a preset first ratio.
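The first-ratio screen above can be sketched minimally as follows. The function name and the default threshold are illustrative assumptions; the disclosure leaves the preset first ratio as a design parameter.

```python
def screen_by_match_ratio(n_matched: int, n_labeled: int,
                          preset_first_ratio: float = 0.8) -> bool:
    """Return True when the image survives cleaning: the ratio of matched
    second prediction boxes to object labeling boxes must exceed the preset
    first ratio; images at or below it are screened out.
    The 0.8 default is an assumed value for illustration."""
    if n_labeled == 0:
        return False  # no labeled boxes: no basis for the ratio, screen out
    return (n_matched / n_labeled) > preset_first_ratio
```

With the assumed threshold, an image where 9 of 10 labeled objects find a matching prediction is kept, while one with 8 of 10 (exactly the threshold) is screened out.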
In a possible implementation manner, the determining, according to the positions of the first prediction box and the object labeling box, a second prediction box in the first prediction box, which matches the object labeling box, includes: determining a third prediction frame which is intersected with the object marking frame and has the highest ratio in the first prediction frame according to the positions of the first prediction frame and the object marking frame; and under the condition that the intersection ratio of the third prediction frame and the object marking frame is determined to be larger than a preset intersection ratio, taking the third prediction frame as the second prediction frame.
In a possible implementation, the performing data cleansing on the first image according to the first prediction box and the object labeling box includes: determining a second quantity ratio of the first prediction frame to the object labeling frame aiming at any first image; and screening any first image when the second number ratio is larger than or equal to a preset second ratio.
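The second-ratio screen, which targets over-detection, can be sketched the same way. Again, the function name and the default ratio of 2.0 are assumptions for illustration only:

```python
def screen_by_prediction_ratio(n_first_pred: int, n_labeled: int,
                               preset_second_ratio: float = 2.0) -> bool:
    """Return True when the image survives cleaning: screen it out when the
    first prediction boxes outnumber the object labeling boxes by the preset
    second ratio or more, which suggests over-detection or missing labels.
    The 2.0 default is an assumed value for illustration."""
    if n_labeled == 0:
        return False  # no labeled boxes: ratio undefined, screen out
    return (n_first_pred / n_labeled) < preset_second_ratio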
In one possible embodiment, the acquiring at least one image includes: acquiring an original image through a web crawler, and executing at least one of the following: screening out an image to be marked in the original image under the condition that the original image does not comprise the object marking frame; in response to the generation processing of the object labeling frame aiming at the image to be labeled, taking the image to be labeled after the object labeling frame is generated as the image; and in the case that the original image is determined to comprise the object labeling frame, taking the original image as the image.
In one possible embodiment, the data cleansing method further includes: determining the model capacity of a model to be trained corresponding to the image; the image after data cleaning is used for training the model to be trained; screening out a target model in a model library according to the model capacity; wherein the model capacity of the target model is larger than that of the model to be trained; the generating an object prediction frame corresponding to the first image according to the first image comprises: and inputting the first image into the target model to obtain an object prediction frame corresponding to the first image.
In one possible embodiment, the data cleansing method further includes: and determining at least one of a preset maximum value, a preset minimum value, a first quantity ratio and a second quantity ratio according to the quantity of the object marking frames corresponding to the image.
According to an aspect of the present disclosure, there is provided an image data cleansing apparatus including: the image acquisition module is used for acquiring at least one image; the image screening module is used for screening out a first image in the at least one image according to the object marking frame corresponding to each image; the object prediction frame generation module is used for generating an object prediction frame corresponding to the first image according to the first image; and the first image screening module is used for carrying out data cleaning on the first image according to the object marking frame corresponding to the first image and the corresponding object prediction frame.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored by the memory to perform the data cleansing method described above.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described data cleansing method.
In the embodiment of the disclosure, at least one image can be acquired, and a first image in the at least one image is screened out according to an object labeling frame corresponding to each image. And then generating an object prediction frame corresponding to the first image according to the first image. And finally, performing data cleaning on the first image according to the object marking frame corresponding to the first image and the corresponding object prediction frame. According to the image labeling method and device, the images of the labeled object labeling frames are screened for multiple times, labeling differences among different images are reduced, and therefore the machine learning model based on the image training after data cleaning can better extract object features, and the training effect of the machine learning model is favorably improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a reference diagram showing a difference in labeling in the related art.
Fig. 2 shows a flowchart of a data cleansing method for an image according to an embodiment of the present disclosure.
FIG. 3 illustrates a reference schematic diagram of a culled image provided in accordance with an embodiment of the disclosure.
FIG. 4 illustrates a reference schematic diagram of a culled image provided in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates a reference schematic diagram of a culled image provided in accordance with an embodiment of the disclosure.
FIG. 6 shows a flow chart of a method for data cleansing of an image provided according to an embodiment of the present disclosure.
FIG. 7 illustrates a reference schematic diagram of a culled image provided in accordance with an embodiment of the disclosure.
FIG. 8 illustrates a reference schematic diagram of a culled image provided in accordance with an embodiment of the disclosure.
FIG. 9 shows a block diagram of a data cleansing apparatus for images provided in accordance with an embodiment of the present disclosure.
FIG. 10 shows a block diagram of an electronic device provided according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. Additionally, the term "at least one" herein means any one of a variety or any combination of at least two of a variety, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, a data set of object images is usually labeled manually to generate object labeling boxes and then input into a model to be trained, which is prone to the following problems: 1. The image quality of the data set is uneven. Large-scale data sets are usually acquired by web crawlers, so blurred or erroneous data easily appears in them. In addition, because the object labeling boxes are generated manually, annotators are also prone to labeling errors, which is detrimental to the training and generalization of the model. 2. The standard for manually generating object labeling boxes is inconsistent. In general, a large-scale data set is labeled by multiple annotators, each of whom usually has different labeling habits; even the same annotator may label differently at different times. Referring to fig. 1, fig. 1 is a schematic reference diagram illustrating labeling differences in the related art. Taking food as an example object, for the food image in fig. 1 an annotator may label each litchi separately, may label the litchis inside the tray separately from those outside the tray, or may label all the litchis as a whole. Within a data set, labeling differences caused by inconsistent labeling habits easily increase the difficulty of the model in capturing object features, which in turn degrades the training effect of the model.
In view of this, an embodiment of the present disclosure provides an image data cleaning method, which may acquire at least one image, and screen out a first image of the at least one image according to an object labeling frame corresponding to each image. And then generating an object prediction frame corresponding to the first image according to the first image. And finally, performing data cleaning on the first image according to the object marking frame corresponding to the first image and the corresponding object prediction frame. According to the image labeling method and device, the images of the labeled object labeling frames are screened for multiple times, labeling differences among different images are reduced, and therefore the machine learning model based on the image training after data cleaning can better extract object features, and the training effect of the machine learning model is favorably improved.
In a possible implementation manner, the data cleansing method may be executed by an electronic device such as a terminal device or a server, the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a manner in which a processor calls a computer-readable instruction stored in a memory. Alternatively, the method may be performed by a server. By combining with an actual application scene, a marker can generate an object marking frame for an image on terminal equipment, then the processed image is sent to the electronic equipment, the electronic equipment can automatically screen the image to be trained of the machine learning model, then the training process of the machine learning model is carried out, and after the training of the machine learning model is completed, the machine learning model with higher recognition accuracy and stronger generalization capability can be obtained.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data cleansing method for an image according to an embodiment of the present disclosure, and as shown in fig. 2, the data cleansing method includes:
step S100, at least one image is acquired. The image may be an image that has been annotated by an annotator, for example, with at least one object annotation box if the annotator is correctly annotated. The object may be any object to be detected, and may be determined by a developer according to an actual application scenario, for example: the object may be food, a person, an animal, a vehicle, etc., and the disclosed embodiments are not limited thereto.
In one possible implementation, step S100 may include: acquiring an original image through a web crawler, and executing at least one of the following: screening out an image to be labeled in the original image under the condition that the original image does not comprise an object labeling frame; in response to the generation processing of the object labeling frame aiming at the image to be labeled, taking the image to be labeled after the object labeling frame is generated as the image; and in the case that the original image is determined to comprise the object labeling frame, taking the original image as the image.
The original image is an image which is not marked by a marker and may not have an object. And then screening out the image to be marked in the original image. For example, images with problems of blur, object loss, and the like in an original image may be screened out through a screening algorithm or a screening model in the related art, and details of the embodiments of the present disclosure are not repeated herein. And finally, in response to the generation processing of the object labeling frame aiming at the image to be labeled, taking the image to be labeled after the object labeling frame is generated as the image. Exemplarily, if the electronic device is a terminal device, the image to be annotated can be directly displayed on the display screen. If the electronic equipment is a server, the image to be annotated can be sent to a terminal device, namely, the terminal device displays the image to be annotated through a display screen. And then the annotator performs manual calibration (i.e. generates an object annotation box). After the marking is finished, the marker can transmit the marked object image to the electronic equipment so as to carry out a data cleaning process. The embodiment of the disclosure can reduce the manual proofreading cost by automatically preprocessing the original image so as to improve the overall efficiency of data cleaning.
And S200, screening out a first image from the at least one image according to the object labeling frame corresponding to each image. For example, the electronic device may filter out the first image according to the number or positions of the object labeling boxes in each image. The embodiment of the present disclosure does not limit the specific number or position rules, and a developer may set them according to actual situations. For example, a developer can configure screening of images in which an edge of an object labeling box lies too close to the image edge (e.g., when the coordinate deviation of the closest pixel is smaller than a preset value); the electronic device can then consider the object contained in such an image incomplete and of low training value.
In one possible implementation, step S200 may include: screening out first images which meet at least one of the following conditions from the at least one image: the number of the object labeling frames corresponding to the first image is smaller than a preset maximum value, the number of the object labeling frames corresponding to the first image is larger than a preset minimum value, and any two object labeling frames corresponding to the first image do not belong to an inclusion relationship.
With reference to fig. 3-5, fig. 3-5 illustrate reference schematic diagrams of a culled image provided according to an embodiment of the disclosure. Taking the object as food as an example, the boxes in fig. 3 to 5 represent object labels (or food labels, other types of objects can be set as labels corresponding to the objects). Fig. 3 to 5 correspond to the above three conditions in sequence.
In fig. 3, due to the personal labeling habit of the labeling operator, each mangosteen in the image corresponds to an object labeling frame, and a large number of incomplete mangosteen images exist, so that the training value of the image is considered to be low, the extraction of the food characteristics of the mangosteen in the image by the model to be trained is not facilitated, and the image can be screened out by the electronic device. That is, the electronic device may compare the preset maximum value with the number of the object labeling boxes of each image, and then determine the screened image in the image. For example: the preset maximum value may be set to 10, and since the number of the object labeling boxes in fig. 3 is 12, the image is screened out.
In fig. 4, object labeling boxes having an inclusion relationship appear in the image; that is, all the image content of box 2 in fig. 4 is included in box 1, so box 2 has a low training value, which is not favorable for the model to be trained to extract the food features of the image, and the electronic device can screen out the image. That is, the electronic device may determine whether any two object labeling boxes in each image have a containment relationship to determine the culled images.
In fig. 5, an incorrectly archived image appears among the images; that is, a non-food image or an unlabeled image was input to the electronic device through a manual error of an annotator. Such an image generally has no training value and belongs to the labeling-error category, and the electronic device can screen it out. That is, the electronic device may compare a preset minimum value (e.g., 0) with the number of object labeling boxes of each image, and then determine the culled images.
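The three screening conditions illustrated by figs. 3-5 can be sketched together in one check. The helper names and the preset maximum/minimum defaults below are illustrative assumptions (the text itself gives 10 and 0 as example values):

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def _contains(outer: Box, inner: Box) -> bool:
    # True when `outer` fully encloses `inner` (the fig. 4 inclusion relationship)
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def passes_label_screen(boxes: List[Box],
                        preset_max: int = 10, preset_min: int = 0) -> bool:
    """Apply all three conditions: box count below the preset maximum (fig. 3),
    above the preset minimum (fig. 5), and no two labeling boxes in a
    containment relationship (fig. 4)."""
    if not (preset_min < len(boxes) < preset_max):
        return False
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i != j and _contains(a, b):
                return False
    return True
```

An image with twelve boxes (fig. 3), no boxes (fig. 5), or one box nested inside another (fig. 4) would be screened out; an image with a few disjoint boxes is kept.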
Step S300, generating an object prediction frame corresponding to the first image according to the first image. For example, the object prediction box may be determined by an object detection algorithm or a machine learning model in the related art, and the embodiments of the present disclosure are not limited herein. The object prediction box is used for marking an area in the first image, wherein the electronic equipment considers that the object possibly exists.
In a possible implementation, the data cleaning method further includes: determining the model capacity of the model to be trained corresponding to the image. The image after data cleaning is used to train the model to be trained, where the model to be trained represents the model to be used in the actual production application stage, for example a RetinaNet-50 model (a target detection model). The model capacity represents the fitting capability of the model to be trained. The electronic device then screens out a target model in a model library according to the model capacity (such as a Faster RCNN-50 model or a RetinaNet-151 model, whose model capacities are both larger than that of the RetinaNet-50 model); that is, the model capacity of the target model is larger than that of the model to be trained. In the embodiment of the present disclosure, a model library may be pre-established, containing a plurality of original models and their corresponding model capacities. The electronic device can then automatically recommend an original model with larger model capacity as the target model according to the model capacity of the model to be trained. In the case that such a target model exists, step S300 may include: inputting the first image into the target model to obtain the object prediction frame corresponding to the first image. Generally, a larger model capacity means higher detection accuracy but a longer detection time. In combination with the actual application scenario, to meet the use requirements of users, a model with moderate model capacity is usually selected as the model to be trained, so as to balance detection accuracy against detection duration.
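A minimal sketch of the model-library lookup follows. The capacity scores assigned below are illustrative assumptions, not values given by the disclosure; only the model names appear in the text.

```python
# Hypothetical model library: name -> model capacity (fitting capability).
# The numeric scores are assumed for illustration only.
MODEL_LIBRARY = {
    "RetinaNet-50": 50,
    "Faster RCNN-50": 80,
    "RetinaNet-151": 151,
}

def pick_target_model(model_to_train: str) -> str:
    """Screen the library for models whose capacity exceeds that of the model
    to be trained, returning the smallest such model (one plausible way to
    balance prediction accuracy against prediction time)."""
    capacity = MODEL_LIBRARY[model_to_train]
    larger = {name: c for name, c in MODEL_LIBRARY.items() if c > capacity}
    if not larger:
        raise ValueError("no library model exceeds the given capacity")
    return min(larger, key=larger.get)
```

Under these assumed scores, the recommendation for a RetinaNet-50 model to be trained would be the Faster RCNN-50 model, the next-larger model in the library.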
In the embodiment of the disclosure, the object prediction frame is generated by detecting the target model with higher precision, so that whether the object in the image has a training value or not can be accurately determined. In other words, if the target model with high model capacity cannot generate the object prediction box of an object, the model to be trained with lower model capacity applied to the actual production phase is less likely to recognize the object, for example: the fuzzy object and the negative sample object in the image have lower training value on the model to be trained. Therefore, the training value of each image can be accurately determined by using the target model with higher model capacity, so that the model to be trained based on the image training after the subsequent data cleaning is more consistent with the actual application scene.
And S400, performing data cleaning on the first image according to the object labeling frame and the corresponding object prediction frame corresponding to the first image. For example, the images retained after data cleaning may be determined according to the number or positions of the object labeling boxes and the object prediction boxes. For example, the numbers of object labeling boxes and object prediction boxes can be compared; if the difference between them is larger than a preset difference value, it is determined that a blurred portion or a prediction-error portion appears in the image. Specifically, if the number of object prediction boxes is much smaller than the number of object labeling boxes, it may be determined that blurred or incomplete objects exist in the image, i.e., object prediction boxes cannot be generated for the blurred or incomplete objects. If the number of object prediction boxes is far higher than the number of object labeling boxes, it can be determined that the object prediction boxes generated by the electronic device are over-detected, that is, multiple object prediction boxes were generated for one object. Neither case is favorable for the accuracy and generalization of the model, so both types of images can be screened out. For another example, the offsets of the object labeling box and the object prediction box can be compared; if the difference between the two is greater than a preset offset, a labeling-area error or an electronic-device prediction error has occurred in the image, that is, the offset between the object labeling box and the object prediction box is too large. Details of the above screening are described later.
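The count-comparison heuristic above can be sketched as follows. The function name and the preset difference value of 3 are illustrative assumptions; the disclosure leaves the threshold to the developer.

```python
def clean_by_box_counts(n_labeled: int, n_predicted: int,
                        preset_difference: int = 3) -> bool:
    """Keep the image only when labeled and predicted box counts roughly agree;
    a large gap in either direction signals blurred/incomplete objects (too few
    predictions) or over-detection (too many predictions).
    The default of 3 is an assumed value for illustration."""
    return abs(n_labeled - n_predicted) <= preset_difference
```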
Referring to fig. 6, fig. 6 is a flowchart illustrating a data cleansing method for an image according to an embodiment of the present disclosure, and in one possible implementation, step S400 may include:
step S410, determining a confidence of each object prediction box in the first image. For example, the confidence level is used to indicate the accuracy of the object prediction frame considered by the machine learning model, and may be represented as a confidence score corresponding to the determination result when the machine learning model outputs the determination result in the related art, which is not described herein again in this embodiment of the disclosure.
And step S420, screening out the first prediction frame with the confidence level higher than the preset confidence level in the object prediction frame. The confidence level is preset, and a developer can set the confidence level according to actual requirements, and the embodiment of the disclosure is not limited herein. The embodiment of the disclosure allows the first prediction frame to be used instead of the object prediction frame in step S400, so as to increase the representativeness of each first prediction frame, thereby improving the screening effect, and being beneficial to improving the training quality of the machine learning model based on the image training after data cleaning.
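Step S420 reduces to a simple threshold filter. The box representation (a dict with a `score` key) and the default threshold of 0.5 are assumptions for illustration; the disclosure only requires a preset confidence set by the developer.

```python
def first_prediction_boxes(predictions, preset_confidence: float = 0.5):
    """Step S420: keep only those object prediction boxes whose confidence
    score exceeds the preset confidence, yielding the first prediction boxes.
    The dict layout and 0.5 default are assumed for illustration."""
    return [p for p in predictions if p["score"] > preset_confidence]
```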
And step S430, performing data cleaning on the first image according to the first prediction frames and the object labeling frames. Illustratively, the first image may be screened according to the number, position, etc. of the first prediction frames and the object labeling frames. In one example, this step may include: for any first image, determining, according to the positions of the first prediction frames and the object labeling frames, a second prediction frame among the first prediction frames that matches an object labeling frame. A matched pair of object labeling frame and second prediction frame indicates that the two frames label the same object in the first image. Illustratively, the second prediction frame may be determined as follows: according to the positions of the first prediction frames and the object labeling frame, determine the third prediction frame, i.e., the first prediction frame having the highest intersection-over-union ratio with the object labeling frame. The Intersection over Union (IoU) ratio indicates the degree of coincidence between a first prediction frame and the object labeling frame; in other words, a ratio can be calculated from the number of pixels in the intersection of the first prediction frame and the object labeling frame and the number of pixels in their union, this ratio is taken as the intersection-over-union ratio, and the first prediction frame with the highest such ratio is determined as the third prediction frame. If the intersection-over-union ratio of the third prediction frame and the object labeling frame is determined to be larger than a preset intersection-over-union ratio, the third prediction frame is taken as the second prediction frame.
The preset intersection-over-union ratio may be set by a developer according to actual conditions, and the embodiments of the disclosure are not limited here. By setting the preset intersection-over-union ratio, the matching precision between the second prediction frame and the object labeling frame is improved, which can further improve the screening precision of the first image and is beneficial to improving the training quality of a machine learning model trained on the images after data cleaning.
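The IoU computation and the matching of second prediction frames described above can be sketched as follows; boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 default follows the example values given later in the disclosure:

```python
# Hypothetical sketch of the IoU matching that selects a second prediction
# frame for each object labeling frame. Box format and defaults are assumptions.

def iou(box_a, box_b):
    """Intersection over Union: intersection area / union area of two boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_second_frames(first_frames, labeling_frames, preset_iou=0.5):
    """For each labeling frame, the first prediction frame with the highest IoU
    is the third prediction frame; it becomes a second prediction frame only
    when that IoU exceeds the preset intersection-over-union ratio."""
    second_frames = []
    for lf in labeling_frames:
        third = max(first_frames, key=lambda ff: iou(ff, lf), default=None)
        if third is not None and iou(third, lf) > preset_iou:
            second_frames.append((third, lf))
    return second_frames
```

For example, a first prediction frame `(0, 0, 10, 10)` and a labeling frame `(1, 1, 11, 11)` overlap with IoU ≈ 0.68, so the pair would be kept as a match under the 0.5 threshold.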
After the second prediction frames are determined, a first quantity ratio of the second prediction frames to the object labeling frames in any first image is determined. If the first quantity ratio is determined to be larger than a preset first ratio, that first image is retained as an image after data cleaning. If the first quantity ratio is determined to be less than or equal to the preset first ratio, that first image is screened out. Referring to fig. 7, where the object is food, fig. 7 is a reference schematic diagram of a screened-out image provided according to an embodiment of the disclosure. The white boxes in fig. 7 represent the object labeling frames, the black boxes represent the second prediction frames, and the numbers represent the confidence of each second prediction frame matching an object labeling frame. In this image, there are three second prediction frames with confidences of 0.76, 0.99, and 1.00, respectively; that is, the machine learning model recognized three of the seven food objects in the image. However, due to the focusing of the shot, the other four food objects were captured blurrily, and the machine learning model cannot recognize whether they are food, so the number of second prediction frames is smaller than the number of object labeling frames. When the ratio of the two is less than or equal to the preset first ratio, it is determined that too much of the image is blurred, which is unfavorable for the model to extract food features, so the image can be screened out.
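The first-quantity-ratio screen can be sketched as follows; the function name and the 0.5 default (taken from the disclosure's example values) are illustrative assumptions:

```python
# Hypothetical sketch of the first-quantity-ratio screen: a first image is
# retained only when enough of its labeling frames found a matching second
# prediction frame. The 0.5 default follows the disclosure's example values.

def keep_by_first_ratio(num_second_frames, num_labeling_frames,
                        preset_first_ratio=0.5):
    """True  -> the first image is retained after data cleaning;
    False -> screened out (too many blurred/unrecognized objects)."""
    first_ratio = num_second_frames / num_labeling_frames
    return first_ratio > preset_first_ratio

# Fig. 7 scenario: 3 second prediction frames vs. 7 labeling frames.
# 3 / 7 is about 0.43, which is <= 0.5, so the image is screened out.
```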
In a possible implementation manner, if the data cleaning method includes multiple conditions for screening out first images, it may be set that a first image satisfying any one screening-out condition is screened out; of course, first images satisfying any number of screening-out conditions may also be screened out, and the embodiments of the disclosure are not limited here.
In one example, step S430 may further include: for any first image, determining a second quantity ratio of the first prediction frames to the object labeling frames, and screening out that first image when the second quantity ratio is larger than or equal to a preset second ratio.
Taking the object as food as an example, in conjunction with fig. 8, fig. 8 shows a reference schematic diagram of a screened-out image provided according to an embodiment of the disclosure. The white boxes in fig. 8 represent the object labeling frames, and the black boxes represent the first prediction frames. In this image, the first prediction frames generated by the electronic equipment are all object prediction frames with high confidence; however, as can be seen from fig. 8, the multi-drupelet structure of raspberries may cause the electronic equipment to over-detect, i.e., the overlapping part between two raspberries is also recognized as a raspberry object (i.e., the above-mentioned object). As a result, the number of objects actually present in the image is smaller than the number of first prediction frames, which is unfavorable for the model to be trained to extract object features. Therefore, the embodiments of the disclosure set the second quantity ratio to screen out images in which the electronic equipment over-detects objects, so as to improve the training quality of the model to be trained, which is trained on the images after data cleaning.
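The second-quantity-ratio screen for over-detection can be sketched as follows; the default of 3 follows the disclosure's example values, and the function name is an illustrative assumption:

```python
# Hypothetical sketch of the second-quantity-ratio screen, which discards
# images where high-confidence prediction frames greatly outnumber labeling
# frames (over-detection). The default of 3 follows the example values.

def keep_by_second_ratio(num_first_frames, num_labeling_frames,
                         preset_second_ratio=3):
    """False -> screened out: first prediction frames outnumber labeling
    frames by at least the preset second ratio (over-detection)."""
    second_ratio = num_first_frames / num_labeling_frames
    return second_ratio < preset_second_ratio

# Fig. 8 scenario: many raspberry-overlap detections, e.g. 9 first
# prediction frames against 3 labeling frames gives ratio 3 >= 3, screened out.
```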
In a possible implementation, the data cleaning method further includes: determining at least one of the preset maximum value, the preset minimum value, the first quantity ratio, and the second quantity ratio according to the number of object labeling frames corresponding to each image. The embodiments of the disclosure can automatically set the relevant parameters of the data cleaning process according to the number of object labeling frames in the images, so as to prevent too many images from being screened out, thereby reducing excessive data cleaning. For example, the range between the preset maximum value and the preset minimum value can be determined according to the average number of object labeling frames per image, so that the average falls between the preset minimum value and the preset maximum value, which reduces the screening-out rate of images. For another example, if the current average is larger than the previous average in the history record, the previous first quantity ratio may be adjusted to decrease the screening-out rate of images; the specific adjustment range is not limited here.
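One way the automatic parameter setting described above might look is sketched below. The disclosure does not give formulas, so the specific choices here (half/double the mean for the min/max range, a 0.9 relaxation factor) are purely illustrative assumptions:

```python
# Heavily hypothetical sketch of deriving cleaning parameters from the mean
# number of object labeling frames per image. The mean/2, mean*2, and 0.9
# adjustment factor are illustrative assumptions, not from the disclosure.

def auto_parameters(labeling_counts, prev_mean=None, prev_first_ratio=0.5):
    """labeling_counts: number of object labeling frames in each image."""
    mean = sum(labeling_counts) / len(labeling_counts)
    # Choose preset min/max so the mean falls strictly between them,
    # lowering the screening-out rate.
    preset_min = int(mean / 2)
    preset_max = int(mean * 2) + 1
    first_ratio = prev_first_ratio
    if prev_mean is not None and mean > prev_mean:
        first_ratio *= 0.9  # relax the first quantity ratio
    return preset_min, preset_max, first_ratio
```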
In a possible implementation manner, the preset maximum value, the preset minimum value, the preset confidence, the first quantity ratio, the second quantity ratio, and the preset intersection-over-union ratio may also be flexibly set by a developer according to the actual labeling conditions of the images, and the embodiments of the disclosure are not limited here. For example, the preset maximum value may be set to 10, the preset minimum value to 0, the preset confidence to 0.5, the first quantity ratio to 0.5, the second quantity ratio to 3, and the preset intersection-over-union ratio to 0.5.
In an actual application scenario, before training the model to be trained, the data to be trained (namely, the above images) can be input into the electronic equipment; the electronic equipment then automatically performs data cleaning to screen out images that are hard to learn, blurred, or mislabeled; the cleaned data to be trained is then input into the machine learning model for training until training is completed. In this way, an object detection model with higher robustness and accuracy can be obtained. For example, a food detection model can be trained on food images after data cleaning, a pedestrian detection model can be trained on pedestrian images after data cleaning, and so on.
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from principle and logic; due to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an image data cleaning apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any image data cleaning method provided by the present disclosure; the corresponding technical solutions and descriptions can refer to the corresponding descriptions in the method section and are not repeated here.
Fig. 9 shows a block diagram of a data cleaning apparatus for an image provided according to an embodiment of the present disclosure. As shown in fig. 9, the data cleaning apparatus 100 includes: an image obtaining module 110 configured to obtain at least one image; an image screening module 120 configured to screen out a first image of the at least one image according to the object labeling frame corresponding to each image; an object prediction frame generation module 130 configured to generate an object prediction frame corresponding to the first image according to the first image; and a first image screening module 140 configured to perform data cleaning on the first image according to the object labeling frame corresponding to the first image and the corresponding object prediction frame.
In a possible implementation manner, the screening out a first image of the at least one image according to the object labeling frame corresponding to each image includes: screening out, from the at least one image, first images that satisfy at least one of the following conditions: the number of object labeling frames corresponding to the first image is smaller than a preset maximum value; the number of object labeling frames corresponding to the first image is larger than a preset minimum value; and no two object labeling frames corresponding to the first image are in an inclusion relationship.
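The pre-screening conditions above can be sketched as follows. Note the disclosure permits screening on "at least one of" the conditions; this sketch applies all three jointly, and the example defaults of 0 and 10 are taken from the disclosure's example values:

```python
# Hypothetical sketch of pre-screening first images: labeling-frame count
# strictly between preset_min and preset_max, and no two labeling frames in
# an inclusion (containment) relationship. Applying all conditions jointly
# is one choice; the disclosure also allows screening on any single one.

def contains(outer, inner):
    """True if box `outer` fully contains box `inner` (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def is_first_image(labeling_frames, preset_min=0, preset_max=10):
    n = len(labeling_frames)
    if not (preset_min < n < preset_max):
        return False
    for i, a in enumerate(labeling_frames):
        for j, b in enumerate(labeling_frames):
            if i != j and contains(a, b):
                return False  # nested labeling frames: likely labeling error
    return True
```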
In a possible implementation manner, the performing data cleansing on the first image according to the object labeling box and the corresponding object prediction box corresponding to the first image includes: determining a confidence level of each object prediction box in the first image; screening out a first prediction frame with the confidence level higher than a preset confidence level in the object prediction frame; and performing data cleaning on the first image according to the first prediction frame and the object labeling frame.
In a possible implementation, the performing data cleaning on the first image according to the first prediction frame and the object labeling frame includes: for any first image, determining, according to the positions of the first prediction frame and the object labeling frame, a second prediction frame among the first prediction frames that matches the object labeling frame; determining a first quantity ratio of the second prediction frame to the object labeling frame in any first image; and screening out any first image under the condition that the first quantity ratio is determined to be smaller than or equal to a preset first ratio.
In a possible implementation manner, the determining, according to the positions of the first prediction frame and the object labeling frame, a second prediction frame in the first prediction frames that matches the object labeling frame includes: determining, according to the positions of the first prediction frame and the object labeling frame, a third prediction frame having the highest intersection-over-union ratio with the object labeling frame among the first prediction frames; and under the condition that the intersection-over-union ratio of the third prediction frame and the object labeling frame is determined to be larger than a preset intersection-over-union ratio, taking the third prediction frame as the second prediction frame.
In a possible implementation, the performing data cleaning on the first image according to the first prediction frame and the object labeling frame includes: determining a second quantity ratio of the first prediction frame to the object labeling frame for any first image; and screening out any first image when the second quantity ratio is larger than or equal to a preset second ratio.
In one possible embodiment, the acquiring at least one image includes: acquiring an original image through a web crawler, and executing at least one of the following: screening out an image to be labeled in the original image under the condition that the original image does not comprise an object labeling frame; in response to the generation processing of the object labeling frame aiming at the image to be labeled, taking the image to be labeled after the object labeling frame is generated as the image; and in the case that the original image is determined to comprise the object labeling frame, taking the original image as the image.
In one possible embodiment, the data cleansing apparatus further includes: the target model calling module is used for executing any one of the following steps: determining the model capacity of a model to be trained corresponding to the image; the image after data cleaning is used for training the model to be trained; screening out a target model in a model library according to the model capacity; wherein the model capacity of the target model is larger than that of the model to be trained; the generating an object prediction frame corresponding to the first image according to the first image includes: and inputting the first image into the target model to obtain an object prediction frame corresponding to the first image.
In one possible embodiment, the data cleaning method further includes: determining at least one of the preset maximum value, the preset minimum value, the first quantity ratio, and the second quantity ratio according to the number of object labeling frames corresponding to the image.
The method has a specific technical association with the internal structure of a computer system and can solve the technical problem of how to improve hardware operation efficiency or execution effect (including reducing data storage, reducing data transmission, increasing hardware processing speed, and the like), thereby obtaining a technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product, including computer readable code or a non-transitory computer readable storage medium carrying computer readable code; when the computer readable code runs in a processor of an electronic device, the processor of the electronic device executes the above method.
The electronic device may be provided as a server or other modality of device.
Fig. 10 illustrates a block diagram of an electronic device 1900 provided in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server or terminal device. Referring to fig. 10, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the methods described above.
Electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the Apple graphical-user-interface-based operating system (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA) can be personalized by utilizing the state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs users of the personal information processing rules and obtains their separate consent before processing the personal information. If the technical solution involves sensitive personal information, a product applying the technical solution obtains separate consent before processing the sensitive personal information and additionally satisfies the requirement of "express consent". For example, at a personal information collection device such as a camera, a clear and prominent sign is set to inform users that they are entering a personal information collection range and that personal information will be collected; if a person voluntarily enters the collection range, this is regarded as consent to the collection of their personal information. Alternatively, on a device that processes personal information, under the condition that the personal information processing rules are communicated by prominent signs or notices, personal authorization is obtained by means such as pop-up messages or asking the person to upload their personal information themselves. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A data cleaning method for an image, the data cleaning method comprising:
acquiring at least one image;
screening out a first image in the at least one image according to the object labeling box corresponding to each image;
generating an object prediction frame corresponding to the first image according to the first image;
and performing data cleaning on the first image according to the object marking frame corresponding to the first image and the corresponding object prediction frame.
2. The data cleaning method of claim 1, wherein the screening out a first image of the at least one image according to the object labeling box corresponding to each image comprises:
screening out first images which meet at least one of the following conditions from the at least one image: the number of the object labeling frames corresponding to the first image is smaller than a preset maximum value, the number of the object labeling frames corresponding to the first image is larger than a preset minimum value, and any two object labeling frames corresponding to the first image do not belong to an inclusion relationship.
3. The data cleansing method according to claim 1 or 2, wherein the data cleansing of the first image according to the object labeling box and the object prediction box corresponding to the first image comprises:
determining a confidence level of each object prediction box in the first image;
screening out a first prediction frame with the confidence level higher than a preset confidence level in the object prediction frame;
and performing data cleaning on the first image according to the first prediction frame and the object labeling frame.
4. The data cleansing method of claim 3, wherein the data cleansing of the first image according to the first prediction box and the object labeling box comprises:
for any first image, determining a second prediction frame matched with the object labeling frame in the first prediction frame according to the positions of the first prediction frame and the object labeling frame;
determining a first quantity ratio of the second prediction frame to the object labeling frame in any first image;
and screening out any first image under the condition that the first quantity ratio is determined to be smaller than or equal to a preset first ratio.
5. The data cleansing method of claim 4, wherein the determining a second prediction box of the first prediction box that matches the object labeling box according to the positions of the first prediction box and the object labeling box comprises:
determining, according to the positions of the first prediction frame and the object labeling frame, a third prediction frame having the highest intersection-over-union ratio with the object labeling frame among the first prediction frames;
and under the condition that the intersection-over-union ratio of the third prediction frame and the object labeling frame is determined to be larger than a preset intersection-over-union ratio, taking the third prediction frame as the second prediction frame.
6. The data cleansing method of claim 3, wherein the data cleansing of the first image according to the first prediction box and the object labeling box comprises:
determining a second quantity ratio of the first prediction frame to the object labeling frame for any first image;
and screening out any first image when the second quantity ratio is larger than or equal to a preset second ratio.
7. The data cleaning method of any one of claims 1 to 6, wherein the acquiring at least one image comprises:
acquiring an original image through a web crawler, and performing at least one of the following:
when the original image does not include an object labeling box, taking the original image as an image to be labeled; in response to generating an object labeling box for the image to be labeled, taking the image to be labeled with the generated object labeling box as the image;
and when the original image is determined to include an object labeling box, taking the original image as the image.
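The routing in claim 7 splits crawled images by whether labeling boxes already exist. A minimal sketch, assuming each crawled image is a `dict` with an optional `label_boxes` field (a representation chosen for illustration):

```python
def route_crawled_images(images):
    """Split crawled images into ones already labeled (usable directly)
    and ones still needing labeling-box generation."""
    ready, to_label = [], []
    for img in images:
        (ready if img.get("label_boxes") else to_label).append(img)
    return ready, to_label

crawled = [{"id": 1, "label_boxes": [(0, 0, 5, 5)]}, {"id": 2}]
ready, to_label = route_crawled_images(crawled)
```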
8. The data cleaning method of any one of claims 1 to 7, further comprising:
determining the model capacity of a model to be trained corresponding to the image; wherein the image after data cleaning is used for training the model to be trained;
and selecting a target model from a model library according to the model capacity; wherein the model capacity of the target model is greater than that of the model to be trained;
wherein the generating an object prediction box corresponding to the first image according to the first image comprises:
inputting the first image into the target model to obtain the object prediction box corresponding to the first image.
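Claim 8 selects a larger-capacity model from a library to generate the prediction boxes. A sketch under the assumption that "capacity" is measured as parameter count and that the smallest model exceeding the target capacity is preferred (the tie-breaking rule and the example sizes are illustrative assumptions):

```python
def pick_target_model(model_library, capacity_to_train):
    """From a model library (name -> parameter count), pick the smallest
    model whose capacity still exceeds that of the model to be trained."""
    candidates = {n: c for n, c in model_library.items() if c > capacity_to_train}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

library = {"small": 5e6, "medium": 25e6, "large": 100e6}
target = pick_target_model(library, 20e6)
```

A larger model tends to localize objects more reliably, so its predictions are a stronger reference for judging the quality of the existing labels.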
9. The data cleaning method of any one of claims 1 to 8, further comprising:
determining at least one of the preset maximum value, the preset minimum value, the first quantity ratio, and the second quantity ratio according to the number of object labeling boxes corresponding to the image.
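Claim 9 makes the thresholds depend on label density. A hypothetical rule is sketched below; the specific cut-off (5 boxes) and threshold values are invented for illustration and are not from the patent:

```python
def derive_thresholds(num_label_boxes):
    """Hypothetical rule: densely labeled images can afford stricter
    match requirements, sparsely labeled ones looser ones."""
    first_ratio = 0.5 if num_label_boxes <= 5 else 0.7
    second_ratio = 2.0 if num_label_boxes <= 5 else 1.5
    return first_ratio, second_ratio
```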
10. A data cleaning apparatus for images, the apparatus comprising:
an image acquisition module configured to acquire at least one image;
an image screening module configured to screen out a first image from the at least one image according to the object labeling box corresponding to each image;
an object prediction box generation module configured to generate an object prediction box corresponding to the first image according to the first image;
and a first image screening module configured to perform data cleaning on the first image according to the object labeling box corresponding to the first image and the corresponding object prediction box.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the data cleaning method of any one of claims 1 to 10.
12. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the data cleaning method of any one of claims 1 to 10.
CN202210829416.9A 2022-06-22 2022-06-22 Image data cleaning method and device, electronic equipment and storage medium Pending CN115205553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210829416.9A CN115205553A (en) 2022-06-22 2022-06-22 Image data cleaning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210829416.9A CN115205553A (en) 2022-06-22 2022-06-22 Image data cleaning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115205553A true CN115205553A (en) 2022-10-18

Family

ID=83582592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210829416.9A Pending CN115205553A (en) 2022-06-22 2022-06-22 Image data cleaning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115205553A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912603A (en) * 2023-09-12 2023-10-20 浙江大华技术股份有限公司 Pre-labeling screening method, related device, equipment and medium
CN116912603B (en) * 2023-09-12 2023-12-15 浙江大华技术股份有限公司 Pre-labeling screening method, related device, equipment and medium

Similar Documents

Publication Publication Date Title
CN110135411B (en) Business card recognition method and device
CN109816039B (en) Cross-modal information retrieval method and device and storage medium
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN109886326B (en) Cross-modal information retrieval method and device and storage medium
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
JP2021103555A (en) Image detection method, device, electronic apparatus, storage medium, and program
CN111369545A (en) Edge defect detection method, device, model, equipment and readable storage medium
CN111144215B (en) Image processing method, device, electronic equipment and storage medium
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN110349138B (en) Target object detection method and device based on example segmentation framework
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
CN111881764A (en) Target detection method and device, electronic equipment and storage medium
CN112527676A (en) Model automation test method, device and storage medium
CN115205553A (en) Image data cleaning method and device, electronic equipment and storage medium
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
Sun et al. Deep blur detection network with boundary-aware multi-scale features
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN111666884B (en) Living body detection method, living body detection device, computer readable medium and electronic equipment
CN111966600B (en) Webpage testing method, webpage testing device, computer equipment and computer readable storage medium
CN112365513A (en) Model training method and device
CN116993978A (en) Small sample segmentation method, system, readable storage medium and computer device
CN112598616B (en) Method for determining exposure parameters of electronic equipment and imaging method
CN114612889A (en) Instrument information acquisition method and system, electronic equipment and storage medium
CN110705633B (en) Target object detection method and device and target object detection model establishing method and device
CN115730208A (en) Training method, training device, training apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination