CN115131597A - Data processing method and device, electronic equipment and storage medium

Publication number: CN115131597A
Application number: CN202210432805.8A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李汉俊, 潘兴甲, 鄢科
Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Classification: Image Analysis
Abstract

The embodiment of the invention discloses a data processing method, a data processing device, an electronic device, and a storage medium, applicable to various scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving. The method comprises the following steps: acquiring an image, wherein the image comprises a plurality of image contents and each image content belongs to a content category, each content category being labeled through a corresponding labeling pixel point; acquiring a first feature value set corresponding to the labeling pixel points; determining unmarked pixel points in the image and acquiring a second feature value set corresponding to the unmarked pixel points; predicting the prediction category of the unmarked pixel points according to the first feature value set and the second feature value set; and taking a prediction category meeting a preset requirement as a pseudo label of the unmarked pixel point and labeling the unmarked pixel point with the pseudo label. In this way, the annotation information can be completed by the algorithm even when little manual annotation is available, thereby balancing cost and detection effect.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In the prior art, when detecting a specific type of target in a picture, one of Fully-Supervised Object Detection (FSOD), Weakly-Supervised Object Detection (WSOD), or Semi-Supervised Object Detection (SSOD) is often used.
However, although fully supervised target detection achieves a good detection effect, it requires a large amount of manpower and material resources to obtain a large amount of accurate annotation information. Weakly supervised or semi-supervised target detection requires less annotation information, but its detection effect is generally poor. Therefore, the prior art generally cannot balance cost and detection effect.
Disclosure of Invention
The embodiment of the application provides a data processing method and device, an electronic device, and a storage medium, which can solve the problem that the prior art cannot balance cost and detection effect.
An embodiment of the present application provides a data processing method, including:
acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category;
acquiring a first feature value set corresponding to the labeling pixel point;
determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points;
predicting the prediction category of the unmarked pixel points according to the first characteristic value set and the second characteristic value set;
and taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
An embodiment of the present application further provides a data processing apparatus, including:
the image acquisition unit is used for acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category;
a first set obtaining unit, configured to obtain a first feature value set corresponding to the labeled pixel point;
the second set acquisition unit is used for determining unmarked pixel points in the image and acquiring a second characteristic value set corresponding to the unmarked pixel points;
the prediction type obtaining unit is used for predicting the prediction type of the unmarked pixel points according to the first characteristic value set and the second characteristic value set;
and the pixel point labeling unit is used for taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point and labeling the unmarked pixel point by using the pseudo label.
In some embodiments, the image content where the annotation pixel is located is marked as annotation image content; a first set acquisition unit comprising:
the characteristic value acquisition subunit is used for acquiring a characteristic value corresponding to the marked image content for the marked image content corresponding to each marked pixel point;
and the first set subunit is configured to aggregate the feature values of the plurality of labeled image contents to obtain the first feature value set.
In some embodiments, the feature value obtaining subunit includes:
the pixel point secondary subunit is used for acquiring a pixel point set corresponding to the content of the marked image, wherein the pixel point set comprises a central pixel point and a non-central pixel point;
the first operation secondary subunit is configured to perform first operation processing on the center pixel and the non-center pixel, so as to obtain content parameter values corresponding to each center pixel and each non-center pixel in the pixel set;
and the second operation subunit is configured to perform second operation on the content parameter value corresponding to each pixel in the pixel set and the characteristic parameter value of the corresponding position of the image, so as to obtain a characteristic value, where the characteristic value is a characteristic value corresponding to the content of the labeled image.
In some embodiments, the second operation subunit is specifically configured to multiply, bit by bit, a content parameter value corresponding to each pixel having a content parameter value in the pixel set and a characteristic parameter value of a corresponding position of the image, to obtain a plurality of product values; adding the multiple product values to obtain a sum result; and carrying out normalization processing on the addition result to obtain the characteristic value.
In some embodiments, the image includes a first number of content categories; a prediction category acquisition unit comprising:
a third operation subunit, configured to perform third operation on the first feature value set and the second feature value set to obtain a first number of similarity values corresponding to each unlabeled pixel, where the first number of similarity values correspond to a first number of content categories one to one;
a target category subunit, configured to, for each unmarked pixel point, obtain a target similarity value with a largest similarity value among the first number of similarity values corresponding to the unmarked pixel point, and a target content category corresponding to the target similarity value;
and the prediction category subunit is used for taking the corresponding target content category as the prediction category of the unmarked pixel point when the target similarity value exceeds a similarity threshold value.
In some embodiments, the apparatus further comprises:
and the background type determining unit is used for determining the prediction type of the unmarked pixel points as the background type when the target similarity value does not exceed the similarity threshold, wherein the unmarked pixel points of the background type are not marked with pseudo labels.
In some embodiments, the image content where the annotation pixel is located is marked as annotation image content; the device further comprises:
the target loss construction unit is used for constructing the operation of a target loss value according to the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of the current marked image content, the incidence relation between the central pixel point of the current marked image content and the predicted values of the second number of pixel points and the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of each marked image content;
and the incidence relation adjusting unit is used for minimizing the target loss value so as to adjust the incidence relation between the central pixel point of the current annotated image content and the non-central pixel point of the current annotated image content, the incidence relation between the central pixel point of the current annotated image content and the predicted values of the second number of pixel points, and the incidence relation between the central pixel point of the current annotated image content and the non-central pixel point of each annotated image content.
The embodiment of the application also provides an electronic device, which comprises a processor and a memory, wherein the memory stores a plurality of instructions; the processor loads the instructions from the memory to execute the steps of any one of the data processing methods provided by the embodiments of the present application.
Embodiments of the present application further provide a computer-readable storage medium, where multiple instructions are stored, and the instructions are suitable for being loaded by a processor to perform steps in any one of the data processing methods provided in the embodiments of the present application.
In the present application, an image can be acquired; the image comprises a plurality of image contents, and each image content has a content category to which it belongs. Each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in an image content included in the corresponding content category, which belongs to the plurality of image contents. The image consists of labeled pixel points and unmarked pixel points. For the labeled pixel points, a first feature value set corresponding to them can be obtained; for the unmarked pixel points, a second feature value set corresponding to them can be obtained, and the prediction categories of the unmarked pixel points can then be predicted according to the first feature value set and the second feature value set. The prediction category meeting the preset requirement is taken as the pseudo label of the unmarked pixel point, and the unmarked pixel point is labeled with the pseudo label.
In this method, in the case that each of the multiple content categories in the image has only one labeling pixel point annotated with that content category, the categories of the unmarked pixel points are predicted, and the prediction categories meeting the preset requirement serve as pseudo labels of the unmarked pixel points, so that the unmarked pixel points are labeled algorithmically. The annotation information can thus be completed by the algorithm even when little manual annotation is available, thereby balancing cost and detection effect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1a is a schematic view of a scene of a data processing method provided in an embodiment of the present application;
FIG. 1b is a schematic flow chart of a data processing method provided in an embodiment of the present application;
FIG. 1c is a schematic diagram showing detection results corresponding to a plurality of target detection modes, respectively;
fig. 2 is a schematic flowchart of a specific implementation of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a data processing method and device, electronic equipment and a storage medium.
The data processing apparatus may be specifically integrated in an electronic device, and the electronic device may be a terminal, a server, or the like. The terminal comprises but is not limited to a mobile phone, a computer, intelligent voice interaction equipment, intelligent household appliances, a vehicle-mounted terminal, an aircraft and the like; the server may be a single server or a server cluster composed of a plurality of servers.
In some embodiments, the data processing apparatus may also be integrated in a plurality of electronic devices, for example, the data processing apparatus may be integrated in a plurality of servers, and the data processing method of the present application is implemented by the plurality of servers.
In some embodiments, the server may also be implemented in the form of a terminal.
The embodiment of the invention can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, driving assistance and the like.
For example, referring to fig. 1a, the electronic device may acquire an image, the image including a plurality of image contents, each of the image contents belonging to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category; acquiring a first characteristic value set corresponding to the labeling pixel point; determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points; predicting the prediction category of the unmarked pixel points according to the first characteristic value set and the second characteristic value set; and taking the prediction type meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
The data processing method provided by the embodiment of the application can be applied to the field of target detection, such as pedestrian detection, shelf goods detection, automatic driving, and the like. In the application scenarios of pedestrian detection and shelf goods detection, pedestrians and shelf goods are the targets to be detected; in the application scenario of automatic driving, objects appearing in front of and around the vehicle are the targets to be detected.
The following are detailed descriptions. The numbers in the following examples are not intended to limit the order of preference of the examples.
In this embodiment, a data processing method is provided, as shown in fig. 1b, the specific flow of the data processing method may be as follows, step 110 to step 150:
110. acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category.
The image is any one of a plurality of images. Alternatively, the image may be a feature map obtained by downsampling an original image through a convolutional neural network. For example, suppose the original image is 100 × 100 × 3; the original image is downsampled to obtain a processed feature map, which is the image described above. If the downsampling is four-times downsampling, the resulting feature map is a 25 × 25 × D image.
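For illustration, a minimal sketch of such downsampling follows, assuming a toy two-layer convolutional backbone rather than any specific network named in this application; two stride-2 convolutions give four-times downsampling, so a 100 × 100 × 3 original image becomes a 25 × 25 × D feature map.

```python
import torch
import torch.nn as nn

D = 64  # feature dimension D, an illustrative choice

# Toy backbone (an assumption, not the application's actual network):
# two stride-2 convolutions downsample the input by a factor of four.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # 100 -> 50
    nn.ReLU(),
    nn.Conv2d(32, D, kernel_size=3, stride=2, padding=1),  # 50 -> 25
)

original = torch.randn(1, 3, 100, 100)  # N x C x H x W original image
feature_map = backbone(original)
print(feature_map.shape)                # torch.Size([1, 64, 25, 25])
```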
The image content is used to reflect the objects displayed by the image, including living objects such as animals and plants, and non-living objects such as buildings and natural landscapes. The content category is a category reflecting common features of the objects, for example: a plurality of people with different body types and appearances in the image all belong to the 'people' category, and a plurality of cats in the image all belong to the 'cat' category.
Each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in the image content included in the corresponding content category. That is, if there are a plurality of image contents belonging to the same content category in the image, only one of those image contents is manually labeled with a label reflecting the content category; the label is specifically located at a unique pixel point in the manually labeled image content, and this pixel point can be recorded as the labeling pixel point, while the image content where the labeling pixel point is located can be recorded as the annotated image content. For example, suppose the image includes 3 dogs, 4 sheep, 2 people, and 3 trees. All 3 dogs belong to the 'dog' category, but only one of the 3 dogs is annotated image content, in which a labeling pixel point exists bearing a label representing the 'dog' content category. All 4 sheep belong to the 'sheep' category, but only one of the 4 sheep is annotated image content, in which a labeling pixel point exists bearing a label representing the 'sheep' content category. Both people belong to the 'person' category, but only one of the 2 people is annotated image content, in which a labeling pixel point exists bearing a label representing the 'person' content category. The 3 trees belong to the 'tree' category, and only one of the 3 trees is annotated image content, in which a labeling pixel point exists bearing a label representing the 'tree' content category.
The images may be obtained by screening the training set of a fully supervised target detection data set. For example, for the training set of the fully supervised target detection data set COCO2017, the images may be obtained by retaining, for each category of each picture in the annotation data, only one annotated image content. Through this processing, only 40% of the annotation data in the training set of COCO2017 is retained, which further illustrates that the annotation method provided by the embodiment of the application can greatly reduce the cost of manual annotation. For the validation set of COCO2017, the annotation data can be kept unchanged, that is, the original annotation data is retained for each of the plurality of image contents included in the image.
The description is continued by taking the category corresponding to 3 dogs as an example: only one of the 3 dogs was manually labeled with a label representing the "dog" category, which was randomly selected by the staff from the 3 dogs. After the staff marks the label, the selected dog marked with the label in the image is the marked image content. The electronic equipment can calculate an annotation pixel point according to a plurality of pixel points occupied by the content of the annotation image in the image, and the annotation pixel point is used as the pixel point marked with the category of 'dog'. The above process can be performed for 4 sheep, 2 people and 3 trees, which are not described herein.
The electronic device may calculate the annotation pixel, for example, one pixel may be randomly selected from the plurality of pixels occupied by the annotation image content to serve as the annotation pixel, or a pixel in the center of the plurality of pixels occupied by the annotation image content may be determined to serve as the annotation pixel. It should be understood that the specific process of calculating the annotation pixel should not be construed as limiting the application.
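As a sketch of the two selection strategies just described, and assuming the annotated image content is available as a boolean occupancy mask over the image (an assumption about the input format), the labeling pixel point could be computed as follows.

```python
import numpy as np

def pick_annotation_pixel(mask: np.ndarray, strategy: str = "center", seed: int = 0):
    """Pick a labeling pixel from the pixels occupied by the annotated content."""
    ys, xs = np.nonzero(mask)  # coordinates of occupied pixels
    if strategy == "random":
        rng = np.random.default_rng(seed)
        i = rng.integers(len(ys))  # randomly select one occupied pixel
    else:
        # "center": the occupied pixel nearest to the centroid of the content
        cy, cx = ys.mean(), xs.mean()
        i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return int(ys[i]), int(xs[i])
```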
120. And acquiring a first feature value set corresponding to the labeling pixel point.
The first set of eigenvalues is a set of eigenvalues related to the annotation pixels. Optionally, in a specific embodiment, the step 120 may specifically include the following steps 121 to 122:
121. and acquiring the characteristic value corresponding to the marked image content for the marked image content corresponding to each marked pixel point.
Optionally, in a specific embodiment, the step 121 may specifically include the following steps 1211 to 1213:
1211. and acquiring a pixel point set corresponding to the content of the marked image, wherein the pixel point set comprises a central pixel point and a non-central pixel point.
The pixel point set corresponding to the content of the annotated image is a set formed by a plurality of pixel points occupied by the content of the annotated image in the image. The central pixel point is a pixel point corresponding to the central position of the pixel point set, and under a normal condition, the central pixel point can also be a marking pixel point; the non-center pixel point is a pixel point surrounding the center pixel point in the pixel point set and located near the center pixel point, for example, the non-center pixel point may be a pixel point separated from the center pixel point by a plurality of rows of pixel points, or may be a pixel point separated from the center pixel point by a plurality of columns of pixel points. The number of rows and the number of columns can be between 2 and 10.
1212. And performing first operation processing on the central pixel points and the non-central pixel points to obtain content parameter values respectively corresponding to each central pixel point and each non-central pixel point in the pixel point set.
The first arithmetic processing may specifically include the steps of:
For the center pixel point, its content parameter value may be determined as a preset value; for example, the content parameter value of the center pixel point may be set to 1.
For non-central pixel points, the content parameter value $Y_{yxc}$ of each non-central pixel point can be calculated using the formula

$$Y_{yxc} = \exp\!\left(-\frac{\left(x - p'_x\right)^2 + \left(y - p'_y\right)^2}{2\sigma_{wh}^{2}}\right)$$

wherein the subscript $yx$ of $Y_{yxc}$ is the coordinate position of the non-central pixel point, $c$ is the content category, $(p'_x, p'_y)$ is the coordinate position of the central pixel point corresponding to the current content category, and $\sigma_{wh}$ is the variance value, whose specific numerical value may be set by a developer according to development experience. The content parameter values of the non-central pixel points calculated by this formula decay in a Gaussian distribution centered on the central pixel point.
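A minimal sketch of this first operation processing, assuming the Gaussian formula reconstructed above, follows; sigma_wh is the developer-chosen variance value.

```python
import numpy as np

def content_parameter_values(h, w, center_y, center_x, sigma_wh=2.0):
    """Y_{yxc} for one content category: 1 at the center pixel, Gaussian decay elsewhere."""
    ys, xs = np.mgrid[0:h, 0:w]
    Y = np.exp(-((xs - center_x) ** 2 + (ys - center_y) ** 2)
               / (2.0 * sigma_wh ** 2))
    Y[center_y, center_x] = 1.0  # preset value for the center pixel point
    return Y                     # h x w map of content parameter values
```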
1213. And performing second operation on the content parameter value corresponding to each pixel point in the pixel point set and the characteristic parameter value of the corresponding position of the image to obtain a characteristic value, wherein the characteristic value is the characteristic value corresponding to the content of the marked image.
In the above embodiment, when the feature value corresponding to the content of the annotated image is calculated, not only the central pixel point of the content of the annotated image is made to participate in the calculation, but also the non-central pixel points located around the central pixel point are made to participate in the calculation, so that the calculation of the feature value can carry more details related to the content of the annotated image, and the association degree between the feature value and the content of the annotated image is improved.
Optionally, in a specific embodiment, the second operation processing may specifically include the following steps a1 to A3:
and A1, multiplying the content parameter value corresponding to each pixel point with the content parameter value in the pixel point set and the characteristic parameter value of the corresponding position of the image bit by bit to obtain a plurality of product values.
The pixel points with content parameter values are the central pixel point and the non-central pixel points. Each pixel point with a content parameter value has its own position in the image; for example, the pixel point located at row a, column b of the image also has a characteristic parameter value at that same position, so the content parameter value and the characteristic parameter value at each corresponding position can be multiplied to obtain a plurality of product values.
And A2, adding the multiple product values to obtain a sum result.
After obtaining the plurality of product values, the plurality of product values may be subjected to addition processing, so that an addition result may be obtained.
And A3, carrying out normalization processing on the addition result to obtain the characteristic value.
The normalization process may be specifically performed by using a norm normalization function of L2, or may be performed by using other calculation methods, for example, normalization process by using an arctangent function, and the specific processing method of the normalization process should not be construed as limiting the present application.
Alternatively, the feature value $g_{c_i}$ of content category $c_i$ may specifically be calculated by the formula

$$g_{c_i} = \mathrm{Norm}\!\left(\sum_{y,x} Y_{yxc_i}\cdot F_{yx}\right)\in \mathbb{R}^{D}$$

wherein $Y_{yxc_i}$ is the content parameter value corresponding to content category $c_i$; $F_{yx}$ is the characteristic parameter value of the corresponding position; $Y_{yxc_i}\cdot F_{yx}$ is the product of the content parameter value of the pixel point and the characteristic parameter value of the corresponding position; $\sum_{y,x}$ sums the plurality of products; and $\mathrm{Norm}(\cdot)$ normalizes the summed result using the L2 norm normalization function, thereby obtaining the feature value $g_{c_i}$ of content category $c_i$, where $D$ is the feature dimension.
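A minimal sketch of this second operation processing, under the reconstruction above, follows: the bit-by-bit (element-wise) products of content parameter values and characteristic parameter values are summed over positions and L2-normalized to give the D-dimensional feature value.

```python
import numpy as np

def category_feature_value(Y: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Y: h x w content parameter values; F: h x w x D feature map."""
    g = (Y[..., None] * F).sum(axis=(0, 1))  # weighted sum over positions -> (D,)
    return g / (np.linalg.norm(g) + 1e-12)   # L2 norm normalization
```

Stacking one such feature value per content category row by row then yields the first feature value set $G$.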
122. And aggregating the characteristic values of the labeled image contents to obtain the first characteristic value set.
Suppose the image includes a plurality of image contents covering $S$ content categories; the number of annotated image contents is accordingly $S$. Concatenating the feature values $g_{c_1}, g_{c_2}, \dots, g_{c_S}$ yields the first feature value set $G \in \mathbb{R}^{S\times D}$.
130. Determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points.
Unmarked pixel points are all pixel points in the image other than the labeling pixel points. Each unmarked pixel point has its own corresponding feature value, and aggregating these feature values yields the second feature value set $Q \in \mathbb{R}^{K\times D}$, where $K$ is the number of unmarked pixel points.
Suppose $S = 4$ and the image contains $25 \times 25 = 625$ pixel points in total; then $K = 625 - 4 = 621$.
140. And predicting the prediction category of the unmarked pixel points according to the first characteristic value set and the second characteristic value set.
The prediction category is the category to which the image content corresponding to the unmarked pixel points is most probably attributed.
Optionally, the image comprises a first number of content categories, i.e. a first number S; accordingly, in an embodiment, the step 140 may specifically include the following steps 141 to 143:
141. and performing third operation processing on the first characteristic value set and the second characteristic value set to obtain the first number of similarity values corresponding to each unmarked pixel point, wherein the first number of similarity values correspond to the first number of content categories one to one.
The first number is a positive integer and the particular numerical value of the first number should not be construed as limiting the application.
In a specific embodiment, the third operation processing may specifically include the following steps: transposing the first characteristic value set to obtain a transposed set; and multiplying the second feature value set by the transposed set to obtain the first number of similarity values respectively corresponding to each unmarked pixel point.
Specifically, the first number of similarity values corresponding to each unmarked pixel point may be calculated by the formula $S = QG^{T}$, where $S \in \mathbb{R}^{K\times S}$.
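A minimal sketch of this third operation processing: with $G$ holding one feature value per content category and $Q$ one feature value per unmarked pixel point, a single matrix product produces all similarity values.

```python
import numpy as np

def similarity_matrix(Q: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Q: K x D unmarked-pixel features; G: S x D category features."""
    return Q @ G.T  # K x S: one similarity value per pixel and category
```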
142. And for each unmarked pixel point, acquiring a target similarity value with the maximum similarity value in the first quantity of similarity values corresponding to the unmarked pixel point and a target content category corresponding to the target similarity value.
Because each unmarked pixel point corresponds to a first number of similarity values, the similarity values corresponding to each unmarked pixel point can be calculated, so that the similarity value with the largest value among the similarity values of the first number is obtained, and the similarity value is marked as the target similarity value.
Optionally, the magnitudes of the first number of similarity values may be compared to obtain a similarity value with a largest magnitude, the similarity value with the largest magnitude is denoted as a target similarity value, and the content category corresponding to the similarity value with the largest magnitude is denoted as a target content category.
For example, suppose the content categories included in the image are four, namely dog, sheep, person, and tree, and suppose that, for some unmarked pixel point of the image, the similarity between the unmarked pixel point and the 'dog' category is 0.6, the similarity with the 'sheep' category is 0.4, the similarity with the 'person' category is 0.9, and the similarity with the 'tree' category is 0.2. The largest of these is 0.9, so the target similarity value of the unmarked pixel point is 0.9, and the target content category of the unmarked pixel point is the 'person' category.
143. And if the target similarity value exceeds a similarity threshold value, taking the target content category as the prediction category of the unmarked pixel point.
The similarity threshold is a preset critical value for determining whether the target content category can be used as a predicted category, and the specific value of the similarity threshold should not be construed as a limitation to the present application.
Continuing with the above example, suppose the similarity threshold is 0.8. The target similarity value 0.9 of the unmarked pixel point exceeds the similarity threshold 0.8, so the target content category, namely the 'person' category, may be used as the prediction category of the unmarked pixel point.
In the foregoing embodiment, the first number of similarity values respectively corresponding to each unmarked pixel point and the first number of content categories may be obtained first, and then the first number of similarity values are compared in size, so as to obtain the similarity value with the largest value corresponding to each unmarked pixel point and the content category corresponding to the similarity value. And then comparing the similarity value with a similarity threshold, and if the similarity value exceeds the similarity threshold, judging that the prediction category of the currently operated unmarked pixel point is the target content category corresponding to the similarity value exceeding the similarity threshold. By the method, the prediction category corresponding to each unmarked pixel point can be obtained, so that the cost of image recognition by manpower can be saved.
Optionally, in a specific implementation manner, after step 142, the method provided in the embodiment of the present application may further include:
and if the target similarity value does not exceed the similarity threshold, determining the prediction type of the unmarked pixel points as the background type, wherein the unmarked pixel points of the background type are not marked with pseudo labels.
The pseudo label is a label generated by the electronic device executing the data processing method provided by the embodiment of the application, and is not a label labeled manually.
In particular, the target similarity value $v$ of any unmarked pixel point $c_n$ can be calculated according to the formula

$$v = \max\left(\mathbf{s}_{c_n}\right)$$

wherein $\mathbf{s}_{c_n}$ is the first number of similarity values corresponding to the unmarked pixel point $c_n$.

The prediction category $\hat{c}_n$ can be computed according to the formula

$$\hat{c}_n = \begin{cases}\arg\max\left(\mathbf{s}_{c_n}\right), & v > \eta\cdot T_{sim}\\0, & \text{otherwise}\end{cases}$$

wherein $\eta$ is a scaling factor that varies with the prediction category $\hat{c}_n$; $T_{sim}$ is the similarity threshold, which can change dynamically with different content categories; and $0$ indicates that the prediction category is the background category.
In the foregoing embodiment, when the target similarity value does not exceed the similarity threshold, the prediction category of the unmarked pixel point may be determined as the background category, and unmarked pixel points of the background category are not labeled with pseudo labels. In general, the unmarked pixel points belonging to the background category are the most numerous, so not labeling pixel points of this category with pseudo labels can improve the readability of the image and avoid adverse effects on the image recognition process.
150. And taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
The preset requirement is set in advance. The prediction category meeting the preset requirement may be a prediction category that is not the background category, that is, unmarked pixel points of the background category are not labeled with pseudo labels. The prediction category meeting the preset requirement may also be a prediction category for which the number of adjacent unmarked pixel points sharing the same prediction category exceeds a preset number, that is, if the number of adjacent unmarked pixel points with the same prediction category exceeds the preset number, the prediction category corresponding to those unmarked pixel points can be judged to meet the preset requirement, as shown in the sketch below. The specific preset requirement should not be construed as limiting the present application.
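A minimal sketch of the second variant of the preset requirement follows, assuming the pseudo labels are laid out as an H × W map in which 0 denotes the background category (an illustrative convention); a prediction category is kept only where a connected region of adjacent pixel points sharing it exceeds a preset number.

```python
import numpy as np
from scipy import ndimage

def filter_pseudo_labels(label_map: np.ndarray, min_count: int = 4) -> np.ndarray:
    out = np.zeros_like(label_map)
    for c in np.unique(label_map):
        if c == 0:                                  # background: never labeled
            continue
        regions, n = ndimage.label(label_map == c)  # connected components
        for r in range(1, n + 1):
            mask = regions == r
            if mask.sum() > min_count:              # enough adjacent pixels
                out[mask] = c
    return out
```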
In the above embodiment, the prediction category corresponding to each unmarked pixel point can be obtained, the prediction category meeting the preset requirement is used as the pseudo label of the unmarked pixel point, and the corresponding unmarked pixel point is marked by using the pseudo label. Because the pseudo label is generated by the electronic equipment executing the data processing method provided by the embodiment of the application, the number of labels marked manually can be reduced, and the cost of marking the labels manually can be saved.
Optionally, marking the image content where the marking pixel point is located as marking image content; in a specific implementation manner, the method provided in the embodiment of the present application may further include:
constructing a target loss value operation according to the incidence relation between the central pixel point of the current labeled image content and the non-central pixel point of the current labeled image content, the incidence relation between the central pixel point of the current labeled image content and the predicted values of the second number of pixel points, and the incidence relation between the central pixel point of the current labeled image content and the non-central pixel point of each labeled image content;
and minimizing the target loss value, so as to adjust the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of the current marked image content, the incidence relation between the central pixel point of the current marked image content and the predicted value of the second number of pixel points, and the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of each marked image content.
In the above embodiment, the target loss value $L_{pgcl}$ may specifically be calculated with the following formula:

$$L_{pgcl} = -\frac{1}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{j=1}^{m} M_{ij}\,\log\frac{\exp\!\left(f_{c_i}\cdot \bar{f}_{c_i}/\tau\right)+\exp\!\left(p_j\,f_{c_i}\cdot f_j/\tau\right)}{\sum_{k=1}^{n}\exp\!\left(f_{c_i}\cdot \bar{f}_{c_k}/\tau\right)+\exp\!\left(p_j\,f_{c_i}\cdot f_j/\tau\right)}$$

wherein $n$ is the number of content categories included in the image; $m$ is the number of pixel points with the largest predicted values selected from the predicted values of the plurality of pixel points included in the image, that is, the predicted value of each of the plurality of pixel points may be obtained and the first $m$ pixel points with the largest predicted values taken; $M_{ij}$ represents whether a pixel point $f_j$ belongs to category $c_i$: if the pixel point $f_j$ belongs to category $c_i$, $M_{ij}$ takes the value 1, and if it does not, $M_{ij}$ takes the value 0; $f_j$ is any one of the first $m$ pixel points with the largest predicted values; $f_{c_i}$ is the central pixel point of the current annotated image content; $p_j$ is the predicted value of the pixel point $f_j$; $\bar{f}_{c_i}$ is the weighted sum of the features of the non-central pixel points of the current annotated image content; $\tau$ is a temperature scaling factor and is a hyper-parameter; and $\bar{f}_{c_k}$ is the weighted sum of the features of the non-central pixel points of any one annotated image content. The term $f_{c_i}\cdot \bar{f}_{c_i}$ expresses the association between the central pixel point of the current annotated image content and the non-central pixel points of the current annotated image content; the term $p_j\,f_{c_i}\cdot f_j$ expresses the association between the central pixel point of the current annotated image content and the predicted values of the second number of pixel points; and the terms $f_{c_i}\cdot \bar{f}_{c_k}$ express the association between the central pixel point of the current annotated image content and the non-central pixel points of each annotated image content.

To minimize the target loss value $L_{pgcl}$, the numerator terms are made to take values as large as possible and the denominator terms for $k \neq i$ are made to take values as small as possible. In this way, the central pixel point of the current annotated image content is drawn close to the pixel points of the same content category among the first $m$ pixel points and pushed away from the pixel points of different content categories among the first $m$ pixel points, thereby adjusting the association between the central pixel point of the current annotated image content and the non-central pixel points of the current annotated image content, the association between the central pixel point of the current annotated image content and the predicted values of the second number of pixel points, and the association between the central pixel point of the current annotated image content and the non-central pixel points of each annotated image content.
In the above embodiment, through the calculation of the target loss value $L_{pgcl}$, the association between the central pixel point of the current annotated image content and the pixel points of the same content category among the first $m$ pixel points can be drawn close, and the association between the central pixel point and the pixel points of different content categories among the first $m$ pixel points can be pushed away, thereby effectively improving the tolerance of the model to erroneous pseudo labels and making the generation and labeling of pseudo labels more accurate.
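A minimal sketch of an InfoNCE-style loss consistent with the reconstruction above follows; this is an assumption-based illustration, not the application's verbatim formula.

```python
import numpy as np

def target_loss(f_ctr, f_top, p, f_bar, M, tau=0.07):
    """f_ctr: n x D center-pixel features (one per category);
    f_top: m x D features of the top-m predicted pixels, with predicted values p (m,);
    f_bar: n x D weighted sums of non-center-pixel features;
    M: n x m 0/1 indicators of whether top pixel j belongs to category i."""
    n, m = M.shape
    loss = 0.0
    for i in range(n):
        neg = np.exp(f_ctr[i] @ f_bar.T / tau).sum()  # terms over all categories
        for j in range(m):
            if M[i, j] == 0:
                continue
            pair = np.exp(p[j] * (f_ctr[i] @ f_top[j]) / tau)
            pos = np.exp(f_ctr[i] @ f_bar[i] / tau) + pair
            loss -= np.log(pos / (neg + pair))
    return loss / (n * m)
```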
Optionally, in a specific implementation manner, the label of the label pixel is a label; the label and the pseudo label are jointly marked as a final label; correspondingly, the method provided by the embodiment of the application can further include:
calculating a second loss value according to the predicted value of the final label and the target value of the final label;
and adjusting the predicted value of the final label based on the second loss value.
In the above embodiment, the second loss value $L_{SPLG}$ may specifically be calculated with the following formula:

$$L_{SPLG} = -\frac{1}{N}\sum_{y,x,c}\begin{cases}\alpha\left(1-\hat{Y}_{yxc}\right)^{\gamma}\log\left(\hat{Y}_{yxc}\right), & Y_{yxc}=1\\(1-\alpha)\,\hat{Y}_{yxc}^{\;\gamma}\log\left(1-\hat{Y}_{yxc}\right), & \text{otherwise}\end{cases}$$

wherein $N$ is the number of annotated image contents; $\alpha$ is a hyper-parameter and may take the value 0.25; $\gamma$ is also a hyper-parameter and may take the value 0.2; $\hat{Y}_{yxc}$ is the predicted value, at the pixel point with coordinate $yx$, with respect to the final label of content category $c$; and $Y_{yxc}$ is the target value, at the pixel point with coordinate $yx$, of the final label of content category $c$.
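A minimal sketch of this second loss, under the focal-style reconstruction above, follows; the layout of the predicted and target values as H × W × C arrays is an assumption.

```python
import numpy as np

def second_loss(Y_hat, Y, alpha=0.25, gamma=0.2, n_contents=1, eps=1e-12):
    """Y_hat, Y: H x W x C predicted / target values of the final labels."""
    Y_hat = np.clip(Y_hat, eps, 1.0 - eps)
    pos = Y == 1
    loss_pos = alpha * (1 - Y_hat[pos]) ** gamma * np.log(Y_hat[pos])
    loss_neg = (1 - alpha) * Y_hat[~pos] ** gamma * np.log(1 - Y_hat[~pos])
    return -(loss_pos.sum() + loss_neg.sum()) / n_contents  # n_contents = N
```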
In the present application, an image can be acquired; the image comprises a plurality of image contents, and each image content has a content category to which it belongs. For each of the multiple content categories, only one labeling pixel point in the image is annotated with that content category, and the labeling pixel point is located in the corresponding annotated image content, which belongs to the plurality of image contents. The image consists of labeled pixel points and unmarked pixel points. For the labeled pixel points, a first feature value set corresponding to them can be obtained; for the unmarked pixel points, a second feature value set corresponding to them can be obtained, and the prediction categories of the unmarked pixel points can then be predicted according to the first feature value set and the second feature value set. The prediction category meeting the preset requirement is taken as the pseudo label of the unmarked pixel point, and the unmarked pixel point is labeled with the pseudo label.
In this method, in the case that each of the multiple content categories in the image has only one labeling pixel point annotated with that content category, the step of labeling the unmarked pixel points by the algorithm is realized by predicting the categories of the unmarked pixel points and using the prediction categories meeting the preset requirement as pseudo labels of the unmarked pixel points. The annotation information can thus be completed by the algorithm even when little manual annotation is available, thereby balancing cost and detection effect.
Referring to FIG. 1c for details: (a) in FIG. 1c is the result obtained after the model learns from fully annotated training data; obviously, most of the targets can be accurately located and identified. (b) SIO (base) in FIG. 1c is the result after training directly under the setting that only one annotated image content per content category is retained. (c) SSOD-CSD and (d) SSOD-TS in FIG. 1c are results of semi-supervised target detection methods. Finally, Ours integrates the calculation of pseudo labels for unmarked pixel points and the calculation of the constructed target loss value, and trains the model using the image contents corresponding to a large number of unmarked pixel points. Compared with the methods shown in (b), (c), and (d) in FIG. 1c, its localization and identification of targets are more accurate, and it achieves performance similar to FSOD (fully supervised target detection) while reducing the annotation amount by 60%, proving that the method can balance cost and detection effect.
The method described in the above embodiments is further described in detail below.
In this embodiment, the method according to the embodiment of the present application will be described in detail by taking, as an example, a feature map in which an image is obtained by down-sampling the image using a neural network.
As shown in fig. 2, a specific flow of the data processing method is as follows:
201. acquiring a feature map, wherein the feature map comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, wherein the labeling pixel point is located in image content included in the corresponding content category, and the image content is recorded as labeling image content.
202. And acquiring a pixel point set corresponding to the content of the marked image, wherein the pixel point set comprises a central pixel point and a non-central pixel point.
203. And performing first operation processing on the central pixel points and the non-central pixel points to obtain content parameter values corresponding to each central pixel point and each non-central pixel point in the pixel point set respectively.
204. And performing second operation on the content parameter value corresponding to each pixel point in the pixel point set and the characteristic parameter value of the corresponding position of the characteristic graph to obtain a characteristic value, wherein the characteristic value is the characteristic value corresponding to the content of the marked image.
The specific process of the second arithmetic processing is as follows:
multiplying the content parameter value corresponding to each pixel point with the content parameter value in the pixel point set with the characteristic parameter value of the corresponding position of the image bit by bit to obtain a plurality of product values; adding the multiple product values to obtain a sum result; and carrying out normalization processing on the addition result to obtain the characteristic value.
205. And aggregating the characteristic values of the labeled image contents to obtain the first characteristic value set.
206. Determining unmarked pixel points in the feature map, and acquiring a second feature value set corresponding to the unmarked pixel points, wherein the feature map comprises the marked pixel points and the unmarked pixel points.
207. And performing third operation processing on the first characteristic value set and the second characteristic value set to obtain the first number of similarity values corresponding to each unmarked pixel point, wherein the first number of similarity values correspond to the first number of content categories one to one.
208. And for each unmarked pixel point, acquiring a target similarity value meeting a preset requirement in the first number of similarity values corresponding to the unmarked pixel point and a target content category corresponding to the target similarity value.
209. And if the target similarity value exceeds a similarity threshold value, taking the target content category as the prediction category of the unmarked pixel point.
210. And if the target similarity value does not exceed the similarity threshold, determining the prediction type of the unmarked pixel points as the background type, wherein the unmarked pixel points of the background type are not marked with pseudo labels.
211. And taking the prediction type meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
212. And constructing the operation of the target loss value according to the incidence relation between the central pixel point of the current labeled image content and the non-central pixel point of the current labeled image content, the incidence relation between the central pixel point of the current labeled image content and the predicted values of the second number of pixel points, and the incidence relation between the central pixel point of the current labeled image content and the non-central pixel point of each labeled image content.
213. And minimizing the target loss value, so as to adjust the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of the current marked image content, the incidence relation between the central pixel point of the current marked image content and the predicted value of the second number of pixel points, and the incidence relation between the central pixel point of the current marked image content and the non-central pixel point of each marked image content.
Steps 201 to 213 are the same as the data processing method in the previous embodiment, and are not described herein again.
From the above, the present application may acquire an image, where the image includes a plurality of image contents and each image content has its own content category. Each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in an image content included in the corresponding content category, which belongs to the plurality of image contents. The image consists of labeled pixel points and unmarked pixel points. For the labeled pixel points, a first feature value set corresponding to them can be obtained; for the unmarked pixel points, a second feature value set corresponding to them can be obtained, and the prediction categories of the unmarked pixel points can then be predicted according to the first feature value set and the second feature value set. The prediction category meeting the preset requirement is taken as the pseudo label of the unmarked pixel point, and the unmarked pixel point is labeled with the pseudo label. In the case that each of the multiple content categories in the image has only one labeling pixel point annotated with that content category, this method predicts the categories of the unmarked pixel points and takes the prediction categories meeting the preset requirement as pseudo labels, thereby labeling the unmarked pixel points algorithmically.
Retaining only one annotated image content per content category greatly reduces the cost of manual annotation, while the calculation of pseudo labels for unmarked pixel points and the construction of the target loss value efficiently mine unannotated potential image content, promoting the model's utilization of unannotated data. In this way, the annotation information can be completed by the algorithm even when little manual annotation is available, thereby balancing cost and detection effect.
In order to better implement the method, embodiments of the present application further provide a data processing apparatus, where the data processing apparatus may be specifically integrated in an electronic device, and the electronic device may be a terminal, a server, or the like. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer and other devices; the server may be a single server or a server cluster composed of a plurality of servers.
For example, in the present embodiment, the method of the present embodiment will be described in detail by taking an example in which the data processing device is specifically integrated in the electronic device.
For example, as shown in fig. 3, the data processing apparatus may include:
an image obtaining unit 301, configured to obtain an image, where the image includes a plurality of image contents, and each of the image contents belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category;
a first set obtaining unit 302, configured to obtain a first feature value set corresponding to the labeled pixel point;
a second set obtaining unit 303, configured to determine an unmarked pixel point in the image, and obtain a second feature value set corresponding to the unmarked pixel point;
a prediction category obtaining unit 304, configured to predict, according to the first feature value set and the second feature value set, a prediction category of the unlabeled pixel;
the pixel point labeling unit 305 is configured to use the prediction category meeting preset requirements as a pseudo label of the unmarked pixel point, and label the unmarked pixel point by using the pseudo label.
In some embodiments, the image content at which the annotation pixel point is located is denoted as annotation image content; the first set acquiring unit 302 includes:
a characteristic value acquisition subunit, configured to acquire, for the annotation image content corresponding to each annotation pixel point, the characteristic value corresponding to that annotation image content;
and a first set subunit, configured to aggregate the characteristic values of the plurality of annotation image contents to obtain the first feature value set.
In some embodiments, the characteristic value acquisition subunit includes:
a pixel point secondary subunit, configured to acquire a pixel point set corresponding to the annotation image content, where the pixel point set includes a central pixel point and non-central pixel points;
a first operation secondary subunit, configured to perform first operation processing on the central pixel point and the non-central pixel points, so as to obtain a content parameter value corresponding to each central pixel point and each non-central pixel point in the pixel point set;
and a second operation secondary subunit, configured to perform second operation processing on the content parameter value corresponding to each pixel point in the pixel point set and the characteristic parameter value at the corresponding position of the image, so as to obtain the characteristic value corresponding to the annotation image content.
In some embodiments, the second operation secondary subunit is specifically configured to multiply, element by element, the content parameter value corresponding to each pixel point that has a content parameter value in the pixel point set by the characteristic parameter value at the corresponding position of the image, so as to obtain a plurality of product values; add the plurality of product values to obtain an addition result; and normalize the addition result to obtain the characteristic value.
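For illustration only, the following Python sketch shows one way the second operation could be realized: the content parameter values act as per-pixel weights that are multiplied element by element with the characteristic parameter values of the feature map, the products are summed, and the sum is normalized. The function name, the array shapes, and the choice of normalizing by the sum of the weights are assumptions of this sketch, not details fixed by the embodiment.

    import numpy as np

    def content_feature_value(feature_map, weights, positions):
        """Characteristic value of one annotation image content (a sketch).

        feature_map: (H, W, C) characteristic parameter values of the image.
        weights:     (K,) content parameter values of the K pixel points in
                     the pixel point set (central plus non-central).
        positions:   (K, 2) integer row/column of each pixel point.
        """
        feats = feature_map[positions[:, 0], positions[:, 1]]  # (K, C)
        products = weights[:, None] * feats     # element-by-element products
        summed = products.sum(axis=0)           # addition result
        return summed / (weights.sum() + 1e-8)  # normalization (assumed form)

Collecting one such vector per annotation image content yields the first feature value set.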
In some embodiments, the image includes a first number of content categories; the prediction category acquisition unit 304 includes:
a third operation subunit, configured to perform third operation processing on the first feature value set and the second feature value set to obtain the first number of similarity values corresponding to each unmarked pixel point, where the first number of similarity values correspond one to one to the first number of content categories;
a target category subunit, configured to, for each unmarked pixel point, obtain a target similarity value with a largest similarity value among the first number of similarity values corresponding to the unmarked pixel point, and a target content category corresponding to the target similarity value;
and the prediction category subunit is configured to, when the target similarity value exceeds a similarity threshold, use the target content category as the prediction category of the unmarked pixel point.
In some embodiments, the apparatus further comprises:
and a background category determining unit, configured to determine, when the target similarity value does not exceed the similarity threshold, the prediction category of the corresponding unmarked pixel point as a background category, where unmarked pixel points of the background category are not labeled with pseudo labels.
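As a minimal sketch of how the prediction category subunit and the background category determining unit could act together, the Python function below assumes the unspecified third operation is a cosine similarity, assumes a similarity threshold of 0.8, and encodes the background category as -1; all three choices are assumptions of this sketch.

    import numpy as np

    def predict_pseudo_labels(first_set, second_set, threshold=0.8):
        """first_set:  (N_cat, C) one feature value per content category.
        second_set: (M, C) feature values of the unmarked pixel points.
        Returns (M,) indices of the target content category where the target
        similarity value exceeds the threshold, else -1 (background)."""
        a = second_set / (np.linalg.norm(second_set, axis=1, keepdims=True) + 1e-8)
        b = first_set / (np.linalg.norm(first_set, axis=1, keepdims=True) + 1e-8)
        sims = a @ b.T                    # (M, N_cat) similarity values
        target_cat = sims.argmax(axis=1)  # target content category
        target_sim = sims.max(axis=1)     # target similarity value
        return np.where(target_sim > threshold, target_cat, -1)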
In some embodiments, the image content at which the annotation pixel point is located is denoted as annotation image content; the apparatus further comprises:
a target loss construction unit, configured to construct a target loss value according to the association between the central pixel point of the current annotation image content and the non-central pixel points of the current annotation image content, the association between the central pixel point of the current annotation image content and the predicted values of a second number of pixel points, and the association between the central pixel point of the current annotation image content and the non-central pixel points of each annotation image content;
and an association adjusting unit, configured to minimize the target loss value so as to adjust each of the three associations described above.
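The embodiment fixes only the three associations the target loss value is built from, not its functional form. As one hedged possibility, the sketch below casts it as an InfoNCE-style contrastive loss in which the central pixel point is pulled toward its own non-central pixel points and the pseudo-labeled pixel points, and pushed away from the non-central pixel points of the other annotation image contents; the temperature tau and the contrastive form itself are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def target_loss(center, own_noncenter, pseudo_feats, other_noncenter, tau=0.1):
        """center:          (C,)   central pixel feature of the current content
        own_noncenter:   (P, C) its non-central pixel features (association 1)
        pseudo_feats:    (Q, C) features of the second number of pseudo-labeled
                                pixel points (association 2)
        other_noncenter: (R, C) non-central pixels of the other annotation
                                image contents (association 3)"""
        center = F.normalize(center, dim=0)
        pos = F.normalize(torch.cat([own_noncenter, pseudo_feats]), dim=1)
        neg = F.normalize(other_noncenter, dim=1)
        pos_logits = pos @ center / tau   # associations to strengthen
        neg_logits = neg @ center / tau   # associations to weaken
        denom = torch.logsumexp(torch.cat([pos_logits, neg_logits]), dim=0)
        return -(pos_logits - denom).mean()

Minimizing this value by gradient descent adjusts all three associations at once, matching the role of the association adjusting unit.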
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
From the above, the data processing apparatus of the present application can likewise acquire an image including a plurality of image contents, each belonging to a content category labeled through a corresponding labeling pixel point; acquire the first feature value set corresponding to the labeling pixel points and the second feature value set corresponding to the unmarked pixel points; predict the prediction categories of the unmarked pixel points according to the two sets; and label the unmarked pixel points with the prediction categories meeting the preset requirement as pseudo labels. Since each content category needs only one labeling pixel point, the labeling information can be completed by the algorithm even when little manual labeling information is available, achieving a balance between labeling cost and detection effect.
In this way, the detection effect of the image detection model can be improved while the amount of manual labeling information is reduced.
An embodiment of the present application further provides an electronic device, which may be a terminal, a server, or the like. The terminal may be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer, or the like; the server may be a single server, a server cluster composed of a plurality of servers, or the like.
In some embodiments, the data processing apparatus may also be integrated in a plurality of electronic devices, for example, the data processing apparatus may be integrated in a plurality of servers, and the data processing method of the present application is implemented by the plurality of servers.
In this embodiment, the electronic device of the present application is described in detail by way of example. Fig. 4 shows a schematic structural diagram of the electronic device according to an embodiment of the present application. Specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, an input module 404, and a communication module 405. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 4 does not constitute a limitation of the electronic device and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402. In some embodiments, processor 401 may include one or more processing cores; in some embodiments, processor 401 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further includes a power supply 403 for supplying power to the various components. In some embodiments, the power supply 403 may be logically coupled to the processor 401 via a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 403 may further include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The electronic device may also include an input module 404, the input module 404 operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The electronic device may also include a communication module 405, and in some embodiments the communication module 405 may include a wireless module, and the electronic device may perform short-range wireless transmission via the wireless module of the communication module 405 to provide wireless broadband internet access to the user. For example, the communication module 405 may be used to assist a user in sending and receiving e-mails, browsing web pages, accessing streaming media, and the like.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category; acquiring a first feature value set corresponding to the labeling pixel point; determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points; predicting the prediction category of the unmarked pixel points according to the first characteristic value set and the second characteristic value set; and taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
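To make this flow concrete, the following end-to-end demo wires together the hypothetical helpers content_feature_value and predict_pseudo_labels from the earlier sketches on random data; every shape, value, and name here is made up for the demo, and the helpers are assumed to be in scope.

    import numpy as np

    H, W, C, N_CAT = 64, 64, 32, 3
    rng = np.random.default_rng(0)
    feature_map = rng.random((H, W, C), dtype=np.float32)

    # one labeling pixel point per content category, each with a small
    # pixel point set: (content parameter values, positions)
    annotated = {cat: (rng.random(5), rng.integers(0, H, (5, 2)))
                 for cat in range(N_CAT)}

    # first feature value set: one aggregated feature value per category
    first_set = np.stack([content_feature_value(feature_map, w, p)
                          for w, p in annotated.values()])

    # second feature value set: features of (here, 100) unmarked pixel points
    second_set = feature_map.reshape(-1, C)[:100]

    pseudo = predict_pseudo_labels(first_set, second_set, threshold=0.8)
    print("pseudo-labeled:", int((pseudo >= 0).sum()),
          "background:", int((pseudo < 0).sum()))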
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the data processing methods provided in the present application. For example, the instructions may perform the steps of:
acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category; acquiring a first feature value set corresponding to the labeling pixel point; determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points; predicting the prediction category of the unmarked pixel points according to the first feature value set and the second feature value set; and taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point, and labeling the unmarked pixel point with the pseudo label.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
According to an aspect of the present application, a computer program product or a computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the data processing aspect in the embodiments described above.
Since the instructions stored in the storage medium can execute the steps in any data processing method provided in the embodiments of the present application, beneficial effects that can be achieved by any data processing method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing has described in detail a data processing method and apparatus, an electronic device, and a computer-readable storage medium provided in the embodiments of the present application. Specific examples are applied herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may make changes to the specific implementation and application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method of data processing, the method comprising:
acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category;
acquiring a first feature value set corresponding to the labeling pixel point;
determining unmarked pixel points in the image, and acquiring a second feature value set corresponding to the unmarked pixel points;
predicting the prediction category of the unmarked pixel points according to the first characteristic value set and the second characteristic value set;
and taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point, and marking the unmarked pixel point by using the pseudo label.
2. The method of claim 1, wherein the image content at which the annotation pixel is located is denoted as annotation image content; the obtaining of the first feature value set corresponding to the labeled pixel point includes:
for the annotation image content corresponding to each annotation pixel point, acquiring a characteristic value corresponding to the annotation image content;
and aggregating the characteristic values of the plurality of annotation image contents to obtain the first feature value set.
3. The method of claim 2, wherein the obtaining the feature value corresponding to the annotation image content for the annotation image content corresponding to each annotation pixel point comprises:
acquiring a pixel point set corresponding to the annotation image content, wherein the pixel point set comprises a central pixel point and non-central pixel points;
performing first operation processing on the central pixel point and the non-central pixel points to obtain a content parameter value corresponding to each central pixel point and each non-central pixel point in the pixel point set;
and performing second operation processing on the content parameter value corresponding to each pixel point in the pixel point set and the characteristic parameter value at the corresponding position of the image to obtain the characteristic value corresponding to the annotation image content.
4. The method according to claim 3, wherein the performing second operation processing on the content parameter value corresponding to each pixel point in the pixel point set and the characteristic parameter value at the corresponding position of the image to obtain the characteristic value comprises:
multiplying, element by element, the content parameter value corresponding to each pixel point that has a content parameter value in the pixel point set by the characteristic parameter value at the corresponding position of the image to obtain a plurality of product values;
adding the multiple product values to obtain a sum result;
and carrying out normalization processing on the addition result to obtain the characteristic value.
5. The method of claim 1, wherein the image comprises a first number of content categories;
predicting the prediction category of the unlabeled pixel point according to the first feature value set and the second feature value set, including:
performing third operation processing on the first feature value set and the second feature value set to obtain the first number of similarity values corresponding to each unmarked pixel point, wherein the first number of similarity values correspond one to one to the first number of content categories;
for each unmarked pixel point, acquiring a target similarity value with the maximum similarity value in the first quantity of similarity values corresponding to the unmarked pixel point and a target content category corresponding to the target similarity value;
and if the target similarity value exceeds a similarity threshold value, taking the corresponding target content category as the prediction category of the unmarked pixel point.
6. The method as claimed in claim 5, wherein after the acquiring, for each unmarked pixel point, the target similarity value with the largest similarity value among the first number of similarity values corresponding to the unmarked pixel point and the target content category corresponding to the target similarity value, the method further comprises:
and if the target similarity value does not exceed the similarity threshold, determining the prediction category of the unmarked pixel point as a background category, wherein unmarked pixel points of the background category are not labeled with pseudo labels.
7. The method of claim 1, wherein the image content at which the annotation pixel is located is denoted as annotation image content; the method further comprises the following steps:
constructing a target loss value according to the association between the central pixel point of the current annotation image content and the non-central pixel points of the current annotation image content, the association between the central pixel point of the current annotation image content and the predicted values of a second number of pixel points, and the association between the central pixel point of the current annotation image content and the non-central pixel points of each annotation image content;
and minimizing the target loss value, so as to adjust the association between the central pixel point of the current annotation image content and the non-central pixel points of the current annotation image content, the association between the central pixel point of the current annotation image content and the predicted values of the second number of pixel points, and the association between the central pixel point of the current annotation image content and the non-central pixel points of each annotation image content.
8. A data processing apparatus, characterized in that the apparatus comprises:
the image acquisition unit is used for acquiring an image, wherein the image comprises a plurality of image contents, and each image content belongs to a content category; each content category is labeled through a corresponding labeling pixel point, and the labeling pixel point is located in image content included in the corresponding content category;
a first set obtaining unit, configured to obtain a first feature value set corresponding to the labeled pixel point;
the second set acquisition unit is used for determining unmarked pixel points in the image and acquiring a second characteristic value set corresponding to the unmarked pixel points;
the prediction type obtaining unit is used for predicting the prediction type of the unmarked pixel points according to the first characteristic value set and the second characteristic value set;
and the pixel point labeling unit is used for taking the prediction category meeting the preset requirement as a pseudo label of the unmarked pixel point and labeling the unmarked pixel point by using the pseudo label.
9. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions; the processor loads instructions from the memory to perform the steps of the data processing method according to any one of claims 1 to 7.
10. A computer readable storage medium storing instructions adapted to be loaded by a processor to perform the steps of the data processing method according to any of claims 1 to 7.
CN202210432805.8A 2022-04-24 2022-04-24 Data processing method and device, electronic equipment and storage medium Pending CN115131597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432805.8A CN115131597A (en) 2022-04-24 2022-04-24 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210432805.8A CN115131597A (en) 2022-04-24 2022-04-24 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115131597A true CN115131597A (en) 2022-09-30

Family

ID=83376020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432805.8A Pending CN115131597A (en) 2022-04-24 2022-04-24 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115131597A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113287A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Labeling method and labeling apparatus


Similar Documents

Publication Publication Date Title
CN112163465B (en) Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium
Yi et al. ASSD: Attentive single shot multibox detector
CN109961041B (en) Video identification method and device and storage medium
CN111242844B (en) Image processing method, device, server and storage medium
CN111125519B (en) User behavior prediction method, device, electronic equipment and storage medium
CN111292262B (en) Image processing method, device, electronic equipment and storage medium
CN111126389A (en) Text detection method and device, electronic equipment and storage medium
CN109548691A (en) A kind of pet recognition methods, device, medium and electronic equipment
CN115114439B (en) Method and device for multi-task model reasoning and multi-task information processing
CN111242019A (en) Video content detection method and device, electronic equipment and storage medium
CN111563158A (en) Text sorting method, sorting device, server and computer-readable storage medium
CN109598250A (en) Feature extracting method, device, electronic equipment and computer-readable medium
CN116485475A (en) Internet of things advertisement system, method and device based on edge calculation
CN113569138A (en) Intelligent device control method and device, electronic device and storage medium
Qi et al. A DNN-based object detection system on mobile cloud computing
CN113838134B (en) Image key point detection method, device, terminal and storage medium
CN115131597A (en) Data processing method and device, electronic equipment and storage medium
CN114611692A (en) Model training method, electronic device, and storage medium
CN110909768A (en) Method and device for acquiring marked data
CN115909336A (en) Text recognition method and device, computer equipment and computer-readable storage medium
CN113822144A (en) Target detection method and device, computer equipment and storage medium
CN110704650B (en) OTA picture tag identification method, electronic equipment and medium
CN110674716A (en) Image recognition method, device and storage medium
CN115359468A (en) Target website identification method, device, equipment and medium
CN112507912B (en) Method and device for identifying illegal pictures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination