WO2021248932A1

WO2021248932A1 - Image data processing method and apparatus, device and readable storage medium

Info

Publication number: WO2021248932A1
Application number: PCT/CN2021/076826
Authority: WO
Inventors: 张润泽; 郭振华; 赵雅倩
Original assignee: 广东浪潮智慧计算技术有限公司
Priority date: 2020-06-11
Filing date: 2021-02-19
Publication date: 2021-12-16
Also published as: CN111723856A; CN111723856B

Abstract

An image data processing method and apparatus, a device and a readable storage medium. In the present method, an image sample set is first re-sorted on the basis of the number of labeled pictures, a fitting index of the image sample set is then determined, and on the basis of the fitting index, the image sample set may be divided into a plurality of sample subsets according to the number of labeled pictures. Then, each sample subset is used to train a target model to obtain the model training accuracy, i.e. the contribution of each sample subset to the training of the target model is determined. A sampling weight is determined on the basis of the model training accuracy, and each sample subset is sampled to obtain a target image sample set. The sample distribution in the target image sample set may be distributed according to the sample contribution ability, which may further improve the model training effect and improve the accuracy of the result of picture recognition processing.

Description

Image data processing method, device, equipment and readable storage medium

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 11, 2020, with the application number of 202010530581.5, and the title of the invention as "an image data processing method, device, equipment, and readable storage medium". The entire content is incorporated into this application by reference.

Technical field

The present invention relates to the field of image processing technology, and in particular to an image data processing method, device, equipment and readable storage medium.

Background technique

Image recognition technologies such as pedestrian re-recognition, face recognition, and image-based target detection often require a large number of labeled image samples to train a model with learning capabilities, so as to finally obtain a learning model that can effectively recognize unknown images.

However, during model training, an uneven distribution of image samples often leads to poor learning and ultimately prevents the trained model from reaching the expected recognition accuracy. Taking pedestrian re-recognition as an example, in an image sample set for pedestrian re-recognition, the number of pictures corresponding to a pedestrian (tag) ranges from 1 to hundreds of thousands. In some image sample sets in particular, the number of pictures per pedestrian ranges from 1 to more than 1,000, yet the median number of pictures per pedestrian is only 2: nearly half of the pedestrians have only one picture, and only a small number of pedestrians have more than 100 pictures. This type of data distribution is usually called long-tailed data.

In summary, how to effectively solve problems such as image sample imbalance is a technical problem urgently needed to be solved by those skilled in the art.

Summary of the invention

The purpose of the present invention is to provide an image data processing method, device, equipment, and readable storage medium that classify and segment the image sample set and, taking into account the contribution of the samples to model training, resample the image sample set using a sampling weight determined for the sample subset of each category, so as to balance the data and further improve the accuracy of model training.

In order to solve the above technical problems, the present invention provides the following technical solutions:

An image data processing method, including:

Sort the tags according to the number of pictures corresponding to each tag in the image sample set;

Acquiring the fitting index corresponding to the image sample set after sorting;

Segmenting the image sample set by using the fitting index to obtain multiple sample subsets;

Each of the sample subsets is used to train the target model to obtain the model training accuracy corresponding to each of the sample subsets;

Using sampling weights matching the model training accuracy, sampling each of the sample subsets to obtain a target image sample set.

Preferably, said segmenting said image sample set by said fitting index to obtain a plurality of sample subsets includes:

The image sample set is divided by the integral of the fitting index to obtain a plurality of the sample subsets with an equal total number of pictures.

Preferably, the sampling of each of the sample subsets by using the sampling weight matching the model training accuracy to obtain a target image sample set includes:

Acquiring the relative position of each of the sample subsets in the fitting index;

Combining the relative position and the sampling weight, sampling each of the sample subsets to obtain the target image sample set.

Preferably, the combining the relative position and the sampling weight to sample each of the sample subsets to obtain the target image sample set includes:

If the relative position is the head, it is determined whether the sampling weight is greater than 1;

If yes, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight;

If not, then take the original pictures of the sample subset.

If the relative position is in the middle, it is determined whether the sampling weight is greater than 1;

If so, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset, the sampling weight, and the preset weighting multiple;

If not, sampling is performed using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight.

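If the relative position is the tail, it is determined whether the sampling weight is greater than 1;

If yes, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight;

If not, obtain the median number of pictures corresponding to each of the tags in the sample subset, and randomly select pictures with the median number of pictures for each tag.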

Preferably, it also includes:

Training the target model by using the target image sample set to obtain a trained classification and recognition model;

The classification and recognition model is used to recognize the target image to be recognized, and the recognition result is obtained.

An image data processing device, including:

The image sample set sorting module is used to sort the tags according to the number of pictures corresponding to each tag in the image sample set;

The fitting module is used to obtain the fitting index corresponding to the image sample set after sorting;

An image sample set segmentation module, configured to use the fitting index to segment the image sample set to obtain multiple sample subsets;

The training module is configured to train the target model by using each of the sample subsets to obtain the model training accuracy corresponding to each of the sample subsets;

The re-sampling module is used to sample each of the sample subsets by using the sampling weight matching the model training accuracy to obtain a target image sample set.

An image data processing device, including:

Memory, used to store computer programs;

The processor is used to implement the steps of the image data processing method when the computer program is executed.

A readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the image data processing method described above are realized.

Using the method provided by the embodiment of the present invention, the tags are sorted according to the number of pictures corresponding to each tag in the image sample set; the fitting index corresponding to the sorted image sample set is obtained; the image sample set is segmented by the fitting index to obtain multiple sample subsets; each sample subset is used to train the target model to obtain the model training accuracy corresponding to each sample subset; and the sampling weight matching the model training accuracy is used to sample each sample subset to obtain the target image sample set.

In this method, the image sample set is first re-sorted based on the number of pictures per label, and then the fitting index of the image sample set is determined. Based on the fitting index, the image sample set can be divided into multiple sample subsets according to the number of pictures per label. That is, the labels within the same sample subset have similar numbers of pictures. Then, each sample subset is used to train the target model to obtain the model training accuracy, that is, to determine the contribution of each sample subset to the training of the target model. The sampling weight is determined based on the model training accuracy, and each sample subset is sampled to obtain the target image sample set. The sample distribution in the target image sample set then follows the contribution ability of the samples, which can further improve the model training effect and the accuracy of the image recognition results.

Correspondingly, the embodiments of the present invention also provide image data processing apparatuses, equipment, and readable storage media corresponding to the above-mentioned image data processing methods, which have the above technical effects, and will not be repeated here.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the drawings that are used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is an implementation flowchart of an image data processing method in an embodiment of the present invention;

FIG. 2 is a schematic diagram of image sample set segmentation in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an image data processing device in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an image data processing device in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a specific structure of an image data processing device in an embodiment of the present invention.

Detailed description

In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

Please refer to FIG. 1. FIG. 1 is a flowchart of an image data processing method in an embodiment of the present invention. The method includes the following steps:

S101: Sort the tags according to the number of pictures corresponding to each tag in the image sample set.

Among them, the image sample set may specifically be a set of image samples used for training face recognition, object recognition, or pedestrian re-recognition.

Each picture in the image sample set has a label that identifies the picture. For example, when the image sample set is used for pedestrian re-recognition, the label is the pedestrian, and each pedestrian has one or more pictures. For ease of description, the following uses an image sample set for pedestrian re-recognition as the example; image sample sets for other types of image recognition processing can be handled in the same way, which is not repeated here.

The tags are sorted according to the number of pictures corresponding to each tag. That is, the number of pictures of each label is counted, and the image sample set is then re-sorted according to the counts. It should be noted that the order may be ascending (fewest pictures first) or descending (most pictures first). For ease of description, this document uses ascending order; the descending case can be handled by analogy and is not repeated here.
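
As a minimal sketch of step S101, assuming each picture is represented only by its label string (the function name `sort_labels_by_count` and the sample data are hypothetical), the counting and ascending sort could look like this:

```python
from collections import Counter

def sort_labels_by_count(labels):
    """Return (label, picture_count) pairs sorted from fewest to most pictures.

    `labels` holds one entry per picture: the label (for example a pedestrian
    ID) attached to that picture.
    """
    counts = Counter(labels)
    # Ties are broken by label so that the order is deterministic.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# One label per picture: "p3" has one picture, "p2" has two, "p1" has three.
picture_labels = ["p1", "p2", "p1", "p3", "p2", "p1"]
ranked = sort_labels_by_count(picture_labels)
print(ranked)  # [('p3', 1), ('p2', 2), ('p1', 3)]
```

The position of a label in `ranked` then serves as the x coordinate for the fitting in step S102.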

S102. Obtain a fitting index corresponding to the sorted image sample set.

After the sorting is completed, the fitting index corresponding to the sorted image sample set can be determined.

Specifically, an exponential function with the natural base e can be fitted to the re-sorted image sample set, namely:

$$f(x) = a \cdot e^{b x} + c \tag{1}$$

After a, b, and c in formula (1) are obtained, formula (1) can be used in place of the ascending picture-count sequence of the original image sample set in the subsequent steps. The obtained fitting index is shown in FIG. 2, which is a schematic diagram of image sample set segmentation in an embodiment of the present invention.
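
The text does not say how a, b, and c are estimated. One possible approach, shown here only as a sketch, is a least-squares fit in which b is chosen by a coarse grid search and a and c follow from linear regression for each candidate b. The function name `fit_exponential`, the grid parameters `steps` and `max_growth`, and the synthetic data are all assumptions made for illustration.

```python
import math

def fit_exponential(counts, steps=400, max_growth=20.0):
    """Least-squares fit of f(x) = a*exp(b*x) + c to counts[x], x = 0..n-1.

    For a fixed b the model is linear in (a, c), so both follow from ordinary
    linear regression of counts on u = exp(b*x). b itself is picked by a
    grid search over b*(n-1) in (0, max_growth] (an illustrative choice).
    """
    n = len(counts)
    if n < 3:
        raise ValueError("need at least 3 labels to fit three parameters")
    mean_y = sum(counts) / n
    best = None
    for k in range(1, steps + 1):
        b = (max_growth * k / steps) / (n - 1)
        u = [math.exp(b * x) for x in range(n)]
        mean_u = sum(u) / n
        var_u = sum((ui - mean_u) ** 2 for ui in u)
        cov_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, counts))
        a = cov_uy / var_u
        c = mean_y - a * mean_u
        sse = sum((a * ui + c - yi) ** 2 for ui, yi in zip(u, counts))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c

# Synthetic ascending picture counts generated from known parameters.
counts = [2.0 * math.exp(0.05 * x) + 1.0 for x in range(100)]
a, b, c = fit_exponential(counts)
print(round(a, 3), round(b, 3), round(c, 3))
```

A grid search keeps the sketch dependency-free; a nonlinear least-squares routine from a numerical library would be the more usual choice in practice.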

S103. Use the fitting index to segment the image sample set to obtain multiple sample subsets.

Among them, the number of sample subsets is at least 3. There is no intersection between sample subsets.

Specifically, to equalize the number of pictures across the sample subsets obtained by segmentation and to facilitate subsequent sampling, the image sample set can be segmented using the integral of the fitting index to obtain multiple sample subsets with an equal total number of pictures. The number of sample subsets is configurable.
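
As a worked form of this split, assuming the fitted function of formula (1) is positive on [0, n-1] and b is nonzero, the integral has a closed form. The symbols F, K (number of sample subsets), and x_k (the k-th boundary) are introduced here for illustration only:

```latex
F(x) = \int_0^{x} \left(a e^{b t} + c\right)\,dt
     = \frac{a}{b}\left(e^{b x} - 1\right) + c\,x,
\qquad S = F(n-1)

F(x_k) = \frac{k}{K}\,S, \qquad k = 1, \dots, K-1
```

Each boundary x_k is therefore the point at which the area accumulated under the fitted curve reaches k/K of the total area S.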

It should be noted that, because the labels have different numbers of pictures and the total number of pictures is not necessarily an integer multiple of the number of sample subsets, the number of labels may differ between sample subsets, and the number of pictures in the sample subsets may not be exactly equal either.

For example, please refer to FIG. 2. To divide the image sample set into 3 sample subsets, the integral of the fitting index can be calculated to obtain g₁ = [0, p), g₂ = [p, q) and g₃ = (q, n-1], where the integral over each part is one-third of S (the integral of the fitting index), corresponding in turn to S1, S2 and S3 shown in FIG. 2.
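
The boundaries p and q can be located numerically. The sketch below assumes that b is nonzero and that the fitted curve is positive on [0, n-1], so that its cumulative integral is increasing and bisection applies. The function names and the rule that a label index belongs to the subset whose boundaries enclose it are illustrative assumptions.

```python
import math

def cumulative(x, a, b, c):
    """Integral of a*exp(b*t) + c over t in [0, x], assuming b != 0."""
    return a / b * (math.exp(b * x) - 1.0) + c * x

def split_points(a, b, c, n, k=3, tol=1e-9):
    """Positions on the sorted-label axis [0, n-1] that cut the area under
    the fitted curve into k equal parts (p and q when k = 3)."""
    total = cumulative(n - 1, a, b, c)
    points = []
    for j in range(1, k):
        target = total * j / k
        lo, hi = 0.0, float(n - 1)
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if cumulative(mid, a, b, c) < target:
                lo = mid
            else:
                hi = mid
        points.append((lo + hi) / 2.0)
    return points

def assign_subsets(n, points):
    """Map each sorted label index 0..n-1 to a subset number 0..k-1."""
    return [sum(1 for p in points if i >= p) for i in range(n)]

# Example with hypothetical fitted parameters for 11 labels.
p, q = split_points(a=1.0, b=0.5, c=0.0, n=11)
subset_of_label = assign_subsets(11, [p, q])
```

Because the curve grows exponentially, the head subset covers many labels with few pictures each, while the tail subset covers only a few labels with many pictures each.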

S104: Use each sample subset to train the target model to obtain the model training accuracy corresponding to each sample subset.

The target model is the model to be trained after the image data processing. It may be any model with learning capabilities, such as a deep learning model or another machine learning model.

To ensure that the final target image sample set trains the target model better, this embodiment determines the sampling mainly from the contribution of each sample subset to the training of the target model. Therefore, after the multiple sample subsets are obtained, each sample subset can be used to train the target model, and the model training accuracy corresponding to each sample subset can then be obtained. The model training accuracy can specifically be the recognition accuracy rate, which can be measured on a validation set.
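
The per-subset evaluation can be sketched as a loop in which only the training subset changes while the validation set stays fixed. The names `accuracy_per_subset`, `train_fn`, and `evaluate_fn` are hypothetical placeholders for whatever training and validation routines the target model uses.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (picture path, label); a hypothetical representation

def accuracy_per_subset(
    subsets: Dict[str, List[Sample]],
    validation_set: Sequence[Sample],
    train_fn: Callable[[List[Sample]], object],
    evaluate_fn: Callable[[object, Sequence[Sample]], float],
) -> Dict[str, float]:
    """Train a fresh model on each subset and score it on one shared
    validation set, so that the subset is the only thing that varies."""
    accuracies = {}
    for name, samples in subsets.items():
        model = train_fn(samples)  # hypothetical training routine
        accuracies[name] = evaluate_fn(model, validation_set)
    return accuracies

# Toy stand-ins: the "model" is just the set of labels seen during training.
toy_train = lambda samples: {label for _, label in samples}
toy_eval = lambda model, val: sum(1 for _, label in val if label in model) / len(val)
result = accuracy_per_subset(
    {"g1": [("a.jpg", "x")], "g2": [("b.jpg", "x"), ("c.jpg", "y")]},
    [("d.jpg", "x"), ("e.jpg", "y")],
    toy_train,
    toy_eval,
)
print(result)  # {'g1': 0.5, 'g2': 1.0}
```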

S105. Using sampling weights that match the model training accuracy, sample each sample subset to obtain a target image sample set.

Each sampling weight corresponds to one sample subset. That is, each sample subset is sampled according to the sampling weight matching its model training accuracy, and the sampled pictures are added to the target image sample set. This avoids the problem that a single oversampling or undersampling pass has difficulty retaining the samples that contribute to model training.

Specifically, the sampling implementation process includes:

Step 1: Use the model training accuracy to determine the sampling weight;

Step 2: Sampling each sample subset according to the sampling weight to obtain the target image sample set.

For example, if the number of sample subsets is 3, each sample subset can be trained on separately under controlled variables, and the validation set can be used to obtain the accuracy values a₁, a₂, a₃ that the target model reaches when trained on each of them. The weights w₁, w₂, w₃ corresponding to g₁, g₂, g₃ are then computed from these accuracy values (the formula is not reproduced in this text).

After the sampling weight is obtained, each sample subset can be sampled directly based on the sampling weight.
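
The formula that maps accuracy values to sampling weights does not appear in this text. The sketch below therefore substitutes a placeholder, each accuracy divided by the mean accuracy, purely to show where such a mapping plugs in. This normalization is an assumption made here and is not the method's actual formula.

```python
def sampling_weights(accuracies):
    """Placeholder mapping (NOT the formula of the method): accuracy / mean accuracy.

    Subsets that train a more accurate model get a weight above 1; the
    others get a weight at or below 1.
    """
    mean_acc = sum(accuracies) / len(accuracies)
    if mean_acc <= 0:
        raise ValueError("mean accuracy must be positive")
    return [acc / mean_acc for acc in accuracies]

# Hypothetical accuracies a1, a2, a3 for the subsets g1, g2, g3.
w1, w2, w3 = sampling_weights([0.5, 0.75, 0.25])
print(w1, w2, w3)  # 1.0 1.5 0.5
```

Whatever mapping is used, the later sampling cases only depend on whether each weight is greater than 1.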

Preferably, consider that in practical applications a label with too few samples may prevent the target model from learning effectively, while a label with too many samples may cause the target model to overfit. That is, data that contributes a lot to the model needs to be oversampled, and redundant data needs to be undersampled. Therefore, in this embodiment, differentiated sampling can also be performed according to the number of samples per label, so that labels with a moderate number of samples are retained and reasonably sampled. The specific implementation process includes:

Step 1: Obtain the relative position of each sample subset in the fitting index;

Step 2: Combine the relative position and sampling weight to sample each sample subset to obtain the target image sample set.

It can be seen from FIG. 2 that at the head of the fitting index, such as g₁, the number of pictures corresponding to the same label is low; in the middle of the fitting index, such as g₂, the number of pictures corresponding to the same label is moderate, and more can be sampled when the contribution ability is large; at the tail of the fitting index, such as g₃, the number of pictures corresponding to the same label is too large, and sampling can be reduced when the contribution ability is small.

That is, the specific sampling process, including the following situations:

Case 1: If the relative position is the head, the sampling process includes:

Step 1. Determine whether the sampling weight is greater than 1;

Step 2. If yes, use the number of pictures corresponding to each label in the sample subset and the sampling weight to perform oversampling;

Step 3. If not, take the original pictures of the sample subset.

For example, for g₁: if w₁ > 1, then for each label in g₁, sample w₁ times the number of pictures it owns, rounding up if the result is not an integer. To increase data diversity, the repeated pictures are randomly flipped (for example, rotated by an angle or flipped left-right), cropped, and erased. If w₁ ≤ 1, take the original image sample set of g₁.
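
Case 1 can be sketched as follows, assuming a sample subset is stored as a mapping from label to a list of picture identifiers (a hypothetical layout). Extra copies are only flagged for augmentation here; the flipping, cropping, and erasing themselves are left to an image library and are not shown.

```python
import math
import random

def oversample_label(pictures, weight, rng):
    """Return ceil(weight * len(pictures)) entries of (picture, needs_augmentation).

    Every original picture appears once unchanged; the extra copies are drawn
    at random and flagged so that a later step can flip, crop, or erase them.
    """
    target = math.ceil(weight * len(pictures))
    originals = [(p, False) for p in pictures]
    extras = [(rng.choice(pictures), True) for _ in range(target - len(pictures))]
    return originals + extras

def sample_head(subset, weight, rng=None):
    """Case 1: oversample each label if weight > 1, else keep the subset as is."""
    rng = rng or random.Random(0)
    if weight > 1:
        return {label: oversample_label(pics, weight, rng)
                for label, pics in subset.items()}
    return {label: [(p, False) for p in pics] for label, pics in subset.items()}

# A label with 1 picture gets ceil(1.5 * 1) = 2 entries, one flagged for augmentation.
head = sample_head({"p1": ["a.jpg"], "p2": ["b.jpg", "c.jpg"]}, weight=1.5)
print(head["p1"])  # [('a.jpg', False), ('a.jpg', True)]
```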

Case 2: If the relative position is in the middle, the sampling process includes:

Step 1. Determine whether the sampling weight is greater than 1;

Step 2. If yes, use the number of pictures corresponding to each label in the sample subset, sampling weight, and preset weighting multiple to perform oversampling;

Step 3. If not, use the number of pictures and sampling weights corresponding to each label in the sample subset to perform sampling.

For example, for g₂: if w₂ > 1, then for each label in g₂, sample m*w₂ times the number of pictures it owns (m is a preset weighting multiple; m can be a number greater than 1 chosen according to the specific situation, such as 2), rounding up if the result is not an integer. The repeated pictures are randomly flipped, cropped, and erased. If w₂ ≤ 1, take w₂ times the original image sample set of g₂.
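
The per-label target count in Case 2 reduces to a small piece of arithmetic, sketched below. The function name `middle_target_count` is hypothetical, and rounding up when the weight is at most 1 is an assumption, since the text specifies rounding only for the oversampling branch.

```python
import math

def middle_target_count(picture_count, weight, multiple=2):
    """Case 2 target number of pictures for one label.

    weight > 1  -> ceil(multiple * weight * picture_count)  (oversampling)
    weight <= 1 -> ceil(weight * picture_count)              (rounding up in
                   this branch is an assumption of this sketch)
    """
    factor = multiple * weight if weight > 1 else weight
    return math.ceil(factor * picture_count)

print(middle_target_count(5, 1.5))  # 2 * 1.5 * 5 = 15
print(middle_target_count(3, 1.25)) # 2 * 1.25 * 3 = 7.5, rounded up to 8
print(middle_target_count(5, 0.5))  # 0.5 * 5 = 2.5, rounded up to 3
```

Pictures beyond the label's original count would be repeated and augmented in the same way as for the head subset.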

Case 3: If the relative position is the tail, the sampling process includes:

Step 1. Determine whether the sampling weight is greater than 1;

Step 2. If yes, use the number of pictures and sampling weights corresponding to each label in the sample subset to perform oversampling;

Step 3. If not, obtain the median number of pictures corresponding to each label in the sample subset, and randomly select pictures with the median number of pictures for each label.

For example, for g₃: if w₃ > 1, then for each label in g₃, sample w₃ times the number of pictures it owns, rounding up if the result is not an integer, and randomly flip, crop, and erase the repeated pictures. If w₃ ≤ 1, take the median of the entire image sample set in ascending order, and randomly sample the median number of pictures for each label in g₃.
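
The undersampling branch of Case 3 can be sketched as below, again assuming a mapping from label to a list of picture identifiers (a hypothetical layout). By default the median is taken over the labels of the subset, as in Step 3; a caller can pass `median` explicitly to use the median of the entire image sample set, as in the example above. Truncating a fractional median to an integer is an assumption of this sketch.

```python
import random
import statistics

def sample_tail_by_median(subset, median=None, rng=None):
    """Case 3 with weight <= 1: keep at most `median` random pictures per label.

    Labels that own fewer pictures than the median keep everything they have.
    """
    rng = rng or random.Random(0)
    if median is None:
        # Median picture count over the labels of this subset.
        median = int(statistics.median(len(pics) for pics in subset.values()))
    return {
        label: rng.sample(pics, min(median, len(pics)))
        for label, pics in subset.items()
    }

# Three tail labels with 100, 200 and 300 pictures; the median count is 200.
tail = {"a": list(range(100)), "b": list(range(200)), "c": list(range(300))}
reduced = sample_tail_by_median(tail)
print({label: len(pics) for label, pics in reduced.items()})  # {'a': 100, 'b': 200, 'c': 200}
```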

After sampling each sample subset, the target image sample set is obtained.

Preferably, after obtaining the target image sample set, use the target image sample set to train the target model to obtain a trained classification recognition model; use the classification recognition model to recognize the target image to be recognized to obtain the recognition result.

Corresponding to the above method embodiment, the embodiment of the present invention also provides an image data processing device. The image data processing device described below and the image data processing method described above can be referred to each other.

As shown in Figure 3, the device includes the following modules:

The image sample set sorting module 101 is used to sort the tags according to the number of pictures corresponding to each tag in the image sample set;

The fitting module 102 is configured to obtain the fitting index corresponding to the sorted image sample set;

The image sample set segmentation module 103 is used to segment the image sample set by using the fitting index to obtain multiple sample subsets;

The training module 104 is configured to use each sample subset to train the target model to obtain the model training accuracy corresponding to each sample subset;

The re-sampling module 105 is used for sampling each sample subset by using the sampling weight matching the model training accuracy to obtain the target image sample set.
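
One way to picture how modules 101 to 105 cooperate is a small container that calls them in order. This is only a structural sketch: the class name `ImageDataProcessor`, the field names, and every callable signature are hypothetical, and the stub functions in the usage example stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ImageDataProcessor:
    """Chains the five modules; each field is a callable supplied by the caller."""
    sort_labels: Callable[[Any], Any]    # module 101: sort labels by picture count
    fit: Callable[[Any], Any]            # module 102: obtain the fitting index
    segment: Callable[[Any, Any], Any]   # module 103: split into sample subsets
    train: Callable[[Any], Any]          # module 104: accuracy per sample subset
    resample: Callable[[Any, Any], Any]  # module 105: build the target sample set

    def run(self, image_sample_set):
        ranked = self.sort_labels(image_sample_set)
        fitted = self.fit(ranked)
        subsets = self.segment(ranked, fitted)
        accuracies = self.train(subsets)
        return self.resample(subsets, accuracies)

# Stub callables that only demonstrate the data flow between the modules.
processor = ImageDataProcessor(
    sort_labels=sorted,
    fit=len,
    segment=lambda ranked, fitted: [ranked[: fitted // 2], ranked[fitted // 2 :]],
    train=lambda subsets: [len(s) for s in subsets],
    resample=lambda subsets, accuracies: list(zip(subsets, accuracies)),
)
print(processor.run([3, 1, 2, 4]))  # [([1, 2], 2), ([3, 4], 2)]
```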

Using the device provided by the embodiment of the present invention, the labels are sorted according to the number of pictures corresponding to each label in the image sample set; the fitting index corresponding to the sorted image sample set is obtained; the image sample set is segmented by the fitting index to obtain multiple sample subsets; each sample subset is used to train the target model to obtain the model training accuracy corresponding to each sample subset; and the sampling weight matching the model training accuracy is used to sample each sample subset to obtain the target image sample set.

In this device, the image sample set is first re-sorted based on the number of pictures per label, and then the fitting index of the image sample set is determined. Based on the fitting index, the image sample set can be divided into multiple sample subsets according to the number of pictures per label. That is, the labels within the same sample subset have similar numbers of pictures. Then, each sample subset is used to train the target model to obtain the model training accuracy, that is, to determine the contribution of each sample subset to the training of the target model. The sampling weight is determined based on the model training accuracy, and each sample subset is sampled to obtain the target image sample set. The sample distribution in the target image sample set then follows the contribution ability of the samples, which can further improve the model training effect and the accuracy of the image recognition results.

In a specific embodiment of the present invention, the image sample set segmentation module 103 is specifically configured to segment the image sample set using the integral of the fitting index to obtain multiple sample subsets with an equal total number of pictures.

In a specific implementation of the present invention, the resampling module 105 specifically includes:

The relative position obtaining unit is used to obtain the relative position of each sample subset in the fitting index;

The re-sampling unit is used to combine the relative position and the sampling weight to sample each sample subset to obtain the target image sample set.

In a specific embodiment of the present invention, the resampling unit is specifically configured to determine, if the relative position is the head, whether the sampling weight is greater than 1; if so, to perform oversampling by using the number of pictures corresponding to each label in the sample subset and the sampling weight; and if not, to take the original pictures of the sample subset.

In a specific embodiment of the present invention, the resampling unit is specifically configured to determine, if the relative position is in the middle, whether the sampling weight is greater than 1; if so, to perform oversampling by using the number of pictures corresponding to each label in the sample subset, the sampling weight, and the preset weighting multiple; and if not, to perform sampling by using the number of pictures corresponding to each label in the sample subset and the sampling weight.

In a specific embodiment of the present invention, the resampling unit is specifically configured to determine, if the relative position is the tail, whether the sampling weight is greater than 1; if so, to perform oversampling by using the number of pictures corresponding to each label in the sample subset and the sampling weight; and if not, to obtain the median number of pictures corresponding to each label in the sample subset and randomly select the median number of pictures for each label.

In a specific embodiment of the present invention, it further includes:

The model training module is used to train the target model by using the target image sample set to obtain a trained classification and recognition model;

The recognition module is used to recognize the target image to be recognized by using the classification recognition model to obtain the recognition result.

Corresponding to the above method embodiment, the embodiment of the present invention also provides an image data processing device. The image data processing device described below and the image data processing method described above can be referenced correspondingly.

As shown in Figure 4, the image data processing equipment includes:

The memory 332 is used to store computer programs;

The processor 322 is configured to implement the steps of the image data processing method in the foregoing method embodiment when the computer program is executed.

Specifically, please refer to FIG. 5, which is a schematic diagram of a specific structure of an image data processing device provided by this embodiment. The image data processing device may differ considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors) and a memory 332, where the memory 332 stores one or more computer application programs 342 or data 344. The memory 332 may be short-term storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device. Furthermore, the central processing unit 322 may be configured to communicate with the memory 332 and to execute, on the image data processing device 301, the series of instruction operations in the memory 332.

The image data processing device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or one or more operating systems 341.

The steps in the image data processing method described above can be implemented by the structure of the image data processing device.

Corresponding to the above method embodiment, the embodiment of the present invention also provides a readable storage medium, and a readable storage medium described below and an image data processing method described above can be referenced correspondingly.

A readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the steps of the image data processing method in the foregoing method embodiment are implemented.

The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium that can store program code.

Those skilled in the art may further realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the present invention.

Claims

1. An image data processing method, characterized in that it comprises:

Sorting the tags according to the number of pictures corresponding to each tag in the image sample set;

Acquiring the fitting index corresponding to the image sample set after sorting;

Segmenting the image sample set by using the fitting index to obtain multiple sample subsets;

Training the target model by using each of the sample subsets to obtain the model training accuracy corresponding to each of the sample subsets;

Using sampling weights matching the model training accuracy, sampling each of the sample subsets to obtain a target image sample set.

2. The image data processing method according to claim 1, wherein said segmenting said image sample set by said fitting index to obtain a plurality of sample subsets comprises:

The image sample set is divided by the integral of the fitting index to obtain a plurality of the sample subsets with an equal total number of pictures.

3. The image data processing method according to claim 1, wherein said sampling each of said sample subsets by using sampling weights matching the accuracy of said model training to obtain a target image sample set comprises:

Acquiring the relative position of each of the sample subsets in the fitting index;

Combining the relative position and the sampling weight, sampling each of the sample subsets to obtain the target image sample set.

4. The image data processing method according to claim 3, wherein the combining the relative position and the sampling weight to sample each of the sample subsets to obtain the target image sample set comprises:

If the relative position is the head, it is determined whether the sampling weight is greater than 1;

If yes, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight;

If not, then take the original pictures of the sample subset.

5. The image data processing method according to claim 3, wherein the combining the relative position and the sampling weight to sample each of the sample subsets to obtain the target image sample set comprises:

If the relative position is in the middle, it is determined whether the sampling weight is greater than 1;

If so, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset, the sampling weight, and the preset weighting multiple;

If not, sampling is performed using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight.

6. The image data processing method according to claim 3, wherein the combining the relative position and the sampling weight to sample each of the sample subsets to obtain the target image sample set comprises:

If the relative position is the tail, it is determined whether the sampling weight is greater than 1;

If yes, perform oversampling by using the number of pictures corresponding to each of the tags in the sample subset and the sampling weight;

If not, obtain the median number of pictures corresponding to each of the tags in the sample subset, and randomly select pictures with the median number of pictures for each tag.

7. The image data processing method according to any one of claims 1 to 6, further comprising:

Training the target model by using the target image sample set to obtain a trained classification and recognition model;

The classification and recognition model is used to recognize the target image to be recognized, and the recognition result is obtained.

8. An image data processing device, characterized in that it comprises:

The image sample set sorting module is used to sort the tags according to the number of pictures corresponding to each tag in the image sample set;

The fitting module is used to obtain the fitting index corresponding to the image sample set after sorting;

An image sample set segmentation module, configured to use the fitting index to segment the image sample set to obtain multiple sample subsets;

The training module is configured to train the target model by using each of the sample subsets to obtain the model training accuracy corresponding to each of the sample subsets;

The re-sampling module is used to sample each of the sample subsets by using the sampling weight matching the model training accuracy to obtain a target image sample set.

9. An image data processing device, characterized in that it comprises:

Memory, used to store computer programs;

The processor is configured to implement the steps of the image data processing method according to any one of claims 1 to 7 when the computer program is executed.

10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the image data processing method according to any one of claims 1 to 7 are realized.