CN113128588A - Model training method and device, computer equipment and computer storage medium


Info

Publication number: CN113128588A
Authority: CN (China)
Prior art keywords: image, category, annotation, expansion, labeled
Legal status: Granted; active
Application number: CN202110416258.XA
Original language: Chinese (zh)
Other versions: CN113128588B (granted publication)
Inventors: 艾长青, 周大军, 赖勇辉, 张先震
Assignee (current and original): Shenzhen Tencent Domain Computer Network Co Ltd
Application filed by Shenzhen Tencent Domain Computer Network Co Ltd
Priority: CN202110416258.XA, with publications CN113128588A (application) and CN113128588B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training method and apparatus, a computer device, and a computer storage medium. The method comprises: acquiring a sample set of an object recognition model, the sample set comprising a plurality of annotated images and the annotation information of each annotated image; if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories (N being a positive integer greater than 1) and that the object proportions among the categories are not balanced, acquiring an object expansion ratio for each category; determining the associated annotated images of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images; and taking the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with these training samples, so that the performance of the object recognition model can be improved.

Description

Model training method and device, computer equipment and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model training method, a model training apparatus, a computer device, and a computer storage medium.
Background
Object recognition generally refers to the technology of using a machine to recognize and analyze objects of certain classes, and typically involves two tasks: classification and detection. Classification judges, for example, whether an image contains an object of a certain class, while detection marks the position and size of each object. At present, object recognition on an image is generally realized with an object recognition model. The accuracy of the recognition result is closely tied to the performance of the object recognition model, which in turn depends on how the model is trained. Therefore, how to train an object recognition model so as to improve its performance has become a research hotspot.
Disclosure of Invention
The application provides a model training method, a model training device, computer equipment and a computer storage medium, which can improve the performance of an object recognition model.
In one aspect, the present application provides a model training method, including:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs;
if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring an object expansion ratio of each category, wherein N is a positive integer greater than 1;
determining the associated annotated images of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images; an associated annotated image of any category refers to an annotated image containing an annotated object of that category;
and taking the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, the present application provides a model training apparatus, comprising:
an acquisition unit, configured to acquire a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs;
the acquisition unit being further configured to acquire an object expansion ratio of each category if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, where N is a positive integer greater than 1;
a processing unit, configured to determine the associated annotated images of each category from the sample set, and to perform sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images; an associated annotated image of any category refers to an annotated image containing an annotated object of that category;
and a training unit, configured to take the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and to perform model training on the object recognition model with the training samples.
In one aspect, the present application provides a computer device comprising:
a processor adapted to implement one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded and executed by the processor to:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs; if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring an object expansion ratio of each category, wherein N is a positive integer greater than 1; determining the associated annotated images of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images, wherein an associated annotated image of any category refers to an annotated image containing an annotated object of that category; and taking the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, the present application provides a computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and executed to:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs; if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring an object expansion ratio of each category, wherein N is a positive integer greater than 1; determining the associated annotated images of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images, wherein an associated annotated image of any category refers to an annotated image containing an annotated object of that category; and taking the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, embodiments of the present application provide a computer program product or a computer program, wherein the computer program product includes a computer program stored in a computer storage medium; a processor of a computer device reads the computer program from the computer storage medium and executes it, causing the computer device to execute:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs; if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring an object expansion ratio of each category, wherein N is a positive integer greater than 1; determining the associated annotated images of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images, wherein an associated annotated image of any category refers to an annotated image containing an annotated object of that category; and taking the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
According to the present application, when the object proportions of the various categories of annotated objects in the sample set are unbalanced, the associated annotated images of each category are expanded based on the object expansion ratio of that category. This expands the annotated objects of each category and effectively rebalances the object proportions among the categories in the sample set; training the object recognition model on the expanded sample set can therefore effectively improve the performance of the object recognition model.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1a is a schematic diagram of an annotated image provided herein;
FIG. 1b is a schematic flow chart of a model training method provided herein;
FIG. 2 is a schematic diagram of a model training method provided herein;
FIG. 3a is a schematic diagram of an annotated image as provided herein;
FIG. 3b is a schematic diagram of an augmented image provided herein;
FIG. 3c is a schematic diagram of yet another augmented image provided herein;
FIG. 4 is a schematic diagram of a model training method provided herein;
FIG. 5a is a schematic flow chart of a sample expansion provided herein;
FIG. 5b is a schematic diagram of a sample expansion process provided herein;
FIG. 5c is a schematic flow chart illustrating a process for determining a target area according to the present application;
FIG. 5d is a schematic diagram of a sample expansion process provided herein;
FIG. 5e is a sample expansion schematic of a partial area misalignment type provided herein;
FIG. 6 is a schematic diagram of a model training apparatus according to the present application;
FIG. 7 is a schematic diagram of the architecture of a computer device provided in the present application.
Detailed Description
With the vigorous development of computer technology, AI (Artificial Intelligence) technology has also made great progress. Artificial intelligence refers to theories, methods, technologies, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence, giving such machines capabilities of perception, reasoning, decision-making, and the like. Accordingly, AI technology is a comprehensive discipline, mainly including Computer Vision (CV) technology, speech processing technology, natural language processing technology, and Machine Learning (ML)/deep learning.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of AI and the fundamental way to make computer devices intelligent; deep learning is a machine learning technique that uses deep neural networks. Machine learning/deep learning generally includes a variety of techniques such as artificial neural networks, Reinforcement Learning (RL), supervised learning, and unsupervised learning.
Based on the machine learning/deep learning techniques among AI techniques, the present application provides a model training scheme that can be used to train an object recognition model so as to improve its performance, for example the accuracy of its target recognition. Here, target recognition means identifying one or more objects in an image with a deep neural network algorithm; the image may be a game image, a portrait photograph, a commodity image, and so on. For game images, target recognition refers to identifying virtual character objects, prop objects, blood-bar objects, and the like; for portrait photographs, target recognition may refer to identifying ornament objects, facial-feature objects (e.g., eyes, nose, eyebrows), and the like. In a specific implementation, the model training method can be executed by a computer device, which may be a terminal or a server. Terminals may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, smart televisions, and the like; various clients (APPs) can run in the terminal, such as a multimedia playing client, a social client, a browser client, an information-flow client, an education client, and so on. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In a specific implementation, the general principle of the model training scheme is as follows. First, the computer device may obtain a plurality of annotated images for model training of the object recognition model. These may be manually collected pictures with annotation information; each annotated image is one sample (i.e., one piece of picture data for model training) and includes one or more annotated objects, as shown in fig. 1a. As shown at 11 in fig. 1a, an annotated image may include a single annotated object 111; as shown at 12 in fig. 1a, an annotated image may include a plurality of annotated objects of the same category (e.g., two annotated objects of the category "tower"); and as shown at 13 or 14 in fig. 1a, an annotated image may include multiple annotated objects of different categories, such as two annotated objects 131 of the category "tower" and one annotated object 132 of the category "person", or a plurality of annotated objects 141 of the category "tower" and a plurality of annotated objects 142 of the category "person".
Because annotated images are collected in different ways, the numbers of the various categories of annotated objects they contain may differ. For example, if the annotated images are extracted from shooting-game frames, the number of annotated objects of the "character" category may far exceed the number of the "house" category; if they are extracted from a construction-game scene, the number of "house" objects may far exceed the number of "character" objects. When the numbers of the various categories of annotated objects differ in this way, their proportions are unbalanced. In this case, the computer device may first determine the ratio by which each category of annotated objects in the plurality of annotated images needs to be expanded, from which it can determine the number of times each annotated image needs to be expanded (call this the target number); the computer device then performs sample expansion on each annotated image that target number of times, thereby balancing the proportions of the various categories of annotated objects across the plurality of annotated images. The general principle of the scheme is shown in fig. 1b. Under this scheme, little attention needs to be paid to picture content at the collection stage of the annotated images, which saves human resources; meanwhile, because the computer device obtains training samples by sample expansion based on the object expansion ratio of each category of annotated objects, the problem of unbalanced object proportions among the categories is alleviated, and the generalization of the object recognition model trained with these samples is enhanced.
Based on the principle explanation of the model training scheme, the application provides a model training method which can be executed by the computer device mentioned above; referring to fig. 2, the model training method includes the following steps S201 to S204:
s201, acquiring a sample set of the object recognition model.
The sample set comprises a plurality of marked images and marked information of each marked image; the annotation information of any annotated image is used to indicate: one or more annotation objects in any annotation image, and the category to which each annotation object belongs.
S202, if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring the object expansion ratio of each category.
Wherein N is a positive integer greater than 1, i.e.: the sample set includes a plurality of categories of labeled objects, and the meaning of "plurality" mentioned herein is: at least two.
In one embodiment, the computer device may obtain the number of annotated objects of each category from the annotation information of each annotated image, and then judge, based on these numbers, whether object-proportion balance is achieved among the categories. In a specific embodiment, object-proportion balance may mean that the number of annotated objects of each category reaches a preset expected value. For example, suppose the sample set includes annotated objects of 3 categories, "person", "tree", and "flame", and the preset expected value for each is 5000; then if the number of "person" objects is 5000, the number of "tree" objects is 5056, and the number of "flame" objects is 5600, the categories in the sample set are considered to have reached object-proportion balance.
Optionally, object-proportion balance may also mean that the differences between the per-category object counts fall within a certain threshold, for example: the categories are considered balanced if no two per-category counts differ by more than 100. In this case, again supposing the sample set includes the 3 categories "person", "tree", and "flame": if there are 5000 "person", 4000 "tree", and 5600 "flame" objects, the categories are considered unbalanced, and the computer device is required to obtain the object expansion ratio of each category; if there are 5000 "person", 5056 "tree", and 5060 "flame" objects, the categories are considered balanced, and the computer device need not obtain the object expansion ratios.
In another embodiment, the computer device may instead obtain the number of associated annotated images of each category from the annotation information of each annotated image, and judge object-proportion balance among the categories based on these numbers. An associated annotated image of a category is an annotated image containing an annotated object of that category. As shown in fig. 3a, image 301 can be understood as an associated annotated image of the "tower" category, while image 302 can be understood as an associated annotated image of both the "tower" category and the "person" category. The number of associated annotated images of a category is thus the number of annotated images containing at least one annotated object of that category. For example, if annotated image 1 includes 1 annotated object a of category A and 2 annotated objects b of category B, and annotated image 2 includes 1 annotated object a of category A and 5 annotated objects c of category C, then the associated annotated images of category A are "annotated image 1" and "annotated image 2", so the number of associated annotated images of category A is 2; similarly, the numbers for categories B and C are each 1. On this basis, object-proportion balance may mean that the differences between the per-category numbers of associated annotated images fall within a certain threshold; optionally, it may also mean that the number of associated annotated images of each category reaches an expected value.
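To make the two balance criteria above concrete, the following Python sketch checks balance both by per-category object counts and by per-category associated-image counts. The annotation format and the gap threshold of 100 are assumptions drawn from the examples, not part of the patent text.

    from collections import Counter

    # Assumed annotation format: each annotated image is a dict whose "objects"
    # list holds entries like {"category": "tower", "bbox": [x, y, w, h]}.
    def is_balanced_by_objects(sample_set, max_gap=100):
        """Criterion 1: per-category object counts must not differ by more
        than max_gap (threshold taken from the example above)."""
        counts = Counter(obj["category"]
                         for image in sample_set for obj in image["objects"])
        if len(counts) < 2:  # fewer than two categories: nothing to balance
            return True
        return max(counts.values()) - min(counts.values()) <= max_gap

    def is_balanced_by_images(sample_set, max_gap=100):
        """Criterion 2: count, for each category, the associated annotated
        images (images with at least one object of that category) and
        compare those counts instead."""
        counts = Counter()
        for image in sample_set:
            for category in {obj["category"] for obj in image["objects"]}:
                counts[category] += 1
        if len(counts) < 2:
            return True
        return max(counts.values()) - min(counts.values()) <= max_gap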
S203, determining the associated labeled image of each category from the sample set, and performing sample expansion on the basis of the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images.
In a specific embodiment, one category may correspond to one or more associated annotated images, so N categories correspond to at least N associated annotated images, where N is an integer greater than 1. The computer device may perform sample expansion based on one associated annotated image to obtain one or more expanded images; in other words, performing sample expansion for a given category means performing sample expansion based on the associated annotated images of that category. Since the computer device may do this for every category in the sample set, a plurality of expanded images is obtained, where "plurality" here means at least N. For example, suppose associated annotated image 1 and associated annotated image 2 are both annotated images containing category-A annotated objects; the computer device may perform sample expansion on associated annotated image 1 to obtain one or more expanded images a, and on associated annotated image 2 to obtain one or more expanded images b, and the plurality of expanded images can then be understood as including the one or more expanded images a and the one or more expanded images b.
In one embodiment, the ways in which the computer device performs sample expansion based on an associated annotated image include, but are not limited to: copying the associated annotated image, and applying white-screen expansion, black-screen expansion, glitch-screen ("flower screen") expansion, mirroring, rotation, scaling, color jittering, and the like. Assuming the image shown at 31 in fig. 3b is a normal annotated image a, white-screen expansion can be understood as the computer device generating, from annotated image a, an expanded image with white-screen areas (as shown at 32 in fig. 3b, where 321 and 322 are both white-screen areas); black-screen expansion can be understood as generating an expanded image with black-screen areas (as shown at 33 in fig. 3b, where 331 and 332 are both black-screen areas); and glitch-screen expansion can be understood as generating an expanded image containing a glitch image (as shown in fig. 3c), where the glitch image may be of the misalignment type (34 in fig. 3c, where 341 and 342 are misaligned regions), of the color-dithering type (35 and 36 in fig. 3c, where 351, 352, and 361 are regions with color dithering), or of other types.
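For the conventional geometric and color expansions listed above (mirroring, rotation, scaling, color jittering), a minimal PIL-based sketch might look as follows. The parameter ranges are assumptions, and note that geometric transforms would also require transforming the bounding boxes in the annotation information accordingly.

    import random
    from PIL import Image, ImageEnhance, ImageOps

    def mirror(img: Image.Image) -> Image.Image:
        # Horizontal mirror; annotated boxes must be flipped to match.
        return ImageOps.mirror(img)

    def rotate(img: Image.Image, max_deg: float = 15.0) -> Image.Image:
        # Small random rotation; expand=True keeps the whole image visible.
        return img.rotate(random.uniform(-max_deg, max_deg), expand=True)

    def scale(img: Image.Image, lo: float = 0.8, hi: float = 1.2) -> Image.Image:
        factor = random.uniform(lo, hi)
        w, h = img.size
        return img.resize((int(w * factor), int(h * factor)))

    def color_jitter(img: Image.Image) -> Image.Image:
        # Randomly perturb brightness and saturation.
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
        return ImageEnhance.Color(img).enhance(random.uniform(0.7, 1.3))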
S204, using the plurality of extended images and each labeled image in the sample set as a training sample of the object recognition model, and performing model training on the object recognition model by using the training sample.
It can be understood that in the training sample of the object recognition model, the number of the labeled objects of each category is greater than the number of the labeled objects of each category included in the sample set, so that the computer device can achieve a better training effect when the training sample is used for training the object recognition model.
According to this model training method, the computer device judges whether the object proportions of the various categories of annotated objects in the sample set are balanced by obtaining the annotation information of each annotated image in the sample set. When the categories have not reached object-proportion balance, sample expansion can be performed based on the object expansion ratio and the associated annotated images of each category so as to balance the proportions of the categories. Then, when the computer device trains the object recognition model with the plurality of expanded images and each annotated image in the sample set, the model can, to a certain extent, learn the features of the various categories of annotated objects in a balanced way, which benefits the learning and training effect and further improves model properties such as the generalization and robustness of the object recognition model.
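Putting steps S201 to S204 together, one possible driver loop is sketched below. This is a minimal sketch, not the patent's implementation: expansion_ratios, expand_image, and model.fit are assumed helpers standing in for S202, the expansion operations of S203, and the actual training step, respectively.

    import math

    def train_with_expanded_samples(sample_set, model, expansion_ratios,
                                    expand_image):
        """Sketch of S201-S204 under the assumptions stated above."""
        total_objects = sum(len(img["objects"]) for img in sample_set)  # S201
        expanded = []
        for category, ratio in expansion_ratios.items():                # S202/S203
            times = math.ceil(total_objects * ratio)
            associated = [img for img in sample_set
                          if any(o["category"] == category
                                 for o in img["objects"])]
            if not associated:
                continue
            for i in range(times):  # round-robin over the associated images
                expanded.append(expand_image(associated[i % len(associated)]))
        training_samples = list(sample_set) + expanded                  # S204
        model.fit(training_samples)
        return model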
Referring to fig. 4, fig. 4 is a method for training a model provided by the present application, and the method may include the following steps S401 to S406:
s401, a sample set of the object recognition model is obtained.
In an embodiment, the specific implementation in step S401 may refer to the related description in step S201, and is not described herein again.
S402, if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, acquiring the object expansion ratio of each category.
In one embodiment, when acquiring the object expansion ratio of each category, the computer device may first obtain the total number of annotated images in the sample set and the expected number of images of the sample set, where the expected number of images is the number of annotated images the sample set is expected to contain. Illustratively, the expected number of images may be the minimum number of samples (i.e., the minimum number of annotated images) required for model training.
In an alternative embodiment, if the expected number of images is greater than the total number of images, the computer device obtains a first reference expansion ratio for each category and calculates the object expansion ratio of each category from it. For example, suppose the sample set acquired by the computer device includes 10 annotated images containing annotated objects of 3 categories (category A, category B, and category C). If the expected number of images is 20, the computer device acquires the first reference expansion ratios of categories A, B, and C respectively, and then calculates the object expansion ratio of each category from its first reference expansion ratio. Specifically, when the expected number of images is greater than the total number of images, the computer device may first calculate the scaling coefficient corresponding to each category, and then amplify the first reference expansion ratio of each category by its scaling coefficient to obtain the object expansion ratio of that category. Illustratively, the amplification of the first reference expansion ratio of the i-th category of annotated objects is shown in Formula 1.
minRatio_i = ratio_i × (minSampleNum / sampleNum)    (Formula 1)
where ratio_i represents the first reference expansion ratio corresponding to the i-th category of annotated objects, sampleNum represents the total number of annotated images in the sample set, minSampleNum represents the expected number of images of the sample set, and minRatio_i represents the object expansion ratio of the i-th category of annotated objects; the scaling coefficient can accordingly be understood as minSampleNum / sampleNum.
correspondingly, if the expected number of the images is less than or equal to the total number of the images, the computer device obtains a second reference expansion ratio corresponding to the labeled object of each category, and uses the second reference expansion ratio as an object expansion ratio of the labeled object under the category, as an example, see formula 2.
minRatio_i = ratio'_i    (Formula 2)
where ratio'_i represents the second reference expansion ratio of the i-th category of annotated objects. In summary, the way the computer device obtains the object expansion ratio of each category can be seen in Formula 3.
minRatio_i = ratio_i × (minSampleNum / sampleNum), if minSampleNum > sampleNum;
minRatio_i = ratio'_i, if minSampleNum ≤ sampleNum    (Formula 3)
where ratio_i represents the first reference expansion ratio corresponding to the i-th category of annotated objects, ratio'_i represents the second reference expansion ratio of the i-th category, sampleNum represents the total number of annotated images in the sample set, minSampleNum represents the expected number of images of the sample set, and minRatio_i represents the object expansion ratio of the i-th category of annotated objects.
In another optional embodiment, if the expected number of images is greater than the total number of images, the computer device obtains the first reference expansion ratio of each category and takes it as the object expansion ratio of that category; if the expected number of images is less than or equal to the total number of images, the computer device obtains the second reference expansion ratio of each category and takes it as the object expansion ratio of that category. On this basis, the way the computer device determines the object expansion ratio of each category is shown in Formula 4.
minRatio_i = ratio_i, if minSampleNum > sampleNum;
minRatio_i = ratio'_i, if minSampleNum ≤ sampleNum    (Formula 4)
where ratio_i represents the first reference expansion ratio corresponding to the i-th category of annotated objects, ratio'_i represents the second reference expansion ratio of the i-th category, sampleNum represents the total number of annotated images in the sample set, minSampleNum represents the expected number of images of the sample set, and minRatio_i represents the object expansion ratio of the i-th category of annotated objects.
In one embodiment, the first reference expansion ratio and the second reference expansion ratio of each category may be set according to empirical values, for example according to the application scene of the object recognition model. When the object recognition model is used to recognize objects in game frames, the two reference expansion ratios may be set according to how frequently the various categories of objects appear in those frames, e.g., the first and second reference expansion ratios of frequently appearing objects are set to 0.2, and those of rarely appearing objects are set to 0.1.
Optionally, the first reference expansion ratio and the second reference expansion ratio corresponding to annotated objects of the same category may be equal. For example, suppose the game frames in game A mainly include "tower", "hero", "monster", "road", and the like, with "tower" and "hero" appearing most frequently and "monster" appearing rarely; then, when the object recognition model is used to recognize objects in the game frames of game A, the first and second reference expansion ratios of "tower" and "hero" may both be set to 0.2, and those of "monster" and "road" may both be set to 0.1.
Optionally, the first reference expansion ratio and the second reference expansion ratio of the same category may instead differ. As noted above, when the total number of annotated images is less than the expected number of images, the computer device performs sample expansion based on the first reference expansion ratio of each category, the purpose being to expand to obtain more training samples and thereby guarantee the effect of model training. In other words, when annotated images are scarce, the computer device needs to expand more aggressively based on the first reference expansion ratio, whereas when annotated images are plentiful it only needs to expand at the basic ratio (the second reference expansion ratio). For this reason, the two ratios of the same category may differ; specifically, the first reference expansion ratio of the n-th category of annotated objects may be larger than its second reference expansion ratio. For example, commodity images mainly include "clothes", "daily necessities", "pets", "flowers", and the like, with appearance frequencies ranked clothes > daily necessities > flowers > pets; thus, when the object recognition model is used to recognize objects in the commodity images of an online shopping platform, the first reference expansion ratio of "clothes" may be set to 0.5 and its second reference expansion ratio to 0.3, while the first reference expansion ratio of "pets" may be set to 0.2 and its second reference expansion ratio to 0.1.
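Under the reconstruction of Formulas 1 to 4 above, the ratio selection can be sketched as follows. The amplify switch distinguishes the Formula 3 variant (first reference ratio scaled by the coefficient) from the Formula 4 variant (first reference ratio used directly); the scaling coefficient minSampleNum / sampleNum is itself part of the reconstruction, so treat this as illustrative only.

    def object_expansion_ratio(first_ref, second_ref, sample_num,
                               min_sample_num, amplify=True):
        """Return minRatio_i for one category (Formula 3 or Formula 4)."""
        if min_sample_num > sample_num:          # too few annotated images
            scale = min_sample_num / sample_num if amplify else 1.0
            return first_ref * scale
        return second_ref                        # enough images already

    # Example: 10 annotated images, an expected minimum of 20, and reference
    # ratios 0.2 / 0.1 give an amplified object expansion ratio of 0.2 * 2 = 0.4.
    ratio = object_expansion_ratio(0.2, 0.1, sample_num=10, min_sample_num=20)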
S403, determining the associated labeled image of each category from the sample set, and calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category.
The computer device may count the total number of annotated objects in the sample set (i.e., the sum of the numbers of annotated objects of the N categories) according to the annotation information of each annotated image; it then calculates the object expansion number of the n-th category based on this total and the object expansion ratio of the n-th category. The object expansion number can be understood as the total number of sample expansions the computer device needs to perform based on annotated images containing that category of annotated objects. Specifically, the calculation may be as shown in Formula 5:
extendSampleTimes_i = ceil(totalObjNum × minRatio_i)    (Formula 5)
where extendSampleTimes_i is the object expansion number of the i-th category of annotated objects, ceil is the ceiling (round-up) function, totalObjNum is the total number of objects (i.e., the total number of annotated objects contained in the sample set), and minRatio_i is the object expansion ratio of the i-th category. For example, if the total number of objects is 10 and the object expansion ratio of the i-th category is 0.21, then totalObjNum × minRatio_i = 10 × 0.21 = 2.1; rounding 2.1 up gives extendSampleTimes_i = 3. The computer device therefore needs to expand the annotated images having i-th-category objects (i.e., the associated annotated images of the i-th category) 3 times to obtain 3 expanded images. Further, after obtaining the object expansion number of the n-th category, the computer device may calculate the target expansion times of each associated annotated image in the n-th category according to the object expansion number of the n-th category and the annotation information of each associated annotated image in the n-th category.
Optionally, when the computer device obtains the number of annotated objects of each category from the annotation information and judges balance based on those object counts, the target expansion times of each associated annotated image in the n-th category may be assigned as in the following example. Suppose there are 3 annotated images containing i-th-category annotated objects: associated annotated image A1 (containing 2 such objects), associated annotated image A2 (containing 1), and associated annotated image A3 (containing 3). The target expansion times may then be: 1 for A1, 1 for A2, and 0 for A3; or 3 for A1, 0 for A2, and 0 for A3; or 0 for A1, 0 for A2, and 1 for A3. The present application does not specifically restrict how the target expansion times are allocated among the associated annotated images, as long as expanding the associated annotated images of the i-th category yields at least 3 (the object expansion number) newly added annotated objects of that category.
Correspondingly, if the computer device obtains the number of associated annotated images of each category from the annotation information and judges balance based on those image counts, the target expansion times of each associated annotated image in the n-th category may be assigned as in the following example. With the same three images A1, A2, and A3 as above, the target expansion times may be: 1 for A1, 1 for A2, and 1 for A3; or 2 for A1, 1 for A2, and 0 for A3; or 0 for A1, 3 for A2, and 0 for A3; and so on. Again, the allocation is not specifically restricted, as long as expanding the associated annotated images of the i-th category yields 3 (the object expansion number) newly added associated annotated images of that category.
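Since the patent fixes only the total and leaves the per-image allocation open, the sketch below shows one valid way to distribute target expansion times over the associated annotated images, covering both counting criteria; the round-robin order is an assumption.

    def allocate_expansion_times(associated, expansion_number, by_objects=True):
        """One valid allocation among many: walk over the associated
        annotated images, expanding each once per pass, until the budget
        is met. With by_objects=True, one expansion of an image contributes
        its per-image object count (first criterion); otherwise each
        expansion contributes 1 (associated-image criterion)."""
        times = {img_id: 0 for img_id, _ in associated}
        added, i = 0, 0
        while added < expansion_number:
            img_id, obj_count = associated[i % len(associated)]
            times[img_id] += 1
            added += obj_count if by_objects else 1
            i += 1
        return times

    # Example from the text: A1 has 2, A2 has 1, A3 has 3 i-th-category
    # objects, and the object expansion number is 3.
    print(allocate_expansion_times([("A1", 2), ("A2", 1), ("A3", 3)], 3))
    # -> {"A1": 1, "A2": 1, "A3": 0}  (2 + 1 = 3 newly added objects)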
S404, traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image.
Traversing each associated annotated image in the n-th category means that the computer device sequentially accesses each of the one or more associated annotated images corresponding to the n-th category of annotated objects, where accessing an image specifically may mean obtaining its annotation information; the currently traversed associated annotated image (i.e., the target image) can then be understood as the annotated image whose annotation information is currently being obtained.
S405, sample expansion is carried out on the target image based on the target expansion times of the target image, and one or more expansion images corresponding to the target image are obtained.
In one embodiment, to perform one round of sample expansion on the target image, the computer device may specifically perform image copying on the target image to obtain a copied image, then determine a target area in the copied image, adjust the color of each pixel in the target area to a target color, and take the adjusted copied image as an expanded image corresponding to the target image.
In an alternative embodiment, the computer device may randomly determine the size and position of the target area in the copied image. On this basis, when the computer device adjusts the color of each pixel in the target area to black or white, the specific flow of the sample expansion operation may be as shown in fig. 5a: the computer device repeatedly performs the step of "generating one target area and filling it with black or white" until it has generated and filled M target areas in the copied image, where M is an integer; the positions at which the target areas (the "rectangular boxes" in fig. 5a) are generated may overlap. For example, as shown in fig. 5b, the first execution of this step may be as shown at 51 in fig. 5b, and the second execution as shown at 52 in fig. 5b. In a specific application, the number of black-screen or white-screen areas can be chosen at random; however, to keep the expanded image useful, the computer device should not generate too many target areas, and the proportion of the copied image occupied by the target areas (e.g., the black-screen and white-screen areas) must not be too large. Illustratively, the computer device can limit the total area occupied by the black-screen and white-screen areas to within 1/3 of the total area of the copied image, with 1 ≤ M ≤ 5 and M an integer.
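A minimal NumPy sketch of the loop in fig. 5a, under the constraints just described (1 ≤ M ≤ 5 regions, total filled area within 1/3 of the image); the region-size bounds are assumptions.

    import random
    import numpy as np

    def add_blank_regions(image: np.ndarray, max_regions: int = 5,
                          max_area_frac: float = 1 / 3) -> np.ndarray:
        """Repeatedly generate a random rectangle and fill it with black or
        white, keeping the total filled area within the budget (regions may
        overlap; overlaps are double-counted, which is conservative)."""
        out = image.copy()
        h, w = out.shape[:2]
        budget = int(h * w * max_area_frac)
        for _ in range(random.randint(1, max_regions)):      # 1 <= M <= 5
            rh = random.randint(max(1, h // 10), max(1, h // 3))
            rw = random.randint(max(1, w // 10), max(1, w // 3))
            if rh * rw > budget:
                break
            y, x = random.randint(0, h - rh), random.randint(0, w - rw)
            out[y:y + rh, x:x + rw] = random.choice((0, 255))  # black or white
            budget -= rh * rw
        return out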
In another optional implementation, if the copied image includes annotated objects of the n-th category as well as other annotated objects of other categories, the computer device may determine the target area in the copied image by determining the display areas of those other annotated objects in the copied image and taking them as target areas. For example, as shown in fig. 5c, assuming the copied image is as shown at 53 in fig. 5c and the n-th category is "tree", the computer device may determine the region in which "person" is located as the target area in the copied image.
In another embodiment, the computer device may perform image copying on the target image to obtain a copied image; it then generates a glitch ("flower screen") image block for the copied image according to a glitch type and a glitch color, and determines the display position of the glitch image block in the copied image. The computer device then adds the glitch image block at that display position so as to cover the area image there, obtaining an expanded image corresponding to the target image; see fig. 5d for an example. The glitch types may include: regional snowflakes, horizontal stripes, vertical stripes, garbled text, partial-area misalignment, and the like. Each glitch type may correspond to one or more glitch image blocks, and each glitch image block may correspond to one or more glitch colors; for example, the regional-snowflake type may correspond to a circular snowflake region or to a hexagonal snowflake region. Further, the computer device may generate the corresponding glitch image block according to the glitch type and fill it with the corresponding glitch color to obtain the final expanded image. For example, as shown in fig. 5e, when the glitch type is partial-area misalignment, the glitch image block has the shape and size of the misaligned area (as shown at 54 in fig. 5e), and the glitch color is the image content taken from the wrong area (as shown at 55 in fig. 5e).
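As one concrete instance of the misalignment-type glitch in fig. 5e, the sketch below shifts a horizontal band of the copied image sideways so that the band displays content from the wrong position; the band-size and offset ranges are assumptions.

    import random
    import numpy as np

    def add_misalignment_glitch(image: np.ndarray) -> np.ndarray:
        """Glitch-block expansion of the partial-area misalignment type."""
        out = image.copy()
        h, w = out.shape[:2]
        band_h = random.randint(h // 20 + 1, h // 6 + 1)   # glitch block height
        y = random.randint(0, h - band_h)                  # display position
        shift = random.randint(w // 10, w // 3)            # misalignment offset
        out[y:y + band_h] = np.roll(out[y:y + band_h], shift, axis=1)
        return out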
S406, taking the plurality of extended images and each labeled image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model by adopting the training samples.
In an embodiment, the specific implementation in step S406 may refer to the related description in step S204, and is not described herein again.
According to this model training method, when the number of annotated images in the sample set is small, the annotated images can be expanded based on the object expansion ratios, increasing the number of annotated images and enlarging the sample set, so that an object recognition model trained on it has stronger robustness. Meanwhile, the computer device can judge from the annotation information of each sample whether the various categories of annotated objects in the sample set have reached object-proportion balance; when they have not, it obtains the object expansion ratio of each category and performs sample expansion accordingly, which balances the proportions of the categories in the sample set, so that an object recognition model trained on the balanced sample set has stronger generalization. In addition, since the computer device has multiple ways of performing sample expansion for each category based on its object expansion ratio, the diversity of the training samples is effectively increased, and the effect of model training is effectively guaranteed at the level of the data source.
Based on the description of the above embodiment of the model training method, the embodiment of the present invention also discloses a model training apparatus, which may be a computer program (including program code) running in the above mentioned computer device. The model training apparatus may perform the method shown in fig. 2 or fig. 4. Referring to fig. 6, the model training apparatus may include at least: an acquisition unit 601, a processing unit 602 and a training unit 603.
An obtaining unit 601, configured to obtain a sample set of an object recognition model, wherein the sample set comprises a plurality of annotated images and the annotation information of each annotated image; the annotation information of any annotated image is used to indicate: one or more annotated objects in that annotated image and the category to which each annotated object belongs;
the obtaining unit 601 being further configured to obtain an object expansion ratio of each category if it is detected, according to the annotation information of each annotated image, that the sample set includes annotated objects of N categories and that the object proportions among the categories are not balanced, where N is a positive integer greater than 1;
a processing unit 602, configured to determine the associated annotated images of each category from the sample set, and to perform sample expansion based on the object expansion ratio of each category and the associated annotated images of each category to obtain a plurality of expanded images; an associated annotated image of any category refers to an annotated image containing an annotated object of that category;
and a training unit 603, configured to take the plurality of expanded images and each annotated image in the sample set as training samples of the object recognition model, and to perform model training on the object recognition model with the training samples.
In an embodiment, when obtaining the object expansion ratio of each category, the obtaining unit 601 is specifically configured to perform:
acquiring the total number of annotated images in the sample set and the expected number of images for the sample set, wherein the expected number of images is the number of annotated images the sample set is expected to include;
if the expected number of images is larger than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category according to its first reference expansion ratio;
and if the expected number of images is less than or equal to the total number of images, acquiring a second reference expansion ratio of each category as the object expansion ratio of that category.
In another embodiment, when calculating the object expansion ratio of each category according to the first reference expansion ratio of each category, the obtaining unit 601 is specifically configured to perform:
calculating a scale-up coefficient for each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its scale-up coefficient to obtain the object expansion ratio of each category.
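As a minimal sketch of this branch logic, assuming the scale-up coefficient is simply the ratio of the expected image count to the actual image count (the embodiments leave the exact formula open):

```python
from typing import Dict

def object_expansion_ratios(total_images: int,
                            expected_images: int,
                            first_reference: Dict[str, float],
                            second_reference: Dict[str, float]) -> Dict[str, float]:
    """Pick or derive the per-category object expansion ratios."""
    if expected_images > total_images:
        # assumed scale-up coefficient: how far the sample set must grow
        coefficient = expected_images / total_images
        return {category: ratio * coefficient
                for category, ratio in first_reference.items()}
    # otherwise the second reference expansion ratio is used as-is
    return dict(second_reference)

ratios = object_expansion_ratios(
    total_images=100, expected_images=300,
    first_reference={"button": 0.2, "icon": 0.1},
    second_reference={"button": 0.5, "icon": 0.3})
print(ratios)  # {'button': 0.60..., 'icon': 0.30...}
```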
In another embodiment, when performing sample expansion based on the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images, the processing unit 602 is specifically configured to perform:
calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category, wherein n ∈ [1, N];
traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image;
and performing sample expansion on the target image based on the target expansion times of the target image to obtain one or more expanded images corresponding to the target image.
In another embodiment, when calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category, the processing unit 602 is specifically configured to perform:
counting the total number of objects of the labeled objects in the sample set according to the labeling information of each labeled image in the sample set;
calculating the object expansion number of the nth class based on the counted total number of the objects and the object expansion proportion of the nth class;
and calculating the target expansion times of each associated labeled image in the nth category according to the object expansion quantity of the nth category and the labeling information of each associated labeled image in the nth category.
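A sketch of this calculation, reusing the AnnotatedImage type from the earlier sketch; how the category's object expansion number is allocated across its associated images is an assumption here (proportionally to each image's object count), since the embodiments only state that the per-image expansion times follow from the expansion number and the annotation information:

```python
import math
from typing import Dict, List

def target_expansion_times(sample_set: List["AnnotatedImage"],
                           category: str,
                           expansion_ratio: float) -> Dict[str, int]:
    """Derive how many times each associated annotated image of one
    category should be expanded."""
    # total number of annotated objects in the whole sample set
    total_objects = sum(len(img.object_categories) for img in sample_set)
    # object expansion number: extra objects this category should gain
    expansion_number = round(total_objects * expansion_ratio)
    # associated images: those with at least one object of the category
    associated = [img for img in sample_set
                  if category in img.object_categories]
    category_total = sum(img.object_categories.count(category)
                         for img in associated)
    times = {}
    for img in associated:
        n_objects = img.object_categories.count(category)
        share = n_objects / category_total
        # copies of this image needed so its objects cover its share
        times[img.path] = math.ceil(expansion_number * share / n_objects)
    return times
```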
In another embodiment, when performing one round of sample expansion on the target image, the processing unit 602 specifically executes:
carrying out image copying processing on the target image to obtain a copied image;
determining a target area in the copied image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copied image as an extended image corresponding to the target image.
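A minimal sketch of this copy-and-recolor expansion, assuming the target area is a rectangular (x0, y0, x1, y1) box, since the embodiments leave the area's shape open:

```python
import numpy as np

def expand_by_color_fill(image: np.ndarray,
                         target_area: tuple,
                         target_color=(0, 0, 0)) -> np.ndarray:
    """Duplicate the image, then set every pixel inside the target
    area to the target color; the result is one expanded image."""
    copied = image.copy()                 # image copying processing
    x0, y0, x1, y1 = target_area
    copied[y0:y1, x0:x1] = target_color   # adjust pixels in the target area
    return copied

frame = np.full((64, 64, 3), 255, dtype=np.uint8)  # toy all-white frame
expanded = expand_by_color_fill(frame, (10, 10, 30, 30), (255, 0, 0))
```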
In another embodiment, if the copied image includes the annotated object under the nth category and other annotated objects under other categories, then when determining the target area in the copied image, the processing unit 602 specifically executes:
determining the display area of the other annotated objects in the copied image, and taking that display area as the target area.
In another embodiment, when performing one round of sample expansion on the target image, the processing unit 602 specifically executes:
carrying out image copying processing on the target image to obtain a copied image;
generating a flower-screen (screen-glitch) image block for the copied image according to the flower-screen type and the flower-screen color, and determining the display position of the flower-screen image block in the copied image;
and adding the flower-screen image block at the display position so as to cover the regional image at that position, thereby obtaining an expanded image corresponding to the target image.
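The following sketch illustrates this glitch-style expansion; the two flower-screen types shown ("stripes" and "noise") and their appearance are assumptions, as the embodiments only state that the block depends on a flower-screen type and color:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_flower_screen_block(height: int, width: int,
                             screen_type: str = "stripes",
                             color=(0, 255, 0)) -> np.ndarray:
    """Generate a flower-screen (screen-glitch) image block."""
    block = np.zeros((height, width, 3), dtype=np.uint8)
    if screen_type == "stripes":
        block[::2, :] = color                    # colored scanline stripes
    else:  # "noise"
        mask = rng.random((height, width)) < 0.5
        block[mask] = color                      # colored speckle noise
    return block

def expand_with_flower_screen(image: np.ndarray,
                              position: tuple,
                              block: np.ndarray) -> np.ndarray:
    """Overlay the block at the display position, covering that region."""
    copied = image.copy()
    y, x = position
    h, w = block.shape[:2]
    copied[y:y + h, x:x + w] = block
    return copied

frame = np.full((64, 64, 3), 200, dtype=np.uint8)
glitched = expand_with_flower_screen(frame, (8, 8),
                                     make_flower_screen_block(16, 32))
```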
According to an embodiment of the present application, the steps involved in the methods shown in fig. 2 and 4 may be performed by the units in the model training apparatus shown in fig. 6. For example, step S201 and step S202 shown in fig. 2 can both be performed by the obtaining unit 601 in the model training apparatus shown in fig. 6; step S203 may be performed by the processing unit 602 in the model training apparatus shown in fig. 6; step S204 may be performed by the training unit 603 in the model training apparatus shown in fig. 6. For another example, steps S401 to S402 shown in fig. 4 can be executed by the obtaining unit 601 in the model training apparatus shown in fig. 6; steps S403 to S405 may all be performed by the processing unit 602 in the model training apparatus shown in fig. 6; step S406 may be performed by the training unit 603 in the model training apparatus shown in fig. 6.
According to another embodiment of the present application, the units in the model training apparatus shown in fig. 6 are divided based on logical functions. These units may be combined, individually or entirely, into one or several other units, or some unit(s) may be further split into multiple functionally smaller units; either way the same operations can be achieved without affecting the technical effects of the embodiments of the present application. In other embodiments of the present application, the model training apparatus may also include other units, and in practical applications these functions may likewise be realized with the assistance of other units or through the cooperation of multiple units.
According to another embodiment of the present application, the model training apparatus shown in fig. 6 may be constructed, and the model training method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer comprising processing and storage elements including a Central Processing Unit (CPU), a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, loaded into the above computing device via that medium, and executed there.
In the model training apparatus provided by the present application, the obtaining unit acquires the annotation information of each annotated object in the sample set and thereby judges whether the object proportions of the categories of annotated objects in the sample set are balanced. When object proportion balance has not been reached, the processing unit is called to perform sample expansion based on the object expansion ratio of each category and the associated annotated images of each category, so as to balance the proportions of the annotated objects across categories. Then, while the computer device calls the training unit to train the object recognition model with the plurality of expanded images and each annotated image in the sample set, the model can learn the features of the various annotated objects in a balanced manner to a certain extent, which is beneficial to the learning and training effect of the object recognition model and further improves model properties such as generalization and robustness.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides a computer device. Referring to fig. 7, the computer device at least includes a processor 701, an input interface 702, and a computer storage medium 703, and the processor 701, the input interface 702, and the computer storage medium 703 in the computer device may be connected by a bus or other means.
The computer storage medium 703 is a memory device in the computer device for storing programs and data. It is understood that the computer storage medium 703 here may include both a built-in storage medium of the computer device and, of course, an extended storage medium supported by the computer device. The computer storage medium 703 provides storage space that stores the operating system of the computer device. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 701. The computer storage medium may be a high-speed RAM memory, or a non-volatile memory such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor. The processor 701 (or CPU, Central Processing Unit) is the computing and control core of the computer device, adapted to implement one or more instructions, and specifically adapted to load and execute one or more instructions so as to realize the corresponding method flows or functions.
In one embodiment, one or more instructions stored in the computer storage medium 703 may be loaded and executed by the processor 701 to implement the corresponding method steps described above in connection with the method embodiments illustrated in fig. 2 and 4; in particular implementations, one or more instructions in the computer storage medium 703 are loaded and executed by the processor 701 to perform the steps of:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of labeled images and the labeling information of each labeled image; the labeling information of any labeled image is used to indicate one or more labeled objects in that image and the category to which each labeled object belongs; if it is detected, according to the labeling information of each labeled image, that the sample set comprises N categories of labeled objects and that object proportion balance among the categories is not achieved, acquiring the object expansion ratio of each category, wherein N is a positive integer greater than 1; determining the associated labeled image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images, wherein the associated labeled image of any category refers to a labeled image that contains a labeled object under that category; and taking the plurality of expanded images and each labeled image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one embodiment, when obtaining the object expansion ratio of each category, the processor 701 specifically loads and executes:
acquiring the total number of annotated images in the sample set and the expected number of images for the sample set, wherein the expected number of images is the number of annotated images the sample set is expected to include;
if the expected number of images is larger than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category according to its first reference expansion ratio;
and if the expected number of images is less than or equal to the total number of images, acquiring a second reference expansion ratio of each category as the object expansion ratio of that category.
In another embodiment, when calculating the object expansion ratio of each category according to the first reference expansion ratio of each category, the processor 701 specifically loads and executes:
calculating a scale-up coefficient for each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its scale-up coefficient to obtain the object expansion ratio of each category.
In another embodiment, when performing sample expansion based on the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images, the processor 701 specifically loads and executes:
calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category, wherein n ∈ [1, N];
traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image;
and performing sample expansion on the target image based on the target expansion times of the target image to obtain one or more expanded images corresponding to the target image.
In another embodiment, when calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category, the processor 701 specifically loads and executes:
counting the total number of objects of the labeled objects in the sample set according to the labeling information of each labeled image in the sample set;
calculating the object expansion number of the nth class based on the counted total number of the objects and the object expansion proportion of the nth class;
and calculating the target expansion times of each associated labeled image in the nth category according to the object expansion quantity of the nth category and the labeling information of each associated labeled image in the nth category.
In yet another embodiment, when performing one round of sample expansion on the target image, the processor 701 specifically loads and executes:
carrying out image copying processing on the target image to obtain a copied image;
determining a target area in the copied image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copied image as an extended image corresponding to the target image.
In another embodiment, if the copied image includes the annotated object under the nth category and other annotated objects under other categories, then when determining the target area in the copied image, the processor 701 specifically loads and executes:
determining the display area of the other annotated objects in the copied image, and taking that display area as the target area.
In yet another embodiment, when performing one round of sample expansion on the target image, the processor 701 specifically loads and executes:
carrying out image copying processing on the target image to obtain a copied image;
generating a flower-screen image block for the copied image according to the flower-screen type and the flower-screen color, and determining the display position of the flower-screen image block in the copied image;
and adding the flower-screen image block at the display position so as to cover the regional image at that position, thereby obtaining an expanded image corresponding to the target image.
The computer device provided by the present application acquires the annotation information of each annotated object in the sample set, judges whether the object proportions of the categories of annotated objects are balanced, and, when object proportion balance has not been reached, performs sample expansion based on the object expansion ratio of each category and the associated annotated images of each category so as to balance the proportions of the annotated objects across categories. Then, while the computer device trains the object recognition model with the plurality of expanded images and each annotated image in the sample set, the model can learn the features of the various annotated objects in a balanced manner to a certain extent, which is beneficial to the learning and training effect of the object recognition model and further improves model properties such as generalization and robustness.
An embodiment of the present application further provides a computer storage medium storing a computer program of the model training method, the computer program comprising program instructions. When one or more processors load and execute the program instructions, the model training method described in the foregoing embodiments can be implemented, which is not repeated here; the description of the beneficial effects of the same method is likewise not repeated. It will be understood that the program instructions may be deployed to be executed on one device, or on multiple devices capable of communicating with each other.
It should be noted that according to an aspect of the present application, a computer program product or a computer program is also provided, and the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer readable storage medium. A processor in the computer device reads the computer instructions from the computer-readable storage medium and then executes the computer instructions, thereby enabling the computer device to perform the methods provided in the various alternatives described above in connection with the model training method embodiments shown in fig. 2 and 4.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and may include the processes of the above embodiments of the model training method when the computer program is executed. The computer readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of model training, comprising:
acquiring a sample set of an object identification model, wherein the sample set comprises a plurality of labeled images and labeled information of each labeled image; the annotation information of any annotated image is used to indicate: one or more annotation objects in any annotation image and the category to which each annotation object belongs;
if it is detected, according to the labeling information of each labeled image, that the sample set comprises N categories of labeled objects and that object proportion balance among the categories is not achieved, acquiring the object expansion ratio of each category, wherein N is a positive integer greater than 1;
determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation image of each category to obtain a plurality of expanded images, wherein the associated annotation image of any category refers to an annotated image that contains an annotation object under that category;
and taking the plurality of extended images and each labeled image in the sample set as a training sample of the object recognition model, and performing model training on the object recognition model by adopting the training sample.
2. The method of claim 1, wherein the acquiring the object expansion ratio of each category comprises:
acquiring the total number of annotated images in the sample set and the expected number of images for the sample set, wherein the expected number of images is the number of annotated images the sample set is expected to include;
if the expected number of images is larger than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category according to its first reference expansion ratio;
and if the expected number of images is less than or equal to the total number of images, acquiring a second reference expansion ratio of each category as the object expansion ratio of that category.
3. The method of claim 2, wherein the calculating the object expansion ratio of each category according to the first reference expansion ratio of each category comprises:
calculating a scale-up coefficient for each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its scale-up coefficient to obtain the object expansion ratio of each category.
4. The method according to claim 1, wherein the performing sample expansion based on the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images comprises:
calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category, wherein n ∈ [1, N];
traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image;
and performing sample expansion on the target image based on the target expansion times of the target image to obtain one or more expanded images corresponding to the target image.
5. The method of claim 4, wherein the calculating the target expansion times of each associated labeled image in the nth category according to the object expansion ratio of the nth category comprises:
counting the total number of objects of the labeled objects in the sample set according to the labeling information of each labeled image in the sample set;
calculating the object expansion number of the nth class based on the counted total number of the objects and the object expansion proportion of the nth class;
and calculating the target expansion times of each associated labeled image in the nth category according to the object expansion quantity of the nth category and the labeling information of each associated labeled image in the nth category.
6. The method of claim 4, wherein one round of sample expansion on the target image comprises:
carrying out image copying processing on the target image to obtain a copied image;
determining a target area in the copied image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copied image as an extended image corresponding to the target image.
7. The method according to claim 6, wherein, if the copied image includes the annotated object under the nth category and other annotated objects under other categories, the determining a target area in the copied image comprises:
and determining the display area of the other annotation object in the copied image, and taking the display area of the other annotation object as a target area.
8. The method of claim 4, wherein one round of sample expansion on the target image comprises:
carrying out image copying processing on the target image to obtain a copied image;
generating a flower-screen image block for the copied image according to the flower-screen type and the flower-screen color, and determining the display position of the flower-screen image block in the copied image;
and adding the flower-screen image block at the display position so as to cover the regional image at that position, obtaining an expanded image corresponding to the target image.
9. A model training apparatus, comprising:
the system comprises an acquisition unit, a storage unit and a processing unit, wherein the acquisition unit is used for acquiring a sample set of an object identification model, and the sample set comprises a plurality of labeled images and labeled information of each labeled image; the annotation information of any annotated image is used to indicate: one or more annotation objects in any annotation image and the category to which each annotation object belongs;
the obtaining unit is further configured to obtain the object expansion ratio of each category if it is detected, according to the labeling information of each labeled image, that the sample set includes N categories of labeled objects and that object proportion balance among the categories is not reached, where N is a positive integer greater than 1;
the processing unit is configured to determine the associated labeled image of each category from the sample set, and perform sample expansion based on the object expansion ratio of each category and the associated labeled image of each category to obtain a plurality of expanded images, wherein the associated labeled image of any category refers to a labeled image that contains a labeled object under that category;
and the training unit is used for taking the plurality of extended images and each labeled image in the sample set as a training sample of the object recognition model and carrying out model training on the object recognition model by adopting the training sample.
10. A computer device, comprising:
a processor adapted to implement one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the model training method of any one of claims 1-8.
CN202110416258.XA 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium Active CN113128588B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416258.XA CN113128588B (en) 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN113128588A 2021-07-16
CN113128588B (en) 2024-03-26

Family

ID=76777581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416258.XA Active CN113128588B (en) 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113128588B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947771A (en) * 2021-10-15 2022-01-18 北京百度网讯科技有限公司 Image recognition method, apparatus, device, storage medium, and program product
CN113989592A (en) * 2021-10-28 2022-01-28 三一建筑机器人(西安)研究院有限公司 Expansion method and device for semantically segmenting image sample and electronic equipment
CN117095257A (en) * 2023-10-16 2023-11-21 珠高智能科技(深圳)有限公司 Multi-mode large model fine tuning method, device, computer equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188824A (en) * 2019-05-31 2019-08-30 重庆大学 A kind of small sample plant disease recognition methods and system
CN110704590A (en) * 2019-09-27 2020-01-17 支付宝(杭州)信息技术有限公司 Method and apparatus for augmenting training samples
CN111400499A (en) * 2020-03-24 2020-07-10 网易(杭州)网络有限公司 Training method of document classification model, document classification method, device and equipment
CN111461209A (en) * 2020-03-30 2020-07-28 深圳市凯立德科技股份有限公司 Model training device and method
WO2021037280A2 (en) * 2020-06-30 2021-03-04 深圳前海微众银行股份有限公司 Rnn-based anti-money laundering model training method, apparatus and device, and medium
CN111739017A (en) * 2020-07-22 2020-10-02 湖南国科智瞳科技有限公司 Cell identification method and system of microscopic image under sample unbalance condition
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
CN112560912A (en) * 2020-12-03 2021-03-26 北京百度网讯科技有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112651317A (en) * 2020-12-18 2021-04-13 中国电子科技集团公司信息科学研究院 Hyperspectral image classification method and system for sample relation learning
CN112613575A (en) * 2020-12-30 2021-04-06 清华大学 Data set expansion method, training method and device of image classification model

Also Published As

Publication number Publication date
CN113128588B (en) 2024-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code: HK; legal event code: DE; document number: 40050520
GR01 Patent grant