CN114758188A - Sample label smoothing method, device and equipment of multi-layer hierarchical classification neural network - Google Patents


Info

Publication number: CN114758188A
Application number: CN202210156418.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 陈丹, 葛昊, 何辰立, 刘玉宇, 肖京
Current and original assignee: Ping An Technology Shenzhen Co Ltd
Application filed by: Ping An Technology Shenzhen Co Ltd
Priority to: CN202210156418.6A
Legal status: Pending

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30168 Image quality inspection


Abstract

The invention discloses a sample label smoothing method, device and equipment for a multi-level hierarchical classification neural network, wherein the method comprises the following steps: obtaining a sample initial label corresponding to a training sample, and obtaining a sample initial smooth label by adopting a preset initial label smoothing method; taking the position of the sample class label value in the sample initial label as a center of symmetry, and obtaining a position weight set according to the distance between the position of each value and the center of symmetry; obtaining a sample Gaussian distribution probability based on Gaussian distribution; normalizing the probability values at positions in the distribution other than the origin to obtain corresponding normalized probability values; normalizing the normalized probability values against the sample smooth class label value to obtain corresponding target probability values; and replacing the non-sample smooth class label values with the target probability values to obtain a target smooth label. This solves the problem of a one-size-fits-all probability distribution over the other labels: different confidence levels are assigned to the other classification categories, so that the recognition probability distribution over the classification categories better matches the actual situation and is more reasonable.

Description

Sample label smoothing method, device and equipment of a multi-level hierarchical classification neural network
Technical Field
The invention relates to the technical field of intelligent decision-making in artificial intelligence, and in particular to a sample label smoothing method, device, equipment and medium for a multi-level hierarchical classification neural network.
Background
At present, label classification in the conventional technology generally adopts a multi-label classification method (DVML-kNN), which adds to the classification both neighbor information that includes the label and neighbor information that does not, thereby comprehensively considering the influence of neighboring samples on the sample to be tested. The distances to neighboring samples are calculated and quantized, suitable weights are then selected to obtain a new classification function, and a value concept is added when calculating the posterior probability, so that the final classification result is biased toward the weak class. However, even if the final classification result is biased toward the weak class, a one-size-fits-all probability distribution is still imposed on the other labels, resulting in low accuracy of the final classification result.
Disclosure of Invention
The embodiments of the invention provide a sample label smoothing method, device and equipment for a multi-level hierarchical classification neural network, and aim to solve the problem that, in the conventional multi-label classification method, the final classification result is merely biased toward the weak class while a one-size-fits-all probability distribution over the other labels leads to low accuracy of the final classification result.
In a first aspect, an embodiment of the present invention provides a sample label smoothing method for a multi-level hierarchical classification neural network, including:
acquiring a sample initial label corresponding to the training sample, wherein the sample initial label is a one-dimensional matrix, the one-dimensional matrix comprises a plurality of values, each value is used for describing the probability that the training sample is recognized as the classification category corresponding to that value, the plurality of values comprise a sample class label value and non-sample class label values, the sample class label value is used for describing the probability that the training sample is recognized as the classification category with the maximum probability, and the non-sample class label values are used for describing the probabilities that the training sample is recognized as classification categories other than the one with the maximum probability;
performing label smoothing on the sample initial label by adopting a preset initial label smoothing method to obtain a sample initial smooth label, wherein the sample class label value is correspondingly smoothed to a sample smooth class label value;
taking the position of the sample class label value in the sample initial label as a center of symmetry, and configuring position weights that increase from small to large for the positions of the values, in order of each value's distance from the center of symmetry from near to far, to obtain a position weight set containing all the position weights;
based on Gaussian distribution and taking the center of symmetry as the center, obtaining the distribution weight corresponding to each position weight in the position weight set to obtain a sample Gaussian distribution probability;
normalizing the probability values at positions other than the origin in the sample Gaussian distribution probability to obtain corresponding normalized probability values;
and normalizing all the normalized probability values as a whole against the sample smooth class label value to obtain the weight they occupy as a whole, obtaining a target probability value corresponding to each normalized probability value based on the weight, and replacing the non-sample smooth class label values at the corresponding positions in the sample initial smooth label with the target probability values to obtain a target smooth label of the training sample.
In a second aspect, an embodiment of the present invention provides a sample label smoothing apparatus for a multi-level hierarchical classification neural network, including:
a sample initial label obtaining unit, configured to obtain a sample initial label corresponding to the training sample, where the sample initial label is a one-dimensional matrix, the one-dimensional matrix includes a plurality of values, each value is used to describe the probability that the training sample is recognized as the classification category corresponding to that value, the plurality of values include a sample class label value and non-sample class label values, the sample class label value is used to describe the probability that the training sample is recognized as the classification category with the maximum probability, and the non-sample class label values are used to describe the probabilities that the training sample is recognized as classification categories other than the one with the maximum probability;
a label smoothing unit, configured to perform label smoothing on the sample initial label by using a preset initial label smoothing method to obtain a sample initial smooth label, where the sample class label value is correspondingly smoothed to a sample smooth class label value;
a position weight set obtaining unit, configured to take the position of the sample class label value in the sample initial label as a center of symmetry, configure position weights that increase from small to large for the positions of the values in order of each value's distance from the center of symmetry from near to far, and obtain a position weight set containing all the position weights;
a Gaussian distribution probability obtaining unit, configured to obtain, based on Gaussian distribution and with the center of symmetry as the center, the distribution weight corresponding to each position weight in the position weight set to obtain a sample Gaussian distribution probability;
a probability value normalization unit, configured to normalize the probability values at positions other than the origin in the sample Gaussian distribution probability to obtain corresponding normalized probability values;
and a target smooth label obtaining unit, configured to normalize all the normalized probability values as a whole against the sample smooth class label value to obtain the weight they occupy as a whole, obtain a target probability value corresponding to each normalized probability value based on the weight, and replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label with the target probability values to obtain a target smooth label of the training sample.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the sample label smoothing method for a multi-level hierarchical classification neural network according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the sample label smoothing method for a multi-level hierarchical classification neural network according to the first aspect.
The embodiments of the invention provide a sample label smoothing method, device and equipment for a multi-level hierarchical classification neural network. For hierarchical classification problems, in particular multi-level classification problems with a progressive relation, a Gaussian probability distribution is introduced into the sample initial smooth label so that the label weight assigned to a non-sample class label is inversely proportional to its distance from the sample class label. Non-sample class labels are thus assigned different weights according to their distance from the sample class label: a non-sample class label close to the sample class label is assigned a larger weight, and one far from it a smaller weight. This avoids the problem in the conventional label smoothing technique whereby setting the probability of all other positions to alpha/(K-1) amounts to trusting the other classification categories with equal probability; the one-size-fits-all probability distribution over the other labels is eliminated, different confidence levels are assigned to the other classification categories, and the recognition probability distribution over the classification categories better matches the actual situation and is more reasonable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a sample label smoothing method for a multi-level hierarchical classification neural network according to an embodiment of the present invention;
FIG. 2 is a sub-flowchart of the sample label smoothing method for a multi-level hierarchical classification neural network according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a sample label smoothing apparatus for a multi-level hierarchical classification neural network provided by an embodiment of the present invention;
fig. 4 is a schematic block diagram of a computer device provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a sample label smoothing method of a multi-level hierarchical classification neural network according to an embodiment of the present invention; as shown in FIG. 1, the method includes steps S11 to S16.
S11, obtaining a sample initial label corresponding to the training sample, wherein the sample initial label is a one-dimensional matrix, the one-dimensional matrix includes a plurality of values, each value is used for describing the probability that the training sample is recognized as the classification category corresponding to that value, the plurality of values include a sample class label value and non-sample class label values, the sample class label value is used for describing the probability that the training sample is recognized as the classification category with the maximum probability, and the non-sample class label values are used for describing the probabilities that the training sample is recognized as classification categories other than the one with the maximum probability.
Specifically, classification is the task of identifying a target as its corresponding category, and a classification neural network is a deep-learning-based neural network that classifies a target so as to identify it as the correct category as far as possible. Classification can be divided into binary classification and multi-class classification: binary classification decides between exactly two categories, while multi-class classification covers three or more categories and includes multi-level classification, which may be called hierarchical multi-class classification. For example, for the six-level image blur classification (i.e., six-level image definition classification), images may be divided into 6 categories according to their degree of blur (i.e., image definition may be divided into 6 levels): high definition, clear, relatively clear, relatively blurred, blurred, and very blurred, which constitutes a progressive multi-level classification.
When training the multi-level hierarchical classification neural network model, training samples are usually used. The training samples are prepared in advance and annotated to obtain the labeled category to which each training sample belongs. The model then identifies a training sample as a predicted category, and performs deep learning based on the training sample, the labeled category and the predicted category, thereby realizing the training of the model. The goal of training the model is for the predicted category to be consistent with the labeled category, in which case the model's recognition of the training sample is considered correct; otherwise, its recognition of the training sample is considered wrong.
When the multi-level hierarchical classification neural network model is trained, it first classifies a training sample to obtain the probability with which it identifies the training sample as each classification category in the multi-level classification. By default the category corresponding to the maximum probability is the category to which the training sample belongs; this classification category may be called the sample class of the training sample, and the other classification categories may be called its non-sample classes. A sample initial label corresponding to the training sample is thus obtained. Because this is a multi-level classification problem with multiple classification categories, the sample initial label is a one-dimensional matrix comprising a plurality of values, and each value describes the probability (i.e., the prediction probability) that the training sample is identified as the classification category corresponding to that value. Among all the prediction probabilities, the maximum value describes the probability of the category to which the training sample most plausibly belongs, i.e., the category as which it is most likely to be recognized; that category is the sample class, and the maximum value among the prediction probabilities is the sample class label value. Since the classification problem is a probability prediction problem, the training samples may be labeled numerically, for example with 0-1 labeling, to describe the probability that a training sample is recognized as each classification category in the multi-level classification; for example, a one-hot label, obtained by one-hot encoding the sample, may be used as the label. For instance, for the six-level image blur classification, suppose the blur degree of a training sample image is "relatively blurred": the one-hot label corresponding to that sample image is [0, 0, 0, 1, 0, 0], i.e., the sample initial label of the training sample image is [0, 0, 0, 1, 0, 0]. The sample class label value describing the probability that the training sample image is recognized as its sample class is 1, and this value with probability 1 corresponds to the classification category "relatively blurred"; the non-sample class label values of the other, non-sample classes in the multi-level classification are in turn [0, 0, 0, 0, 0], describing a probability of 0 that the training sample image is recognized as each of the other, non-sample classes.
In this way, a training sample, the sample class with which it is labeled, and the multi-level classification to which the sample class belongs are obtained, and the training sample is labeled according to the sample class and the several non-sample classes contained in the multi-level classification to obtain an initial label matrix in the form of a one-dimensional matrix corresponding to the training sample. The initial label matrix includes a sample class label value describing the sample class and non-sample class label values describing the non-sample classes; each value in the initial label matrix describes the probability that the training sample is identified as the category corresponding to that value. Since a training sample is identified as the category with the highest probability, the sample class label value is usually the maximum value in the initial label matrix, corresponding to the maximum probability it describes.
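As a minimal sketch of this labeling step (a NumPy sketch under our own assumptions; the class names and helper function are illustrative and not part of the patent):

```python
import numpy as np

# Hypothetical six-level blur taxonomy matching the patent's running example.
BLUR_CLASSES = ["high definition", "clear", "relatively clear",
                "relatively blurred", "blurred", "very blurred"]

def make_initial_label(sample_class: str) -> np.ndarray:
    """One-hot encode the labeled sample class into a sample initial label
    (a one-dimensional matrix with a 1 at the sample class position)."""
    label = np.zeros(len(BLUR_CLASSES))
    label[BLUR_CLASSES.index(sample_class)] = 1.0
    return label

print(make_initial_label("relatively blurred"))  # [0. 0. 0. 1. 0. 0.]
```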
S12, performing label smoothing on the sample initial label by adopting a preset initial label smoothing method to obtain a sample initial smooth label, wherein the sample class label value is correspondingly smoothed to a sample smooth class label value.
Here, label smoothing is a modification of the loss function that adjusts the training target of the neural network from "1" to "1 - α", where α is the label smoothing adjustment factor.
Specifically, the classification problem is substantially a probability prediction problem. When a multi-level hierarchical classification neural network model is trained to classify training samples, the aim is to minimize the difference between the predicted probability and the true probability of the training samples, so as to obtain a probability distribution that is as optimal as possible; the model learns in a direction that maximizes the difference between the correct label and the incorrect labels. Meanwhile, to prevent the multi-level classification labels from overfitting, label smoothing is generally adopted, for example adding noise through soft labels, so as to reduce the weight of the true sample label's category when computing the loss function and thereby mitigate overfitting. For example, for the sample initial label [0, 0, 0, 1, 0, 0] of the six-level image blur classification, label smoothing changes the 1 at the original position to 1 - α and the other positions to α/(K - 1), where K is the number of categories in the six-level image blur classification, i.e., K = 6. If α = 0.1, the sample initial label [0, 0, 0, 1, 0, 0] is converted into the sample initial smooth label [0.02, 0.02, 0.02, 0.9, 0.02, 0.02], in which the sample smooth class label value is 0.9 and the smooth label values of the other categories in the preset multi-level classification are in turn [0.02, 0.02, 0.02, 0.02, 0.02], i.e., the non-sample smooth class label values. Therefore, after the sample initial label corresponding to the training sample is obtained, label smoothing is performed on it using the preset initial label smoothing method to obtain the sample initial smooth label, in which the sample class label value is correspondingly smoothed to the sample smooth class label value and the remaining smooth label values are the non-sample smooth class label values.
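A minimal sketch of this smoothing step (assuming NumPy; the function name is ours):

```python
import numpy as np

def smooth_initial_label(initial_label: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Standard label smoothing: the 1 becomes 1 - alpha, each 0 becomes alpha/(K-1)."""
    k = initial_label.size
    return initial_label * (1.0 - alpha) + (1.0 - initial_label) * alpha / (k - 1)

# [0, 0, 0, 1, 0, 0] -> [0.02, 0.02, 0.02, 0.9, 0.02, 0.02]
print(smooth_initial_label(np.array([0., 0., 0., 1., 0., 0.])))
```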
S13, taking the position of the sample class label value in the sample initial label as a center of symmetry, and configuring position weights that increase from small to large for the positions of the values, in order of each value's distance from the center of symmetry from near to far, to obtain a position weight set containing all the position weights.
Specifically, with the position of the sample class label value in the sample initial label as the center of symmetry, position weights that increase from small to large are configured for the positions of the non-sample class label values and the sample class label value, in order of each position's distance from the center of symmetry from near to far. The position weight is in effect a distance weight: the closer the distance, the smaller the position weight; the farther the distance, the larger the position weight. A position weight set containing the position weight of each position is thus obtained, so that a classification category closer to the sample class label value has a smaller position weight. For example, in the six-level image blur classification, since the blur degree of the training sample image is "relatively blurred", the position weight corresponding to "relatively blurred" may be 0, and the adjacent categories "relatively clear" and "blurred" should have smaller position weights than "clear" and "very blurred". The probability distribution of the subsequent image blur recognition thereby better matches the actual situation and is more reasonable.
Further, as shown in fig. 2, step S13 (taking the position of the sample class label value in the sample initial label as a center of symmetry and configuring position weights from small to large for the positions of the values, in order of each value's distance from the center of symmetry from near to far, to obtain a position weight set containing all the position weights) includes:
S130, performing one-dimensional coordinate systemization on the sample initial label to obtain a one-dimensional coordinate system;
S131, taking the center of symmetry as the origin of the one-dimensional coordinate system and the origin coordinate as a preset reference value, and sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value, in order of each position's distance from the origin coordinate from near to far, to obtain the coordinates corresponding to the positions of the non-sample class label values;
S132, taking the coordinates as position weights, and obtaining the sample label coordinates corresponding to the sample initial label.
Specifically, the sample initial label is converted into a one-dimensional coordinate system, the position of the sample class label value in the sample initial label is taken as the origin of the one-dimensional coordinate system, the origin coordinate is taken as a preset reference value, and the absolute values of the coordinates corresponding to the positions of the non-sample class label values are sequentially increased in order of each position's distance from the origin coordinate from near to far, so as to obtain the coordinates corresponding to the positions of the non-sample class label values and the sample label coordinates corresponding to the sample initial label.
Further, in step S131, sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of each position's distance from the origin coordinate from near to far includes:
sequentially increasing, with a preset step length as the coordinate unit, the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of each position's distance from the origin coordinate from near to far.
Specifically, since the sample initial label is a one-dimensional matrix, it is converted into a one-dimensional coordinate system, i.e., the sample initial label is expressed in a one-dimensional coordinate system. The position of the sample class label value in the sample initial label is taken as the origin of the one-dimensional coordinate system, i.e., as the center of symmetry, and the origin coordinate is taken as a preset reference value. Then, with a preset step length as the coordinate unit, the absolute value of the coordinate corresponding to the position of each non-sample class label value in the sample initial label is sequentially increased in order of each position's distance from the origin coordinate from near to far, so as to obtain the coordinates corresponding to the positions of the non-sample class label values and the sample label coordinates corresponding to the sample initial label. For example, for the sample initial label [0, 0, 0, 1, 0, 0] of the six-level image blur classification, the position of the 1 is taken as the origin of the one-dimensional coordinate system and the origin coordinate is taken as the reference value. If the origin coordinate is set to 0 and 1 is taken as the preset step length, then moving one category to the right increases the coordinate by 1 and moving one category to the left decreases it by 1. The coordinates corresponding to the positions of the non-sample class label values are thus obtained, and the sample label coordinates corresponding to the sample initial label [0, 0, 0, 1, 0, 0] are [-3, -2, -1, 0, 1, 2].
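A minimal sketch of this coordinate step (assuming NumPy; the function name is ours):

```python
import numpy as np

def label_coordinates(initial_label: np.ndarray, step: int = 1) -> np.ndarray:
    """Use the position of the sample class label value (the 1) as the origin
    and assign signed coordinates whose absolute value grows with distance."""
    origin = int(np.argmax(initial_label))
    return (np.arange(initial_label.size) - origin) * step

# [0, 0, 0, 1, 0, 0] -> [-3 -2 -1  0  1  2]
print(label_coordinates(np.array([0., 0., 0., 1., 0., 0.])))
```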
S14, based on Gaussian distribution and taking the center of symmetry as the center, obtaining the distribution weight corresponding to each position weight in the position weight set to obtain a sample Gaussian distribution probability.
Specifically, the Gaussian distribution is a distribution that can serve as a weighting: the closer to the central position, the larger the value, and the farther from the central position, the smaller the value. Therefore, described in the one-dimensional coordinate system, the distribution weight corresponding to each coordinate in the sample label coordinates is obtained based on Gaussian distribution with the origin coordinate as the center, so as to obtain a sample Gaussian distribution probability. A Gaussian probability distribution is thus introduced into the sample initial label based on the near-to-far positional relation between the position of each non-sample class label value and the origin coordinate, so that the non-sample class labels have different weights, with a non-sample class label closer to the sample class label having a relatively larger weight. The following probability density function of the Gaussian distribution may be used:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-c)^2}{2\sigma^2}}$ (1)
where x in formula (1) is each coordinate in the sample label coordinates, f(x) describes the probability value of identifying the sample as the category corresponding to coordinate x, c is the coordinate of the sample class label (i.e., the origin coordinate, which is also the preset reference value) and serves as the mean of the distribution, e is the base of the natural logarithm, and the standard deviation σ is a constant for which the empirical value 1.5 may be taken, so that the probability value of each coordinate in the sample label coordinates can be obtained. For example, for the sample initial label [0, 0, 0, 1, 0, 0] of the six-level image blur classification, the corresponding sample label coordinates are [-3, -2, -1, 0, 1, 2]; with σ = 1.5 and c, the coordinate of the sample class label, set to 0, substituting each position's coordinate in [-3, -2, -1, 0, 1, 2] into formula (1) gives the probability values of the position categories as 0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093, and the sample Gaussian distribution probability is [0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093].
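A minimal sketch of evaluating formula (1) at the sample label coordinates (assuming NumPy; the function name is ours):

```python
import numpy as np

def gaussian_probs(coords: np.ndarray, c: float = 0.0, sigma: float = 1.5) -> np.ndarray:
    """Evaluate the Gaussian density of formula (1) at each label coordinate."""
    return np.exp(-((coords - c) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

coords = np.array([-3, -2, -1, 0, 1, 2])
print(np.round(gaussian_probs(coords), 4))
# [0.036  0.1093 0.213  0.266  0.213  0.1093]
```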
S15, normalizing the probability values at positions other than the origin in the sample Gaussian distribution probability to obtain the corresponding normalized probability values.
Specifically, the probability values at positions other than the origin in the sample Gaussian distribution probability are normalized to obtain the corresponding normalized probability values. For example, for the sample Gaussian distribution probability [0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093], the five probability values at positions other than 0.2660 are summed to obtain 0.6806, and normalizing those five probability values by this sum gives the normalized probability values 0.06, 0.16, 0.31, 0.31, 0.16.
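A minimal sketch of this first normalization (assuming NumPy; the function name is ours; note that 0.0360/0.6806 ≈ 0.05, which the patent's worked example rounds up to 0.06 so that the five values sum to 1):

```python
import numpy as np

def normalize_non_origin(gauss_probs: np.ndarray, origin: int) -> np.ndarray:
    """Normalize the probability values at every position except the origin."""
    mask = np.arange(gauss_probs.size) != origin
    return gauss_probs[mask] / gauss_probs[mask].sum()

probs = np.array([0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093])
print(np.round(normalize_non_origin(probs, origin=3), 2))
# [0.05 0.16 0.31 0.31 0.16]
```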
S16, normalizing all the normalized probability values, taken as a whole, against the sample smooth class label value to obtain the weight they occupy as a whole, obtaining the target probability value corresponding to each normalized probability value based on this weight, and replacing the non-sample smooth class label values at the corresponding positions in the sample initial smooth label with the target probability values to obtain the target smooth label of the training sample.
Specifically, all the normalized probability values, taken as a whole, are normalized against the sample smooth class label value; that is, 1 minus the sample smooth class label value is taken as the total that all the target probability values must sum to. This weight is then apportioned over the normalized probability values in proportion to each of them (i.e., each normalized probability value is scaled by the weight), and each result is taken as the target probability value corresponding to that normalized probability value. The target probability values replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label, yielding the target smooth label of the training sample. The target smooth label is thus normalized, and the other classification categories are assigned different weights according to their distance from the sample smooth class label value, avoiding the one-size-fits-all approach of trusting the other labels equally and adding a reasonable division of weights.
Further, obtaining the target probability value corresponding to each normalized probability value based on the weight in step S16 includes:
apportioning the weight over all the normalized probability values in proportion to each of them to obtain the sub-weight of each normalized probability value within the weight, and taking the sub-weight as the target probability value corresponding to that normalized probability value.
Specifically, all the normalized probability values, taken as a whole, are normalized against the sample smooth class label value to obtain the weight that all the normalized probability values occupy; this weight is then apportioned over the normalized probability values to obtain the sub-weight of each normalized probability value within the weight, the sub-weight is taken as the target probability value corresponding to that normalized probability value, and the target probability values then replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label, yielding the target smooth label of the training sample. For example, the normalized probability values 0.06, 0.16, 0.31, 0.31, 0.16 above must be normalized again together with the sample smooth class label value 0.9, i.e., rescaled so that they sum to 0.1; in other words, the probability values at positions other than the origin in the sample Gaussian distribution probability are normalized twice. The normalized probability values 0.06, 0.16, 0.31, 0.31, 0.16 thus become the target probability values 0.006, 0.016, 0.031, 0.031, 0.016, which replace the non-sample smooth class label values 0.02 at the corresponding positions in the sample initial smooth label [0.02, 0.02, 0.02, 0.9, 0.02, 0.02], giving the target smooth label [0.006, 0.016, 0.031, 0.9, 0.031, 0.016].
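Putting steps S12 to S16 together, a minimal end-to-end sketch (assuming NumPy; the function name and defaults are ours; small differences from the patent's figures come from its intermediate rounding, e.g. 0.06 × 0.1 = 0.006 versus 0.0529 × 0.1 ≈ 0.005):

```python
import numpy as np

def target_smooth_label(initial_label: np.ndarray, alpha: float = 0.1,
                        sigma: float = 1.5) -> np.ndarray:
    """Gaussian-weighted label smoothing for a progressive multi-level classification."""
    k = initial_label.size
    origin = int(np.argmax(initial_label))                      # sample class position
    # S12: standard label smoothing.
    smooth = initial_label * (1 - alpha) + (1 - initial_label) * alpha / (k - 1)
    # S13: signed coordinates with the origin at the sample class label value.
    coords = np.arange(k) - origin
    # S14: Gaussian distribution probability over the coordinates (formula (1)).
    gauss = np.exp(-coords ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    # S15: first normalization over the non-origin positions.
    mask = np.arange(k) != origin
    normalized = gauss[mask] / gauss[mask].sum()
    # S16: scale by the leftover weight (1 - sample smooth class label value) and replace.
    target = smooth.copy()
    target[mask] = normalized * (1.0 - smooth[origin])
    return target

label = np.array([0., 0., 0., 1., 0., 0.])
print(np.round(target_smooth_label(label), 3))
# [0.005 0.016 0.031 0.9   0.031 0.016]  (patent example: [0.006, 0.016, 0.031, 0.9, 0.031, 0.016])
```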
In the embodiment of the present application, the sample initial label corresponding to the training sample is obtained and smoothed with a preset initial label smoothing method to obtain the sample initial smooth label. With the position of the sample class label value in the sample initial label as the center of symmetry, position weights from small to large are configured for the positions of the values in order of each position's distance from the center of symmetry from near to far, to obtain a position weight set containing all the position weights. Based on Gaussian distribution with the center of symmetry as the center, the distribution weight corresponding to each position weight in the position weight set is obtained, yielding the sample Gaussian distribution probability. The probability values at positions other than the origin in the sample Gaussian distribution probability are normalized to obtain the corresponding normalized probability values, and all the normalized probability values, taken as a whole, are normalized against the sample smooth class label value to obtain the target probability value corresponding to each normalized probability value. The target probability values replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label, yielding the target smooth label of the training sample. The labels of the training samples are thereby optimized: for hierarchical classification problems, in particular multi-level classification problems with a progressive relation, the label weight assigned to a non-sample class label is inversely proportional to its distance from the sample class label, so that non-sample class labels are assigned different weights according to that distance: a non-sample class label closer to the sample class label is assigned a larger weight and one farther away a smaller weight. This avoids the problem in the conventional label smoothing technique whereby setting the probability of all other positions to alpha/(K-1) amounts to trusting the other classification categories with equal probability; the one-size-fits-all probability distribution over the other labels is eliminated, different confidence levels are assigned to the other classification categories, and the recognition probability distribution over the classification categories better matches the actual situation and is more reasonable.
An embodiment of the present invention further provides a sample label smoothing apparatus for a multi-level hierarchical classification neural network, where the apparatus is configured to execute any one of the foregoing embodiments of the sample label smoothing method for a multi-level hierarchical classification neural network. Specifically, referring to fig. 3, fig. 3 is a schematic block diagram of a sample label smoothing apparatus of a multi-level hierarchical classification neural network according to an embodiment of the present invention.
As shown in fig. 3, the sample label smoothing apparatus 100 of the multi-level hierarchical classification neural network includes a sample initial label obtaining unit 110, a label smoothing unit 120, a position weight set obtaining unit 130, a Gaussian distribution probability obtaining unit 140, a probability value normalization unit 150, and a target smooth label obtaining unit 160.
A sample initial label obtaining unit 110, configured to obtain a sample initial label corresponding to the training sample, where the sample initial label is a one-dimensional matrix whose values each describe the probability that the training sample is recognized as the classification category corresponding to that value, including a sample class label value and non-sample class label values;
A label smoothing unit 120, configured to perform label smoothing on the sample initial label by using the preset initial label smoothing method to obtain a sample initial smooth label, where the sample class label value is correspondingly smoothed to a sample smooth class label value;
A position weight set obtaining unit 130, configured to take the position of the sample class label value in the sample initial label as a center of symmetry, configure position weights from small to large for the positions of the values in order of each value's distance from the center of symmetry from near to far, and obtain a position weight set containing all the position weights.
Specifically, with the position of the sample class label value in the sample initial label as the center of symmetry, position weights that increase from small to large are configured for the positions of the non-sample class label values and the sample class label value, in order of each position's distance from the center of symmetry from near to far. The position weight is in effect a distance weight: the closer the distance, the smaller the position weight; the farther the distance, the larger the position weight. A position weight set containing the position weight of each position is thus obtained, so that a classification category closer to the sample class label value has a smaller position weight. For example, in the six-level image blur classification, since the blur degree of the training sample image is "relatively blurred", the position weight corresponding to "relatively blurred" may be 0, and the adjacent categories "relatively clear" and "blurred" should have smaller position weights than "clear" and "very blurred". The probability distribution of the subsequent image blur recognition thereby better matches the actual situation and is more reasonable.
Further, the position weight set obtaining unit 130 is specifically configured to:
perform one-dimensional coordinate systemization on the sample initial label to obtain a one-dimensional coordinate system;
take the center of symmetry as the origin of the one-dimensional coordinate system and the origin coordinate as a preset reference value, and sequentially increase the absolute value of the coordinate corresponding to the position of each non-sample class label value, in order of each position's distance from the origin coordinate from near to far, to obtain the coordinates corresponding to the positions of the non-sample class label values;
and take the coordinates as position weights, and obtain the sample label coordinates corresponding to the sample initial label.
Specifically, the sample initial label is converted into a one-dimensional coordinate system, the position of the sample class label value in the sample initial label is taken as the origin of the one-dimensional coordinate system, the origin coordinate is taken as a preset reference value, and the absolute values of the coordinates corresponding to the positions of the non-sample class label values are sequentially increased in order of each position's distance from the origin coordinate from near to far, so as to obtain the coordinates corresponding to the positions of the non-sample class label values and the sample label coordinates corresponding to the sample initial label.
Further, sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of each position's distance from the origin coordinate from near to far includes:
sequentially increasing, with a preset step length as the coordinate unit, the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of each position's distance from the origin coordinate from near to far.
Specifically, since the sample initial label is a one-dimensional matrix, it is converted into a one-dimensional coordinate system, i.e., the sample initial label is expressed in a one-dimensional coordinate system. The position of the sample class label value in the sample initial label is taken as the origin of the one-dimensional coordinate system, i.e., as the center of symmetry, and the origin coordinate is taken as a preset reference value. Then, with a preset step length as the coordinate unit, the absolute value of the coordinate corresponding to the position of each non-sample class label value in the sample initial label is sequentially increased in order of each position's distance from the origin coordinate from near to far, so as to obtain the coordinates corresponding to the positions of the non-sample class label values and the sample label coordinates corresponding to the sample initial label. For example, for the sample initial label [0, 0, 0, 1, 0, 0] of the six-level image blur classification, the position of the 1 is taken as the origin of the one-dimensional coordinate system and the origin coordinate is taken as the reference value. If the origin coordinate is set to 0 and 1 is taken as the preset step length, then moving one category to the right increases the coordinate by 1 and moving one category to the left decreases it by 1. The coordinates corresponding to the positions of the non-sample class label values are thus obtained, and the sample label coordinates corresponding to the sample initial label [0, 0, 0, 1, 0, 0] are [-3, -2, -1, 0, 1, 2].
A Gaussian distribution probability obtaining unit 140, configured to obtain, based on Gaussian distribution and with the center of symmetry as the center, the distribution weight corresponding to each position weight in the position weight set, so as to obtain the sample Gaussian distribution probability.
Specifically, the Gaussian distribution is a distribution that can serve as a weighting: the closer to the central position, the larger the value, and the farther from the central position, the smaller the value. Therefore, described in the one-dimensional coordinate system, the distribution weight corresponding to each coordinate in the sample label coordinates is obtained based on Gaussian distribution with the origin coordinate as the center, so as to obtain the sample Gaussian distribution probability. A Gaussian probability distribution is thus introduced into the sample initial label based on the near-to-far positional relation between the position of each non-sample class label value and the origin coordinate, so that the non-sample class labels have different weights, with a non-sample class label closer to the sample class label having a relatively larger weight. The probability density function of the Gaussian distribution shown in formula (1) above may be adopted, where x is each coordinate in the sample label coordinates, f(x) describes the probability value of identifying the sample as the category corresponding to coordinate x, c is the coordinate of the sample class label (i.e., the origin coordinate, which is also the preset reference value) and serves as the mean of the distribution, e is the base of the natural logarithm, and the standard deviation σ is a constant for which the empirical value 1.5 may be taken, so that the probability value of each coordinate in the sample label coordinates can be obtained. For example, for the sample initial label [0, 0, 0, 1, 0, 0] of the six-level image blur classification, the corresponding sample label coordinates are [-3, -2, -1, 0, 1, 2]; with σ = 1.5 and c, the coordinate of the sample class label, set to 0, substituting each position's coordinate in [-3, -2, -1, 0, 1, 2] into formula (1) gives the probability values of the position categories as 0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093, and the sample Gaussian distribution probability is [0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093].
A probability value normalization unit 150, configured to normalize the probability values at positions other than the origin in the sample Gaussian distribution probability to obtain the corresponding normalized probability values.
Specifically, the probability values at positions other than the origin in the sample Gaussian distribution probability are normalized to obtain the corresponding normalized probability values. For example, for the sample Gaussian distribution probability [0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093], the five probability values at positions other than 0.2660 are summed to obtain 0.6806, and normalizing those five probability values by this sum gives the normalized probability values 0.06, 0.16, 0.31, 0.31, 0.16.
And a target smooth label obtaining unit 160, configured to normalize all the normalized probability values, taken as a whole, against the sample smooth class label value to obtain the weight they occupy as a whole, obtain the target probability value corresponding to each normalized probability value based on the weight, and replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label with the target probability values to obtain the target smooth label of the training sample.
Specifically, all the normalized probability values, taken as a whole, are normalized against the sample smooth class label value; that is, 1 minus the sample smooth class label value is taken as the total that all the target probability values must sum to. This weight is then apportioned over the normalized probability values in proportion to each of them, and each result is taken as the target probability value corresponding to that normalized probability value. The target probability values replace the non-sample smooth class label values at the corresponding positions in the sample initial smooth label, yielding the target smooth label of the training sample. The target smooth label is thus normalized, and the other classification categories are assigned different weights according to their distance from the sample smooth class label value, avoiding the one-size-fits-all approach of trusting the other labels equally and adding a reasonable division of weights.
Further, the target smooth label obtaining unit 160 is specifically configured to:
apportioning the weight across all the normalized probability values to obtain the sub-weight of each normalized probability value within the weight, and taking each sub-weight as the target probability value corresponding to that normalized probability value.
Specifically, all the normalized probability values are taken as a whole and normalized together with the sample smoothed class label value to obtain the weight occupied by all the normalized probability values; the weight is then apportioned across the normalized probability values to obtain the sub-weight of each normalized probability value within the weight, each sub-weight is taken as the target probability value corresponding to that normalized probability value, and the target probability values replace the non-sample smoothed label values at the corresponding positions in the sample initial smooth label, giving the target smooth label of the training sample. For example, the normalized probability values 0.06, 0.16, 0.31, 0.31 and 0.16 obtained above must be normalized once more against the sample smoothed class label value of 0.9, i.e., scaled into the remaining weight of 0.1; this amounts to normalizing the probability values at the non-origin positions of the sample Gaussian distribution probability twice. Multiplying each normalized value by 0.1 gives the target probability values 0.006, 0.016, 0.031, 0.031 and 0.016, i.e., [0.006, 0.016, 0.031, 0.031, 0.016]. Replacing the non-sample smoothed label values at the corresponding positions in the sample initial smooth label [0.02, 0.02, 0.02, 0.9, 0.02, 0.02] with these target probability values yields the target smooth label [0.006, 0.016, 0.031, 0.9, 0.031, 0.016].
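The whole replacement step can be sketched end to end as follows (function and variable names are editorial assumptions; the text above prescribes only the arithmetic):

```python
def target_smooth_label(initial_smooth, gauss_probs, origin_idx, class_value=0.9):
    """Fold the leftover mass (1 - class_value) back into the smoothed label.

    The leftover weight is apportioned across the non-sample positions in
    proportion to their normalized Gaussian probabilities, then written over
    the non-sample values of the initial smoothed label.
    """
    rest = [p for i, p in enumerate(gauss_probs) if i != origin_idx]
    total = sum(rest)
    leftover = 1.0 - class_value  # 0.1 when the sample class keeps 0.9
    target = list(initial_smooth)
    j = 0
    for i in range(len(target)):
        if i == origin_idx:
            target[i] = class_value
        else:
            target[i] = leftover * rest[j] / total
            j += 1
    return target

initial = [0.02, 0.02, 0.02, 0.9, 0.02, 0.02]
gauss = [0.0360, 0.1093, 0.2130, 0.2660, 0.2130, 0.1093]
print([round(v, 3) for v in target_smooth_label(initial, gauss, origin_idx=3)])
# -> [0.005, 0.016, 0.031, 0.9, 0.031, 0.016]; the text rounds the first
#    non-sample entry up to 0.006 so that the label sums exactly to 1.
```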
The sample label smoothing apparatus of the multi-layer hierarchical classification neural network described above may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device may be used to perform the sample label smoothing method of the multi-layer hierarchical classification neural network.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform the sample label smoothing method of the multi-layer hierarchical classification neural network; the storage medium 503 may be a volatile or a non-volatile storage medium.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to perform the sample label smoothing method of the multi-layer hierarchical classification neural network.
The network interface 505 is used for network communication, such as providing transmission of data information. It will be appreciated by those skilled in the art that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with aspects of the present invention and does not limit the computer device 500 to which aspects of the present invention may be applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run a computer program 5032 stored in the memory to implement the corresponding functions in the sample label smoothing method of the multi-layer hierarchical classification neural network.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 4 does not constitute a limitation on the particular configuration of the computer device, and in other embodiments, the computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 4, which are not described herein again.
It should be understood that, in the embodiment of the present invention, the processor 502 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In another embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program, when executed by the processor, implements the steps of the above-described sample label smoothing method of the multi-layer hierarchical classification neural network.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions in actual implementation, or units with the same function may be grouped into one unit, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partly contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a computer readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention. And the aforementioned computer-readable storage medium comprises: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A sample label smoothing method of a multi-layer hierarchical classification neural network, characterized by comprising the following steps:
acquiring a sample initial label corresponding to a training sample, wherein the sample initial label is a one-dimensional matrix, the one-dimensional matrix comprises a plurality of values, each value is used for describing the probability of the training sample being recognized as the classification category corresponding to the value, the plurality of values comprise a sample class label value and non-sample class label values, the sample class label value is used for describing the probability of the training sample being recognized as the classification category corresponding to the maximum probability, and the non-sample class label values are used for describing the probabilities of the training sample being recognized as classification categories other than the one corresponding to the maximum probability;
performing label smoothing on the sample initial label by adopting a preset initial label smoothing method to obtain a sample initial smooth label, wherein the sample class label value is correspondingly smoothed into a sample smoothed class label value;
taking the position of the sample class label value in the sample initial label as a symmetry center, and configuring, for the position of each value, a position weight that increases from small to large as the position's distance from the symmetry center increases from near to far, to obtain a position weight set containing all the position weights;
based on the Gaussian distribution and taking the symmetry center as the center, obtaining the distribution weight corresponding to each position weight in the position weight set, to obtain the sample Gaussian distribution probability;
normalizing the probability values at the positions other than the origin position in the sample Gaussian distribution probability to obtain the corresponding normalized probability values;
and normalizing all the normalized probability values, taken as a whole, together with the sample smoothed class label value to obtain the weight occupied by the whole, acquiring a target probability value corresponding to each normalized probability value based on the weight, and replacing the non-sample smoothed label value at the corresponding position in the sample initial smooth label with the target probability value to obtain a target smooth label of the training sample.
2. The method of claim 1, wherein the step of taking the position of the sample class label value in the sample initial label as the symmetry center and configuring, for the position of each value, a position weight from small to large in order of the position's distance from the symmetry center, from near to far, to obtain a position weight set including all the position weights comprises:
performing one-dimensional coordinate systemization on the sample initial label to obtain a one-dimensional coordinate system;
taking the symmetry center as the origin of the one-dimensional coordinate system, taking the origin coordinate as a preset reference value, and sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of that position's distance from the origin coordinate, from near to far, to obtain the coordinate corresponding to the position of each non-sample class label value;
and taking the coordinates as position weight, and obtaining sample label coordinates corresponding to the sample initial label.
3. The method of claim 2, wherein the sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of that position's distance from the origin coordinate, from near to far, comprises:
sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value, in order of that position's distance from the origin coordinate from near to far, by taking a preset step length as the coordinate unit.
4. The method of claim 1, wherein the obtaining a target probability value corresponding to each normalized probability value based on the weight comprises:
apportioning the weight across all the normalized probability values to obtain the sub-weight of each normalized probability value within the weight, and taking each sub-weight as the target probability value corresponding to that normalized probability value.
5. The method of claim 3, wherein the sequentially increasing the absolute value of the coordinate corresponding to the position of each non-sample class label value in order of that position's distance from the origin coordinate, from near to far, comprises:
if the non-sample class label value moves one class to the right, increasing the corresponding coordinate by 1;
and if the non-sample class label value moves one class to the left, decreasing the corresponding coordinate by 1.
6. The method of claim 1, wherein the preset initial label smoothing method is applied to a multi-layer hierarchical classification neural network used for classification in which a layer-by-layer progressive relationship exists between the classes of the hierarchical classification.
7. The method of claim 2, wherein the probability density function of the Gaussian distribution is:
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-c)^{2}}{2\sigma^{2}}}
wherein x is each coordinate in the sample label coordinates, c is the mean of the absolute values of the coordinates in the sample label coordinates, and the standard deviation σ is a constant.
8. A sample label smoothing apparatus of a multi-layer hierarchical classification neural network, characterized by comprising:
a sample initial label obtaining unit, configured to obtain a sample initial label corresponding to a training sample, where the sample initial label is a one-dimensional matrix, the one-dimensional matrix includes a plurality of values, each of the values is used to describe a probability that the training sample is identified as the classification category corresponding to the value, the plurality of values include a sample class label value and non-sample class label values, the sample class label value is used to describe the probability that the training sample is identified as the classification category corresponding to the maximum probability, and the non-sample class label values are used to describe the probabilities that the training sample is identified as classification categories other than the one corresponding to the maximum probability;
a label smoothing unit, configured to perform label smoothing on the sample initial label by adopting a preset initial label smoothing method to obtain a sample initial smooth label, wherein the sample class label value is correspondingly smoothed into a sample smoothed class label value;
a position weight set obtaining unit, configured to take the position of the sample class label value in the sample initial label as a symmetry center, configure, for the position of each value, a position weight from small to large in order of that position's distance from the symmetry center, from near to far, and obtain a position weight set containing all the position weights;
a Gaussian distribution probability obtaining unit, configured to obtain, based on the Gaussian distribution and with the symmetry center as the center, the distribution weight corresponding to each position weight in the position weight set, and obtain the sample Gaussian distribution probability;
a probability value normalization unit, configured to normalize the probability values at the positions other than the origin position in the sample Gaussian distribution probability to obtain the corresponding normalized probability values;
and a target smooth label obtaining unit, configured to normalize all the normalized probability values, taken as a whole, together with the sample smoothed class label value to obtain the weight occupied by the whole, obtain the target probability value corresponding to each normalized probability value based on the weight, and replace the non-sample smoothed label value at the corresponding position in the sample initial smooth label with the target probability value to obtain the target smooth label of the training sample.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the sample label smoothing method of the multi-layer hierarchical classification neural network according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the sample label smoothing method of the multi-layer hierarchical classification neural network according to any one of claims 1 to 7.
CN202210156418.6A 2022-02-21 2022-02-21 Sample label smoothing method, device and equipment of multi-layer hierarchical classification neural network Pending CN114758188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210156418.6A CN114758188A (en) 2022-02-21 2022-02-21 Sample label smoothing method, device and equipment of multi-layer hierarchical classification neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210156418.6A CN114758188A (en) 2022-02-21 2022-02-21 Sample label smoothing method, device and equipment of multi-layer hierarchical classification neural network

Publications (1)

Publication Number Publication Date
CN114758188A (en) 2022-07-15

Family

ID=82326107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210156418.6A Pending CN114758188A (en) 2022-02-21 2022-02-21 Sample label smoothing method, device and equipment of multi-layer hierarchical classification neural network

Country Status (1)

Country Link
CN (1) CN114758188A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994343A (en) * 2023-09-27 2023-11-03 睿云联(厦门)网络通讯技术有限公司 Diffusion tag deep learning model training method and medium based on tag smoothing
CN116994343B (en) * 2023-09-27 2023-12-15 睿云联(厦门)网络通讯技术有限公司 Diffusion tag deep learning model training method and medium based on tag smoothing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination