CN109902722A - Classifier, neural network model training method, data processing equipment and medium - Google Patents
Classifier, neural network model training method, data processing equipment and medium
- Publication number: CN109902722A (application number CN201910082386.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- network model
- training
- sample
- training sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a classifier, a neural network model training method, a data processing device, and media. The classifier includes: a weighted-loss-value calculation device, which obtains, from the prediction confidences output by a neural network model, a weighted loss value for a training sample in each classification category; and a parameter adjustment device, to which the weighted loss values are propagated backward so that it adjusts the parameters of the neural network model based on them, thereby completing the training of the classifier's neural network model. The weighted-loss-value calculation device adds a weighting factor to the loss function that measures the loss cost between a training sample's prediction confidence and its ground-truth annotation, and thereby calculates the training sample's weighted loss value in each classification category. By adding a weighting factor when the loss values are calculated, the parameters of the neural network model can be adjusted to optimize the model and improve the recognition accuracy of the classifier.
Description
Technical field
The present invention relates to classification techniques, and in particular to a classifier, a neural network model training method, a data processing device, and media.
Background technique
Deep learning is currently the mainstream approach in automatic classification, with outstanding precision and wide application. When applying a deep learning model, the model must first be trained on a large training dataset: a custom loss function and a back-propagation algorithm (for example, gradient descent over many iterations) are used to learn iteratively optimal neural network model parameters. After the model is verified to meet its targets on a validation set, it is deployed in practical applications for classification or prediction.
When deep learning models are trained on real data, imbalance in the number of training samples often leads to low classification precision. For example, some common categories may have very many samples - for object recognition, thousands to tens of thousands or more - while uncommon categories occur rarely and are hard to collect, so their samples may number only a few hundred or fewer. Because traditional loss functions (such as cross-entropy or mean squared error) do not account for these differences in per-category sample counts, the classification results of a deep learning model tend to be biased toward the categories with more samples: even though the value of the loss function is minimized, the precision on small-sample categories may still be very low. For example, suppose a training dataset contains two classes of samples, A and B, and the number of A samples is far larger than the number of B samples. Even if an item to be classified resembles class B, a deep learning model trained on this dataset will, with high probability, classify it as class A. This is the problem of low classification precision caused by imbalanced training sample counts.
A solution that improves the recognition accuracy of classifiers is therefore needed.
Summary of the invention
To address the above problems, the present invention provides a classifier, a neural network model training method, a data processing device, and media that overcome the classifier's low recognition accuracy and thereby improve it.
According to one embodiment of the present invention, a classifier based on a neural network model is provided, comprising: a weighted-loss-value calculation device configured so that, from the prediction confidences output by the neural network model as its preliminary prediction result - values indicating the probability that a training sample belongs to each classification category - it obtains the training sample's weighted loss value in each classification category; and a parameter adjustment device, to which the weighted loss values output by the weighted-loss-value calculation device are propagated backward, so that the parameter adjustment device adjusts the parameters of the neural network model based on the weighted loss values, thereby completing the training of the classifier's neural network model. The weighted-loss-value calculation device adds a weighting factor to the loss function that measures the loss cost between a training sample's prediction confidence and its ground-truth annotation, and thereby calculates the training sample's weighted loss value in each classification category.
Optionally, the weighting factor is related to at least one of:
1) the number of training samples of each classification category in the training dataset; and
2) the prediction confidence, i.e., the preliminary prediction result of the neural network model, indicating the probability that a training sample belongs to each classification category.
Optionally, the weighting factor is inversely related to the number of training samples of each classification category in the training dataset.
Optionally, the weighting factor is a monotonically decreasing function of the prediction confidence, i.e., the preliminary prediction result of the neural network model, indicating the probability that a training sample belongs to each classification category.
Optionally, the loss function includes a cross-entropy loss function.
Optionally, the parameter adjustment device adjusts the parameters of the neural network model based on the weighted loss values through a back-propagation algorithm over many iterations.
According to one embodiment of the present invention, a neural network model training method for a classifier is provided, comprising: inputting a training sample into a neural network model; having the neural network model output, as a preliminary prediction result, prediction confidences indicating the probability that the training sample belongs to each classification category; adding a weighting factor to the loss function that measures the loss cost between the training sample's prediction confidence and its ground-truth annotation, and using the loss function with the added weighting factor to calculate the training sample's weighted loss value in each classification category; and adjusting the parameters of the neural network model based on the training sample's weighted loss value in each classification category, thereby completing the training of the neural network model.
Optionally, the weighting factor is related to at least one of:
1) the number of training samples of each classification category in the training dataset; and
2) the prediction confidence, i.e., the preliminary prediction result of the neural network model, indicating the probability that a training sample belongs to each classification category.
Optionally, the weighting factor is inversely related to the number of training samples of each classification category in the training dataset.
Optionally, the weighting factor is a monotonically decreasing function of the prediction confidence, i.e., the preliminary prediction result of the neural network model, indicating the probability that a training sample belongs to each classification category.
Optionally, the loss function includes a cross-entropy loss function.
Optionally, the parameters of the neural network model are adjusted based on the weighted loss values through a back-propagation algorithm over many iterations.
According to yet another embodiment of the present invention, a data processing device is provided, comprising: a processor; and a memory storing executable code which, when executed by the processor, causes the processor to perform one of the methods described above.
According to yet another embodiment of the present invention, a non-transitory machine-readable storage medium is provided, storing executable code which, when executed by a processor, causes the processor to perform one of the methods described above.
By adding a weighting factor when the loss values are calculated, the present invention can adjust the parameters of the neural network model and optimize it, so that in applications that use neural network models for classification and prediction - image classification, object recognition, image segmentation, edge extraction, speech recognition, and the like - recognition accuracy can be effectively improved.
Brief description of the drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the exemplary embodiments of the disclosure described in more detail in conjunction with the accompanying drawings, in which identical reference signs generally denote identical parts.
Fig. 1 is a schematic block diagram of a classifier according to an exemplary embodiment of the present invention.
Fig. 2 is a schematic diagram of the neural network model training process according to an exemplary embodiment of the present invention.
Fig. 3 is a schematic flowchart of the neural network model training method according to an exemplary embodiment of the present invention.
Fig. 4 is a schematic block diagram of the data processing device according to an exemplary embodiment of the present invention.
Detailed description of embodiments
Preferred embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be appreciated that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art. It should be noted that the numbers, serial numbers, and reference signs in this application appear merely for convenience of description and in no way limit the steps of the invention or their order, unless the specification explicitly states that steps must be executed in a specific sequence.
The most common way for a neural network classification model to solve a multi-class classification problem is to set up n output nodes, where n is the number of categories. For each sample, the neural network produces an n-dimensional array as its output. Each dimension of the array (that is, each output node) corresponds to one category's output result. Ideally, if a sample belongs to category k, the output value of the node corresponding to that category should be 1 and the outputs of all other nodes should be 0. In practice, the outputs usually do not reach 1 but are probability values such as 0.5 or 0.3; in this case, the category whose output node has the highest probability is usually taken as the sample's final classification result.
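The arg-max rule described above can be sketched as follows (an illustrative sketch, not code from the patent; the function name and example values are ours):

```python
def predict_class(confidences):
    """Return the index of the classification category whose output node
    has the highest predicted probability."""
    return max(range(len(confidences)), key=lambda i: confidences[i])

outputs = [0.5, 0.3, 0.2]          # one probability value per output node
print(predict_class(outputs))      # 0: the first category has the largest value
```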
As described in the background section, classifiers currently suffer from low classification precision - for example, the low precision caused by imbalanced numbers of training samples. More specifically, because some category or categories have large sample counts, the output probability values of the corresponding output nodes also tend to be large, so the classification results are biased toward the categories with many samples, and the categories with few samples may not be recognized accurately. In addition, there is the problem of low recognition accuracy on hard samples.
To solve these problems, the present invention proposes a neural network model training method based on a weighted loss function, and a corresponding classifier.
Fig. 1 is a schematic block diagram of a classifier according to an exemplary embodiment of the present invention, and Fig. 2 is a schematic diagram of the corresponding neural network model training process.
Specifically, as shown in Fig. 1, a classifier 100 based on a neural network model according to an exemplary embodiment of the present invention includes a weighted-loss-value calculation device 110 and a parameter adjustment device 120.

The weighted-loss-value calculation device 110 may be configured so that, from the prediction confidences output by the neural network model of the classifier 100 as its preliminary prediction result - values indicating the probability that a training sample belongs to each classification category (as shown in Fig. 2, the classifier has K output nodes and therefore K classification categories) - it obtains the training sample's weighted loss value in each classification category (as shown in Fig. 2).

The weighted loss values output by the weighted-loss-value calculation device 110 are then propagated backward to the parameter adjustment device 120, so that the parameter adjustment device 120 adjusts the parameters of each node of the neural network model based on these weighted loss values, thereby completing the training of the classifier's neural network model.

To do so, the weighted-loss-value calculation device 110 adds a weighting factor to the loss function that measures the loss cost between the training sample's prediction confidence and its ground-truth annotation, and thereby calculates the training sample's weighted loss value in each classification category.
Here, a training sample's "prediction confidence" is the neural network model's preliminary prediction result for that sample, indicating the probability that the sample belongs to each classification category. For example, for a 3-class classifier, suppose the initial neural network model predicts that a training sample belongs to the 1st category with probability 0.3, to the 2nd with probability 0.5, and to the 3rd with probability 0.2; then 0.3, 0.5, and 0.2 are the sample's prediction confidences for classes 1, 2, and 3, respectively.

The "ground-truth annotation" of a training sample, on the other hand, means that when the sample belongs to a given classification category, its annotation for that category should be "1", and when it does not belong to the category, its annotation for that category should be "0".
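The "1 for the true category, 0 otherwise" convention above is a one-hot encoding, which can be sketched as follows (illustrative only; the helper name is our assumption):

```python
def one_hot(true_class, num_classes):
    """Ground-truth annotation: 1.0 for the category the sample belongs to,
    0.0 for every other category."""
    return [1.0 if i == true_class else 0.0 for i in range(num_classes)]

# For the 3-class example in the text, a sample of the 2nd category
# (index 1) has prediction confidences [0.3, 0.5, 0.2] and ground truth:
print(one_hot(1, 3))  # [0.0, 1.0, 0.0]
```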
By adding a weighting factor when the loss values are calculated, the present invention can adjust the parameters of the neural network model and optimize it, so that in applications that use neural network models for classification and prediction - image classification, object recognition, image segmentation, edge extraction, speech recognition, and the like - recognition accuracy can be effectively improved.
Fig. 3 is a schematic flowchart of the neural network model training method according to an exemplary embodiment of the present invention.
As shown in Fig. 3, in step S101 a training sample is input into the neural network model, which then outputs, as its preliminary prediction result, the prediction confidences indicating the probability that the training sample belongs to each classification category (see the example above and Fig. 2).

Then, in step S110, a weighting factor is added to the loss function that measures the loss cost between the training sample's prediction confidence and its ground-truth annotation, and the loss function with the added weighting factor is used to calculate the training sample's weighted loss value in each classification category.

Then, in step S120, the parameters of the neural network model are adjusted based on the training sample's weighted loss values in each classification category, thereby completing the training of the neural network model (as shown in Fig. 2).

Here, the parameters of the neural network model can be adjusted based on the weighted loss values through a back-propagation algorithm over many iterations (for example, gradient descent).
By adding a weighting factor when the loss values are calculated, the present invention can adjust the parameters of the neural network model and optimize it, so that in applications that use neural network models for classification and prediction - image classification, object recognition, image segmentation, edge extraction, speech recognition, and the like - recognition accuracy can be effectively improved.
In the above exemplary embodiments, the loss function may include a cross-entropy function.

For an M-class classification model, the traditional cross-entropy loss function is defined (in its standard form) as:

L(x_k) = - Σ_{i=1}^{M} p_i(x_k) · log q_i(x_k)    (1)

where p_i(x_k) denotes the true probability that training sample x_k belongs to the i-th class (the "ground-truth annotation" above), whose value is usually 0 or 1. For example, as described above, if training sample x_k belongs to a certain class, then this probability takes the value 1 for that category and 0 otherwise.

In formula (1), q_i(x_k) denotes the predicted confidence probability that training sample x_k belongs to the i-th class (the "prediction confidence" above), i.e., the output of the classifier's output node corresponding to the i-th class. As described above, the M output nodes output M confidence probabilities, indicating the probabilities that a sample belongs to each of the M categories, and during classification the category with the highest confidence probability is usually taken as the classification result.
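Formula (1) is the standard cross-entropy, and computing it for one sample can be sketched as follows (the `eps` guard against log(0) is our addition):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Loss for one sample per formula (1): -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

p = [0.0, 1.0, 0.0]   # ground-truth annotation: sample belongs to class 2
q = [0.3, 0.5, 0.2]   # predicted confidences from the three output nodes
print(round(cross_entropy(p, q), 4))  # 0.6931, i.e. -log(0.5)
```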
In the present invention, the above weighting factor can be related to at least one of:

1) the number of training samples of each classification category in the training dataset; and

2) the prediction confidence, i.e., the preliminary prediction result of the neural network model, indicating the probability that a training sample belongs to each classification category.

Suppose there is a dataset X_K containing K training samples (each sample denoted x_k, where k = 1, 2, ..., K), i.e., X_K = {x_1, x_2, ..., x_K}.
Unlike the traditional cross-entropy function above, in the present invention the loss values can first be adjusted with a sample-count weight w_i that is inversely related to the number of samples of class i, forming a weighted loss function of the form:

L_w(x_k) = - Σ_{i=1}^{M} w_i · p_i(x_k) · log q_i(x_k)

As described above, the sample-count weight w_i decreases as the number of samples of class i grows; one form consistent with the limiting behaviour described below is, for example,

w_i = (1 - n_i / N)^β

Here, n_i is the number of training samples of the i-th class, N is the total number of training samples, and β is any real number greater than 0; preferably, β is a positive integer (which makes the computation simpler).
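A sample-count-weighted cross-entropy can be sketched as follows. The concrete form w_i = (1 - n_i/N)^β is our assumption, chosen to match the behaviour described in the text (w_i near 1 for rare classes, near 0 for common ones); the patent's own equation is not reproduced here:

```python
import math

def sample_count_weight(n_i, n_total, beta=1):
    """Assumed form of w_i: inversely related to class i's sample count."""
    return (1.0 - n_i / n_total) ** beta

def weighted_cross_entropy(p, q, weights, eps=1e-12):
    """Weighted loss: -sum_i w_i * p_i * log(q_i)."""
    return -sum(w * pi * math.log(qi + eps)
                for w, pi, qi in zip(weights, p, q))

# Class 0 has 9000 of 10000 samples; class 1 only 1000:
w = [sample_count_weight(9000, 10000), sample_count_weight(1000, 10000)]
print([round(x, 2) for x in w])  # [0.1, 0.9]: rare-class losses dominate
```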
As described above, according to an exemplary embodiment of the present invention, a weighting factor w_i inversely related to the number of samples of class i (which may be called the "sample-count weight w_i") is first defined, and the loss values are then adjusted inversely according to the sample counts of the various classes: the loss values of training data from classes with few samples are increased, while the loss values of training data from classes with many samples are suppressed, so that the neural network model can achieve better classification precision on balanced terms.

That is, for the problem of low classification precision caused by imbalanced sample counts, the present exemplary embodiment proposes adjusting the loss values with the sample-count weight w_i. Taking, for example, a weight of the form w_i = (1 - n_i/N)^β: when the sample count of a certain class is very small, the value of w_i approaches 1 and that class's loss values are essentially unchanged; but when a class has many samples, the value of w_i approaches 0 and all of that class's loss values are reduced. In this way, the sample-count weight greatly reduces the loss contribution of the classes with many samples, which solves, to a certain extent, the problem of low classification precision caused by imbalanced sample counts.

In summary, by using the above "sample-count weight w_i" to adjust the loss values of the various classes - increasing the loss values of training data from classes with few samples while suppressing those from classes with many samples - the classification method according to the present invention can achieve higher classification precision.
According to another exemplary embodiment of the present invention, the weight on the loss values can instead be adjusted according to the prediction confidence q_i(x_k) of the i-th class. This weight may be called the confidence weight u_i (the weight u_i is related to the prediction confidence q_i(x_k) of the i-th class), so that sample classes with poor prediction confidence q_i(x_k) are given a higher loss-value weight, and otherwise a lower one, enabling the classification method or classifier according to the present invention to achieve higher classification precision.

The weighted loss function formed with the confidence weight u_i is thus of the form:

L_u(x_k) = - Σ_{i=1}^{M} u_i · p_i(x_k) · log q_i(x_k)

where u_i is the above confidence weight.

As described above, the confidence weight u_i is related to the prediction confidence q_i(x_k) of the i-th class. Specifically, for example, the size of the confidence weight can be a monotonically decreasing function of the prediction confidence, so that sample classes with poor prediction confidence q_i(x_k) are given a higher loss-value weight, and otherwise a lower one.

Thus, for example, the confidence weight u_i can be defined as:

u_i = (1 - q_i(x_k))^α    (7)

Here, α is any real number greater than 0; preferably, α is a positive integer (which makes the computation simpler).
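Formula (7) can be sketched directly. The choice α = 2 below is arbitrary (the text only requires α > 0); the factor resembles the focal-loss modulating factor, down-weighting classes the network already predicts confidently:

```python
def confidence_weight(q_i, alpha=2):
    """Formula (7): u_i = (1 - q_i)^alpha."""
    return (1.0 - q_i) ** alpha

print(round(confidence_weight(0.9), 4))  # 0.01: confident prediction, tiny weight
print(round(confidence_weight(0.1), 4))  # 0.81: hard sample, large weight
```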
The present exemplary embodiment can address the problem of hard samples - samples that the network has difficulty discriminating accurately. For such samples, the present invention increases the contribution to the loss function through the confidence weight u_i. In the weighted-loss-value calculation formula, q_i(x_k) is the confidence predicted by the network: for samples of a given class, the higher the predicted confidence q_i(x_k), the smaller u_i is and the smaller that sample's loss-weight contribution. Well-predicted samples are thus suppressed, alleviating the problem of hard-to-discriminate samples.

By adjusting the weight on the loss values based on the prediction confidences output on the classifier's output nodes - giving sample classes with poor prediction confidence a higher loss-value weight and otherwise a lower one - the classification method or classifier according to the present invention can achieve higher classification precision.
According to a further exemplary embodiment of the present invention, the weight on the loss values (which may be called the "comprehensive weight h_i") can be adjusted according to both the sample counts of the various classes and the prediction confidences output at each output node. That is, this embodiment can be regarded as a combination of the two embodiments above.

Specifically, the comprehensive weight h_i can be formed by combining w_i and u_i. For example, h_i can be composed by the following formula (8):

h_i = (w_i)^s · (u_i)^t    (8)

Here, s and t can each be any real number greater than 0, and are preferably positive integers.

The weighted loss function formed with the comprehensive weight h_i is thus of the form:

L_h(x_k) = - Σ_{i=1}^{M} h_i · p_i(x_k) · log q_i(x_k)

By combining the sample counts of the various classes with the prediction confidences output on the classifier's output nodes to adjust the weight on the loss values, sample classes that are imbalanced in sample count and poorly predicted are given a higher loss-value weight, and otherwise a lower one, enabling the classification method or classifier according to the present invention to achieve higher classification precision.
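Formula (8) can be sketched by combining the two weights. The w_i form, and the defaults β = 1, α = 2, s = t = 1, are our assumptions (the text only requires them to be positive, preferably integers):

```python
import math

def combined_weight(n_i, n_total, q_i, beta=1, alpha=2, s=1, t=1):
    """Formula (8): h_i = (w_i)^s * (u_i)^t, from the sample-count weight
    w_i (assumed form) and the confidence weight u_i of formula (7)."""
    w_i = (1.0 - n_i / n_total) ** beta
    u_i = (1.0 - q_i) ** alpha
    return (w_i ** s) * (u_i ** t)

def combined_loss(p, q, counts, n_total, eps=1e-12):
    """Comprehensive weighted loss: -sum_i h_i * p_i * log(q_i)."""
    return -sum(combined_weight(n, n_total, qi) * pi * math.log(qi + eps)
                for n, pi, qi in zip(counts, p, q))

# A rare, poorly predicted class keeps a large weight; a common,
# confidently predicted class is strongly suppressed:
print(round(combined_weight(1000, 10000, 0.1), 3))  # 0.729
print(round(combined_weight(9000, 10000, 0.9), 5))  # 0.001
```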
Fig. 4 is a schematic block diagram of the data processing device according to an exemplary embodiment of the present invention.

Referring to Fig. 4, the data processing device includes a memory 10 and a processor 20.
The processor 20 may be a multi-core processor or may include multiple processors. In some embodiments, the processor 20 may include a general-purpose main processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 20 may be implemented with custom circuitry, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
The memory 10 stores executable code which, when executed by the processor 20, causes the processor 20 to perform one of the methods described above. The memory 10 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM can store static data or instructions needed by the processor 20 or other modules of the computer. The permanent storage can be a read-write storage device that does not lose stored instructions and data even when the computer is powered off. In some embodiments, the permanent storage is a mass storage device (such as a magnetic or optical disk, or flash memory). In other embodiments, the permanent storage can be a removable storage device (such as a floppy disk or an optical drive). The system memory can be a volatile read-write storage device, such as dynamic random-access memory, and can store some or all of the instructions and data the processor needs at runtime. In addition, the memory 10 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be used. In some embodiments, the memory 10 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash card (such as an SD card, a mini SD card, a micro-SD card, etc.), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or momentary electronic signals transmitted wirelessly or over wires.
In addition, the method according to the present invention may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the steps defined in the above method of the present invention.
Alternatively, the present invention may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) storing executable code (or a computer program, or computer instruction code) which, when executed by the processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above method according to the present invention.
Those skilled in the art will also understand that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts, block diagrams, and the like in the accompanying drawings show the possible architectures, functions, and operations of systems and methods according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram can represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical functions. It should also be noted that in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two consecutive boxes can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and changes will be obvious to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or improvements over the technology in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (14)
1. A classifier based on a neural network model, comprising:
a weighted-loss-value calculation apparatus, configured so that the prediction confidence output by the neural network model as its tentative prediction result, indicating the probability that a training sample belongs to each classification category of samples, passes through the weighted-loss-value calculation apparatus to obtain the weighted loss value of the training sample in each classification category; and
a parameter adjustment apparatus, to which the weighted loss value output by the weighted-loss-value calculation apparatus is fed back, so that the parameter adjustment apparatus adjusts the parameters of the neural network model based on the weighted loss value, thereby completing the training of the neural network model of the classifier;
wherein the weighted-loss-value calculation apparatus adds a weighting factor to the loss function used to measure the loss cost between the prediction confidence of the training sample and the labeled ground truth, thereby calculating the weighted loss value of the training sample in each classification category.
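As a rough illustration of claim 1's weighted loss (a hypothetical sketch, not the patent's exact formulation; the function name and signature are illustrative), the loss can be viewed as a cross-entropy term scaled by the added weighting factor:

```python
import math

def weighted_loss(confidences, true_class, weight):
    """Cross-entropy loss cost between the predicted confidences and the
    labeled ground truth, scaled by the weighting factor of claim 1."""
    # -log of the confidence assigned to the true category, times the weight
    return -weight * math.log(confidences[true_class])
```

With weight 1.0 this reduces to ordinary cross-entropy; a larger weight makes the sample contribute more to the subsequent parameter adjustment.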
2. The classifier of claim 1, wherein the weighting factor is related to at least one of:
1) the number of training samples of each classification category in the training data set; and
2) the prediction confidence output by the neural network model as its tentative prediction result, indicating the probability that a training sample belongs to each classification category of samples.
3. The classifier of claim 2, wherein the weighting factor is inversely proportional to the number of training samples of each classification category in the training data set.
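One minimal way to realize claim 3's inverse-proportional factor (a sketch with illustrative names, not taken from the patent) is to count each category's samples in the training set and weight by the reciprocal of the count:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-category weighting factors inversely proportional to the number
    of training samples of each classification category."""
    counts = Counter(labels)
    # rarer categories receive proportionally larger weights
    return {category: 1.0 / n for category, n in counts.items()}
```

For labels [0, 0, 0, 1], the minority category 1 receives a weight three times that of category 0, counteracting class imbalance in the training data.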
4. The classifier of claim 2, wherein the weighting factor is monotonically decreasing in the prediction confidence output by the neural network model as its tentative prediction result, indicating the probability that a training sample belongs to each classification category of samples.
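A confidence-dependent factor of the kind recited in claim 4 resembles the modulating factor of focal loss; the sketch below is a hypothetical illustration (gamma is an assumed hyperparameter, not specified by the patent):

```python
def confidence_weight(p_true, gamma=2.0):
    """Weighting factor that is monotonically decreasing in the prediction
    confidence p_true for the sample's true category, so well-classified
    (high-confidence) samples contribute less to the weighted loss value."""
    return (1.0 - p_true) ** gamma
```

At p_true = 0.9 the factor is 0.01, while at p_true = 0.1 it is 0.81, so hard, low-confidence samples dominate the training signal.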
5. The classifier of any one of claims 1 to 4, wherein the loss function comprises a cross-entropy loss function.
6. The classifier of any one of claims 1 to 4, wherein the parameter adjustment apparatus adjusts the parameters of the neural network model based on the weighted loss value through a back-propagation algorithm over multiple iterations.
7. A neural network model training method for a classifier, comprising:
inputting a training sample into a neural network model;
the neural network model outputting, as a tentative prediction result, a prediction confidence indicating the probability that the training sample belongs to each classification category of samples;
adding a weighting factor to the loss function used to measure the loss cost between the prediction confidence of the training sample and the labeled ground truth, and using the loss function with the added weighting factor to calculate the weighted loss value of the training sample in each classification category; and
adjusting the parameters of the neural network model based on the weighted loss value of the training sample in each classification category, thereby completing the training of the neural network model.
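The steps of claim 7 can be sketched as a single training step of a minimal softmax classifier (a hypothetical NumPy illustration; the confidence-based weighting factor is held fixed during the backward pass, a common simplification):

```python
import numpy as np

def train_step(W, x, y, lr=0.1, gamma=2.0):
    """One iteration of claim 7: forward pass, weighted loss, parameter update."""
    # Steps 1-2: input the sample and output prediction confidences via softmax
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Step 3: weighted loss -- cross-entropy scaled by a confidence-based factor
    w = (1.0 - p[y]) ** gamma
    loss = -w * np.log(p[y])
    # Step 4: adjust parameters by the gradient of the weighted loss
    # (the weighting factor w is treated as a constant during differentiation)
    one_hot = np.eye(len(p))[y]
    grad = w * np.outer(p - one_hot, x)
    return W - lr * grad, loss
```

Repeated calls on the same sample drive its weighted loss value down, which is the sense in which the training of the neural network model is completed over multiple iterations.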
8. The neural network model training method of claim 7, wherein the weighting factor is related to at least one of:
1) the number of training samples of each classification category in the training data set; and
2) the prediction confidence output by the neural network model as its tentative prediction result, indicating the probability that a training sample belongs to each classification category of samples.
9. The neural network model training method of claim 8, wherein the weighting factor is inversely proportional to the number of training samples of each classification category in the training data set.
10. The neural network model training method of claim 8, wherein the weighting factor is monotonically decreasing in the prediction confidence output by the neural network model as its tentative prediction result, indicating the probability that a training sample belongs to each classification category of samples.
11. The neural network model training method of any one of claims 7 to 10, wherein the loss function comprises a cross-entropy loss function.
12. The neural network model training method of any one of claims 7 to 10, wherein the parameters of the neural network model are adjusted based on the weighted loss value through a back-propagation algorithm over multiple iterations.
13. A data processing device, comprising:
a processor; and
a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of any one of claims 7 to 12.
14. A non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor, causes the processor to perform the method of any one of claims 7 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910082386.8A CN109902722A (en) | 2019-01-28 | 2019-01-28 | Classifier, neural network model training method, data processing equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902722A true CN109902722A (en) | 2019-06-18 |
Family
ID=66944351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910082386.8A Pending CN109902722A (en) | 2019-01-28 | 2019-01-28 | Classifier, neural network model training method, data processing equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902722A (en) |
Cited By (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112348161A (en) * | 2019-08-09 | 2021-02-09 | 北京地平线机器人技术研发有限公司 | Neural network training method, neural network training device and electronic equipment |
CN110567967A (en) * | 2019-08-20 | 2019-12-13 | 武汉精立电子技术有限公司 | Display panel detection method, system, terminal device and computer readable medium |
CN110717515A (en) * | 2019-09-06 | 2020-01-21 | 北京三快在线科技有限公司 | Model training method and device and electronic equipment |
CN110751197A (en) * | 2019-10-14 | 2020-02-04 | 上海眼控科技股份有限公司 | Picture classification method, picture model training method and equipment |
CN112784953A (en) * | 2019-11-07 | 2021-05-11 | 佳能株式会社 | Training method and device of object recognition model |
CN112906434A (en) * | 2019-12-03 | 2021-06-04 | 富士通株式会社 | Information processing apparatus, information processing method, and computer program |
CN111127892A (en) * | 2019-12-27 | 2020-05-08 | 北京易华录信息技术股份有限公司 | Intersection timing parameter optimization model construction and intersection signal optimization method |
CN113128539A (en) * | 2019-12-31 | 2021-07-16 | 财团法人工业技术研究院 | Object detection model training method and system based on adaptive labeling |
CN113128539B (en) * | 2019-12-31 | 2024-06-18 | 财团法人工业技术研究院 | Object detection model training method and system based on adaptive annotation |
CN111260665A (en) * | 2020-01-17 | 2020-06-09 | 北京达佳互联信息技术有限公司 | Image segmentation model training method and device |
CN111260665B (en) * | 2020-01-17 | 2022-01-21 | 北京达佳互联信息技术有限公司 | Image segmentation model training method and device |
CN111326148A (en) * | 2020-01-19 | 2020-06-23 | 北京世纪好未来教育科技有限公司 | Confidence correction and model training method, device, equipment and storage medium thereof |
CN111326148B (en) * | 2020-01-19 | 2021-02-23 | 北京世纪好未来教育科技有限公司 | Confidence correction and model training method, device, equipment and storage medium thereof |
CN111310814A (en) * | 2020-02-07 | 2020-06-19 | 支付宝(杭州)信息技术有限公司 | Method and device for training business prediction model by utilizing unbalanced positive and negative samples |
CN111291823A (en) * | 2020-02-24 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Fusion method and device of classification models, electronic equipment and storage medium |
CN111291823B (en) * | 2020-02-24 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Fusion method and device of classification model, electronic equipment and storage medium |
CN111341459A (en) * | 2020-02-28 | 2020-06-26 | 上海交通大学医学院附属上海儿童医学中心 | Training method of classified deep neural network model and genetic disease detection method |
CN111507396B (en) * | 2020-04-15 | 2023-08-08 | 广州大学 | Method and device for relieving error classification of unknown class samples by neural network |
CN111507396A (en) * | 2020-04-15 | 2020-08-07 | 广州大学 | Method and device for relieving error classification of neural network on unknown samples |
CN111737429B (en) * | 2020-06-16 | 2023-11-03 | 平安科技(深圳)有限公司 | Training method, AI interview method and related equipment |
CN111737429A (en) * | 2020-06-16 | 2020-10-02 | 平安科技(深圳)有限公司 | Training method, AI interview method and related equipment |
CN111930980A (en) * | 2020-08-21 | 2020-11-13 | 深圳市升幂科技有限公司 | Training method of image retrieval model, image retrieval method, device and medium |
CN112052900B (en) * | 2020-09-04 | 2024-05-24 | 京东科技控股股份有限公司 | Machine learning sample weight adjustment method and device, and storage medium |
CN112052900A (en) * | 2020-09-04 | 2020-12-08 | 京东数字科技控股股份有限公司 | Machine learning sample weight adjusting method and device and storage medium |
CN112232368A (en) * | 2020-09-10 | 2021-01-15 | 浙江大华技术股份有限公司 | Target recognition model training method, target recognition method and related device thereof |
CN112232368B (en) * | 2020-09-10 | 2023-09-01 | 浙江大华技术股份有限公司 | Target recognition model training method, target recognition method and related devices thereof |
CN112182214A (en) * | 2020-09-27 | 2021-01-05 | 中国建设银行股份有限公司 | Data classification method, device, equipment and medium |
CN112182214B (en) * | 2020-09-27 | 2024-03-19 | 中国建设银行股份有限公司 | Data classification method, device, equipment and medium |
CN112575513A (en) * | 2020-12-01 | 2021-03-30 | 珠海格力电器股份有限公司 | Defoaming control method and device for washing machine |
CN112465042B (en) * | 2020-12-02 | 2023-10-24 | 中国联合网络通信集团有限公司 | Method and device for generating classified network model |
CN112465042A (en) * | 2020-12-02 | 2021-03-09 | 中国联合网络通信集团有限公司 | Generation method and device of classification network model |
CN113066069A (en) * | 2021-03-31 | 2021-07-02 | 深圳中科飞测科技股份有限公司 | Adjusting method and device, adjusting equipment and storage medium |
CN112801237A (en) * | 2021-04-15 | 2021-05-14 | 北京远鉴信息技术有限公司 | Training method and device for violence and terrorism content recognition model and readable storage medium |
CN112801237B (en) * | 2021-04-15 | 2021-07-23 | 北京远鉴信息技术有限公司 | Training method and device for violence and terrorism content recognition model and readable storage medium |
CN113011532A (en) * | 2021-04-30 | 2021-06-22 | 平安科技(深圳)有限公司 | Classification model training method and device, computing equipment and storage medium |
CN113378853B (en) * | 2021-05-25 | 2024-02-13 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113378853A (en) * | 2021-05-25 | 2021-09-10 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113392889A (en) * | 2021-06-07 | 2021-09-14 | 深圳市欢太数字科技有限公司 | Data processing method and device and electronic equipment |
CN113485848A (en) * | 2021-09-08 | 2021-10-08 | 深圳思谋信息科技有限公司 | Deep neural network deployment method and device, computer equipment and storage medium |
CN113485848B (en) * | 2021-09-08 | 2021-12-17 | 深圳思谋信息科技有限公司 | Deep neural network deployment method and device, computer equipment and storage medium |
WO2023039925A1 (en) * | 2021-09-15 | 2023-03-23 | 深圳前海环融联易信息科技服务有限公司 | Intelligent construction method and apparatus for enterprise classification model, and device and medium |
CN113535964A (en) * | 2021-09-15 | 2021-10-22 | 深圳前海环融联易信息科技服务有限公司 | Enterprise classification model intelligent construction method, device, equipment and medium |
CN113535964B (en) * | 2021-09-15 | 2021-12-24 | 深圳前海环融联易信息科技服务有限公司 | Enterprise classification model intelligent construction method, device, equipment and medium |
CN113780473A (en) * | 2021-09-30 | 2021-12-10 | 平安科技(深圳)有限公司 | Data processing method and device based on depth model, electronic equipment and storage medium |
CN113780473B (en) * | 2021-09-30 | 2023-07-14 | 平安科技(深圳)有限公司 | Depth model-based data processing method and device, electronic equipment and storage medium |
CN113901223B (en) * | 2021-11-19 | 2024-01-26 | 企查查科技股份有限公司 | Method, device, computer equipment and storage medium for generating enterprise classification model |
CN113901223A (en) * | 2021-11-19 | 2022-01-07 | 企查查科技有限公司 | Method and device for generating enterprise classification model, computer equipment and storage medium |
WO2023230748A1 (en) * | 2022-05-30 | 2023-12-07 | Nvidia Corporation | Dynamic class weighting for training one or more neural networks |
CN115496200B (en) * | 2022-09-05 | 2023-09-22 | 中国科学院半导体研究所 | Neural network quantization model training method, device and equipment |
CN115496200A (en) * | 2022-09-05 | 2022-12-20 | 中国科学院半导体研究所 | Neural network quantitative model training method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109902722A (en) | Classifier, neural network model training method, data processing equipment and medium | |
CN102521656B (en) | Integrated transfer learning method for classification of unbalance samples | |
WO2019179403A1 (en) | Fraud transaction detection method based on sequence width depth learning | |
CN109086799A (en) | A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet | |
CN110298321A (en) | Route denial information extraction based on deep learning image classification | |
CN107644057B (en) | Absolute imbalance text classification method based on transfer learning | |
Sinha et al. | Class-wise difficulty-balanced loss for solving class-imbalance | |
CN111091201A (en) | Data partition mixed sampling-based unbalanced integrated classification method | |
CN109871443A (en) | A kind of short text classification method and device based on book keeping operation scene | |
CN109871855A (en) | A kind of adaptive depth Multiple Kernel Learning method | |
CN109491914A (en) | Defect report prediction technique is influenced based on uneven learning strategy height | |
CN110414780A (en) | A kind of financial transaction negative sample generation method based on generation confrontation network | |
CN109948680A (en) | The classification method and system of medical record data | |
CN107944460A (en) | One kind is applied to class imbalance sorting technique in bioinformatics | |
CN112966109B (en) | Multi-level Chinese text classification method and system | |
CN109376179A (en) | A kind of sample equilibrating method in data mining | |
CN110415260A (en) | Smog image segmentation and recognition methods based on dictionary and BP neural network | |
CN109493333A (en) | Ultrasonic Calcification in Thyroid Node point extraction algorithm based on convolutional neural networks | |
CN115148299A (en) | XGboost-based ore deposit type identification method and system | |
CN109543693A (en) | Weak labeling data noise reduction method based on regularization label propagation | |
CN115439718A (en) | Industrial detection method, system and storage medium combining supervised learning and feature matching technology | |
CN111144462A (en) | Unknown individual identification method and device for radar signals | |
Zhou et al. | Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm | |
CN105335763A (en) | Fabric defect classification method based on improved extreme learning machine | |
CN108629680A (en) | A kind of Risk Identification Method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190618 |