CN108320026A - Machine learning model training method and device - Google Patents
- Publication number: CN108320026A
- Application number: CN201710344182.8A
- Authority
- CN
- China
- Prior art keywords
- sample data
- order
- loss function
- average gradient
- current round
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The present invention relates to a machine learning model training method and device. The method includes: obtaining the clean sample data existing before the current round of cleaning dirty sample data; determining a first second-order average gradient of the loss function of a machine learning model according to the existing clean sample data and the current model parameters of the model; determining a second second-order average gradient of the loss function according to the current model parameters and the clean sample data obtained in the current round by cleaning part of the dirty sample data taken from the dirty sample data; obtaining an overall second-order average gradient of the loss function according to the first and second second-order average gradients; adjusting the current model parameters according to the overall second-order average gradient; and, if the adjusted model parameters do not satisfy a training termination condition, taking the next round as the current round and returning to the step of obtaining the clean sample data existing before the current round of cleaning dirty sample data to continue training, until the training termination condition is satisfied. The method reduces the number of iterative updates and thereby reduces the machine resources those updates consume.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a machine learning model training method and device.
Background art
Machine learning generally refers to a process in which a computer analyzes input sample data through a series of algorithms to build an initial model, and then updates the model parameters of the initial model through iterative training to obtain a final, suitable model.
In conventional methods, model parameters are updated by gradient descent. When updating model parameters with gradient descent, the gradient of the loss function is computed and the model parameters are iteratively updated according to that gradient, so that the model gradually converges and its accuracy improves.
However, with the traditional gradient-descent-based update of model parameters, each iteration improves the model's accuracy only slightly, so a relatively large number of iterative updates is needed, which consumes considerable machine resources.
Summary of the invention
In view of this, it is necessary to provide a machine learning model training method and device that address the technical problem that updating model parameters by gradient descent consumes substantial machine resources for iterative updates.
A machine learning model training method includes:
obtaining the clean sample data existing before the current round of cleaning dirty sample data;
determining a first second-order average gradient of the loss function of a machine learning model according to the existing clean sample data and the current model parameters of the model;
obtaining the clean sample data produced in the current round by cleaning part of the dirty sample data taken from the dirty sample data;
determining a second second-order average gradient of the loss function according to the clean sample data obtained in the current round and the current model parameters;
obtaining an overall second-order average gradient of the loss function according to the first second-order average gradient and the second second-order average gradient;
adjusting the current model parameters according to the overall second-order average gradient; and
when the adjusted model parameters do not satisfy a training termination condition, taking the next round as the current round and returning to the step of obtaining the clean sample data existing before the current round of cleaning dirty sample data to continue training, until the adjusted model parameters satisfy the training termination condition.
A machine learning model training device includes:
a sample data acquisition module, configured to obtain the clean sample data existing before the current round of cleaning dirty sample data;
a second-order average gradient determining module, configured to determine a first second-order average gradient of the loss function of a machine learning model according to the existing clean sample data and the current model parameters of the model;
the sample data acquisition module being further configured to obtain the clean sample data produced in the current round by cleaning part of the dirty sample data taken from the dirty sample data;
the second-order average gradient determining module being further configured to determine a second second-order average gradient of the loss function according to the clean sample data obtained in the current round and the current model parameters, and to obtain an overall second-order average gradient of the loss function according to the first and second second-order average gradients; and
a model parameter adjustment module, configured to adjust the current model parameters according to the overall second-order average gradient, and, when the adjusted model parameters do not satisfy a training termination condition, to take the next round as the current round and notify the sample data acquisition module to work, until the adjusted model parameters satisfy the training termination condition.
With the above machine learning model training method and device, the first second-order average gradient of the loss function is computed from the existing clean data, and the second second-order average gradient is computed from the clean sample data obtained by the current round of cleaning; from these, the overall second-order average gradient of the loss function under the current model parameters is obtained, and the current model parameters are updated according to it. Because updating model parameters by a second-order average gradient converges the model faster than gradient descent does, fewer iterative updates are needed, which reduces the machine resources consumed during parameter updating.
Description of the drawings
Fig. 1 is the internal structure schematic diagram of electronic equipment in one embodiment;
Fig. 2 is the flow diagram of machine learning model training method in one embodiment;
Fig. 3 is a flow diagram of the step of determining the second second-order average gradient of the loss function in one embodiment;
Fig. 4 is the flow diagram of machine learning model training method in another embodiment;
Fig. 5 is the structural schematic diagram of machine learning model training device in one embodiment;
Fig. 6 is the structural schematic diagram of machine learning model training device in another embodiment.
Detailed description of embodiments
In order to make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment. The electronic device may be a terminal or a server. The terminal may be a personal computer or a mobile electronic device, the mobile electronic device including at least one of a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like. The server may be implemented as an independent server or as a server cluster composed of multiple physical servers. As shown in Fig. 1, the electronic device includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected through a system bus. The non-volatile storage medium may store an operating system and computer-readable instructions which, when executed, cause the processor to perform a machine learning model training method. The processor provides computing and control capabilities and supports the operation of the entire electronic device. The internal memory may store computer-readable instructions which, when executed by the processor, cause the processor to perform a machine learning model training method. The network interface is used for connecting to a network for communication. Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure relevant to the present solution and does not limit the electronic device to which the solution is applied; a specific electronic device may include more or fewer components than shown, combine certain components, or arrange components differently.
Fig. 2 is a flow diagram of a machine learning model training method in one embodiment. This embodiment is mainly illustrated by applying the method to the electronic device in Fig. 1. Referring to Fig. 2, the method specifically includes the following steps:
S202: obtain the clean sample data existing before the current round of cleaning dirty sample data.
Specifically, the electronic device may perform machine learning on all the sample data to obtain an initial model of the machine learning model, and then clean the dirty sample data round by round, adjusting the initial model round by round toward convergence so as to improve its accuracy. Adjusting the model toward convergence can be achieved by adjusting its model parameters.
During each round of cleaning dirty sample data, the electronic device may obtain the clean sample data existing before the current round. In one embodiment, the existing clean sample data may be the qualifying sample data that has already been cleaned before the current round of cleaning dirty sample data. The dirty sample data may be the sample data among all the sample data that has not yet been cleaned.
For example, before the first round of cleaning, the dirty sample data may be all 100 sample data, with no existing clean sample data; if the first round cleans 10 dirty sample data, yielding 10 cleaned items, then before the second round the existing clean data are those 10 cleaned items, and the dirty sample data number 100 - 10 = 90.
S204: determine the first second-order average gradient of the loss function of the machine learning model according to the existing clean sample data and the current model parameters of the machine learning model.
Here, the current model parameters are the model parameters of the machine learning model before the current round of parameter adjustment. The loss function evaluates the degree of inconsistency between the predicted values and the actual values of the machine learning model; the smaller the loss function value, the better the model performs.
A gradient is a vector indicating the direction in which the loss function value changes most and the maximum rate of change of the loss function value. A second-order gradient is the direction of maximum change and the maximum rate of change of the loss function value obtained from the second derivative, or an approximate second derivative, of the loss function. An approximate second derivative is a derivative obtained without twice differentiating the loss function, but close in its gradient properties to the second derivative obtained by twice differentiating it.
The first second-order average gradient of the loss function is the average of at least one second-order gradient of the loss function of the machine learning model under the current model parameters, each sought from one of the clean sample data existing before the current round of cleaning dirty sample data.
S206: obtain the clean sample data produced in the current round by cleaning part of the dirty sample data taken from the dirty sample data.
Here, the dirty sample data before the current round of cleaning is the sample data among all the sample data other than the clean sample data existing before the current round; it is to be understood that the dirty sample data here refers to all currently dirty sample data. For example, if there are 100 sample data in total and 20 clean sample data exist before the current round of cleaning, then the dirty sample data number 100 - 20 = 80. The part of the dirty sample data is a portion extracted from all the current dirty sample data according to a preset rule; for example, 10 dirty sample data extracted from the 80 dirty sample data constitute the part of the dirty sample data.
The electronic device may itself clean the part of the dirty sample data it extracts in the current round to obtain the cleaned sample data, or it may obtain, directly from a sample data cleaning device, the clean sample data resulting from cleaning the part of the sample data extracted from the dirty sample data in the current round.
S208: determine the second second-order average gradient of the loss function according to the clean sample data obtained by the current round of cleaning and the current model parameters.
Here, as above, a second-order gradient is the direction of maximum change and the maximum rate of change of the loss function value obtained from the second derivative, or an approximate second derivative, of the loss function. The second second-order average gradient of the loss function is the average of at least one second-order gradient of the loss function of the machine learning model under the current model parameters, each sought from one of the clean sample data obtained by the current round of cleaning.
S210: obtain the overall second-order average gradient of the loss function according to the first second-order average gradient and the second second-order average gradient.
Specifically, the electronic device may compute a weighted average of the first second-order average gradient and the second second-order average gradient to obtain the overall second-order average gradient of the loss function.
In one embodiment, step S210 includes: summing the first second-order average gradient and the second second-order average gradient, weighted by a corresponding first weight and second weight respectively, to obtain the overall second-order average gradient of the loss function. The first weight is the proportion of all sample data accounted for by the clean sample data existing before the current round of cleaning dirty sample data; the second weight is the proportion of all sample data accounted for by the dirty sample data before the current round of cleaning.
Here, the dirty sample data before the current round of cleaning is the sample data among all the sample data other than the clean sample data existing before the current round of cleaning dirty sample data.
In one embodiment, the overall second-order average gradient of the loss function can be calculated according to the following formula:

g(θ) = (|R_clean| / |R|) · g_c(θ) + (|R_dirty| / |R|) · g_s(θ)

where g(θ) is the overall second-order average gradient of the loss function; R_clean is the clean data existing before the current round of cleaning dirty sample data, and |R_clean| is its quantity; R is all the sample data, and |R| is the total quantity of sample data; g_c(θ) is the first second-order average gradient of the loss function, the subscript c abbreviating clean and marking the clean data existing before the current round; R_dirty is the dirty sample data before the current round of cleaning, and |R_dirty| is its quantity; g_s(θ) is the second second-order average gradient of the loss function, the subscript s abbreviating sample. Here |R| = |R_dirty| + |R_clean|, |R_clean|/|R| is the first weight, and |R_dirty|/|R| is the second weight.
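The weighted combination above can be sketched in a few lines. This is a non-authoritative illustration: the set sizes and the two gradient vectors are made-up values, not from the patent.

```python
import numpy as np

# Sketch of the overall second-order average gradient g(θ) as a weighted sum
# of the clean-data gradient g_c and the newly-cleaned-data gradient g_s.
# All concrete numbers are illustrative assumptions.

n_clean, n_dirty = 20, 80          # |R_clean| and |R_dirty|; |R| = 100
n_total = n_clean + n_dirty

g_c = np.array([0.5, -1.0])        # first second-order average gradient (assumed)
g_s = np.array([1.0, 2.0])         # second second-order average gradient (assumed)

w_clean = n_clean / n_total        # first weight  |R_clean| / |R|
w_dirty = n_dirty / n_total        # second weight |R_dirty| / |R|

g_total = w_clean * g_c + w_dirty * g_s
print(g_total)                     # [0.9 1.4]
```

Because the weights are the two sets' shares of all sample data, they sum to one, so g(θ) is a convex combination of the two average gradients.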
S212: adjust the current model parameters according to the overall second-order average gradient.
Here, adjusting the current model parameters according to the overall second-order average gradient means adjusting the value of the current model parameters along the descent direction of the overall second-order average gradient of the loss function, with the overall second-order average gradient value as the descent step, so that the loss function value decreases at its maximum rate of change.
The electronic device may adjust the current model parameters according to the overall second-order average gradient alone, or according to a learning rate together with the overall second-order average gradient. In the latter case, the electronic device may adjust the value of the current model parameters along the descent direction of the overall second-order average gradient of the loss function, with the product of the learning rate and the overall second-order average gradient value as the descent step, so that the loss function value decreases at its maximum rate of change.
The learning rate regulates the stride of the gradient descent of the loss function. It may be a fixed value, or a dynamic value that changes correspondingly during the adjustment of the model parameters.
In one embodiment, the current model parameters can be adjusted according to the following formula:

θ_new = θ^(d) − γ · g(θ^(d))

where θ_new is the adjusted model parameter, θ^(d) is the current model parameter, γ is the learning rate, and g(θ^(d)) is the overall second-order average gradient of the loss function at the current model parameter θ^(d).
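The update formula above is a single vector operation. A minimal sketch, with made-up values for the parameters, learning rate, and gradient:

```python
import numpy as np

# Sketch of the parameter update θ_new = θ^(d) - γ · g(θ^(d)) described above.
# All values are illustrative assumptions.

theta = np.array([1.0, 2.0])      # current model parameters θ^(d)
gamma = 0.1                       # learning rate γ
g_overall = np.array([0.9, 1.4])  # overall second-order average gradient g(θ^(d))

theta_new = theta - gamma * g_overall
print(theta_new)                  # [0.91 1.86]
```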
S214: when the adjusted model parameters do not satisfy the training termination condition, take the next round as the current round and return to S202 to continue training, until the adjusted model parameters satisfy the training termination condition.
Here, the training termination condition may be that the number of iterative cleanings reaches a preset number. Specifically, the electronic device may judge whether the number of iterative cleanings has reached the preset number and, if so, determine that the adjusted model parameters satisfy the training termination condition. The training termination condition may also be that, after a cleaning update, the rate of change of the loss function value falls within a preset range. Specifically, the electronic device may judge whether the rate of change of the loss function value is within the preset range and, if so, determine that the adjusted model parameters satisfy the training termination condition.
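The two termination conditions above can be sketched as a single predicate. The function name, the round limit, and the tolerance are illustrative assumptions, not values from the patent:

```python
# Sketch of the two termination conditions described above; the thresholds
# max_rounds and loss_change_tol are illustrative assumptions.

def should_stop(round_count, loss_history, max_rounds=50, loss_change_tol=1e-4):
    """Stop when the preset round count is reached, or when the relative
    change of the loss between consecutive rounds falls within tolerance."""
    if round_count >= max_rounds:
        return True
    if len(loss_history) >= 2 and loss_history[-2] != 0:
        rel_change = abs(loss_history[-1] - loss_history[-2]) / abs(loss_history[-2])
        if rel_change <= loss_change_tol:
            return True
    return False

print(should_stop(50, [1.0, 0.5]))        # True: round limit reached
print(should_stop(3, [0.5, 0.49999]))     # True: loss change within tolerance
print(should_stop(3, [1.0, 0.5]))         # False: keep training
```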
With the above machine learning model training method, the first second-order average gradient of the loss function is computed from the existing clean data, and the second second-order average gradient from the clean sample data obtained by the current round of cleaning; from these, the overall second-order average gradient of the loss function under the current model parameters is obtained and used to update the current model parameters. Because updating model parameters by a second-order average gradient converges the model faster than gradient descent does, fewer iterative updates are needed, which in turn reduces the machine resources consumed during parameter updating.
In addition, the electronic device seeks the overall second-order average gradient of the loss function from the existing clean sample data and the clean sample data obtained by the current round of cleaning. This ensures that the model parameters are adjusted on the basis of clean sample data, avoids the errors introduced by adjusting model parameters on a mixture of clean and dirty data, and improves the accuracy of the parameter adjustment.
In one embodiment, step S204 includes: substituting the existing clean sample data and the current model parameters of the machine learning model into the loss function; seeking the first first-order partial derivative and the first second-order partial derivative matrix of the loss function with the existing clean sample data and the current model parameters substituted in; and determining the first second-order average gradient of the loss function of the machine learning model according to the inverse of the first second-order partial derivative matrix and the first first-order partial derivative.
Here, the first first-order partial derivative is the derivative, sought when the model parameters are the current model parameters, of the loss function with the existing clean sample data substituted in. The first second-order partial derivative matrix is the derivative obtained by twice differentiating that loss function under the current model parameters; the derivative obtained by twice differentiating the loss function is a matrix, and twice differentiating means differentiating the first first-order partial derivative of the loss function again. The inverse of the first second-order partial derivative matrix may be the inverse matrix obtained by the electronic device performing an inverse operation on the first second-order partial derivative matrix, or an approximation of that inverse matrix. The existing clean sample data may be one or more items.
When there is one item of existing clean sample data, the electronic device may directly take the product of the first first-order partial derivative and the inverse of the first second-order partial derivative matrix sought on that item as the first second-order average gradient of the loss function.
When there are multiple items of existing clean sample data, the electronic device may seek, for each item, the product of the first first-order partial derivative and the inverse of the first second-order partial derivative matrix obtained from that item, yielding the first second-order partial gradients of the loss function, and then take the average of these partial gradients to obtain the first second-order average gradient of the loss function.
In one embodiment, step S204 includes: calculating the first second-order average gradient of the loss function of the machine learning model according to the following formula:

g_c(θ) = (1 / |R_clean|) · Σ_{i ∈ R_clean} H(φ(x_i^c, y_i^c, θ))^{-1} · ∇φ(x_i^c, y_i^c, θ)

where g_c(θ) is the first second-order average gradient; the superscript c abbreviates clean, indicating that the sample data used to calculate the first second-order average gradient is clean data; R_clean is the existing clean sample data; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial derivative matrix of the loss function; x_i^c is the i-th input data in the existing clean sample data and y_i^c the i-th output data; θ is the current model parameter; H(φ(x_i^c, y_i^c, θ))^{-1} is the inverse of the first second-order partial derivative matrix of the loss function; ∇φ(x_i^c, y_i^c, θ) is the first first-order partial derivative of the loss function; and their product is then a first second-order partial gradient.
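The per-sample average of Newton-style directions H^{-1}·∇φ described above can be sketched as follows. The squared-error loss, the small ridge term that keeps each Hessian invertible, and all data are illustrative assumptions, not the patent's loss function:

```python
import numpy as np

# Sketch of the first second-order average gradient g_c(θ): for each clean
# sample, form the direction H^{-1}·∇φ and average the directions.
# The quadratic loss and the ridge term are illustrative assumptions.

def loss_grad(x, y, theta):
    """∇φ for a squared-error loss φ = 0.5 · (xᵀθ − y)²."""
    return (x @ theta - y) * x

def loss_hessian(x, y, theta):
    """H(φ) for the same loss, plus a small ridge so it is invertible."""
    return np.outer(x, x) + 1e-3 * np.eye(len(x))

def first_second_order_avg_gradient(xs, ys, theta):
    directions = [
        np.linalg.inv(loss_hessian(x, y, theta)) @ loss_grad(x, y, theta)
        for x, y in zip(xs, ys)
    ]
    return np.mean(directions, axis=0)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))       # five clean input samples (assumed)
ys = rng.normal(size=5)            # their outputs (assumed)
theta = np.zeros(3)

g_c = first_second_order_avg_gradient(xs, ys, theta)
print(g_c.shape)                   # (3,)
```

In practice the inverse would usually be replaced by solving a linear system or by an approximate inverse, as the text's mention of approximate second derivatives suggests.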
In one embodiment, step S208 includes: substituting the clean sample data obtained by the current round of cleaning and the current model parameters into the loss function; seeking the second first-order partial derivative and the second second-order partial derivative matrix of the loss function with the clean sample data obtained by the current round of cleaning and the current model parameters substituted in; and determining the second second-order average gradient of the loss function according to the inverse of the second second-order partial derivative matrix and the second first-order partial derivative.
Here, the second first-order partial derivative is the derivative, sought when the model parameters are the current model parameters, of the loss function with the clean sample data obtained by the current round of cleaning substituted in. The second second-order partial derivative matrix is the derivative obtained by twice differentiating that loss function under the current model parameters; the derivative obtained by twice differentiating the loss function is a matrix, and twice differentiating means differentiating the second first-order partial derivative of the loss function again. The inverse of the second second-order partial derivative matrix may be the inverse matrix obtained by the electronic device performing an inverse operation on the second second-order partial derivative matrix, or an approximation of that inverse matrix. The clean sample data obtained by the current round of cleaning may be one or more items.
When there is one item of clean sample data obtained by the current round of cleaning, the electronic device may directly take the product of the second first-order partial derivative and the inverse of the second second-order partial derivative matrix sought on that item as the second second-order average gradient of the loss function.
When there are multiple items of clean sample data obtained by the current round of cleaning, the electronic device may seek, for each item, the product of the second first-order partial derivative and the inverse of the second second-order partial derivative matrix obtained from that item, yielding the second second-order partial gradients of the loss function, and then take the average of these partial gradients to obtain the second second-order average gradient of the loss function.
Fig. 3 is a flow diagram of the step, in one embodiment, of determining the second second-order average gradient of the loss function according to the inverse of the second second-order partial derivative matrix and the second first-order partial derivative (for short, the second second-order average gradient determining step). As shown in Fig. 3, this step specifically includes the following steps:
S302: obtain the sampling probability of the dirty sample data corresponding to each item of clean sample data obtained by the current round of cleaning.
Here, when part of the dirty sample data is extracted from all the dirty sample data for cleaning, each item of dirty sample data corresponds to a probability of being extracted; that probability is the sampling probability. The sampling probability is proportional to the degree to which cleaning the dirty sample data improves the model's accuracy. For example, if the sampling probability of dirty sample data d1 is 60% and that of dirty sample data d2 is 50%, then cleaning d1 improves the model's accuracy more than cleaning d2 does.
S304: for each item of clean sample data obtained by the current round of cleaning, seek the ratio of the product of the corresponding inverse of the second second-order partial derivative matrix and the second first-order partial derivative to the sampling probability of the corresponding dirty sample data.
Here, the inverse of the second second-order partial derivative matrix and the second first-order partial derivative corresponding to each item of clean sample data obtained by the current round of cleaning are obtained by substituting that item into the loss function and seeking, under the current model parameters, the inverse of the second second-order partial derivative matrix and the second first-order partial derivative of the loss function. For example, for clean sample data c1 obtained by the current round of cleaning, they are the inverse of the second second-order partial derivative matrix and the second first-order partial derivative of the loss function under the current model parameters, sought after substituting c1 into the loss function.
The dirty sample data corresponding to an item of clean sample data obtained by the current round of cleaning is the dirty data that has a state transformation relation with that item across the current round of cleaning: cleaning the dirty sample data before the current round yields the corresponding clean sample data after the round. For example, if the current round cleans dirty sample data d1 and obtains clean sample data c1, then d1 is the dirty sample data corresponding to the clean sample data c1 obtained by the current round.
Specifically, having obtained the inverse of the second second-order partial derivative matrix and the second first-order partial derivative corresponding to each item of clean sample data obtained by the current round, the electronic device may seek their product and then the ratio of that product to the sampling probability of the dirty sample data corresponding to that item, obtaining at least one ratio.
S306: seek the average of the ratios to obtain the second second-order average gradient of the loss function.
Specifically, the electronic device may seek the average of the ratios according to the quantity of the part of the dirty sample data taken from the dirty sample data in the current round, obtaining the second second-order average gradient of the loss function.
In one embodiment, step S306 includes computing the second second-order average gradient of the loss function according to the following formula:

    g_s(θ) = (1/|S|) · Σ_{i∈S} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ) / p(i)

where g_s(θ) is the second second-order average gradient; S is the cleaned sample data obtained in this round; c is short for clean, indicating that the sample data used to compute the second second-order average gradient is clean data; p(i) is the sampling probability of the i-th dirty sample drawn in this round; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the cleaned sample data obtained in this round; y_i^(c) is the i-th output datum in the cleaned sample data obtained in this round; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the loss function's second second-order partial-derivative matrix; ∇φ(x_i^(c), y_i^(c), θ) is the second first-order partial derivative of the loss function.
In this embodiment, for each piece of cleaned sample data obtained in this round, the ratio of the product of the corresponding inverse second-order partial-derivative matrix and second first-order partial derivative to the sampling probability of the corresponding dirty sample data is computed, and from these ratios the second second-order average gradient of the loss function is obtained. Here the sampling probability is proportional to the improvement in model accuracy expected from cleaning the dirty sample data. That is, when computing the second second-order average gradient of the loss function, the improvement in model accuracy brought by cleaning the dirty sample data is taken into account, making the computed second second-order average gradient of the loss function more accurate.
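As a concrete illustration of steps S304–S306, the sketch below computes a second second-order average gradient in the manner just described. It is a minimal, hypothetical example rather than the patent's implementation: the loss is assumed to be a regularized squared loss (the small ridge term `lam` is an added assumption that keeps each per-sample second-order partial-derivative matrix invertible), and the sampling probabilities `p` are simply supplied as given.

```python
import numpy as np

def loss_grad(x, y, theta, lam=0.1):
    # First-order partial derivative of the assumed loss
    # phi(x, y, theta) = 0.5*(x @ theta - y)**2 + 0.5*lam*||theta||^2
    return (x @ theta - y) * x + lam * theta

def loss_hessian(x, lam=0.1):
    # Second-order partial-derivative (Hessian) matrix of the same loss;
    # the lam * I term keeps it invertible
    return np.outer(x, x) + lam * np.eye(x.size)

def second_avg_gradient(X_clean, y_clean, p, theta):
    # For each sample cleaned this round: the ratio of H^-1 * grad to the
    # sampling probability p[i] of the corresponding dirty sample,
    # then the average of the ratios (steps S304-S306).
    ratios = [
        np.linalg.solve(loss_hessian(x), loss_grad(x, y, theta)) / p_i
        for x, y, p_i in zip(X_clean, y_clean, p)
    ]
    return np.mean(ratios, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # cleaned sample data obtained this round
y = rng.normal(size=5)
p = np.full(5, 0.2)           # sampling probabilities (illustrative values)
theta = np.zeros(3)           # current model parameters
g_s = second_avg_gradient(X, y, p, theta)
```

Dividing each term by p(i) is inverse-probability weighting: samples that are drawn rarely count proportionally more when they do appear, which keeps the average representative of the whole dirty set.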
In one embodiment, the dirty sample data includes user-feature sample data and correspondingly labeled user profile labels. The method further includes: after the adjusted model parameters meet the training termination condition, obtaining user feature data, inputting the user feature data into the machine learning model with the adjusted model parameters, and outputting a user profile label.
Here, a user profile is a labeled user model abstracted from data that reflect user characteristics, such as the user's social attributes, living habits, and consumption behavior. A user profile label is a highly refined feature identifier obtained by analyzing user information.
User-feature sample data refers to sample data that characterize user features. In one embodiment, the user-feature sample data includes data such as the user's social attributes, living habits, and consumption behavior.
After the adjusted model parameters meet the training termination condition, a user-profile machine learning model meeting the requirements is obtained. User feature data is acquired and input into the user-profile machine learning model with the adjusted model parameters, which then outputs the user profile label corresponding to that user feature data. Outputting user profile labels with the user-profile machine learning model whose parameters meet the training termination condition improves the accuracy of the output user profile labels.
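As an illustration of this inference step, the sketch below feeds an encoded user-feature vector through a trained scorer and returns a user profile label. Everything here is hypothetical: the label names, the 3x3 parameter matrix standing in for the adjusted model parameters, and the linear argmax scorer are illustrative stand-ins, not the patent's model.

```python
import numpy as np

# Hypothetical label set; the names are illustrative only.
PROFILE_LABELS = ["budget-conscious", "premium-shopper", "occasional-buyer"]

def predict_profile_label(user_features, theta):
    # Score each candidate profile label with a linear model and
    # return the label with the highest score.
    scores = theta @ user_features          # one score per label
    return PROFILE_LABELS[int(np.argmax(scores))]

theta = np.array([[ 0.5, -0.2, 0.1],        # adjusted model parameters (illustrative)
                  [-0.3,  0.8, 0.0],
                  [ 0.1,  0.1, 0.4]])
features = np.array([0.2, 0.9, 0.1])        # encoded social attributes, habits, consumption
label = predict_profile_label(features, theta)
```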
In one embodiment, as shown in Figure 4, another machine learning model training method is provided. The method includes the following steps:
S402: obtain the existing cleaned sample data from before this round of cleaning of the dirty sample data.
S404: substitute the existing cleaned sample data and the current model parameters of the machine learning model into the loss function.
S406: compute the first first-order partial derivative and the first second-order partial-derivative matrix of the loss function with the existing cleaned sample data and the current model parameters substituted in.
S408: determine the first second-order average gradient of the loss function of the machine learning model from the inverse of the first second-order partial-derivative matrix and the first first-order partial derivative.
In one embodiment, the first second-order average gradient of the loss function of the machine learning model can be computed according to the following formula:

    g_c(θ) = (1/|R_clean|) · Σ_{i∈R_clean} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ)

where g_c(θ) is the first second-order average gradient; c is short for clean, indicating that the sample data used to compute the first second-order average gradient is clean data; R_clean is the existing cleaned sample data; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the existing cleaned sample data; y_i^(c) is the i-th output datum in the existing cleaned sample data; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the loss function's first second-order partial-derivative matrix; ∇φ(x_i^(c), y_i^(c), θ) is the first first-order partial derivative of the loss function.
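A minimal sketch of this computation follows, under the assumption of a regularized squared loss; the ridge term `lam` is an added assumption so that each per-sample second-order partial-derivative matrix is invertible, and the data are random placeholders.

```python
import numpy as np

def loss_grad(x, y, theta, lam=0.1):
    # First-order partial derivative of the assumed loss
    # phi(x, y, theta) = 0.5*(x @ theta - y)**2 + 0.5*lam*||theta||^2
    return (x @ theta - y) * x + lam * theta

def loss_hessian(x, lam=0.1):
    # Second-order partial-derivative (Hessian) matrix of the same loss
    return np.outer(x, x) + lam * np.eye(x.size)

def first_avg_gradient(X_clean, y_clean, theta):
    # g_c(theta): average of H(phi)^-1 * grad(phi) over the existing
    # cleaned sample data (steps S404-S408)
    terms = [
        np.linalg.solve(loss_hessian(x), loss_grad(x, y, theta))
        for x, y in zip(X_clean, y_clean)
    ]
    return np.mean(terms, axis=0)

rng = np.random.default_rng(1)
X_clean = rng.normal(size=(8, 3))   # existing cleaned sample data R_clean
y_clean = rng.normal(size=8)
theta = np.zeros(3)                 # current model parameters
g_c = first_avg_gradient(X_clean, y_clean, theta)
```

Note the design choice of `np.linalg.solve(H, grad)` rather than explicitly forming `H^-1`: it applies the inverse matrix to the gradient without materializing it, which is cheaper and numerically better behaved.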
S410: obtain the cleaned sample data produced in this round by cleaning part of the dirty sample data sampled from the dirty sample data, where the dirty sample data includes user-feature sample data and correspondingly labeled user profile labels.
S412: substitute the cleaned sample data obtained in this round and the current model parameters into the loss function.
S414: compute the second first-order partial derivative and the second second-order partial-derivative matrix of the loss function with the cleaned sample data obtained in this round and the current model parameters substituted in.
S416: obtain the sampling probability of the dirty sample data corresponding to each piece of cleaned sample data obtained in this round.
S418: for each piece of cleaned sample data obtained in this round, compute the ratio of the product of the corresponding inverse second-order partial-derivative matrix and second first-order partial derivative to the sampling probability of the corresponding dirty sample data.
S420: compute the average of the ratios to obtain the second second-order average gradient of the loss function.
In one embodiment, the second second-order average gradient of the loss function can be computed according to the following formula:

    g_s(θ) = (1/|S|) · Σ_{i∈S} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ) / p(i)

where g_s(θ) is the second second-order average gradient; S is the cleaned sample data obtained in this round; c is short for clean, indicating that the sample data used to compute the second second-order average gradient is clean data; p(i) is the sampling probability of the i-th dirty sample drawn in this round; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the cleaned sample data obtained in this round; y_i^(c) is the i-th output datum in the cleaned sample data obtained in this round; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the loss function's second second-order partial-derivative matrix; ∇φ(x_i^(c), y_i^(c), θ) is the second first-order partial derivative of the loss function.
S422: sum the first second-order average gradient and the second second-order average gradient, weighted respectively by the corresponding first weight and second weight, to obtain the overall second-order average gradient of the loss function.
Here, the first weight is the proportion of all sample data accounted for by the existing cleaned sample data before this round of cleaning of the dirty sample data. The second weight is the proportion of all sample data accounted for by the dirty sample data before this round of cleaning.
In one embodiment, the overall second-order average gradient of the loss function can be computed according to the following formula:

    g(θ) = (|R_clean|/|R|) · g_c(θ) + (|R_dirty|/|R|) · g_s(θ)

where g(θ) is the overall second-order average gradient of the loss function; R_clean is the existing clean data before this round of cleaning of the dirty sample data; R is all the sample data; |R_clean| is the quantity of existing clean data before this round of cleaning; |R| is the quantity of all sample data; g_c(θ) is the first second-order average gradient of the loss function, with c short for clean, marking the existing clean data before this round of cleaning; R_dirty is the dirty sample data before this round of cleaning; |R_dirty| is the quantity of dirty sample data before this round of cleaning; g_s(θ) is the second second-order average gradient of the loss function, with s short for sample. Here |R| = |R_dirty| + |R_clean|; |R_clean|/|R| is the first weight and |R_dirty|/|R| is the second weight.
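The weighted combination can be written directly in code. The sketch below assumes g_c and g_s have already been computed; the numeric values are illustrative placeholders only.

```python
import numpy as np

def overall_gradient(g_c, g_s, n_clean, n_dirty):
    # Overall second-order average gradient: weight g_c by |R_clean|/|R|
    # and g_s by |R_dirty|/|R|, where |R| = |R_clean| + |R_dirty|.
    n_total = n_clean + n_dirty
    return (n_clean / n_total) * g_c + (n_dirty / n_total) * g_s

g_c = np.array([0.2, -0.4])   # first second-order average gradient (illustrative)
g_s = np.array([0.6,  0.0])   # second second-order average gradient (illustrative)
g = overall_gradient(g_c, g_s, n_clean=80, n_dirty=20)
```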
S424: adjust the current model parameters according to the overall second-order average gradient.
In one embodiment, the current model parameters can be adjusted according to the following formula:

    θ_new = θ^(d) − γ · g(θ^(d))

where θ_new is the adjusted model parameters; θ^(d) is the current model parameters; γ is the learning rate; g(θ^(d)) is the overall second-order average gradient of the loss function under the current model parameters θ^(d).
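The adjustment itself is a single step. In the sketch below the learning rate and the gradient values are illustrative placeholders, not values from the patent.

```python
import numpy as np

def update_parameters(theta, g, learning_rate=0.5):
    # Newton-style parameter adjustment: theta_new = theta - gamma * g(theta),
    # where g is the overall second-order average gradient.
    return theta - learning_rate * g

theta = np.array([1.0, -2.0])        # current model parameters (illustrative)
g = np.array([0.28, -0.32])          # overall second-order average gradient (illustrative)
theta_new = update_parameters(theta, g)
```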
S426: determine whether the adjusted model parameters meet the training termination condition; if not, take the next round as the current round and return to step S402; if so, proceed to step S428.
S428: obtain user feature data, input the user feature data into the machine learning model with the adjusted model parameters, and output the user profile label.
In the machine learning model training method above, the first second-order average gradient of the loss function is computed from the existing clean data, and the second second-order average gradient of the loss function is computed from the cleaned sample data obtained in this round; from these, the overall second-order average gradient of the loss function under the current model parameters is obtained, and the current model parameters are updated accordingly. Updating the model parameters with a second-order average gradient makes the model converge faster than gradient descent, reducing the number of iterative updates required and, in turn, the machine resources consumed in updating the model parameters.
In addition, the electronic device computes the overall second-order average gradient of the loss function from the existing cleaned sample data and the cleaned sample data obtained in this round. This guarantees that the model parameters are adjusted based on clean sample data and avoids the errors introduced by adjusting model parameters on a mixture of clean and dirty data, improving the accuracy of model parameter adjustment.
Furthermore, when computing the second second-order average gradient of the loss function, the improvement in model accuracy brought by cleaning the dirty sample data is taken into account, making the computed second second-order average gradient of the loss function more accurate.
Finally, outputting the user profile label corresponding to the user feature data with the user-profile machine learning model whose parameters meet the training termination condition improves the accuracy of the output user profile labels.
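Steps S402–S426 can be assembled into a single loop. The sketch below is a toy end-to-end run under stated assumptions: a regularized squared loss, "cleaning" that simply recovers the true label, neutral sampling probabilities (p(i) taken as 1, so the importance weights drop out), and an illustrative termination test on the update norm. It shows the alternation the method describes — clean a batch, compute g_c on previously cleaned data and g_s on this round's batch, combine them by the |R_clean|/|R| and |R_dirty|/|R| weights, and take a Newton-style step — and is not the patent's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def grad(x, y, theta, lam=0.1):
    # First-order partial derivative of the assumed regularized squared loss
    return (x @ theta - y) * x + lam * theta

def hess(x, lam=0.1):
    # Second-order partial-derivative matrix; lam * I keeps it invertible
    return np.outer(x, x) + lam * np.eye(x.size)

def newton_term(x, y, theta):
    # H(phi)^-1 * grad(phi) for one sample
    return np.linalg.solve(hess(x), grad(x, y, theta))

d = 3
theta_true = rng.normal(size=d)
X = rng.normal(size=(40, d))
y_true = X @ theta_true                              # labels recovered by "cleaning"
y_dirty = y_true + rng.normal(scale=2.0, size=40)    # corrupted labels (pre-cleaning)
dirty_idx = list(range(40))                          # R_dirty: indices not yet cleaned
clean_X, clean_y = [], []                            # R_clean: grows each round

theta = np.zeros(d)
for _ in range(20):                                  # training rounds
    n_c, n_d = len(clean_X), len(dirty_idx)          # |R_clean|, |R_dirty| before cleaning
    # S410: draw and "clean" part of the dirty data
    batch = [dirty_idx.pop() for _ in range(min(5, n_d))]
    # S402-S408: first second-order average gradient over existing clean data
    g_c = (np.mean([newton_term(x, y, theta) for x, y in zip(clean_X, clean_y)], axis=0)
           if clean_X else np.zeros(d))
    # S412-S420: second second-order average gradient over this round's batch;
    # p(i) is taken as 1.0 here (neutral importance weight, a simplification)
    g_s = (np.mean([newton_term(X[i], y_true[i], theta) for i in batch], axis=0)
           if batch else np.zeros(d))
    clean_X += [X[i] for i in batch]
    clean_y += [y_true[i] for i in batch]
    # S422: overall second-order average gradient; S424: adjust parameters
    g = (n_c * g_c + n_d * g_s) / (n_c + n_d)
    theta = theta - 0.8 * g
    # S426: illustrative termination condition
    if np.linalg.norm(g) < 1e-6 and not dirty_idx:
        break
```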
As shown in Figure 5, in one embodiment, a machine learning model training device 500 is provided. The device 500 includes a sample data acquisition module 502, a second-order average gradient determination module 504, and a model parameter adjustment module 506, where:
The sample data acquisition module 502 is configured to obtain the existing cleaned sample data from before this round of cleaning of the dirty sample data.
The second-order average gradient determination module 504 is configured to determine the first second-order average gradient of the loss function of the machine learning model according to the existing cleaned sample data and the current model parameters of the machine learning model.
The sample data acquisition module 502 is further configured to obtain the cleaned sample data produced in this round by cleaning part of the dirty sample data sampled from the dirty sample data.
The second-order average gradient determination module 504 is further configured to determine the second second-order average gradient of the loss function according to the cleaned sample data obtained in this round and the current model parameters, and to obtain the overall second-order average gradient of the loss function from the first second-order average gradient and the second second-order average gradient.
The model parameter adjustment module 506 is configured to adjust the current model parameters according to the overall second-order average gradient and, when the adjusted model parameters do not meet the training termination condition, to take the next round as the current round and notify the sample data acquisition module 502 to operate, until the adjusted model parameters meet the training termination condition.
In one embodiment, the second-order average gradient determination module 504 is further configured to substitute the existing cleaned sample data and the current model parameters of the machine learning model into the loss function; to compute the first first-order partial derivative and the first second-order partial-derivative matrix of the loss function with the existing cleaned sample data and the current model parameters substituted in; and to determine the first second-order average gradient of the loss function of the machine learning model from the inverse of the first second-order partial-derivative matrix and the first first-order partial derivative.
In one embodiment, the second-order average gradient determination module 504 is further configured to compute the first second-order average gradient of the loss function of the machine learning model according to the following formula:

    g_c(θ) = (1/|R_clean|) · Σ_{i∈R_clean} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ)

where g_c(θ) is the first second-order average gradient; c is short for clean, indicating that the sample data used to compute the first second-order average gradient is clean data; R_clean is the existing cleaned sample data; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the existing cleaned sample data; y_i^(c) is the i-th output datum in the existing cleaned sample data; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the loss function's first second-order partial-derivative matrix; ∇φ(x_i^(c), y_i^(c), θ) is the first first-order partial derivative of the loss function.
In one embodiment, the second-order average gradient determination module 504 is further configured to substitute the cleaned sample data obtained in this round and the current model parameters into the loss function; to compute the second first-order partial derivative and the second second-order partial-derivative matrix of the loss function with the cleaned sample data obtained in this round and the current model parameters substituted in; and to determine the second second-order average gradient of the loss function from the inverse of the second second-order partial-derivative matrix and the second first-order partial derivative.
In one embodiment, the second-order average gradient determination module 504 is further configured to obtain the sampling probability of the dirty sample data corresponding to each piece of cleaned sample data obtained in this round; for each piece of cleaned sample data obtained in this round, to compute the ratio of the product of the corresponding inverse second-order partial-derivative matrix and second first-order partial derivative to the sampling probability of the corresponding dirty sample data; and to compute the average of the ratios to obtain the second second-order average gradient of the loss function.
In one embodiment, the second-order average gradient determination module 504 is further configured to compute the second second-order average gradient of the loss function according to the following formula:

    g_s(θ) = (1/|S|) · Σ_{i∈S} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ) / p(i)

where g_s(θ) is the second second-order average gradient; S is the cleaned sample data obtained in this round; c is short for clean, indicating that the sample data used to compute the second second-order average gradient is clean data; p(i) is the sampling probability of the i-th dirty sample drawn in this round; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the cleaned sample data obtained in this round; y_i^(c) is the i-th output datum in the cleaned sample data obtained in this round; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the loss function's second second-order partial-derivative matrix; ∇φ(x_i^(c), y_i^(c), θ) is the second first-order partial derivative of the loss function.
As shown in Figure 6, in one embodiment, the dirty sample data includes user-feature sample data and correspondingly labeled user profile labels, and the device 500 further includes:
A user profile label output module 508, configured to obtain user feature data after the adjusted model parameters meet the training termination condition, input the user feature data into the machine learning model with the adjusted model parameters, and output the user profile label.
It should be noted that the terms "first" and "second" used in this application are only for distinction and are not used to limit order, size, subordination, or the like.
One of ordinary skill in the art will appreciate that all or part of the flow of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the methods above. The storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it shall be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be determined by the appended claims.
Claims (15)
1. A machine learning model training method, comprising:
obtaining existing cleaned sample data from before a current round of cleaning of dirty sample data;
determining a first second-order average gradient of a loss function of a machine learning model according to the existing cleaned sample data and current model parameters of the machine learning model;
obtaining cleaned sample data produced in the current round by cleaning part of the dirty sample data sampled from the dirty sample data;
determining a second second-order average gradient of the loss function according to the cleaned sample data obtained in the current round and the current model parameters;
obtaining an overall second-order average gradient of the loss function according to the first second-order average gradient and the second second-order average gradient;
adjusting the current model parameters according to the overall second-order average gradient; and
when the adjusted model parameters do not meet a training termination condition, taking a next round as the current round and returning to the step of obtaining the existing cleaned sample data from before the current round of cleaning of the dirty sample data to continue training, until the adjusted model parameters meet the training termination condition.
2. The method according to claim 1, wherein determining the first second-order average gradient of the loss function of the machine learning model according to the existing cleaned sample data and the current model parameters of the machine learning model comprises:
substituting the existing cleaned sample data and the current model parameters of the machine learning model into the loss function;
computing a first first-order partial derivative and a first second-order partial-derivative matrix of the loss function with the existing cleaned sample data and the current model parameters substituted in; and
determining the first second-order average gradient of the loss function of the machine learning model according to an inverse of the first second-order partial-derivative matrix and the first first-order partial derivative.
3. The method according to claim 2, wherein determining the first second-order average gradient of the loss function of the machine learning model according to the inverse of the first second-order partial-derivative matrix and the first first-order partial derivative comprises:
computing the first second-order average gradient of the loss function of the machine learning model according to the following formula:
    g_c(θ) = (1/|R_clean|) · Σ_{i∈R_clean} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ)
wherein g_c(θ) is the first second-order average gradient; c is short for clean, indicating that the sample data used to compute the first second-order average gradient is clean data; R_clean is the existing cleaned sample data; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the existing cleaned sample data; y_i^(c) is the i-th output datum in the existing cleaned sample data; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the first second-order partial-derivative matrix of the loss function; and ∇φ(x_i^(c), y_i^(c), θ) is the first first-order partial derivative of the loss function.
4. The method according to claim 1, wherein determining the second second-order average gradient of the loss function according to the cleaned sample data obtained in the current round and the current model parameters comprises:
substituting the cleaned sample data obtained in the current round and the current model parameters into the loss function;
computing a second first-order partial derivative and a second second-order partial-derivative matrix of the loss function with the cleaned sample data obtained in the current round and the current model parameters substituted in; and
determining the second second-order average gradient of the loss function according to an inverse of the second second-order partial-derivative matrix and the second first-order partial derivative.
5. The method according to claim 4, wherein determining the second second-order average gradient of the loss function according to the inverse of the second second-order partial-derivative matrix and the second first-order partial derivative comprises:
obtaining a sampling probability of the dirty sample data corresponding to each piece of cleaned sample data obtained in the current round;
for each piece of cleaned sample data obtained in the current round, computing a ratio of the product of the corresponding inverse second-order partial-derivative matrix and second first-order partial derivative to the sampling probability of the corresponding dirty sample data; and
computing the average of the ratios to obtain the second second-order average gradient of the loss function.
6. The method according to claim 5, wherein computing the average of the ratios to obtain the second second-order average gradient of the loss function comprises:
computing the second second-order average gradient of the loss function according to the following formula:
    g_s(θ) = (1/|S|) · Σ_{i∈S} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ) / p(i)
wherein g_s(θ) is the second second-order average gradient; S is the cleaned sample data obtained in the current round; c is short for clean, indicating that the sample data used to compute the second second-order average gradient is clean data; p(i) is the sampling probability of the i-th dirty sample drawn in the current round; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the cleaned sample data obtained in the current round; y_i^(c) is the i-th output datum in the cleaned sample data obtained in the current round; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the second second-order partial-derivative matrix of the loss function; and ∇φ(x_i^(c), y_i^(c), θ) is the second first-order partial derivative of the loss function.
7. The method according to claim 1, wherein obtaining the overall second-order average gradient of the loss function according to the first second-order average gradient and the second second-order average gradient comprises:
summing the first second-order average gradient and the second second-order average gradient, weighted respectively by a corresponding first weight and second weight, to obtain the overall second-order average gradient of the loss function;
wherein the first weight is the proportion of all sample data accounted for by the existing cleaned sample data before the current round of cleaning of the dirty sample data; and
the second weight is the proportion of all sample data accounted for by the dirty sample data before the current round of cleaning.
8. The method according to any one of claims 1 to 7, wherein the dirty sample data comprises user-feature sample data and correspondingly labeled user profile labels, and the method further comprises:
after the adjusted model parameters meet the training termination condition, obtaining user feature data, inputting the user feature data into the machine learning model with the adjusted model parameters, and outputting a user profile label.
9. A machine learning model training device, wherein the device comprises:
a sample data acquisition module, configured to obtain existing cleaned sample data from before a current round of cleaning of dirty sample data;
a second-order average gradient determination module, configured to determine a first second-order average gradient of a loss function of a machine learning model according to the existing cleaned sample data and current model parameters of the machine learning model;
the sample data acquisition module being further configured to obtain cleaned sample data produced in the current round by cleaning part of the dirty sample data sampled from the dirty sample data;
the second-order average gradient determination module being further configured to determine a second second-order average gradient of the loss function according to the cleaned sample data obtained in the current round and the current model parameters, and to obtain an overall second-order average gradient of the loss function according to the first second-order average gradient and the second second-order average gradient; and
a model parameter adjustment module, configured to adjust the current model parameters according to the overall second-order average gradient and, when the adjusted model parameters do not meet a training termination condition, to take a next round as the current round and notify the sample data acquisition module to operate, until the adjusted model parameters meet the training termination condition.
10. The device according to claim 9, wherein the second-order average gradient determination module is further configured to substitute the existing cleaned sample data and the current model parameters of the machine learning model into the loss function; to compute a first first-order partial derivative and a first second-order partial-derivative matrix of the loss function with the existing cleaned sample data and the current model parameters substituted in; and to determine the first second-order average gradient of the loss function of the machine learning model according to an inverse of the first second-order partial-derivative matrix and the first first-order partial derivative.
11. The device according to claim 10, wherein the second-order average gradient determination module is further configured to compute the first second-order average gradient of the loss function of the machine learning model according to the following formula:
    g_c(θ) = (1/|R_clean|) · Σ_{i∈R_clean} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ)
wherein g_c(θ) is the first second-order average gradient; c is short for clean, indicating that the sample data used to compute the first second-order average gradient is clean data; R_clean is the existing cleaned sample data; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the existing cleaned sample data; y_i^(c) is the i-th output datum in the existing cleaned sample data; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the first second-order partial-derivative matrix of the loss function; and ∇φ(x_i^(c), y_i^(c), θ) is the first first-order partial derivative of the loss function.
12. The device according to claim 9, wherein the second-order average gradient determination module is further configured to substitute the cleaned sample data obtained in the current round and the current model parameters into the loss function; to compute a second first-order partial derivative and a second second-order partial-derivative matrix of the loss function with the cleaned sample data obtained in the current round and the current model parameters substituted in; and to determine the second second-order average gradient of the loss function according to an inverse of the second second-order partial-derivative matrix and the second first-order partial derivative.
13. The device according to claim 12, wherein the second-order average gradient determination module is further configured to obtain a sampling probability of the dirty sample data corresponding to each piece of cleaned sample data obtained in the current round; for each piece of cleaned sample data obtained in the current round, to compute a ratio of the product of the corresponding inverse second-order partial-derivative matrix and second first-order partial derivative to the sampling probability of the corresponding dirty sample data; and to compute the average of the ratios to obtain the second second-order average gradient of the loss function.
14. The device according to claim 13, wherein the second-order average gradient determination module is further configured to compute the second second-order average gradient of the loss function according to the following formula:
    g_s(θ) = (1/|S|) · Σ_{i∈S} H(φ(x_i^(c), y_i^(c), θ))^(-1) · ∇φ(x_i^(c), y_i^(c), θ) / p(i)
wherein g_s(θ) is the second second-order average gradient; S is the cleaned sample data obtained in the current round; c is short for clean, indicating that the sample data used to compute the second second-order average gradient is clean data; p(i) is the sampling probability of the i-th dirty sample drawn in the current round; φ(·) denotes the loss function; H(φ(·)) denotes the second-order partial-derivative matrix of the loss function; x_i^(c) is the i-th input datum in the cleaned sample data obtained in the current round; y_i^(c) is the i-th output datum in the cleaned sample data obtained in the current round; θ is the current model parameters; H(φ(x_i^(c), y_i^(c), θ))^(-1) is the inverse of the second second-order partial-derivative matrix of the loss function; and ∇φ(x_i^(c), y_i^(c), θ) is the second first-order partial derivative of the loss function.
15. The device according to any one of claims 9 to 14, wherein the dirty sample data comprises user feature sample data and correspondingly calibrated user-portrait labels; and the device further comprises: a user-portrait label output module configured to, after the adjusted model parameters meet the training termination condition, obtain user feature data, input the user feature data into the machine learning model with the adjusted model parameters, and output a user-portrait label.
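As a minimal illustration of the inference step in claim 15 (the function name, the binary logistic form, and the label names are assumptions of this sketch, not claimed by the patent):

```python
import numpy as np

def output_user_portrait_label(theta, user_features,
                               labels=("label_a", "label_b")):
    """Apply trained model parameters theta to user feature data and emit a
    user-portrait label. Binary logistic sketch; label names are placeholders."""
    score = 1.0 / (1.0 + np.exp(-np.dot(theta, user_features)))
    return labels[int(score >= 0.5)]
```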
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710344182.8A CN108320026B (en) | 2017-05-16 | 2017-05-16 | Machine learning model training method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108320026A true CN108320026A (en) | 2018-07-24 |
CN108320026B CN108320026B (en) | 2022-02-11 |
Family
ID=62892248
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710344182.8A Active CN108320026B (en) | 2017-05-16 | 2017-05-16 | Machine learning model training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108320026B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015078185A1 (en) * | 2013-11-29 | 2015-06-04 | 华为技术有限公司 | Convolutional neural network and target object detection method based on same |
CN103646019A (en) * | 2013-12-31 | 2014-03-19 | 哈尔滨理工大学 | Method and device for fusing multiple machine translation systems |
CN104809139A (en) * | 2014-01-29 | 2015-07-29 | 日本电气株式会社 | Code file query method and device |
CN106062786A (en) * | 2014-09-12 | 2016-10-26 | 微软技术许可有限责任公司 | Computing system for training neural networks |
WO2016062044A1 (en) * | 2014-10-24 | 2016-04-28 | 华为技术有限公司 | Model parameter training method, device and system |
CN106295460A (en) * | 2015-05-12 | 2017-01-04 | 株式会社理光 | The detection method of people and equipment |
CN105678740A (en) * | 2015-12-30 | 2016-06-15 | 完美幻境(北京)科技有限公司 | Camera geometrical calibration processing method and apparatus |
CN105931224A (en) * | 2016-04-14 | 2016-09-07 | 浙江大学 | Pathology identification method for routine scan CT image of liver based on random forests |
CN105844706A (en) * | 2016-04-19 | 2016-08-10 | 浙江大学 | Full-automatic three-dimensional hair modeling method based on single image |
CN106548210A (en) * | 2016-10-31 | 2017-03-29 | 腾讯科技(深圳)有限公司 | Machine learning model training method and device |
Non-Patent Citations (2)
Title |
---|
朱斐等: "一种解决连续空间问题的真实在线自然梯度AC算法", 《软件学报》 * |
谢锦等: "基于图像不变特征深度学习的交通标志分类", 《计算机辅助设计与图形学学报》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112703511A (en) * | 2018-09-27 | 2021-04-23 | 华为技术有限公司 | Operation accelerator and data processing method |
CN112703511B (en) * | 2018-09-27 | 2023-08-25 | 华为技术有限公司 | Operation accelerator and data processing method |
CN109710793A (en) * | 2018-12-25 | 2019-05-03 | 科大讯飞股份有限公司 | A kind of Hash parameter determines method, apparatus, equipment and storage medium |
WO2020153934A1 (en) * | 2019-01-21 | 2020-07-30 | Hewlett-Packard Development Company, L.P. | Fault prediction model training with audio data |
US11409589B1 (en) | 2019-10-23 | 2022-08-09 | Relativity Oda Llc | Methods and systems for determining stopping point |
US11921568B2 (en) | 2019-10-23 | 2024-03-05 | Relativity Oda Llc | Methods and systems for determining stopping point |
CN113625175A (en) * | 2021-10-11 | 2021-11-09 | 北京理工大学深圳汽车研究院(电动车辆国家工程实验室深圳研究院) | SOC estimation method and system based on cloud big data platform |
Also Published As
Publication number | Publication date |
---|---|
CN108320026B (en) | 2022-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108320026A (en) | Machine learning model training method and device | |
CN107358293B (en) | Neural network training method and device | |
TWI689871B (en) | Gradient lifting decision tree (GBDT) model feature interpretation method and device | |
CN106529569B (en) | Threedimensional model triangular facet feature learning classification method and device based on deep learning | |
CN106951499B (en) | A kind of knowledge mapping representation method based on translation model | |
CN108229267A (en) | Object properties detection, neural metwork training, method for detecting area and device | |
CN106611052A (en) | Text label determination method and device | |
CN108596774A (en) | Socialization information recommendation algorithm based on profound internet startup disk feature and system | |
CN109523018A (en) | A kind of picture classification method based on depth migration study | |
CN109583468A (en) | Training sample acquisition methods, sample predictions method and corresponding intrument | |
CN109299258A (en) | A kind of public sentiment event detecting method, device and equipment | |
CN107943874A (en) | Knowledge mapping processing method, device, computer equipment and storage medium | |
CN110276442A (en) | A kind of searching method and device of neural network framework | |
CN111291165B (en) | Method and device for embedding training word vector into model | |
CN107003834B (en) | Pedestrian detection device and method | |
CN105931271B (en) | A kind of action trail recognition methods of the people based on variation BP-HMM | |
CN108804577B (en) | Method for estimating interest degree of information tag | |
CN111210111B (en) | Urban environment assessment method and system based on online learning and crowdsourcing data analysis | |
CN109189921A (en) | Comment on the training method and device of assessment models | |
CN108228684A (en) | Training method, device, electronic equipment and the computer storage media of Clustering Model | |
CN103617146B (en) | A kind of machine learning method and device based on hardware resource consumption | |
Ozturk | Parametric estimation of location and scale parameters in ranked set sampling | |
CN109213831A (en) | Event detecting method and device calculate equipment and storage medium | |
CN103729431B (en) | Massive microblog data distributed classification device and method with increment and decrement function | |
CN105045906B (en) | The predictor method and device of impression information clicking rate |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||