Summary of the invention
An object of the present invention is to provide an iteration method and device for an identity verification ("core body") model, so as to solve the existing problem that, when the scene or the time differs, a gap forms between the distribution of the test data and the distribution of the training data, making model verification inaccurate.
According to a first aspect of the present invention, an iteration method for an identity verification model is provided, comprising:
Desensitizing received data, and extracting and screening feature data from the desensitized data;
Selecting hyperparameters according to the state of the identity verification model, and training the identity verification model on the feature data;
Evaluating the trained identity verification model and, after it passes the evaluation, bringing the iterated identity verification model online.
Further, in the method of the present invention, after the received data is desensitized, the method further comprises:
Starting a timer after the received data has been desensitized and, when the timed duration is greater than or equal to a preset backflow time interval, determining whether the current business is idle;
If so, performing data backflow, so as to execute the step of extracting and screening feature data from the desensitized data.
Further, in the method of the present invention, extracting and screening feature data from the desensitized data comprises:
Preprocessing the desensitized data;
Extracting feature data from the preprocessed data;
Post-processing the extracted feature data to filter out abnormal feature data and obtain training samples.
Further, in the method of the present invention, selecting hyperparameters according to the state of the identity verification model and training the identity verification model on the feature data comprises:
Selecting hyperparameters according to the state of the identity verification model, and training the identity verification model on the training samples.
Further, in the method of the present invention, post-processing the extracted feature data to filter out abnormal feature data and obtain training samples comprises:
Using a clustering algorithm, treating data farther than a preset distance from the cluster center as abnormal feature data and filtering it out;
Sampling the data that remains after the abnormal feature data has been filtered out, according to a preset sample selection strategy, to obtain the training samples.
Further, in the method of the present invention, evaluating the trained identity verification model comprises:
Evaluating the trained identity verification model according to an algorithm evaluation set and an algorithm accuracy metric.
Further, in the method of the present invention, preprocessing the desensitized data comprises:
If the desensitized data is image data, decoding the image data, adjusting its scale, and normalizing the image.
According to a second aspect of the present invention, an iteration device for an identity verification model is provided, comprising:
A desensitization module, configured to desensitize received data;
A feature data screening module, configured to extract and screen feature data from the desensitized data;
A model training module, configured to select hyperparameters according to the state of the identity verification model and train the identity verification model on the feature data;
A model deployment module, configured to evaluate the trained identity verification model and, after it passes the evaluation, bring the iterated identity verification model online.
Further, in the device of the present invention, the desensitization module comprises:
A judging unit, configured to start a timer after the received data has been desensitized and, when the timed duration is greater than or equal to a preset backflow time interval, determine whether the current business is idle;
A data backflow unit, configured to perform data backflow when the current business is idle, so as to execute the step of extracting and screening feature data from the desensitized data.
Further, in the device of the present invention, the feature data screening module comprises:
A preprocessing unit, configured to preprocess the desensitized data;
A feature data extraction unit, configured to extract feature data from the preprocessed data;
A training sample acquisition unit, configured to post-process the extracted feature data to filter out abnormal feature data and obtain training samples.
Further, in the device of the present invention, the model training module is configured to:
Select hyperparameters according to the state of the identity verification model, and train the identity verification model on the training samples.
Further, in the device of the present invention, the training sample acquisition unit is configured to:
Using a clustering algorithm, treat data farther than a preset distance from the cluster center as abnormal feature data and filter it out;
Sample the data that remains after the abnormal feature data has been filtered out, according to a preset sample selection strategy, to obtain the training samples.
Further, in the device of the present invention, the model deployment module is configured to:
Evaluate the trained identity verification model according to an algorithm evaluation set and an algorithm accuracy metric.
Further, in the device of the present invention, the preprocessing unit is configured to:
If the desensitized data is image data, decode the image data, adjust its scale, and normalize the image.
According to a third aspect of the present invention, a storage medium is provided, the storage medium storing computer program instructions that, when executed, carry out the method of the present invention.
According to a fourth aspect of the present invention, a computing device is provided, comprising: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the computing device to execute the method of the present invention.
In the iteration method and device for an identity verification model provided by the present invention, received data is desensitized, and feature data is extracted and screened from the preprocessed data; this overcomes the degradation of deep learning algorithm performance that arises when, over time, the distribution of the test data drifts away from the distribution of the training data. Hyperparameters are selected according to the state of the identity verification model and the model is trained on the feature data; this overcomes the same degradation when the drift is caused by differences between scenes. The method provided by the present invention allows the updated model to adapt to each scene, brings out the full performance of the algorithm, and improves the accuracy of the algorithm's identity verification.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an iteration method for an identity verification model provided by an embodiment of the present invention. As shown in Fig. 1, the iteration method for an identity verification model provided by this embodiment comprises:
101. Desensitize the received data, and extract and screen feature data from the desensitized data.
In this embodiment, desensitizing the received data means desensitizing data that comes from different scenes. It can be understood as removing the sensitive information from the received data; desensitization reduces the risk of user information being stolen.
102. Select hyperparameters according to the state of the identity verification model, and train the identity verification model on the feature data.
The state of the identity verification model can take many forms and includes various kinds of numerical information, for example the loss function value and the gradient values. The hyperparameters include the learning rate, the momentum, and so on.
103. Evaluate the trained identity verification model and, after it passes the evaluation, bring the iterated identity verification model online.
Because the performance of a model produced by automated training has not been evaluated, it is unknown whether the model meets the performance standard for going online. Automated evaluation therefore needs to be configured, with corresponding metrics set as the basis for judgment.
Different business scenes, for example payment and login, have different performance metrics, so the trained identity verification model goes online only after it passes the evaluation against the performance metrics of the required business scene.
Received data is desensitized, and feature data is extracted and screened from the preprocessed data; this overcomes the degradation of deep learning algorithm performance that arises when, over time, the distribution of the test data drifts away from the distribution of the training data. Hyperparameters are selected according to the state of the identity verification model and the model is trained on the feature data; this overcomes the same degradation when the drift is caused by differences between scenes. The method provided by the present invention allows the updated model to adapt to each scene, brings out the full performance of the algorithm, and improves the accuracy of the algorithm's identity verification.
Fig. 2 is a schematic flowchart of an iteration method for an identity verification model provided by an embodiment of the present invention. As shown in Fig. 2, the iteration method for an identity verification model provided by this embodiment comprises:
201. Desensitize the received data.
In this embodiment, desensitization means desensitizing the received data before the backflow operation.
Scene data may contain sensitive user information, including the user's face, user name, transaction information, and so on, so the data needs to be desensitized before it flows back. Desensitization mainly includes, but is not limited to, the following: (1) watermark encryption of images; (2) anonymization of key information such as the user name; (3) deletion of transaction information. After desensitization, the data can be stored on the local device to await backflow.
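The three desensitization operations listed above can be sketched as follows. This is a minimal illustration, not the implementation described by the embodiment: the field names (`face_image`, `user_name`, `transaction`), the salted SHA-256 token, and the `xor_mask` stand-in for image watermark encryption are all assumptions made for the example.

```python
import hashlib
from typing import Any, Callable, Dict


def xor_mask(image: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical stand-in for image protection; NOT real watermark encryption."""
    return bytes(b ^ key for b in image)


def desensitize(
    record: Dict[str, Any],
    salt: str,
    protect_image: Callable[[bytes], bytes] = xor_mask,
) -> Dict[str, Any]:
    """Return a desensitized copy of `record`; the input is left unchanged."""
    out = dict(record)
    # (1) Protect the image payload (placeholder for watermark encryption).
    if "face_image" in out:
        out["face_image"] = protect_image(out["face_image"])
    # (2) Anonymize key information such as the user name with a salted hash.
    if "user_name" in out:
        name = out.pop("user_name")
        digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
        out["user_token"] = digest[:16]
    # (3) Delete transaction information outright.
    out.pop("transaction", None)
    return out
```

A record processed this way keeps non-sensitive fields (for example a scene label) so that later stages can still group data by scene and by anonymous user token.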
In general, after the received data is desensitized, the method further comprises: starting a timer after the received data has been desensitized and, when the timed duration is greater than or equal to a preset backflow time interval, determining whether the current business is idle;
If so, performing data backflow, so as to execute the subsequent step, namely the step of extracting and screening feature data from the desensitized data.
In addition, data backflow can be performed in two ways, real-time backflow and asynchronous backflow; the method of this embodiment uses asynchronous data backflow. The main reason is that automated iteration places no strict requirement on data timeliness, whereas real-time data backflow would occupy considerable bandwidth and degrade the user experience of the main business. Asynchronous data backflow allows the backflow time interval to be set flexibly, and the data is sent back using spare bandwidth when the main business is idle.
Different time intervals can be used for automatic deployment in different scenes; this embodiment does not describe specific time intervals in further detail.
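The timing and idle check described above can be sketched as a small scheduler. The class name `AsyncBackflow`, the `is_idle` and `upload` callbacks, and the injectable `clock` are hypothetical names introduced for this illustration; the embodiment does not specify how idleness is detected or how data is transmitted.

```python
import time
from typing import Any, Callable, List, Optional


class AsyncBackflow:
    """Buffer desensitized data locally and send it back only when allowed."""

    def __init__(
        self,
        interval_s: float,
        is_idle: Callable[[], bool],
        upload: Callable[[List[Any]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval_s = interval_s
        self.is_idle = is_idle
        self.upload = upload
        self.clock = clock
        self.buffer: List[Any] = []
        self.timer_start: Optional[float] = None

    def store(self, item: Any) -> None:
        """Store one desensitized item and start timing if not already timing."""
        self.buffer.append(item)
        if self.timer_start is None:
            self.timer_start = self.clock()

    def tick(self) -> bool:
        """Call periodically; returns True if a backflow was performed."""
        if self.timer_start is None or not self.buffer:
            return False
        if self.clock() - self.timer_start < self.interval_s:
            return False  # preset backflow interval not yet reached
        if not self.is_idle():
            return False  # main business is busy; retry on the next tick
        self.upload(list(self.buffer))
        self.buffer.clear()
        self.timer_start = None
        return True
```

Injecting the clock makes the interval logic testable without sleeping; a deployment would presumably keep the default monotonic clock.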
202. Preprocess the desensitized data, and extract feature data from the preprocessed data.
Preprocessing the desensitized data in step 202 may include the following:
If the desensitized data is image data, decoding the image data, adjusting its scale, and normalizing the image.
In general, step 202 includes the following sub-steps:
The scene data flows back to the model server.
It will be understood that the model server is arranged on the side that receives the scene data; the model server may be co-located with the device that receives the scene data, or may be arranged at a different end.
The model server preprocesses the scene data. Scene data comes in many kinds; when the scene data is image data, the preprocessing performed by the model server includes image decoding, image scale adjustment, image normalization, and similar processing.
The model server performs automated feature extraction on the preprocessed images.
The whole of the above procedure is completed automatically by the model server, and algorithm developers cannot obtain users' image data or the desensitized information. Using the preprocessed data as the basis for the subsequent screening of feature data mainly serves to keep the subsequent feature data uniform.
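The image preprocessing chain (decoding, scale adjustment, normalization) can be sketched with the standard library alone. This is an illustration under stated assumptions: the input is taken to be a raw 8-bit grayscale buffer of known size, nearest-neighbor resampling is used for the scale adjustment, and the mean and standard deviation of 0.5 are arbitrary example values. A real pipeline would normally use an image library and the model's own input conventions.

```python
from typing import List


def decode_gray8(data: bytes, width: int, height: int) -> List[List[int]]:
    """Decode a raw 8-bit grayscale buffer into rows of pixel values."""
    if len(data) != width * height:
        raise ValueError("buffer size does not match the given dimensions")
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: List[List[int]], out_w: int, out_h: int) -> List[List[int]]:
    """Adjust the image scale with nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: List[List[int]], mean: float = 0.5, std: float = 0.5) -> List[List[float]]:
    """Map pixel values from [0, 255] to [0, 1], then standardize."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def preprocess(data: bytes, width: int, height: int, out_w: int, out_h: int) -> List[List[float]]:
    """Decode, rescale, and normalize one image."""
    return normalize(resize_nearest(decode_gray8(data, width, height), out_w, out_h))
```

Fixing the output size and value range in one place is what gives every downstream feature vector the same shape and scale.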
203. Post-process the extracted feature data to filter out abnormal feature data and obtain training samples.
In step 203, post-processing the extracted feature data to filter out abnormal feature data and obtain training samples includes the following sub-steps:
2031. Using a clustering algorithm, treat data farther than a preset distance from the cluster center as abnormal feature data and filter it out.
After feature extraction on the scene data is complete, the feature data needs to be post-processed to filter out noise and select samples.
Because scene data can contain some noise, a clustering algorithm is used to detect abnormal data. Specifically, when a feature lies farther than a certain distance from the cluster center, it is judged to be an abnormal feature. Abnormal features are not used as subsequent training samples.
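The distance-based filtering in sub-step 2031 can be sketched as below. The embodiment does not name a specific clustering algorithm, so this example assumes the cluster centers are already available (here, for simplicity, a single cluster whose center is the mean of all features); `centroid`, `filter_outliers`, and `max_dist` are names chosen for the illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: Sequence[Vector]) -> List[float]:
    """Mean of the points, used here as the center of a single cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def filter_outliers(
    features: Sequence[Vector],
    centers: Sequence[Vector],
    max_dist: float,
) -> Tuple[List[Vector], List[Vector]]:
    """Split features into (kept, abnormal) by distance to the nearest center."""
    kept: List[Vector] = []
    abnormal: List[Vector] = []
    for f in features:
        nearest = min(math.dist(f, c) for c in centers)
        (kept if nearest <= max_dist else abnormal).append(f)
    return kept, abnormal
```

With several clusters, the centers would come from whichever clustering algorithm is chosen, and `max_dist` plays the role of the preset distance.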
2032. Sample the data that remains after the abnormal feature data has been filtered out, according to a preset sample selection strategy, to obtain the training samples.
The scene data that flows back is very large in sample count, and most of it consists of samples that are easy for the existing model; adding it directly to the training data would do model training more harm than good. The sample selection strategy is mainly to sample according to the algorithm that the features correspond to. Taking a face comparison algorithm as an example, the algorithm can sample along the user-id dimension, for example 10 features per user.
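The per-user sampling example can be sketched as follows. The function name `sample_per_user`, the `(user_id, feature)` tuple layout, and the uniform random choice within each user are assumptions for illustration; the embodiment only states that sampling is done along the user-id dimension, for example 10 features per user.

```python
import random
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def sample_per_user(
    samples: Iterable[Tuple[Hashable, Any]],
    per_user: int = 10,
    seed: int = 0,
) -> List[Tuple[Hashable, Any]]:
    """Keep at most `per_user` features for each user id."""
    rng = random.Random(seed)  # seeded so that the selection is reproducible
    by_user: Dict[Hashable, List[Any]] = defaultdict(list)
    for user_id, feature in samples:
        by_user[user_id].append(feature)
    selected: List[Tuple[Hashable, Any]] = []
    for user_id, feats in by_user.items():
        chosen = feats if len(feats) <= per_user else rng.sample(feats, per_user)
        selected.extend((user_id, f) for f in chosen)
    return selected
```

Capping the count per user keeps a few very active users from dominating the training set.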
204. Select hyperparameters according to the state of the identity verification model, and train the identity verification model on the feature data.
That is, hyperparameters are selected according to the state of the identity verification model, and the identity verification model is trained on the training samples.
Traditional model training generally requires many hyperparameters to be picked by hand, which is inefficient and costly in labor. This method therefore uses an automatic training approach based on reinforcement learning.
When hyperparameters are selected for training on the training samples, reinforcement learning adjusts the hyperparameters of the model (including the learning rate, momentum, etc.) during training according to the state of the model (including the loss function value, gradient values, etc.). This approach can find a good set of hyperparameters in a huge hyperparameter search space, and has been shown to be more efficient than manual parameter tuning.
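The idea of adjusting a hyperparameter from the model state during training can be illustrated with a toy example. The embodiment does not specify which reinforcement learning algorithm is used, so this sketch substitutes tabular Q-learning with an epsilon-greedy policy on a one-dimensional quadratic loss. Everything here is an assumption made for illustration: the state (whether the last update overshot the minimum), the three actions (halve, keep, or double the learning rate), the reward (decrease in loss), and the function name `train_with_rl_lr`. Momentum is omitted for brevity.

```python
import random
from typing import List, Tuple

ACTIONS = (0.5, 1.0, 2.0)  # multiply the learning rate by one of these factors


def train_with_rl_lr(
    steps: int = 200,
    lr: float = 0.01,
    eps: float = 0.2,
    alpha: float = 0.5,
    gamma: float = 0.9,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize loss(w) = w**2 while a Q-learning agent tunes the learning rate."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in range(len(ACTIONS))}
    w = 5.0
    loss = w * w
    state = 0  # 0: last step kept the sign of w, 1: last step overshot
    history = [loss]
    for _ in range(steps):
        # Epsilon-greedy choice of how to rescale the learning rate.
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        lr = min(0.9, max(1e-4, lr * ACTIONS[a]))  # clamp to a stable range
        grad = 2.0 * w
        new_w = w - lr * grad
        new_loss = new_w * new_w
        reward = loss - new_loss  # reward the reduction in loss
        next_state = 1 if new_w * w < 0 else 0
        best_next = max(q[(next_state, i)] for i in range(len(ACTIONS)))
        q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
        w, loss, state = new_w, new_loss, next_state
        history.append(loss)
    return history, lr
```

The clamp keeps the learning rate in a range where gradient descent on this particular loss cannot diverge; a realistic controller would need a richer state and a reward that is normalized across training stages.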
205. Evaluate the trained identity verification model and, after it passes the evaluation, bring the iterated identity verification model online.
Evaluating the trained identity verification model comprises:
Evaluating the trained identity verification model according to an algorithm evaluation set and an algorithm accuracy metric.
Regarding automated model evaluation: because the performance of a model produced by automated training has not been evaluated, it is unknown whether the model meets the performance standard for going online. Automated evaluation therefore needs to be configured, with corresponding metrics set as the basis for judgment. Automated model evaluation mainly involves two aspects: the choice of evaluation set and the choice of algorithm accuracy metric.
Regarding the algorithm evaluation set: the evaluation set includes a basic test set and a scene test set. The basic test set mainly tests the generalization ability of the algorithm, while the scene test set mainly tests the performance of the algorithm in the corresponding business scene.
Regarding the algorithm accuracy metric: the choice of metric depends on the specific business scene and the specific type of algorithm. For example, for a face comparison algorithm the misclassification rate is a principal performance metric. Different business scenes, such as payment and login, have different performance metrics. If the accuracy of the model obtained by automated training meets the target, the automatic go-live procedure is carried out.
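The evaluation gate can be sketched as a check against both test sets. The threshold values in `SCENE_LIMITS` and `BASE_LIMIT`, the similarity threshold of 0.5, and the representation of each test case as a `(score, same_person)` pair are hypothetical choices for the example; the embodiment only states that the metric and its target depend on the business scene.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[float, bool]  # (similarity score, whether the two faces match)

# Hypothetical per-scene limits on the misclassification rate.
SCENE_LIMITS: Dict[str, float] = {"payment": 0.01, "login": 0.05}
BASE_LIMIT = 0.05  # hypothetical limit on the basic test set


def misclassification_rate(pairs: Sequence[Pair], threshold: float) -> float:
    """Fraction of pairs whose thresholded decision disagrees with the label."""
    wrong = sum(1 for score, same in pairs if (score >= threshold) != same)
    return wrong / len(pairs)


def ready_to_go_online(
    base_set: Sequence[Pair],
    scene_set: Sequence[Pair],
    scene: str,
    threshold: float = 0.5,
) -> bool:
    """Pass only if both the basic and the scene test sets meet their limits."""
    base_ok = misclassification_rate(base_set, threshold) <= BASE_LIMIT
    scene_ok = misclassification_rate(scene_set, threshold) <= SCENE_LIMITS[scene]
    return base_ok and scene_ok
```

Because payment is given a stricter limit than login in this example, the same model can pass for one scene and fail for the other.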
In the method provided by this embodiment of the present invention, received data is desensitized, and feature data is extracted and screened from the preprocessed data; this overcomes the degradation of deep learning algorithm performance that arises when, over time, the distribution of the test data drifts away from the distribution of the training data. Hyperparameters are selected according to the state of the identity verification model and the model is trained on the feature data; this overcomes the same degradation when the drift is caused by differences between scenes. The method provided by the present invention allows the updated model to adapt to each scene, brings out the full performance of the algorithm, and improves the accuracy of the algorithm's identity verification.
The steps described above can be simplified into the five stages shown in Fig. 3:
301. Scene data desensitization and backflow: different data is collected in different scenes and, after desensitization, flows back to the model server.
302. Scene data feature extraction: the current version of the model on the model server automatically performs feature extraction on the returned data, for use in subsequent model training.
303. Scene data screening: because the returned data features contain noise, and not every sample is needed by the model, a data filter is set up to filter the data.
304. Automated model training: automated machine learning techniques automatically select the model hyperparameters and train the model.
305. Automated model evaluation and go-live: performance evaluation tests are set up to evaluate the model automatically and decide whether it goes online.
Through these five stages, the method can automatically uncover the shortcomings of the current algorithm from the desensitized scene data and then remedy them through automatic model optimization and iteration. In the end, the updated model adapts better to each scene and brings out the full performance of the algorithm. In addition, because the method can be configured with a reasonable time interval at which the model is updated automatically, it overcomes the effect of time on the data distribution.
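The five stages can be tied together in a single driver function. This is only a structural sketch: `run_iteration` and its callback parameters are hypothetical names, and each callback stands for whatever concrete implementation a deployment supplies for the corresponding stage.

```python
from typing import Any, Callable, Iterable, List


def run_iteration(
    raw_records: Iterable[Any],
    current_model: Any,
    desensitize: Callable[[Any], Any],
    extract: Callable[[Any, Any], Any],
    screen: Callable[[List[Any]], List[Any]],
    train: Callable[[Any, List[Any]], Any],
    evaluate: Callable[[Any], bool],
) -> Any:
    """Run one iteration and return the model that should be online afterwards."""
    # 301: desensitize scene data before it flows back.
    desensitized = [desensitize(r) for r in raw_records]
    # 302: the current model version extracts features from the returned data.
    features = [extract(current_model, d) for d in desensitized]
    # 303: filter out noise and select training samples.
    samples = screen(features)
    # 304: automated training produces a candidate model.
    candidate = train(current_model, samples)
    # 305: go online only if the candidate passes evaluation.
    return candidate if evaluate(candidate) else current_model
```

Returning the current model when evaluation fails means a bad iteration never replaces the version that is already online.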
Fig. 4 is a schematic structural diagram of an iteration device for an identity verification model provided by an embodiment of the present invention. As shown in Fig. 4, the iteration device for an identity verification model provided by this embodiment comprises:
A desensitization module 41, configured to desensitize received data;
In this embodiment, desensitizing the received data means desensitizing data that comes from different scenes. It can be understood as removing the sensitive information from the received data; desensitization reduces the risk of user information being stolen.
A feature data screening module 42, configured to extract and screen feature data from the desensitized data;
A model training module 43, configured to select hyperparameters according to the state of the identity verification model and train the identity verification model on the feature data;
The state of the identity verification model can take many forms and includes various kinds of numerical information, for example the loss function value and the gradient values. The hyperparameters include the learning rate, the momentum, and so on.
A model deployment module 44, configured to evaluate the trained identity verification model and, after it passes the evaluation, bring the iterated identity verification model online.
Because the performance of a model produced by automated training has not been evaluated, it is unknown whether the model meets the performance standard for going online. Automated evaluation therefore needs to be configured, with corresponding metrics set as the basis for judgment.
Different business scenes, for example payment and login, have different performance metrics, so the trained identity verification model goes online only after it passes the evaluation against the performance metrics of the required business scene.
In the iteration device for an identity verification model, the desensitization module desensitizes received data, and the feature data screening module extracts and screens feature data from the preprocessed data; this overcomes the degradation of deep learning algorithm performance that arises when, over time, the distribution of the test data drifts away from the distribution of the training data. The model training module selects hyperparameters according to the state of the identity verification model and trains the model on the feature data; this overcomes the same degradation when the drift is caused by differences between scenes. The device allows the updated model to adapt to each scene, brings out the full performance of the algorithm, and improves the accuracy of the algorithm's identity verification.
Fig. 5 is a schematic structural diagram of an iteration device for an identity verification model provided by an embodiment of the present invention. As shown in Fig. 5, the iteration device for an identity verification model provided by this embodiment comprises:
The desensitization module 51, comprising:
A judging unit, configured to start a timer after the received data has been desensitized and, when the timed duration is greater than or equal to a preset backflow time interval, determine whether the current business is idle;
A data backflow unit, configured to perform data backflow when the current business is idle, so as to execute the step of extracting and screening feature data from the desensitized data.
The feature data screening module 52, configured to preprocess the desensitized data and extract feature data from the preprocessed data;
Wherein the feature data screening module 52 comprises:
A preprocessing unit 521, configured to preprocess the desensitized data;
A feature data extraction unit 522, configured to extract feature data from the preprocessed data;
A training sample acquisition unit 523, configured to post-process the extracted feature data to filter out abnormal feature data and obtain training samples.
In an embodiment of the present invention, the training sample acquisition unit 523 is configured to:
Using a clustering algorithm, treat data farther than a preset distance from the cluster center as abnormal feature data and filter it out;
Sample the data that remains after the abnormal feature data has been filtered out, according to a preset sample selection strategy, to obtain the training samples.
The model training module 53, configured to select hyperparameters according to the state of the identity verification model and train the identity verification model on the feature data;
The model deployment module 54, configured to:
Evaluate the trained identity verification model according to an algorithm evaluation set and an algorithm accuracy metric.
In an embodiment of the present invention, the preprocessing unit is configured to:
If the desensitized data is image data, decode the image data, adjust its scale, and normalize the image.
The devices shown in Fig. 4 and Fig. 5 of the embodiments of the present invention implement the methods shown in Fig. 1 and Fig. 2 of the embodiments of the present invention. Their principles are the same as those of the methods shown in Fig. 1 and Fig. 2 and are not repeated here.
In an embodiment of the present invention, a storage medium is also provided. The storage medium stores computer program instructions that, when executed, carry out the method of the embodiments of the present invention.
In a typical configuration of the present invention, the computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
In an embodiment of the present invention, a computing device is also provided, comprising: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the computing device to execute the method of the embodiments of the present invention.
Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of the present invention may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuitry that cooperates with a processor to perform each step or function.
It is obvious to a person skilled in the art that the present invention is not limited to the details of the exemplary embodiments above, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.