CN109816042A - Method, apparatus, electronic device, and storage medium for training a data classification model

Method, apparatus, electronic device, and storage medium for training a data classification model

Info

Publication number: CN109816042A
Authority: CN (China)
Prior art keywords: training, data, classification model, iterations, data classification
Legal status: Granted
Application number: CN201910105031.6A
Other languages: Chinese (zh)
Other versions: CN109816042B (en)
Inventor: 申世伟 (Shen Shiwei)
Current Assignee: Beijing Dajia Internet Information Technology Co Ltd
Original Assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910105031.6A
Publication of CN109816042A
Application granted; publication of CN109816042B
Legal status: Active (granted)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This disclosure relates to the field of deep learning, and in particular to a method, apparatus, electronic device, and storage medium for training a data classification model. The method includes: obtaining multiple first sample data, multiple second sample data, a first iteration count, and a second iteration count; within the first iteration count, training a first data classification model based on the multiple first sample data using a ring training method to obtain a second data classification model; and within the second iteration count, training the second data classification model based on the multiple second sample data using a tree training method to obtain a third data classification model. By combining the ring training method and the tree training method during data classification training, a large amount of time is saved compared with using the tree training method alone, while the accuracy of the data classification model is still ensured, so the training efficiency of the data classification model is improved.

Description

Method, apparatus, electronic device, and storage medium for training a data classification model
Technical field
This disclosure relates to the field of deep learning, and in particular to a method, apparatus, electronic device, and storage medium for training a data classification model.
Background
With the development of deep learning technology, machine classification problems such as speech recognition and image classification can be handled effectively using deep learning methods. Before data can be classified, a data classification model must first be trained, and classification is then performed based on that model. Improving the classification accuracy of the model generally requires a large amount of sample data, and more sample data implies a heavier computation burden. In the field of deep learning training, the GPU (Graphics Processing Unit) has a clear advantage over the CPU (Central Processing Unit) in parallel matrix computation and is therefore better suited to training data classification models.
In the related art, the common way to train a data classification model is the tree training method, for example a conventional multi-card distributed algorithm: the sample data and the complete network structure of the model are sent to every GPU, and model training is performed on multiple GPUs. After each training round, one summarizing GPU collects the model parameters trained by the other GPUs, computes the average of those parameters, and distributes the average back to the other GPUs, after which training continues, until the number of iterations reaches a preset total iteration count, at which point the trained data classification model is obtained.
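The following is a minimal sketch of this conventional parameter-averaging scheme, given only for orientation; all names (`tree_training_step`, `compute_gradient`, `worker_params`) are illustrative assumptions, not the patent's reference implementation.

```python
import numpy as np

def tree_training_step(worker_params, worker_data, compute_gradient, lr=0.01):
    """One iteration of the conventional multi-card scheme described above:
    each worker trains its own copy of the model, one summarizing worker
    averages the resulting parameters, and the average is sent back."""
    # 1. Every worker updates its own copy of the model locally.
    local_params = [
        params - lr * compute_gradient(params, data)
        for params, data in zip(worker_params, worker_data)
    ]
    # 2. The summarizing worker gathers all parameters and averages them;
    #    this gather/broadcast step is the communication bottleneck that grows
    #    with the number of workers and the number of model parameters.
    averaged = np.mean(local_params, axis=0)
    # 3. The average is distributed back to every worker for the next round.
    return [averaged.copy() for _ in worker_params]
```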
The problem with the related art is that, during model training, each iteration within the preset total iteration count requires the model parameters of all GPUs to be synchronized and processed together. As the total number of GPUs and/or the number of model parameters grows, the communication time of the summarizing GPU grows linearly, so the required data classification model cannot be trained within a short time. Training the data classification model therefore occupies a large amount of time, and the training efficiency of the data classification model is low.
Summary of the invention
The disclosure provides a method, apparatus, electronic device, and storage medium for training a data classification model, which can overcome the problem that training a data classification model occupies a large amount of time and therefore has low training efficiency.
According to a first aspect of the embodiments of the present disclosure, a method of training a data classification model is provided, comprising:
obtaining multiple first sample data, multiple second sample data, a first iteration count, and a second iteration count, wherein the sum of the first iteration count and the second iteration count is the total iteration count of model training, the first sample data are used for training a first data classification model, and the second sample data are used for training a second data classification model;
within the first iteration count, training the first data classification model based on the multiple first sample data using a ring training method to obtain a second data classification model;
within the second iteration count, training the second data classification model based on the multiple second sample data using a tree training method to obtain a third data classification model.
In one possible implementation, training the first data classification model based on the multiple first sample data using the ring training method within the first iteration count to obtain the second data classification model comprises:
training the first data classification model with multiple first training machines and the multiple first sample data, and obtaining the first model parameters produced by the training on each first training machine;
transmitting the first model parameters obtained by each first training machine according to the ring connection order of the first training machines, so that each first training machine receives the first model parameters obtained by the other first training machines;
for each first training machine, iteratively training the first data classification model according to the first model parameters obtained by that first training machine, the first model parameters it has received from the other first training machines, and the multiple first sample data, until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
In another possible implementation, training the first data classification model with the multiple first training machines and the multiple first sample data to obtain the first model parameters produced by each first training machine comprises:
dividing the multiple first sample data into a first quantity of sample data groups, each sample data group containing at least one first sample data, the first quantity being the number of the multiple first training machines;
for each first training machine in each iteration, selecting from the first quantity of sample data groups one sample data group that has not yet been assigned to that first training machine;
iteratively training the first data classification model with that first training machine and the selected sample data group to obtain the first model parameters.
In another possible implementation, iteratively training the first data classification model according to the first model parameters obtained by the first training machine, the first model parameters it has received from the other first training machines, and the multiple first sample data, until the number of iterations reaches the first iteration count, to obtain the second data classification model, comprises:
determining a first learning rate of the first data classification model for the current iteration, and determining a second learning rate of the first training machine;
in each iteration, performing operations on the first model parameters of the first training machine, the first model parameters received from the other first training machines, and the multiple first sample data according to the first learning rate and the second learning rate, updating the second data classification model with the operation result, and repeating this per-iteration process until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
In another possible implementation, determining the first learning rate of the first data classification model for the current iteration comprises:
when the iteration count of the current iteration is zero, using an initial learning rate as the first learning rate of the current iteration;
when the iteration count of the current iteration is not zero and falls within a third iteration count, obtaining the third learning rate of the previous iteration and increasing it linearly to obtain the first learning rate of the current iteration, the third iteration count being smaller than the first iteration count;
when the iteration count of the current iteration falls within a fourth iteration count, obtaining the third learning rate of the previous iteration and decaying it with a polynomial decay strategy to obtain the first learning rate of the current iteration, the fourth iteration count being greater than the third iteration count and smaller than the first iteration count.
In another possible implementation, determining the second learning rate of the first training machine comprises:
determining the network layer where the first training machine is located, the weight of that network layer, and the gradient of that network layer;
determining the second learning rate of the first training machine according to the network layer, the weight of the network layer, the gradient of the network layer, and the first model parameters of the first training machine;
wherein the second learning rate is positively correlated with the weight of the network layer and with the first model parameters of the first training machine, and negatively correlated with the gradient of the network layer.
In another possible implementation, training the second data classification model based on the multiple second sample data using the tree training method within the second iteration count to obtain the third data classification model comprises:
training the second data classification model with multiple second training machines and the multiple second sample data, and obtaining the second model parameters produced by the training on each second training machine;
transmitting the second model parameters of the multiple second training machines to a summarizing machine, and determining, by the summarizing machine, third model parameters based on the second model parameters obtained by each second training machine;
issuing the third model parameters to each second training machine through the summarizing machine;
for each second training machine, iteratively training the second data classification model according to the third model parameters and the multiple second sample data, until the number of iterations reaches the second iteration count, thereby obtaining the third data classification model.
In another possible implementation, iteratively training the second data classification model according to the third model parameters and the multiple second sample data until the number of iterations reaches the second iteration count, to obtain the third data classification model, comprises:
determining a fourth learning rate of the second data classification model for the current iteration;
in each iteration, performing operations on the third model parameters and the multiple second sample data according to the fourth learning rate, updating the second data classification model with the operation result, and repeating this per-iteration process until the number of iterations reaches the second iteration count, thereby obtaining the third data classification model.
In another possible implementation, determining the fourth learning rate of the second data classification model for the current iteration comprises:
obtaining a fifth learning rate of the second data classification model, the fifth learning rate being the learning rate at the end of training the first data classification model with the ring training method;
when the iteration count of the current iteration is zero, determining the ratio of the fifth learning rate to a second quantity as the fourth learning rate, the second quantity being the number of the multiple second training machines;
when the iteration count of the current iteration is not zero, obtaining the sixth learning rate of the previous iteration and decaying it with a polynomial decay strategy to obtain the fourth learning rate of the current iteration.
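A minimal sketch of this learning-rate handoff between the two phases is given below. The names `ring_final_lr`, `end_lr`, and `decay_power` are hypothetical (the disclosure does not fix concrete values), and the closed-form decay stands in for the step-by-step decay of the sixth learning rate.

```python
def tree_phase_lr(step, ring_final_lr, num_second_machines,
                  second_iterations, end_lr=1e-4, decay_power=2.0):
    """Fourth learning rate for the tree-training phase (illustrative sketch).

    At step 0 the learning rate left over by the ring phase is divided by the
    number of second training machines; afterwards it is decayed polynomially."""
    start_lr = ring_final_lr / num_second_machines
    if step == 0:
        return start_lr
    progress = min(step, second_iterations) / second_iterations
    return (start_lr - end_lr) * (1.0 - progress) ** decay_power + end_lr
```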
In another possible implementation, the ring training method is a training method that performs training using the ring-allreduce algorithm.
In another possible implementation, when data to be classified are classified, the data to be classified are input into the third data classification model to obtain the classification result of the data.
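As a minimal usage sketch of this classification step, assuming a PyTorch-style model object and a sample already prepared as a tensor of the model's expected shape (both assumptions, not part of the disclosure):

```python
import torch

def classify(third_model, sample):
    """Feed one sample of data to be classified into the trained third model."""
    third_model.eval()
    with torch.no_grad():
        logits = third_model(sample.unsqueeze(0))  # add a batch dimension
    return int(logits.argmax(dim=1))               # index of the predicted class
```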
According to a second aspect of the embodiments of the present disclosure, a data classification device is provided, comprising:
an obtaining module configured to obtain multiple first sample data, multiple second sample data, a first iteration count, and a second iteration count, wherein the sum of the first iteration count and the second iteration count is the total iteration count of model training, the first sample data are used for training a first data classification model, and the second sample data are used for training a second data classification model;
a first training module configured to, within the first iteration count, train the first data classification model based on the multiple first sample data using a ring training method to obtain a second data classification model;
a second training module configured to, within the second iteration count, train the second data classification model based on the multiple second sample data using a tree training method to obtain a third data classification model;
an input module configured to, when data to be classified are classified, input the data to be classified into the third data classification model to obtain the classification result of the data.
In one possible implementation, the first training module is further configured to train the first data classification model with multiple first training machines and the multiple first sample data, and obtain the first model parameters produced by the training on each first training machine;
transmit the first model parameters obtained by each first training machine according to the ring connection order of the first training machines, so that each first training machine receives the first model parameters obtained by the other first training machines;
for each first training machine, iteratively train the first data classification model according to the first model parameters obtained by that first training machine, the first model parameters it has received from the other first training machines, and the multiple first sample data, until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
In another possible implementation, the first training module is further configured to divide the multiple first sample data into a first quantity of sample data groups, each sample data group containing at least one first sample data, the first quantity being the number of the multiple first training machines;
for each first training machine in each iteration, select from the first quantity of sample data groups one sample data group that has not yet been assigned to that first training machine;
iteratively train the first data classification model with that first training machine and the selected sample data group to obtain the first model parameters.
In another possible implementation, the first training module is further configured to determine a first learning rate of the first data classification model for the current iteration and determine a second learning rate of the first training machine;
in each iteration, perform operations on the first model parameters of the first training machine, the first model parameters received from the other first training machines, and the multiple first sample data according to the first learning rate and the second learning rate, update the second data classification model with the operation result, and repeat this per-iteration process until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
In another possible implementation, the first training module is further configured to, when the iteration count of the current iteration is zero, use an initial learning rate as the first learning rate of the current iteration;
when the iteration count of the current iteration is not zero and falls within a third iteration count, obtain the third learning rate of the previous iteration and increase it linearly to obtain the first learning rate of the current iteration, the third iteration count being smaller than the first iteration count;
when the iteration count of the current iteration falls within a fourth iteration count, obtain the third learning rate of the previous iteration and decay it with a polynomial decay strategy to obtain the first learning rate of the current iteration, the fourth iteration count being greater than the third iteration count and smaller than the first iteration count.
In another possible implementation, the first training module is further configured to determine the network layer where the first training machine is located, the weight of that network layer, and the gradient of that network layer;
determine the second learning rate of the first training machine according to the network layer, the weight of the network layer, the gradient of the network layer, and the first model parameters of the first training machine;
wherein the second learning rate is positively correlated with the weight of the network layer and with the first model parameters of the first training machine, and negatively correlated with the gradient of the network layer.
In another possible implementation, the second training module is further configured to train the second data classification model with the multiple second training machines and the multiple second sample data, and obtain the second model parameters produced by the training on each second training machine;
transmit the second model parameters of the multiple second training machines to a summarizing machine, and determine, by the summarizing machine, third model parameters based on the second model parameters obtained by each second training machine;
issue the third model parameters to each second training machine through the summarizing machine;
for each second training machine, iteratively train the second data classification model according to the third model parameters and the multiple second sample data, until the number of iterations reaches the second iteration count, thereby obtaining the third data classification model.
In another possible implementation, the second training module is further configured to determine a fourth learning rate of the second data classification model for the current iteration;
in each iteration, perform operations on the third model parameters and the multiple second sample data according to the fourth learning rate, update the second data classification model with the operation result, and repeat this per-iteration process until the number of iterations reaches the second iteration count, thereby obtaining the third data classification model.
In another possible implementation, the second training module is further configured to obtain a fifth learning rate of the second data classification model, the fifth learning rate being the learning rate at the end of training the first data classification model with the ring training method;
when the iteration count of the current iteration is zero, determine the ratio of the fifth learning rate to a second quantity as the fourth learning rate, the second quantity being the number of the multiple second training machines;
when the iteration count of the current iteration is not zero, obtain the sixth learning rate of the previous iteration and decay it with a polynomial decay strategy to obtain the fourth learning rate of the current iteration.
In another possible implementation, the ring training method is a training method that performs training using the ring-allreduce algorithm.
In another possible implementation, the device further comprises:
an input module configured to, when data to be classified are classified, input the data to be classified into the third data classification model to obtain the classification result of the data.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, comprising one or more processors;
and volatile or non-volatile memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to execute the method of training a data classification model described in the first aspect above.
According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, the computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, implement the method of training a data classification model described in the first aspect above.
The technical solutions provided by the embodiments of the disclosure can include the following benefits:
within the first iteration count, the first data classification model is trained based on multiple first sample data using the ring training method to obtain a second data classification model; within the second iteration count, the second data classification model is trained based on multiple second sample data using the tree training method to obtain a third data classification model. When training a data classification model, the ring training method saves a large amount of time compared with a traditional training method, and combining it with the tree training method ensures the accuracy of the resulting data classification model, thereby improving the training efficiency of the data classification model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the disclosure.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification, show embodiments consistent with the disclosure, and together with the specification serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a method of training a data classification model according to an exemplary embodiment.
Fig. 2 is a flowchart of another method of training a data classification model according to an exemplary embodiment.
Fig. 3 is a schematic diagram of a ring training method according to an exemplary embodiment.
Fig. 4 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 5 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 6 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 7 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 8 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 9 is a schematic diagram of another ring training method according to an exemplary embodiment.
Fig. 10 is a schematic diagram of a tree training method according to an exemplary embodiment.
Fig. 11 is a block diagram of a data classification model training device according to an exemplary embodiment.
Fig. 12 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a method of training a data classification model according to an exemplary embodiment. As shown in Fig. 1, the method is used in an electronic device and includes the following steps.
In step S101, the electronic device obtains multiple first sample data, multiple second sample data, a first iteration count, and a second iteration count, where the sum of the first iteration count and the second iteration count is the total iteration count of model training.
In step S102, within the first iteration count, the electronic device trains the first data classification model based on the multiple first sample data using the ring training method to obtain a second data classification model.
In step S103, within the second iteration count, the electronic device trains the second data classification model based on the multiple second sample data using the tree training method to obtain a third data classification model.
In the embodiments of the present disclosure, within the first iteration count the first data classification model is trained based on multiple first sample data using the ring training method to obtain a second data classification model, and within the second iteration count the second data classification model is trained based on multiple second sample data using the tree training method to obtain a third data classification model. When training a data classification model, the ring training method saves a large amount of time compared with a traditional training method, and combining it with the tree training method ensures the accuracy of the resulting data classification model, improving the training efficiency of the data classification model.
Fig. 2 is a flowchart of another method of training a data classification model according to an exemplary embodiment. As shown in Fig. 2, the method is used in an electronic device and includes the following steps.
In step S201, the electronic device obtains multiple first sample data and multiple second sample data.
The first sample data are used for training a first data classification model, and the second sample data are used for training a second data classification model. The first sample data may be images, text information, speech signals, or the like, and so may the second sample data. The multiple second sample data may be identical to the multiple first sample data, i.e., the same data serve as both first and second sample data; they may be entirely different from the multiple first sample data, i.e., different data serve as first and second sample data respectively; or they may partially overlap, i.e., the first and second sample data share some data and differ in other data. This disclosure places no specific restriction on the selection of the first and second sample data. Moreover, the quantities of the first sample data and the second sample data may or may not be equal.
The electronic device may be a portable mobile electronic device, such as a smartphone, tablet computer, laptop, or desktop computer, or any other device capable of carrying out the disclosure. The electronic device may also be referred to by other names, such as user equipment, portable electronic device, laptop electronic device, or desktop electronic device.
In step S202, the electronic device determines a first iteration count and a second iteration count.
In the embodiments of the present disclosure, the first data classification model is first trained with the ring training method to obtain a second data classification model, and the second data classification model is then trained with the tree training method to obtain a third data classification model. The first iteration count is the number of training iterations that use the ring training method, and the second iteration count is the number of training iterations that use the tree training method. The sum of the first iteration count and the second iteration count is the total iteration count of model training. The first iteration count and the second iteration count may or may not be equal.
When training a data classification model on data of the same order of magnitude, the ring training method trains faster, while the tree training method trains with higher precision. The ring training method is used to shorten the training time, and the tree training method is used to guarantee training precision. The electronic device can therefore determine the first and second iteration counts according to the training requirements of the data classification model and the total iteration count. In one possible implementation, when the training requirement prioritizes precision over time, the electronic device sets the first iteration count to be greater than the second iteration count; when the training requirement prioritizes time over precision, the electronic device sets the first iteration count to be smaller than the second iteration count.
In another possible implementation, the electronic device may also determine the required precision of the data classification model, determine the ratio between the first iteration count and the second iteration count according to that required precision, and then determine the first iteration count and the second iteration count from the total iteration count and that ratio.
For example, if the total iteration count is 100, the first iteration count may be 80 and the second iteration count may be 20.
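A minimal sketch of this ratio-based split is given below; the 0.8 default merely reproduces the 80/20 example above and is not a prescribed value.

```python
def split_iterations(total_iterations, ring_ratio=0.8):
    """Split the total iteration count into a ring-training phase and a
    tree-training phase according to a precision/time trade-off ratio."""
    first_iterations = int(total_iterations * ring_ratio)    # ring phase, e.g. 80
    second_iterations = total_iterations - first_iterations  # tree phase, e.g. 20
    return first_iterations, second_iterations

# Example: split_iterations(100) -> (80, 20)
```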
In step S203, within the first iteration count, the electronic device trains the first data classification model based on the multiple first sample data using the ring training method to obtain a second data classification model.
The ring training method may be a training method that performs training using the ring-allreduce algorithm. The first data classification model may be an initial data classification model, or a data classification model that is already in the course of training. This step can be implemented through the following steps (1) to (3):
(1) The electronic device trains the first data classification model with multiple first training machines and the multiple first sample data, and obtains the first model parameters produced by the training on each first training machine.
A first training machine is a device that includes a processor; for example, a first training machine may be a device that includes a CPU or a device that includes a GPU. The first quantity, i.e., the number of first training machines, can be set and changed as needed and is not specifically limited in the embodiments of the present disclosure; for example, the first quantity may be 5 or 8.
In one possible implementation, for each first training machine, the electronic device deploys the first data classification model to the first training machine, inputs the multiple first sample data into the first data classification model on the first training machine, and trains the first data classification model on that first training machine based on the multiple first sample data to obtain first model parameters.
In another possible implementation, the electronic device may group the multiple first sample data and assign one group of first sample data to each first training machine. Accordingly, this step can be implemented through the following steps (1-1) to (1-3):
(1-1) The electronic device divides the multiple first sample data into a first quantity of sample data groups, each sample data group containing at least one first sample data.
In this step, the electronic device may divide the multiple first sample data evenly into the first quantity of sample data groups, or it may divide them unevenly. Accordingly, the numbers of first sample data contained in the sample data groups may or may not be equal. After dividing the multiple first sample data into the first quantity of sample data groups, the electronic device determines a data group identifier for each sample data group and stores the data group identifier in association with the sample data group. The data group identifier may be the number of the data group.
For example, if the first quantity is M, the electronic device divides the multiple first sample data evenly into M sample data groups, each containing at least one first sample data and all of the same size; each sample data group has a unique data group identifier, which may be DATA0, DATA1, DATA2, ..., DATA(M-1).
(1-2) For each first training machine in each iteration, the electronic device selects from the first quantity of sample data groups one sample data group that has not yet been assigned to that first training machine.
The electronic device assigns one sample data group to each first training machine, and different first training machines correspond to different sample data groups. In the first training iteration, the electronic device randomly assigns the first quantity of sample data groups to the first quantity of first training machines and establishes a mapping between the data group identifier of each sample data group and the machine identifier of the corresponding first training machine. The machine identifier may be the SN (Serial Number) of the first training machine, its IP (Internet Protocol) address, or the like. In the second training iteration, for each first training machine, the electronic device determines from the established mapping the data group identifiers of the sample data groups that have not yet been assigned to that first training machine, and according to such an identifier assigns the corresponding sample data group among the first quantity of sample data groups to that first training machine.
For example, if the first training machines are GPUs and M is 3, the three first training machines are GPU0, GPU1, and GPU2, and the three sample data groups are DATA0, DATA1, and DATA2. In the first training iteration, the electronic device assigns DATA0 to GPU0, DATA1 to GPU1, and DATA2 to GPU2; in the second training iteration, it assigns DATA1 to GPU0, DATA2 to GPU1, and DATA0 to GPU2; in the third training iteration, it assigns DATA2 to GPU0, DATA0 to GPU1, and DATA1 to GPU2.
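A minimal sketch of this rotating assignment, using the DATA0..DATA(M-1) naming introduced above (the modular-rotation rule is one simple way to realize the "not yet assigned" condition, not the only one):

```python
def assign_groups(num_machines, iteration):
    """Rotate the sample data groups over the training machines so that each
    machine receives a group it has not been assigned in earlier iterations."""
    return {
        f"GPU{machine}": f"DATA{(machine + iteration) % num_machines}"
        for machine in range(num_machines)
    }

# assign_groups(3, 0) -> {'GPU0': 'DATA0', 'GPU1': 'DATA1', 'GPU2': 'DATA2'}
# assign_groups(3, 1) -> {'GPU0': 'DATA1', 'GPU1': 'DATA2', 'GPU2': 'DATA0'}
```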
(1-3) The electronic device iteratively trains the first data classification model with the first training machine and the assigned sample data group to obtain first model parameters.
In one possible implementation, the electronic device inputs the sample data group to the first training machine, and in each training iteration the first training machine iteratively trains the first data classification model based on that sample data group to obtain first model parameters.
In another possible implementation, the electronic device divides the sample data group into sample data blocks and uses only one sample data block per training iteration. Accordingly, this step may be:
For each first training machine, the electronic device evenly divides the sample data group assigned to that first training machine into a first quantity of sample data blocks. In each training iteration, the electronic device selects an unselected sample data block, and the first training machine iteratively trains the first data classification model based on that sample data block to obtain first model parameters.
For example, the sample data group on each GPU is evenly divided into M data blocks. Referring to Fig. 3 and taking M as 5, there are 5 GPUs: the 5 data blocks of GPU0 are a0, b0, c0, d0, e0; the 5 data blocks of GPU1 are a1, b1, c1, d1, e1; the 5 data blocks of GPU2 are a2, b2, c2, d2, e2; the 5 data blocks of GPU3 are a3, b3, c3, d3, e3; and the 5 data blocks of GPU4 are a4, b4, c4, d4, e4.
(2) The electronic device transmits the first model parameters obtained by each first training machine according to the ring connection order of the first training machines, so that each first training machine receives the first model parameters obtained by the other first training machines.
For each first training machine, the electronic device determines, according to the ring connection order of the first training machines, the previous first training machine and the next first training machine of that first training machine. The first training machine receives the first model parameters sent by its previous first training machine and sends its own trained first model parameters to its next first training machine, and so on, until each first training machine has received the first model parameters obtained by every other first training machine. See Fig. 4.
(3) For each first training machine, the electronic device iteratively trains the first data classification model according to the first model parameters obtained by that first training machine, the first model parameters it has received from the other first training machines, and the multiple first sample data, until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
This step can be implemented through the following steps (3-1) to (3-3):
(3-1) The electronic device determines the first learning rate of the first data classification model for the current iteration.
When training the first data classification model, the electronic device solves the model parameters of the first data classification model using gradient descent. For gradient descent to perform well, the learning rate needs to be kept within a suitable range. The first learning rate is therefore a dynamic learning rate that is adjusted according to the iteration count, which guarantees the stability of the training process. The first learning rate is determined as follows:
When the iteration count of the current iteration is zero, the initial learning rate is used as the first learning rate of the current iteration. The initial learning rate is a learning rate preset before training of the first data classification model starts. The smaller the learning rate, the more stable the model computation, but the longer the training takes; for example, the initial learning rate may be set between 0.01 and 0.08. In the embodiments of the present disclosure, to guarantee training stability, the initial learning rate may be set to 0.01.
When the iteration count of the current iteration is not zero and falls within the third iteration count, the third learning rate of the previous iteration is obtained and increased linearly to obtain the first learning rate of the current iteration; the third iteration count is smaller than the first iteration count.
After iteration starts, when the current iteration count falls within the third iteration count, the electronic device determines the first learning rate of the first data classification model for the current iteration by obtaining the third learning rate of the previous iteration and increasing it linearly, where the third iteration count is smaller than the first iteration count. The third learning rate is the learning rate at the completion of the previous iteration, and the first learning rate is the learning rate of the current iteration; for example, after the first iteration completes, the learning rate at the end of the first iteration is the third learning rate, and linearly increasing it yields the learning rate of the second iteration. Linearly increasing the third learning rate means that, in each iteration within the third iteration count, the learning rate of the next iteration is increased step by step so that, after the third iteration count, the learning rate has grown to the product of the initial learning rate and the first quantity, thereby achieving dynamic adjustment of the learning rate. For example, if the third iteration count is 5, the initial learning rate is 0.01, and the number of GPUs is 5, the learning rate grows to 0.05 after 5 iterations: after the first iteration the third learning rate is 0.01, and linearly increasing it gives a first learning rate of 0.02 for the second iteration; after the second iteration the third learning rate is 0.02, and increasing it gives 0.03 for the third iteration; and so on, until after the fifth iteration the learning rate is 0.05.
When the iteration count of the current iteration falls within the fourth iteration count, the third learning rate of the previous iteration is obtained and decayed with a polynomial decay strategy to obtain the first learning rate of the current iteration; the fourth iteration count is greater than the third iteration count and smaller than the first iteration count. After the third iteration count is completed, iteration continues; within the fourth iteration count, the electronic device obtains the third learning rate at the completion of the previous iteration and decays it with a polynomial decay strategy to obtain the first learning rate of the current iteration. The polynomial decay strategy means decaying the third learning rate to a preset learning rate over a preset number of decay steps.
It should be noted that the third iteration count and the fourth iteration count sum to the first iteration count.
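A minimal sketch of this warm-up-then-decay schedule, using the example values above (5 warm-up iterations, initial learning rate 0.01, 5 machines); `end_lr` and `decay_power` are hypothetical names for the preset target learning rate and decay exponent, which the disclosure does not fix.

```python
def first_learning_rate(step, first_iterations, warmup_iterations=5,
                        initial_lr=0.01, num_machines=5,
                        end_lr=1e-4, decay_power=2.0):
    """First learning rate: linear warm-up over the third iteration count,
    then polynomial decay over the remaining (fourth) iteration count."""
    peak_lr = initial_lr * num_machines                    # 0.01 * 5 = 0.05
    if step < warmup_iterations:
        # linear warm-up: 0.01, 0.02, 0.03, 0.04, 0.05 for steps 0..4
        return initial_lr + (peak_lr - initial_lr) * step / (warmup_iterations - 1)
    decay_steps = max(first_iterations - warmup_iterations, 1)
    progress = min(step - warmup_iterations + 1, decay_steps) / decay_steps
    return (peak_lr - end_lr) * (1.0 - progress) ** decay_power + end_lr
```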
(3-2) The electronic device determines the second learning rate of the first training machine for the current iteration.
The second learning rate of each first training machine is the learning rate used by that first training machine when performing operations. Different first training machines are located at different network layers, different network layers have different weights, and different network layers have different gradients. The second learning rate is positively correlated with the weight of the network layer where the first training machine is located, positively correlated with the first model parameters of the first training machine, and negatively correlated with the gradient of the network layer where the first training machine is located. Accordingly, this step may be: the electronic device determines the network layer where the first training machine is located, the weight of that network layer, and the gradient of that network layer; and determines the second learning rate of the first training machine according to the network layer, the weight of the network layer, the gradient of the network layer, and the first model parameters of the first training machine.
In this step, the electronic device may determine the second learning rate by any algorithm in which the second learning rate is positively correlated with the first model parameters of the first training machine and the weight of the network layer and negatively correlated with the gradient of the network layer; the embodiments of the present disclosure do not specifically limit this algorithm. For example, the electronic device may determine the second learning rate of each first training machine according to the network layer, the weight of the network layer, the gradient of the network layer, and the first model parameters of the first training machine by Formula 1.
Formula 1:

λ₁ = η · ‖w₁‖ / ‖∇L(w₁)‖

where l denotes the network layer where the first training machine is located, λ₁ denotes the second learning rate of the first training machine, η denotes the weight of the network layer where the first training machine is located, w₁ denotes the first model parameters of the first training machine, and ∇L(w₁) denotes the gradient of the network layer where the first training machine is located.
It should be noted that the second learning rate is positively correlated with the weight of the network layer and the first model parameters of the first training machine, and negatively correlated with the gradient of the network layer.
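Formula 1 has the same form as a LARS-style layer-wise learning rate. A minimal numerical sketch (illustrative only; the `eps` safeguard is an added assumption, not part of the formula):

```python
import numpy as np

def second_learning_rate(layer_weight_eta, layer_params, layer_gradient, eps=1e-12):
    """Formula 1: the second learning rate of a first training machine grows
    with the norm of the layer's parameters and shrinks with the norm of the
    layer's gradient."""
    return layer_weight_eta * np.linalg.norm(layer_params) / (
        np.linalg.norm(layer_gradient) + eps)   # eps avoids division by zero
```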
(3-3) In each iteration, the electronic device performs operations on the first model parameters of the first training machine, the first model parameters received from the other first training machines, and the multiple first sample data according to the first learning rate and the second learning rate, updates the second data classification model with the operation result, and repeats this per-iteration process until the number of iterations reaches the first iteration count, thereby obtaining the second data classification model.
The sample data group assigned to each first training machine has been evenly divided into the first quantity of data blocks. Within one iteration, each first training machine performs operations on the data of each data block in turn, until the first quantity of data blocks has been processed.
For each first training machine, within one iteration, the first training machines exchange data a number of times equal to the first quantity minus one. After the first operation, the first data exchange is performed: each first training machine sends the operation result of the data block it just processed, i.e., its first model parameters, to the next first training machine, while receiving the first model parameters sent by its previous first training machine, and updates its current first model parameters. When updating the first model parameters of the network layer where the first training machine is located, Formula 2 may be used.
Formula 2:

Δw_l(t) = γ · λ₁ · ∇L(w_l(t))

where l denotes the network layer where the first training machine is located, t denotes the update time, γ denotes the global learning rate, λ₁ denotes the second learning rate of the first training machine, ∇L(w_l(t)) denotes the gradient of the network layer where the first training machine is located, and Δw_l(t) denotes the update value of the parameters of the current network layer.
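A minimal sketch combining Formula 1 and Formula 2 for one layer, reusing the `second_learning_rate` helper sketched above (`global_lr` stands for γ; all names are illustrative):

```python
def update_layer(layer_params, layer_gradient, layer_weight_eta, global_lr):
    """Formula 2: update one network layer's parameters using the global
    learning rate and the layer-wise second learning rate from Formula 1."""
    lam = second_learning_rate(layer_weight_eta, layer_params, layer_gradient)
    delta = global_lr * lam * layer_gradient   # Δw = γ · λ₁ · ∇L(w)
    return layer_params - delta                # gradient-descent step
```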
When the second operation is performed, each first training machine performs the operation based on the first model parameters it has received and the data block corresponding to those parameters, obtains new first model parameters, and then performs the second data exchange; this process is repeated until it has been performed a number of times equal to the first quantity minus one. Then, following the direction opposite to the data exchange, each first training machine synchronizes the first model parameters of each data block with the other first training machines, until all first training machines have collected the first model parameters of all data blocks. The steps of this single iteration are repeated until the number of iterations reaches the first iteration count, at which point the second data classification model is obtained.
For example, referring to Figs. 5 to 9, the process is illustrated with the first quantity equal to 5. In Fig. 5 there are 5 GPUs: the 5 data blocks of GPU0 are a0, b0, c0, d0, e0; the 5 data blocks of GPU1 are a1, b1, c1, d1, e1; the 5 data blocks of GPU2 are a2, b2, c2, d2, e2; the 5 data blocks of GPU3 are a3, b3, c3, d3, e3; and the 5 data blocks of GPU4 are a4, b4, c4, d4, e4. In the first training round, GPU0 operates on data block a0, GPU1 on b1, GPU2 on c2, GPU3 on d3, and GPU4 on e4. In the first data exchange, GPU0 sends data block a0 and GPU0's first model parameters to GPU1 while receiving data block e4 and GPU4's first model parameters from GPU4; likewise, GPU1 receives the data sent by GPU0 and simultaneously sends its own data to GPU2, and so on, until every GPU has completed the data exchange. In the second training round, referring to Fig. 6, GPU0 operates on e0+e4, GPU1 on a0+a1, GPU2 on b1+b2, GPU3 on c2+c3, and GPU4 on d3+d4. In the second data exchange, GPU0 sends e0+e4 and its first model parameters to GPU1 while receiving d3+d4 and GPU4's first model parameters; likewise, GPU1 receives the data sent by GPU0 and sends its own data to GPU2, and so on, until every GPU has completed the exchange. In the third training round, referring to Fig. 7, GPU0 operates on d0+d3+d4, GPU1 on e0+e1+e4, GPU2 on a0+a1+a2, GPU3 on b1+b2+b3, and GPU4 on c2+c3+c4. In the third data exchange, GPU0 sends d0+d3+d4 and its first model parameters to GPU1 while receiving c2+c3+c4 and GPU4's first model parameters, and so on, until every GPU has completed the exchange. In the fourth training round, referring to Fig. 8, GPU0 operates on c0+c2+c3+c4, GPU1 on d0+d1+d3+d4, GPU2 on e0+e1+e2+e4, GPU3 on a0+a1+a2+a3, and GPU4 on b1+b2+b3+b4. In the fourth data exchange, GPU0 sends c0+c2+c3+c4 and its first model parameters to GPU1 while receiving b1+b2+b3+b4 and GPU4's first model parameters, and so on, until every GPU has completed the exchange. When the fourth data exchange ends, each GPU has one data block that has collected the corresponding data of all the other GPUs; for example, GPU0 has collected b0+b1+b2+b3+b4 and GPU1 has collected c0+c1+c2+c3+c4. At this point the first part of one iteration, namely the scatter-reduce phase, is complete. Then, following the opposite direction of the data exchange, the data blocks are synchronized, so that after synchronization every GPU holds the complete result for every data block.
Referring to Fig. 9, one iteration is now complete.
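The following is a compact illustrative sketch of the ring pattern described above (scatter-reduce followed by allgather), simulated with Python lists for clarity rather than performance; a real system would typically use a collective-communication library such as NCCL.

```python
import numpy as np

def ring_allreduce(chunks_per_gpu):
    """chunks_per_gpu[g][b] is block b held by GPU g; each GPU holds n blocks,
    where n is also the number of GPUs. Returns the blocks every GPU ends up
    holding after scatter-reduce + allgather (each block fully summed)."""
    n = len(chunks_per_gpu)
    blocks = [[np.asarray(b, dtype=float) for b in gpu] for gpu in chunks_per_gpu]
    # Scatter-reduce (Figs. 5-8): in step s, GPU g sends its copy of block
    # (g - s) mod n to GPU g+1, which adds it to its own copy of that block.
    for step in range(n - 1):
        sends = [(g, (g - step) % n, blocks[g][(g - step) % n].copy())
                 for g in range(n)]            # snapshot so exchanges are "simultaneous"
        for g, b, payload in sends:
            dst = (g + 1) % n
            blocks[dst][b] = blocks[dst][b] + payload
    # Allgather (Fig. 9): the fully reduced block b now lives on GPU
    # (b + n - 1) mod n and is passed around the ring, in the opposite sense,
    # until every GPU holds every reduced block.
    for step in range(n - 1):
        sends = [(g, (g + 1 - step) % n, blocks[g][(g + 1 - step) % n].copy())
                 for g in range(n)]
        for g, b, payload in sends:
            blocks[(g + 1) % n][b] = payload
    return blocks
```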
In step S204, in secondary iteration number, electronic equipment is based on multiple second sample datas, uses tree-like instruction The mode of white silk is trained the second data classification model, obtains third data classification model.
This step can pass through step (1) to step (4) Lai Shixian.
(1) The electronic equipment trains the second data classification model through multiple second training machines and the multiple second sample data, obtaining the second model parameter produced by the training of each second training machine.
A second training machine is a device that includes a processor; for example, a second training machine may be a device that includes a CPU or a device that includes a GPU. The second quantity, namely the number of second training machines, can be configured and changed as needed and is not specifically limited in the embodiments of the present disclosure; for example, the second quantity may be 5 or 8.
It should be noted that the above first training machines may be used as the second training machines, or independent training machines other than the first training machines may be used as the second training machines. The second quantity of second training machines may be the same as or different from the first quantity of first training machines.
In one possible implementation, the electronic equipment may deploy the second data classification model on each second training machine and distribute the multiple second sample data to the second training machines, and each second training machine trains the second data classification model based on the second sample data it receives, obtaining a second model parameter.
In another possible implementation, the electronic equipment may deploy the second data classification model on each second training machine, group the multiple second sample data, and distribute one group of second sample data to each second training machine. Correspondingly, this step can be realized through the following steps (1-1) to (1-3):
(1-1) The electronic equipment divides the multiple second sample data into the second quantity of sample data groups, each sample data group including at least one second sample data.
In this step, the electronic equipment may divide the multiple second sample data evenly into the second quantity of sample data groups, or may divide them unevenly into the second quantity of sample data groups. Correspondingly, the numbers of second sample data included in the sample data groups may be the same or different. Moreover, after dividing the multiple second sample data into the second quantity of sample data groups, the electronic equipment determines a data group identifier for each sample data group and stores the data group identifier in association with the sample data group. The data group identifier may be the number of the data group.
For example, if the second quantity is N, the electronic equipment divides the multiple second sample data evenly into N sample data groups, each sample data group includes at least one second sample data, the sample data groups are of the same size, each sample data group has a unique data group identifier, and the data group identifiers may be DATA0, DATA1, DATA2, ..., DATA(N-1).
(1-2) For each second training machine in each iteration, the electronic equipment selects, from the second quantity of sample data groups, one sample data group that has not yet been assigned to that second training machine.
The electronic equipment assigns one sample data group to each second training machine, and different second training machines correspond to different sample data groups. In the first iterative training, the electronic equipment randomly distributes the second quantity of sample data groups to the second quantity of second training machines, and establishes a mapping relationship between the data group identifiers of the sample data groups and the machine identifiers of the second training machines. The machine identifier may be the SN (Serial Number) of the second training machine, its IP (Internet Protocol) address, or the like. In the second iterative training, for each second training machine, the electronic equipment determines, according to the established mapping relationship, the data group identifiers of the sample data groups that have not yet been assigned to that second training machine, and assigns the corresponding sample data group among the second quantity of sample data groups to that second training machine according to one of those identifiers.
For example, if the second training machines are GPUs and N is 3, the 3 second training machines are GPU0, GPU1 and GPU2, and the 3 sample data groups are DATA0, DATA1 and DATA2. In the first iterative training, the electronic equipment assigns DATA0 to GPU0, DATA1 to GPU1 and DATA2 to GPU2; in the second iterative training, it assigns DATA1 to GPU0, DATA2 to GPU1 and DATA0 to GPU2; in the third iterative training, it assigns DATA2 to GPU0, DATA0 to GPU1 and DATA1 to GPU2. A sketch of this grouping and rotating assignment is given after step (1-3) below.
(1-3) The electronic equipment iteratively trains the second data classification model through the second training machines and the sample data groups, obtaining the second model parameters.
The electronic equipment inputs the sample data group into the second training machine; in each iterative training, the second training machine iteratively trains the second data classification model based on that sample data group, obtaining a second model parameter.
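For illustration only, the grouping of step (1-1) and the rotating assignment of step (1-2) can be sketched in Python as follows. DATA0-DATA2 and GPU0-GPU2 follow the example above; the function names and the even round-robin split are assumptions of this sketch:

    # Split the second sample data into N sample data groups and rotate which group
    # each second training machine receives in each iterative training, so that no
    # machine is given a group it has already been assigned.
    def split_into_groups(samples, n_groups):
        groups = {"DATA%d" % i: [] for i in range(n_groups)}
        for k, sample in enumerate(samples):
            groups["DATA%d" % (k % n_groups)].append(sample)
        return groups

    def assign_groups(machine_ids, group_ids, iteration):
        n = len(machine_ids)
        return {machine_ids[m]: group_ids[(m + iteration) % n] for m in range(n)}

    if __name__ == "__main__":
        samples = list(range(12))                   # stand-in second sample data
        machines = ["GPU0", "GPU1", "GPU2"]
        groups = split_into_groups(samples, len(machines))
        for it in range(3):
            print("iteration", it, assign_groups(machines, sorted(groups), it))

With three machines the printed assignments match the example: DATA0/DATA1/DATA2 in the first iterative training, DATA1/DATA2/DATA0 in the second, and DATA2/DATA0/DATA1 in the third.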
(2) The electronic equipment transmits the second model parameters of the multiple second training machines to a summarizing machine, and the summarizing machine determines a third model parameter based on the second model parameters obtained by the training of each second training machine.
The summarizing machine may be a machine of the same type as the second training machines or a machine of a different type. For transmitting the second model parameters of the multiple second training machines to the summarizing machine, one possible implementation is that the electronic equipment transmits all of the second model parameters to the summarizing machine after all of the second training machines have finished computing; another possible implementation is that whenever one second training machine finishes its operation, the second model parameter obtained by that machine is transmitted to the summarizing machine, until the second model parameters of all of the second training machines have been transmitted to the summarizing machine, as shown in Figure 10.
After the summarizing machine receives all of the second model parameters, it averages the second model parameters to obtain the third model parameter and updates the second data classification model.
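For illustration only, the averaging performed by the summarizing machine in step (2) can be sketched as follows; representing the second model parameters as plain Python dictionaries of lists is an assumption of this sketch, not the disclosed data format:

    # The summarizing machine receives one second model parameter per second training
    # machine and averages them entry by entry to obtain the third model parameter.
    def summarize(second_model_parameters):
        n = len(second_model_parameters)
        third = {}
        for key in second_model_parameters[0]:
            columns = zip(*(params[key] for params in second_model_parameters))
            third[key] = [sum(col) / n for col in columns]
        return third

    if __name__ == "__main__":
        reported = [
            {"w": [1.0, 2.0], "b": [0.5]},          # from second training machine 0
            {"w": [3.0, 4.0], "b": [1.5]},          # from second training machine 1
        ]
        print(summarize(reported))                  # {'w': [2.0, 3.0], 'b': [1.0]}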
(3) The electronic equipment delivers the third model parameter to each second training machine through the summarizing machine.
The electronic equipment delivers the third model parameter obtained by the summarizing machine to each second training machine, and at the same time delivers the updated second data classification model to each second training machine.
(4) For each second training machine, the second data classification model is iteratively trained according to the third model parameter and the multiple second sample data until the number of iterations reaches the second number of iterations, obtaining the third data classification model.
This step can be implemented through the following steps (4-1) and (4-2):
(4-1) Determine the fourth learning rate of the second data classification model in the current iterative training.
The electronic equipment obtains the fifth learning rate of the second data classification model, where the fifth learning rate is the learning rate at the end of training the first data classification model using the annular training method.
When the iteration count of the current iterative training is zero, the electronic equipment determines the ratio of the above fifth learning rate to the second quantity as the fourth learning rate.
When the iteration count of the current iterative training is not zero, that is, after iteration has started, the electronic equipment obtains the sixth learning rate of the previous iteration and decays the sixth learning rate using a polynomial decay strategy to obtain the fourth learning rate of the current iterative training. The sixth learning rate is the learning rate at the completion of the previous iteration; for example, after the first iteration is completed, the learning rate at the completion of the first iteration is the sixth learning rate, and the learning rate obtained by decaying this sixth learning rate serves as the fourth learning rate of the second iteration. A sketch of this schedule is given after step (4-2) below.
(4-2) In each iteration, the electronic equipment performs operations on the third model parameter and the multiple second sample data according to the fourth learning rate and updates the second data classification model with the operation result; this iteration process is repeated until the number of iterations reaches the second number of iterations, obtaining the third data classification model.
After the electronic equipment distributes the multiple second sample data to the second training machines, each second training machine performs operations on its assigned sample data based on the above fourth learning rate, obtains a second model parameter and transmits the second model parameter to the summarizing machine. Transmitting the second model parameters to the summarizing machine and having the summarizing machine obtain the third model parameter completes one iteration; the second training machines then take training on their assigned data based on the above third model parameter as the next iteration, until the iterations of the second number of iterations are completed, obtaining the third data classification model.
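For illustration only, the fourth-learning-rate schedule of step (4-1) can be sketched as follows. Iteration zero uses the fifth learning rate divided by the second quantity; later iterations decay the schedule polynomially. The closed-form decay formula, decay power and end learning rate used here are assumptions, since the disclosure does not fix them:

    # Fourth learning rate: start from fifth_lr / second_quantity, then decay
    # polynomially toward end_lr over the second number of iterations.
    def fourth_learning_rate(iteration, fifth_lr, second_quantity, total_iterations,
                             end_lr=1e-5, power=2.0):
        base = fifth_lr / second_quantity           # value at iteration 0
        if iteration == 0:
            return base
        frac = 1.0 - iteration / float(total_iterations)
        return (base - end_lr) * (frac ** power) + end_lr

    if __name__ == "__main__":
        for it in (0, 1, 5, 9):
            print(it, round(fourth_learning_rate(it, fifth_lr=0.4, second_quantity=8,
                                                 total_iterations=10), 6))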
It should be noted that steps S201-S204 constitute the training process of the data classification model, and the data classification model only needs to be trained once. After the training of the data classification model is completed, data classification can be carried out using the data classification model through step S205, without training the data classification model again.
In step S205, when classifying data to be classified, the electronic equipment inputs the data to be classified into the third data classification model and obtains the classification result of the data.
When data classification is needed, inputting the data to be classified into the trained third data classification model yields the result of the data classification. The data to be classified may be images, audio signals, or the like. When the data to be classified are images, the third data classification model may be a classification model that identifies age from a face image, determines which of multiple images are similar, or filters out images that match a preset type. When the data to be classified are audio signals, the third data classification model may identify age based on the audio signal, or filter out audio signals that meet a preset condition based on the audio signal. The classification result may be the category to which the data to be classified belong, or the data meeting the preset condition that are filtered out from the data to be classified.
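For illustration only, step S205 can be sketched as follows; the toy model, the preprocessing function and the label names are placeholders of this sketch, not the disclosed classification model:

    # Apply the trained third data classification model to data to be classified and
    # return the label with the highest score.
    def classify(third_data_classification_model, data_to_classify, preprocess, labels):
        features = preprocess(data_to_classify)     # e.g. decode an image or audio clip
        scores = third_data_classification_model(features)
        best = max(range(len(scores)), key=lambda i: scores[i])
        return labels[best]

    if __name__ == "__main__":
        model = lambda x: [x[0] * 0.8, x[1] * 1.2]  # stand-in trained model
        label = classify(model, data_to_classify=[0.9, 0.4],
                         preprocess=lambda d: d, labels=["age<30", "age>=30"])
        print(label)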
In the embodiments of the present disclosure, within the first number of iterations, the first data classification model is trained using the annular training method based on the multiple first sample data to obtain the second data classification model; within the second number of iterations, the second data classification model is trained using the tree-like training method based on the multiple second sample data to obtain the third data classification model. When performing data classification training, using the annular training method can save a large amount of time compared with the conventional training method, while combining it with the tree-like training method ensures the accuracy of the trained data classification model, improving the training efficiency of the data classification model.
Figure 11 is a block diagram of a data classification model training device according to an exemplary embodiment. Referring to Figure 11, the device includes an acquisition module 1101, a first training module 1102 and a second training module 1103.
The acquisition module 1101 is configured to obtain multiple first sample data, multiple second sample data, a first number of iterations and a second number of iterations, where the sum of the first number of iterations and the second number of iterations is the total number of iterations of model training.
The first training module 1102 is configured to, within the first number of iterations, train the first data classification model using the annular training method based on the multiple first sample data, obtaining the second data classification model.
The second training module 1103 is configured to, within the second number of iterations, train the second data classification model using the tree-like training method based on the multiple second sample data, obtaining the third data classification model.
In one possible implementation, the first training module 1102 is further configured to train the first data classification model through multiple first training machines and the multiple first sample data, obtaining the first model parameter produced by the training of each first training machine, where the first sample data are used for the training of the first data classification model and the second sample data are used for the training of the second data classification model;
transmit, according to the annular connection order of the first training machines, the first model parameter obtained by each first training machine, so that each first training machine obtains the first model parameters obtained by the other first training machines;
for each first training machine, iteratively train the first data classification model according to the first model parameter obtained by the training of that first training machine, the first model parameters obtained by the training of the other first training machines that the first training machine has received, and the multiple first sample data, until the number of iterations reaches the first number of iterations, obtaining the second data classification model.
In another possible implementation, the first training module 1102 is further configured to divide the multiple first sample data into the first quantity of sample data groups, each sample data group including at least one first sample data, where the first quantity is the number of the multiple first training machines;
for each first training machine in each iteration, select, from the first quantity of sample data groups, one sample data group that has not yet been assigned to that first training machine;
iteratively train the first data classification model through the first training machine and the sample data group, obtaining the first model parameter.
In another possible implementation, the first training module 1102 is further configured to determine the first learning rate of the first data classification model corresponding to the current iterative training, and determine the second learning rate of the first training machine;
in each iteration, perform operations on the first model parameter of the first training machine, the first model parameters of the other first training machines received by the first training machine and the multiple first sample data according to the first learning rate and the second learning rate, and update the second data classification model with the operation result; this iteration process is repeated until the number of iterations reaches the first number of iterations, obtaining the second data classification model.
In another possible implementation, the first training module 1102 is further configured to, when the iteration count of the current iterative training is zero, take an initial learning rate as the first learning rate of the current iterative training;
when the iteration count of the current iterative training is not zero and the current iteration count is within the third number of iterations, obtain the third learning rate of the previous iteration and increase the third learning rate linearly to obtain the first learning rate of the current iterative training, the third number of iterations being smaller than the first number of iterations;
when the iteration count of the current iterative training is within the fourth number of iterations, obtain the third learning rate of the previous iteration and decay the third learning rate using a polynomial decay strategy to obtain the first learning rate of the current iterative training, the fourth number of iterations being larger than the third number of iterations and smaller than the first number of iterations. A sketch of this warm-up and decay schedule follows.
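For illustration only, the warm-up and decay schedule of the first learning rate can be sketched as follows; the peak learning rate, decay power and end value are assumptions, since the disclosure only specifies a linear increase within the third number of iterations followed by a polynomial decay within the fourth number of iterations:

    # First learning rate: the initial learning rate at iteration 0, a linear increase
    # while the iteration count is within the warm-up range (third number of
    # iterations), then a polynomial decay toward end_lr.
    def first_learning_rate(iteration, initial_lr, peak_lr, warmup_iters, total_iters,
                            end_lr=1e-5, power=2.0):
        if iteration == 0:
            return initial_lr
        if iteration < warmup_iters:                # linear increase (warm-up)
            return initial_lr + (peak_lr - initial_lr) * iteration / float(warmup_iters)
        frac = 1.0 - (iteration - warmup_iters) / float(total_iters - warmup_iters)
        return (peak_lr - end_lr) * (frac ** power) + end_lr

    if __name__ == "__main__":
        for it in (0, 5, 10, 50, 99):
            print(it, round(first_learning_rate(it, 0.01, 0.1, warmup_iters=10,
                                                total_iters=100), 5))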
In another possible implementation, the first training module 1102 is further configured to determine the network layer of the first training machine, the weight of the network layer and the gradient of the network layer;
determine the second learning rate of the first training machine according to the weight of the network layer, the gradient of the network layer and the first model parameter of the first training machine;
where the second learning rate is positively correlated with the weight of the network layer and the first model parameter of the first training machine, and negatively correlated with the gradient of the network layer.
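For illustration only, a second learning rate with the stated correlations can be sketched in a layer-wise adaptive style as follows; the exact formula and the trust coefficient are assumptions, since the disclosure only states that the second learning rate is positively correlated with the network layer weight and the first model parameter, and negatively correlated with the network layer gradient:

    # Per-layer second learning rate: grows with the norms of the layer weight and the
    # first model parameter, shrinks with the norm of the layer gradient.
    def l2_norm(values):
        return sum(v * v for v in values) ** 0.5

    def second_learning_rate(layer_weights, layer_gradients, first_model_parameter,
                             trust=0.001, eps=1e-12):
        magnitude = l2_norm(layer_weights) + l2_norm(first_model_parameter)
        return trust * magnitude / (l2_norm(layer_gradients) + eps)

    if __name__ == "__main__":
        lr = second_learning_rate(layer_weights=[0.5, -0.3, 0.8],
                                  layer_gradients=[0.01, 0.02, -0.005],
                                  first_model_parameter=[0.4, 0.1])
        print(round(lr, 4))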
In another possible implementation, the second training module 1103 is further configured to train the second data classification model through the multiple second training machines and the multiple second sample data, obtaining the second model parameter produced by the training of each second training machine;
transmit the second model parameters of the multiple second training machines to the summarizing machine, and determine the third model parameter through the summarizing machine based on the second model parameters obtained by the training of each second training machine;
deliver the third model parameter to each second training machine through the summarizing machine;
for each second training machine, iteratively train the second data classification model according to the third model parameter and the multiple second sample data, until the number of iterations reaches the second number of iterations, obtaining the third data classification model.
In another possible implementation, the second training module 1103 is further configured to determine the fourth learning rate of the second data classification model in the current iterative training;
in each iteration, perform operations on the third model parameter and the multiple second sample data according to the fourth learning rate, and update the second data classification model with the operation result; this iteration process is repeated until the number of iterations reaches the second number of iterations, obtaining the third data classification model.
In another possible implementation, the second training module 1103 is further configured to obtain the fifth learning rate of the second data classification model, where the fifth learning rate is the learning rate at the end of training the first data classification model using the annular training method;
when the iteration count of the current iterative training is zero, determine the ratio of the fifth learning rate to the second quantity as the fourth learning rate, the second quantity being the number of the multiple second training machines;
when the iteration count of the current iterative training is not zero, obtain the sixth learning rate of the previous iteration and decay the sixth learning rate using the polynomial decay strategy to obtain the fourth learning rate of the current iterative training.
In another possible implementation, the device further includes:
an input module, configured to, when classifying data to be classified, input the data to be classified into the third data classification model and obtain the classification result of the data.
In the embodiments of the present disclosure, within the first number of iterations, the first data classification model is trained using the annular training method based on the multiple first sample data to obtain the second data classification model; within the second number of iterations, the second data classification model is trained using the tree-like training method based on the multiple second sample data to obtain the third data classification model. When performing data classification, using the annular training method can save a large amount of time compared with the conventional training method, while combining it with the tree-like training method ensures the accuracy of the trained data classification model, improving the training efficiency of the data classification model.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
Figure 12 is a block diagram of an electronic equipment according to an exemplary embodiment. The electronic equipment 1200 may be a portable mobile electronic device, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The electronic equipment 1200 may also be called user equipment, a portable electronic device, a laptop electronic device, a desktop electronic device, or other names.
In general, electronic equipment 1200 includes: processor 1201 and memory 1202.
The processor 1201 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 1201 may be implemented in hardware in at least one of the forms of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1201 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1202 may include one or more computer readable storage media, which may be non-transitory. The memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer readable storage medium in the memory 1202 stores at least one instruction, and the at least one instruction is executed by the processor 1201 to implement the data classification model training method provided by the method embodiments of the present disclosure.
In some embodiments, the electronic equipment 1200 optionally further includes a peripheral device interface 1203 and at least one peripheral device. The processor 1201, the memory 1202 and the peripheral device interface 1203 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral device interface 1203 by a bus, signal line or circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera 1206, an audio circuit 1207, a positioning component 1208 and a power supply 1209.
The peripheral device interface 1203 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, the memory 1202 and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1204 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with communication networks and other communication devices through electromagnetic signals. The radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1204 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card and the like. The radio frequency circuit 1204 may communicate with other electronic devices through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, the World Wide Web, metropolitan area networks, intranets, the generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks.
The display screen 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to acquire touch signals on or above its surface. The touch signal may be input to the processor 1201 as a control signal for processing. In this case, the display screen 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1205, arranged on the front panel of the electronic equipment 1200; in other embodiments, there may be at least two display screens 1205, respectively arranged on different surfaces of the electronic equipment 1200 or in a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen arranged on a curved surface or a folded surface of the electronic equipment 1200. The display screen 1205 may even be set to a non-rectangular irregular shape, that is, a shaped screen. The display screen 1205 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1206 is used to capture images or video. Optionally, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the electronic equipment and the rear camera is arranged on the back of the electronic equipment. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to realize a background blurring function through fusion of the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting through fusion of the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1206 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash refers to a combination of a warm light flash and a cold light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1207 may include a microphone and a loudspeaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals and input them to the processor 1201 for processing, or input them to the radio frequency circuit 1204 to realize voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively arranged at different positions of the electronic equipment 1200. The microphone may also be an array microphone or an omnidirectional collection microphone. The loudspeaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The loudspeaker may be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the loudspeaker is a piezoelectric ceramic loudspeaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1207 may also include a headphone jack.
The positioning component 1208 is used to locate the current geographic position of the electronic equipment 1200 to realize navigation or LBS (Location Based Service). The positioning component 1208 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system.
The power supply 1209 is used to supply power to the components in the electronic equipment 1200. The power supply 1209 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 1209 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the electronic equipment 1200 further includes one or more sensors 1210. The one or more sensors 1210 include, but are not limited to, an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215 and a proximity sensor 1216.
The acceleration sensor 1211 can detect the magnitudes of acceleration on the three coordinate axes of the coordinate system established with the electronic equipment 1200. For example, the acceleration sensor 1211 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1201 can control the touch display screen 1205 to display the user interface in landscape view or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1211. The acceleration sensor 1211 can also be used to collect motion data of a game or of the user.
The gyroscope sensor 1212 can detect the body direction and rotation angle of the electronic equipment 1200, and can cooperate with the acceleration sensor 1211 to collect the user's 3D actions on the electronic equipment 1200. Based on the data collected by the gyroscope sensor 1212, the processor 1201 can realize the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 1213 may be arranged on the side frame of the electronic equipment 1200 and/or the lower layer of the touch display screen 1205. When the pressure sensor 1213 is arranged on the side frame of the electronic equipment 1200, the user's grip signal on the electronic equipment 1200 can be detected, and the processor 1201 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1213. When the pressure sensor 1213 is arranged on the lower layer of the touch display screen 1205, the processor 1201 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1205. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1214 is used to collect the user's fingerprint, and the processor 1201 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings and the like. The fingerprint sensor 1214 may be arranged on the front, the back or the side of the electronic equipment 1200. When a physical button or manufacturer logo is provided on the electronic equipment 1200, the fingerprint sensor 1214 may be integrated with the physical button or the manufacturer logo.
The optical sensor 1215 is used to collect the ambient light intensity. In one embodiment, the processor 1201 can control the display brightness of the touch display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1205 is decreased. In another embodiment, the processor 1201 can also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
The proximity sensor 1216, also called a distance sensor, is generally arranged on the front panel of the electronic equipment 1200. The proximity sensor 1216 is used to collect the distance between the user and the front of the electronic equipment 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front of the electronic equipment 1200 gradually decreases, the processor 1201 controls the touch display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance between the user and the front of the electronic equipment 1200 gradually increases, the processor 1201 controls the touch display screen 1205 to switch from the screen-off state to the screen-on state.
Those skilled in the art can understand that the structure shown in Figure 12 does not constitute a limitation on the electronic equipment 1200, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
The embodiment of the present disclosure also provides a non-transitory computer readable storage medium for the electronic equipment. The storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the instruction, the program, the code set or the instruction set, when loaded and executed by the processor, implements the data classification model training method of the above embodiments.
Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The disclosure is intended to cover any variations, uses or adaptations of the disclosure that follow the general principles of the disclosure and include common knowledge or conventional techniques in the art not disclosed by the disclosure. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the disclosure are pointed out by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data classification model training method, characterized in that the method comprises:
obtaining multiple first sample data, multiple second sample data, a first number of iterations and a second number of iterations, wherein the sum of the first number of iterations and the second number of iterations is the total number of iterations of model training, the first sample data are used for the training of a first data classification model, and the second sample data are used for the training of a second data classification model;
within the first number of iterations, training the first data classification model using an annular training method based on the multiple first sample data to obtain the second data classification model;
within the second number of iterations, training the second data classification model using a tree-like training method based on the multiple second sample data to obtain a third data classification model.
2. The method according to claim 1, characterized in that, within the first number of iterations, training the first data classification model using the annular training method based on the multiple first sample data to obtain the second data classification model comprises:
training the first data classification model through multiple first training machines and the multiple first sample data, obtaining the first model parameter produced by the training of each first training machine;
transmitting, according to the annular connection order of the first training machines, the first model parameter obtained by each first training machine, so that each first training machine obtains the first model parameters obtained by the other first training machines;
for each first training machine, iteratively training the first data classification model according to the first model parameter obtained by the training of that first training machine, the first model parameters obtained by the training of the other first training machines that the first training machine has received, and the multiple first sample data, until the number of iterations reaches the first number of iterations, obtaining the second data classification model.
3. The method according to claim 2, characterized in that training the first data classification model through the multiple first training machines and the multiple first sample data, obtaining the first model parameter produced by the training of each first training machine, comprises:
dividing the multiple first sample data into a first quantity of sample data groups, each sample data group including at least one first sample data, the first quantity being the number of the multiple first training machines;
for each first training machine in each iteration, selecting, from the first quantity of sample data groups, one sample data group that has not yet been assigned to that first training machine;
iteratively training the first data classification model through the first training machine and the sample data group, obtaining the first model parameter.
4. The method according to claim 2, characterized in that iteratively training the first data classification model according to the first model parameter obtained by the training of the first training machine, the first model parameters obtained by the training of the other first training machines that the first training machine has received, and the multiple first sample data, until the number of iterations reaches the first number of iterations, obtaining the second data classification model, comprises:
determining the first learning rate of the first data classification model corresponding to the current iterative training, and determining the second learning rate of the first training machine;
in each iteration, performing operations on the first model parameter of the first training machine, the first model parameters of the other first training machines received by the first training machine and the multiple first sample data according to the first learning rate and the second learning rate, and updating the second data classification model with the operation result; repeating the iteration process until the number of iterations reaches the first number of iterations, obtaining the second data classification model.
5. The method according to claim 4, characterized in that determining the first learning rate of the first data classification model corresponding to the current iterative training comprises:
when the iteration count of the current iterative training is zero, taking an initial learning rate as the first learning rate of the current iterative training;
when the iteration count of the current iterative training is not zero and the current iteration count is within a third number of iterations, obtaining the third learning rate of the previous iteration and increasing the third learning rate linearly to obtain the first learning rate of the current iterative training, the third number of iterations being smaller than the first number of iterations;
when the iteration count of the current iterative training is within a fourth number of iterations, obtaining the third learning rate of the previous iteration and decaying the third learning rate using a polynomial decay strategy to obtain the first learning rate of the current iterative training, the fourth number of iterations being larger than the third number of iterations and smaller than the first number of iterations.
6. The method according to claim 1, characterized in that, within the second number of iterations, training the second data classification model using the tree-like training method based on the multiple second sample data to obtain the third data classification model comprises:
training the second data classification model through multiple second training machines and the multiple second sample data, obtaining the second model parameter produced by the training of each second training machine;
transmitting the second model parameters of the multiple second training machines to a summarizing machine, and determining a third model parameter through the summarizing machine based on the second model parameters obtained by the training of each second training machine;
delivering the third model parameter to each second training machine through the summarizing machine;
for each second training machine, iteratively training the second data classification model according to the third model parameter and the multiple second sample data, until the number of iterations reaches the second number of iterations, obtaining the third data classification model.
7. The method according to any one of claims 1-6, characterized in that the annular training method is a training method that performs training using the ring-allreduce algorithm.
8. A data classification model training device, characterized in that the device comprises:
an acquisition module, configured to obtain multiple first sample data, multiple second sample data, a first number of iterations and a second number of iterations, wherein the sum of the first number of iterations and the second number of iterations is the total number of iterations of model training, the first sample data are used for the training of a first data classification model, and the second sample data are used for the training of a second data classification model;
a first training module, configured to, within the first number of iterations, train the first data classification model using an annular training method based on the multiple first sample data, obtaining the second data classification model;
a second training module, configured to, within the second number of iterations, train the second data classification model using a tree-like training method based on the multiple second sample data, obtaining a third data classification model.
9. An electronic equipment, characterized by comprising:
one or more processors;
a volatile or non-volatile memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to perform the data classification model training method according to any one of claims 1-7.
10. A non-transitory computer readable storage medium having instructions stored thereon, characterized in that the instructions, when executed by a processor of an electronic equipment, implement the data classification model training method according to any one of claims 1-7.
CN201910105031.6A 2019-02-01 2019-02-01 Data classification model training method and device, electronic equipment and storage medium Active CN109816042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910105031.6A CN109816042B (en) 2019-02-01 2019-02-01 Data classification model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109816042A true CN109816042A (en) 2019-05-28
CN109816042B CN109816042B (en) 2020-11-24

Family

ID=66606332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910105031.6A Active CN109816042B (en) 2019-02-01 2019-02-01 Data classification model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109816042B (en)


Also Published As

Publication number Publication date
CN109816042B (en) 2020-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant