WO2023050778A1 - Model training method, system, electronic device and computer-readable storage medium - Google Patents

Model training method, system, electronic device and computer-readable storage medium

Info

Publication number
WO2023050778A1
WO2023050778A1 (PCT/CN2022/087439)
Authority
WO
WIPO (PCT)
Prior art keywords
party
model
feature
gradient
data
Prior art date
Application number
PCT/CN2022/087439
Other languages
English (en)
French (fr)
Inventor
姜磊
赵松
徐代刚
宋汉增
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023050778A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/20 - Administration of product repair or maintenance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/02 - CAD in a network environment, e.g. collaborative CAD or distributed simulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 - Details relating to the application field
    • G06F2113/02 - Data centres

Definitions

  • The embodiments of the present application relate to the technical field of operation and maintenance, and in particular to a model training method, system, electronic device, and computer-readable storage medium.
  • After the server at an operation point receives a fault report, it needs to analyze the fault and handle it accordingly, for example by dispatching a fault ticket or performing self-healing operations.
  • To analyze the fault, the operation point's server can use a fault analysis model that learns from historical fault-related data, including data on the fault itself, data on related alarms, and data on how the fault was ultimately resolved, so as to determine the real cause of the fault, whether the fault can self-heal, how long self-healing takes, and the recommended handling method for resolving the fault.
  • An embodiment of the present application provides a model training method applied to a first party. The method includes: uploading a first feature of the first party's model to a third party; receiving a feature matrix sent by the third party, wherein the feature matrix is generated by the third party according to the first feature and a second feature of a second party's model uploaded by the second party; and training the first party's model according to the feature matrix.
  • An embodiment of the present application also provides a model training method applied to a third party. The method includes: receiving a first feature of a first party's model sent by the first party and a second feature of a second party's model sent by the second party; generating a feature matrix according to the first feature and the second feature; and sending the feature matrix to the first party, wherein the feature matrix is used by the first party to train the first party's model.
  • An embodiment of the present application also provides a model training system, including: a first party, a second party, and a third party. The first party is configured to send the first feature of the first party's model to the third party; the second party is configured to send the second feature of the second party's model to the third party; the third party is configured to generate a feature matrix according to the first feature and the second feature, and to send the feature matrix to the first party; the first party is further configured to train the first party's model according to the feature matrix.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above model training method applied to the first party, or execute the above model training method applied to the third party.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above model training method applied to the first party, or the above model training method applied to the third party, is implemented.
  • Fig. 1 is the first flowchart of a model training method according to an embodiment of the present application;
  • Fig. 2 is a schematic diagram of the connection relationship between the first party, the second party, and the third party according to an embodiment of the present application;
  • Fig. 3 is the first flowchart of the first party training its model according to the feature matrix in an embodiment of the present application;
  • Fig. 4 is the second flowchart of the first party training its model according to the feature matrix in an embodiment of the present application;
  • Fig. 5 is the second flowchart of a model training method according to another embodiment of the present application;
  • Fig. 6 is the third flowchart of a model training method according to another embodiment of the present application;
  • Fig. 7 is the fourth flowchart of a model training method according to another embodiment of the present application;
  • Fig. 8 is the fifth flowchart of a model training method according to another embodiment of the present application;
  • Fig. 9 is a schematic diagram of a model training system according to another embodiment of the present application;
  • Fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
  • The main purpose of the embodiments of the present application is to propose a model training method, system, electronic device, and computer-readable storage medium that can quickly solve the operation and maintenance cold-start problem of newly built operation points and quickly improve their operation and maintenance analysis capability, thereby improving the user experience.
  • There are three related ways to solve the machine-learning cold-start problem. The first is to label the data of the newly built operation point through manual analysis and manual annotation, solving the cold start from scratch. The second is to directly reuse the mature fault analysis model of a mature operation point and then iteratively update it with the data of the newly built operation point. The third is to pool the data of mature operation points and newly built operation points for joint learning and machine-learning parameter tuning.
  • The inventors of this application found that the first method requires manual participation, which is time-consuming and labor-intensive, and the whole process is slow. With the second method, because the operation points differ and face different actual situations, their features are not exactly the same, so directly reusing a mature operation point's fault analysis model introduces considerable noise and gives poor fault analysis results. With the third method, considering factors such as data compliance, security, and privacy, the operation points may be unable to share business data.
  • To solve these problems, an embodiment of the present application provides a model training method applied to an electronic device of the first party, where the electronic device can be a terminal or a server.
  • This embodiment and the following embodiments take a server as an example.
  • The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the method can be as shown in Fig. 1, including:
  • Step 101: upload the first feature of the first party's model to the third party.
  • Step 102: receive the feature matrix sent by the third party.
  • Specifically, when training its model, the first party can first upload the first feature of its model to the third party and receive the feature matrix returned by the third party, wherein the feature matrix is generated by the third party according to the first feature and the second feature of the second party's model uploaded by the second party.
  • For example, the first party's model and the second party's model are both fault analysis models; the first party is a newly built operation point and the second party is a mature operation point, that is, the second party has a mature, converged model. As a new operation point, the first party lacks features for model training, while the second party, as a mature operation point, has mature features that can be used for training. The first party and the second party each upload the features of their own models to the third party: the first party uploads the first feature and the second party uploads the second feature; the third party fuses the first feature and the second feature to generate a feature matrix and sends it back to the first party.
  • In one example, the third party can maintain connections with several first parties and one second party; the connection relationship between the first parties, the second party, and the third party can be as shown in Fig. 2, with the several first parties sending the first features of their models to the third party, and the second party sending the second feature of its model to the third party.
  • Step 103: train the first party's model according to the feature matrix.
  • For example, after receiving the feature matrix sent by the third party, the first party can train its model according to the feature matrix.
  • In one example, the first party training its model according to the feature matrix can be implemented through the steps shown in Fig. 3, specifically including:
  • Step 201: take the features in the feature matrix as candidate features in turn.
  • Step 202: traverse the first party's data and determine whether there is first-party data corresponding to the candidate feature; if so, perform step 203; otherwise, perform step 206.
  • Step 203: determine whether the first-party data corresponding to the candidate feature are all identical; if so, perform step 206; otherwise, perform step 204.
  • Step 204: take the candidate feature as a target feature.
  • In a specific implementation, after receiving the feature matrix sent by the third party, the first party can perform feature merging according to the feature matrix: it takes the features in the feature matrix as candidate features in turn, traverses the first party's data based on each candidate feature, and looks up the features corresponding to the first party's data in the feature matrix.
  • If first-party data corresponding to the candidate feature is found, and the corresponding first-party data are not all identical, the first party considers the candidate feature a valid, meaningful feature that can be used for training its model; it takes the candidate feature as a target feature and keeps only target features, which avoids ineffective training by the first party.
  • Step 205: train the first party's model according to the target features.
  • In a specific implementation, after obtaining the target features, the first party can train its model according to them.
  • Step 206: ignore the candidate feature.
  • If a candidate feature has no corresponding first-party data, the feature has nothing to do with the first party, and the first party can ignore it.
  • If the first-party data corresponding to a candidate feature are all identical, the feature is meaningless to the first party, and the first party ignores it. For example, if a candidate feature is "type of faulty network element" and all the first-party data corresponding to this feature are "transmission network element", the first party can ignore the "type of faulty network element" candidate feature.
  • If the amount of first-party data corresponding to a candidate feature is smaller than a preset sparsity threshold, the candidate feature is too sparse for the first party and likewise meaningless, and the first party ignores it. For example, if the candidate feature is "alarm automatic recovery time", the preset sparsity threshold is 3, and only two alarms in the first party's data recovered automatically while the rest did not, then the candidate feature has only two corresponding first-party data records, and the first party can ignore the "alarm automatic recovery time" candidate feature.
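  • As an illustration of steps 201 to 206, the following is a minimal Python sketch of the feature-merging logic. It is not part of the patent text: the record layout (one dict of feature-to-value pairs per data record) and the names select_target_features and sparse_threshold are assumptions made for this example.

```python
def select_target_features(feature_matrix, records, sparse_threshold=3):
    """Filter candidate features following steps 201-206.

    feature_matrix: iterable of candidate feature names from the third party.
    records: the first party's data, one dict of feature -> value per record.
    Returns the target features considered valid for training.
    """
    targets = []
    for candidate in feature_matrix:                 # step 201: next candidate
        values = [r[candidate] for r in records if candidate in r]  # step 202
        if not values:                               # step 206: unrelated feature
            continue
        if len(set(values)) == 1:                    # step 206: all values identical
            continue
        if len(values) < sparse_threshold:           # step 206: too sparse
            continue
        targets.append(candidate)                    # step 204: keep as target feature
    return targets

# Mirrors the examples in the text: "type of faulty network element" is constant,
# and "alarm automatic recovery time" appears only twice (below the threshold of 3).
records = [
    {"faulty_ne_type": "transmission", "alarm_count": 4},
    {"faulty_ne_type": "transmission", "alarm_count": 7, "alarm_auto_recovery_time": 30},
    {"faulty_ne_type": "transmission", "alarm_count": 2, "alarm_auto_recovery_time": 55},
]
matrix = ["faulty_ne_type", "alarm_count", "alarm_auto_recovery_time", "site_power"]
print(select_target_features(matrix, records))       # ['alarm_count']
```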
  • In this embodiment, the first party can upload the first feature of its model to the third party and receive the feature matrix returned by the third party, wherein the feature matrix is generated by the third party according to the first feature and the second feature of the second party's model uploaded by the second party; after receiving the feature matrix, the first party can train its model according to it.
  • By sharing features between the first party and the second party, the embodiments of the present application realize horizontal federated learning, so that the first party can obtain the second party's features and expand its own feature set. As a newly built operation point, the first party does not have enough features; expanding the features through the feature matrix can quickly solve the operation and maintenance cold-start problem of the newly built operation point and quickly improve its operation and maintenance analysis capability, thereby improving the user experience. At the same time, the embodiments of the present application do not share the business data of the first party and the second party, which guarantees the security and privacy of both parties' business data.
  • In one embodiment, the first party training its model according to the feature matrix can be implemented through the steps shown in Fig. 4, specifically including:
  • Step 301: perform feature vectorization and label annotation on the first party's data.
  • In a specific implementation, the first party can perform feature engineering on its business data according to the features in the feature matrix in order to obtain the first gradient: it first vectorizes and labels its data according to the features in the feature matrix, obtaining feature-vectorized and labeled first-party data, that is, data that can be used for training.
  • Step 302: input the feature-vectorized and labeled first-party data into a preset machine learning network to obtain a first gradient, and upload the first gradient to the third party.
  • In a specific implementation, after obtaining the feature-vectorized and labeled first-party data, the first party can divide it into test data and validation data and input them in order into the preset machine learning network for training; after the iterative training is completed, the first party extracts the gradient of the trained network as the first gradient and sends it to the third party.
  • In one example, the preset machine learning network can be a support vector machine (SVM) network, a random forest network, a graph neural network (GNN), or the like.
  • Step 303: receive the comprehensive gradient sent by the third party.
  • Specifically, after sending the first gradient to the third party, the first party can receive the comprehensive gradient returned by the third party, wherein the comprehensive gradient is generated by the third party according to the first gradient and a second gradient, the second gradient is obtained by the second party through training with the second party's data and the preset machine learning network, and the first party and the second party use the same network.
  • Step 304: train the first party's model according to the comprehensive gradient.
  • In this embodiment, training the first party's model according to the feature matrix includes: performing feature vectorization and label annotation on the first party's data; inputting the feature-vectorized and labeled data into a preset machine learning network to obtain a first gradient, and uploading the first gradient to the third party; receiving the comprehensive gradient sent by the third party, wherein the comprehensive gradient is generated by the third party according to the first gradient and the second gradient, and the second gradient is obtained by the second party through training with the second party's data and the preset machine learning network; and training the first party's model according to the comprehensive gradient. The first party and the second party each train to obtain a gradient, and the third party aggregates the first party's gradient and the second party's gradient into a comprehensive gradient that is more scientific and closer to the real situation; training the first party's model with the comprehensive gradient can further improve the training effect.
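  • The local-training side of steps 301 to 304 can be sketched as follows. This is a minimal illustration under stated assumptions: the patent does not fix a concrete network, so a plain logistic-regression update stands in for the "preset machine learning network", the test/validation split of step 302 is omitted for brevity, and the transport of the gradient to the third party is left abstract.

```python
import numpy as np

def local_first_gradient(X, y, w, lr=0.1, epochs=10):
    """Steps 301-302 sketch: train locally, then extract the trained network's
    gradient as the "first gradient" to upload to the third party.

    X: feature-vectorized first-party data (n_samples, n_features);
    y: labels from the annotation step; w: initial model weights.
    """
    grad = np.zeros_like(w)
    for _ in range(epochs):                  # iterative training
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predictions
        grad = X.T @ (p - y) / len(y)        # gradient of the log loss
        w = w - lr * grad
    return grad                              # the "first gradient"

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                 # stand-in vectorized data
y = (X[:, 0] > 0).astype(float)              # stand-in labels
first_gradient = local_first_gradient(X, y, np.zeros(4))
# Upload first_gradient to the third party, then apply the returned
# comprehensive gradient with the same update rule (steps 303-304).
```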
  • Another embodiment of the present application provides a model training method applied to an electronic device of the first party.
  • The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the method can be as shown in Fig. 5, including:
  • Step 401: upload the first feature of the first party's model to the third party.
  • Step 402: receive the feature matrix sent by the third party.
  • Step 403: train the first party's model according to the feature matrix.
  • Steps 401 to 403 are substantially the same as steps 101 to 103 and are not repeated here.
  • Step 404: send the first evaluation data of the trained first party's model to the third party.
  • In a specific implementation, after training its model according to the feature matrix, the first party can send the first evaluation data of the trained model to the third party, wherein the first evaluation data characterizes how the trained first party's model classifies the first party's data.
  • Step 405: publish the trained first party's model after receiving the first release instruction sent by the third party.
  • In a specific implementation, after sending the first evaluation data of the trained model to the third party, the first party can publish the trained model upon receiving the first release instruction sent by the third party, wherein the third party sends the first release instruction to the first party when it judges, according to the first evaluation data and second evaluation data, that the trained first party's model has converged, and the second evaluation data characterizes how the trained second party's model classifies the second party's data.
  • In this embodiment, having the third party judge from the first evaluation data and the second evaluation data whether the first party's model has converged, and send the first release instruction when it has, makes the convergence judgment more scientific and accurate, so that a better-performing model is obtained.
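  • A toy sketch of the first party's side of steps 404 and 405 follows. The third party's interface and its decision rule here are assumptions invented for the example; the patent only specifies that a first release instruction arrives once the third party judges the trained model converged.

```python
class ThirdPartyStub:
    """Stand-in for the third party's decision service; not part of the patent."""
    def submit_evaluation(self, eval_data):
        tp, fn, fp, tn = eval_data
        return (tp + tn) / (tp + fn + fp + tn) > 0.6   # toy convergence rule

def report_and_maybe_publish(eval_data, third_party, publish):
    """Step 404: send the first evaluation data; step 405: publish the trained
    model if the first release instruction (here: a True reply) is received."""
    if third_party.submit_evaluation(eval_data):
        publish()

report_and_maybe_publish((35, 10, 5, 20), ThirdPartyStub(),
                         lambda: print("trained first-party model published"))
```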
  • Another embodiment of the present application provides a model training method applied to an electronic device of a third party.
  • The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the method can be as shown in Fig. 6, including:
  • Step 501: acquire the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
  • In a specific implementation, after the first party is established, it can send the first feature of its model to the third party; the third party selects a second party and instructs the second party to send the second feature of the second party's model to the third party.
  • In one example, a third party can bring up several first parties and several second parties to perform the model training method of this embodiment.
  • Step 502: generate a feature matrix according to the first feature and the second feature.
  • Step 503: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
  • In a specific implementation, after receiving the first feature sent by the first party and the second feature sent by the second party, the third party can aggregate the first feature and the second feature to generate a feature matrix, and send the feature matrix to the first party; the feature matrix contains the first feature and the second feature.
  • After receiving the feature matrix, the first party can train its model according to it.
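  • The aggregation of step 502 can be pictured with the sketch below. The patent does not define the internal structure of the feature matrix; representing it as the sorted union of the feature names uploaded by several first parties and one second party is an assumption made purely for this illustration.

```python
def build_feature_matrix(first_features, second_features):
    """Step 502 sketch: aggregate uploaded feature sets into a feature matrix,
    here simply the union of all feature names (an assumed representation)."""
    return sorted(set().union(*first_features, second_features))

# Several first parties and one second party, as in Fig. 2.
firsts = [{"alarm_count", "faulty_ne_type"}, {"alarm_count", "site_power"}]
second = {"alarm_count", "alarm_auto_recovery_time", "faulty_ne_type"}
print(build_feature_matrix(firsts, second))
# ['alarm_auto_recovery_time', 'alarm_count', 'faulty_ne_type', 'site_power']
```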
  • Another embodiment of the present application provides a model training method applied to an electronic device of a third party.
  • The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the method can be as shown in Fig. 7, including:
  • Step 601: receive the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
  • Step 602: generate a feature matrix according to the first feature and the second feature.
  • Step 603: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
  • Steps 601 to 603 are substantially the same as steps 501 to 503 and are not repeated here.
  • Step 604: receive the first gradient sent by the first party and the second gradient sent by the second party.
  • In a specific implementation, the third party can obtain in real time the first gradient sent by the first party and the second gradient sent by the second party. The first gradient is obtained by the first party through training based on the feature matrix, the first party's data, and a preset machine learning network; the second gradient is obtained by the second party through training based on the second party's data and the preset machine learning network, wherein the preset machine learning network used by the first party and the preset machine learning network used by the second party are the same neural network.
  • Step 605: aggregate the first gradient and the second gradient to generate a comprehensive gradient.
  • Step 606: send the comprehensive gradient to the first party.
  • In a specific implementation, after receiving the first gradient sent by the first party and the second gradient sent by the second party, the third party can aggregate them according to a preset aggregation algorithm to generate a comprehensive gradient, and send the comprehensive gradient to the first party, for the first party to train its model according to it. The preset aggregation algorithm can be set by those skilled in the art according to actual needs, and the embodiments of the present application do not specifically limit it.
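  • Since the preset aggregation algorithm is left open, the following sketch uses a FedAvg-style weighted mean purely as one plausible choice; the weights (for example, per-party sample counts) are an assumption for the example.

```python
import numpy as np

def aggregate_gradients(gradients, weights=None):
    """Steps 605-606 sketch: combine party gradients into a comprehensive
    gradient with a (weighted) mean, one common aggregation choice."""
    grads = np.stack(gradients)
    if weights is None:
        return grads.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    return (grads * w[:, None]).sum(axis=0) / w.sum()

first_gradient = np.array([0.2, -0.1, 0.05])
second_gradient = np.array([0.4, 0.0, -0.15])
comprehensive = aggregate_gradients([first_gradient, second_gradient],
                                    weights=[80, 1200])  # e.g. sample counts
print(comprehensive)  # sent back to the first party (and optionally the second)
```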
  • In one example, after sending the comprehensive gradient to the first party, the third party can also send the comprehensive gradient to the second party, for the second party to train its model according to it.
  • In one example, the first gradient sent by the first party to the third party is an encrypted first gradient, and the second gradient sent by the second party to the third party is an encrypted second gradient, wherein the first gradient and the second gradient are encrypted in the same way. Encrypting the first gradient and the second gradient can effectively prevent gradient attacks and improve the security and reliability of the entire model training process.
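  • The patent requires both gradients to be protected in the same way but does not name a scheme. Pairwise additive masking, as used in secure-aggregation protocols, is one possibility; the sketch below is a toy stand-in under that assumption, with a seed shared only by the two parties.

```python
import numpy as np

def mask_gradient(grad, shared_seed, sign):
    """Add a pseudo-random mask derived from a seed the two parties share.
    With sign=+1 for the first party and sign=-1 for the second, the masks
    cancel when the third party sums the uploads, so it can aggregate
    without ever seeing a raw gradient."""
    mask = np.random.default_rng(shared_seed).normal(size=grad.shape)
    return grad + sign * mask

g1 = np.array([0.2, -0.1, 0.05])   # first party's gradient
g2 = np.array([0.4, 0.0, -0.15])   # second party's gradient
u1 = mask_gradient(g1, shared_seed=42, sign=+1)
u2 = mask_gradient(g2, shared_seed=42, sign=-1)
print((u1 + u2) / 2)               # equals (g1 + g2) / 2: the masks cancel
```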
  • In one example, the first party's data and the second party's data are independent and identically distributed. If the first party and the second party are not mutually independent, horizontal federated learning is meaningless; requiring them to be mutually independent prevents ineffective horizontal federated learning and avoids wasting resources.
  • This embodiment also requires the first party's data and the second party's data to share the same distribution, which can effectively speed up training, further reduce the time spent solving the machine-learning cold-start problem, and further quickly improve the operation and maintenance analysis capability of newly built operation points.
  • Another embodiment of the present application provides a model training method applied to an electronic device of a third party.
  • The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the method can be as shown in Fig. 8, including:
  • Step 701: receive the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
  • Step 702: generate a feature matrix according to the first feature and the second feature.
  • Step 703: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
  • Steps 701 to 703 are substantially the same as steps 501 to 503 and are not repeated here.
  • Step 704: receive the first evaluation data of the trained first party's model sent by the first party, and the second evaluation data of the trained second party's model sent by the second party.
  • Specifically, the first evaluation data characterizes how the trained first party's model classifies the first party's data, and the second evaluation data characterizes how the trained second party's model classifies the second party's data.
  • In one example, take a binary classification model: if the first party has 80 first-party data records, of which 35 are actually true and learned as true, 10 are actually true but learned as false, 5 are actually false but learned as true, and 20 are actually false and learned as false, then the evaluation data is the group of numbers 35, 10, 5, 20. Considering that the first party lacks data that can be labeled, and that transmitting evaluation metrics directly would not work well, this embodiment collects evaluation data, which makes the model training process more scientific.
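  • The four numbers in this example are the cells of a binary confusion matrix. A small sketch of how a party could produce them from its labels and predictions (the function name is assumed for the example):

```python
def evaluation_data(y_true, y_pred):
    """Step 704 sketch: the four counts sent as evaluation data, in the order
    used in the text: (true & learned true, true & learned false,
    false & learned true, false & learned false)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fn, fp, tn

print(evaluation_data([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1)
# With the 80 records described in the text, the tuple would be (35, 10, 5, 20).
```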
  • Step 705: judge, according to the first evaluation data and the second evaluation data, whether the trained first party's model has converged.
  • Step 706: when it is determined that the trained first party's model has converged, send a first release instruction to the first party, for the first party to publish the trained model.
  • In one example, there are several first parties, and the first evaluation data is the first evaluation data of each first party. The third party can compute a global evaluation value from the first evaluation data of all the first parties, and a reference evaluation value from the first evaluation data of all the first parties together with the second evaluation data. The third party compares the global evaluation value with the reference evaluation value; when it determines that the difference between them is smaller than a preset first threshold, it determines that the trained model of each first party has converged and sends a first release instruction to each first party, for each first party to publish its trained model. The preset first threshold can be set by those skilled in the art according to actual needs.
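  • The patent does not define how the global and reference evaluation values are computed from the evaluation data; the sketch below assumes pooled accuracy over the confusion counts as a placeholder, with an assumed first threshold of 0.05.

```python
def accuracy(counts):
    """Pooled accuracy over (tp, fn, fp, tn) counts; an assumed evaluation value."""
    tp, fn, fp, tn = counts
    return (tp + tn) / (tp + fn + fp + tn)

def pool(count_list):
    return tuple(sum(c) for c in zip(*count_list))

def first_parties_converged(first_eval_list, second_eval, first_threshold=0.05):
    """Steps 705-706 sketch: global value over all first parties' evaluation
    data; reference value additionally including the second party's evaluation
    data; converged when their difference is below the preset first threshold."""
    global_value = accuracy(pool(first_eval_list))
    reference_value = accuracy(pool(first_eval_list + [second_eval]))
    return abs(global_value - reference_value) < first_threshold

print(first_parties_converged([(35, 10, 5, 20), (40, 8, 6, 26)],
                              second_eval=(500, 20, 15, 465)))  # False here
```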
  • In one example, the third party can also receive the evaluation metric value of the trained second party's model sent by the second party, wherein the evaluation metric value includes any combination of the following: the accuracy of the trained second party's model, the precision of the trained second party's model, and the recall of the trained second party's model. After receiving the evaluation metric value of the trained second party's model, the third party can judge whether it is greater than the evaluation metric value of the second party's model before training.
  • When the third party determines that the evaluation metric value of the trained second party's model is higher than that of the model before training, or that the difference between the evaluation metric value before training and the one after training is smaller than a preset second threshold, it determines that the trained second party's model has converged and sends a second release instruction to the second party, for the second party to publish the trained model. The preset second threshold can be set by those skilled in the art according to actual needs; as long as the evaluation metric value of the trained second party's model has not dropped too much, the result is acceptable and the trained model can be considered converged, which enhances the training effect of the second party's model.
  • In one example, the evaluation metric value is the precision of the second party's model, the precision of the model before training is 98%, and the preset second threshold is 3%. If the precision of the trained second party's model is 98.7%, the third party can determine that the trained model has converged; if the precision of the trained model is 96%, the third party can determine that it has converged; if the precision of the trained model is 94.4%, the third party can determine that it has not converged.
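  • The decision rule for the second party's model in this example can be stated directly in code; the sketch below reproduces it (a metric is acceptable if it did not drop, or dropped by less than the preset second threshold).

```python
def second_party_converged(metric_after, metric_before, second_threshold=0.03):
    """Second release decision: converged if the evaluation metric value did not
    drop, or dropped by less than the preset second threshold."""
    return (metric_after >= metric_before
            or metric_before - metric_after < second_threshold)

# The precision example from the text (precision before training 98%, threshold 3%):
for after in (0.987, 0.96, 0.944):
    print(after, second_party_converged(after, 0.98))  # True, True, False
```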
  • Another embodiment of the present application relates to a model training system. Fig. 9 is a schematic diagram of the model training system of this embodiment, which includes: a first party 801, a second party 802, and a third party 803.
  • The first party 801 is configured to send the first feature of the model of the first party 801 to the third party 803;
  • the second party 802 is configured to send the second feature of the model of the second party 802 to the third party 803;
  • the third party 803 is configured to generate a feature matrix according to the first feature and the second feature, and to send the feature matrix to the first party 801;
  • the first party 801 is further configured to train the model of the first party 801 according to the feature matrix.
  • It is easy to see that this embodiment is a system embodiment corresponding to the above method embodiments, and this embodiment can be implemented in cooperation with the above method embodiments.
  • The relevant technical details and technical effects mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
  • It is worth mentioning that the modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • In addition, to highlight the innovative part of the present application, units not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
  • Another embodiment of the present application relates to an electronic device, as shown in Fig. 10, including: at least one processor 901; and a memory 902 communicatively connected to the at least one processor 901, wherein the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 so that the at least one processor 901 can execute the model training method applied to the first party in the above embodiments, or execute the model training method applied to the third party in the above embodiments.
  • The memory and the processor are connected by a bus. The bus can include any number of interconnected buses and bridges, and it connects one or more processors and the various circuits of the memory together.
  • The bus can also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore not further described herein.
  • A bus interface provides an interface between the bus and the transceiver. The transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other devices over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium through an antenna; further, the antenna also receives data and passes the data to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory can be used to store data used by the processor when performing operations.
  • Another embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • The computer program implements the above method embodiments when executed by a processor.
  • That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions for making a device (which may be a single-chip microcomputer, a chip, or the like) or a processor execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Tourism & Hospitality (AREA)
  • Biophysics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Molecular Biology (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiments of the present application relate to the technical field of operation and maintenance, and in particular to a model training method, system, electronic device, and computer-readable storage medium. The model training method includes: uploading the first feature of the first party's model to a third party; receiving a feature matrix sent by the third party, wherein the feature matrix is generated by the third party according to the first feature and the second feature of the second party's model uploaded by the second party; and training the first party's model according to the feature matrix.

Description

Model training method, system, electronic device and computer-readable storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, the Chinese patent application with application number 202111162667.8 filed on September 30, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of operation and maintenance, and in particular to a model training method, system, electronic device, and computer-readable storage medium.
Background
In operation and maintenance in the telecom industry, after the server at an operation point receives a fault report, it needs to analyze the fault and handle it accordingly, for example by dispatching a fault ticket or performing self-healing. To analyze the fault, the operation point's server can use a fault analysis model to learn from historical fault-related data, including data on the fault itself, data on related alarms, and data on how the fault was ultimately resolved, so as to determine the real cause of the fault, whether the fault can self-heal and how long self-healing takes, and the recommended handling method for resolving the fault.
However, for newly built operation points such as overseas operation points and 5G private-network campus operation points, although the planning and construction of the operation point have been completed, the operation and maintenance side lacks data for machine learning and has no mature fault analysis model to define which features should be used for machine learning, so the operation point cannot quickly carry out effective operation and maintenance. This is the machine-learning cold-start problem.
However, the whole process of solving the machine-learning cold start is slow and time-consuming, so it cannot quickly improve the operation and maintenance analysis capability of a newly built operation point and cannot meet the actual needs of newly built operation points.
Summary
An embodiment of the present application provides a model training method applied to a first party. The method includes: uploading a first feature of the first party's model to a third party; receiving a feature matrix sent by the third party, wherein the feature matrix is generated by the third party according to the first feature and a second feature of a second party's model uploaded by the second party; and training the first party's model according to the feature matrix.
An embodiment of the present application further provides a model training method applied to a third party. The method includes: receiving a first feature of a first party's model sent by the first party and a second feature of a second party's model sent by the second party; generating a feature matrix according to the first feature and the second feature; and sending the feature matrix to the first party, wherein the feature matrix is used by the first party to train the first party's model according to the feature matrix.
An embodiment of the present application further provides a model training system, including: a first party, a second party, and a third party. The first party is configured to send the first feature of the first party's model to the third party; the second party is configured to send the second feature of the second party's model to the third party; the third party is configured to generate a feature matrix according to the first feature and the second feature, and to send the feature matrix to the first party; the first party is further configured to train the first party's model according to the feature matrix.
An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above model training method applied to the first party, or execute the above model training method applied to the third party.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above model training method applied to the first party, or implements the above model training method applied to the third party.
Brief Description of the Drawings
Fig. 1 is the first flowchart of a model training method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the connection relationship between the first party, the second party, and the third party in an embodiment of the present application;
Fig. 3 is the first flowchart of the first party training its model according to the feature matrix in an embodiment of the present application;
Fig. 4 is the second flowchart of the first party training its model according to the feature matrix in an embodiment of the present application;
Fig. 5 is the second flowchart of a model training method according to another embodiment of the present application;
Fig. 6 is the third flowchart of a model training method according to another embodiment of the present application;
Fig. 7 is the fourth flowchart of a model training method according to another embodiment of the present application;
Fig. 8 is the fifth flowchart of a model training method according to another embodiment of the present application;
Fig. 9 is a schematic diagram of a model training system according to another embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The main purpose of the embodiments of the present application is to propose a model training method, system, electronic device, and computer-readable storage medium that can quickly solve the operation and maintenance cold-start problem of newly built operation points and quickly improve their operation and maintenance analysis capability, thereby improving the user experience.
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the drawings. However, those of ordinary skill in the art can understand that many technical details are given in the embodiments to help the reader better understand the present application; the technical solution claimed by the present application can still be realized without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and does not limit the specific implementation of the present application; the embodiments can be combined with and cited by each other as long as they do not contradict.
For a newly built operation point, although the planning and construction of the operation point have been completed, the operation and maintenance side lacks data for machine learning and has no mature fault analysis model to define which features should be used for machine learning, so the operation point cannot quickly carry out effective operation and maintenance. This is the machine-learning cold-start problem, and it is common at overseas operation points and 5G private-network campus operation points. Overseas operation points are today mostly built on traditional 4G networks but will also move toward 5G construction; with the construction of 5G, other industries can, on the basis of the new 5G information infrastructure, connect vertically related industrial applications through slicing and virtualization and establish a large number of 5G private-network campus operation points. In both cases there is no labeled data in the early stage of operation and maintenance, so effective machine learning, and therefore quick and effective operation and maintenance, is impossible.
There are three related ways to solve the machine-learning cold-start problem. The first is to label the data of the newly built operation point through manual analysis and manual annotation, solving the cold start from scratch. The second is to directly reuse the mature fault analysis model of a mature operation point and then iteratively update it with the data of the newly built operation point. The third is to pool the data of mature operation points and the data of newly built operation points for joint learning and machine-learning parameter tuning.
The inventors of this application found that the first method requires manual participation, which is time-consuming and labor-intensive, and the whole process is slow. With the second method, because the operation points differ and face different actual situations, their features are not exactly the same, so directly reusing a mature operation point's fault analysis model introduces considerable noise and gives poor fault analysis results. With the third method, considering factors such as data compliance, security, and privacy, the operation points may be unable to share business data.
To solve the above problems (the slow, time-consuming cold-start process, the noise, and the inability of operation points to share business data), an embodiment of the present application provides a model training method applied to an electronic device of the first party, where the electronic device can be a terminal or a server; this embodiment and the following embodiments take a server as an example. The implementation details of the model training method of this embodiment are described in detail below; the following details are provided only for ease of understanding and are not required to implement this solution.
The specific flow of the model training method of this embodiment can be as shown in Fig. 1, including:
Step 101: upload the first feature of the first party's model to the third party.
Step 102: receive the feature matrix sent by the third party.
Specifically, when training its model, the first party can first upload the first feature of its model to the third party and receive the feature matrix returned by the third party, wherein the feature matrix is generated by the third party according to the first feature and the second feature of the second party's model uploaded by the second party.
For example, the first party's model and the second party's model are both fault analysis models; the first party is a newly built operation point and the second party is a mature operation point, that is, the second party has a mature, converged model. As a new operation point, the first party lacks features for model training, while the second party, as a mature operation point, has mature features that can be used for training. The first party and the second party each upload the features of their own models to the third party: the first party uploads the first feature and the second party uploads the second feature; the third party fuses the first feature and the second feature to generate a feature matrix and sends it back to the first party.
In one example, the third party can maintain connections with several first parties and one second party; the connection relationship between the first parties, the second party, and the third party can be as shown in Fig. 2, with the several first parties sending the first features of their models to the third party, and the second party sending the second feature of its model to the third party.
Step 103: train the first party's model according to the feature matrix.
For example, after receiving the feature matrix sent by the third party, the first party can train its model according to the feature matrix.
In one example, the first party training its model according to the feature matrix can be implemented through the steps shown in Fig. 3, specifically including:
Step 201: take the features in the feature matrix as candidate features in turn.
Step 202: traverse the first party's data and determine whether there is first-party data corresponding to the candidate feature; if so, perform step 203; otherwise, perform step 206.
Step 203: determine whether the first-party data corresponding to the candidate feature are all identical; if so, perform step 206; otherwise, perform step 204.
Step 204: take the candidate feature as a target feature.
In a specific implementation, after receiving the feature matrix sent by the third party, the first party can perform feature merging according to the feature matrix: it takes the features in the feature matrix as candidate features in turn, traverses the first party's data based on each candidate feature, and looks up the features corresponding to the first party's data in the feature matrix. If first-party data corresponding to the candidate feature is found, and the corresponding first-party data are not all identical, the first party considers the candidate feature a valid, meaningful feature that can be used for training its model; it takes the candidate feature as a target feature and keeps only target features, which avoids ineffective training by the first party.
Step 205: train the first party's model according to the target features.
In a specific implementation, after obtaining the target features, the first party can train its model according to them.
Step 206: ignore the candidate feature.
In one example, if a candidate feature has no corresponding first-party data, the feature has nothing to do with the first party, and the first party can ignore it.
In one example, if the first-party data corresponding to a candidate feature are all identical, the feature is meaningless to the first party, and the first party ignores it. For example, if a candidate feature is "type of faulty network element" and all the first-party data corresponding to this feature are "transmission network element", the first party can ignore the "type of faulty network element" candidate feature.
In one example, if the amount of first-party data corresponding to a candidate feature is smaller than a preset sparsity threshold, the candidate feature is too sparse for the first party and likewise meaningless, and the first party ignores it. For example, if the candidate feature is "alarm automatic recovery time", the preset sparsity threshold is 3, and only two alarms in the first party's data recovered automatically while the rest did not, then the candidate feature has only two corresponding first-party data records, and the first party can ignore the "alarm automatic recovery time" candidate feature.
In this embodiment, the first party can upload the first feature of its model to the third party and receive the feature matrix returned by the third party, wherein the feature matrix is generated by the third party according to the first feature and the second feature of the second party's model uploaded by the second party; after receiving the feature matrix, the first party can train its model according to it. By sharing features between the first party and the second party, the embodiments of the present application realize horizontal federated learning, so that the first party can obtain the second party's features and expand its own feature set. As a newly built operation point, the first party does not have enough features; expanding the features through the feature matrix can quickly solve the operation and maintenance cold-start problem of the newly built operation point and quickly improve its operation and maintenance analysis capability, thereby improving the user experience. At the same time, the embodiments of the present application do not share the business data of the first party and the second party, which guarantees the security and privacy of both parties' business data.
In one embodiment, the first party training its model according to the feature matrix can be implemented through the steps shown in Fig. 4, specifically including:
Step 301: perform feature vectorization and label annotation on the first party's data.
In a specific implementation, the first party can perform feature engineering on its business data according to the features in the feature matrix in order to obtain the first gradient: it first vectorizes and labels its data according to the features in the feature matrix, obtaining feature-vectorized and labeled first-party data, that is, data that can be used for training.
Step 302: input the feature-vectorized and labeled first-party data into a preset machine learning network to obtain a first gradient, and upload the first gradient to the third party.
In a specific implementation, after obtaining the feature-vectorized and labeled first-party data, the first party can divide it into test data and validation data and input them in order into the preset machine learning network for training; after the iterative training is completed, the first party extracts the gradient of the trained network as the first gradient and sends it to the third party.
In one example, the preset machine learning network can be a support vector machine (SVM) network, a random forest network, a graph neural network (GNN), or the like.
Step 303: receive the comprehensive gradient sent by the third party.
Specifically, after sending the first gradient to the third party, the first party can receive the comprehensive gradient returned by the third party, wherein the comprehensive gradient is generated by the third party according to the first gradient and a second gradient, the second gradient is obtained by the second party through training with the second party's data and the preset machine learning network, and the first party and the second party use the same network.
Step 304: train the first party's model according to the comprehensive gradient.
In this embodiment, training the first party's model according to the feature matrix includes: performing feature vectorization and label annotation on the first party's data; inputting the feature-vectorized and labeled first-party data into a preset machine learning network to obtain a first gradient, and uploading the first gradient to the third party; receiving the comprehensive gradient sent by the third party, wherein the comprehensive gradient is generated by the third party according to the first gradient and the second gradient, and the second gradient is obtained by the second party through training with the second party's data and the preset machine learning network; and training the first party's model according to the comprehensive gradient. The first party and the second party each train to obtain a gradient, and the third party aggregates the first party's gradient and the second party's gradient into a comprehensive gradient that is more scientific and closer to the real situation; training the first party's model according to the comprehensive gradient can further improve the training effect.
Another embodiment of the present application provides a model training method applied to an electronic device of the first party. The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the model training method of this embodiment can be as shown in Fig. 5, including:
Step 401: upload the first feature of the first party's model to the third party.
Step 402: receive the feature matrix sent by the third party.
Step 403: train the first party's model according to the feature matrix.
Steps 401 to 403 are substantially the same as steps 101 to 103 and are not repeated here.
Step 404: send the first evaluation data of the trained first party's model to the third party.
In a specific implementation, after training its model according to the feature matrix, the first party can send the first evaluation data of the trained model to the third party, wherein the first evaluation data characterizes how the trained first party's model classifies the first party's data.
Step 405: publish the trained first party's model upon receiving the first release instruction sent by the third party.
In a specific implementation, after sending the first evaluation data of the trained model to the third party, the first party can publish the trained model upon receiving the first release instruction sent by the third party, wherein the third party sends the first release instruction to the first party when it judges, according to the first evaluation data and second evaluation data, that the trained first party's model has converged, and the second evaluation data characterizes how the trained second party's model classifies the second party's data.
In this embodiment, after the first party's model is trained according to the feature matrix, the method includes: sending the first evaluation data of the trained first party's model to the third party, wherein the first evaluation data characterizes how the trained model classifies the first party's data; and publishing the trained first party's model upon receiving the first release instruction sent by the third party, wherein the third party sends the first release instruction when it judges, according to the first evaluation data and the second evaluation data, that the trained first party's model has converged, and the second evaluation data characterizes how the trained second party's model classifies the second party's data. Having the third party judge from the first and second evaluation data whether the first party's model has converged, and send the first release instruction if it has, makes the convergence judgment more scientific and accurate, so that a better-performing model is obtained.
Another embodiment of the present application provides a model training method applied to an electronic device of a third party. The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the model training method of this embodiment can be as shown in Fig. 6, including:
Step 501: acquire the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
In a specific implementation, after the first party is established, it can send the first feature of its model to the third party; the third party selects a second party and instructs the second party to send the second feature of the second party's model to the third party.
In one example, the third party can bring up several first parties and several second parties to perform the model training method of this embodiment.
Step 502: generate a feature matrix according to the first feature and the second feature.
Step 503: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
In a specific implementation, after receiving the first feature sent by the first party and the second feature sent by the second party, the third party can aggregate the first feature and the second feature to generate a feature matrix and send the feature matrix to the first party; the feature matrix contains the first feature and the second feature, and after receiving it the first party can train its model according to it.
Another embodiment of the present application provides a model training method applied to an electronic device of a third party. The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the model training method of this embodiment can be as shown in Fig. 7, including:
Step 601: receive the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
Step 602: generate a feature matrix according to the first feature and the second feature.
Step 603: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
Steps 601 to 603 are substantially the same as steps 501 to 503 and are not repeated here.
Step 604: receive the first gradient sent by the first party and the second gradient sent by the second party.
In a specific implementation, the third party can obtain in real time the first gradient sent by the first party and the second gradient sent by the second party. The first gradient is obtained by the first party through training based on the feature matrix, the first party's data, and a preset machine learning network; the second gradient is obtained by the second party through training based on the second party's data and the preset machine learning network, wherein the preset machine learning network used by the first party and the preset machine learning network used by the second party are the same neural network.
Step 605: aggregate the first gradient and the second gradient to generate a comprehensive gradient.
Step 606: send the comprehensive gradient to the first party.
In a specific implementation, after receiving the first gradient sent by the first party and the second gradient sent by the second party, the third party can aggregate them according to a preset aggregation algorithm to generate a comprehensive gradient, and send the comprehensive gradient to the first party, for the first party to train its model according to the gradient. The preset aggregation algorithm can be set by those skilled in the art according to actual needs, and the embodiments of the present application do not specifically limit it.
In one example, after sending the comprehensive gradient to the first party, the third party can also send the comprehensive gradient to the second party, for the second party to train its model according to it.
In one example, the first gradient sent by the first party to the third party is an encrypted first gradient, and the second gradient sent by the second party to the third party is an encrypted second gradient, wherein the first gradient and the second gradient are encrypted in the same way. Encrypting the first gradient and the second gradient can effectively prevent gradient attacks and improve the security and reliability of the entire model training process.
In one example, the first party's data and the second party's data are independent and identically distributed. If the first party and the second party are not mutually independent, horizontal federated learning is meaningless; requiring them to be mutually independent prevents ineffective horizontal federated learning and avoids wasting resources. This embodiment also requires the first party's data and the second party's data to share the same distribution, which can effectively speed up training, further reduce the time spent solving the machine-learning cold-start problem, and further quickly improve the operation and maintenance analysis capability of the newly built operation point.
Another embodiment of the present application provides a model training method applied to an electronic device of a third party. The implementation details of the model training method of this embodiment are described in detail below; they are provided only for ease of understanding and are not required to implement this solution. The specific flow of the model training method of this embodiment can be as shown in Fig. 8, including:
Step 701: receive the first feature of the first party's model sent by the first party and the second feature of the second party's model sent by the second party.
Step 702: generate a feature matrix according to the first feature and the second feature.
Step 703: send the feature matrix to the first party, for the first party to train its model according to the feature matrix.
Steps 701 to 703 are substantially the same as steps 501 to 503 and are not repeated here.
Step 704: receive the first evaluation data of the trained first party's model sent by the first party, and the second evaluation data of the trained second party's model sent by the second party.
Specifically, the first evaluation data characterizes how the trained first party's model classifies the first party's data, and the second evaluation data characterizes how the trained second party's model classifies the second party's data.
In one example, take a binary classification model: if the first party has 80 first-party data records, of which 35 are actually true and learned as true, 10 are actually true but learned as false, 5 are actually false but learned as true, and 20 are actually false and learned as false, then the evaluation data is the group of numbers 35, 10, 5, 20. Considering that the first party lacks data that can be labeled, and that transmitting evaluation metrics directly would not work well, this embodiment collects evaluation data, which makes the model training process more scientific.
Step 705: judge, according to the first evaluation data and the second evaluation data, whether the trained first party's model has converged.
Step 706: when it is determined that the trained first party's model has converged, send a first release instruction to the first party, for the first party to publish the trained model.
In one example, there are several first parties, and the first evaluation data is the first evaluation data of each first party. The third party can compute a global evaluation value from the first evaluation data of all the first parties, and a reference evaluation value from the first evaluation data of all the first parties together with the second evaluation data. The third party compares the global evaluation value with the reference evaluation value; when it determines that the difference between them is smaller than a preset first threshold, it determines that the trained model of each first party has converged and sends a first release instruction to each first party, for each first party to publish its trained model. The preset first threshold can be set by those skilled in the art according to actual needs.
In one example, the third party can also receive the evaluation metric value of the trained second party's model sent by the second party, wherein the evaluation metric value includes any combination of the following: the accuracy of the trained second party's model, the precision of the trained second party's model, and the recall of the trained second party's model. After receiving the evaluation metric value of the trained second party's model, the third party can judge whether it is greater than the evaluation metric value of the second party's model before training. When the third party determines that the evaluation metric value of the trained second party's model is higher than that of the model before training, or that the difference between the evaluation metric value before training and the one after training is smaller than a preset second threshold, it determines that the trained second party's model has converged and sends a second release instruction to the second party, for the second party to publish the trained model. The preset second threshold can be set by those skilled in the art according to actual needs; as long as the evaluation metric value of the trained second party's model has not dropped too much, the result is acceptable and the trained model can be considered converged, which enhances the training effect of the second party's model.
In one example, the evaluation metric value is the precision of the second party's model, the precision of the model before training is 98%, and the preset second threshold is 3%. If the precision of the trained second party's model is 98.7%, the third party can determine that the trained model has converged; if the precision of the trained model is 96%, the third party can determine that it has converged; if the precision of the trained model is 94.4%, the third party can determine that it has not converged.
Another embodiment of the present application relates to a model training system. The details of the model training system of this embodiment are described below; they are provided only for ease of understanding and are not required to implement this example. Fig. 9 is a schematic diagram of the model training system of this embodiment, which includes: a first party 801, a second party 802, and a third party 803.
The first party 801 is configured to send the first feature of the model of the first party 801 to the third party 803;
the second party 802 is configured to send the second feature of the model of the second party 802 to the third party 803;
the third party 803 is configured to generate a feature matrix according to the first feature and the second feature, and to send the feature matrix to the first party 801;
the first party 801 is further configured to train the model of the first party 801 according to the feature matrix.
It is easy to see that this embodiment is a system embodiment corresponding to the above method embodiments, and this embodiment can be implemented in cooperation with the above method embodiments. The relevant technical details and technical effects mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
It is worth mentioning that the modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
Another embodiment of the present application relates to an electronic device, as shown in Fig. 10, including: at least one processor 901; and a memory 902 communicatively connected to the at least one processor 901, wherein the memory 902 stores instructions executable by the at least one processor 901, and the instructions are executed by the at least one processor 901 so that the at least one processor 901 can execute the model training method applied to the first party in the above embodiments, or execute the model training method applied to the third party in the above embodiments.
The memory and the processor are connected by a bus. The bus can include any number of interconnected buses and bridges, and it connects one or more processors and the various circuits of the memory together. The bus can also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore not further described herein. A bus interface provides an interface between the bus and the transceiver. The transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; further, the antenna also receives data and passes the data to the processor.
The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory can be used to store data used by the processor when performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above method embodiments when executed by a processor.
That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions for making a device (which may be a single-chip microcomputer, a chip, or the like) or a processor execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific embodiments for realizing the present application, and that in practical applications various changes in form and detail can be made to them without departing from the spirit and scope of the present application.

Claims (13)

  1. A model training method, applied to a first party, the method comprising:
    uploading a first feature of the first party's model to a third party;
    receiving a feature matrix sent by the third party, wherein the feature matrix is generated by the third party according to the first feature and a second feature of a second party's model uploaded by the second party;
    training the first party's model according to the feature matrix.
  2. The model training method according to claim 1, wherein training the first party's model according to the feature matrix comprises:
    taking the features in the feature matrix as candidate features in turn;
    traversing the first party's data, and in a case where a candidate feature has corresponding first-party data and the first-party data corresponding to the candidate feature are not all identical, taking the candidate feature as a target feature;
    training the first party's model according to the target features.
  3. The model training method according to claim 1, wherein training the first party's model according to the feature matrix comprises:
    performing feature vectorization and label annotation on the first party's data;
    inputting the feature-vectorized and labeled first-party data into a preset machine learning network to obtain a first gradient, and uploading the first gradient to the third party;
    receiving a comprehensive gradient sent by the third party, wherein the comprehensive gradient is generated by the third party according to the first gradient and a second gradient, and the second gradient is obtained by the second party through training according to the second party's data and the preset machine learning network;
    training the first party's model according to the comprehensive gradient.
  4. The model training method according to any one of claims 1 to 3, comprising, after training the first party's model according to the feature matrix:
    sending first evaluation data of the trained first party's model to the third party, wherein the first evaluation data characterizes how the trained first party's model classifies the first party's data;
    publishing the trained first party's model upon receiving a first release instruction sent by the third party, wherein the third party sends the first release instruction to the first party when it judges, according to the first evaluation data and second evaluation data, that the trained first party's model has converged, and the second evaluation data characterizes how the trained second party's model classifies the second party's data.
  5. A model training method, applied to a third party, the method comprising:
    receiving a first feature of a first party's model sent by the first party and a second feature of a second party's model sent by the second party;
    generating a feature matrix according to the first feature and the second feature;
    sending the feature matrix to the first party, wherein the feature matrix is used by the first party to train the first party's model according to the feature matrix.
  6. The model training method according to claim 5, comprising, after sending the feature matrix to the first party:
    receiving a first gradient sent by the first party and a second gradient sent by the second party, wherein the first gradient is obtained by the first party through training based on the feature matrix, the first party's data, and a preset machine learning network, and the second gradient is obtained by the second party through training based on the second party's data and the preset machine learning network;
    aggregating the first gradient and the second gradient to generate a comprehensive gradient;
    sending the comprehensive gradient to the first party, wherein the comprehensive gradient is at least used by the first party to train the first party's model according to the comprehensive gradient.
  7. The model training method according to claim 5, comprising, after sending the feature matrix to the first party:
    receiving first evaluation data of the trained first party's model sent by the first party, and second evaluation data of the trained second party's model sent by the second party, wherein the first evaluation data characterizes how the trained first party's model classifies the first party's data, and the second evaluation data characterizes how the trained second party's model classifies the second party's data;
    judging, according to the first evaluation data and the second evaluation data, whether the trained first party's model has converged;
    when it is determined that the trained first party's model has converged, sending a first release instruction to the first party, for the first party to publish the trained first party's model.
  8. The model training method according to claim 7, wherein there are several first parties and the first evaluation data is the first evaluation data of each first party;
    judging, according to the first evaluation data and the second evaluation data, whether the trained first party's model has converged comprises:
    computing a global evaluation value according to the first evaluation data of each first party;
    computing a reference evaluation value according to the first evaluation data of each first party and the second evaluation data;
    when the difference between the global evaluation value and the reference evaluation value is smaller than a preset first threshold, determining that the trained model of each first party has converged, and sending a first release instruction to each first party, for each first party to publish the trained first party's model.
  9. The model training method according to claim 6, comprising, after sending the comprehensive gradient to the first party:
    sending the comprehensive gradient to the second party, wherein the comprehensive gradient is further used by the second party to train the second party's model according to the comprehensive gradient.
  10. The model training method according to claim 9, comprising, after sending the comprehensive gradient to the second party:
    receiving an evaluation metric value of the trained second party's model sent by the second party, wherein the evaluation metric value comprises any combination of the following: the accuracy of the trained second party's model, the precision of the trained second party's model, and the recall of the trained second party's model;
    when the evaluation metric value is higher than the evaluation metric value of the second party's model before training, or the difference between the evaluation metric value and the evaluation metric value of the second party's model before training is smaller than a preset second threshold, determining that the trained second party's model has converged, and sending a second release instruction to the second party, for the second party to publish the trained second party's model.
  11. A model training system, comprising: a first party, a second party, and a third party;
    the first party is configured to send a first feature of the first party's model to the third party;
    the second party is configured to send a second feature of the second party's model to the third party;
    the third party is configured to generate a feature matrix according to the first feature and the second feature, and to send the feature matrix to the first party;
    the first party is further configured to train the first party's model according to the feature matrix.
  12. An electronic device, comprising:
    at least one processor; and,
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the model training method according to any one of claims 1 to 4, or perform the model training method according to any one of claims 5 to 10.
  13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 4, or implements the model training method according to any one of claims 5 to 10.
PCT/CN2022/087439 2021-09-30 2022-04-18 Model training method, system, electronic device and computer-readable storage medium WO2023050778A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111162667.8 2021-09-30
CN202111162667.8A CN115936659A (zh) Model training method, system, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023050778A1 (zh)

Family

ID=85780411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087439 WO2023050778A1 (zh) Model training method, system, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115936659A (zh)
WO (1) WO2023050778A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492420A * 2018-12-28 2019-03-19 深圳前海微众银行股份有限公司 Federated-learning-based model parameter training method, terminal, system and medium
US20190228338A1 * 2018-01-19 2019-07-25 Hyperdyne, Inc. Coordinated learning using distributed average consensus
CN110490738A * 2019-08-06 2019-11-22 深圳前海微众银行股份有限公司 Hybrid federated learning method and architecture
CN112183730A * 2020-10-14 2021-01-05 浙江大学 Training method for a neural network model based on shared learning
CN112862011A * 2021-03-31 2021-05-28 中国工商银行股份有限公司 Federated-learning-based model training method and apparatus, and federated learning system

Also Published As

Publication number Publication date
CN115936659A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110084377B Method and apparatus for constructing a decision tree
EP3971798A1 Data processing method and apparatus, and computer readable storage medium
WO2022111068A1 RRU undervoltage risk prediction method, apparatus, system, device and medium
CN104363072A Error information transmission and escaping method, apparatus and system
WO2020199785A1 Private data processing method, computing method, and applicable devices
WO2023098374A1 Network resource deployment method and apparatus, electronic device, and storage medium
CN112307331A Blockchain-based intelligent recruitment information push method and system for university graduates, and terminal device
CN111680900A Work order issuing method and apparatus, electronic device, and storage medium
WO2021139476A1 Intersection data generation method, and federated model training method based on intersection data
CN115563859A Power load forecasting method, apparatus and medium based on hierarchical federated learning
CN112291305A Code chain construction method and apparatus based on unified identifiers
WO2023050778A1 Model training method and system, electronic device, and computer-readable storage medium
CN106874371A Data processing method and apparatus
CN110855802B Data fragment distribution and storage method, apparatus and server for a vocational-education diagnosis and improvement system
CN116800671A Data transmission method and apparatus, computer device, storage medium, and program product
CN116860470A Data transmission method and apparatus, computer device, and storage medium
CN116032590A DDoS attack detection model training method and related apparatus
CN113821811B Blockchain-based data acquisition method and system, electronic device, and storage medium
CN115460617A Network load prediction method and apparatus based on federated learning, electronic device, and medium
CN113672361A Distributed data processing system and method, server, and readable storage medium
CN113472715A Data transmission method and apparatus
WO2023168976A1 Optical transport network performance prediction method and system, electronic device, and storage medium
CN113379463B Branch site selection method, apparatus, device and storage medium
WO2023124312A1 Prediction method and apparatus in joint learning
WO2024001507A1 Data processing method, system, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22874177

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE