CN116756536B - Data identification method, model training method, device, equipment and storage medium - Google Patents

Data identification method, model training method, device, equipment and storage medium Download PDF

Info

Publication number
CN116756536B
CN116756536B CN202311034853.2A CN202311034853A CN116756536B CN 116756536 B CN116756536 B CN 116756536B CN 202311034853 A CN202311034853 A CN 202311034853A CN 116756536 B CN116756536 B CN 116756536B
Authority
CN
China
Prior art keywords
model
data
gradient
server
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311034853.2A
Other languages
Chinese (zh)
Other versions
CN116756536A (en
Inventor
徐聪
李仁刚
贾麒
刘璐
范宝余
金良
闫瑞栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202311034853.2A priority Critical patent/CN116756536B/en
Publication of CN116756536A publication Critical patent/CN116756536A/en
Application granted granted Critical
Publication of CN116756536B publication Critical patent/CN116756536B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data identification method, a model training method, a device, equipment and a storage medium, relates to the technical field of computers, and aims to solve the problem that multi-source domain data cannot be identified rapidly and efficiently in the traditional technology, wherein the data identification method is applied to a client and comprises the following steps: acquiring an initial recognition model; the initial recognition model comprises a local model and a global model; training the initial recognition model by using a local sample to obtain a local model gradient and a global model gradient; uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters; updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model meeting preset conditions is obtained; and executing data identification operation by using the data identification model.

Description

Data identification method, model training method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data identification method, a model training method, a device, equipment, and a storage medium.
Background
Along with the development of a knowledge spectrogram, the map construction of a cross-domain and cross-mechanism is an important task which needs to be completed urgently, and entity identification is used as the basis of the knowledge map construction, so that entity identification of field text data needs to be completed first. But if a separate identification model is built for each domain, the time and labor costs will increase significantly. However, labeling data in a single domain is often limited, labeling these domains with sufficient labeling data is expensive and time consuming, and requires a great deal of expertise related to the domain. Furthermore, while many platforms may have some annotated data sets, since these domain data contain user and company information and are highly sensitive to privacy and security, these data cannot be shared directly to complete model training.
Therefore, how to quickly and efficiently identify multi-source domain data while ensuring the security of heterogeneous multi-source domain data is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a data identification method which can quickly and efficiently identify multi-source domain data while ensuring the safety of heterogeneous multi-source domain data; another object of the present invention is to provide a model training method, a data recognition device, a model training device, an electronic apparatus, a computer readable storage medium, which also have the above-mentioned advantageous effects.
In a first aspect, the present invention provides a data identification method, applied to a client, including:
acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
training the initial recognition model by using a local sample to obtain a local model gradient and a global model gradient;
Uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
Updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model meeting preset conditions is obtained;
and executing data identification operation by using the data identification model.
Optionally, the training the initial recognition model by using the local sample to obtain a local model gradient and a global model gradient includes:
Processing the local samples by using the initial recognition model to obtain probability distribution of each local sample;
Obtaining a current loss function of the initial recognition model according to each probability distribution calculation;
Calculating by using the current loss function and the local model parameters to obtain the local model gradient;
and calculating by using the current loss function and the global model parameter to obtain the global model gradient.
Optionally, the processing the local samples with the initial recognition model to obtain probability distribution of each local sample includes:
For each local sample, carrying out text division on the local sample to obtain each text data;
combining the word data into phrase data according to the word data;
determining the serial number of each text data and the serial number of each phrase data;
Generating an absolute head position and an absolute tail position of each text data according to the serial number of each text data;
generating an absolute head position and an absolute tail position of each phrase data according to the serial number of each phrase data;
Generating a model input sequence according to the sequence number, the absolute head position and the absolute tail position of each word data and the sequence number, the absolute head position and the absolute tail position of each phrase data;
And inputting the model input sequence corresponding to each local sample into the initial recognition model to obtain probability distribution of each local sample.
Optionally, the combining according to each text data into each phrase data includes:
and obtaining phrase data corresponding to each text data by inquiring a dictionary.
Optionally, the determining the sequence number of each text data and the sequence number of each phrase data includes:
And determining the serial number of each text data and the serial number of each phrase data by inquiring a preset word list.
Optionally, the generating the absolute head position and the absolute tail position of each text data according to the serial number of each text data includes:
for each text data, determining the position information of the text data in the local sample according to the serial number of the text data;
And generating an absolute head position and an absolute tail position of the text data according to the position information.
Optionally, the generating the absolute head position and the absolute tail position of each phrase data according to the serial number of each phrase data includes:
For each phrase data, determining each text data in the phrase data;
And generating the absolute head position and the absolute tail position of the phrase data according to the absolute head position and the absolute tail position of each text data.
Optionally, the generating a model input sequence according to the sequence number, the absolute head position, the absolute tail position of each text data and the sequence number, the absolute head position and the absolute tail position of each phrase data includes:
for each text data, calculating according to the serial number, the absolute head position and the absolute tail position of the text data to obtain a feature vector corresponding to the text data;
for each phrase data, calculating and obtaining a feature vector corresponding to the phrase data according to the sequence number, the absolute head position and the absolute tail position of the phrase data;
And combining the characteristic vector of each text data and the characteristic vector of each phrase data into the model input sequence.
Optionally, the calculating according to the serial number, the absolute head position and the absolute tail position of the text data to obtain the feature vector corresponding to the text data includes:
converting the serial number of the text data into a serial number vector;
converting the absolute head position of the text data into an absolute head position vector;
converting the absolute tail position of the text data into an absolute tail position vector;
and vector addition calculation is carried out on the serial number vector, the absolute head position vector and the absolute tail position vector of the text data, so as to obtain the feature vector corresponding to the text data.
Optionally, the calculating according to the sequence number, the absolute head position and the absolute tail position of the phrase data to obtain the feature vector corresponding to the phrase data includes:
converting the sequence number of the phrase data into a sequence number vector;
Converting the absolute head position of the phrase data into an absolute head position vector;
Converting the absolute tail position of the phrase data into an absolute tail position vector;
And vector addition calculation is carried out on the sequence number vector, the absolute head position vector and the absolute tail position vector of the phrase data, so as to obtain the feature vector corresponding to the phrase data.
Optionally, the updating the local model parameters by using the local model gradient, updating the global model parameters by using the server model updating parameters until obtaining a data identification model meeting a preset condition, including:
And updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model with model loss reaching a preset threshold value is obtained.
Optionally, the updating the local model parameters with the local model gradient includes:
determining the current value of the local model parameter, and acquiring a preset learning rate;
And calculating to obtain an updated value of the local model parameter according to the local model gradient, the current value of the model parameter and the preset learning rate.
Optionally, the updating the global model parameter with the server model update parameter includes:
And taking the current value of the server model updating parameter as the updating value of the global model parameter.
Optionally, the uploading the global model gradient to a server, so that the server updates the server model parameters by using the global model gradient to obtain server model update parameters, including:
uploading the global model gradient to the server so that the server can aggregate the global model gradient uploaded by each client to obtain an aggregate gradient, and updating the server model parameters by utilizing the aggregate gradient to obtain the server model updating parameters.
Optionally, the uploading the global model gradient to the server, so that the server aggregates the global model gradient uploaded by each client to obtain an aggregated gradient, including:
Uploading the global model gradient to the server so that the server can perform weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula to obtain the aggregation gradient;
The aggregation formula is as follows:
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
Optionally, before training the initial recognition model by using the local sample to obtain the local model gradient and the global model gradient, the method further includes:
acquiring the size of a training sample batch in the server;
Dividing the local global samples according to the size of the training sample batch to obtain each local sample.
Optionally, the global model comprises an embedding layer, a self-attention layer and a normalization layer; the local model comprises a feedforward neural network layer, a normalization layer and a conditional random field layer.
In a second aspect, the present invention provides another data identification method, applied to a client, including:
acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
training the initial recognition model by using a local sample to obtain a local model gradient and a global model gradient;
Uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
And updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model meeting preset conditions is obtained.
In a third aspect, the present invention also discloses a data identification device, and an application client, including:
the first acquisition module is used for acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
The first training module is used for training the initial recognition model by utilizing a local sample to obtain a local model gradient and a global model gradient;
The first uploading module is used for uploading the global model gradient to a server so that the server can update the server model parameters by utilizing the global model gradient to obtain server model updating parameters;
The first updating module is used for updating local model parameters by utilizing the local model gradient, and updating global model parameters by utilizing the server model updating parameters until a data identification model meeting preset conditions is obtained;
And the execution module is used for executing data identification operation by utilizing the data identification model.
In a fourth aspect, the present invention also discloses another data identification device, which is applied to a client, and includes:
the second acquisition module is used for acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
the second training module is used for training the initial recognition model by utilizing a local sample to obtain a local model gradient and a global model gradient;
The second uploading module is used for uploading the global model gradient to a server so that the server can update the server model parameters by utilizing the global model gradient to obtain server model updating parameters;
And the second updating module is used for updating the local model parameters by utilizing the local model gradient, and updating the global model parameters by utilizing the server model updating parameters until a data identification model meeting preset conditions is obtained.
In a fifth aspect, the present invention discloses an electronic device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of any one of the data recognition methods and/or the steps of any one of the model training methods as described above when executing the computer program.
In a sixth aspect, the present invention discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the data recognition methods and/or the steps of any one of the model training methods described above.
The technical scheme provided by the invention has the following technical effects:
According to the data identification method provided by the invention, the initial identification model is arranged at each client for training the data identification model, and the initial identification model consists of a local model and a global model, in the model training process, the local sample of the client is used for training the initial identification model to respectively obtain the local model gradient and the global model gradient, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by using the global model gradient uploaded by each client, the updated server parameters can be used for updating the client global model parameters, the local model gradient is used for updating the client local model parameters, and thus, the data identification model meeting preset conditions can be obtained through iterative training.
The invention also discloses a model training method, a data identification device, a model training device, electronic equipment and a computer readable storage medium, which have the technical effects as well, and the invention is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present invention, the following will briefly describe the drawings that need to be used in the description of the prior art and the embodiments of the present invention. Of course, the following drawings related to embodiments of the present invention are only a part of embodiments of the present invention, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any inventive effort, and the obtained other drawings also fall within the scope of the present invention.
FIG. 1 is a schematic flow chart of a data identification method provided by the invention;
FIG. 2 is a schematic flow chart of a model training method according to the present invention;
FIG. 3 is a flowchart of another data identification method according to the present invention;
FIG. 4 is a schematic diagram of a data identification model according to the present invention;
FIG. 5 is a schematic diagram of a data identification system according to the present invention;
Fig. 6 is a schematic structural diagram of a data recognition device according to the present invention;
FIG. 7 is a schematic diagram of a model training apparatus according to the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
The core of the invention is to provide a data identification method which can quickly and efficiently identify multi-source domain data while ensuring the safety of heterogeneous multi-source domain data; another core of the present invention is to provide a model training method, a data recognition device, a model training device, an electronic apparatus, a computer readable storage medium, which also have the above-mentioned advantageous effects.
In order to more clearly and completely describe the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
An embodiment of the invention provides a data identification method.
Referring to fig. 1, fig. 1 is a flowchart of a data identification method provided by the present invention, where the data identification method is applied to a client, and includes S101 to S105.
S101: acquiring an initial recognition model; the initial recognition model includes a local model and a global model.
Firstly, it should be noted that the data identification method provided in this embodiment is applied to a client. Specifically, different source domains correspond to different clients, each client establishes a communication connection relationship with the server, and training of the data recognition model on each client is achieved through mutual cooperation of the client and the server, namely simultaneous construction of the data recognition model on each source domain client is achieved.
Further, this step aims at achieving the acquisition of the initial recognition model. Specifically, initial recognition models are deployed on each source domain client, the model structures of the initial recognition models are consistent, and the data recognition models applicable to the source domain clients can be obtained by training the initial recognition models by using local sample data on the source domain clients. The data identification model is divided into a local model and a global model according to a network structure, namely, one part of the network structure in the data identification model is a local model part, the other part of the network structure is a global model part, and the local model part carries out model training based on local sample data of a client so as to effectively avoid data exchange among source domains and ensure data safety; and the global model part is trained by the local sample collaboration server of the client so as to mutually fuse the characterization information of the server and other source domains and ensure the accuracy of the model.
In one possible implementation, the global model may include an embedding layer, a self-attention layer, and a normalization layer; the local model may include a Feed-Forward neural Network layer (FFN), a normalization layer, a conditional random field layer (CRF, conditional Random Fields).
It can be appreciated that since the embedding layer, the self-attention layer and the normalization layer generally learn general semantic features and entity characterizations, they can be divided into global models, and can be mutually fused by the characterization information of the server and other source domains; the feedforward neural network layer, the normalization layer and the conditional random field layer behind the model are used for learning specific knowledge of the source domain to conduct specific data classification, classification can be conducted according to the characteristics of the global model part, and local models of different source domains can learn specific type information in different data, so that the model can be divided into local models, the local models are not fused with other source domains through a server, knowledge of the source domain can be reserved, and the data identification model of the source domain can be used for better identifying data on the source domain.
S102: and training the initial recognition model by using the local sample to obtain a local model gradient and a global model gradient.
This step aims at achieving training of the initial recognition model. In this process, a local sample of the current source domain client (tag setup is completed) may be obtained, and then the local sample is input into the initial recognition model for training, so as to obtain gradient information of the local model (i.e., the local model gradient) and gradient information of the global model (i.e., the global model gradient). It should be noted that, for different source domain clients, the model training is performed by using own local samples, and data interaction with other source domain clients is not needed, so that the security of the data information in the source domain can be effectively ensured.
S103: and uploading the global model gradient to a server so that the server updates the server model parameters by using the global model gradient to obtain server model updating parameters.
The step aims at realizing the uploading processing of the global model gradient, namely uploading the global model gradient to a server for subsequent processing. For each source domain client, after a round of model training is completed to obtain a local model gradient and a global model gradient, the global model gradient can be uploaded to a server, so that the server collects the global model gradients uploaded by all source domain clients and then performs parameter updating processing by using all global model gradients, which means updating the server model parameters to obtain server model updating parameters. The server model parameters correspond to global model parameters in each source domain client, that is, the global model gradients of all source domain clients are integrated by combining a central server to update parameters of the global model in the source domain client.
S104: and updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model meeting preset conditions is obtained.
The step aims at realizing iterative updating of parameters (including local model parameters and global model parameters) so as to obtain a data identification model. Specifically, local model parameters can be updated by using local model gradients, global model parameters are updated by using server model updating parameters, and accordingly update processing of the local model parameters and the global model parameters is achieved, and therefore optimal local model parameters and optimal global model parameters can be obtained through iterative updating, and accordingly, a training model based on the optimal local model parameters and the optimal global model parameters, namely, a data identification model, is obtained.
In the iterative updating process, after each round of iterative updating, whether the training model of the round meets the preset condition is judged, for example, whether the current iterative updating times reach the preset iterative updating times, whether the model training loss reaches the preset loss range, whether the trained model is converged or not, and the like, if the preset condition is not met, a new round of iterative updating is continued until the training model meets the preset condition, and therefore, when the preset condition is met, model training can be considered to be finished, and a final data identification model is obtained.
Therefore, the training of the data identification model on each source domain client is realized through the cooperative processing of each source domain client and the server, and for each source domain client, the data identification model obtained through training is a model obtained based on local sample training, and meanwhile, the server learns the data characteristics in other source domain clients, so that the applicability of the data identification model in each source domain can be effectively ensured, the model precision is ensured, and the safety of heterogeneous multi-source domain data can be effectively ensured.
S105: and performing a data recognition operation by using the data recognition model.
This step aims at realizing the data identification operation. Specifically, after the training of the data recognition model is completed, the data recognition operation can be performed by using the data recognition model, and the target data to be recognized can be input into the data recognition model for processing, and the output of the model is the recognition result of the target data.
In the training process of the data recognition model, in practical application, after the model training is completed to obtain the data recognition model, the data recognition model can be stored in a preset storage space, and when target data to be recognized is received, the data recognition model is directly fetched for processing, so that model training is not required to be repeated during each data recognition.
Therefore, the data identification method provided by the embodiment of the invention is characterized in that the initial identification model is arranged at each client, the initial identification model consists of a local model and a global model, in the model training process, the initial identification model is trained by using a local sample of the client to obtain a local model gradient and a global model gradient respectively, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by using the global model gradient uploaded by each client, the updated server parameters can be used for updating the client global model parameters, the local model gradient is used for updating the client local model parameters, and thus, the data identification model meeting preset conditions can be obtained through iterative training.
Based on the above embodiments:
In one embodiment of the present invention, training the initial recognition model using the local sample to obtain the local model gradient and the global model gradient may include:
processing the local samples by using an initial recognition model to obtain probability distribution of each local sample;
obtaining a current loss function of the initial recognition model according to each probability distribution calculation;
Calculating by using the current loss function and the local model parameters to obtain a local model gradient;
and calculating by using the current loss function and the global model parameters to obtain the global model gradient.
The embodiment of the invention provides a realization method for training and calculating local model gradients and global model gradients through an initial recognition model. Firstly, inputting local samples into a local initial recognition model for processing, outputting the model, namely probability distribution of each local sample, then calculating a current loss function of the initial recognition model, namely a loss function value of the training of the model, by using the probability distribution of all the local samples, and finally, calculating a local model gradient and a global model gradient by combining the current values of local model parameters and global model parameters.
In an embodiment of the present invention, the processing the local samples using the initial recognition model to obtain probability distributions of the local samples may include:
For each local sample, carrying out text division on the local sample to obtain each text data;
combining the word data into phrase data according to the word data;
Determining the serial number of each text data and the serial number of each phrase data;
generating an absolute head position and an absolute tail position of each text data according to the serial number of each text data;
generating an absolute head position and an absolute tail position of each phrase data according to the serial number of each phrase data;
Generating a model input sequence according to the sequence number, the absolute head position and the absolute tail position of each word data and the sequence number, the absolute head position and the absolute tail position of each phrase data;
And inputting the model input sequence corresponding to each local sample into the initial recognition model to obtain probability distribution of each local sample.
The embodiment of the invention provides a realization method for calculating probability distribution of each local sample by using an initial recognition model. Specifically, when the data recognition model is used for realizing named entity recognition, the local samples are text data, so that each local sample can be subjected to text division to obtain each word in the text, namely the word data, and then each word data is combined into a plurality of phrase data in a word combination mode to further obtain each word data and a serial number of each phrase data; further, by combining each serial number data, an absolute head position and an absolute tail position of each word data, and an absolute head position and an absolute tail position of each phrase data can be generated; finally, a model input sequence is generated based on the data information, namely, a data sequence to be input into an initial recognition model is generated, the sequence is input into the initial recognition model for processing, and the output of the model is probability distribution of each local sample.
Therefore, in the embodiment of the invention, phrase data formed based on each word data is obtained by word combination, and is combined with the word data, the generation of a model input sequence is realized based on the characteristic information (including serial number and position information) of each word data and the characteristic information of each phrase data, the sample expansion function is realized to a certain extent, and the training is facilitated to obtain a data recognition model with higher precision, so that the accuracy of a data recognition result can be effectively improved.
In an embodiment of the present invention, the combining the word group data according to the word group data may include: and obtaining phrase data corresponding to each text data by inquiring the dictionary. Specifically, to generate phrase data based on the text data, the text may be formed by querying a dictionary, and all phrase data contained in each local sample may be obtained by the user.
In an embodiment of the present invention, the determining the sequence number of each text data and the sequence number of each phrase data may include: and determining the serial number of each text data and the serial number of each phrase data by inquiring a preset word list. Specifically, a vocabulary may be created in advance, which is used to record the correspondence between each text data (including text data and phrase data) and the serial number, so that the determination of the serial number data can be achieved by querying the preset vocabulary.
In an embodiment of the present invention, the generating the absolute head position and the absolute tail position of each text data according to the serial number of each text data may include:
For each text data, determining the position information of the text data in the local sample according to the serial number of the text data;
and generating an absolute head position and an absolute tail position of the text data according to the position information.
The embodiment of the invention provides a realization method for calculating the absolute position of each text data. For each text data, the position information of the text data in the local sample, such as the first text, the fifth text and the like in the local sample, can be determined according to the serial number of the text data, and then the absolute head position and the absolute tail position of the text data are determined according to the position information, so that the absolute position calculation of the text data is completed.
In an embodiment of the present invention, the generating the absolute head position and the absolute tail position of each phrase data according to the sequence number of each phrase data may include:
For each phrase data, determining each text data in the phrase data;
and generating the absolute head position and the absolute tail position of the phrase data according to the absolute head position and the absolute tail position of each text data.
The embodiment of the invention provides a realization method for calculating the absolute position of each phrase data, and can understand that the phrase data is generated based on the text data, so that the absolute position of the phrase data can be calculated according to the absolute position of the text data contained in the phrase data. Specifically, for each phrase data, each text data contained therein may be determined first, and then the absolute position of the phrase data, that is, the absolute head position and the absolute tail position of the phrase data, may be calculated according to the absolute position of each text data, thereby completing the calculation of the absolute position of the phrase data.
In one embodiment of the present invention, the generating the model input sequence according to the sequence number, the absolute head position, the absolute tail position of each text data and the sequence number, the absolute head position, the absolute tail position of each phrase data may include:
For each text data, calculating according to the serial number, the absolute head position and the absolute tail position of the text data to obtain a feature vector corresponding to the text data;
for each phrase data, calculating according to the sequence number, the absolute head position and the absolute tail position of the phrase data to obtain a feature vector corresponding to the phrase data;
and combining the characteristic vector of each text data and the characteristic vector of each phrase data into a model input sequence.
The embodiment of the invention provides a method for realizing a model input sequence. It will be appreciated that, to implement model-based data processing, the data information to be processed may be converted into a form of a feature vector, specifically, for each text data, a corresponding feature vector is calculated according to its serial number and absolute position information, and for each phrase data, a corresponding feature vector is calculated according to its serial number and absolute position information, so that, for each local sample, feature vectors of all text data and feature vectors of all phrases contained therein may be combined into a model input sequence corresponding to the local sample.
In an embodiment of the present invention, the calculating to obtain the feature vector corresponding to the text data according to the sequence number, the absolute head position and the absolute tail position of the text data may include:
Converting the serial number of the text data into a serial number vector;
converting the absolute head position of the text data into an absolute head position vector;
converting the absolute tail position of the text data into an absolute tail position vector;
and vector addition calculation is carried out on the serial number vector, the absolute head position vector and the absolute tail position vector of the text data, so as to obtain the feature vector corresponding to the text data.
The embodiment of the invention provides a realization method for calculating a feature vector of text data. For each text data, the serial number, the absolute head position and the absolute position value of the text data can be converted into corresponding feature vectors in sequence to obtain a serial number vector, an absolute head position vector and an absolute tail position vector, and then the feature vectors corresponding to the text data are obtained through a vector addition calculation mode.
In an embodiment of the present invention, the calculating according to the sequence number, the absolute head position, and the absolute tail position of the phrase data to obtain the feature vector corresponding to the phrase data may include:
converting the sequence number of the phrase data into a sequence number vector;
converting the absolute head position of phrase data into an absolute head position vector;
converting the absolute tail position of phrase data into an absolute tail position vector;
and vector addition calculation is carried out on the sequence number vector, the absolute head position vector and the absolute tail position vector of the phrase data, so as to obtain the feature vector corresponding to the phrase data.
The embodiment of the invention provides a method for realizing feature vector calculation of phrase data, and the realization process is referred to the calculation process of the feature vector of the text data, and is not repeated here.
In an embodiment of the present invention, the updating the local model parameters with the local model gradient and updating the global model parameters with the server model update parameters until obtaining the data identification model that meets the preset condition may include: and updating local model parameters by using local model gradients, and updating global model parameters by using server model updating parameters until a data identification model with model loss reaching a preset threshold is obtained.
The embodiment of the invention provides a specific preset condition. Specifically, after each round of model training is completed, whether the current value of the model loss function after the training reaches a preset threshold value or not can be judged, if so, the model training is finished, a data identification model is obtained, otherwise, the model training is considered not to be finished, and iterative training needs to be continued. It should be noted that, the value of the preset threshold does not affect the implementation of the technical scheme, and the preset threshold is set by a technician according to the actual situation, which is not limited by the present invention.
In one embodiment of the present invention, the updating the local model parameters using the local model gradient may include:
Determining the current value of a local model parameter, and acquiring a preset learning rate;
and calculating according to the local model gradient, the current value of the model parameter and the preset learning rate to obtain the updated value of the local model parameter.
The embodiment of the invention provides an implementation mode for updating local model parameters by using local model gradients. Firstly, determining the current value of a local model parameter, wherein the current value is a numerical value obtained by updating after the last round of model training is finished, simultaneously obtaining a super-parameter preset learning rate, wherein the preset learning rate is a preset fixed value, and then carrying out formula calculation by combining the local model gradient obtained by the model training, so that the updated value of the local model parameter can be obtained, thereby completing the updating of the local model parameter.
In an embodiment of the present invention, the updating the global model parameter with the server model update parameter may include: and taking the current value of the server model updating parameter as the updating value of the global model parameter.
The embodiment of the invention provides an implementation mode for updating global model parameters by using server model updating parameters. As described above, the server model parameters correspond to the global model parameters in each source domain client, so after the server finishes the parameter update, the current value of the global model parameters is directly replaced with the current value of the server model update parameters, so that the update of the global model parameters is completed.
In an embodiment of the present invention, uploading the global model gradient to the server, so that the server updates the server model parameters with the global model gradient to obtain server model updated parameters may include: uploading the global model gradient to a server so that the server can aggregate the global model gradient uploaded by each client to obtain an aggregate gradient, and updating the server model parameters by utilizing the aggregate gradient to obtain server model updating parameters.
The embodiment of the invention provides a method for realizing the update of server model parameters. For each source domain client, after a round of model training is completed to obtain a global model gradient and a local model gradient, uploading the global model gradient to a server, and updating server model parameters by the server by using the global model gradient; further, for the server, after the global model gradients uploaded by all source domain clients are obtained, aggregation processing can be performed on all the global model gradients to obtain an aggregation gradient, and then update of server model parameters is achieved by using the aggregation gradient to obtain server model update parameters. It should be noted that, the update of the server model parameter by the server may refer to the update process of the local model parameter, and may also be calculated by combining the current value of the server model parameter and the preset learning rate to obtain the server model update parameter.
In an embodiment of the present invention, uploading the global model gradient to a server, so that the server aggregates the global model gradient uploaded by each client to obtain an aggregate gradient, may include: uploading the global model gradient to a server so that the server can carry out weighted average calculation on the global model gradient uploaded by each client by utilizing an aggregation formula to obtain an aggregation gradient;
The aggregation formula is:
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
The embodiment of the invention provides a specific implementation mode of aggregation processing, namely weighted average calculation. For the server, after the global model gradients uploaded by all source domain clients are obtained, weighted average calculation can be performed on all the global model gradients according to preset weights, and the weighted average obtained by calculation is used as a final aggregation gradient. Wherein the gradient variesRefers to the/>And the difference value between the global model gradient calculated in the current round of iterative training process and the global model gradient calculated in the previous round of iterative training process is calculated in each client. /(I)
Wherein, the setting rule about the client weight is as follows: considering that the number of samples used for performing model training on different clients is different, the contribution degree of each client is also different when the server performs gradient aggregation, obviously, when the number of samples used for performing model training on a certain client is more, the contribution degree of the samples used for performing model training on the server is larger, and when the number of samples used for performing model training on a certain client is less, the contribution degree of the samples used for performing model training on the server is smaller, therefore, higher weight can be set for the former client, and lower weight can be set for the latter client, so that the model precision is effectively improved, and the accuracy of data identification results is further improved.
In an embodiment of the present invention, before training the initial recognition model by using the local sample to obtain the local model gradient and the global model gradient, the method may further include:
Acquiring the size of a training sample batch in a server;
dividing the local global samples according to the size of the training sample batch to obtain each local sample.
Specifically, a local sample, which refers to sample data used for this round of model training, may be determined prior to training the initial recognition model with the local sample. Firstly, obtaining a set training sample batch size from a server, namely, the number of sample data (namely, the number of local samples) used by each round of model training, and then, carrying out sample division on local global samples in a source domain client by utilizing the training sample batch size to obtain a plurality of local samples, wherein the local global samples refer to all data samples existing in the source domain client. In the implementation process, the proportion calculation can be performed according to the size of the training sample batch in the server and the number of the local global samples, the duty ratio of the local samples used by each round of model training is obtained, and the local samples are selected from the local global samples to perform model training based on the duty ratio, so that the situation that sample data on individual source domains are trained completely but sample data in other source domains need to be trained can be effectively avoided by setting the number of the local samples used by each round of model training, and the balance of each round of training samples on all clients is ensured.
The second embodiment of the invention provides a model training method.
Referring to fig. 2, fig. 2 is a flow chart of a model training method provided by the present invention, and the model training method is applied to a client, and includes S201 to S204.
S201: acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
s202: training an initial recognition model by using a local sample to obtain a local model gradient and a global model gradient;
S203: uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
S204: and updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model meeting preset conditions is obtained.
Therefore, the initial recognition model is arranged at each client to train the data recognition model, and the initial recognition model consists of a local model and a global model, in the model training process, the local model gradient and the global model gradient are respectively obtained by training the initial recognition model through a local sample of the client, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by utilizing the global model gradient uploaded by each client, the global model parameters of the client can be updated by utilizing the updated server parameters, the local model gradient is utilized to update the local model parameters of the client, and thus, the data recognition model meeting preset conditions can be obtained through iterative training.
In the third embodiment, named entity recognition is taken as an example in the embodiment of the invention, and another data recognition method is provided.
Referring to fig. 3 and fig. 5, fig. 3 is a schematic flow chart of another data identification method provided by the present invention, and fig. 5 is a schematic structural diagram of a data identification system provided by the present invention, where an implementation flow of the data identification method may include:
(1) Referring to fig. 4, fig. 4 is a schematic structural diagram of a data recognition model provided by the present invention, where the basic structure of the initial recognition model is a standard transducer (a specific model with self-attention as a main component) +crf (conditional random field), where an embedded layer, a self-attention layer and a normalization layer are divided into global models, and a feedforward neural network layer, a normalization layer and a conditional random field layer are divided into local models.
(2) In each source domain client, a local named entity initial recognition model is trained using private data.
(3) And for each local sample, obtaining the serial number corresponding to each word in the local sample according to the word list, obtaining all possible phrases corresponding to the word through inquiring the dictionary, and obtaining the serial number corresponding to each phrase through inquiring the word list.
(4) Generating an absolute position head position code and an absolute tail position code according to the position of each word in a local sample, and generating an absolute head position code and an absolute tail position code of each phrase according to the head and tail positions of the words contained in each phrase; then, for each absolute position code, the absolute position code is converted into corresponding feature vectors, namely an absolute head position vector and an absolute tail position vector through an embedding layer of the initial recognition model, and a corresponding feature vector sequence is obtained.
(5) For the serial number of each word and the serial number of each phrase, the serial number of each word and the serial number of each phrase are converted into corresponding feature vectors, namely serial number vectors, through an embedding layer of an initial recognition model, and a corresponding serial number vector sequence is obtained.
(6) And (3) adding the sequence number vector sequence, the absolute head position vector sequence and the absolute tail position vector sequence in the step (4) and the step (5) according to positions to obtain a final model input sequence.
(7) And (3) inputting the model input sequence in the step (6) into an initial recognition model, and obtaining the probability distribution of each local sample by the output of the final CRF layer through the softmax layer.
(8) Calculating a current loss function by using each probability distribution, and marking as follows:
wherein, Represents the/>Loss function of individual source domain client,/>Represents the/>Probability distribution of individual local samples,/>Representing the standard loss function value.
(9) In each training iteration process, the home domain client selects a minimum batch of data for training:
wherein, Is the training sample size (batch size) set by the server,/>Represents the/>Number of local samples used for this round of model training on a source domain client,/>For/>Sample number of individual clients,/>For all clients/>Is a sample of the total number of samples.
(10) Calculating model gradients in the current source domain:
In the source domain (First/>)Individual source domain clients) may utilize a dataset/>Calculating the magnitude of model gradients, local model gradients and global model gradients being/>, respectivelyAnd/>Wherein/>For/>Local model parameters on a personal source domain client,/>For/>Global model parameters on the individual source domain clients.
(11) The global model gradients for each source domain are uploaded to the server so that the server calculates the aggregate gradient from the global model gradients for all source domains.
(12) Updating the local model parameters according to a local model parameter updating formula:
Wherein the super parameter Is the learning rate.
(13) The server performs weighted average calculation by using the global model gradients uploaded by each source domain client to obtain an aggregation gradient:
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
(14) The server updates the server model parameters according to the server model parameter updating formula:
(15) And replacing the global model parameters of the home domain client with the updated server model parameters.
(16) And (3) returning to the step (2) for iterative training until the model loss converges to the target value (preset threshold value), and completing the model training.
(17) And in each source domain, a trained global model and a trained local model can be combined into a complete data identification model to execute named entity identification operation.
Therefore, the data identification method provided by the embodiment of the invention is characterized in that the initial identification model is arranged at each client, the initial identification model consists of a local model and a global model, in the model training process, the initial identification model is trained by using a local sample of the client to obtain a local model gradient and a global model gradient respectively, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by using the global model gradient uploaded by each client, the updated server parameters can be used for updating the client global model parameters, the local model gradient is used for updating the client local model parameters, and thus, the data identification model meeting preset conditions can be obtained through iterative training.
The fourth embodiment of the invention provides a data identification device.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data identification device provided by the present invention, where the data identification device is applied to a client, and includes:
a first obtaining module 110, configured to obtain an initial recognition model; the initial recognition model comprises a local model and a global model;
a first training module 120, configured to train the initial recognition model by using a local sample, so as to obtain a local model gradient and a global model gradient;
the first uploading module 130 is configured to upload the global model gradient to the server, so that the server updates the server model parameters by using the global model gradient to obtain server model update parameters;
The first updating module 140 is configured to update local model parameters with the local model gradient, and update global model parameters with the server model update parameters until a data identification model satisfying a preset condition is obtained;
And an execution module 150 for executing the data recognition operation using the data recognition model.
Therefore, the data recognition device provided by the embodiment of the invention performs training of the data recognition model at each client, the initial recognition model consists of a local model and a global model, in the model training process, the local model gradient and the global model gradient are respectively obtained by training the initial recognition model by using the local sample of the client, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by using the global model gradient uploaded by each client, the updated server parameters can be used for updating the client global model parameters, the local model gradient is used for updating the client local model parameters, and thus, the data recognition model meeting the preset conditions can be obtained by iterative training.
In one embodiment of the present invention, the first training module 120 may include:
The processing unit is used for processing the local samples by utilizing the initial recognition model to obtain probability distribution of each local sample;
the first calculation unit is used for calculating and obtaining the current loss function of the initial recognition model according to each probability distribution;
The second calculation unit is used for calculating by using the current loss function and the local model parameters to obtain a local model gradient;
And the third calculation unit is used for calculating by using the current loss function and the global model parameters to obtain the global model gradient.
In one embodiment of the present invention, the processing unit may include:
the dividing subunit is used for carrying out text division on each local sample to obtain each text data;
A combination subunit, configured to combine the word data into phrase data according to the word data;
A determining subunit, configured to determine a sequence number of each text data and a sequence number of each phrase data;
A first generating subunit, configured to generate an absolute head position and an absolute tail position of each text data according to the serial number of each text data;
The second generation subunit is used for generating an absolute head position and an absolute tail position of each phrase data according to the serial number of each phrase data;
The third generation subunit is used for generating a model input sequence according to the sequence number, the absolute head position and the absolute tail position of each text data and the sequence number, the absolute head position and the absolute tail position of each phrase data;
And the output subunit is used for inputting the model input sequence corresponding to each local sample into the initial recognition model to obtain probability distribution of each local sample.
In one embodiment of the present invention, the combination subunit may be specifically configured to obtain phrase data corresponding to each text data by querying a dictionary.
In an embodiment of the present invention, the determining subunit may be specifically configured to determine the sequence number of each text data and the sequence number of each phrase data by querying a preset vocabulary.
In an embodiment of the present invention, the first generating subunit may be specifically configured to determine, for each text data, location information of the text data in the local sample according to a serial number of the text data; and generating an absolute head position and an absolute tail position of the text data according to the position information.
In an embodiment of the present invention, the second generating subunit may be specifically configured to determine, for each phrase data, each text data in the phrase data; and generating the absolute head position and the absolute tail position of the phrase data according to the absolute head position and the absolute tail position of each text data.
In an embodiment of the present invention, the third generating subunit may be specifically configured to calculate, for each text data, a feature vector corresponding to the text data according to a sequence number, an absolute head position, and an absolute tail position of the text data; for each phrase data, calculating according to the sequence number, the absolute head position and the absolute tail position of the phrase data to obtain a feature vector corresponding to the phrase data; and combining the characteristic vector of each text data and the characteristic vector of each phrase data into a model input sequence.
In an embodiment of the present invention, the third generating subunit may be specifically configured to convert a sequence number of the text data into a sequence number vector; converting the absolute head position of the text data into an absolute head position vector; converting the absolute tail position of the text data into an absolute tail position vector; and vector addition calculation is carried out on the serial number vector, the absolute head position vector and the absolute tail position vector of the text data, so as to obtain the feature vector corresponding to the text data.
In an embodiment of the present invention, the third generating subunit may be specifically configured to convert a sequence number of the phrase data into a sequence number vector; converting the absolute head position of phrase data into an absolute head position vector; converting the absolute tail position of phrase data into an absolute tail position vector; and vector addition calculation is carried out on the sequence number vector, the absolute head position vector and the absolute tail position vector of the phrase data, so as to obtain the feature vector corresponding to the phrase data.
In one embodiment of the present invention, the first updating module 140 may be specifically configured to update local model parameters with local model gradients, and update global model parameters with server model update parameters until a data identification model is obtained in which the model loss reaches a preset threshold.
In one embodiment of the present invention, the first updating module 140 may be specifically configured to determine a current value of a local model parameter and obtain a preset learning rate; and calculating according to the local model gradient, the current value of the model parameter and the preset learning rate to obtain the updated value of the local model parameter.
In one embodiment of the present invention, the first updating module 140 may be specifically configured to take the current value of the server model update parameter as the updated value of the global model parameter.
In an embodiment of the present invention, the first uploading module 130 may be specifically configured to upload the global model gradients to a server, so that the server performs an aggregation process on the global model gradients uploaded by each client to obtain an aggregate gradient, and performs an update process on the server model parameters by using the aggregate gradient to obtain server model update parameters.
In an embodiment of the present invention, the first uploading module 130 may be specifically configured to upload the global model gradient to a server, so that the server performs weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula to obtain an aggregation gradient;
The aggregation formula is:
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
In one embodiment of the present invention, the data recognition device may further include a partitioning module, configured to obtain a training sample batch size in the server before training the initial recognition model by using the local samples to obtain the local model gradient and the global model gradient; dividing the local global samples according to the size of the training sample batch to obtain each local sample.
In one embodiment of the invention, the global model may include an embedding layer, a self-attention layer, a normalization layer; the local model may include a feedforward neural network layer, a normalization layer, a conditional random field layer.
For the description of the apparatus provided by the embodiment of the present invention, refer to the above method embodiment, and the description of the present invention is omitted here.
Fifth, the embodiment of the invention provides a model training device.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a model training device provided by the present invention, where the model training device is applied to a client, and includes:
a second obtaining module 210, configured to obtain an initial recognition model; the initial recognition model comprises a local model and a global model;
a second training module 220, configured to train the initial recognition model by using a local sample, so as to obtain a local model gradient and a global model gradient;
the second uploading module 230 is configured to upload the global model gradient to the server, so that the server updates the server model parameters by using the global model gradient to obtain server model update parameters;
the second updating module 240 is configured to update the local model parameters with the local model gradient, and update the global model parameters with the server model update parameters until a data identification model satisfying the preset condition is obtained.
Therefore, the initial recognition model is arranged at each client to train the data recognition model, and the initial recognition model consists of a local model and a global model, in the model training process, the local model gradient and the global model gradient are respectively obtained by training the initial recognition model through a local sample of the client, wherein the global model gradient is uploaded to a server, so that the server can update the server model parameters by utilizing the global model gradient uploaded by each client, the global model parameters of the client can be updated by utilizing the updated server parameters, the local model gradient is utilized to update the local model parameters of the client, and thus, the data recognition model meeting preset conditions can be obtained through iterative training.
For the description of the apparatus provided by the embodiment of the present invention, refer to the above method embodiment, and the description of the present invention is omitted here.
The sixth embodiment of the invention provides an electronic device.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to the present invention, where the electronic device may include:
A memory 11 for storing a computer program;
the processor 10 is configured to implement the steps of any of the data recognition methods and/or the steps of any of the model training methods described above when executing the computer program.
As shown in fig. 8, which is a schematic diagram of a composition structure of an electronic device, the electronic device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all complete communication with each other through a communication bus 13.
In an embodiment of the present invention, the processor 10 may be a central processing unit (Central Processing Unit, CPU), an asic, a dsp, a field programmable gate array, or other programmable logic device, etc.
Processor 10 may invoke programs stored in memory 11 and, in particular, processor 10 may perform operations in embodiments of the data recognition method and/or the model training method.
The memory 11 is used for storing one or more programs, and the programs may include program codes including computer operation instructions, and in the embodiment of the present invention, at least the programs for implementing the following functions are stored in the memory 11:
acquiring an initial recognition model; the initial recognition model comprises a local model and a global model;
Training an initial recognition model by using a local sample to obtain a local model gradient and a global model gradient;
Uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
updating local model parameters by using local model gradients, and updating global model parameters by using server model updating parameters until a data identification model meeting preset conditions is obtained;
and performing a data recognition operation by using the data recognition model.
In one possible implementation, the memory 11 may include a storage program area and a storage data area, where the storage program area may store an operating system, and at least one application program required for functions, etc.; the storage data area may store data created during use.
In addition, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for interfacing with other devices or systems.
Of course, it should be noted that the structure shown in fig. 8 is not limited to the electronic device in the embodiment of the present invention, and the electronic device may include more or less components than those shown in fig. 8 or may be combined with some components in practical applications.
Embodiment seven, an embodiment of the present invention provides a computer-readable storage medium.
The computer readable storage medium provided by the embodiment of the invention stores a computer program, and when the computer program is executed by a processor, the steps of any one of the data identification methods and/or the steps of any one of the model training methods can be realized.
The computer readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
For the description of the computer-readable storage medium provided in the embodiment of the present invention, refer to the above method embodiment, and the description of the present invention is omitted here.
In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical scheme provided by the invention is described in detail. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that the present invention may be modified and practiced without departing from the spirit of the present invention.

Claims (19)

1. A data identification method, applied to a client, comprising:
Acquiring an initial data identification model; the initial data identification model is divided into a local model part and a global model part according to a network structure;
training the initial data identification model by using a local sample to obtain a local model gradient and a global model gradient; the local sample is a text sample;
Uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
updating local model parameters by using the local model gradient, and taking the current value of the server model updating parameters as the updating value of the global model parameters until a data identification model meeting preset conditions is obtained; the data identification model is used for realizing named entity identification;
performing a data recognition operation using the data recognition model;
the uploading the global model gradient to a server, so that the server updates the server model parameters by using the global model gradient to obtain server model update parameters, including: uploading the global model gradients to the server so that the server aggregates the global model gradients uploaded by the clients to obtain aggregated gradients, and updating the server model parameters by utilizing the aggregated gradients to obtain server model updating parameters;
The uploading the global model gradient to the server, so that the server aggregates the global model gradient uploaded by each client to obtain an aggregate gradient, includes: uploading the global model gradient to the server so that the server can perform weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula to obtain the aggregation gradient; the aggregation formula is as follows: ;/>;/>
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
2. The method of claim 1, wherein training the initial data recognition model using local samples to obtain a local model gradient and a global model gradient comprises:
processing the local samples by using the initial data identification model to obtain probability distribution of each local sample;
Obtaining a current loss function of the initial data identification model according to each probability distribution calculation;
Calculating by using the current loss function and the local model parameters to obtain the local model gradient;
and calculating by using the current loss function and the global model parameter to obtain the global model gradient.
3. The data recognition method of claim 2, wherein the processing the local samples using the initial data recognition model to obtain probability distributions for each of the local samples comprises:
For each local sample, carrying out text division on the local sample to obtain each text data;
combining the word data into phrase data according to the word data;
determining the serial number of each text data and the serial number of each phrase data;
Generating an absolute head position and an absolute tail position of each text data according to the serial number of each text data;
generating an absolute head position and an absolute tail position of each phrase data according to the serial number of each phrase data;
Generating a model input sequence according to the sequence number, the absolute head position and the absolute tail position of each word data and the sequence number, the absolute head position and the absolute tail position of each phrase data;
And inputting the model input sequence corresponding to each local sample into the initial data identification model to obtain probability distribution of each local sample.
4. The data recognition method of claim 3, wherein the combining the word data into the phrase data includes:
and obtaining phrase data corresponding to each text data by inquiring a dictionary.
5. The data recognition method of claim 3, wherein said determining the sequence number of each of the text data and the sequence number of each of the phrase data comprises:
And determining the serial number of each text data and the serial number of each phrase data by inquiring a preset word list.
6. The data recognition method of claim 3, wherein the generating the absolute head position and the absolute tail position of each of the text data from the serial number of each of the text data comprises:
for each text data, determining the position information of the text data in the local sample according to the serial number of the text data;
And generating an absolute head position and an absolute tail position of the text data according to the position information.
7. The data recognition method of claim 3, wherein the generating an absolute head position and an absolute tail position of each of the phrase data based on a serial number of each of the phrase data comprises:
For each phrase data, determining each text data in the phrase data;
And generating the absolute head position and the absolute tail position of the phrase data according to the absolute head position and the absolute tail position of each text data.
8. The data recognition method of claim 3, wherein generating the model input sequence based on the sequence number, the absolute head position, the absolute tail position of each of the character data, and the sequence number, the absolute head position, the absolute tail position of each of the phrase data, comprises:
for each text data, calculating according to the serial number, the absolute head position and the absolute tail position of the text data to obtain a feature vector corresponding to the text data;
for each phrase data, calculating and obtaining a feature vector corresponding to the phrase data according to the sequence number, the absolute head position and the absolute tail position of the phrase data;
And combining the characteristic vector of each text data and the characteristic vector of each phrase data into the model input sequence.
9. The method for recognizing data according to claim 8, wherein the calculating the feature vector corresponding to the text data according to the sequence number, absolute head position, absolute tail position of the text data comprises:
converting the serial number of the text data into a serial number vector;
converting the absolute head position of the text data into an absolute head position vector;
converting the absolute tail position of the text data into an absolute tail position vector;
and vector addition calculation is carried out on the serial number vector, the absolute head position vector and the absolute tail position vector of the text data, so as to obtain the feature vector corresponding to the text data.
10. The method for recognizing data according to claim 8, wherein the calculating to obtain the feature vector corresponding to the phrase data according to the sequence number, absolute head position, absolute tail position of the phrase data comprises:
converting the sequence number of the phrase data into a sequence number vector;
Converting the absolute head position of the phrase data into an absolute head position vector;
Converting the absolute tail position of the phrase data into an absolute tail position vector;
And vector addition calculation is carried out on the sequence number vector, the absolute head position vector and the absolute tail position vector of the phrase data, so as to obtain the feature vector corresponding to the phrase data.
11. The method for data identification according to claim 2, wherein updating the local model parameters using the local model gradient and updating the global model parameters using the server model update parameters until a data identification model satisfying a preset condition is obtained, comprises:
And updating local model parameters by using the local model gradient, and updating global model parameters by using the server model updating parameters until a data identification model with model loss reaching a preset threshold value is obtained.
12. The method of claim 1, wherein updating the local model parameters using the local model gradient comprises:
determining the current value of the local model parameter, and acquiring a preset learning rate;
And calculating to obtain an updated value of the local model parameter according to the local model gradient, the current value of the model parameter and the preset learning rate.
13. The method of any one of claims 1 to 12, wherein training the initial data recognition model using local samples, before obtaining a local model gradient and a global model gradient, further comprises:
acquiring the size of a training sample batch in the server;
Dividing the local global samples according to the size of the training sample batch to obtain each local sample.
14. The data recognition method of claim 1, wherein the global model comprises an embedding layer, a self-attention layer, a normalization layer; the local model comprises a feedforward neural network layer, a normalization layer and a conditional random field layer.
15. A model training method, applied to a client, comprising:
Acquiring an initial data identification model; the initial data identification model is divided into a local model part and a global model part according to a network structure;
training the initial data identification model by using a local sample to obtain a local model gradient and a global model gradient; the local sample is a text sample;
Uploading the global model gradients to a server so that the server updates the server model parameters by utilizing the global model gradients to obtain server model updating parameters;
updating local model parameters by using the local model gradient, and taking the current value of the server model updating parameters as the updating value of the global model parameters until a data identification model meeting preset conditions is obtained; the data identification model is used for realizing named entity identification;
the uploading the global model gradient to a server, so that the server updates the server model parameters by using the global model gradient to obtain server model update parameters, including: uploading the global model gradients to the server so that the server aggregates the global model gradients uploaded by the clients to obtain aggregated gradients, and updating the server model parameters by utilizing the aggregated gradients to obtain server model updating parameters;
The uploading the global model gradient to the server, so that the server aggregates the global model gradient uploaded by each client to obtain an aggregate gradient, includes: uploading the global model gradient to the server so that the server can perform weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula to obtain the aggregation gradient; the aggregation formula is as follows: ;/>;/>
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
16. A data recognition device, applied to a client, comprising:
the first acquisition module is used for acquiring an initial data identification model; the initial data identification model is divided into a local model part and a global model part according to a network structure;
The first training module is used for training the initial data identification model by utilizing a local sample to obtain a local model gradient and a global model gradient; the local sample is a text sample;
The first uploading module is used for uploading the global model gradient to a server so that the server can update the server model parameters by utilizing the global model gradient to obtain server model updating parameters;
The first updating module is used for updating the local model parameters by utilizing the local model gradient, and taking the current value of the server model updating parameters as the updating value of the global model parameters until a data identification model meeting the preset conditions is obtained; the data identification model is used for realizing named entity identification;
the execution module is used for executing data identification operation by utilizing the data identification model;
The first uploading module is specifically configured to upload the global model gradient to the server, so that the server aggregates the global model gradients uploaded by the clients to obtain an aggregate gradient, and updates the server model parameters by using the aggregate gradient to obtain the server model update parameters;
the first uploading module is specifically configured to upload the global model gradient to the server, so that the server performs weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula, and obtains the aggregation gradient; the aggregation formula is as follows: ;/>;/>
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
17. A model training apparatus, for use in a client, comprising:
The second acquisition module is used for acquiring an initial data identification model; the initial data identification model is divided into a local model part and a global model part according to a network structure;
the second training module is used for training the initial data identification model by utilizing a local sample to obtain a local model gradient and a global model gradient; the local sample is a text sample;
The second uploading module is used for uploading the global model gradient to a server so that the server can update the server model parameters by utilizing the global model gradient to obtain server model updating parameters;
The second updating module is used for updating the local model parameters by utilizing the local model gradient, and taking the current value of the server model updating parameters as the updating value of the global model parameters until a data identification model meeting the preset conditions is obtained; the data identification model is used for realizing named entity identification;
The second uploading module is specifically configured to upload the global model gradient to the server, so that the server aggregates the global model gradients uploaded by the clients to obtain an aggregate gradient, and updates the server model parameters by using the aggregate gradient to obtain the server model update parameters;
the second uploading module is specifically configured to upload the global model gradient to the server, so that the server performs weighted average calculation on the global model gradient uploaded by each client by using an aggregation formula, and obtains the aggregation gradient; the aggregation formula is as follows: ;/>;/>
wherein, For the polymerization gradient,/>For/>Global model gradient of individual clients,/>For/>Sample duty cycle of individual clients,/>For/>Weights of individual clients,/>For/>Local sample number of individual clients,/>For all clients/>Is/is the total number of samplesFor/>Gradient change of individual clients,/>Is a natural constant.
18. An electronic device, comprising:
a memory for storing a computer program;
Processor for implementing the steps of the data recognition method according to any one of claims 1 to 14 or the model training method according to claim 15 when executing the computer program.
19. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the steps of the data recognition method according to any one of claims 1 to 14 or the steps of the model training method according to claim 15.
CN202311034853.2A 2023-08-17 2023-08-17 Data identification method, model training method, device, equipment and storage medium Active CN116756536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311034853.2A CN116756536B (en) 2023-08-17 2023-08-17 Data identification method, model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311034853.2A CN116756536B (en) 2023-08-17 2023-08-17 Data identification method, model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116756536A CN116756536A (en) 2023-09-15
CN116756536B true CN116756536B (en) 2024-04-26

Family

ID=87957541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311034853.2A Active CN116756536B (en) 2023-08-17 2023-08-17 Data identification method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116756536B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808128A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Image processing method, federal learning method and device under heterogeneous data condition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779992A (en) * 2021-07-19 2021-12-10 西安理工大学 Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training
CN114386417A (en) * 2021-12-28 2022-04-22 北京理工大学 Chinese nested named entity recognition method integrated with word boundary information
CN114912022A (en) * 2022-05-10 2022-08-16 平安科技(深圳)有限公司 Prediction model training method, system, computer device and storage medium
CN115204414A (en) * 2022-07-28 2022-10-18 贵州大学 Incentive mechanism-based federated learning optimization method and system
CN115795535A (en) * 2022-11-17 2023-03-14 北京邮电大学 Differential private federal learning method and device for providing adaptive gradient
CN115829055A (en) * 2022-12-08 2023-03-21 深圳大学 Federal learning model training method and device, computer equipment and storage medium
CN115878803A (en) * 2022-12-26 2023-03-31 国网四川省电力公司经济技术研究院 Sensitive data detection method, system, computer terminal and storage medium
CN115983275A (en) * 2022-12-26 2023-04-18 国网四川省电力公司经济技术研究院 Named entity identification method, system and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036178A1 (en) * 2020-07-31 2022-02-03 Microsoft Technology Licensing, Llc Dynamic gradient aggregation for training neural networks

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779992A (en) * 2021-07-19 2021-12-10 西安理工大学 Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training
CN114386417A (en) * 2021-12-28 2022-04-22 北京理工大学 Chinese nested named entity recognition method integrated with word boundary information
CN114912022A (en) * 2022-05-10 2022-08-16 平安科技(深圳)有限公司 Prediction model training method, system, computer device and storage medium
CN115204414A (en) * 2022-07-28 2022-10-18 贵州大学 Incentive mechanism-based federated learning optimization method and system
CN115795535A (en) * 2022-11-17 2023-03-14 北京邮电大学 Differential private federal learning method and device for providing adaptive gradient
CN115829055A (en) * 2022-12-08 2023-03-21 深圳大学 Federal learning model training method and device, computer equipment and storage medium
CN115878803A (en) * 2022-12-26 2023-03-31 国网四川省电力公司经济技术研究院 Sensitive data detection method, system, computer terminal and storage medium
CN115983275A (en) * 2022-12-26 2023-04-18 国网四川省电力公司经济技术研究院 Named entity identification method, system and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Federated learning meets multi-objective optimization";Zeou Hu等;《arXiv:2006.11489v1》;20200620;全文 *
联邦学习安全与隐私保护研究综述;周俊;方国英;吴楠;;西华大学学报(自然科学版);20200710(第04期);全文 *

Also Published As

Publication number Publication date
CN116756536A (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN110704588A (en) Multi-round dialogue semantic analysis method and system based on long-term and short-term memory network
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN116756536B (en) Data identification method, model training method, device, equipment and storage medium
JP6712644B2 (en) Acoustic model learning device, method and program
CN112084301B (en) Training method and device for text correction model, text correction method and device
CN110046706A (en) Model generating method, device and server
CN113449821B (en) Intelligent training method, device, equipment and medium fusing semantics and image characteristics
CN116663525B (en) Document auditing method, device, equipment and storage medium
CN113488023B (en) Language identification model construction method and language identification method
JPWO2016125500A1 (en) Feature conversion device, recognition device, feature conversion method, and computer-readable recording medium
CN111340245A (en) Model training method and system
CN111950579A (en) Training method and training device for classification model
CN113919418A (en) Classification model training method and device based on small samples and electronic equipment
CN111612648B (en) Training method and device for photovoltaic power generation prediction model and computer equipment
CN116702765A (en) Event extraction method and device and electronic equipment
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
CN115358473A (en) Power load prediction method and prediction system based on deep learning
CN110717037A (en) Method and device for classifying users
KR20240034804A (en) Evaluating output sequences using an autoregressive language model neural network
CN113139368B (en) Text editing method and system
CN113986245A (en) Object code generation method, device, equipment and medium based on HALO platform
CN114116456A (en) Test case generation method, system and computer readable storage medium
WO2020162240A1 (en) Language model score calculation device, language model creation device, methods therefor, program, and recording medium
JP2018081294A (en) Acoustic model learning device, voice recognition device, acoustic model learning method, voice recognition method, and program
CN117332090B (en) Sensitive information identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant