CN112101578A - Distributed language relationship recognition method, system and device based on federal learning - Google Patents

Distributed language relationship recognition method, system and device based on federal learning

Info

Publication number
CN112101578A
Authority
CN
China
Prior art keywords
local
data
classifier model
global
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011285430.4A
Other languages
Chinese (zh)
Other versions
CN112101578B (en)
Inventor
赵军
陈玉博
刘康
隋典伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202011285430.4A priority Critical patent/CN112101578B/en
Publication of CN112101578A publication Critical patent/CN112101578A/en
Application granted granted Critical
Publication of CN112101578B publication Critical patent/CN112101578B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning
            • G06N20/20 Ensemble learning
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
              • G06F18/25 Fusion techniques
          • G06F30/00 Computer-aided design [CAD]
            • G06F30/20 Design optimisation, verification or simulation
              • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of data identification, and in particular relates to a distributed language relationship recognition method, system and device based on federal learning, aiming at solving the problem that multiple participants find it difficult to build a joint model without sharing data. The method comprises the following steps: each local platform acquires data to be recognized and labelled local training data; a local classifier model is trained on the local training data; the trained local classifier models are aggregated into a global classifier model; the local classifier models are initialized from the global classifier model; and the initialized local classifier models are used to recognize the data to be recognized and obtain a language relationship prediction result. The invention enables a local platform to use data and build machine learning models jointly with other platforms without sharing data, improves cooperation in training natural language recognition models, alleviates the data-island problem, and reduces the total amount of training data each platform needs to train its model.

Description

Distributed language relationship recognition method, system and device based on federal learning
Technical Field
The invention belongs to the field of data identification, and particularly relates to a distributed language relationship identification method, a system and a device based on federal learning.
Background
In real life, most enterprises have small data volumes and poor data quality, which is insufficient to support artificial intelligence applications. Meanwhile, regulations at home and abroad are progressively strengthening data protection, so allowing data to flow only under safe and compliant conditions has become a major trend. The data owned by commercial companies often has great potential value from both the user and enterprise perspectives. Because of conflicts of interest, two companies, or even departments within the same company, usually will not provide their private data to others, so data often exists as isolated islands even within a single organization. Federated learning arose to address the problems of data islands and privacy protection. Federated machine learning (also known as federated learning, joint learning or alliance learning) is a machine learning framework that can effectively help multiple organizations use data and build machine learning models jointly while satisfying the requirements of user privacy protection, data security and government regulations. As a distributed machine learning paradigm, federated learning can effectively solve the data-island problem: participants can build models jointly without sharing data, data islands can be broken technically, and cooperative artificial intelligence can be realized.
Disclosure of Invention
In order to solve the above problems in the prior art, namely how to enable multiple participants to build a joint model without sharing data and thereby break data islands and achieve cooperative artificial intelligence, the invention provides a distributed language relationship identification method based on federal learning, which comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
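Purely as an illustrative sketch of steps S200 to S400, and not as part of the claimed method, one training round can be expressed in Python roughly as follows; the function and attribute names (such as train_locally and platform.labelled_data) are assumptions introduced for illustration only:

```python
import copy

def federated_training_round(global_model, platforms, train_locally, aggregate):
    """One round of the federated procedure: local training (S200) on each platform,
    followed by aggregation into a global classifier model (S300); the updated global
    model then re-initializes the local models (S400) for the next round."""
    local_models = []
    for platform in platforms:
        # Initialize the local model from the current global classifier model.
        local_model = copy.deepcopy(global_model)
        # S200: train the first local classifier on the platform's labelled local data
        # (e.g., by stochastic gradient descent until the local loss is below a threshold).
        train_locally(local_model, platform.labelled_data)
        local_models.append(local_model)
    # S300: aggregate the second local classifier models (e.g., by weighted averaging).
    return aggregate(local_models)
```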
Further, step S200 includes:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of the sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
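As a minimal sketch only, the local classifier of steps S210 to S230 could be implemented with PyTorch and the Hugging Face transformers library as follows; the pretrained model name, the entity-position inputs and the layer sizes are assumptions of this sketch, not specifics given by the patent:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class LocalRelationClassifier(nn.Module):
    """BERT encoder plus a softmax layer over the concatenation of the sentence
    representation and the two selected entity representations, x = [h_s; h_e1; h_e2]."""

    def __init__(self, num_relations, model_name="bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        d = self.encoder.config.hidden_size
        # W and b of the classifier p = softmax(Wx + b), with x in R^{3d}.
        self.classifier = nn.Linear(3 * d, num_relations)

    def forward(self, input_ids, attention_mask, e1_pos, e2_pos):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state                         # (batch, seq_len, d)
        h_s = h[:, 0]                                     # [CLS] token as sentence representation
        batch = torch.arange(h.size(0))
        h_e1, h_e2 = h[batch, e1_pos], h[batch, e2_pos]   # the two marked entity representations
        x = torch.cat([h_s, h_e1, h_e2], dim=-1)          # sentence representation with entity info
        return torch.softmax(self.classifier(x), dim=-1)  # predicted relation distribution
```

In this sketch the predicted relation is the index of the largest value in the returned distribution, and $W$ and $b$ would be adjusted by stochastic gradient descent as described in step S240.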
Further, step S300 further includes step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
Further, the global classifier model is obtained by knowledge distillation based on each second local classifier model, and the method comprises the following steps:
step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_V} \sum_{i} q_i(s) \log p_i(s)$$

where $D_V$ is the validation set, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing entity information, and $q_i(s)$ is the predicted relation distribution, different from $p_i(s)$, of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
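A minimal sketch of the aggregation and global loss of step S330B, assuming PyTorch tensors, is given below; averaging the local predictions and using a cross-entropy form of the loss are assumptions consistent with the definitions above, not an exact reproduction of the patented formula:

```python
import torch

def distillation_loss(student_probs, local_probs, tau=2.0):
    """student_probs: (batch, num_relations) predictions of the model being distilled;
    local_probs: (num_platforms, batch, num_relations) predictions of the second local
    classifier models on the labelled global server data; tau is the distillation temperature."""
    # z_i: aggregate the local platforms' predicted relation distributions position by position.
    z = local_probs.mean(dim=0)                        # (batch, num_relations)
    # q_i = exp(z_i / tau) / sum_r exp(z_r / tau): temperature-controlled softmax.
    q = torch.softmax(z / tau, dim=-1)
    # Cross-entropy between the aggregated distribution q and the student distribution p.
    return -(q * torch.log(student_probs + 1e-12)).sum(dim=-1).mean()
```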
In another aspect of the present invention, a distributed language relationship recognition system based on federal learning is provided, the system includes: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
the aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
the local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
Further, the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit randomly selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
Further, the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.

Further, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises the following steps:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned distributed language relationship recognition method based on federal learning.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above-mentioned distributed language relationship recognition method based on federal learning.
The invention has the beneficial effects that:
(1) According to the distributed language relationship recognition method based on federal learning, the locally trained classifier models are aggregated into a global classifier model, and the local classifier models are initialized from the global classifier model, so that a local platform can use data and build machine learning models jointly with other platforms without sharing data, improving cooperation in training natural language recognition models.
(2) According to the distributed language relationship recognition method based on federal learning, through the federated learning mode, the local classification models can be trained cooperatively without any data leaving its platform, which alleviates the data-island problem, breaks data barriers, and reduces the total amount of training data required by each platform to train its model.
(3) According to the distributed language relationship recognition method based on federal learning, the knowledge distillation technique reduces the transmission cost of federated learning and improves the learning efficiency of the model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a flow chart of a distributed language relationship recognition method based on federal learning according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a distributed language relationship recognition method based on federal learning, which comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
In order to more clearly illustrate the distributed language relationship identification method based on federal learning of the present invention, the following describes each step in the embodiment of the method of the present invention in detail with reference to fig. 1.
The distributed language relationship recognition method based on federal learning in the embodiment of the invention comprises the following steps S100-S500, and the steps are described in detail as follows:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
in this embodiment, step S200 includes:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data; in this embodiment, the BERT model is a deep Transformer encoder model (Bidirectional Encoder Representations from Transformers);

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information, as shown in equation (1):

$$x = [h_s; h_{e_1}; h_{e_2}] \qquad (1)$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

in this embodiment, the entities are selected words: a sentence contains a plurality of words, and 2 entities are marked in each sentence;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model, as shown in equation (2):

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b) \qquad (2)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the invention may also use other classifier models; softmax is only used as an example to facilitate understanding of the invention and does not limit it to this specific example;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
Step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
in this embodiment, step S300 further includes step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
In this embodiment, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms includes:

step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is shown in equation (3):

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s) \qquad (3)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models, as shown in equation (4):

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)} \qquad (4)$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is shown in equation (5):

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s) \qquad (5)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
The I2B2 dataset was used as the training and testing corpus. The corpus contains 10231 training instances and 19114 test instances.
The effectiveness of the proposed method is demonstrated by comparison with prior art methods. The results are shown in Table 1:
table 1 comparison of the effects of the prior art and the examples of the present invention:
[Table 1 is reproduced as an image in the original publication and is not available in text form.]
The first part of the table (first three rows) shows the effect of conventional centralized methods trained on the annotated corpus, and the second part (last three rows) shows the results of training in the federated learning mode. The experimental results show that the proposed federated training method outperforms the previous methods, which proves that the method is effective.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
A distributed language relationship recognition system based on federated learning according to a second embodiment of the present invention, the system includes: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
In this embodiment, the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information, as shown in equation (6):

$$x = [h_s; h_{e_1}; h_{e_2}] \qquad (6)$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model, as shown in equation (7):

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b) \qquad (7)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
The aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
In this embodiment, the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.

In this embodiment, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms includes:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is shown in equation (8):

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s) \qquad (8)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models, as shown in equation (9):

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)} \qquad (9)$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is shown in equation (10):

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s) \qquad (10)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
The local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the distributed language relationship identification system based on federal learning provided in the foregoing embodiment is only illustrated by the division of the foregoing functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage apparatus according to a third embodiment of the present invention stores therein a plurality of programs, which are adapted to be loaded and executed by a processor to implement the distributed language relationship recognition method based on federal learning described above.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the federated learning-based distributed language relationship recognition approach described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A distributed language relationship recognition method based on federated learning is characterized in that the method comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
2. The distributed language relationship recognition method based on federal learning as claimed in claim 1, wherein step S200 comprises:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
3. The distributed language relationship recognition method based on federal learning as claimed in claim 2, wherein step S300 further comprises step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
4. The distributed language relationship recognition method based on federal learning as claimed in claim 3, wherein obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises:

step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
5. A distributed language relationship recognition system based on federal learning, the system comprising: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
the aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
the local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
6. The distributed language relationship recognition system based on federated learning of claim 5, wherein the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
7. The distributed language relationship recognition system based on federated learning of claim 6, wherein the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.
8. The distributed language relationship recognition system based on federal learning as claimed in claim 7, wherein obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
9. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the distributed language relationship recognition method based on federated learning of any one of claims 1 to 4.
10. A processing apparatus comprising a processor adapted to execute various programs, and a storage apparatus adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by the processor to implement the distributed language relationship recognition method based on federal learning as claimed in any one of claims 1 to 4.
CN202011285430.4A 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning Active CN112101578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011285430.4A CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011285430.4A CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Publications (2)

Publication Number Publication Date
CN112101578A true CN112101578A (en) 2020-12-18
CN112101578B CN112101578B (en) 2021-02-23

Family

ID=73784706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011285430.4A Active CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Country Status (1)

Country Link
CN (1) CN112101578B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
WO2022178719A1 (en) * 2021-02-24 2022-09-01 华为技术有限公司 Horizontal federated learning-based training method, apparatus and system
WO2022227212A1 (en) * 2021-04-25 2022-11-03 平安科技(深圳)有限公司 Federated learning-based speech representation model training method and apparatus, device, and medium
WO2023005133A1 (en) * 2021-07-28 2023-02-02 深圳前海微众银行股份有限公司 Federated learning modeling optimization method and device, and readable storage medium and program product
CN117540829A (en) * 2023-10-18 2024-02-09 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559181A (en) * 2013-11-14 2014-02-05 苏州大学 Establishment method and system for bilingual semantic relation classification model
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN108345583A (en) * 2017-12-28 2018-07-31 中国科学院自动化研究所 Event recognition and sorting technique based on multi-lingual attention mechanism and device
US20200057810A1 (en) * 2017-12-11 2020-02-20 Abbyy Production Llc Information object extraction using combination of classifiers
CN111737552A (en) * 2020-06-04 2020-10-02 中国科学院自动化研究所 Method, device and equipment for extracting training information model and acquiring knowledge graph
CN111831829A (en) * 2020-06-12 2020-10-27 广州多益网络股份有限公司 Entity relationship extraction method and device for open domain and terminal equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559181A (en) * 2013-11-14 2014-02-05 苏州大学 Establishment method and system for bilingual semantic relation classification model
US20200057810A1 (en) * 2017-12-11 2020-02-20 Abbyy Production Llc Information object extraction using combination of classifiers
CN108345583A (en) * 2017-12-28 2018-07-31 中国科学院自动化研究所 Event recognition and sorting technique based on multi-lingual attention mechanism and device
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN111737552A (en) * 2020-06-04 2020-10-02 中国科学院自动化研究所 Method, device and equipment for extracting training information model and acquiring knowledge graph
CN111831829A (en) * 2020-06-12 2020-10-27 广州多益网络股份有限公司 Entity relationship extraction method and device for open domain and terminal equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022178719A1 (en) * 2021-02-24 2022-09-01 华为技术有限公司 Horizontal federated learning-based training method, apparatus and system
WO2022227212A1 (en) * 2021-04-25 2022-11-03 平安科技(深圳)有限公司 Federated learning-based speech representation model training method and apparatus, device, and medium
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
WO2023005133A1 (en) * 2021-07-28 2023-02-02 深圳前海微众银行股份有限公司 Federated learning modeling optimization method and device, and readable storage medium and program product
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
CN113657607B (en) * 2021-08-05 2024-03-22 浙江大学 Continuous learning method for federal learning
CN117540829A (en) * 2023-10-18 2024-02-09 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system
CN117540829B (en) * 2023-10-18 2024-05-17 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system

Also Published As

Publication number Publication date
CN112101578B (en) 2021-02-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant