CN112101578A - Distributed language relationship recognition method, system and device based on federal learning - Google Patents

Distributed language relationship recognition method, system and device based on federal learning

Info

Publication number
CN112101578A
Authority
CN
China
Prior art keywords
local
data
classifier model
global
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011285430.4A
Other languages
Chinese (zh)
Other versions
CN112101578B (en)
Inventor
赵军
陈玉博
刘康
隋典伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202011285430.4A priority Critical patent/CN112101578B/en
Publication of CN112101578A publication Critical patent/CN112101578A/en
Application granted granted Critical
Publication of CN112101578B publication Critical patent/CN112101578B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning
            • G06N20/20 Ensemble learning
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
              • G06F18/25 Fusion techniques
          • G06F30/00 Computer-aided design [CAD]
            • G06F30/20 Design optimisation, verification or simulation
              • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of data identification, and in particular relates to a distributed language relationship recognition method, system and device based on federal learning, aiming at solving the problem that multiple participants find it difficult to build a joint model without sharing data. The method comprises the following steps: each local platform acquires data to be recognized and labelled local training data; a local classifier model is trained on the local training data; the trained local classifier models are aggregated into a global classifier model; the local classifier models are initialized from the global classifier model; and the initialized local classifier models are used to recognize the data to be recognized and obtain a language relationship prediction result. The invention enables a local platform to use data and build machine learning models jointly with other platforms without sharing data, improves cooperation in training natural language recognition models, alleviates the data-island problem, and reduces the total amount of training data each platform needs to train its model.

Description

Distributed language relationship recognition method, system and device based on federal learning
Technical Field
The invention belongs to the field of data identification, and particularly relates to a distributed language relationship identification method, a system and a device based on federal learning.
Background
In real life, most enterprises have small data volumes and poor data quality, which is insufficient to support artificial intelligence applications. Meanwhile, regulations at home and abroad are progressively strengthening data protection, so allowing data to flow only under safe and compliant conditions has become a major trend. The data owned by commercial companies often has great potential value from both the user and enterprise perspectives. Because of conflicts of interest, two companies, or even departments within the same company, usually will not provide their private data to others, so data often exists as isolated islands even within a single organization. Federated learning arose to address the problems of data islands and privacy protection. Federated machine learning (also known as federated learning, joint learning or alliance learning) is a machine learning framework that can effectively help multiple organizations use data and build machine learning models jointly while satisfying the requirements of user privacy protection, data security and government regulations. As a distributed machine learning paradigm, federated learning can effectively solve the data-island problem: participants can build models jointly without sharing data, data islands can be broken technically, and cooperative artificial intelligence can be realized.
Disclosure of Invention
In order to solve the above problems in the prior art, namely how to enable multiple participants to build a joint model without sharing data and thereby break data islands and achieve cooperative artificial intelligence, the invention provides a distributed language relationship identification method based on federal learning, which comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
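Purely as an illustrative sketch of steps S200 to S400, and not as part of the claimed method, one training round can be expressed in Python roughly as follows; the function and attribute names (such as train_locally and platform.labelled_data) are assumptions introduced for illustration only:

```python
import copy

def federated_training_round(global_model, platforms, train_locally, aggregate):
    """One round of the federated procedure: local training (S200) on each platform,
    followed by aggregation into a global classifier model (S300); the updated global
    model then re-initializes the local models (S400) for the next round."""
    local_models = []
    for platform in platforms:
        # Initialize the local model from the current global classifier model.
        local_model = copy.deepcopy(global_model)
        # S200: train the first local classifier on the platform's labelled local data
        # (e.g., by stochastic gradient descent until the local loss is below a threshold).
        train_locally(local_model, platform.labelled_data)
        local_models.append(local_model)
    # S300: aggregate the second local classifier models (e.g., by weighted averaging).
    return aggregate(local_models)
```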
Further, step S200 includes:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of the sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
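As a minimal sketch only, the local classifier of steps S210 to S230 could be implemented with PyTorch and the Hugging Face transformers library as follows; the pretrained model name, the entity-position inputs and the layer sizes are assumptions of this sketch, not specifics given by the patent:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class LocalRelationClassifier(nn.Module):
    """BERT encoder plus a softmax layer over the concatenation of the sentence
    representation and the two selected entity representations, x = [h_s; h_e1; h_e2]."""

    def __init__(self, num_relations, model_name="bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        d = self.encoder.config.hidden_size
        # W and b of the classifier p = softmax(Wx + b), with x in R^{3d}.
        self.classifier = nn.Linear(3 * d, num_relations)

    def forward(self, input_ids, attention_mask, e1_pos, e2_pos):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state                         # (batch, seq_len, d)
        h_s = h[:, 0]                                     # [CLS] token as sentence representation
        batch = torch.arange(h.size(0))
        h_e1, h_e2 = h[batch, e1_pos], h[batch, e2_pos]   # the two marked entity representations
        x = torch.cat([h_s, h_e1, h_e2], dim=-1)          # sentence representation with entity info
        return torch.softmax(self.classifier(x), dim=-1)  # predicted relation distribution
```

In this sketch the predicted relation is the index of the largest value in the returned distribution, and $W$ and $b$ would be adjusted by stochastic gradient descent as described in step S240.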
Further, step S300 further includes step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
Further, the global classifier model is obtained by knowledge distillation based on each second local classifier model, and the method comprises the following steps:
step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_V} \sum_{i} q_i(s) \log p_i(s)$$

where $D_V$ is the validation set, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing entity information, and $q_i(s)$ is the predicted relation distribution, different from $p_i(s)$, of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
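A minimal sketch of the aggregation and global loss of step S330B, assuming PyTorch tensors, is given below; averaging the local predictions and using a cross-entropy form of the loss are assumptions consistent with the definitions above, not an exact reproduction of the patented formula:

```python
import torch

def distillation_loss(student_probs, local_probs, tau=2.0):
    """student_probs: (batch, num_relations) predictions of the model being distilled;
    local_probs: (num_platforms, batch, num_relations) predictions of the second local
    classifier models on the labelled global server data; tau is the distillation temperature."""
    # z_i: aggregate the local platforms' predicted relation distributions position by position.
    z = local_probs.mean(dim=0)                        # (batch, num_relations)
    # q_i = exp(z_i / tau) / sum_r exp(z_r / tau): temperature-controlled softmax.
    q = torch.softmax(z / tau, dim=-1)
    # Cross-entropy between the aggregated distribution q and the student distribution p.
    return -(q * torch.log(student_probs + 1e-12)).sum(dim=-1).mean()
```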
In another aspect of the present invention, a distributed language relationship recognition system based on federal learning is provided, the system includes: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
the aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
the local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
Further, the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit randomly selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
Further, the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.

Further, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises the following steps:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned distributed language relationship recognition method based on federal learning.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above-mentioned distributed language relationship recognition method based on federal learning.
The invention has the beneficial effects that:
(1) According to the distributed language relationship recognition method based on federal learning, the locally trained classifier models are aggregated into a global classifier model, and the local classifier models are initialized from the global classifier model, so that a local platform can use data and build machine learning models jointly with other platforms without sharing data, improving cooperation in training natural language recognition models.
(2) According to the distributed language relationship recognition method based on federal learning, through the federated learning mode, the local classification models can be trained cooperatively without any data leaving its platform, which alleviates the data-island problem, breaks data barriers, and reduces the total amount of training data required by each platform to train its model.
(3) According to the distributed language relationship recognition method based on federal learning, the knowledge distillation technique reduces the transmission cost of federated learning and improves the learning efficiency of the model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a flow chart of a distributed language relationship recognition method based on federal learning according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a distributed language relationship recognition method based on federal learning, which comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
In order to more clearly illustrate the distributed language relationship identification method based on federal learning of the present invention, the following describes each step in the embodiment of the method of the present invention in detail with reference to fig. 1.
The distributed language relationship recognition method based on federal learning in the embodiment of the invention comprises the following steps S100-S500, and the steps are described in detail as follows:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
in this embodiment, step S200 includes:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data; in this embodiment, the BERT model is a deep Transformer encoder model (Bidirectional Encoder Representations from Transformers);

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information, as shown in equation (1):

$$x = [h_s; h_{e_1}; h_{e_2}] \qquad (1)$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

in this embodiment, the entities are selected words: a sentence contains a plurality of words, and 2 entities are marked in each sentence;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model, as shown in equation (2):

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b) \qquad (2)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the invention may also use other classifier models; softmax is only used as an example to facilitate understanding of the invention and does not limit it to this specific example;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
Step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
in this embodiment, step S300 further includes step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
In this embodiment, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms includes:

step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is shown in equation (3):

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s) \qquad (3)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models, as shown in equation (4):

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)} \qquad (4)$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is shown in equation (5):

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s) \qquad (5)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
The I2B2 dataset was used as the training and testing corpus. The corpus contains 10231 training instances and 19114 test instances.
The effectiveness of the proposed method is demonstrated by comparison with prior art methods. The results are shown in Table 1:
table 1 comparison of the effects of the prior art and the examples of the present invention:
[Table 1 is reproduced as an image in the original publication and is not available in text form.]
The first part of the table (first three rows) shows the effect of conventional centralized methods trained on the annotated corpus, and the second part (last three rows) shows the results of training in the federated learning mode. The experimental results show that the proposed federated training method outperforms the previous methods, which proves that the method is effective.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
A distributed language relationship recognition system based on federated learning according to a second embodiment of the present invention, the system includes: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
In this embodiment, the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information, as shown in equation (6):

$$x = [h_s; h_{e_1}; h_{e_2}] \qquad (6)$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model, as shown in equation (7):

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b) \qquad (7)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
The aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
In this embodiment, the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.

In this embodiment, obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms includes:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is shown in equation (8):

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s) \qquad (8)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models, as shown in equation (9):

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)} \qquad (9)$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is shown in equation (10):

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s) \qquad (10)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
The local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the distributed language relationship identification system based on federal learning provided in the foregoing embodiment is only illustrated by the division of the foregoing functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage apparatus according to a third embodiment of the present invention stores therein a plurality of programs, which are adapted to be loaded and executed by a processor to implement the distributed language relationship recognition method based on federal learning described above.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the federated learning-based distributed language relationship recognition approach described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A distributed language relationship recognition method based on federated learning is characterized in that the method comprises the following steps:
s100, each local platform acquires data to be identified and local training data with labels;
s200, each local platform trains a first local classifier model through the local training data with the labels to obtain a second local classifier model;
step S300, generating a global classifier model by a weighted average method based on the second local classifier models of the local platforms;
step S400, initializing the second local classifier model of each local platform based on the global classifier model, and generating a third local classifier model;
step S500, performing language relationship recognition on the data to be recognized through the third local classifier model, and obtaining a language relationship prediction result of the data to be recognized.
2. The distributed language relationship recognition method based on federal learning as claimed in claim 1, wherein step S200 comprises:
step S210, let t = 1, encode the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

step S220, for the t-th sentence of the labelled local data, select 2 entity representations and concatenate them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

step S230, based on the sentence representation $x$ containing entity information, obtain the predicted language relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

step S240, let t = t + 1 and jump to step S220; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
3. The distributed language relationship recognition method based on federal learning as claimed in claim 2, wherein step S300 further comprises step S300B of obtaining a global classifier model by knowledge distillation based on the second local classifier models of the respective local platforms.
4. The distributed language relationship recognition method based on federal learning as claimed in claim 3, wherein obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises:

step S310B, each local platform acquires labelled global server data;

step S320B, the labelled global server data is predicted through the second local classifier model of each local platform, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
5. A distributed language relationship recognition system based on federal learning, the system comprising: the system comprises a data acquisition module, a local training module, an aggregation module, a local classifier optimization module and a data identification module;
the data acquisition module is configured for each local platform to acquire data to be identified and labelled local training data;
the local training module is configured to train a first local classifier model through local training data with labels by each local platform to obtain a second local classifier model;
the aggregation module is configured to generate a global classifier model through a weighted average method based on the second local classifier models of the local platforms;
the local classifier optimization module is configured to initialize the second local classifier model of each local platform based on the global classifier model and generate a third local classifier model;
the data identification module is configured to perform language relationship identification on the data to be identified through the third local classifier model, and obtain a language relationship prediction result of the data to be identified.
6. The distributed language relationship recognition system based on federated learning of claim 5, wherein the local training module comprises: a sentence encoding unit, a characterization representation unit, a sentence prediction unit and an iterative updating unit;

the sentence encoding unit lets t = 1 and encodes the labelled local training data with a BERT model, generating 1 sentence representation and a plurality of entity representations for each sentence of the labelled local training data;

the characterization representation unit selects 2 entity representations for each sentence of the labelled local data and concatenates them with the sentence representation to generate a sentence representation $x$ containing entity information:

$$x = [h_s; h_{e_1}; h_{e_2}]$$

where $h_s \in \mathbb{R}^d$ is the sentence representation, $h_{e_1} \in \mathbb{R}^d$ is the 1st selected entity representation, $h_{e_2} \in \mathbb{R}^d$ is the 2nd selected entity representation, $\mathbb{R}$ denotes real space and $d$ denotes the dimension of each representation;

the sentence prediction unit is configured to obtain, based on the sentence representation $x$ containing entity information, the predicted relation $p$ of the labelled local training data through the first local classifier model:

$$p(y \mid S; \theta) = \mathrm{softmax}(Wx + b)$$

where $p(y \mid S; \theta)$ is the predicted relation distribution of each sentence representation $x$ containing entity information, and the final predicted relation $p$ is the relation with the largest value in this distribution; $y$ denotes the relation label, $S$ denotes the sentence set, $\theta$ denotes the model parameters, $W$ and $b$ are trainable parameters adjusted during training, and softmax denotes a standard softmax classifier;

the iterative updating unit lets t = t + 1 and jumps back to the characterization representation unit; the trainable parameters $W$ and $b$ are adjusted by a stochastic gradient descent algorithm until the local loss function of the model is smaller than a preset first threshold, obtaining the second local classifier model.
7. The distributed language relationship recognition system based on federated learning of claim 6, wherein the aggregation module is further configured to obtain a global classifier model by knowledge distillation based on the second local classifier models of the local platforms.
8. The distributed language relationship recognition system based on federal learning as claimed in claim 7, wherein obtaining the global classifier model by knowledge distillation based on the second local classifier models of the local platforms comprises:

step S310B, acquiring labelled global server data;

step S320B, the labelled global server data is predicted through each second local classifier model, obtaining the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data;

step S330B, the set of local prediction relation distributions $\{p^{(j)}\}$ of the global data is aggregated, and the global loss function $L$ over the labelled global server data is optimized by a stochastic gradient descent algorithm until the global loss function value $L$ is smaller than a preset second threshold, obtaining the global classifier model;

the global loss function $L$ is:

$$L = -\sum_{s \in D_L \cup D_G} \sum_{i} q_i(s) \log p_i(s)$$

where $D_L$ is the labelled local training data, $D_G$ is the global data, $i$ is an index, $p_i(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information, and $q_i(s)$ is the predicted relation distribution of the aggregated second local classifier models:

$$q_i(s) = \frac{\exp(z_i / \tau)}{\sum_{r} \exp(z_r / \tau)}$$

where $\tau$ is the temperature parameter used to control the distillation distribution, $r$ is another index different from $i$, and $z_i$ is:

$$z_i = \frac{1}{J} \sum_{j=1}^{J} p_i^{(j)}(s)$$

where $z_i$ is the value at position $i$ of the label-aggregated model over the second local classifier models, $p_i^{(j)}(s)$ is the predicted relation distribution of the sentence representation $x$ containing local entity information used for aggregation, $s$ denotes a sentence, $j$ is the index of a local classifier model, and $J$ is the number of local classifier models.
9. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the distributed language relationship recognition method based on federated learning of any one of claims 1 to 4.
10. A processing apparatus comprising a processor adapted to execute various programs, and a storage apparatus adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by the processor to implement the distributed language relationship recognition method based on federal learning as claimed in any one of claims 1 to 4.
CN202011285430.4A 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning Active CN112101578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011285430.4A CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011285430.4A CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Publications (2)

Publication Number Publication Date
CN112101578A true CN112101578A (en) 2020-12-18
CN112101578B CN112101578B (en) 2021-02-23

Family

ID=73784706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011285430.4A Active CN112101578B (en) 2020-11-17 2020-11-17 Distributed language relationship recognition method, system and device based on federal learning

Country Status (1)

Country Link
CN (1) CN112101578B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
WO2022178719A1 (en) * 2021-02-24 2022-09-01 华为技术有限公司 Horizontal federated learning-based training method, apparatus and system
WO2022227212A1 (en) * 2021-04-25 2022-11-03 平安科技(深圳)有限公司 Federated learning-based speech representation model training method and apparatus, device, and medium
WO2023005133A1 (en) * 2021-07-28 2023-02-02 深圳前海微众银行股份有限公司 Federated learning modeling optimization method and device, and readable storage medium and program product
CN117540829A (en) * 2023-10-18 2024-02-09 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559181A (en) * 2013-11-14 2014-02-05 苏州大学 Establishment method and system for bilingual semantic relation classification model
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN108345583A (en) * 2017-12-28 2018-07-31 中国科学院自动化研究所 Event recognition and sorting technique based on multi-lingual attention mechanism and device
US20200057810A1 (en) * 2017-12-11 2020-02-20 Abbyy Production Llc Information object extraction using combination of classifiers
CN111737552A (en) * 2020-06-04 2020-10-02 中国科学院自动化研究所 Method, device and equipment for extracting training information model and acquiring knowledge graph
CN111831829A (en) * 2020-06-12 2020-10-27 广州多益网络股份有限公司 Entity relationship extraction method and device for open domain and terminal equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559181A (en) * 2013-11-14 2014-02-05 苏州大学 Establishment method and system for bilingual semantic relation classification model
US20200057810A1 (en) * 2017-12-11 2020-02-20 Abbyy Production Llc Information object extraction using combination of classifiers
CN108345583A (en) * 2017-12-28 2018-07-31 中国科学院自动化研究所 Event recognition and sorting technique based on multi-lingual attention mechanism and device
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
CN111737552A (en) * 2020-06-04 2020-10-02 中国科学院自动化研究所 Method, device and equipment for extracting training information model and acquiring knowledge graph
CN111831829A (en) * 2020-06-12 2020-10-27 广州多益网络股份有限公司 Entity relationship extraction method and device for open domain and terminal equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022178719A1 (en) * 2021-02-24 2022-09-01 华为技术有限公司 Horizontal federated learning-based training method, apparatus and system
WO2022227212A1 (en) * 2021-04-25 2022-11-03 平安科技(深圳)有限公司 Federated learning-based speech representation model training method and apparatus, device, and medium
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
WO2023005133A1 (en) * 2021-07-28 2023-02-02 深圳前海微众银行股份有限公司 Federated learning modeling optimization method and device, and readable storage medium and program product
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
CN113657607B (en) * 2021-08-05 2024-03-22 浙江大学 Continuous learning method for federal learning
CN117540829A (en) * 2023-10-18 2024-02-09 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system
CN117540829B (en) * 2023-10-18 2024-05-17 广西壮族自治区通信产业服务有限公司技术服务分公司 Knowledge sharing large language model collaborative optimization method and system

Also Published As

Publication number Publication date
CN112101578B (en) 2021-02-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant