CN115879467A - Federated learning-based Chinese address word segmentation method and device - Google Patents


Info

Publication number
CN115879467A
CN115879467A
Authority
CN
China
Prior art keywords
model
word segmentation
federal
learning
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211626861.1A
Other languages
Chinese (zh)
Other versions
CN115879467B (en)
Inventor
李莹
李文龙
金路
鲍迪恩
彭聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Bangsheng Technology Co ltd
Original Assignee
Zhejiang Bangsheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Bangsheng Technology Co ltd
Priority to CN202211626861.1A
Publication of CN115879467A
Application granted
Publication of CN115879467B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese address word segmentation method and device based on federated learning. The Chinese address word segmentation task is split between an address word segmentation model and a place name classification model, each trained independently: the address word segmentation model is trained on local address data, while the place name classification model is trained jointly across multiple data sources using federated learning. At decision time, the two models complete the Chinese address word segmentation task together. The invention is the first to propose a dual-model approach to this task: the address word segmentation model, trained on local data, segments a complete address into place nouns; the place name classification model, trained with federated learning to capture the data distribution of multi-source data, assigns a level to each place noun produced by the segmentation model. In addition, the federated learning process of the place name classification model is improved by introducing knowledge distillation to accelerate training, and the loss function of the knowledge distillation method is refined to enhance the effect of distillation.

Description

Federated learning-based Chinese address word segmentation method and device
Technical Field
The invention relates to the technical field of Chinese address word segmentation, and in particular to a method and device for Chinese address word segmentation based on federated learning.
Background
With the development of artificial intelligence and distributed computing, the problem of data silos has become increasingly serious. Modern machine learning algorithms rely on large amounts of data, especially when training deep neural models on high-dimensional data such as text and images. Most data naturally originates from different organizations and is stored on different machines; because it involves private information, it cannot be shared or disseminated. It is therefore necessary to learn well-performing machine learning models while protecting user privacy. Federated learning, a technique for protecting data privacy, has become a new machine learning paradigm: it allows a model to be trained across multiple distributed organizations or servers that hold local data samples, without exchanging that data. Meanwhile, knowledge distillation, a model compression method that condenses information into a smaller model, can be applied to federated learning to improve both transmission efficiency and the overall performance of federated learning.
In real-world scenarios, organizations such as banks and enterprises each hold their users' address data. A bank may want to train an address word segmentation model for everyone to use, but for legal and other reasons it cannot require other banks to upload their data to the cloud in a centralized manner. Even within the same enterprise, different departments often keep address data locally. It has been shown that training most language models can raise legal and ethical issues, because a model can reveal users' address information in unexpected ways.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a Chinese address word segmentation method and device based on federated learning. Using federated learning, the participants jointly complete the training of an address word segmentation model on the premise that the privacy of the address data is not leaked, while maintaining both training efficiency and model accuracy.
The purpose of the invention is realized by the following technical scheme. In a first aspect, the invention provides a method for segmenting Chinese addresses based on federated learning, comprising the following steps:
(1) Train an address word segmentation model on local address data; this model is responsible for segmenting an address into a plurality of place nouns. The whole training process is carried out only locally, and the model's parameter updates do not participate in federated learning parameter transmission or sharing.
(2) Train a place name classification model with federated learning, using place nouns and place-level label data; the training process is based on horizontal federated learning. Training of the place name classification model is completed jointly by a local complex model and a federated simple model. In each round of model training, the local complex model learns from the aggregated parameters of the federated simple model and from the knowledge in the local training data, the federated simple model learns the knowledge of the local complex model, and the federated simple model then participates in horizontal federated learning; this is iterated until a preset termination condition is met.
(3) The trained address word segmentation model and place name classification model jointly complete the Chinese address word segmentation task: a complete Chinese address is fed into the address word segmentation model, which segments it into a plurality of Chinese place nouns, and the local complex model then makes a level decision for each place noun.
Further, in step (2), the training process of the place name classification model comprises three steps:
1) The local complex model learns the knowledge of the federated simple model and of the local data set through knowledge distillation;
2) After learning the knowledge of the local complex model through knowledge distillation, the federated simple model acts as a participant and uploads its updated parameters to a coordinator for parameter aggregation, thereby taking part in the federated learning training;
3) The coordinator returns the aggregated parameters, which are used to update the federated simple model.
Further, the loss function of the knowledge distillation method consists of a task loss function and a distillation loss function. The task loss function of the knowledge distillation method is

$$L_{task} = -\sum_i y_i \log \hat{y}_i^{(s)}$$

where $y_i$ represents the data annotation and $\hat{y}_i^{(s)}$ represents the prediction of the student model. The distillation loss function of the knowledge distillation method is

$$L_{distill} = -\sum_i \mathrm{softmax}\big(\hat{y}_i^{(t)}/T\big)\,\log\,\mathrm{softmax}\big(\hat{y}_i^{(s)}/T\big)$$

where $\hat{y}_i^{(t)}$ represents the prediction of the teacher model and $T$ is a hyperparameter controlling the smoothness of the distillation result. The overall loss function of the knowledge distillation method is

$$L = \alpha\, L_{distill} + (1-\alpha)\, L_{task}$$

where $\alpha$ is a hyperparameter controlling the learning emphasis of knowledge distillation: a larger $\alpha$ emphasizes fitting the teacher model, while a smaller $\alpha$ emphasizes fitting the data.
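For illustration, the combined loss above can be sketched in plain Python. The function and variable names (`kd_loss`, `softmax`) and the toy logit vectors are illustrative, not part of the patent; a real implementation would operate on framework tensors rather than lists.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a larger T yields a smoother distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, onehot_label, alpha=0.5, T=2.0):
    """Combined distillation loss: alpha weights the distillation term
    against the ordinary task cross-entropy, as in the formulas above."""
    # Task loss: cross-entropy against the ground-truth annotation y_i.
    p_student = softmax(student_logits)
    task = -sum(y * math.log(p) for y, p in zip(onehot_label, p_student))
    # Distillation loss: cross-entropy between the teacher's and the
    # student's temperature-softened outputs.
    p_t_soft = softmax(teacher_logits, T)
    p_s_soft = softmax(student_logits, T)
    distill = -sum(pt * math.log(ps) for pt, ps in zip(p_t_soft, p_s_soft))
    return alpha * distill + (1 - alpha) * task
```

With `alpha=0` the loss reduces to the plain task cross-entropy; with `alpha=1` the student learns only from the teacher's softened output.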
Further, when knowledge distillation transfers local data knowledge from the local complex model to the federated simple model, the knowledge distillation method takes the local complex model as the teacher model and the federated simple model as the student model; when knowledge distillation transfers federated learning knowledge from the federated simple model to the local complex model, the federated simple model serves as the teacher model and the local complex model as the student model.
Further, in step (1), the address word segmentation model adopts a model with information memory capacity to complete the place segmentation task; the chosen model may be a recurrent neural network model, a Bert model, or a graph neural network. Model training uses only the locally stored address segmentation annotation data set, until a preset termination condition is met.
Further, in step (2), the place name classification model uses a federated simple model to participate in federated learning, and the structure of the federated simple model must be consistent across all participants in the federated learning; for the local complex model that learns local data knowledge, each participant is allowed to use a different model.
Further, the federated simple model is a bidirectional recurrent neural network model, a convolutional neural network model, or a hidden Markov model within 10 layers.
Further, the local complex model is a Bert model or a bidirectional recurrent neural network model with a depth of more than 20 layers.
In a second aspect, the invention provides a Chinese address word segmentation device based on federated learning, comprising a memory and one or more processors; executable code is stored in the memory, and when the processors execute the executable code, the federated learning-based Chinese address word segmentation method is realized.
In a third aspect, the invention provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the Chinese address word segmentation method based on federated learning.
The invention has the following beneficial effects. On one hand, the traditional Chinese address word segmentation task is divided between an address word segmentation model for place segmentation and a place name classification model for place classification, and this division of labor improves address word segmentation accuracy. At the same time, to solve the problem that place nouns are distributed across the devices of different organizations and cannot be shared to strengthen model precision, the invention proposes a federated learning method for joint training, extracting each organization's address-level knowledge while protecting the address data from leakage. On the other hand, to preserve the performance of federated learning and avoid the transmission bottleneck caused by uploading too many updated parameters, the invention proposes a knowledge distillation method that condenses the transmitted knowledge; compared with traditional federated learning methods, it has lower transmission cost and better performance.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the federated learning-based Chinese address word segmentation method provided by the invention;
FIG. 2 is a flow chart of the place name classification model training in the federated learning-based Chinese address word segmentation method provided by the invention;
FIG. 3 is a diagram of the federated learning architecture of the place name classification model in the federated learning-based Chinese address word segmentation method provided by the invention;
FIG. 4 is a flow chart of the address word segmentation task in the federated learning-based Chinese address word segmentation method provided by the invention;
FIG. 5 is a structural diagram of the federated learning-based Chinese address word segmentation device provided by the invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
As shown in FIG. 1, the federated learning-based Chinese address word segmentation method provided by the invention comprises three steps:
(1) First, train an address word segmentation model on local address data. The model is responsible for segmenting an address into a number of place nouns; the whole training process is carried out only locally, and the model's parameter updates do not participate in federated learning parameter transmission or sharing. The training data set consists of complete addresses, for example "Wenyuan Community, Gudang Street, Xihu District, Hangzhou City, Zhejiang Province", and the model parameters are updated with local data until the model converges. The data labels specially mark the beginning and end of each place noun in the address, so that the model can recognize place-noun boundaries. The address word segmentation model adopts a model with information memory capacity to complete the place segmentation task; it may be a bidirectional recurrent neural network, a conditional random field, Bert, a graph neural network, or similar. The training task can be designed as an ordinary sequence-to-sequence labeling task, and model training uses only the locally stored address segmentation annotation data set, until a preset termination condition is met.
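As a sketch of the boundary marking described above, the hypothetical helper below converts an already-segmented address into per-character B/M/E/S tags (begin/middle/end/single, one common boundary-labeling scheme; the patent does not fix a specific tag set, and `to_bmes` is an illustrative name):

```python
def to_bmes(segments):
    """Convert a list of place nouns into per-character boundary tags:
    B = begins a noun, M = middle, E = ends a noun, S = single-character noun."""
    chars, tags = [], []
    for seg in segments:
        if len(seg) == 1:
            chars.append(seg)
            tags.append("S")
        else:
            chars.extend(seg)
            tags.extend(["B"] + ["M"] * (len(seg) - 2) + ["E"])
    return chars, tags
```

For example, `to_bmes(["浙江省", "杭州市"])` yields one `B`/`M`/`E` triple per three-character noun; a sequence model is then trained to predict these tags character by character.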
(2) Second, train the place name classification model with federated learning, using place nouns and place-level label data; the training process is based on horizontal federated learning. Training of the place name classification model is completed jointly by a local complex model and a federated simple model. In each round of model training, the local complex model learns from the aggregated parameters of the federated simple model and from the knowledge in the local training set, the federated simple model learns the knowledge of the local complex model, and the federated simple model then participates in horizontal federated learning; this is iterated until a preset termination condition is met. The specific process is as follows: 1) the local complex model learns the knowledge of the federated simple model and of the local data set through knowledge distillation; 2) after learning the knowledge of the local complex model through knowledge distillation, the federated simple model acts as a participant and uploads its updated parameters to the coordinator for parameter aggregation, taking part in the federated learning training; 3) the coordinator returns the aggregated parameters, which are used to update the federated simple model. The place name classification model combines multiple data sources to learn the classification of place nouns, identifying each noun's address level, such as province, city, district, county, or street.
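One round of the three-step process above can be sketched with toy stand-ins. `Cooperator` and `local_round` are hypothetical names, the "training" and "distillation" steps are reduced to moving one parameter vector toward another, and the aggregation is a plain average; real participants would run gradient steps with the distillation loss and a weighted aggregation strategy.

```python
class Cooperator:
    """Minimal coordinator stub: collects each participant's parameter
    vector and returns the element-wise average of all uploads."""
    def __init__(self):
        self._uploads = []

    def upload(self, params):
        self._uploads.append(list(params))

    def aggregate(self):
        n = len(self._uploads)
        dim = len(self._uploads[0])
        agg = [sum(p[i] for p in self._uploads) / n for i in range(dim)]
        self._uploads = []  # reset for the next round
        return agg

def local_round(complex_params, simple_params, cooperator, lr=0.5):
    """One participant's round, following the three steps above."""
    # Step 1: complex model distills knowledge from the federated simple model
    # (and, in a real system, from the local data set).
    complex_params = [c + lr * (s - c) for c, s in zip(complex_params, simple_params)]
    # Step 2: simple model distills from the complex model, then uploads.
    simple_params = [s + lr * (c - s) for s, c in zip(simple_params, complex_params)]
    cooperator.upload(simple_params)
    # Step 3 happens on the coordinator: aggregated parameters are returned
    # and overwrite the simple model before the next round.
    return complex_params, simple_params
```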
(3) Finally, the trained address word segmentation model and place name classification model jointly complete the Chinese address word segmentation task: a complete Chinese address is fed into the address word segmentation model, which segments it into a number of Chinese place nouns, and the local complex model then makes a level decision for each place noun, judging levels such as province, city, district, and street.
Referring to FIG. 2, which describes the training process of the place name classification model in step (2): first, the local complex model is trained on local address-noun data, which includes entries such as "Hangzhou City", "Nanjing University", "Beijing City", "High-tech District", and "Tsinghua University". The local complex model is a model with more parameters and a more complex structure, such as Bert or a bidirectional recurrent neural network with a depth of more than 20 layers. The training of the local complex model comprises two parts: 1. learning the knowledge of the local data; 2. learning the knowledge of the federated simple model through knowledge distillation. When knowledge distillation transfers federated learning knowledge from the federated simple model to the local complex model, the federated simple model serves as the teacher model and the local complex model as the student model. The overall loss function is as follows:
$$L_{task} = -\sum_i y_i \log \hat{y}_i^{(s)}$$

$$L_{distill} = -\sum_i \mathrm{softmax}\big(\hat{y}_i^{(t)}/T\big)\,\log\,\mathrm{softmax}\big(\hat{y}_i^{(s)}/T\big)$$

$$L = \alpha\, L_{distill} + (1-\alpha)\, L_{task}$$

where $\alpha$ is a hyperparameter controlling the relative weight given to the teacher model versus the data; $y_i$ represents the real data annotation; $\hat{y}_i^{(s)}$ represents the prediction of the student model, here using the local complex model's predictions as the student output; $\hat{y}_i^{(t)}$ represents the prediction of the teacher model, here using the federated simple model's predictions as the teacher output; and $T$ is a hyperparameter controlling the smoothness of the distillation result. When the training of the local complex model reaches a preset stopping condition, training stops and the next stage begins; the stopping condition may be, for example, completing 10 epochs or reaching a threshold on the model's convergence rate. The next stage trains the federated simple model through knowledge distillation, transferring the knowledge of the local complex model into it. The federated simple model adopts a model with short-term memory and a small number of parameters, such as a bidirectional recurrent neural network model, a convolutional neural network model, or a hidden Markov model within 10 layers. Here knowledge distillation transfers local data knowledge from the local complex model to the federated simple model, with the local complex model as the teacher model and the federated simple model as the student model. The overall loss function is as follows:
$$L_{task} = -\sum_i y_i \log \hat{y}_i^{(s)}$$

$$L_{distill} = -\sum_i \mathrm{softmax}\big(\hat{y}_i^{(t)}/T\big)\,\log\,\mathrm{softmax}\big(\hat{y}_i^{(s)}/T\big)$$

$$L = \alpha\, L_{distill} + (1-\alpha)\, L_{task}$$

where $\alpha$ is a hyperparameter controlling the relative weight given to the teacher model versus the data; $y_i$ represents the real data annotation; $\hat{y}_i^{(s)}$ represents the prediction of the student model, here using the federated simple model's predictions as the student output; $\hat{y}_i^{(t)}$ represents the prediction of the teacher model, here using the local complex model's predictions as the teacher output; and $T$ is a hyperparameter controlling the smoothness of the distillation result. Training of the federated simple model stops when a preset stopping condition is reached, for example completing 10 epochs or reaching a convergence-rate threshold. The difference between training the federated simple model and the local complex model lies in the settings of the hyperparameters $\alpha$ and $T$: the local complex model's training leans toward learning the knowledge of the local data, while the federated simple model's training focuses on learning the knowledge of the local complex model.
Furthermore, the federated simple model participates in the horizontal federated learning training and its parameters are updated; horizontal federated learning strategies such as FedAvg or FedSGD can be chosen. Iterative updating proceeds in this way until a preset termination condition is met, yielding the trained place name classification model.
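A minimal sketch of the FedAvg aggregation mentioned above, assuming each client's parameters arrive as a plain list of floats and are weighted by local dataset size (`fedavg` is an illustrative name, not the patent's implementation):

```python
def fedavg(client_params, client_sizes):
    """Weighted parameter averaging (FedAvg): each client's parameter
    vector is weighted by the number of local training samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(w * p[i] for w, p in zip(client_sizes, client_params)) / total
        for i in range(dim)
    ]
```

With equal client sizes this reduces to the plain average; a client holding more data pulls the aggregate proportionally toward its own parameters.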
As shown in FIG. 3, a block diagram of the federated learning framework is presented, in which each participant represents a separate entity that can contribute knowledge from local data. Each participant can use a self-defined local complex model as needed: federated learning imposes no uniform requirement on the local complex model, and only the federated simple model's structure must be consistent. Taking participant 1 as an example, participant 1's local complex model participates only in local training and transfers knowledge to the federated simple model; the federated simple model encrypts its updated parameters and uploads them to the coordinator, which aggregates the parameters with an algorithm such as weighted averaging and distributes the aggregated parameters to each participant. The encryption algorithm may be a homomorphic encryption algorithm, a secure aggregation algorithm, or similar.
As shown in FIG. 4, the procedure for performing the Chinese address word segmentation task with the federated learning-based method is as follows: input a complete address, for example "Wenyuan Community, Gudang Street, Xihu District, Hangzhou City, Zhejiang Province"; then perform address-noun word segmentation with the address word segmentation model, obtaining "Zhejiang Province", "Hangzhou City", "Xihu District", "Gudang Street", and "Wenyuan Community"; then classify the address nouns by level with the place name classification model, for example classifying "Zhejiang Province" as the province level and "Xihu District" as the district level; finally, output the Chinese address word segmentation result, in which "Zhejiang Province" is classified as a province, "Hangzhou City" as a city, "Xihu District" as a district, "Gudang Street" as a street, and "Wenyuan Community" as a residential community.
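The end-to-end decision step can be sketched as follows; the two `demo_*` functions are toy stand-ins for the trained segmentation and classification models, and the keyword-based level lookup is purely illustrative:

```python
def segment_and_rank(address, segment_fn, classify_fn):
    """Two-model decision: the address word segmentation model splits the
    address into place nouns, and the locally held classification model
    assigns each noun an administrative level."""
    nouns = segment_fn(address)
    return [(noun, classify_fn(noun)) for noun in nouns]

def demo_segment(addr):
    # Stand-in for the trained segmentation model (output is hard-coded).
    return ["Zhejiang Province", "Hangzhou City", "Xihu District",
            "Gudang Street", "Wenyuan Community"]

def demo_classify(noun):
    # Stand-in for the trained place name classification model.
    for key, level in [("Province", "province"), ("City", "city"),
                       ("District", "district"), ("Street", "street")]:
        if key in noun:
            return level
    return "residential community"
```

Calling `segment_and_rank` with a full address string returns each place noun paired with its level, mirroring the output format described above.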
Corresponding to the foregoing embodiments of the federated learning-based Chinese address word segmentation method, the invention also provides embodiments of a federated learning-based Chinese address word segmentation device.
Referring to FIG. 5, the federated learning-based Chinese address word segmentation device provided by an embodiment of the invention includes a memory and one or more processors; executable code is stored in the memory, and when the processors execute the executable code, the federated learning-based Chinese address word segmentation method of the foregoing embodiments is implemented.
The embodiments of the federated learning-based Chinese address word segmentation device can be applied to any equipment with data processing capability, such as a computer or similar device or apparatus. The device embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical device it is formed by the processor of the equipment reading the corresponding computer program instructions from non-volatile storage into memory for execution. In terms of hardware, FIG. 5 shows a hardware structure diagram of the equipment on which the federated learning-based Chinese address word segmentation device is located; besides the processor, memory, network interface, and non-volatile storage shown in FIG. 5, the equipment may also include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the invention further provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the federated learning-based Chinese address word segmentation method of the foregoing embodiments.
The computer-readable storage medium may be an internal storage unit of any of the aforementioned data-processing-capable equipment, such as a hard disk or memory. It may also be an external storage device of such equipment, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory card (Flash Card) provided on the equipment. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the equipment. It is used to store the computer program and the other programs and data required by the equipment, and may also be used to temporarily store data that has been or will be output.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (10)

1. A Chinese address word segmentation method based on federated learning, characterized by comprising the following steps:
(1) training an address word segmentation model on local address data, the model being responsible for segmenting an address into a plurality of place nouns, wherein the whole training process is carried out only locally and the model's parameter updates do not participate in federated learning parameter transmission or sharing;
(2) performing federated learning training with a place name classification model, wherein the place name classification model is trained using place nouns and place-level label data, and the training process is based on horizontal federated learning; training of the place name classification model is completed jointly by a local complex model and a federated simple model; in each round of model training, the local complex model learns the aggregated parameters of the federated simple model and the knowledge in the local training data, the federated simple model learns the knowledge of the local complex model, and the federated simple model participates in horizontal federated learning and is iteratively updated until a preset termination condition is met;
(3) completing the Chinese address word segmentation task jointly with the trained address word segmentation model and place name classification model: a complete Chinese address is fed into the address word segmentation model, which segments it into a plurality of Chinese place nouns, and the local complex model makes a level decision for each place noun.
2. The federated learning-based Chinese address word segmentation method according to claim 1, wherein in step (2) the training process of the place name classification model comprises three steps:
1) the local complex model learns the knowledge of the federated simple model and of the local data set through knowledge distillation;
2) after learning the knowledge of the local complex model through knowledge distillation, the federated simple model acts as a participant and uploads its updated parameters to a coordinator for parameter aggregation, taking part in the federated learning training;
3) the coordinator returns the aggregated parameters, which are used to update the federated simple model.
3. The federated learning-based Chinese address word segmentation method according to claim 2, wherein the loss function of the knowledge distillation method consists of a task loss function and a distillation loss function; the task loss function of the knowledge distillation method is

$$L_{task} = -\sum_i y_i \log \hat{y}_i^{(s)}$$

where $y_i$ represents the data annotation and $\hat{y}_i^{(s)}$ represents the prediction of the student model; the distillation loss function of the knowledge distillation method is

$$L_{distill} = -\sum_i \mathrm{softmax}\big(\hat{y}_i^{(t)}/T\big)\,\log\,\mathrm{softmax}\big(\hat{y}_i^{(s)}/T\big)$$

where $\hat{y}_i^{(t)}$ represents the prediction of the teacher model and $T$ is a hyperparameter controlling the smoothness of the distillation result; the overall loss function of the knowledge distillation method is

$$L = \alpha\, L_{distill} + (1-\alpha)\, L_{task}$$

where $\alpha$ is a hyperparameter controlling the learning emphasis of knowledge distillation: a larger $\alpha$ emphasizes fitting the teacher model, while a smaller $\alpha$ emphasizes fitting the data.
4. The federated learning-based Chinese address word segmentation method according to claim 2, wherein when knowledge distillation transfers local data knowledge from the local complex model to the federated simple model, the knowledge distillation method takes the local complex model as the teacher model and the federated simple model as the student model; and when knowledge distillation transfers federated learning knowledge from the federated simple model to the local complex model, the federated simple model serves as the teacher model and the local complex model as the student model.
5. The federal learning-based Chinese address word segmentation method according to claim 1, wherein in step (1) the address word segmentation model adopts a model with information memory capability to complete the address word segmentation task; the selected model is a recurrent neural network model, a Bert model, or a graph neural network, and model training is performed only with the locally stored address word segmentation annotation dataset until a preset termination condition is met.
6. The federal learning-based Chinese address word segmentation method according to claim 1, wherein in step (2) the place name classification model uses a federal simple model to participate in federated learning, and the structure of the federal simple model must be consistent across all participants in federated learning; for the local complex model, which learns local data knowledge, each participant is allowed to train with a different model.
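The structural constraint in this claim — identical federal simple models, freely chosen local complex models — can be sketched as a shape check. The layer names and sizes below are illustrative assumptions:

```python
def zeros(rows, cols):
    """Placeholder weight matrix of the given shape."""
    return [[0.0] * cols for _ in range(rows)]

def structure_signature(params):
    """Shape signature of a model's parameters: (name, rows, cols) per layer."""
    return tuple(sorted((name, len(w), len(w[0])) for name, w in params.items()))

# Federal simple models: every participant must use the identical structure
simple_a = {"emb": zeros(100, 16), "out": zeros(16, 4)}
simple_b = {"emb": zeros(100, 16), "out": zeros(16, 4)}

# Local complex models: each participant may choose its own architecture freely
complex_a = {"layer0": zeros(100, 256)}
complex_b = {"l0": zeros(100, 128), "l1": zeros(128, 64)}

assert structure_signature(simple_a) == structure_signature(simple_b)   # aggregation possible
assert structure_signature(complex_a) != structure_signature(complex_b) # allowed: never aggregated
```

Matching signatures are what make element-wise parameter aggregation at the coordinator well-defined; the complex models never leave their owners, so their shapes are unconstrained.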
7. The federal learning-based Chinese address word segmentation method according to claim 1, wherein the federal simple model is a bidirectional recurrent neural network model, a convolutional neural network model, or a hidden Markov model with 10 layers or fewer.
8. The federal learning-based Chinese address word segmentation method according to claim 1, wherein the local complex model is a Bert model or a bidirectional recurrent neural network model with a depth exceeding 20 layers.
9. A federal learning-based Chinese address word segmentation device, comprising a memory and one or more processors, wherein the memory stores executable code, and the processors, when executing the executable code, implement the federal learning-based Chinese address word segmentation method according to any one of claims 1 to 8.
10. A computer-readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the federal learning-based Chinese address word segmentation method according to any one of claims 1 to 8.
CN202211626861.1A 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning Active CN115879467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211626861.1A CN115879467B (en) 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning


Publications (2)

Publication Number Publication Date
CN115879467A true CN115879467A (en) 2023-03-31
CN115879467B CN115879467B (en) 2024-04-30

Family

ID=85753897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211626861.1A Active CN115879467B (en) 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning

Country Status (1)

Country Link
CN (1) CN115879467B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
CN114091667A (en) * 2021-11-22 2022-02-25 北京理工大学 Federal mutual learning model training method oriented to non-independent same distribution data
WO2022105714A1 (en) * 2020-11-23 2022-05-27 华为技术有限公司 Data processing method, machine learning training method and related apparatus, and device
CN115374954A (en) * 2022-07-08 2022-11-22 广东华兴银行股份有限公司 Model training method based on federal learning, terminal and storage medium


Also Published As

Publication number Publication date
CN115879467B (en) 2024-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant