CN115879467B - Chinese address word segmentation method and device based on federal learning - Google Patents

Publication number
CN115879467B
CN115879467B
Authority
CN
China
Prior art keywords
model
federal
word segmentation
address word
knowledge
Prior art date
Legal status
Active
Application number
CN202211626861.1A
Other languages
Chinese (zh)
Other versions
CN115879467A (en)
Inventor
李莹
李文龙
金路
鲍迪恩
彭聪
Current Assignee
Zhejiang Bangsheng Technology Co ltd
Original Assignee
Zhejiang Bangsheng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Bangsheng Technology Co ltd filed Critical Zhejiang Bangsheng Technology Co ltd
Priority to CN202211626861.1A priority Critical patent/CN115879467B/en
Publication of CN115879467A publication Critical patent/CN115879467A/en
Application granted granted Critical
Publication of CN115879467B publication Critical patent/CN115879467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese address word segmentation method and device based on federal learning. The Chinese address word segmentation task is split into an address word segmentation model and a place name classification model that are trained independently: the address word segmentation model is trained with local address data, while the place name classification model is trained with federal learning technology to realize joint training across multiple data sources; at decision time, the two models jointly complete the Chinese address word segmentation task. The invention is the first to propose a dual-model design for the Chinese address word segmentation task: an address word segmentation model trained on local data segments complete addresses into place nouns, and a place name classification model trained with federal learning learns the data distribution of multi-source data and classifies the place nouns produced by the address word segmentation model. The federal learning process of the place name classification model is further improved by introducing a knowledge distillation technique to accelerate training, and the loss function of the knowledge distillation method is improved to enhance the distillation effect.

Description

Chinese address word segmentation method and device based on federal learning
Technical Field
The invention relates to the technical field of Chinese address word segmentation, in particular to a Chinese address word segmentation method and device based on federal learning.
Background
With the development of artificial intelligence technology and distributed computing, the problem of data islands is becoming increasingly serious, while modern machine learning algorithms rely on large amounts of data, especially when training deep neural models on high-dimensional data such as text and images. Most data naturally comes from various organizations, is stored on different machines, and involves private information that cannot be shared or disseminated. Therefore, there is a need for a machine learning approach that achieves good learning performance while protecting user privacy. Federal learning, a technique for protecting data privacy, has become a new paradigm of machine learning in which models are trained across multiple decentralized institutions or servers, each holding local data samples, without exchanging the data itself. Meanwhile, knowledge distillation, a model compression method that concentrates information into a smaller model, can be applied to federal learning to improve information transmission efficiency and federal learning performance.
In a real-world scenario, organizations such as different enterprises each hold address data of their users. These enterprises want to jointly train an address word segmentation model for everyone's use, but cannot be asked to upload their own data sets to the cloud. Even within the same enterprise, different departments often keep address data local.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a Chinese address word segmentation method and device based on federal learning, which use federal learning technology to jointly train an address word segmentation model on the premise that address data privacy is not revealed, while ensuring training efficiency and model precision.
The aim of the invention is realized by the following technical scheme: in a first aspect, the present invention provides a chinese address word segmentation method based on federal learning, the method comprising the steps of:
(1) Training an address word segmentation model by using local address data, and dividing the address into a plurality of place nouns, wherein the whole training process is only carried out locally, and model parameter updating does not participate in federal learning parameter transmission and sharing;
(2) Federal learning training is carried out by using a place name classification model, the place name classification model is trained by using place nouns and place level label data, and the training process is realized based on a transverse federal learning technology; training of the place name classification model is completed by a local complex model and a federal simple model together; each round of model training, the local complex model learns aggregation parameters of the federal simple model and knowledge in local training data, the federal simple model learns knowledge of the local complex model, and the federal simple model participates in transverse federal learning and iterative updating until a preset termination condition is met;
(3) The trained address word segmentation model and place name classification model jointly complete the Chinese address word segmentation task; and sending a complete Chinese address into the address word segmentation model, segmenting the address word segmentation model into a plurality of Chinese place nouns by the address word segmentation model, and carrying out hierarchical decision on each place noun by the local complex model.
Further, in the step (2), the training process of the place name classification model includes three steps:
1) The local complex model learns knowledge of the federal simple model and the local data set through knowledge distillation;
2) After learning the knowledge of the local complex model through knowledge distillation, the federal simple model serves as a participant, and updated parameters are uploaded to a cooperator for parameter aggregation to participate in federal learning training;
3) The collaborators return aggregation parameters for updating the federal simple model.
Further, the loss function of the knowledge distillation method consists of a task loss function and a distillation loss function. The task loss function of the knowledge distillation method is L_task = -Σ_i y_i · log(ŷ_i), where y_i represents a data annotation and ŷ_i represents the student model prediction result. The distillation loss function of the knowledge distillation method is L_distill = -Σ_i ỹ_i^(T) · log(ŷ_i^(T)), wherein ỹ_i^(T) represents the prediction result of the teacher model and the superscript (T) denotes outputs softened with the temperature hyper-parameter T, which controls the smoothness of the distillation result. The complete loss function of the knowledge distillation method is L = α · L_distill + (1 - α) · L_task, wherein the hyper-parameter α controls the learning emphasis of the knowledge distillation: the bigger α is, the more emphasis on fitting the teacher model; the smaller α is, the more emphasis on fitting the data.
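As a concrete illustration of these loss terms, the following is a minimal NumPy sketch, assuming logits as model outputs and one-hot data annotations; the function names and the small epsilon for numerical stability are ours, not the patent's.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; a larger T yields a smoother distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Complete loss L = alpha * L_distill + (1 - alpha) * L_task."""
    eps = 1e-12  # numerical-stability constant (our addition)
    # Task loss: cross-entropy between data annotations y_i and student predictions.
    l_task = -np.sum(labels * np.log(softmax(student_logits) + eps))
    # Distillation loss: cross-entropy between temperature-softened teacher
    # and student outputs.
    l_distill = -np.sum(softmax(teacher_logits, T) * np.log(softmax(student_logits, T) + eps))
    return alpha * l_distill + (1 - alpha) * l_task
```

Swapping which model supplies student_logits and which supplies teacher_logits realizes the two distillation directions described in the following paragraph.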
Further, when the local data knowledge is transmitted to the federal simple model by knowledge distillation aiming at the local complex model, the knowledge distillation method takes the local complex model as a teacher model and takes the federal simple model as a student model; when federal learning knowledge is transferred to a local complex model by knowledge distillation for the federal simple model, the federal simple model is used as a teacher model, and the local complex model is used as a student model.
Further, in the step (1), the address word segmentation model adopts a model with information memory capability to complete a site segmentation task, the selected model is a cyclic neural network model, a Bert model or a graph neural network, and model training is performed by only using a locally stored address segmentation labeling data set until a preset termination condition is met.
Further, in the step (2), the place name classification model uses a federal simple model to participate in federal learning, and the federal simple model structure of all participants in federal learning is required to be consistent; for the locally complex model of learning local data knowledge, each participant is allowed to participate in training using a different model.
Further, the federal simple model is a bidirectional cyclic neural network model, a convolutional neural network model or a hidden Markov model within 10 layers.
further, the local complex model is a Bert model or a bi-directional recurrent neural network model with a depth exceeding 20 layers.
In a second aspect, the present invention provides a chinese address word segmentation device based on federal learning, including a memory and one or more processors, where the memory stores executable codes, and the processors implement the chinese address word segmentation method based on federal learning when executing the executable codes.
In a third aspect, the present invention provides a computer readable storage medium having stored thereon a program which, when executed by a processor, implements the federally learned chinese address word segmentation method.
The invention has the beneficial effects that: on one hand, the invention splits the traditional Chinese address word segmentation task into an address word segmentation model for site word segmentation and a place name classification model for site classification, and the segmented models can better improve the accuracy of the address word segmentation. Meanwhile, in order to solve the problem that the location nouns are distributed in the equipment of each organization and cannot share the enhancement model precision, the invention provides a federal learning method for carrying out joint training on the location nouns, and the address classification knowledge of each organization is extracted on the premise of protecting the address data from leakage. On the other hand, in order to ensure the performance of federal learning and prevent the transmission bottleneck from being reached because too many updated parameters need to be transmitted, the invention proposes to refine the transmitted knowledge by using a knowledge distillation method, and compared with the traditional federal learning method, the method has the advantages of lower transmission cost and better performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a Chinese address word segmentation method based on federal learning;
FIG. 2 is a training flow chart of a place name classification model in a Chinese address word segmentation method based on federal learning;
FIG. 3 is a diagram of a federal learning architecture of a place name classification model in a federal learning-based Chinese address word segmentation method;
FIG. 4 is a flow chart of an address word segmentation task in a Chinese address word segmentation method based on federal learning;
Fig. 5 is a block diagram of a chinese address word segmentation apparatus based on federal learning according to the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific examples.
As shown in fig. 1, the method for separating Chinese address words based on federal learning provided by the invention comprises three steps:
(1) Firstly, an address word segmentation model is trained using local address data to divide an address into a plurality of place nouns; the whole training process is carried out only locally, and the model parameter updates do not participate in federal learning parameter transmission and sharing. The training data set consists of complete addresses, for example "Literature Garden residential district, Ancient Street, West Lake District, Hangzhou City, Zhejiang Province", and local data is used to update the model parameters until the model converges. The data annotation specially marks the beginning and the end of each noun in the address, so that the model can identify the boundaries of the place name nouns. The address word segmentation model adopts a model with information memory capability to complete the place segmentation task; a bidirectional cyclic neural network, a conditional random field, Bert, a graph neural network, or the like can be selected. The task can be designed as a normal sequence-to-sequence training task, and the model is trained using only the locally stored address segmentation annotation data set until a preset termination condition is met.
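The boundary annotation just described can be illustrated with a character-level tagging sketch. The patent only states that the beginning and end of each noun are marked; the B/I/E/S tag set below is a common convention and an assumption, as is restricting the example to the province/city/district prefix of the sample address.

```python
def tag_boundaries(spans):
    """Emit one (character, tag) pair per character:
    B = noun beginning, I = inside, E = noun end, S = single-character noun."""
    pairs = []
    for span in spans:
        if len(span) == 1:
            pairs.append((span, "S"))
            continue
        pairs.append((span[0], "B"))
        pairs.extend((c, "I") for c in span[1:-1])
        pairs.append((span[-1], "E"))
    return pairs

# Place nouns of the example address prefix
# "Zhejiang Province / Hangzhou City / West Lake District".
spans = ["浙江省", "杭州市", "西湖区"]
pairs = tag_boundaries(spans)
```

A sequence model such as a bidirectional recurrent network is then trained to predict these tags character by character, and noun boundaries are read off the B/E (or S) positions.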
(2) Federal learning training is performed using a place name classification model, which is trained with place nouns and place level label data; the training process is based on transverse (horizontal) federal learning technology. Training of the place name classification model is completed jointly by a local complex model and a federal simple model. In each round of model training, the local complex model learns the aggregation parameters of the federal simple model and the knowledge in the local training set, the federal simple model learns the knowledge of the local complex model, and the federal simple model then participates in transverse federal learning; this is iterated until a preset termination condition is met. The specific process is as follows: 1) the local complex model learns knowledge of the federal simple model and the local data set through knowledge distillation; 2) after learning the knowledge of the local complex model through knowledge distillation, the federal simple model, as a participant, uploads its updated parameters to a collaborator for parameter aggregation, thereby participating in federal learning training; 3) the collaborator returns the aggregated parameters, which are used to update the federal simple model. The place name classification model learns the classification of place name nouns jointly from multiple data sources and identifies the address level of each place name noun, including "province", "city", "district", "county", "street", and the like.
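The three-step round just listed can be sketched structurally as follows; the StubModel and StubCollaborator classes are placeholders standing in for the real models and the parameter-aggregation service, not the patent's implementation.

```python
class StubModel:
    """Placeholder for a trainable model; real training would minimize the KD loss."""
    def __init__(self, params):
        self.params = params
        self.last_teacher = None

    def fit(self, data, teacher=None):
        self.last_teacher = teacher  # record who taught this model in the round

    def parameters(self):
        return self.params

    def set_parameters(self, params):
        self.params = params

class StubCollaborator:
    """Placeholder collaborator; the real one aggregates uploads from all participants."""
    def aggregate(self, uploaded_params):
        return uploaded_params  # identity aggregation for the sketch

def train_round(local_data, complex_model, simple_model, collaborator):
    """One training round of the place name classification model, per steps 1)-3)."""
    complex_model.fit(local_data, teacher=simple_model)  # 1) complex distills from simple + local data
    simple_model.fit(local_data, teacher=complex_model)  # 2) simple distills from complex, then uploads
    aggregated = collaborator.aggregate(simple_model.parameters())
    simple_model.set_parameters(aggregated)              # 3) returned aggregate updates the simple model
```

The round is repeated until the preset termination condition is met; only the simple model's parameters ever leave the participant.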
(3) Finally, the trained address word segmentation model and place name classification model jointly complete the Chinese address word segmentation task: a complete Chinese address is sent into the address word segmentation model, which segments it into a plurality of Chinese place nouns, and the local complex model then makes a hierarchical decision for each place noun, judging levels such as province, city, district and street.
Referring to fig. 2, the training process of the place name classification model in step (2) is described. First, the local complex model is trained using local address noun data, where the address noun data includes "Hangzhou City", "Nanjing University", "Beijing City", "Gaohen District", "Tsinghua University", etc. The local complex model uses a model with more parameters and a more complex structure; Bert, a bidirectional cyclic neural network with a depth exceeding 20 layers, or the like can be selected. The training process of the local complex model includes two parts: 1. learning knowledge of the local data; 2. learning knowledge of the federal simple model through knowledge distillation. When federal learning knowledge is transferred to the local complex model through knowledge distillation, the federal simple model serves as the teacher model and the local complex model serves as the student model. The complete loss function is as follows:
L = α · L_distill + (1 - α) · L_task = -α · Σ_i ỹ_i^(T) · log(ŷ_i^(T)) - (1 - α) · Σ_i y_i · log(ŷ_i)
where α is a hyper-parameter controlling the learning emphasis between the teacher model and the data; y_i represents the real data annotation; ŷ_i represents the prediction result of the student model, here the local complex model prediction used as the student output; ỹ_i represents the prediction result of the teacher model, here the federal simple model prediction used as the teacher output; and T is a temperature hyper-parameter controlling the smoothness of the distillation result. When the training of the local complex model reaches a preset stopping condition, training stops and the next stage begins; the preset stopping condition can be training for 10 epochs or reaching a threshold on the model convergence rate.
The next stage trains the federal simple model by knowledge distillation, transferring the knowledge of the local complex model to the federal simple model. The federal simple model adopts a model with short-term memory capability and a small number of parameters; a bidirectional cyclic neural network model, a convolutional neural network model, a hidden Markov model, or the like within 10 layers can be selected. When local data knowledge is transferred to the federal simple model through knowledge distillation, the knowledge distillation method takes the local complex model as the teacher model and the federal simple model as the student model. The complete loss function has the same form as above, where ŷ_i now represents the federal simple model prediction used as the student output and ỹ_i represents the local complex model prediction used as the teacher output. Training of the federal simple model stops when a preset stopping condition is reached, which can be training for 10 epochs or reaching a threshold on the model convergence rate. The difference between federal simple model training and local complex model training lies in the settings of the hyper-parameters α and T: the local complex model training tends to learn more of the knowledge in the local data, while the federal simple model training focuses more on learning the knowledge of the local complex model.
Furthermore, the federal simple model participates in the transverse federal learning training to update its parameters; FedAvg, FedSGD, or another transverse federal learning strategy can be selected. Iterative updating is performed until a preset termination condition is met, yielding the trained place name classification model.
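The parameter aggregation performed by the collaborator under a FedAvg-style strategy can be sketched as a sample-count-weighted average of the participants' federal simple model parameters; representing each model as a flat list of parameter arrays is an assumption for illustration.

```python
import numpy as np

def fedavg(param_sets, sample_counts):
    """Weighted average of participants' federal simple model parameters,
    weighted by each participant's local sample count (FedAvg-style)."""
    total = float(sum(sample_counts))
    weights = [n / total for n in sample_counts]
    n_tensors = len(param_sets[0])
    # Average each parameter tensor position across all participants.
    return [
        sum(w * params[i] for w, params in zip(weights, param_sets))
        for i in range(n_tensors)
    ]
```

In the patent's setting the uploads would additionally be encrypted (e.g. with homomorphic encryption or secure aggregation) before the collaborator averages them.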
As shown in fig. 3, a block diagram of the federal learning framework, each participant represents an independent entity that can provide local data knowledge. Each participant can use a customized local complex model according to its needs; federal learning makes no uniform requirement on the local complex model, and only requires a consistent model structure for the federal simple model. Taking participant 1 as an example, the local complex model of participant 1 participates only in local training and transfers knowledge to the federal simple model; the federal simple model encrypts its updated parameters and uploads them to the collaborator, and the collaborator performs parameter aggregation using a weighted average algorithm and distributes the aggregated parameters to all participants. The encryption algorithm may be a homomorphic encryption algorithm, a secure aggregation algorithm, or the like.
As shown in fig. 4, the flow of performing the Chinese address word segmentation task with the federal learning-based method is as follows: first, a complete address is input, for example "Literature Garden residential district, Ancient Street, West Lake District, Hangzhou City, Zhejiang Province"; then the address word segmentation model performs address noun segmentation to obtain "Zhejiang Province", "Hangzhou City", "West Lake District", "Ancient Street", "Literature Garden district"; then the place name classification model classifies each address noun, for example classifying "Zhejiang Province" to the province level and "West Lake District" to the district level; finally, the Chinese address word segmentation result is output: "Zhejiang Province" is classified as province level, "Hangzhou City" as city level, "West Lake District" as district level, "Ancient Street" as street level, and "Literature Garden district" as a residential area.
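The decision flow of fig. 4 — segment the address first, then classify each place noun — can be sketched as follows; the segment and classify callables stand in for the two trained models, and the demo stubs ('/'-delimited input, suffix lookup) are hypothetical, not the patent's models.

```python
def parse_address(address, segment, classify):
    """Joint decision: the address word segmentation model splits the address into
    place nouns, then the place name classification model assigns each a level."""
    return [(noun, classify(noun)) for noun in segment(address)]

# Hypothetical stand-ins for the two trained models.
def demo_segment(address):
    return address.split("/")  # pretend the segmentation is pre-marked with "/"

def demo_classify(noun):
    # Longer suffixes first so "小区" (residential area) wins over "区" (district).
    levels = [("小区", "residential area"), ("街道", "street"),
              ("省", "province"), ("市", "city"), ("区", "district")]
    for suffix, level in levels:
        if noun.endswith(suffix):
            return level
    return "unknown"

result = parse_address("浙江省/杭州市/西湖区", demo_segment, demo_classify)
```

In the actual method, demo_segment would be the trained address word segmentation model and demo_classify the local complex model of the place name classification stage.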
Corresponding to the embodiment of the Chinese address word segmentation method based on the federal learning, the invention also provides the embodiment of the Chinese address word segmentation device based on the federal learning.
Referring to fig. 5, the chinese address word segmentation device based on federation learning provided by the embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and the processors are configured to implement the chinese address word segmentation method based on federation learning in the above embodiment when executing the executable codes.
The embodiment of the Chinese address word segmentation device based on federal learning can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from nonvolatile memory into memory for execution. In terms of hardware, fig. 5 shows a hardware structure diagram of a device with data processing capability in which the Chinese address word segmentation device based on federal learning of the present invention is located; in addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 5, the device in the embodiment generally also includes other hardware according to its actual function, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored on the computer readable storage medium, and when the program is executed by a processor, the Chinese address word segmentation method based on federal learning in the embodiment is realized.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), an SD card, or a flash card provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the device. The computer readable storage medium is used for storing the computer program and other programs and data required by the device with data processing capability, and may also be used for temporarily storing data that has been output or is to be output.
The above-described embodiments are intended to illustrate the present invention, not to limit it, and any modifications and variations made thereto are within the spirit of the invention and the scope of the appended claims.

Claims (7)

1. A Chinese address word segmentation method based on federal learning is characterized by comprising the following steps:
(1) Training an address word segmentation model by using local address data, and dividing the address into a plurality of place nouns, wherein the whole training process is only carried out locally, and model parameter updating does not participate in federal learning parameter transmission and sharing;
(2) Federal learning training is carried out by using a place name classification model, the place name classification model is trained by using place nouns and place level label data, and the training process is realized based on a transverse federal learning technology; training of the place name classification model is completed by a local complex model and a federal simple model together; each round of model training, the local complex model learns aggregation parameters of the federal simple model and knowledge in local training data, the federal simple model learns knowledge of the local complex model, and the federal simple model participates in transverse federal learning and iterative updating until a preset termination condition is met; the training process of the place name classification model comprises three steps:
1) The local complex model learns knowledge of the federal simple model and the local data set through knowledge distillation; the loss function of the knowledge distillation method consists of a task loss function and a distillation loss function; the task loss function of the knowledge distillation method is L_task = -Σ_i y_i · log(ŷ_i), where y_i represents a data annotation and ŷ_i represents the student model prediction result; the distillation loss function of the knowledge distillation method is L_distill = -Σ_i ỹ_i^(T) · log(ŷ_i^(T)), wherein ỹ_i^(T) represents the prediction result of the teacher model and T is a temperature hyper-parameter controlling the smoothness of the distillation result; the complete loss function of the knowledge distillation method is L = α · L_distill + (1 - α) · L_task, wherein α is a hyper-parameter controlling the learning emphasis of knowledge distillation: the bigger α is, the more emphasis on fitting the teacher model, and the smaller α is, the more emphasis on fitting the data; when local data knowledge is transmitted to the federal simple model through knowledge distillation for the local complex model, the knowledge distillation method takes the local complex model as the teacher model and the federal simple model as the student model; when federal learning knowledge is transferred to the local complex model by knowledge distillation for the federal simple model, the federal simple model is used as the teacher model, and the local complex model is used as the student model;
2) After learning the knowledge of the local complex model through knowledge distillation, the federal simple model serves as a participant, and updated parameters are uploaded to a cooperator for parameter aggregation to participate in federal learning training;
3) The cooperator returns aggregation parameters for updating the federal simple model;
(3) The trained address word segmentation model and place name classification model jointly complete the Chinese address word segmentation task; and sending a complete Chinese address into the address word segmentation model, segmenting the address word segmentation model into a plurality of Chinese place nouns by the address word segmentation model, and carrying out hierarchical decision on each place noun by the local complex model.
2. The method of claim 1, wherein in the step (1), the address word segmentation model adopts a model with information memory capability to complete the place segmentation task, the selected model is a cyclic neural network model, a Bert model or a graph neural network, and the model training is performed only by using a locally stored address segmentation annotation data set until a preset termination condition is met.
3. The method of claim 1, wherein in the step (2), the place name classification model uses a federal simple model to participate in federal learning, and requires that federal simple model structures of all participants in federal learning must be consistent; for the locally complex model of learning local data knowledge, each participant is allowed to participate in training using a different model.
4. The federal learning-based chinese address word segmentation method of claim 1, wherein the federal simple model is a bi-directional cyclic neural network model, a convolutional neural network model, or a hidden markov model within 10 layers.
5. The federally learned chinese address word segmentation method of claim 1, wherein the local complex model is a Bert model or a bi-directional recurrent neural network model with a depth exceeding 20 layers.
6. A chinese address word segmentation device based on federal learning, comprising a memory and one or more processors, wherein the memory stores executable code, and wherein the processor implements the chinese address word segmentation method based on federal learning of any one of claims 1-5 when executing the executable code.
7. A computer-readable storage medium having a program stored thereon which, when executed by a processor, implements the federal learning-based Chinese address word segmentation method of any one of claims 1-5.
CN202211626861.1A 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning Active CN115879467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211626861.1A CN115879467B (en) 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211626861.1A CN115879467B (en) 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning

Publications (2)

Publication Number Publication Date
CN115879467A (en) 2023-03-31
CN115879467B (en) 2024-04-30

Family

ID=85753897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211626861.1A Active CN115879467B (en) 2022-12-16 2022-12-16 Chinese address word segmentation method and device based on federal learning

Country Status (1)

Country Link
CN (1) CN115879467B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657607A (en) * 2021-08-05 2021-11-16 浙江大学 Continuous learning method for federal learning
CN114091667A (en) * 2021-11-22 2022-02-25 北京理工大学 Federal mutual learning model training method oriented to non-independent same distribution data
WO2022105714A1 (en) * 2020-11-23 2022-05-27 华为技术有限公司 Data processing method, machine learning training method and related apparatus, and device
CN115374954A (en) * 2022-07-08 2022-11-22 广东华兴银行股份有限公司 Model training method based on federal learning, terminal and storage medium


Also Published As

Publication number Publication date
CN115879467A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
Wang et al. Sentigan: Generating sentimental texts via mixture adversarial networks.
Bibri Big data science and analytics for smart sustainable urbanism
CN112085159B (en) User tag data prediction system, method and device and electronic equipment
Mandal et al. Emerging technology in modelling and graphics
Feldman et al. Strategyproof facility location and the least squares objective
CN109919209B (en) Domain self-adaptive deep learning method and readable storage medium
Jaiswal et al. Unsupervised adversarial invariance
CN111754596A (en) Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium
Goh et al. Food-image Classification Using Neural Network Model
CN109684797B (en) Virtual IP protection method and system for confrontation network generated picture based on block chain
WO2023050754A1 (en) Model training method and apparatus for private data set
CN112163238B (en) Network model training method for multi-party participation data unshared
Wu et al. Modeling visual and word-conditional semantic attention for image captioning
Fu et al. Using natural Language Processing to read plans: a study of 78 resilience plans from the 100 resilient cities network
CN115879467B (en) Chinese address word segmentation method and device based on federal learning
Farokhi et al. Enhancing the performance of automated grade prediction in mooc using graph representation learning
US20230351153A1 (en) Knowledge graph reasoning model, system, and reasoning method based on bayesian few-shot learning
Sidiropoulos et al. Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning
Li et al. VFed-SSD: Towards practical vertical federated advertising
Yang et al. Split learning based on self-supervised learning
Rana et al. Rough set based system for effective e-learning
Kouvakis et al. Semantic communications for image-based sign language transmission
Bentaleb et al. A survey of federated learning approach for the Sustainable Development aspect: eLearning
Lv et al. DomainForensics: Exposing Face Forgery across Domains via Bi-directional Adaptation
Efthymiou et al. Using AI Changes the Paradigm of Women's Participation in Politics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant