WO2023185082A1 - Training method and training device for language representation model - Google Patents

Training method and training device for language representation model

Info

Publication number
WO2023185082A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
language data
task
data
masking
Prior art date
Application number
PCT/CN2022/137523
Other languages
French (fr)
Chinese (zh)
Inventor
陶建军
乔楠
张雷
苏嘉
何彬
沈雯
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2023185082A1 publication Critical patent/WO2023185082A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates

Definitions

  • Embodiments of the present application relate to the field of artificial intelligence, and in particular, to a training method and training device for a language representation model.
  • Natural language processing is an important direction in the fields of computer science and artificial intelligence.
  • In natural language processing technology, language representation models can represent text information in natural language as vectors, so that methods such as neural networks can be applied to that text information to perform tasks such as machine translation, sentiment analysis, and auxiliary diagnosis.
  • At present, language representation models are trained on a large amount of general natural language data.
  • When such a trained language representation model performs downstream tasks in a specific field, that field has its own professional language; if the model represents this professional language poorly, its application in the specific field will be greatly limited.
  • Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the accuracy of the language representation model.
  • The first aspect of the embodiments of the present application provides a method for training a language representation model.
  • The method can be executed by a computing device, or by a component of the computing device such as a processor, a chip, or a chip system; it can also be implemented by logic modules or software that realize all or part of the functions of the computing device.
  • Taking execution by a computing device as an example, the method provided in the first aspect includes: the computing device obtains first language data, where the first language data includes multiple different types of language data.
  • the computing device performs multi-task learning based on the language representation model and the first language data to obtain a target loss function.
  • the target loss function can indicate the deviation between the output result of the language representation model and the target result.
  • Multi-task learning includes masked language tasks and relational classification tasks.
  • The masked language task is executed based on the data generated by the language representation model,
  • and the relational classification task is executed based on the data generated by the language representation model.
  • the computing device updates the language representation model according to the target loss function.
  • In the embodiments of the present application, the computing device performs multi-task learning based on the language representation model during training.
  • Specifically, the language representation model is trained by performing the masked language task and the relationship classification task, so that the training process incorporates the knowledge entities in the first language data and the associations between those knowledge entities; this multi-task, multi-data training improves the accuracy of the language representation model.
  • In a possible implementation, during the multi-task learning the computing device performs a masked language task on the first language data based on the language representation model to obtain a first loss function.
  • the computing device maps the association relationship of the knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain the second language data.
  • the computing device performs a relationship classification task on the second language data based on the language representation model to obtain a second loss function.
  • the computing device determines a target loss function based on the first loss function and the second loss function.
  • Because the computing device maps the associations between knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data and performs the relationship classification task on that second language data, the training process integrates the associations between knowledge entities in the knowledge graph; this makes the language representation model represent language data in the specific field covered by the knowledge graph more accurately and improves the accuracy of the model.
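  • Purely as an illustration of the possible implementation above, the following PyTorch-style sketch shows one way the two loss functions could be combined into a target loss function; the module names (encoder, mlm_head, rc_head) and the batch fields are assumptions for illustration, not details taken from this application.

```python
import torch


def multi_task_step(encoder, mlm_head, rc_head, mlm_batch, rc_batch, optimizer):
    """One hypothetical multi-task update: masked language loss plus relation classification loss."""
    # First loss function: masked language task on the first language data.
    mlm_logits = mlm_head(encoder(mlm_batch["input_ids"]))
    loss_mlm = torch.nn.functional.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_batch["labels"].view(-1),
        ignore_index=-100,  # positions that were not masked carry no loss
    )

    # Second loss function: relation classification task on the second language data.
    rc_logits = rc_head(encoder(rc_batch["input_ids"]))
    loss_rc = torch.nn.functional.cross_entropy(rc_logits, rc_batch["relation_labels"])

    # Target loss function: here simply the sum of the two losses, as in the description.
    target_loss = loss_mlm + loss_rc
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return target_loss.item()
```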
  • Before mapping the knowledge-entity associations in the knowledge graph onto the knowledge entities in the first language data, the computing device performs named entity recognition (NER) on the first language data; specifically, it extracts knowledge entities from the first language data to obtain the knowledge entities of the first language data. The computing device can perform NER based on a built-in NER module or based on an external NER module; this is not specifically limited.
  • In other words, before the mapping step the computing device must first extract the knowledge entities from the first language data to obtain the knowledge entities of the first language data, which improves the achievability of the solution.
  • In a possible implementation, when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationships of that knowledge entity in the knowledge graph onto the knowledge entity in the first language data to obtain the second language data.
  • That is, before mapping an association relationship from the knowledge graph onto a knowledge entity in the first language data, the computing device uses the semantics of the knowledge entity in the knowledge graph and the semantics of the knowledge entity in the first language data to determine whether they are the same knowledge entity, thereby improving the accuracy of the mapping process.
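  • As a minimal sketch of the similarity check described above (assuming the knowledge entities have already been embedded as vectors; the embedding step and the threshold value of 0.8 are assumptions for illustration), the mapping could be gated as follows:

```python
import numpy as np


def should_map(kg_entity_vec: np.ndarray, text_entity_vec: np.ndarray, threshold: float = 0.8) -> bool:
    """Map the knowledge-graph association onto the text entity only when the two entity
    vectors are semantically similar enough (cosine similarity above a preset threshold)."""
    cosine = float(
        np.dot(kg_entity_vec, text_entity_vec)
        / (np.linalg.norm(kg_entity_vec) * np.linalg.norm(text_entity_vec))
    )
    return cosine > threshold
```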
  • In a possible implementation, when the computing device performs the masked language task on the first language data, it performs the task according to the masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data.
  • the masking algorithm includes full-word masking, entity masking, and entity masking based on word frequency and inverse document frequency.
  • the first language data in the embodiments of the present application includes multiple types of language data, and the computing device can perform masking language tasks on multiple types of first language data based on multiple masking algorithms, thereby improving the richness of the solution.
  • When the computing device performs the masked language task according to the masking algorithm corresponding to the first language data: if the first language data is natural language data, the whole-word masking algorithm is used to perform the masked language task on the first language data;
  • if the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm;
  • if the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the computing device selects different masking algorithms for different types of first language data, thereby improving the accuracy of the masking task and further improving the accuracy of the language representation model.
  • the computing device can sequentially perform masking language tasks on the multiple types of first language data.
  • the computing device in the embodiment of the present application can perform multi-level masking language tasks on multiple types of first language data, thereby further improving the training accuracy of the language representation model.
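  • The per-type selection of a masking algorithm can be pictured with the following sketch; the string identifiers are hypothetical labels, and the actual masking procedures are described in the embodiments below.

```python
def choose_masking_algorithm(data_type: str) -> str:
    """Pick the masking algorithm that matches the type of first language data."""
    algorithms = {
        "natural_language": "whole_word_masking",        # general corpora
        "domain_language": "entity_masking",             # e.g. medical literature
        "domain_task_language": "tfidf_entity_masking",  # e.g. electronic medical records
    }
    if data_type not in algorithms:
        raise ValueError(f"unknown data type: {data_type}")
    return algorithms[data_type]
```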
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • The computing device performs downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: electronic medical record structuring, assisted diagnosis, or intelligent consultation.
  • After the computing device updates the language representation model based on the target loss function, it can perform a variety of downstream tasks based on the updated language representation model, thereby improving the performance of the computing device in performing those downstream tasks.
  • the second aspect of the embodiment of the present application provides a training device for a language representation model, including an acquisition unit and a processing unit.
  • the acquisition unit is used to acquire the first language data.
  • the processing unit is used to perform multi-task learning based on the language representation model and first language data to obtain the target loss function.
  • the multi-task learning includes a mask language task and a relationship classification task.
  • The masked language task is used to perform the masked language task based on the data generated by the language representation model,
  • and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model.
  • the processing unit is also used to update the language representation model based on the target loss function.
  • The processing unit is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain a first loss function, and to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data
  • to obtain second language data; the relationship classification task is then performed on the second language data based on the language representation model to obtain a second loss function.
  • the target loss function is determined based on the first loss function and the second loss function.
  • the processing unit is also used to perform named entity recognition NER on the first language data to obtain the knowledge entities in the first language data.
  • The processing unit is specifically configured to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data when the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data exceeds a preset threshold.
  • the processing unit is specifically configured to perform a masking language task on the first language data according to a masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data; the masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • the processing unit is specifically configured to perform a masking language task on the first language data according to a whole-word masking algorithm when the first language data is natural language data.
  • When the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • the processing unit is also used to perform downstream tasks based on the updated language representation model.
  • Downstream tasks include one or more of the following: electronic medical record structuring, auxiliary diagnosis, or intelligent consultation.
  • The third aspect of the embodiments of the present application provides a computing device including a processor coupled to a memory; the memory is configured to store instructions, and when the instructions are executed by the processor, the computing device executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which instructions are stored.
  • When the instructions are executed, a computer executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • the fifth aspect of the embodiments of the present application provides a computer program product.
  • the computer program product includes instructions. When the instructions are executed, the computer implements the method described in the first aspect or any possible implementation manner of the first aspect.
  • Figure 1a is a schematic diagram of the training system architecture of a language representation model provided by an embodiment of the present application
  • Figure 1b is a schematic diagram of the training process framework of a language representation model provided by an embodiment of the present application
  • Figure 2 is a schematic flowchart of a training method for a language representation model provided by an embodiment of the present application
  • Figure 3 is a schematic flowchart of a training method for another language representation model provided by an embodiment of the present application
  • Figure 4a is a schematic diagram of performing a masked language task provided by an embodiment of the present application.
  • Figure 4b is another schematic diagram of performing a masked language task provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of performing a relationship classification task provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the accuracy of the language representation model.
  • Mask refers to the technology of covering some words in the language data during the pre-training process of the language representation model.
  • A language model is also called a language representation model.
  • A language model is a simple, unified, and abstract formal system.
  • Language data represented by the language model becomes language data suitable for automatic processing by a computing device.
  • Entity is also called knowledge entity. Entity is the abstraction of objective individuals. A person, a movie, or a sentence can be regarded as an entity.
  • Named entity recognition refers to the identification of entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns, etc. Named entity recognition is an important basic tool for information extraction, question answering systems, syntactic analysis, machine translation, knowledge graphs and other applications.
  • Medical knowledge graph is a professional graph that integrates knowledge graph theory with doctors' clinical medical knowledge to connect medical knowledge points (information, data) and the internal logical mechanisms of medical knowledge.
  • EMR: electronic medical record.
  • A triplet contains a head entity, a relationship, and a tail entity, where the head entity is the subject, the relationship is the predicate, and the tail entity is the object; for example, in the triplet (Beijing, is-the-capital-of, China), "Beijing" is the head entity, "is-the-capital-of" is the relationship, and "China" is the tail entity.
  • Natural language refers to a language that naturally evolves with culture.
  • Cross entropy is an important concept in Shannon information theory, which is mainly used to measure the difference information between two probability distributions.
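  • Written out for two discrete distributions, with p the labels and q the model's predicted distribution, cross entropy is:

```latex
H(p, q) = -\sum_{x} p(x)\,\log q(x)
```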
  • Figure 1a is a schematic system architecture diagram of an application system in which a language representation model provided by an embodiment of the present application is located.
  • the application system 100 includes a multi-task learning module 101, a data module 102 and a downstream task module 103.
  • The multi-task learning module 101 is used to train the language representation model by performing multiple tasks,
  • the data module 102 is used to provide training data for the language representation model,
  • and the downstream task module 103 is configured to perform various downstream tasks according to the language representation model.
  • the multi-task learning module 101 includes a mask language task sub-module 1011 and a relationship classification task sub-module 1012.
  • the masked language task module 1011 is used to perform the masked language task based on the language representation model.
  • The masked language task refers to covering some words in the language data during training and making semantic predictions for the covered words; the loss function computed from the semantic prediction results is then used to update the language representation model.
  • the relationship classification task sub-module 1012 is used to perform a relationship classification task based on the language representation model.
  • The relationship classification task means that, during training, the computing device combines the knowledge graph to predict the relationships of the knowledge entities in the language data; the loss function computed from the relationship prediction results is then used to update the language representation model.
  • the multi-task learning module 101 can calculate a target loss function based on the loss function of the mask language task and the loss function of the relationship classification task, and update the language representation model based on the target loss function.
  • the data module 102 is used to provide training data for the language representation model.
  • the training data of the language representation model includes natural language data, domain language data and domain task language data.
  • the domain language data refers to the general data in a specific field
  • the domain task language data refers to the language data generated by performing specific tasks in a certain field.
  • the training data provided by the data module 102 can be used to perform named entity recognition (NER) to obtain knowledge entities, or can be used to perform knowledge extraction to obtain a knowledge graph.
  • the NER module that performs named entity recognition in the embodiment of this application can be a built-in module in the system or an external module.
  • the knowledge graph in the embodiments of this application can be a knowledge graph extracted from training data, or an external knowledge graph input by the user, and is not specifically limited.
  • the downstream task module 103 is configured to perform downstream tasks based on the trained language representation model.
  • the trained language representation model supports a variety of downstream tasks in the field. For example, in the medical field, downstream tasks include electronic medical record structuring tasks, assisted diagnosis or intelligent consultation, etc.
  • Figure 1b is a schematic diagram of the training framework of a language representation model provided by an embodiment of the present application.
  • the computing device outputs a language representation model after performing continuous multi-level learning based on a variety of first language data.
  • the first language data includes natural language data, domain language data, and domain task language data.
  • the computing device can perform multi-task learning based on the domain language data, domain task language data and the language representation model, and update the language representation model according to the target loss function obtained by multi-task learning.
  • multi-task learning includes masked language tasks and relationship classification tasks.
  • When the computing device performs the masked language task based on the language representation model, it selects different masking algorithms for first language data with different characteristics.
  • The masking algorithms include the whole-word masking algorithm, the entity masking algorithm, and the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • Before the computing device performs the masked language task according to the entity masking algorithm or the entity masking algorithm based on term frequency and inverse document frequency, it needs to perform named entity recognition on the first language data to obtain the knowledge entities, and further calculate the term frequency and inverse document frequency of the obtained knowledge entities to obtain the knowledge entities with high term frequency and inverse document frequency.
  • When the computing device uses the entity masking algorithm to perform the masked language task, it performs the masking operation on the domain language data based on the above knowledge entities.
  • When the computing device uses the entity masking algorithm based on term frequency and inverse document frequency, it performs the masking operation on the domain task language data based on the knowledge entities with high term frequency and inverse document frequency.
  • The computing device also maps the entity associations in the knowledge graph onto the knowledge entities in the first language data, and performs the relationship classification task based on the mapped first language data and the language representation model.
  • Figure 2 is a training method for a language representation model provided by an embodiment of the present application.
  • the training method of the language representation model includes but is not limited to the following steps:
  • the computing device obtains first language data.
  • the computing device obtains first language data, and the first language data is used to train a language representation model.
  • First language data includes multiple types of data input by users, including natural language data, domain language data, and domain task language data.
  • natural language data refers to language data that naturally evolves with culture
  • domain language data refers to data in a specific field
  • domain task language data refers to data generated by performing specific tasks in a specific domain.
  • Domain language data is, for example, language data in medical materials.
  • Domain task language data is, for example, language data generated in a hospital electronic medical record system or language data generated in a hospital imaging examination.
  • the first language data is also used to perform named entity recognition to obtain knowledge entities.
  • After the computing device obtains the first language data, it performs named entity recognition on the first language data based on the named entity recognition (NER) module to obtain the knowledge entities of the first language data.
  • The named entity recognition module can be a built-in NER module or an external NER module; this is not specifically limited.
  • When performing the masked language task, the computing device performs the masking operation on the first language data based on the knowledge entities of the first language data.
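  • The NER step is treated as a pluggable module (built-in or external). Purely for illustration, a dictionary-based stand-in could look like the sketch below; the entity dictionary is an assumption, and a real NER module would typically be a trained sequence-labelling model.

```python
def recognize_entities(text: str, entity_dictionary: set[str]) -> list[str]:
    """Toy stand-in for the NER module: return every dictionary entity that occurs in the text."""
    return [entity for entity in entity_dictionary if entity in text]
```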
  • After the computing device obtains the first language data, it also needs to obtain a knowledge graph.
  • the knowledge graph may be a knowledge graph generated based on the first language data, or it may be an external knowledge graph input by the user.
  • When the knowledge graph is generated based on the first language data, the computing device, after obtaining the first language data, generates the knowledge graph based on the correlations between the knowledge entities in the first language data.
  • Figure 3 is a schematic diagram of a training method for a language representation model provided by an embodiment of the present application.
  • the computing device acquires the first language data, and the first language data is a required input for the computing device to train the language representation model.
  • the computing device also acquires the knowledge graph and acquires the knowledge entities based on the NER module.
  • the acquired knowledge graph may be an external knowledge graph input by the user, and the external knowledge graph may replace the built-in knowledge graph of the computing device.
  • the NER module that performs named entity recognition can also be an external NER module input by the user, and the external NER module can replace the internal NER module of the computing device.
  • External knowledge graphs and external NER modules can be used as optional inputs for computing devices to train language representation models.
  • the computing device performs multi-task learning based on the first language data and the language representation model to obtain the target loss function.
  • the multi-task learning includes masking language tasks and relationship classification tasks.
  • The computing device performs the masked language task on the first language data based on the language representation model to obtain the first loss function.
  • Specifically, the computing device performs the masked language task based on the knowledge entities of the first language data and the language representation model.
  • The knowledge entities of the first language data are the knowledge entities obtained by the computing device through named entity recognition on the first language data.
  • the computing device maps the association between the knowledge entities in the knowledge graph to the knowledge entities of the first language data to obtain the second language data.
  • The computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function.
  • The computing device determines the target loss function based on the first loss function and the second loss function.
  • the computing device performs the masked language task.
  • the process of the computing device performing the masking language task is introduced.
  • The computing device performs the masked language task on the first language data through a multi-level masking algorithm.
  • The masking algorithms in the embodiments of this application include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • When the first language data is natural language data, the computing device performs the masked language task on the first language data according to the whole-word masking algorithm.
  • When the first language data is domain language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • FIG 4a is a schematic diagram of a multi-level masking algorithm provided by an embodiment of the present application.
  • (a) is a schematic diagram of the whole word mask (WWM) algorithm.
  • In (a), the first language data is natural language data.
  • The masked words are common words, such as words obtained from general dictionaries or general books; specifically, these words can be obtained through Chinese word segmentation tools.
  • (b) is a schematic diagram of the entity mask (EM) algorithm.
  • In (b), the first language data is domain language data.
  • The masked words include entity words obtained through named entity recognition, or entity words obtained by the computing device based on the knowledge entities in the knowledge graph.
  • For example, in the sentence "First use azithromycin anti-inflammatory drugs for treatment", the word masked by the entity masking algorithm is "azithromycin", which is an entity word in the domain language data.
  • (c) is a schematic diagram of the entity masking algorithm based on term frequency and inverse document frequency (term frequency-inverse document frequency, TF-IDF).
  • In (c), the first language data is domain task language data.
  • The masked words are the entity words mentioned above that are selected based on the product of term frequency and inverse document frequency.
  • Term frequency (TF) represents how frequently a term appears in a document.
  • Inverse document frequency (IDF) is the logarithm of the ratio of the total number of documents to the number of documents containing the term; it represents the distinguishing ability of the term, and the fewer documents contain the term, the larger the IDF.
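  • Written out, with N the total number of documents and df(t) the number of documents containing term t:

```latex
\mathrm{TF}(t, d) = \frac{\text{occurrences of } t \text{ in } d}{\text{number of terms in } d}, \qquad
\mathrm{IDF}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad
\text{TF-IDF}(t, d) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t)
```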
  • Before the computing device performs the masked language task based on the TF-IDF entity masking algorithm, it needs to calculate TF-IDF for the acquired knowledge entities and sort the knowledge entities by the product of term frequency and inverse document frequency, yielding the knowledge entities ranked by TF-IDF.
  • The top-ranked knowledge entities are the knowledge entities with high importance.
  • the computing device performs masking language tasks on the domain task language data based on these knowledge entities.
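  • A minimal sketch of ranking the recognized knowledge entities by TF-IDF before masking is given below; it is a simplified, corpus-level variant, and the corpus format and the number of entities kept are assumptions for illustration.

```python
import math
from collections import Counter


def rank_entities_by_tfidf(documents: list[list[str]], entities: set[str], top_k: int = 50) -> list[str]:
    """Score each knowledge entity by term frequency times inverse document frequency
    over the domain task corpus and keep the highest-scoring entities for masking."""
    n_docs = len(documents)
    # Document frequency: in how many documents does the entity appear.
    df = Counter(e for doc in documents for e in set(doc) if e in entities)
    # Term frequency (simplified): total occurrences of the entity across the corpus.
    tf = Counter(e for doc in documents for e in doc if e in entities)
    scores = {e: tf[e] * math.log(n_docs / df[e]) for e in df}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```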
  • When the computing device performs the masked language task on the first language data through the multi-level masking algorithm, it predicts the words covered by the different masking algorithms based on the language representation model, and calculates the cross entropy between the predicted words and the labels of the actually masked words to obtain the first loss function, which can be used to update the language representation model.
  • the computing device in this embodiment of the present application can perform multi-level masking language tasks on multiple types of first language data. For example, after the computing device performs a mask language task on natural language data to obtain a first-level language representation model, it then performs a mask language task on domain language data based on the first-level language representation model to obtain a second-level language representation. model, and then perform masking language tasks on domain task language data based on the second-level language representation model to obtain a third-level language representation model.
  • The third-level language representation model is the model obtained after performing the multi-level masked language tasks; the training process of the multi-level language representation model is also called fine-tuning the language representation model.
  • Figure 4b is a schematic diagram of performing a multi-level mask language task provided by an embodiment of the present application.
  • The computing device sequentially performs the masked language task on the natural language data, the domain language data, and the domain task language data. After the masked language task is performed on each type of language data, a fine-tuned language representation model is obtained; the fine-tuned model is then used to perform the masked language task on the next type of language data, and finally the language representation model obtained after performing the multi-level masked language tasks is output.
  • natural language data such as "Preschool education is an important part of the national education system”
  • domain language data such as "Use azithromycin anti-inflammatory drugs for treatment first”
  • domain task language data such as "Repeated coughing and coughing up phlegm for 3 days; treated with Ketofen".
  • the computing device in the embodiment of the present application can combine multiple types of first language data and perform multi-level masking language tasks on multiple types of data based on the language representation model, thereby improving the training accuracy of the language representation model.
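  • The continual, multi-level masked language training described above could be sketched as follows; run_masked_lm_epoch is a hypothetical helper that fine-tunes the model on one corpus and returns it.

```python
def multi_level_masked_training(model, corpora: dict, run_masked_lm_epoch):
    """Fine-tune the same language representation model on each corpus in turn:
    natural language data, then domain language data, then domain task language data."""
    order = ["natural_language", "domain_language", "domain_task_language"]
    for level, data_type in enumerate(order, start=1):
        model = run_masked_lm_epoch(model, corpora[data_type], data_type)
        print(f"level-{level} language representation model obtained from {data_type}")
    return model
```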
  • the computing device performs the relationship classification task.
  • The computing device maps the association relationships between the knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data; the second language data then includes the head entity, the tail entity, and the association relationship.
  • The computing device performs the mapping according to the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data: when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of that knowledge entity in the knowledge graph onto the knowledge entity in the first language data to obtain the second language data.
  • The computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function. Specifically, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity, and the flag bit, and passes the three spliced representation vectors through a fully connected layer to output the predicted relationship between the head entity and the tail entity and its corresponding probability. The computing device then calculates the second loss function of the relationship classification task based on the predicted relationships and probabilities and their labels; the second loss function can be used to update the language representation model.
  • FIG. 5 is a schematic diagram of a computing device provided by the present application for performing a relationship classification task.
  • The computing device maps the association relationships between knowledge entities in the knowledge graph onto knowledge entities in the first language data. For example, the computing device obtains from the medical knowledge graph that
  • the association between the two knowledge entities "dizziness" and "heart palpitations" is "accompanying", and maps this association onto the knowledge entities in the sentence "repeated dizziness, sweating, and worsening heart palpitations for 3 days": the head entity "dizziness" is identified with <e1>, </e1>, the tail entity "heart palpitations" is identified with <e2>, </e2>, and the resulting second language data is identified with the flag bit [CLS], which is used to perform the relationship classification task.
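  • A sketch of how one sample of second language data might be assembled from the first language data plus a knowledge-graph relation is shown below; the marker conventions follow the description above, while the exact tokenization and sample layout are assumptions for illustration.

```python
def build_relation_sample(text: str, head: str, tail: str, relation: str) -> dict:
    """Wrap the head and tail entities with <e1>...</e1> and <e2>...</e2> markers and prepend
    the [CLS] flag, producing one training sample for the relationship classification task."""
    marked = text.replace(head, f"<e1>{head}</e1>", 1).replace(tail, f"<e2>{tail}</e2>", 1)
    return {"input_text": "[CLS] " + marked, "label": relation}


# For the example in the description: head = "dizziness", tail = "heart palpitations",
# relation = "accompanying".
sample = build_relation_sample(
    "repeated dizziness, sweating, and worsening heart palpitations for 3 days",
    "dizziness", "heart palpitations", "accompanying",
)
```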
  • The computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity, and the flag bit, and splices the three representation vectors; after the fully connected layer,
  • the predicted relationship between the head entity and the tail entity is obtained.
  • the second loss function of the relationship classification task can be obtained by calculating the cross entropy based on the predicted relationship and the label of the actual relationship.
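  • A minimal PyTorch sketch of the relationship classification head described above is given below; the hidden size, the number of relation classes, and the way the three representation vectors are extracted from the Transformer output are assumptions for illustration, not details from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationClassificationHead(nn.Module):
    """Splice the representation vectors of the [CLS] flag, the head entity and the tail entity,
    pass them through a fully connected layer, and score the candidate relations."""

    def __init__(self, hidden_size: int = 768, num_relations: int = 10):
        super().__init__()
        self.fc = nn.Linear(3 * hidden_size, num_relations)

    def forward(self, cls_vec, head_vec, tail_vec, relation_labels=None):
        logits = self.fc(torch.cat([cls_vec, head_vec, tail_vec], dim=-1))
        if relation_labels is None:
            return logits
        # Second loss function: cross entropy between the predicted relations and their labels.
        return logits, F.cross_entropy(logits, relation_labels)
```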
  • After the computing device obtains the second loss function of the relationship classification task based on the language representation model, it calculates the target loss function based on the first loss function obtained from the masked language task and the second loss function. Specifically, the computing device adds the first loss function and the second loss function to obtain the target loss function.
  • the computing device updates the language representation model according to the target loss function.
  • The computing device updates the language representation model according to the target loss function. Specifically, when the target loss function has not reached the expected value, the computing device continues to train the language representation model through multi-task learning; when the target loss function reaches the expected value, the computing device outputs the updated language representation model, which is used to perform downstream tasks.
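  • The stopping criterion can be pictured as a simple loop, where step_fn could be a multi-task step like the one sketched earlier; the step budget is an assumption, since the application only states that training continues until the target loss function reaches the expected value.

```python
def train_until_expected(step_fn, expected_loss: float, max_steps: int = 100_000):
    """Keep performing multi-task learning steps until the target loss function
    reaches the expected value (or the step budget runs out)."""
    step, target_loss = 0, float("inf")
    for step in range(max_steps):
        target_loss = step_fn()
        if target_loss <= expected_loss:
            break
    return step, target_loss
```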
  • After the computing device updates the language representation model according to the target loss function,
  • the updated language representation model is used to perform various downstream tasks.
  • the downstream tasks include one or more of the following tasks: electronic medical record structured tasks, assisted diagnosis, or intelligent consultation.
  • the language representation model can be used as a basic tool to perform various downstream tasks.
  • The downstream tasks include, for example, assisted diagnosis, intelligent consultation, and electronic medical record structuring.
  • the computing device can perform multi-task learning based on the language representation model during the training of the language representation model.
  • During the training of the language representation model, the computing device can also integrate the associations between knowledge entities in the knowledge graph,
  • which makes the representation of language data in the specific field covered by the knowledge graph more accurate and improves the accuracy of the language representation model.
  • FIG. 6 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • the training device is used to implement the steps performed by the computing device in the above method embodiment.
  • the training device 600 includes an acquisition unit 601 and a processing unit 602.
  • the obtaining unit 601 is used to obtain the first language data.
  • the processing unit 602 is used to perform multi-task learning based on the language representation model and the first language data to obtain the target loss function.
  • the multi-task learning includes a mask language task and a relationship classification task.
  • The masked language task is used to perform the masked language task based on the data generated by the language representation model,
  • and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model.
  • the processing unit 602 is also used to update the language representation model according to the target loss function.
  • The processing unit 602 is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain the first loss function, map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data, perform the relationship classification task on the second language data based on the language representation model to obtain the second loss function, and determine the target loss function based on the first loss function and the second loss function.
  • the processing unit 602 is also configured to perform named entity recognition NER on the first language data to obtain the knowledge entities in the first language data.
  • The processing unit 602 is specifically configured to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data when the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data exceeds a preset threshold.
  • the processing unit 602 is specifically configured to perform a masking language task on the first language data according to the masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data; the masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • the processing unit 602 is specifically configured to perform a masking language task on the first language data according to a whole-word masking algorithm when the first language data is natural language data.
  • When the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • the processing unit is also used to perform downstream tasks based on the updated language representation model.
  • Downstream tasks include one or more of the following: electronic medical record structuring, auxiliary diagnosis, or intelligent consultation.
  • Each unit in the device can be a separate processing element, or can be integrated and implemented in a certain chip of the device;
  • it can also be stored in a memory in the form of a program, and a certain processing element of the device calls and executes the function of the unit.
  • all or part of these units can be integrated together or implemented independently.
  • the processing element described here can also be a processor, which can be an integrated circuit with signal processing capabilities.
  • each step of the above method or each unit above can be implemented by an integrated logic circuit of hardware in the processor element or implemented in the form of software calling through the processing element.
  • FIG. 7 is a schematic diagram of a computing device provided by an embodiment of the present application.
  • the computing device 700 includes: a processor 710, a memory 720, and an interface 730.
  • the processor 710, the memory 720, and the interface 730 are coupled through a bus (not labeled in the figure).
  • The memory 720 stores instructions; when the instructions in the memory 720 are executed, the computing device 700 performs the method performed by the computing device in the above method embodiments.
  • Alternatively, the computing device 700 may be one or more integrated circuits configured to implement the above methods, for example, one or more application-specific integrated circuits (ASIC), one or more microprocessors (digital signal processors, DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
  • The units in the device can be implemented in the form of a processing element scheduling a program;
  • the processing element can be a general-purpose processor, such as a central processing unit (CPU), or another processor that can call a program.
  • these units can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • The processor 710 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • a general-purpose processor can be a microprocessor or any conventional processor.
  • Memory 720 may include read-only memory and random access memory and provides instructions and data to processor 710 .
  • Memory 720 may also include non-volatile random access memory.
  • the memory 720 may be provided with multiple partitions, and each area is used to store private keys of different software modules.
  • Memory 720 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • Many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the bus may also include a power bus, a control bus, a status signal bus, etc.
  • The bus can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • a computer-readable storage medium is also provided.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • When the processor of a device executes the computer-executable instructions,
  • the device executes the method performed by the computing device in the above method embodiments.
  • a computer program product is also provided, the computer program product including computer execution instructions.
  • When the processor of a device executes the computer-executable instructions,
  • the device executes the method executed by the computer device in the above method embodiment.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in embodiments of the present application are a training method and a training device for a language representation model, used for improving the model precision of a language representation model. The method in the embodiments of the present application comprises: acquiring first language data; performing multi-task learning on the basis of the language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a mask language task and a relationship classification task, the mask language task is used for executing a mask language task according to the data generated by the language representation model, and the relationship classification task is used for executing a relationship classification task on the basis of the data generated by the language representation model; and updating the language representation model according to the target loss function.

Description

A training method and training device for a language representation model
This application claims priority to the Chinese patent application with application number "202210318687.8", filed with the China Patent Office on March 29, 2022 and entitled "A training method and training device for a language representation model", the entire content of which is incorporated into this application by reference.
Technical field
Embodiments of the present application relate to the field of artificial intelligence, and in particular, to a training method and training device for a language representation model.
Background
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. In natural language processing technology, language representation models can represent text information in natural language as vectors, so that methods such as neural networks can be applied to the text information to perform tasks such as machine translation, sentiment analysis, and auxiliary diagnosis.
In the current training process of language representation models, the models are trained on a large amount of general natural language data. When a trained language representation model performs downstream tasks in a specific field, each field has its own professional language; if the model represents that professional language poorly, the application of the trained model in the specific field will be greatly limited.
Although some current language representation models are also trained on the professional language of a specific field, the training process has a single training dimension, so for some complex domain-specific language data the representation accuracy of the language representation model is low, which further leads to poor performance on downstream tasks.
Summary of the invention
Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the precision of the language representation model.
A first aspect of the embodiments of the present application provides a training method for a language representation model. The method may be executed by a computing device, or by a component of the computing device, for example a processor, a chip or a chip system of the computing device, and may also be implemented by a logic module or software capable of realizing all or part of the functions of the computing device. Taking execution by a computing device as an example, the method provided in the first aspect includes: the computing device acquires first language data, where the first language data includes multiple different types of language data; the computing device performs multi-task learning based on the language representation model and the first language data to obtain a target loss function, where the target loss function can indicate the deviation between the output result of the language representation model and a target result; the multi-task learning includes a masked language task and a relationship classification task, the masked language task is performed according to data generated by the language representation model, and the relationship classification task is performed based on data generated by the language representation model; and the computing device updates the language representation model according to the target loss function.
In the embodiments of the present application, during the training of the language representation model, the computing device can perform multi-task learning based on the language representation model, and specifically trains the language representation model by performing the masked language task and the relationship classification task, so that the training process combines the knowledge entities in the first language data and the association relationships between the knowledge entities. The multi-task and multi-data training process improves the precision of the language representation model.
In a possible implementation, in the process in which the computing device performs multi-task learning based on the language representation model and the first language data, the computing device performs the masked language task on the first language data based on the language representation model to obtain a first loss function; the computing device maps the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data; the computing device performs the relationship classification task on the second language data based on the language representation model to obtain a second loss function; and the computing device determines the target loss function according to the first loss function and the second loss function.
In the embodiments of the present application, the computing device maps the association relationships between knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain the second language data, and performs the relationship classification task based on the second language data. Therefore, in the process of training the language representation model, the computing device integrates the association relationships between knowledge entities in the knowledge graph, so that the language representation model represents the language data of the specific field corresponding to the knowledge graph more accurately, which improves the precision of the language representation model.
In a possible implementation, before mapping the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, the computing device performs named entity recognition (NER) on the first language data. Specifically, the computing device extracts knowledge entities from the first language data to obtain the knowledge entities in the first language data. The computing device may perform named entity recognition on the first language data based on a built-in NER module, or based on an external NER module, which is not specifically limited.
In the embodiments of the present application, before mapping the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, the computing device first extracts the knowledge entities in the first language data, which improves the feasibility of the solution.
In a possible implementation, in the process in which the computing device maps the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data to obtain the second language data.
In the embodiments of the present application, before mapping an association relationship in the knowledge graph to a knowledge entity in the first language data, the computing device determines, according to the semantics of the knowledge entity in the knowledge graph and the semantics of the knowledge entity in the first language data, whether they are the same knowledge entity, which improves the accuracy of the mapping process.
In a possible implementation, in the process in which the computing device performs the masked language task on the first language data, the computing device performs the masked language task on the first language data according to a masking algorithm corresponding to the first language data. The first language data includes one or more of the following: natural language data, domain language data and domain task language data. The masking algorithm includes whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
The first language data in the embodiments of the present application includes multiple types of language data, and the computing device can perform the masked language task on the multiple types of first language data based on multiple masking algorithms, which enriches the solution.
In a possible implementation, in the process in which the computing device performs the masked language task on the first language data according to the masking algorithm corresponding to the first language data: when the first language data is natural language data, the masked language task is performed on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, the masked language task is performed on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, the masked language task is performed on the first language data according to an entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
In the embodiments of the present application, the computing device selects different masking algorithms for different types of first language data, which improves the accuracy of the masking task and further improves the precision of the language representation model.
In a possible implementation, when the first language data contains multiple types of language data at the same time, the computing device may perform the masked language task on the multiple types of first language data in sequence.
In the embodiments of the present application, the computing device can perform multi-level masked language tasks on multiple types of first language data, which further improves the training precision of the language representation model.
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The computing device performs downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
In the embodiments of the present application, after updating the language representation model based on the target loss function, the computing device can perform a variety of downstream tasks based on the updated language representation model, which improves the effect of the computing device in performing the downstream tasks.
A second aspect of the embodiments of the present application provides a training device for a language representation model, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire first language data. The processing unit is configured to perform multi-task learning based on the language representation model and the first language data to obtain a target loss function, where the multi-task learning includes a masked language task and a relationship classification task, the masked language task is performed according to data generated by the language representation model, and the relationship classification task is performed based on data generated by the language representation model. The processing unit is further configured to update the language representation model according to the target loss function.
In a possible implementation, the processing unit is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain a first loss function, map the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data, perform the relationship classification task on the second language data based on the language representation model to obtain a second loss function, and determine the target loss function according to the first loss function and the second loss function.
In a possible implementation, the processing unit is further configured to perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
In a possible implementation, the processing unit is specifically configured to map the association relationship of a knowledge entity in the knowledge graph to a knowledge entity in the first language data when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold.
In a possible implementation, the processing unit is specifically configured to perform the masked language task on the first language data according to a masking algorithm corresponding to the first language data, where the first language data includes one or more of the following: natural language data, domain language data and domain task language data, and the masking algorithm includes whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In a possible implementation, the processing unit is specifically configured to: when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The processing unit is further configured to perform downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
A third aspect of the embodiments of the present application provides a computing device, including a processor coupled to a memory. The memory is configured to store instructions, and when the instructions are executed by the processor, the computing device is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having instructions stored thereon. When the instructions are executed, a computer is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer program product. The computer program product includes instructions, and when the instructions are executed, a computer is caused to implement the method described in the first aspect or any possible implementation of the first aspect.
It can be understood that, for the beneficial effects achievable by any of the training device, computing device, computer-readable storage medium or computer program product provided above, reference may be made to the beneficial effects of the corresponding method above, which are not repeated here.
Description of the drawings
Figure 1a is a schematic diagram of the architecture of a training system for a language representation model according to an embodiment of the present application;
Figure 1b is a schematic diagram of a training process framework of a language representation model according to an embodiment of the present application;
Figure 2 is a schematic flowchart of a training method for a language representation model according to an embodiment of the present application;
Figure 3 is a schematic flowchart of another training method for a language representation model according to an embodiment of the present application;
Figure 4a is a schematic diagram of performing a masked language task according to an embodiment of the present application;
Figure 4b is another schematic diagram of performing a masked language task according to an embodiment of the present application;
Figure 5 is a schematic diagram of performing a relationship classification task according to an embodiment of the present application;
Figure 6 is a schematic structural diagram of a training device according to an embodiment of the present application;
Figure 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the precision of the language representation model.
The terms "first", "second", "third", "fourth" and so on (if any) in the specification and claims of this application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or are inherent to the process, method, product or device.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
In the following, some terms used in this application are explained to facilitate understanding by those skilled in the art.
A mask refers to a technique of covering some words in the language data during the pre-training of a language representation model.
A language model (LM) is also called a language representation model. A language model is a simple, unified and abstract formal system; after language data is represented by the language model, it becomes language data suitable for automatic processing by a computing device.
An entity is also called a knowledge entity. An entity is an abstraction of an objective individual; a person, a movie or a sentence can all be regarded as an entity.
Named entity recognition (NER) refers to identifying entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns and the like. Named entity recognition is an important basic tool for applications such as information extraction, question answering systems, syntactic analysis, machine translation and knowledge graphs.
A medical knowledge graph (MKG) is a professional graph formed by fusing knowledge graph theory with doctors' clinical medical knowledge and connecting medical knowledge points (information, data) in accordance with the internal logical mechanisms of medical knowledge.
Electronic medical records (EMR) are electronic patient records based on a computer system, which provides users with the ability to access complete and accurate data, alerts, prompts and clinical decision support systems.
A triplet includes a head entity, a relation and a tail entity, where the head entity is the subject, the relation is the predicate, and the tail entity is the object.
A natural language refers to a language that evolves naturally with culture.
Cross entropy is an important concept in Shannon's information theory, and is mainly used to measure the difference between two probability distributions.
The training method and training device for a language representation model provided in the embodiments of the present application are described below with reference to the accompanying drawings.
Please refer to Figure 1a. Figure 1a is a schematic diagram of the system architecture of an application system in which a language representation model provided by an embodiment of the present application is located. As shown in Figure 1a, the application system 100 includes a multi-task learning module 101, a data module 102 and a downstream task module 103. The multi-task learning module 101 is configured to train the language representation model by performing multiple tasks, the data module 102 is configured to provide training data for the language representation model, and the downstream task module 103 is configured to perform various downstream tasks according to the language representation model.
The multi-task learning module 101 includes a masked language task sub-module 1011 and a relationship classification task sub-module 1012. The masked language task sub-module 1011 is configured to perform the masked language task based on the language representation model. The masked language task refers to covering some words in the language data during training and making semantic predictions for the covered words, and the loss function of the semantic prediction results is then used to update the language representation model. The relationship classification task sub-module 1012 is configured to perform the relationship classification task based on the language representation model. The relationship classification task means that during training, the computing device predicts, in combination with a knowledge graph, the relationships of the knowledge entities in the language data, and the loss function of the relationship prediction results is then used to update the language representation model. The multi-task learning module 101 can calculate a target loss function according to the loss function of the masked language task and the loss function of the relationship classification task, and update the language representation model according to the target loss function.
The data module 102 is configured to provide training data for the language representation model. The training data of the language representation model includes natural language data, domain language data and domain task language data, where the domain language data refers to general data within a specific field, and the domain task language data refers to language data generated by performing specific tasks in a field. The training data provided by the data module 102 can be used for named entity recognition (NER) to obtain knowledge entities, and can also be used for knowledge extraction to obtain a knowledge graph. The NER module that performs named entity recognition in the embodiments of this application may be a built-in module of the system or an external module. The knowledge graph in the embodiments of this application may be a knowledge graph extracted from the training data, or an external knowledge graph input by a user, which is not specifically limited.
The downstream task module 103 is configured to perform downstream tasks according to the trained language representation model, and the trained language representation model supports a variety of downstream tasks in the field. For example, in the medical field, the downstream tasks include electronic medical record structuring, auxiliary diagnosis, intelligent consultation and the like.
Please refer to Figure 1b. Figure 1b is a schematic diagram of a training framework of a language representation model provided by an embodiment of the present application. As shown in Figure 1b, the computing device outputs the language representation model after performing continuous multi-level learning based on multiple types of first language data, where the first language data includes natural language data, domain language data and domain task language data.
For the domain language data and the domain task language data, the computing device can perform multi-task learning based on the domain language data, the domain task language data and the language representation model, and update the language representation model according to the target loss function obtained from the multi-task learning.
As can be seen from the training framework shown in Figure 1b, the multi-task learning includes the masked language task and the relationship classification task. When performing the masked language task based on the language representation model, the computing device selects different masking algorithms for first language data with different characteristics. The masking algorithms include a whole-word masking algorithm, an entity masking algorithm, and an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
In the training framework shown in Figure 1b, before the computing device performs the masked language task according to the entity masking algorithm or the entity masking algorithm based on term frequency and inverse document frequency, the computing device needs to perform named entity recognition on the first language data to obtain knowledge entities, and further calculates the term frequency and inverse document frequency of the obtained knowledge entities to obtain knowledge entities with high term frequency and inverse document frequency. When the computing device uses the entity masking algorithm to perform the masked language task, it performs the masking operation on the domain language data based on the above knowledge entities. When the computing device uses the entity masking algorithm based on term frequency and inverse document frequency to perform the masked language task, it performs the masking operation on the domain task language data based on the above knowledge entities with high term frequency and inverse document frequency.
In the training framework shown in Figure 1b, before performing the relationship classification task based on the language representation model, the computing device maps the entity associations in the knowledge graph to the knowledge entities in the first language data, and the computing device performs the relationship classification task based on the mapped first language data and the language representation model.
Please refer to Figure 2. Figure 2 shows a training method for a language representation model provided by an embodiment of the present application. The training method of the language representation model includes but is not limited to the following steps.
201. The computing device acquires first language data.
The computing device acquires first language data, and the first language data is used to train the language representation model. The first language data includes multiple types of data input by a user, including natural language data, domain language data and domain task language data.
The natural language data refers to language data that evolves naturally with culture, the domain language data refers to data within a specific field, and the domain task language data refers to data generated by performing specific tasks within a specific field. For the medical field, the domain language data is, for example, language data in medical materials, and the domain task language data is, for example, language data generated in a hospital electronic medical record system or language data generated by hospital imaging examinations.
In the embodiments of the present application, the first language data is also used for named entity recognition to obtain knowledge entities. After acquiring the first language data, the computing device performs named entity recognition on the first language data based on an NER module to obtain the knowledge entities of the first language data; the named entity recognition module may be a built-in NER module or an external NER module, which is not specifically limited. When performing the masked language task, the computing device performs the masking operation on the first language data according to these knowledge entities of the first language data.
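For ease of understanding only, the entity-extraction step described above may be illustrated by the following Python sketch. The callable "ner" stands for whichever NER module (built-in or external) is used, and the function name is an assumed name for illustration rather than a limitation of the embodiments.

    from typing import Callable, List

    def extract_knowledge_entities(sentences: List[str],
                                   ner: Callable[[str], List[str]]) -> List[str]:
        """Run named entity recognition over the first language data and return
        the knowledge entities, de-duplicated in first-seen order."""
        entities: List[str] = []
        for sentence in sentences:
            entities.extend(ner(sentence))       # the NER backend is pluggable
        return list(dict.fromkeys(entities))     # keep one copy of each entity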
In a possible implementation, after acquiring the first language data, the computing device also needs to acquire a knowledge graph. The knowledge graph may be a knowledge graph generated based on the first language data, or an external knowledge graph input by a user. When the knowledge graph is generated based on the first language data, after acquiring the first language data, the computing device generates the knowledge graph according to the association relationships between the knowledge entities in the first language data, and such a knowledge graph can be used to perform the relationship classification task.
Please refer to Figure 3. Figure 3 is a schematic diagram of a training method for a language representation model provided by an embodiment of the present application. In the example shown in Figure 3, before training the language representation model, the computing device acquires the first language data, and the first language data is a mandatory input for the computing device to train the language representation model.
In the example shown in Figure 3, the computing device also acquires a knowledge graph, and acquires knowledge entities based on an NER module. The acquired knowledge graph may be an external knowledge graph input by a user, and the external knowledge graph can replace the built-in knowledge graph of the computing device. The NER module that performs named entity recognition may also be an external NER module input by the user, and the external NER module can replace the internal NER module of the computing device. The external knowledge graph and the external NER module can serve as optional inputs for the computing device to train the language representation model.
202. The computing device performs multi-task learning according to the first language data and the language representation model to obtain a target loss function, where the multi-task learning includes a masked language task and a relationship classification task.
The computing device performs multi-task learning according to the first language data and the language representation model to obtain the target loss function, and the multi-task learning includes the masked language task and the relationship classification task. Specifically, the computing device performs the masked language task on the first language data based on the language representation model to obtain a first loss function. For the domain language data and the domain task language data, the computing device performs the masked language task based on the knowledge entities of the first language data and the language representation model, where the knowledge entities of the first language data are the knowledge entities obtained by the computing device performing named entity recognition on the first language data.
Then, the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities of the first language data to obtain second language data, and the computing device performs the relationship classification task on the second language data based on the language representation model to obtain a second loss function. Finally, the computing device determines the target loss function according to the first loss function and the second loss function.
The following describes in detail the processes in which the computing device performs the masked language task and the relationship classification task during the multi-task learning.
1. The computing device performs the masked language task.
First, the process in which the computing device performs the masked language task is introduced. In the process of performing the masked language task on the first language data based on the language representation model, the computing device performs the masked language task on the first language data through multi-level masking algorithms. The masking algorithms in the embodiments of this application include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In the process in which the computing device performs the masked language task on the first language data through the multi-level masking algorithms: when the first language data is natural language data, the computing device performs the masked language task on the first language data according to the whole-word masking algorithm; when the first language data is domain language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm; and when the first language data is domain task language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
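For ease of understanding only, this dispatch by data type may be illustrated by the following Python sketch. It assumes that the text has already been segmented into words and that entities appear as single tokens; the function name and the 15% masking ratio are illustrative assumptions rather than limitations of the embodiments.

    import random
    from typing import List, Optional, Set

    def select_mask_positions(tokens: List[str], data_type: str,
                              entities: Optional[Set[str]] = None,
                              top_tfidf_entities: Optional[Set[str]] = None,
                              mask_ratio: float = 0.15) -> List[int]:
        """Choose token positions to cover according to the type of first language data:
        whole-word masking for natural language data, entity masking for domain
        language data, TF-IDF-based entity masking for domain task language data."""
        if data_type == "natural":
            candidates = list(range(len(tokens)))
        elif data_type == "domain":
            candidates = [i for i, t in enumerate(tokens) if t in (entities or set())]
        elif data_type == "domain_task":
            candidates = [i for i, t in enumerate(tokens) if t in (top_tfidf_entities or set())]
        else:
            raise ValueError(f"unknown data type: {data_type}")
        if not candidates:
            return []
        k = max(1, int(len(candidates) * mask_ratio))
        return sorted(random.sample(candidates, k))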
Please refer to Figure 4a. Figure 4a is a schematic diagram of multi-level masking algorithms provided by an embodiment of the present application. As shown in Figure 4a, diagram (a) is a schematic diagram of the whole word mask (WWM) algorithm. In the example of diagram (a), the first language data is natural language data. In the whole-word masking algorithm, the covered words are common words, for example words obtained from general dictionaries or general-knowledge books; specifically, these covered words may be words obtained through a Chinese word segmentation tool. For example, in diagram (a), in "学前××是××教育××的××组成部分" ("preschool ×× is an ×× component of ×× education"), the words covered based on the whole-word masking algorithm are "教育" (education), "国民" (national), "体系" (system) and "重要" (important), which are common words in natural language data.
As shown in Figure 4a, diagram (b) is a schematic diagram of the entity mask (EM) algorithm. In the example of diagram (b), the first language data is domain language data. In the entity masking algorithm, the covered words include entity words obtained by identifying knowledge entities through named entity recognition, or entity words obtained by the computing device based on the knowledge entities in the knowledge graph. For example, in diagram (b), in "先使用××××类消炎药进行治疗" ("first use ×××× anti-inflammatory drugs for treatment"), the word covered based on the entity masking algorithm is "阿奇霉素" (azithromycin), which is an entity word in the domain language data.
As shown in Figure 4a, diagram (c) is a schematic diagram of the entity masking algorithm based on term frequency-inverse document frequency (TF-IDF). In the example shown in diagram (c), the first language data is domain task language data. In the TF-IDF-based entity masking algorithm, the covered words include entity words selected from the above entity words according to the product of term frequency and inverse document frequency. The term frequency (TF) indicates the frequency with which a term appears in the documents; the inverse document frequency (IDF) is the logarithm of the ratio of the total number of documents to the number of documents containing the term, and indicates the distinguishing ability of the term: the fewer the documents containing the term, the larger the IDF. The product of term frequency and inverse document frequency indicates the importance of the term in the documents. For example, in diagram (c), in "××××，3天，予以×××治疗" ("××××, for 3 days, treated with ×××"), the words covered based on the entity masking algorithm are "反复咳嗽" (repeated cough), "咯痰" (expectoration) and "酮体芬" (ketofen), which are entity words in the domain task language data.
In the example shown in Figure 4a, before performing the masked language task with the TF-IDF-based entity masking algorithm, the computing device calculates TF-IDF for the acquired knowledge entities and sorts the knowledge entities according to the product of term frequency and inverse document frequency to obtain knowledge entities ranked by TF-IDF. The top-ranked knowledge entities are the knowledge entities of high importance, and the computing device performs the masked language task on the domain task language data based on these knowledge entities.
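For ease of understanding only, the TF-IDF ranking step may be illustrated by the following Python sketch, which follows the definitions given above (IDF as the logarithm of the ratio of the total number of documents to the number of documents containing the entity). It computes a single corpus-level score per entity, which is one possible reading; the exact counting granularity and the cut-off top_k are assumptions, not limitations of the embodiments.

    import math
    from collections import Counter
    from typing import Dict, Iterable, List

    def rank_entities_by_tfidf(documents: List[List[str]],
                               entities: Iterable[str],
                               top_k: int = 50) -> List[str]:
        """Rank knowledge entities by TF * IDF and return the top_k most important
        ones, which are then used for masking the domain task language data."""
        entity_set = set(entities)
        n_docs = len(documents)
        term_freq: Counter = Counter()
        doc_freq: Counter = Counter()
        total_tokens = 0
        for doc in documents:
            total_tokens += len(doc)
            for tok in doc:
                if tok in entity_set:
                    term_freq[tok] += 1          # how often the entity occurs
            for e in entity_set & set(doc):
                doc_freq[e] += 1                 # how many documents contain it
        scores: Dict[str, float] = {}
        for e, tf_count in term_freq.items():
            tf = tf_count / total_tokens
            idf = math.log(n_docs / doc_freq[e])  # fewer containing documents -> larger IDF
            scores[e] = tf * idf
        return sorted(scores, key=scores.get, reverse=True)[:top_k]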
In the process in which the computing device performs the masked language task on the first language data through the multi-level masking algorithms, the computing device predicts, based on the language representation model, the words covered by the different masking algorithms, and calculates the cross entropy between the predicted words and the labels of the actually covered words to obtain the first loss function, and the first loss function can be used to update the language representation model.
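For ease of understanding only, this first loss function may be written down concretely as in the following PyTorch-style sketch; PyTorch is used purely for illustration, and the tensor layout and the vocabulary projection layer "vocab_proj" are assumptions about how the prediction head could be wired, not details specified by the embodiments.

    import torch
    import torch.nn.functional as F

    def masked_lm_loss(hidden_states: torch.Tensor,      # [batch, seq_len, hidden]
                       masked_positions: torch.Tensor,   # [batch, num_masked]
                       masked_label_ids: torch.Tensor,   # [batch, num_masked]
                       vocab_proj: torch.nn.Linear) -> torch.Tensor:
        """Cross entropy between the words predicted at the covered positions and
        the labels of the actually covered words (the first loss function)."""
        hidden = hidden_states.size(-1)
        # Gather the representation vectors at the masked positions.
        gathered = torch.gather(hidden_states, 1,
                                masked_positions.unsqueeze(-1).expand(-1, -1, hidden))
        logits = vocab_proj(gathered)                     # [batch, num_masked, vocab]
        return F.cross_entropy(logits.flatten(0, 1), masked_label_ids.flatten())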
It should be noted that when multiple types of data exist in the first language data at the same time, the computing device in the embodiments of this application can perform the masked language task on the multiple types of first language data at multiple levels. For example, after the computing device performs the masked language task on the natural language data to obtain a first-level language representation model, it then performs the masked language task on the domain language data based on the first-level language representation model to obtain a second-level language representation model, and then performs the masked language task on the domain task language data based on the second-level language representation model to obtain a third-level language representation model. The third-level language representation model is the model obtained after performing the multi-level masked language tasks, and the training process of the multi-level language representation model is also called fine-tuning the language representation model.
Please refer to Figure 4b. Figure 4b is a schematic diagram of performing multi-level masked language tasks provided by an embodiment of the present application. In the example shown in Figure 4b, the computing device performs the masked language task on the natural language data, the domain language data and the domain task language data in sequence. After the masked language task is performed on each type of language data, a fine-tuned language representation model is obtained, and the fine-tuned language representation model is then used to perform the masked language task on the next type of language data. Finally, the language representation model obtained after performing the multi-level masked language tasks is output.
In the example shown in Figure 4b, the natural language data is, for example, "学前教育是国民教育体系的重要组成部分" ("Preschool education is an important part of the national education system"), the domain language data is, for example, "先使用阿奇霉素类消炎药进行治疗" ("First use azithromycin anti-inflammatory drugs for treatment"), and the domain task language data is, for example, "反复咳嗽，咯痰3天，予以酮体芬治疗" ("Repeated cough and expectoration for 3 days, treated with ketofen").
In the embodiments of the present application, the computing device can combine multiple types of first language data and perform multi-level masked language tasks on the multiple types of data based on the language representation model, which improves the training precision of the language representation model.
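For ease of understanding only, the level-by-level sequence of Figure 4b amounts to a simple continual-training loop, sketched below in Python; "train_masked_lm_stage" stands for one round of the masked language task on one corpus and is an assumed helper, not an interface defined by the embodiments.

    def multi_level_masked_training(model, corpora, train_masked_lm_stage):
        """Fine-tune the language representation model level by level: the model
        produced on one type of first language data is the starting point for the
        next, more specialised corpus. `corpora` maps data type -> training corpus."""
        for data_type in ("natural", "domain", "domain_task"):
            model = train_masked_lm_stage(model, corpora[data_type], data_type)
        return model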
2. The computing device performs the relationship classification task.
The following describes the process in which the computing device performs the relationship classification task. In the process of performing the relationship classification task based on the language representation model, the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain second language data, and the second language data includes head entities, tail entities and association relationships.
Specifically, in the process in which the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities in the first language data, the computing device performs the mapping according to the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data: when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data to obtain the second language data.
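For ease of understanding only, one possible reading of this mapping step is sketched below in Python. The "similarity" function (for example, cosine similarity of entity embeddings) and the 0.9 threshold are assumptions; the embodiments only require some semantic similarity measure and a preset threshold.

    from typing import Callable, Iterable, List, Tuple

    def map_kg_relations(text_entities: List[str],
                         kg_triples: Iterable[Tuple[str, str, str]],
                         similarity: Callable[[str, str], float],
                         threshold: float = 0.9) -> List[Tuple[str, str, str]]:
        """Map association relationships from the knowledge graph onto the knowledge
        entities recognised in the first language data, producing (head, relation,
        tail) samples of the second language data."""
        mapped = []
        for kg_head, relation, kg_tail in kg_triples:
            heads = [e for e in text_entities if similarity(e, kg_head) > threshold]
            tails = [e for e in text_entities if similarity(e, kg_tail) > threshold]
            for h in heads:
                for t in tails:
                    if h != t:
                        mapped.append((h, relation, t))
        return mapped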
Then, the computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function. Specifically, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity and the flag bit, passes the three representation vectors through a fully connected layer to output the predicted relationship between the head entity and the tail entity and the corresponding probability, and calculates the second loss function of the relationship classification task according to the predicted relationship and probability and the labels of the relationship and probability. The second loss function can be used to update the language representation model.
Please refer to Figure 5. Figure 5 is a schematic diagram of a computing device performing the relationship classification task provided by an embodiment of the present application. In the example shown in Figure 5, the computing device maps the association relationships between knowledge entities in the knowledge graph to the knowledge entities in the first language data. For example, the computing device obtains from a medical knowledge graph that the association relationship between the two knowledge entities "头晕" (dizziness) and "心悸" (palpitations) is "伴随" (accompanied by), and maps this association relationship to the knowledge entities in the second language data "反复头晕、多汗，加重伴心悸3天" ("repeated dizziness and sweating, aggravated with palpitations for 3 days"): the head entity "头晕" (dizziness) is marked by <e1> and </e1>, the tail entity "心悸" (palpitations) is marked by <e2> and </e2>, and the second language data is marked by the flag bit [CLS], which is used to perform the relationship classification task.
In the example shown in Figure 5, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity and the flag bit, concatenates the three representation vectors and passes them through a fully connected layer to obtain the predicted relationship between the head entity and the tail entity, and calculates the cross entropy according to the predicted relationship and the label of the actual relationship to obtain the second loss function of the relationship classification task.
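For ease of understanding only, the classification head described for Figure 5 may be sketched as follows; PyTorch is again used purely for illustration, and the class name and interface are assumptions, while the choice to concatenate exactly the [CLS], head-entity and tail-entity vectors before a single fully connected layer follows the description above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationClassificationHead(nn.Module):
        """Concatenates the representation vectors of the [CLS] flag, the head
        entity and the tail entity, and predicts the relation through a fully
        connected layer."""
        def __init__(self, hidden_size: int, num_relations: int):
            super().__init__()
            self.fc = nn.Linear(3 * hidden_size, num_relations)

        def forward(self, cls_vec, head_vec, tail_vec, relation_labels=None):
            logits = self.fc(torch.cat([cls_vec, head_vec, tail_vec], dim=-1))
            if relation_labels is None:
                return logits
            # The second loss function: cross entropy between the predicted
            # relation distribution and the labelled relation.
            return logits, F.cross_entropy(logits, relation_labels)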
After obtaining the second loss function of the relationship classification task based on the language representation model, the computing device calculates the target loss function according to the first loss function obtained by performing the masked language task and the second loss function. Specifically, the computing device adds the first loss function and the second loss function to obtain the target loss function.
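In symbols (the notation is introduced here only for readability), the target loss used to update the model is simply the sum of the two task losses:

    L_target = L_mask + L_relation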
203. The computing device updates the language representation model according to the target loss function.
The computing device updates the language representation model according to the target loss function. Specifically, when the target loss function has not reached an expected value, the computing device continues to train the language representation model through multi-task learning; when the target loss function reaches the expected value, the computing device outputs the updated language representation model, and the updated language representation model is used to perform downstream tasks.
After updating the language representation model according to the target loss function, the computing device uses the updated language representation model to perform various downstream tasks, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
Please continue to refer to Figure 3. In the example shown in Figure 3, after the computing device performs multi-task learning to obtain the updated language representation model, the language representation model can be used as a basic tool to perform various downstream tasks, for example, auxiliary diagnosis, intelligent consultation and electronic medical record structuring.
It can be seen from the above embodiments that, in the embodiments of the present application, the computing device can perform multi-task learning based on the language representation model during the training of the language representation model, and integrates the association relationships between knowledge entities in the knowledge graph into the training process, so that the language representation model represents the language data of the specific field corresponding to the knowledge graph more accurately, which improves the precision of the language representation model.
The above describes the training method for a language representation model provided by the embodiments of the present application; the following describes the related devices involved in the embodiments of the present application.
请参阅图6,图6为本申请实施例提供的一种训练装置的结构示意图。该训练装置用于实现上述方法实施例中计算设备执行的步骤,如图6所示,该训练装置600包括获取单元601和处理单元602。Please refer to FIG. 6 , which is a schematic structural diagram of a training device provided by an embodiment of the present application. The training device is used to implement the steps performed by the computing device in the above method embodiment. As shown in FIG. 6 , the training device 600 includes an acquisition unit 601 and a processing unit 602.
其中,获取单元601用于获取第一语言数据。处理单元602用于基于语言表征模型和第一语言数据进行多任务学习,得到目标损失函数,多任务学习包括掩码语言任务和关系分类任务,掩码语言任务用于根据语言表征模型生成的数据执行掩码语言任务,关系分类任务用于基于语言表征模型生成的数据执行关系分类任务。处理单元602还用于根据目标损失函数更新语言表征模型。Among them, the obtaining unit 601 is used to obtain the first language data. The processing unit 602 is used to perform multi-task learning based on the language representation model and the first language data to obtain the target loss function. The multi-task learning includes a mask language task and a relationship classification task. The mask language task is used for data generated according to the language representation model. The masked language task is performed and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model. The processing unit 602 is also used to update the language representation model according to the target loss function.
In a possible implementation, the processing unit 602 is specifically configured to: perform the masked language task on the first language data based on the language representation model to obtain a first loss function; map the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data; perform the relation classification task on the second language data based on the language representation model to obtain a second loss function; and determine the target loss function according to the first loss function and the second loss function.
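A minimal sketch of how the two task losses could be combined into the target loss, assuming a cross-entropy masked-language-model head and a cross-entropy relation classification head; the equal default weighting is an assumption for illustration, since the embodiment does not fix how the first loss function and the second loss function are combined:

import torch.nn.functional as F

def target_loss(mlm_logits, mlm_labels, rel_logits, rel_labels, mlm_weight=1.0, rel_weight=1.0):
    # First loss function: masked language task on the first language data
    # (positions that were not masked carry the ignore label -100).
    loss_mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1), ignore_index=-100)
    # Second loss function: relation classification on the second language data.
    loss_rel = F.cross_entropy(rel_logits, rel_labels)
    # Target loss: a weighted combination of the two task losses.
    return mlm_weight * loss_mlm + rel_weight * loss_rel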
In a possible implementation, the processing unit 602 is further configured to perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
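The embodiment does not prescribe a particular NER implementation; as one illustrative possibility, the recognized knowledge entities could be derived from BIO-style tags produced by any sequence-labelling model:

def bio_to_entities(tokens, tags):
    # Convert BIO tags (e.g. from a sequence-labelling NER model) into
    # (start, end, label) spans over the token list; end is exclusive.
    entities, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                entities.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue
        else:
            if start is not None:
                entities.append((start, i, label))
            start, label = None, None
    if start is not None:
        entities.append((start, len(tokens), label))
    return entities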
In a possible implementation, the processing unit 602 is specifically configured to map the association relationship of a knowledge entity in the knowledge graph to a knowledge entity in the first language data when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold.
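A sketch, under assumptions, of this threshold-based mapping: entity mentions recognized in the first language data are compared with knowledge-graph entities by cosine similarity of their embeddings, and a graph relation is attached only when both the head and the tail similarity exceed the preset threshold. The embedding function, the choice of cosine similarity, and the threshold value of 0.8 are illustrative assumptions, not values fixed by the embodiment:

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def map_relations(text_entities, kg_triples, embed, threshold=0.8):
    # Attach a knowledge-graph relation (head, relation, tail) to a pair of
    # entity mentions found in the first language data only when both mentions
    # are semantically similar enough to the corresponding graph entities.
    mapped = []
    for head, relation, tail in kg_triples:
        for m_head in text_entities:
            if cosine(embed(head), embed(m_head)) <= threshold:
                continue
            for m_tail in text_entities:
                if m_tail == m_head:
                    continue
                if cosine(embed(tail), embed(m_tail)) > threshold:
                    mapped.append((m_head, relation, m_tail))
    return mapped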
In a possible implementation, the processing unit 602 is specifically configured to perform the masked language task on the first language data according to the masking algorithm corresponding to the first language data. The first language data includes one or more of the following: natural language data, domain language data, and domain task language data. The masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In a possible implementation, the processing unit 602 is specifically configured to: when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
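This data-dependent masking choice can be sketched as follows; the tokenization, the entity spans, the TF-IDF scoring function, and the 15% masking ratio are assumptions used only for illustration and are not values taken from the embodiment:

import random

MASK = "[MASK]"

def whole_word_mask(words, ratio=0.15):
    # Mask randomly chosen whole words for general natural language data.
    if not words:
        return []
    chosen = set(random.sample(range(len(words)), max(1, int(len(words) * ratio))))
    return [MASK if i in chosen else w for i, w in enumerate(words)]

def entity_mask(words, entity_spans, ratio=0.15):
    # Mask every token of randomly chosen entities for domain language data.
    masked = list(words)
    if entity_spans:
        k = max(1, int(len(entity_spans) * ratio))
        for start, end in random.sample(entity_spans, min(k, len(entity_spans))):
            for i in range(start, end):
                masked[i] = MASK
    return masked

def tfidf_entity_mask(words, entity_spans, tfidf_score, ratio=0.15):
    # Prefer entities with high TF-IDF scores for domain task language data,
    # i.e. terms that are informative for the task corpus.
    masked = list(words)
    if entity_spans:
        ranked = sorted(entity_spans, key=lambda s: -tfidf_score("".join(words[s[0]:s[1]])))
        for start, end in ranked[:max(1, int(len(ranked) * ratio))]:
            for i in range(start, end):
                masked[i] = MASK
    return masked

def mask_first_language_data(words, data_type, entity_spans=None, tfidf_score=None):
    if data_type == "natural":
        return whole_word_mask(words)
    if data_type == "domain":
        return entity_mask(words, entity_spans or [])
    return tfidf_entity_mask(words, entity_spans or [], tfidf_score or (lambda t: 0.0))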
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The processing unit is further configured to perform downstream tasks based on the updated language representation model, where the downstream tasks include one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
It should be understood that the division of units in the above device is merely a division of logical functions. In actual implementation, the units may be fully or partially integrated into one physical entity, or they may be physically separate. The units in the device may all be implemented in the form of software invoked by a processing element, may all be implemented in hardware, or some units may be implemented as software invoked by a processing element while others are implemented in hardware. For example, each unit may be a separately established processing element, or may be integrated into a chip of the device; alternatively, a unit may be stored in the memory in the form of a program and invoked and executed by a processing element of the device to perform its function. In addition, all or some of these units may be integrated together or implemented independently. The processing element described here may also be called a processor and may be an integrated circuit with signal processing capability. During implementation, the steps of the above method or the above units may be implemented by integrated logic circuits of hardware in a processor element, or in the form of software invoked by a processing element.
It should be noted that, for brevity of description, the above method embodiments are expressed as a series of action combinations. However, those skilled in the art should understand that this application is not limited by the described order of actions. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by this application.
Other reasonable combinations of steps that those skilled in the art can conceive based on the above description also fall within the protection scope of this application. Again, those skilled in the art should be aware that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by this application.
Please refer to FIG. 7, which is a schematic diagram of a computing device provided by an embodiment of this application. As shown in FIG. 7, the computing device 700 includes a processor 710, a memory 720, and an interface 730, which are coupled through a bus (not labeled in the figure). The memory 720 stores instructions; when the instructions in the memory 720 are executed, the computing device 700 performs the method performed by the computing device in the above method embodiments.
The computing device 700 may be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field-programmable gate arrays (FPGAs), or a combination of at least two of these integrated circuit forms. For another example, when the units in the device are implemented by a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke programs. For another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The processor 710 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 720 may include a read-only memory and a random access memory, and provides instructions and data to the processor 710. The memory 720 may also include a non-volatile random access memory. For example, the memory 720 may be provided with multiple partitions, each used to store the private keys of different software modules.
The memory 720 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In addition to a data bus, the bus may further include a power bus, a control bus, a status signal bus, and the like. The bus may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the method performed by the computing device in the above method embodiments.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the method performed by the computing device in the above method embodiments.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is merely a division of logical functions, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (17)

1. A training method for a language representation model, characterized in that the method comprises:
    acquiring first language data;
    performing multi-task learning based on a language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a masked language task and a relation classification task, the masked language task is used to perform a masked language task according to data generated by the language representation model, and the relation classification task is used to perform a relation classification task based on data generated by the language representation model; and
    updating the language representation model according to the target loss function.
2. The method according to claim 1, characterized in that performing multi-task learning based on the language representation model and the first language data to obtain the target loss function comprises:
    performing a masked language task on the first language data based on the language representation model to obtain a first loss function;
    mapping association relationships of knowledge entities in a knowledge graph to knowledge entities in the first language data to obtain second language data;
    performing a relation classification task on the second language data based on the language representation model to obtain a second loss function; and
    determining the target loss function according to the first loss function and the second loss function.
3. The method according to claim 2, characterized in that, before mapping the association relationships of the knowledge entities in the knowledge graph to the knowledge entities in the first language data, the method further comprises:
    performing named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
4. The method according to claim 2 or 3, characterized in that mapping the association relationships of the knowledge entities in the knowledge graph to the knowledge entities in the first language data comprises:
    when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, mapping the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data.
5. The method according to any one of claims 2 to 4, characterized in that performing the masked language task on the first language data comprises:
    performing the masked language task on the first language data according to a masking algorithm corresponding to the first language data, wherein the first language data comprises one or more of the following: natural language data, domain language data, and domain task language data, and the masking algorithms comprise whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
6. The method according to claim 5, characterized in that performing the masked language task on the first language data according to the masking algorithm corresponding to the first language data comprises:
    when the first language data is natural language data, performing the masked language task on the first language data according to a whole-word masking algorithm;
    when the first language data is domain language data, performing the masked language task on the first language data according to an entity masking algorithm; and
    when the first language data is domain task language data, performing the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
7. The method according to claim 5 or 6, characterized in that the domain language data comprises language data of medical materials, the domain task language data comprises language data of electronic medical records or language data of imaging examinations, and the method further comprises:
    performing downstream tasks based on the updated language representation model, wherein the downstream tasks comprise one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
8. A training device for a language representation model, characterized in that the device comprises:
    an acquisition unit, configured to acquire first language data; and
    a processing unit, configured to perform multi-task learning based on a language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a masked language task and a relation classification task, the masked language task is used to perform a masked language task according to data generated by the language representation model, and the relation classification task is used to perform a relation classification task based on data generated by the language representation model;
    wherein the processing unit is further configured to update the language representation model according to the target loss function.
9. The device according to claim 8, characterized in that the processing unit is specifically configured to:
    perform a masked language task on the first language data based on the language representation model to obtain a first loss function;
    map association relationships of knowledge entities in a knowledge graph to knowledge entities in the first language data to obtain second language data;
    perform a relation classification task on the second language data based on the language representation model to obtain a second loss function; and
    determine the target loss function according to the first loss function and the second loss function.
10. The device according to claim 9, characterized in that the processing unit is further configured to:
    perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
11. The device according to claim 9 or 10, characterized in that the processing unit is specifically configured to:
    when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, map the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data.
12. The device according to any one of claims 9 to 11, characterized in that the processing unit is specifically configured to:
    perform the masked language task on the first language data according to a masking algorithm corresponding to the first language data, wherein the first language data comprises one or more of the following: natural language data, domain language data, and domain task language data, and the masking algorithms comprise whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
13. The device according to claim 12, characterized in that the processing unit is specifically configured to:
    when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm;
    when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and
    when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
14. The device according to claim 11 or 13, characterized in that the domain language data comprises language data of medical materials, the domain task language data comprises language data of electronic medical records or language data of imaging examinations, and the processing unit is further configured to:
    perform downstream tasks based on the updated language representation model, wherein the downstream tasks comprise one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
15. A computing device, characterized in that the computing device comprises a processor coupled to a memory, the memory is configured to store instructions, and when the instructions are executed by the processor, the computing device is caused to perform the method according to any one of claims 1 to 7.
16. A computer-readable storage medium having instructions stored thereon, characterized in that, when the instructions are executed, a computer is caused to perform the method according to any one of claims 1 to 7.
17. A computer program product comprising instructions, characterized in that, when the instructions are executed, a computer is caused to implement the method according to any one of claims 1 to 7.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210318687.8 2022-03-29
CN202210318687.8A CN116933789A (en) 2022-03-29 2022-03-29 Training method and training device for language characterization model

Publications (1)

Publication Number Publication Date
WO2023185082A1

Family

Family ID: 88198919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137523 WO2023185082A1 (en) 2022-03-29 2022-12-08 Training method and training device for language representation model

Country Status (2)

Country Link
CN (1) CN116933789A (en)
WO (1) WO2023185082A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539223A (en) * 2020-05-29 2020-08-14 北京百度网讯科技有限公司 Language model training method and device, electronic equipment and readable storage medium
CN111680145A (en) * 2020-06-10 2020-09-18 北京百度网讯科技有限公司 Knowledge representation learning method, device, equipment and storage medium
CN113705187A (en) * 2021-08-13 2021-11-26 北京百度网讯科技有限公司 Generation method and device of pre-training language model, electronic equipment and storage medium
CN113704388A (en) * 2021-03-05 2021-11-26 腾讯科技(深圳)有限公司 Training method and device for multi-task pre-training model, electronic equipment and medium

Also Published As

Publication number Publication date
CN116933789A (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN106874643B (en) Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors
CN108427707B (en) Man-machine question and answer method, device, computer equipment and storage medium
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
CN111984851B (en) Medical data searching method, device, electronic device and storage medium
CN110675944A (en) Triage method and device, computer equipment and medium
WO2021151328A1 (en) Symptom data processing method and apparatus, and computer device and storage medium
KR102424085B1 (en) Machine-assisted conversation system and medical condition inquiry device and method
CN110427486B (en) Body condition text classification method, device and equipment
CN111128391B (en) Information processing apparatus, method and storage medium
US20210042344A1 (en) Generating or modifying an ontology representing relationships within input data
WO2021114836A1 (en) Text coherence determining method, apparatus, and device, and medium
CN116313120A (en) Model pre-training method, medical application task processing method and related devices thereof
Dong et al. Rare disease identification from clinical notes with ontologies and weak supervision
Cao et al. Chinese electronic medical record named entity recognition based on BERT-WWM-IDCNN-CRF
CN113157887A (en) Knowledge question-answering intention identification method and device and computer equipment
WO2023124837A1 (en) Inquiry processing method and apparatus, device, and storage medium
CN111222325A (en) Medical semantic labeling method and system of bidirectional stack type recurrent neural network
WO2023185082A1 (en) Training method and training device for language representation model
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
EP3564964A1 (en) Method for utilising natural language processing technology in decision-making support of abnormal state of object
Montenegro et al. The hope model architecture: a novel approach to pregnancy information retrieval based on conversational agents
Chen et al. Extraction of entity relations from Chinese medical literature based on multi-scale CRNN
Ren et al. Extraction of transitional relations in healthcare processes from Chinese medical text based on deep learning
CN113761899A (en) Medical text generation method, device, equipment and storage medium
Pavlopoulos et al. Clinical predictive keyboard using statistical and neural language modeling

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22934900
Country of ref document: EP
Kind code of ref document: A1