WO2023185082A1 - Training method and training device for language representation model - Google Patents

Training method and training device for language representation model

Info

Publication number
WO2023185082A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
language data
task
data
masking
Prior art date
Application number
PCT/CN2022/137523
Other languages
French (fr)
Chinese (zh)
Inventor
陶建军
乔楠
张雷
苏嘉
何彬
沈雯
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2023185082A1 publication Critical patent/WO2023185082A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates

Definitions

  • Embodiments of the present application relate to the field of artificial intelligence, and in particular, to a training method and training device for a language representation model.
  • Natural language processing is an important direction in the fields of computer science and artificial intelligence.
  • In natural language processing technology, language representation models can represent text information in natural language as vectors, so that methods such as neural networks can be applied to that text information to perform tasks such as machine translation, sentiment analysis, and auxiliary diagnosis.
  • At present, language representation models are trained on a large amount of general natural language data.
  • When such a trained language representation model performs downstream tasks in a specific field, that field has its own professional language; if the model represents this professional language poorly, its application in the specific field will be greatly limited.
  • Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the accuracy of the language representation model.
  • The first aspect of the embodiments of the present application provides a method for training a language representation model.
  • The method can be executed by a computing device, or by a component of the computing device such as a processor, a chip, or a chip system; it can also be implemented by logic modules or software that realize all or part of the functions of the computing device.
  • Taking execution by a computing device as an example, the method provided in the first aspect includes: the computing device obtains first language data, where the first language data includes multiple different types of language data.
  • the computing device performs multi-task learning based on the language representation model and the first language data to obtain a target loss function.
  • the target loss function can indicate the deviation between the output result of the language representation model and the target result.
  • Multi-task learning includes masked language tasks and relational classification tasks.
  • The masked language task is executed based on the data generated by the language representation model,
  • and the relational classification task is executed based on the data generated by the language representation model.
  • the computing device updates the language representation model according to the target loss function.
  • In the embodiments of the present application, the computing device performs multi-task learning based on the language representation model during training.
  • Specifically, the language representation model is trained by performing the masked language task and the relationship classification task, so that the training process incorporates the knowledge entities in the first language data and the associations between those knowledge entities; this multi-task, multi-data training improves the accuracy of the language representation model.
  • In a possible implementation, during the multi-task learning the computing device performs a masked language task on the first language data based on the language representation model to obtain a first loss function.
  • the computing device maps the association relationship of the knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain the second language data.
  • the computing device performs a relationship classification task on the second language data based on the language representation model to obtain a second loss function.
  • the computing device determines a target loss function based on the first loss function and the second loss function.
  • Because the computing device maps the associations between knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data and performs the relationship classification task on that second language data, the training process integrates the associations between knowledge entities in the knowledge graph; this makes the language representation model represent language data in the specific field covered by the knowledge graph more accurately and improves the accuracy of the model.
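  • Purely as an illustration of the possible implementation above, the following PyTorch-style sketch shows one way the two loss functions could be combined into a target loss function; the module names (encoder, mlm_head, rc_head) and the batch fields are assumptions for illustration, not details taken from this application.

```python
import torch


def multi_task_step(encoder, mlm_head, rc_head, mlm_batch, rc_batch, optimizer):
    """One hypothetical multi-task update: masked language loss plus relation classification loss."""
    # First loss function: masked language task on the first language data.
    mlm_logits = mlm_head(encoder(mlm_batch["input_ids"]))
    loss_mlm = torch.nn.functional.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_batch["labels"].view(-1),
        ignore_index=-100,  # positions that were not masked carry no loss
    )

    # Second loss function: relation classification task on the second language data.
    rc_logits = rc_head(encoder(rc_batch["input_ids"]))
    loss_rc = torch.nn.functional.cross_entropy(rc_logits, rc_batch["relation_labels"])

    # Target loss function: here simply the sum of the two losses, as in the description.
    target_loss = loss_mlm + loss_rc
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return target_loss.item()
```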
  • Before mapping the knowledge-entity associations in the knowledge graph onto the knowledge entities in the first language data, the computing device performs named entity recognition (NER) on the first language data; specifically, it extracts knowledge entities from the first language data to obtain the knowledge entities of the first language data. The computing device can perform NER based on a built-in NER module or based on an external NER module; this is not specifically limited.
  • In other words, before the mapping step the computing device must first extract the knowledge entities from the first language data to obtain the knowledge entities of the first language data, which improves the achievability of the solution.
  • In a possible implementation, when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationships of that knowledge entity in the knowledge graph onto the knowledge entity in the first language data to obtain the second language data.
  • That is, before mapping an association relationship from the knowledge graph onto a knowledge entity in the first language data, the computing device uses the semantics of the knowledge entity in the knowledge graph and the semantics of the knowledge entity in the first language data to determine whether they are the same knowledge entity, thereby improving the accuracy of the mapping process.
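  • As a minimal sketch of the similarity check described above (assuming the knowledge entities have already been embedded as vectors; the embedding step and the threshold value of 0.8 are assumptions for illustration), the mapping could be gated as follows:

```python
import numpy as np


def should_map(kg_entity_vec: np.ndarray, text_entity_vec: np.ndarray, threshold: float = 0.8) -> bool:
    """Map the knowledge-graph association onto the text entity only when the two entity
    vectors are semantically similar enough (cosine similarity above a preset threshold)."""
    cosine = float(
        np.dot(kg_entity_vec, text_entity_vec)
        / (np.linalg.norm(kg_entity_vec) * np.linalg.norm(text_entity_vec))
    )
    return cosine > threshold
```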
  • In a possible implementation, when the computing device performs the masked language task on the first language data, it performs the task according to the masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data.
  • the masking algorithm includes full-word masking, entity masking, and entity masking based on word frequency and inverse document frequency.
  • the first language data in the embodiments of the present application includes multiple types of language data, and the computing device can perform masking language tasks on multiple types of first language data based on multiple masking algorithms, thereby improving the richness of the solution.
  • When the computing device performs the masked language task according to the masking algorithm corresponding to the first language data: if the first language data is natural language data, the whole-word masking algorithm is used to perform the masked language task on the first language data;
  • if the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm;
  • if the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the computing device selects different masking algorithms for different types of first language data, thereby improving the accuracy of the masking task and further improving the accuracy of the language representation model.
  • the computing device can sequentially perform masking language tasks on the multiple types of first language data.
  • the computing device in the embodiment of the present application can perform multi-level masking language tasks on multiple types of first language data, thereby further improving the training accuracy of the language representation model.
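  • The per-type selection of a masking algorithm can be pictured with the following sketch; the string identifiers are hypothetical labels, and the actual masking procedures are described in the embodiments below.

```python
def choose_masking_algorithm(data_type: str) -> str:
    """Pick the masking algorithm that matches the type of first language data."""
    algorithms = {
        "natural_language": "whole_word_masking",        # general corpora
        "domain_language": "entity_masking",             # e.g. medical literature
        "domain_task_language": "tfidf_entity_masking",  # e.g. electronic medical records
    }
    if data_type not in algorithms:
        raise ValueError(f"unknown data type: {data_type}")
    return algorithms[data_type]
```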
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • The computing device performs downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: electronic medical record structuring, assisted diagnosis, or intelligent consultation.
  • After the computing device updates the language representation model based on the target loss function, it can perform a variety of downstream tasks based on the updated language representation model, thereby improving the performance of the computing device in performing those downstream tasks.
  • the second aspect of the embodiment of the present application provides a training device for a language representation model, including an acquisition unit and a processing unit.
  • the acquisition unit is used to acquire the first language data.
  • the processing unit is used to perform multi-task learning based on the language representation model and first language data to obtain the target loss function.
  • the multi-task learning includes a mask language task and a relationship classification task.
  • The masked language task is used to perform the masked language task based on the data generated by the language representation model,
  • and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model.
  • the processing unit is also used to update the language representation model based on the target loss function.
  • The processing unit is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain a first loss function, and to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data
  • to obtain second language data; the relationship classification task is then performed on the second language data based on the language representation model to obtain a second loss function.
  • the target loss function is determined based on the first loss function and the second loss function.
  • the processing unit is also used to perform named entity recognition NER on the first language data to obtain the knowledge entities in the first language data.
  • The processing unit is specifically configured to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data when the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data exceeds a preset threshold.
  • the processing unit is specifically configured to perform a masking language task on the first language data according to a masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data; the masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • the processing unit is specifically configured to perform a masking language task on the first language data according to a whole-word masking algorithm when the first language data is natural language data.
  • When the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • the processing unit is also used to perform downstream tasks based on the updated language representation model.
  • Downstream tasks include one or more of the following: electronic medical record structuring, auxiliary diagnosis, or intelligent consultation.
  • The third aspect of the embodiments of the present application provides a computing device including a processor coupled to a memory; the memory is configured to store instructions, and when the instructions are executed by the processor, the computing device executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which instructions are stored.
  • When the instructions are executed, a computer executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • the fifth aspect of the embodiments of the present application provides a computer program product.
  • the computer program product includes instructions. When the instructions are executed, the computer implements the method described in the first aspect or any possible implementation manner of the first aspect.
  • Figure 1a is a schematic diagram of the training system architecture of a language representation model provided by an embodiment of the present application
  • Figure 1b is a schematic diagram of the training process framework of a language representation model provided by an embodiment of the present application
  • Figure 2 is a schematic flowchart of a training method for a language representation model provided by an embodiment of the present application
  • Figure 3 is a schematic flowchart of a training method for another language representation model provided by an embodiment of the present application
  • Figure 4a is a schematic diagram of performing a masked language task provided by an embodiment of the present application.
  • Figure 4b is another schematic diagram of performing a masked language task provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of performing a relationship classification task provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the accuracy of the language representation model.
  • Mask refers to the technology of covering some words in the language data during the pre-training process of the language representation model.
  • A language model is also called a language representation model.
  • A language model is a simple, unified, and abstract formal system.
  • Language data represented by the language model becomes language data suitable for automatic processing by a computing device.
  • Entity is also called knowledge entity. Entity is the abstraction of objective individuals. A person, a movie, or a sentence can be regarded as an entity.
  • Named entity recognition refers to the identification of entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns, etc. Named entity recognition is an important basic tool for information extraction, question answering systems, syntactic analysis, machine translation, knowledge graphs and other applications.
  • Medical knowledge graph is a professional graph that integrates knowledge graph theory with doctors' clinical medical knowledge to connect medical knowledge points (information, data) and the internal logical mechanisms of medical knowledge.
  • EMR: electronic medical record.
  • A triplet contains a head entity, a relationship, and a tail entity, where the head entity is the subject, the relationship is the predicate, and the tail entity is the object; for example, in the triplet (Beijing, is-the-capital-of, China), "Beijing" is the head entity, "is-the-capital-of" is the relationship, and "China" is the tail entity.
  • Natural language refers to a language that naturally evolves with culture.
  • Cross entropy is an important concept in Shannon information theory, which is mainly used to measure the difference information between two probability distributions.
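  • Written out for two discrete distributions, with p the labels and q the model's predicted distribution, cross entropy is:

```latex
H(p, q) = -\sum_{x} p(x)\,\log q(x)
```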
  • Figure 1a is a schematic system architecture diagram of an application system in which a language representation model provided by an embodiment of the present application is located.
  • the application system 100 includes a multi-task learning module 101, a data module 102 and a downstream task module 103.
  • The multi-task learning module 101 is used to train the language representation model by performing multiple tasks,
  • the data module 102 is used to provide training data for the language representation model,
  • and the downstream task module 103 is configured to perform various downstream tasks according to the language representation model.
  • the multi-task learning module 101 includes a mask language task sub-module 1011 and a relationship classification task sub-module 1012.
  • the masked language task module 1011 is used to perform the masked language task based on the language representation model.
  • The masked language task refers to covering some words in the language data during training and making semantic predictions for the covered words; the loss function computed from the semantic prediction results is then used to update the language representation model.
  • the relationship classification task sub-module 1012 is used to perform a relationship classification task based on the language representation model.
  • The relationship classification task means that, during training, the computing device combines the knowledge graph to predict the relationships of the knowledge entities in the language data; the loss function computed from the relationship prediction results is then used to update the language representation model.
  • the multi-task learning module 101 can calculate a target loss function based on the loss function of the mask language task and the loss function of the relationship classification task, and update the language representation model based on the target loss function.
  • the data module 102 is used to provide training data for the language representation model.
  • the training data of the language representation model includes natural language data, domain language data and domain task language data.
  • the domain language data refers to the general data in a specific field
  • the domain task language data refers to the language data generated by performing specific tasks in a certain field.
  • the training data provided by the data module 102 can be used to perform named entity recognition (NER) to obtain knowledge entities, or can be used to perform knowledge extraction to obtain a knowledge graph.
  • the NER module that performs named entity recognition in the embodiment of this application can be a built-in module in the system or an external module.
  • the knowledge graph in the embodiments of this application can be a knowledge graph extracted from training data, or an external knowledge graph input by the user, and is not specifically limited.
  • the downstream task module 103 is configured to perform downstream tasks based on the trained language representation model.
  • the trained language representation model supports a variety of downstream tasks in the field. For example, in the medical field, downstream tasks include electronic medical record structuring tasks, assisted diagnosis or intelligent consultation, etc.
  • Figure 1b is a schematic diagram of the training framework of a language representation model provided by an embodiment of the present application.
  • the computing device outputs a language representation model after performing continuous multi-level learning based on a variety of first language data.
  • the first language data includes natural language data, domain language data, and domain task language data.
  • the computing device can perform multi-task learning based on the domain language data, domain task language data and the language representation model, and update the language representation model according to the target loss function obtained by multi-task learning.
  • multi-task learning includes masked language tasks and relationship classification tasks.
  • When the computing device performs the masked language task based on the language representation model, it selects different masking algorithms for first language data with different characteristics.
  • The masking algorithms include the whole-word masking algorithm, the entity masking algorithm, and the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • Before the computing device performs the masked language task according to the entity masking algorithm or the entity masking algorithm based on term frequency and inverse document frequency, it needs to perform named entity recognition on the first language data to obtain the knowledge entities, and further calculate the term frequency and inverse document frequency of the obtained knowledge entities to obtain the knowledge entities with high term frequency and inverse document frequency.
  • When the computing device uses the entity masking algorithm to perform the masked language task, it performs the masking operation on the domain language data based on the above knowledge entities.
  • When the computing device uses the entity masking algorithm based on term frequency and inverse document frequency, it performs the masking operation on the domain task language data based on the knowledge entities with high term frequency and inverse document frequency.
  • The computing device also maps the entity associations in the knowledge graph onto the knowledge entities in the first language data, and performs the relationship classification task based on the mapped first language data and the language representation model.
  • Figure 2 is a training method for a language representation model provided by an embodiment of the present application.
  • the training method of the language representation model includes but is not limited to the following steps:
  • the computing device obtains first language data.
  • the computing device obtains first language data, and the first language data is used to train a language representation model.
  • First language data includes multiple types of data input by users, including natural language data, domain language data, and domain task language data.
  • natural language data refers to language data that naturally evolves with culture
  • domain language data refers to data in a specific field
  • domain task language data refers to data generated by performing specific tasks in a specific domain.
  • Domain language data is, for example, language data in medical materials.
  • Domain task language data is, for example, language data generated in a hospital electronic medical record system or language data generated in a hospital imaging examination.
  • the first language data is also used to perform named entity recognition to obtain knowledge entities.
  • After the computing device obtains the first language data, it performs named entity recognition on the first language data based on the named entity recognition (NER) module to obtain the knowledge entities of the first language data.
  • The named entity recognition module can be a built-in NER module or an external NER module; this is not specifically limited.
  • When performing the masked language task, the computing device performs the masking operation on the first language data based on the knowledge entities of the first language data.
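  • The NER step is treated as a pluggable module (built-in or external). Purely for illustration, a dictionary-based stand-in could look like the sketch below; the entity dictionary is an assumption, and a real NER module would typically be a trained sequence-labelling model.

```python
def recognize_entities(text: str, entity_dictionary: set[str]) -> list[str]:
    """Toy stand-in for the NER module: return every dictionary entity that occurs in the text."""
    return [entity for entity in entity_dictionary if entity in text]
```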
  • After the computing device obtains the first language data, it also needs to obtain a knowledge graph.
  • the knowledge graph may be a knowledge graph generated based on the first language data, or it may be an external knowledge graph input by the user.
  • When the knowledge graph is generated based on the first language data, the computing device, after obtaining the first language data, generates the knowledge graph based on the correlations between the knowledge entities in the first language data.
  • Figure 3 is a schematic diagram of a training method for a language representation model provided by an embodiment of the present application.
  • the computing device acquires the first language data, and the first language data is a required input for the computing device to train the language representation model.
  • the computing device also acquires the knowledge graph and acquires the knowledge entities based on the NER module.
  • the acquired knowledge graph may be an external knowledge graph input by the user, and the external knowledge graph may replace the built-in knowledge graph of the computing device.
  • the NER module that performs named entity recognition can also be an external NER module input by the user, and the external NER module can replace the internal NER module of the computing device.
  • External knowledge graphs and external NER modules can be used as optional inputs for computing devices to train language representation models.
  • the computing device performs multi-task learning based on the first language data and the language representation model to obtain the target loss function.
  • the multi-task learning includes masking language tasks and relationship classification tasks.
  • The computing device performs the masked language task on the first language data based on the language representation model to obtain the first loss function.
  • Specifically, the computing device performs the masked language task based on the knowledge entities of the first language data and the language representation model.
  • The knowledge entities of the first language data are the knowledge entities obtained by the computing device through named entity recognition on the first language data.
  • the computing device maps the association between the knowledge entities in the knowledge graph to the knowledge entities of the first language data to obtain the second language data.
  • The computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function.
  • The computing device determines the target loss function based on the first loss function and the second loss function.
  • the computing device performs the masked language task.
  • the process of the computing device performing the masking language task is introduced.
  • The computing device performs the masked language task on the first language data through a multi-level masking algorithm.
  • The masking algorithms in the embodiments of this application include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • When the first language data is natural language data, the computing device performs the masked language task on the first language data according to the whole-word masking algorithm.
  • When the first language data is domain language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • FIG 4a is a schematic diagram of a multi-level masking algorithm provided by an embodiment of the present application.
  • (a) is a schematic diagram of the whole word mask (WWM) algorithm.
  • In (a), the first language data is natural language data.
  • The masked words are common words, such as words obtained from general dictionaries or general books; specifically, these words can be obtained through Chinese word segmentation tools.
  • (b) is a schematic diagram of the entity mask (EM) algorithm.
  • In (b), the first language data is domain language data.
  • The masked words include entity words obtained through named entity recognition, or entity words obtained by the computing device based on the knowledge entities in the knowledge graph.
  • For example, in the sentence "First use azithromycin anti-inflammatory drugs for treatment", the word masked by the entity masking algorithm is "azithromycin", which is an entity word in the domain language data.
  • (c) is a schematic diagram of the entity masking algorithm based on term frequency and inverse document frequency (term frequency-inverse document frequency, TF-IDF).
  • In (c), the first language data is domain task language data.
  • The masked words are the entity words mentioned above that are selected based on the product of term frequency and inverse document frequency.
  • Term frequency (TF) represents how frequently a term appears in a document.
  • Inverse document frequency (IDF) is the logarithm of the ratio of the total number of documents to the number of documents containing the term; it represents the distinguishing ability of the term, and the fewer documents contain the term, the larger the IDF.
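  • Written out, with N the total number of documents and df(t) the number of documents containing term t:

```latex
\mathrm{TF}(t, d) = \frac{\text{occurrences of } t \text{ in } d}{\text{number of terms in } d}, \qquad
\mathrm{IDF}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad
\text{TF-IDF}(t, d) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t)
```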
  • Before the computing device performs the masked language task based on the TF-IDF entity masking algorithm, it needs to calculate TF-IDF for the acquired knowledge entities and sort the knowledge entities by the product of term frequency and inverse document frequency, yielding the knowledge entities ranked by TF-IDF.
  • The top-ranked knowledge entities are the knowledge entities with high importance.
  • the computing device performs masking language tasks on the domain task language data based on these knowledge entities.
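  • A minimal sketch of ranking the recognized knowledge entities by TF-IDF before masking is given below; it is a simplified, corpus-level variant, and the corpus format and the number of entities kept are assumptions for illustration.

```python
import math
from collections import Counter


def rank_entities_by_tfidf(documents: list[list[str]], entities: set[str], top_k: int = 50) -> list[str]:
    """Score each knowledge entity by term frequency times inverse document frequency
    over the domain task corpus and keep the highest-scoring entities for masking."""
    n_docs = len(documents)
    # Document frequency: in how many documents does the entity appear.
    df = Counter(e for doc in documents for e in set(doc) if e in entities)
    # Term frequency (simplified): total occurrences of the entity across the corpus.
    tf = Counter(e for doc in documents for e in doc if e in entities)
    scores = {e: tf[e] * math.log(n_docs / df[e]) for e in df}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```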
  • When the computing device performs the masked language task on the first language data through the multi-level masking algorithm, it predicts the words covered by the different masking algorithms based on the language representation model, and calculates the cross entropy between the predicted words and the labels of the actually masked words to obtain the first loss function, which can be used to update the language representation model.
  • the computing device in this embodiment of the present application can perform multi-level masking language tasks on multiple types of first language data. For example, after the computing device performs a mask language task on natural language data to obtain a first-level language representation model, it then performs a mask language task on domain language data based on the first-level language representation model to obtain a second-level language representation. model, and then perform masking language tasks on domain task language data based on the second-level language representation model to obtain a third-level language representation model.
  • The third-level language representation model is the model obtained after performing the multi-level masked language tasks; the training process of the multi-level language representation model is also called fine-tuning the language representation model.
  • Figure 4b is a schematic diagram of performing a multi-level mask language task provided by an embodiment of the present application.
  • The computing device sequentially performs the masked language task on the natural language data, the domain language data, and the domain task language data. After the masked language task is performed on each type of language data, a fine-tuned language representation model is obtained; the fine-tuned model is then used to perform the masked language task on the next type of language data, and finally the language representation model obtained after performing the multi-level masked language tasks is output.
  • natural language data such as "Preschool education is an important part of the national education system”
  • domain language data such as "Use azithromycin anti-inflammatory drugs for treatment first”
  • domain task language data such as "Repeated coughing and coughing up phlegm for 3 days; treated with Ketofen".
  • the computing device in the embodiment of the present application can combine multiple types of first language data and perform multi-level masking language tasks on multiple types of data based on the language representation model, thereby improving the training accuracy of the language representation model.
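  • The continual, multi-level masked language training described above could be sketched as follows; run_masked_lm_epoch is a hypothetical helper that fine-tunes the model on one corpus and returns it.

```python
def multi_level_masked_training(model, corpora: dict, run_masked_lm_epoch):
    """Fine-tune the same language representation model on each corpus in turn:
    natural language data, then domain language data, then domain task language data."""
    order = ["natural_language", "domain_language", "domain_task_language"]
    for level, data_type in enumerate(order, start=1):
        model = run_masked_lm_epoch(model, corpora[data_type], data_type)
        print(f"level-{level} language representation model obtained from {data_type}")
    return model
```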
  • the computing device performs the relationship classification task.
  • The computing device maps the association relationships between the knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data; the second language data then includes the head entity, the tail entity, and the association relationship.
  • The computing device performs the mapping according to the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data: when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of that knowledge entity in the knowledge graph onto the knowledge entity in the first language data to obtain the second language data.
  • The computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function. Specifically, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity, and the flag bit, and passes the three spliced representation vectors through a fully connected layer to output the predicted relationship between the head entity and the tail entity and its corresponding probability. The computing device then calculates the second loss function of the relationship classification task based on the predicted relationships and probabilities and their labels; the second loss function can be used to update the language representation model.
  • FIG. 5 is a schematic diagram of a computing device provided by the present application for performing a relationship classification task.
  • The computing device maps the association relationships between knowledge entities in the knowledge graph onto knowledge entities in the first language data. For example, the computing device obtains from the medical knowledge graph that
  • the association between the two knowledge entities "dizziness" and "heart palpitations" is "accompanying", and maps this association onto the knowledge entities in the sentence "repeated dizziness, sweating, and worsening heart palpitations for 3 days": the head entity "dizziness" is identified with <e1>, </e1>, the tail entity "heart palpitations" is identified with <e2>, </e2>, and the resulting second language data is identified with the flag bit [CLS], which is used to perform the relationship classification task.
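  • A sketch of how one sample of second language data might be assembled from the first language data plus a knowledge-graph relation is shown below; the marker conventions follow the description above, while the exact tokenization and sample layout are assumptions for illustration.

```python
def build_relation_sample(text: str, head: str, tail: str, relation: str) -> dict:
    """Wrap the head and tail entities with <e1>...</e1> and <e2>...</e2> markers and prepend
    the [CLS] flag, producing one training sample for the relationship classification task."""
    marked = text.replace(head, f"<e1>{head}</e1>", 1).replace(tail, f"<e2>{tail}</e2>", 1)
    return {"input_text": "[CLS] " + marked, "label": relation}


# For the example in the description: head = "dizziness", tail = "heart palpitations",
# relation = "accompanying".
sample = build_relation_sample(
    "repeated dizziness, sweating, and worsening heart palpitations for 3 days",
    "dizziness", "heart palpitations", "accompanying",
)
```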
  • The computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity, and the flag bit, and splices the three representation vectors; after the fully connected layer,
  • the predicted relationship between the head entity and the tail entity is obtained.
  • the second loss function of the relationship classification task can be obtained by calculating the cross entropy based on the predicted relationship and the label of the actual relationship.
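  • A minimal PyTorch sketch of the relationship classification head described above is given below; the hidden size, the number of relation classes, and the way the three representation vectors are extracted from the Transformer output are assumptions for illustration, not details from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationClassificationHead(nn.Module):
    """Splice the representation vectors of the [CLS] flag, the head entity and the tail entity,
    pass them through a fully connected layer, and score the candidate relations."""

    def __init__(self, hidden_size: int = 768, num_relations: int = 10):
        super().__init__()
        self.fc = nn.Linear(3 * hidden_size, num_relations)

    def forward(self, cls_vec, head_vec, tail_vec, relation_labels=None):
        logits = self.fc(torch.cat([cls_vec, head_vec, tail_vec], dim=-1))
        if relation_labels is None:
            return logits
        # Second loss function: cross entropy between the predicted relations and their labels.
        return logits, F.cross_entropy(logits, relation_labels)
```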
  • After the computing device obtains the second loss function of the relationship classification task based on the language representation model, it calculates the target loss function based on the first loss function obtained from the masked language task and the second loss function. Specifically, the computing device adds the first loss function and the second loss function to obtain the target loss function.
  • the computing device updates the language representation model according to the target loss function.
  • The computing device updates the language representation model according to the target loss function. Specifically, when the target loss function has not reached the expected value, the computing device continues to train the language representation model through multi-task learning; when the target loss function reaches the expected value, the computing device outputs the updated language representation model, which is used to perform downstream tasks.
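  • The stopping criterion can be pictured as a simple loop, where step_fn could be a multi-task step like the one sketched earlier; the step budget is an assumption, since the application only states that training continues until the target loss function reaches the expected value.

```python
def train_until_expected(step_fn, expected_loss: float, max_steps: int = 100_000):
    """Keep performing multi-task learning steps until the target loss function
    reaches the expected value (or the step budget runs out)."""
    step, target_loss = 0, float("inf")
    for step in range(max_steps):
        target_loss = step_fn()
        if target_loss <= expected_loss:
            break
    return step, target_loss
```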
  • After the computing device updates the language representation model according to the target loss function,
  • the updated language representation model is used to perform various downstream tasks.
  • the downstream tasks include one or more of the following tasks: electronic medical record structured tasks, assisted diagnosis, or intelligent consultation.
  • the language representation model can be used as a basic tool to perform various downstream tasks.
  • The downstream tasks include, for example, assisted diagnosis, intelligent consultation, and electronic medical record structuring.
  • the computing device can perform multi-task learning based on the language representation model during the training of the language representation model.
  • During the training of the language representation model, the computing device can also integrate the associations between knowledge entities in the knowledge graph,
  • which makes the representation of language data in the specific field covered by the knowledge graph more accurate and improves the accuracy of the language representation model.
  • FIG. 6 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • the training device is used to implement the steps performed by the computing device in the above method embodiment.
  • the training device 600 includes an acquisition unit 601 and a processing unit 602.
  • the obtaining unit 601 is used to obtain the first language data.
  • the processing unit 602 is used to perform multi-task learning based on the language representation model and the first language data to obtain the target loss function.
  • the multi-task learning includes a mask language task and a relationship classification task.
  • The masked language task is used to perform the masked language task based on the data generated by the language representation model,
  • and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model.
  • the processing unit 602 is also used to update the language representation model according to the target loss function.
  • The processing unit 602 is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain the first loss function, map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data to obtain the second language data, perform the relationship classification task on the second language data based on the language representation model to obtain the second loss function, and determine the target loss function based on the first loss function and the second loss function.
  • the processing unit 602 is also configured to perform named entity recognition NER on the first language data to obtain the knowledge entities in the first language data.
  • The processing unit 602 is specifically configured to map the association relationships of the knowledge entities in the knowledge graph onto the knowledge entities in the first language data when the semantic similarity between the knowledge entities in the knowledge graph and the knowledge entities in the first language data exceeds a preset threshold.
  • the processing unit 602 is specifically configured to perform a masking language task on the first language data according to the masking algorithm corresponding to the first language data.
  • The first language data includes one or more of the following: natural language data, domain language data, and domain task language data; the masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
  • the processing unit 602 is specifically configured to perform a masking language task on the first language data according to a whole-word masking algorithm when the first language data is natural language data.
  • When the first language data is domain language data, the masked language task is performed on the first language data according to the entity masking algorithm.
  • When the first language data is domain task language data, the masked language task is performed on the first language data according to the entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
  • the domain language data includes language data of medical materials
  • the domain task language data includes language data of electronic medical records or language data of imaging examinations
  • the processing unit is also used to perform downstream tasks based on the updated language representation model.
  • Downstream tasks include one or more of the following: electronic medical record structuring, auxiliary diagnosis, or intelligent consultation.
  • Each unit in the device can be a separate processing element, or can be integrated and implemented in a certain chip of the device;
  • it can also be stored in a memory in the form of a program, and a certain processing element of the device calls and executes the function of the unit.
  • all or part of these units can be integrated together or implemented independently.
  • the processing element described here can also be a processor, which can be an integrated circuit with signal processing capabilities.
  • each step of the above method or each unit above can be implemented by an integrated logic circuit of hardware in the processor element or implemented in the form of software calling through the processing element.
  • FIG. 7 is a schematic diagram of a computing device provided by an embodiment of the present application.
  • the computing device 700 includes: a processor 710, a memory 720, and an interface 730.
  • the processor 710, the memory 720, and the interface 730 are coupled through a bus (not labeled in the figure).
  • The memory 720 stores instructions; when the instructions in the memory 720 are executed, the computing device 700 performs the method performed by the computing device in the above method embodiments.
  • Alternatively, the computing device 700 may be one or more integrated circuits configured to implement the above methods, for example, one or more application-specific integrated circuits (ASIC), one or more microprocessors (digital signal processors, DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
  • The units in the device can be implemented in the form of a processing element scheduling a program;
  • the processing element can be a general-purpose processor, such as a central processing unit (CPU), or another processor that can call a program.
  • these units can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • The processor 710 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • a general-purpose processor can be a microprocessor or any conventional processor.
  • Memory 720 may include read-only memory and random access memory and provides instructions and data to processor 710 .
  • Memory 720 may also include non-volatile random access memory.
  • the memory 720 may be provided with multiple partitions, and each area is used to store private keys of different software modules.
  • Memory 720 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • Many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the bus may also include a power bus, a control bus, a status signal bus, etc.
  • The bus can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • a computer-readable storage medium is also provided.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • When the processor of a device executes the computer-executable instructions,
  • the device executes the method performed by the computing device in the above method embodiments.
  • a computer program product is also provided, the computer program product including computer execution instructions.
  • When the processor of a device executes the computer-executable instructions,
  • the device executes the method executed by the computer device in the above method embodiment.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in embodiments of the present application are a training method and a training device for a language representation model, used for improving the model precision of a language representation model. The method in the embodiments of the present application comprises: acquiring first language data; performing multi-task learning on the basis of the language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a mask language task and a relationship classification task, the mask language task is used for executing a mask language task according to the data generated by the language representation model, and the relationship classification task is used for executing a relationship classification task on the basis of the data generated by the language representation model; and updating the language representation model according to the target loss function.

Description

A training method and training device for a language representation model
This application claims priority to the Chinese patent application with application number "202210318687.8", filed with the China Patent Office on March 29, 2022 and entitled "A training method and training device for a language representation model", the entire content of which is incorporated into this application by reference.
Technical field
Embodiments of the present application relate to the field of artificial intelligence, and in particular, to a training method and training device for a language representation model.
Background
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. In natural language processing technology, language representation models can represent text information in natural language as vectors, so that methods such as neural networks can be applied to the text information to perform tasks such as machine translation, sentiment analysis, and auxiliary diagnosis.
In the current training process of language representation models, the models are trained on a large amount of general natural language data. When a trained language representation model performs downstream tasks in a specific field, each field has its own professional language; if the model represents that professional language poorly, the application of the trained model in the specific field will be greatly limited.
Although some current language representation models are also trained on the professional language of a specific field, the training process has a single training dimension, so for some complex domain-specific language data the representation accuracy of the language representation model is low, which further leads to poor performance on downstream tasks.
Summary of the invention
Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the precision of the language representation model.
A first aspect of the embodiments of the present application provides a training method for a language representation model. The method may be executed by a computing device, or by a component of the computing device, for example a processor, a chip or a chip system of the computing device, and may also be implemented by a logic module or software capable of realizing all or part of the functions of the computing device. Taking execution by a computing device as an example, the method provided in the first aspect includes: the computing device acquires first language data, where the first language data includes multiple different types of language data; the computing device performs multi-task learning based on the language representation model and the first language data to obtain a target loss function, where the target loss function can indicate the deviation between the output result of the language representation model and a target result; the multi-task learning includes a masked language task and a relationship classification task, the masked language task is performed according to data generated by the language representation model, and the relationship classification task is performed based on data generated by the language representation model; and the computing device updates the language representation model according to the target loss function.
In the embodiments of the present application, during the training of the language representation model, the computing device can perform multi-task learning based on the language representation model, and specifically trains the language representation model by performing the masked language task and the relationship classification task, so that the training process combines the knowledge entities in the first language data and the association relationships between the knowledge entities. The multi-task and multi-data training process improves the precision of the language representation model.
In a possible implementation, in the process in which the computing device performs multi-task learning based on the language representation model and the first language data, the computing device performs the masked language task on the first language data based on the language representation model to obtain a first loss function; the computing device maps the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data; the computing device performs the relationship classification task on the second language data based on the language representation model to obtain a second loss function; and the computing device determines the target loss function according to the first loss function and the second loss function.
In the embodiments of the present application, the computing device maps the association relationships between knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain the second language data, and performs the relationship classification task based on the second language data. Therefore, in the process of training the language representation model, the computing device integrates the association relationships between knowledge entities in the knowledge graph, so that the language representation model represents the language data of the specific field corresponding to the knowledge graph more accurately, which improves the precision of the language representation model.
In a possible implementation, before mapping the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, the computing device performs named entity recognition (NER) on the first language data. Specifically, the computing device extracts knowledge entities from the first language data to obtain the knowledge entities in the first language data. The computing device may perform named entity recognition on the first language data based on a built-in NER module, or based on an external NER module, which is not specifically limited.
In the embodiments of the present application, before mapping the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, the computing device first extracts the knowledge entities in the first language data, which improves the feasibility of the solution.
In a possible implementation, in the process in which the computing device maps the knowledge entity association relationships in the knowledge graph to the knowledge entities in the first language data, when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data to obtain the second language data.
In the embodiments of the present application, before mapping an association relationship in the knowledge graph to a knowledge entity in the first language data, the computing device determines, according to the semantics of the knowledge entity in the knowledge graph and the semantics of the knowledge entity in the first language data, whether they are the same knowledge entity, which improves the accuracy of the mapping process.
In a possible implementation, in the process in which the computing device performs the masked language task on the first language data, the computing device performs the masked language task on the first language data according to a masking algorithm corresponding to the first language data. The first language data includes one or more of the following: natural language data, domain language data and domain task language data. The masking algorithm includes whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
The first language data in the embodiments of the present application includes multiple types of language data, and the computing device can perform the masked language task on the multiple types of first language data based on multiple masking algorithms, which enriches the solution.
In a possible implementation, in the process in which the computing device performs the masked language task on the first language data according to the masking algorithm corresponding to the first language data: when the first language data is natural language data, the masked language task is performed on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, the masked language task is performed on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, the masked language task is performed on the first language data according to an entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
In the embodiments of the present application, the computing device selects different masking algorithms for different types of first language data, which improves the accuracy of the masking task and further improves the precision of the language representation model.
In a possible implementation, when the first language data contains multiple types of language data at the same time, the computing device may perform the masked language task on the multiple types of first language data in sequence.
In the embodiments of the present application, the computing device can perform multi-level masked language tasks on multiple types of first language data, which further improves the training precision of the language representation model.
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The computing device performs downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
In the embodiments of the present application, after updating the language representation model based on the target loss function, the computing device can perform a variety of downstream tasks based on the updated language representation model, which improves the effect of the computing device in performing the downstream tasks.
A second aspect of the embodiments of the present application provides a training device for a language representation model, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire first language data. The processing unit is configured to perform multi-task learning based on the language representation model and the first language data to obtain a target loss function, where the multi-task learning includes a masked language task and a relationship classification task, the masked language task is performed according to data generated by the language representation model, and the relationship classification task is performed based on data generated by the language representation model. The processing unit is further configured to update the language representation model according to the target loss function.
In a possible implementation, the processing unit is specifically configured to perform the masked language task on the first language data based on the language representation model to obtain a first loss function, map the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data, perform the relationship classification task on the second language data based on the language representation model to obtain a second loss function, and determine the target loss function according to the first loss function and the second loss function.
In a possible implementation, the processing unit is further configured to perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
In a possible implementation, the processing unit is specifically configured to map the association relationship of a knowledge entity in the knowledge graph to a knowledge entity in the first language data when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold.
In a possible implementation, the processing unit is specifically configured to perform the masked language task on the first language data according to a masking algorithm corresponding to the first language data, where the first language data includes one or more of the following: natural language data, domain language data and domain task language data, and the masking algorithm includes whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In a possible implementation, the processing unit is specifically configured to: when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The processing unit is further configured to perform downstream tasks based on the updated language representation model, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
A third aspect of the embodiments of the present application provides a computing device, including a processor coupled to a memory. The memory is configured to store instructions, and when the instructions are executed by the processor, the computing device is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having instructions stored thereon. When the instructions are executed, a computer is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer program product. The computer program product includes instructions, and when the instructions are executed, a computer is caused to implement the method described in the first aspect or any possible implementation of the first aspect.
It can be understood that, for the beneficial effects achievable by any of the training device, computing device, computer-readable storage medium or computer program product provided above, reference may be made to the beneficial effects of the corresponding method above, which are not repeated here.
Description of the drawings
Figure 1a is a schematic diagram of the architecture of a training system for a language representation model according to an embodiment of the present application;
Figure 1b is a schematic diagram of a training process framework of a language representation model according to an embodiment of the present application;
Figure 2 is a schematic flowchart of a training method for a language representation model according to an embodiment of the present application;
Figure 3 is a schematic flowchart of another training method for a language representation model according to an embodiment of the present application;
Figure 4a is a schematic diagram of performing a masked language task according to an embodiment of the present application;
Figure 4b is another schematic diagram of performing a masked language task according to an embodiment of the present application;
Figure 5 is a schematic diagram of performing a relationship classification task according to an embodiment of the present application;
Figure 6 is a schematic structural diagram of a training device according to an embodiment of the present application;
Figure 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application provide a training method and a training device for a language representation model, which are used to improve the precision of the language representation model.
The terms "first", "second", "third", "fourth" and so on (if any) in the specification and claims of this application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or are inherent to the process, method, product or device.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
In the following, some terms used in this application are explained to facilitate understanding by those skilled in the art.
A mask refers to a technique of covering some words in the language data during the pre-training of a language representation model.
A language model (LM) is also called a language representation model. A language model is a simple, unified and abstract formal system; after language data is represented by the language model, it becomes language data suitable for automatic processing by a computing device.
An entity is also called a knowledge entity. An entity is an abstraction of an objective individual; a person, a movie or a sentence can all be regarded as an entity.
Named entity recognition (NER) refers to identifying entities with specific meanings in text, mainly including person names, place names, organization names, proper nouns and the like. Named entity recognition is an important basic tool for applications such as information extraction, question answering systems, syntactic analysis, machine translation and knowledge graphs.
A medical knowledge graph (MKG) is a professional graph formed by fusing knowledge graph theory with doctors' clinical medical knowledge and connecting medical knowledge points (information, data) in accordance with the internal logical mechanisms of medical knowledge.
Electronic medical records (EMR) are electronic patient records based on a computer system, which provides users with the ability to access complete and accurate data, alerts, prompts and clinical decision support systems.
A triplet includes a head entity, a relation and a tail entity, where the head entity is the subject, the relation is the predicate, and the tail entity is the object.
A natural language refers to a language that evolves naturally with culture.
Cross entropy is an important concept in Shannon's information theory, and is mainly used to measure the difference between two probability distributions.
The training method and training device for a language representation model provided in the embodiments of the present application are described below with reference to the accompanying drawings.
Please refer to Figure 1a. Figure 1a is a schematic diagram of the system architecture of an application system in which a language representation model provided by an embodiment of the present application is located. As shown in Figure 1a, the application system 100 includes a multi-task learning module 101, a data module 102 and a downstream task module 103. The multi-task learning module 101 is configured to train the language representation model by performing multiple tasks, the data module 102 is configured to provide training data for the language representation model, and the downstream task module 103 is configured to perform various downstream tasks according to the language representation model.
The multi-task learning module 101 includes a masked language task sub-module 1011 and a relationship classification task sub-module 1012. The masked language task sub-module 1011 is configured to perform the masked language task based on the language representation model. The masked language task refers to covering some words in the language data during training and making semantic predictions for the covered words, and the loss function of the semantic prediction results is then used to update the language representation model. The relationship classification task sub-module 1012 is configured to perform the relationship classification task based on the language representation model. The relationship classification task means that during training, the computing device predicts, in combination with a knowledge graph, the relationships of the knowledge entities in the language data, and the loss function of the relationship prediction results is then used to update the language representation model. The multi-task learning module 101 can calculate a target loss function according to the loss function of the masked language task and the loss function of the relationship classification task, and update the language representation model according to the target loss function.
The data module 102 is configured to provide training data for the language representation model. The training data of the language representation model includes natural language data, domain language data and domain task language data, where the domain language data refers to general data within a specific field, and the domain task language data refers to language data generated by performing specific tasks in a field. The training data provided by the data module 102 can be used for named entity recognition (NER) to obtain knowledge entities, and can also be used for knowledge extraction to obtain a knowledge graph. The NER module that performs named entity recognition in the embodiments of this application may be a built-in module of the system or an external module. The knowledge graph in the embodiments of this application may be a knowledge graph extracted from the training data, or an external knowledge graph input by a user, which is not specifically limited.
The downstream task module 103 is configured to perform downstream tasks according to the trained language representation model, and the trained language representation model supports a variety of downstream tasks in the field. For example, in the medical field, the downstream tasks include electronic medical record structuring, auxiliary diagnosis, intelligent consultation and the like.
Please refer to Figure 1b. Figure 1b is a schematic diagram of a training framework of a language representation model provided by an embodiment of the present application. As shown in Figure 1b, the computing device outputs the language representation model after performing continuous multi-level learning based on multiple types of first language data, where the first language data includes natural language data, domain language data and domain task language data.
For the domain language data and the domain task language data, the computing device can perform multi-task learning based on the domain language data, the domain task language data and the language representation model, and update the language representation model according to the target loss function obtained from the multi-task learning.
As can be seen from the training framework shown in Figure 1b, the multi-task learning includes the masked language task and the relationship classification task. When performing the masked language task based on the language representation model, the computing device selects different masking algorithms for first language data with different characteristics. The masking algorithms include a whole-word masking algorithm, an entity masking algorithm, and an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
In the training framework shown in Figure 1b, before the computing device performs the masked language task according to the entity masking algorithm or the entity masking algorithm based on term frequency and inverse document frequency, the computing device needs to perform named entity recognition on the first language data to obtain knowledge entities, and further calculates the term frequency and inverse document frequency of the obtained knowledge entities to obtain knowledge entities with high term frequency and inverse document frequency. When the computing device uses the entity masking algorithm to perform the masked language task, it performs the masking operation on the domain language data based on the above knowledge entities. When the computing device uses the entity masking algorithm based on term frequency and inverse document frequency to perform the masked language task, it performs the masking operation on the domain task language data based on the above knowledge entities with high term frequency and inverse document frequency.
In the training framework shown in Figure 1b, before performing the relationship classification task based on the language representation model, the computing device maps the entity associations in the knowledge graph to the knowledge entities in the first language data, and the computing device performs the relationship classification task based on the mapped first language data and the language representation model.
Please refer to Figure 2. Figure 2 shows a training method for a language representation model provided by an embodiment of the present application. The training method of the language representation model includes but is not limited to the following steps.
201. The computing device acquires first language data.
The computing device acquires first language data, and the first language data is used to train the language representation model. The first language data includes multiple types of data input by a user, including natural language data, domain language data and domain task language data.
The natural language data refers to language data that evolves naturally with culture, the domain language data refers to data within a specific field, and the domain task language data refers to data generated by performing specific tasks within a specific field. For the medical field, the domain language data is, for example, language data in medical materials, and the domain task language data is, for example, language data generated in a hospital electronic medical record system or language data generated by hospital imaging examinations.
In the embodiments of the present application, the first language data is also used for named entity recognition to obtain knowledge entities. After acquiring the first language data, the computing device performs named entity recognition on the first language data based on an NER module to obtain the knowledge entities of the first language data; the named entity recognition module may be a built-in NER module or an external NER module, which is not specifically limited. When performing the masked language task, the computing device performs the masking operation on the first language data according to these knowledge entities of the first language data.
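For ease of understanding only, the entity-extraction step described above may be illustrated by the following Python sketch. The callable "ner" stands for whichever NER module (built-in or external) is used, and the function name is an assumed name for illustration rather than a limitation of the embodiments.

    from typing import Callable, List

    def extract_knowledge_entities(sentences: List[str],
                                   ner: Callable[[str], List[str]]) -> List[str]:
        """Run named entity recognition over the first language data and return
        the knowledge entities, de-duplicated in first-seen order."""
        entities: List[str] = []
        for sentence in sentences:
            entities.extend(ner(sentence))       # the NER backend is pluggable
        return list(dict.fromkeys(entities))     # keep one copy of each entity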
In a possible implementation, after acquiring the first language data, the computing device also needs to acquire a knowledge graph. The knowledge graph may be a knowledge graph generated based on the first language data, or an external knowledge graph input by a user. When the knowledge graph is generated based on the first language data, after acquiring the first language data, the computing device generates the knowledge graph according to the association relationships between the knowledge entities in the first language data, and such a knowledge graph can be used to perform the relationship classification task.
Please refer to Figure 3. Figure 3 is a schematic diagram of a training method for a language representation model provided by an embodiment of the present application. In the example shown in Figure 3, before training the language representation model, the computing device acquires the first language data, and the first language data is a mandatory input for the computing device to train the language representation model.
In the example shown in Figure 3, the computing device also acquires a knowledge graph, and acquires knowledge entities based on an NER module. The acquired knowledge graph may be an external knowledge graph input by a user, and the external knowledge graph can replace the built-in knowledge graph of the computing device. The NER module that performs named entity recognition may also be an external NER module input by the user, and the external NER module can replace the internal NER module of the computing device. The external knowledge graph and the external NER module can serve as optional inputs for the computing device to train the language representation model.
202. The computing device performs multi-task learning according to the first language data and the language representation model to obtain a target loss function, where the multi-task learning includes a masked language task and a relationship classification task.
The computing device performs multi-task learning according to the first language data and the language representation model to obtain the target loss function, and the multi-task learning includes the masked language task and the relationship classification task. Specifically, the computing device performs the masked language task on the first language data based on the language representation model to obtain a first loss function. For the domain language data and the domain task language data, the computing device performs the masked language task based on the knowledge entities of the first language data and the language representation model, where the knowledge entities of the first language data are the knowledge entities obtained by the computing device performing named entity recognition on the first language data.
Then, the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities of the first language data to obtain second language data, and the computing device performs the relationship classification task on the second language data based on the language representation model to obtain a second loss function. Finally, the computing device determines the target loss function according to the first loss function and the second loss function.
The following describes in detail the processes in which the computing device performs the masked language task and the relationship classification task during the multi-task learning.
1. The computing device performs the masked language task.
First, the process in which the computing device performs the masked language task is introduced. In the process of performing the masked language task on the first language data based on the language representation model, the computing device performs the masked language task on the first language data through multi-level masking algorithms. The masking algorithms in the embodiments of this application include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In the process in which the computing device performs the masked language task on the first language data through the multi-level masking algorithms: when the first language data is natural language data, the computing device performs the masked language task on the first language data according to the whole-word masking algorithm; when the first language data is domain language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm; and when the first language data is domain task language data, the computing device performs the masked language task on the first language data according to the entity masking algorithm based on term frequency-inverse document frequency (TF-IDF).
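For ease of understanding only, this dispatch by data type may be illustrated by the following Python sketch. It assumes that the text has already been segmented into words and that entities appear as single tokens; the function name and the 15% masking ratio are illustrative assumptions rather than limitations of the embodiments.

    import random
    from typing import List, Optional, Set

    def select_mask_positions(tokens: List[str], data_type: str,
                              entities: Optional[Set[str]] = None,
                              top_tfidf_entities: Optional[Set[str]] = None,
                              mask_ratio: float = 0.15) -> List[int]:
        """Choose token positions to cover according to the type of first language data:
        whole-word masking for natural language data, entity masking for domain
        language data, TF-IDF-based entity masking for domain task language data."""
        if data_type == "natural":
            candidates = list(range(len(tokens)))
        elif data_type == "domain":
            candidates = [i for i, t in enumerate(tokens) if t in (entities or set())]
        elif data_type == "domain_task":
            candidates = [i for i, t in enumerate(tokens) if t in (top_tfidf_entities or set())]
        else:
            raise ValueError(f"unknown data type: {data_type}")
        if not candidates:
            return []
        k = max(1, int(len(candidates) * mask_ratio))
        return sorted(random.sample(candidates, k))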
Please refer to Figure 4a. Figure 4a is a schematic diagram of multi-level masking algorithms provided by an embodiment of the present application. As shown in Figure 4a, diagram (a) is a schematic diagram of the whole word mask (WWM) algorithm. In the example of diagram (a), the first language data is natural language data. In the whole-word masking algorithm, the covered words are common words, for example words obtained from general dictionaries or general-knowledge books; specifically, these covered words may be words obtained through a Chinese word segmentation tool. For example, in diagram (a), in "学前××是××教育××的××组成部分" ("preschool ×× is an ×× component of ×× education"), the words covered based on the whole-word masking algorithm are "教育" (education), "国民" (national), "体系" (system) and "重要" (important), which are common words in natural language data.
As shown in Figure 4a, diagram (b) is a schematic diagram of the entity mask (EM) algorithm. In the example of diagram (b), the first language data is domain language data. In the entity masking algorithm, the covered words include entity words obtained by identifying knowledge entities through named entity recognition, or entity words obtained by the computing device based on the knowledge entities in the knowledge graph. For example, in diagram (b), in "先使用××××类消炎药进行治疗" ("first use ×××× anti-inflammatory drugs for treatment"), the word covered based on the entity masking algorithm is "阿奇霉素" (azithromycin), which is an entity word in the domain language data.
As shown in Figure 4a, diagram (c) is a schematic diagram of the entity masking algorithm based on term frequency-inverse document frequency (TF-IDF). In the example shown in diagram (c), the first language data is domain task language data. In the TF-IDF-based entity masking algorithm, the covered words include entity words selected from the above entity words according to the product of term frequency and inverse document frequency. The term frequency (TF) indicates the frequency with which a term appears in the documents; the inverse document frequency (IDF) is the logarithm of the ratio of the total number of documents to the number of documents containing the term, and indicates the distinguishing ability of the term: the fewer the documents containing the term, the larger the IDF. The product of term frequency and inverse document frequency indicates the importance of the term in the documents. For example, in diagram (c), in "××××，3天，予以×××治疗" ("××××, for 3 days, treated with ×××"), the words covered based on the entity masking algorithm are "反复咳嗽" (repeated cough), "咯痰" (expectoration) and "酮体芬" (ketofen), which are entity words in the domain task language data.
In the example shown in Figure 4a, before performing the masked language task with the TF-IDF-based entity masking algorithm, the computing device calculates TF-IDF for the acquired knowledge entities and sorts the knowledge entities according to the product of term frequency and inverse document frequency to obtain knowledge entities ranked by TF-IDF. The top-ranked knowledge entities are the knowledge entities of high importance, and the computing device performs the masked language task on the domain task language data based on these knowledge entities.
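For ease of understanding only, the TF-IDF ranking step may be illustrated by the following Python sketch, which follows the definitions given above (IDF as the logarithm of the ratio of the total number of documents to the number of documents containing the entity). It computes a single corpus-level score per entity, which is one possible reading; the exact counting granularity and the cut-off top_k are assumptions, not limitations of the embodiments.

    import math
    from collections import Counter
    from typing import Dict, Iterable, List

    def rank_entities_by_tfidf(documents: List[List[str]],
                               entities: Iterable[str],
                               top_k: int = 50) -> List[str]:
        """Rank knowledge entities by TF * IDF and return the top_k most important
        ones, which are then used for masking the domain task language data."""
        entity_set = set(entities)
        n_docs = len(documents)
        term_freq: Counter = Counter()
        doc_freq: Counter = Counter()
        total_tokens = 0
        for doc in documents:
            total_tokens += len(doc)
            for tok in doc:
                if tok in entity_set:
                    term_freq[tok] += 1          # how often the entity occurs
            for e in entity_set & set(doc):
                doc_freq[e] += 1                 # how many documents contain it
        scores: Dict[str, float] = {}
        for e, tf_count in term_freq.items():
            tf = tf_count / total_tokens
            idf = math.log(n_docs / doc_freq[e])  # fewer containing documents -> larger IDF
            scores[e] = tf * idf
        return sorted(scores, key=scores.get, reverse=True)[:top_k]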
In the process in which the computing device performs the masked language task on the first language data through the multi-level masking algorithms, the computing device predicts, based on the language representation model, the words covered by the different masking algorithms, and calculates the cross entropy between the predicted words and the labels of the actually covered words to obtain the first loss function, and the first loss function can be used to update the language representation model.
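For ease of understanding only, this first loss function may be written down concretely as in the following PyTorch-style sketch; PyTorch is used purely for illustration, and the tensor layout and the vocabulary projection layer "vocab_proj" are assumptions about how the prediction head could be wired, not details specified by the embodiments.

    import torch
    import torch.nn.functional as F

    def masked_lm_loss(hidden_states: torch.Tensor,      # [batch, seq_len, hidden]
                       masked_positions: torch.Tensor,   # [batch, num_masked]
                       masked_label_ids: torch.Tensor,   # [batch, num_masked]
                       vocab_proj: torch.nn.Linear) -> torch.Tensor:
        """Cross entropy between the words predicted at the covered positions and
        the labels of the actually covered words (the first loss function)."""
        hidden = hidden_states.size(-1)
        # Gather the representation vectors at the masked positions.
        gathered = torch.gather(hidden_states, 1,
                                masked_positions.unsqueeze(-1).expand(-1, -1, hidden))
        logits = vocab_proj(gathered)                     # [batch, num_masked, vocab]
        return F.cross_entropy(logits.flatten(0, 1), masked_label_ids.flatten())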
It should be noted that when multiple types of data exist in the first language data at the same time, the computing device in the embodiments of this application can perform the masked language task on the multiple types of first language data at multiple levels. For example, after the computing device performs the masked language task on the natural language data to obtain a first-level language representation model, it then performs the masked language task on the domain language data based on the first-level language representation model to obtain a second-level language representation model, and then performs the masked language task on the domain task language data based on the second-level language representation model to obtain a third-level language representation model. The third-level language representation model is the model obtained after performing the multi-level masked language tasks, and the training process of the multi-level language representation model is also called fine-tuning the language representation model.
Please refer to Figure 4b. Figure 4b is a schematic diagram of performing multi-level masked language tasks provided by an embodiment of the present application. In the example shown in Figure 4b, the computing device performs the masked language task on the natural language data, the domain language data and the domain task language data in sequence. After the masked language task is performed on each type of language data, a fine-tuned language representation model is obtained, and the fine-tuned language representation model is then used to perform the masked language task on the next type of language data. Finally, the language representation model obtained after performing the multi-level masked language tasks is output.
In the example shown in Figure 4b, the natural language data is, for example, "学前教育是国民教育体系的重要组成部分" ("Preschool education is an important part of the national education system"), the domain language data is, for example, "先使用阿奇霉素类消炎药进行治疗" ("First use azithromycin anti-inflammatory drugs for treatment"), and the domain task language data is, for example, "反复咳嗽，咯痰3天，予以酮体芬治疗" ("Repeated cough and expectoration for 3 days, treated with ketofen").
In the embodiments of the present application, the computing device can combine multiple types of first language data and perform multi-level masked language tasks on the multiple types of data based on the language representation model, which improves the training precision of the language representation model.
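For ease of understanding only, the level-by-level sequence of Figure 4b amounts to a simple continual-training loop, sketched below in Python; "train_masked_lm_stage" stands for one round of the masked language task on one corpus and is an assumed helper, not an interface defined by the embodiments.

    def multi_level_masked_training(model, corpora, train_masked_lm_stage):
        """Fine-tune the language representation model level by level: the model
        produced on one type of first language data is the starting point for the
        next, more specialised corpus. `corpora` maps data type -> training corpus."""
        for data_type in ("natural", "domain", "domain_task"):
            model = train_masked_lm_stage(model, corpora[data_type], data_type)
        return model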
2. The computing device performs the relationship classification task.
The following describes the process in which the computing device performs the relationship classification task. In the process of performing the relationship classification task based on the language representation model, the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities in the first language data to obtain second language data, and the second language data includes head entities, tail entities and association relationships.
Specifically, in the process in which the computing device maps the association relationships between the knowledge entities in the knowledge graph to the knowledge entities in the first language data, the computing device performs the mapping according to the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data: when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold, the computing device maps the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data to obtain the second language data.
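For ease of understanding only, one possible reading of this mapping step is sketched below in Python. The "similarity" function (for example, cosine similarity of entity embeddings) and the 0.9 threshold are assumptions; the embodiments only require some semantic similarity measure and a preset threshold.

    from typing import Callable, Iterable, List, Tuple

    def map_kg_relations(text_entities: List[str],
                         kg_triples: Iterable[Tuple[str, str, str]],
                         similarity: Callable[[str, str], float],
                         threshold: float = 0.9) -> List[Tuple[str, str, str]]:
        """Map association relationships from the knowledge graph onto the knowledge
        entities recognised in the first language data, producing (head, relation,
        tail) samples of the second language data."""
        mapped = []
        for kg_head, relation, kg_tail in kg_triples:
            heads = [e for e in text_entities if similarity(e, kg_head) > threshold]
            tails = [e for e in text_entities if similarity(e, kg_tail) > threshold]
            for h in heads:
                for t in tails:
                    if h != t:
                        mapped.append((h, relation, t))
        return mapped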
Then, the computing device performs the relationship classification task on the second language data based on the language representation model to obtain the second loss function. Specifically, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity and the flag bit, passes the three representation vectors through a fully connected layer to output the predicted relationship between the head entity and the tail entity and the corresponding probability, and calculates the second loss function of the relationship classification task according to the predicted relationship and probability and the labels of the relationship and probability. The second loss function can be used to update the language representation model.
Please refer to Figure 5. Figure 5 is a schematic diagram of a computing device performing the relationship classification task provided by an embodiment of the present application. In the example shown in Figure 5, the computing device maps the association relationships between knowledge entities in the knowledge graph to the knowledge entities in the first language data. For example, the computing device obtains from a medical knowledge graph that the association relationship between the two knowledge entities "头晕" (dizziness) and "心悸" (palpitations) is "伴随" (accompanied by), and maps this association relationship to the knowledge entities in the second language data "反复头晕、多汗，加重伴心悸3天" ("repeated dizziness and sweating, aggravated with palpitations for 3 days"): the head entity "头晕" (dizziness) is marked by <e1> and </e1>, the tail entity "心悸" (palpitations) is marked by <e2> and </e2>, and the second language data is marked by the flag bit [CLS], which is used to perform the relationship classification task.
In the example shown in Figure 5, the computing device obtains the representation vectors of the second language data based on the Transformer module of the language representation model, extracts the representation vectors of the head entity, the tail entity and the flag bit, concatenates the three representation vectors and passes them through a fully connected layer to obtain the predicted relationship between the head entity and the tail entity, and calculates the cross entropy according to the predicted relationship and the label of the actual relationship to obtain the second loss function of the relationship classification task.
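For ease of understanding only, the classification head described for Figure 5 may be sketched as follows; PyTorch is again used purely for illustration, and the class name and interface are assumptions, while the choice to concatenate exactly the [CLS], head-entity and tail-entity vectors before a single fully connected layer follows the description above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationClassificationHead(nn.Module):
        """Concatenates the representation vectors of the [CLS] flag, the head
        entity and the tail entity, and predicts the relation through a fully
        connected layer."""
        def __init__(self, hidden_size: int, num_relations: int):
            super().__init__()
            self.fc = nn.Linear(3 * hidden_size, num_relations)

        def forward(self, cls_vec, head_vec, tail_vec, relation_labels=None):
            logits = self.fc(torch.cat([cls_vec, head_vec, tail_vec], dim=-1))
            if relation_labels is None:
                return logits
            # The second loss function: cross entropy between the predicted
            # relation distribution and the labelled relation.
            return logits, F.cross_entropy(logits, relation_labels)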
After obtaining the second loss function of the relationship classification task based on the language representation model, the computing device calculates the target loss function according to the first loss function obtained by performing the masked language task and the second loss function. Specifically, the computing device adds the first loss function and the second loss function to obtain the target loss function.
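In symbols (the notation is introduced here only for readability), the target loss used to update the model is simply the sum of the two task losses:

    L_target = L_mask + L_relation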
203. The computing device updates the language representation model according to the target loss function.
The computing device updates the language representation model according to the target loss function. Specifically, when the target loss function has not reached an expected value, the computing device continues to train the language representation model through multi-task learning; when the target loss function reaches the expected value, the computing device outputs the updated language representation model, and the updated language representation model is used to perform downstream tasks.
After updating the language representation model according to the target loss function, the computing device uses the updated language representation model to perform various downstream tasks, and the downstream tasks include one or more of the following: an electronic medical record structuring task, auxiliary diagnosis, or intelligent consultation.
Please continue to refer to Figure 3. In the example shown in Figure 3, after the computing device performs multi-task learning to obtain the updated language representation model, the language representation model can be used as a basic tool to perform various downstream tasks, for example, auxiliary diagnosis, intelligent consultation and electronic medical record structuring.
It can be seen from the above embodiments that, in the embodiments of the present application, the computing device can perform multi-task learning based on the language representation model during the training of the language representation model, and integrates the association relationships between knowledge entities in the knowledge graph into the training process, so that the language representation model represents the language data of the specific field corresponding to the knowledge graph more accurately, which improves the precision of the language representation model.
The above describes the training method for a language representation model provided by the embodiments of the present application; the following describes the related devices involved in the embodiments of the present application.
请参阅图6,图6为本申请实施例提供的一种训练装置的结构示意图。该训练装置用于实现上述方法实施例中计算设备执行的步骤,如图6所示,该训练装置600包括获取单元601和处理单元602。Please refer to FIG. 6 , which is a schematic structural diagram of a training device provided by an embodiment of the present application. The training device is used to implement the steps performed by the computing device in the above method embodiment. As shown in FIG. 6 , the training device 600 includes an acquisition unit 601 and a processing unit 602.
其中,获取单元601用于获取第一语言数据。处理单元602用于基于语言表征模型和第一语言数据进行多任务学习,得到目标损失函数,多任务学习包括掩码语言任务和关系分类任务,掩码语言任务用于根据语言表征模型生成的数据执行掩码语言任务,关系分类任务用于基于语言表征模型生成的数据执行关系分类任务。处理单元602还用于根据目标损失函数更新语言表征模型。Among them, the obtaining unit 601 is used to obtain the first language data. The processing unit 602 is used to perform multi-task learning based on the language representation model and the first language data to obtain the target loss function. The multi-task learning includes a mask language task and a relationship classification task. The mask language task is used for data generated according to the language representation model. The masked language task is performed and the relation classification task is used to perform the relation classification task based on the data generated by the language representation model. The processing unit 602 is also used to update the language representation model according to the target loss function.
In a possible implementation, the processing unit 602 is specifically configured to: perform the masked language task on the first language data based on the language representation model to obtain a first loss function; map the association relationships of knowledge entities in a knowledge graph to the knowledge entities in the first language data to obtain second language data; perform the relation classification task on the second language data based on the language representation model to obtain a second loss function; and determine the target loss function according to the first loss function and the second loss function.
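A minimal sketch of how the two task losses could be combined into the target loss, assuming a cross-entropy masked-language-model head and a cross-entropy relation classification head; the equal default weighting is an assumption for illustration, since the embodiment does not fix how the first loss function and the second loss function are combined:

import torch.nn.functional as F

def target_loss(mlm_logits, mlm_labels, rel_logits, rel_labels, mlm_weight=1.0, rel_weight=1.0):
    # First loss function: masked language task on the first language data
    # (positions that were not masked carry the ignore label -100).
    loss_mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1), ignore_index=-100)
    # Second loss function: relation classification on the second language data.
    loss_rel = F.cross_entropy(rel_logits, rel_labels)
    # Target loss: a weighted combination of the two task losses.
    return mlm_weight * loss_mlm + rel_weight * loss_rel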
In a possible implementation, the processing unit 602 is further configured to perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
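The embodiment does not prescribe a particular NER implementation; as one illustrative possibility, the recognized knowledge entities could be derived from BIO-style tags produced by any sequence-labelling model:

def bio_to_entities(tokens, tags):
    # Convert BIO tags (e.g. from a sequence-labelling NER model) into
    # (start, end, label) spans over the token list; end is exclusive.
    entities, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                entities.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue
        else:
            if start is not None:
                entities.append((start, i, label))
            start, label = None, None
    if start is not None:
        entities.append((start, len(tokens), label))
    return entities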
In a possible implementation, the processing unit 602 is specifically configured to map the association relationship of a knowledge entity in the knowledge graph to a knowledge entity in the first language data when the semantic similarity between the knowledge entity in the knowledge graph and the knowledge entity in the first language data exceeds a preset threshold.
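A sketch, under assumptions, of this threshold-based mapping: entity mentions recognized in the first language data are compared with knowledge-graph entities by cosine similarity of their embeddings, and a graph relation is attached only when both the head and the tail similarity exceed the preset threshold. The embedding function, the choice of cosine similarity, and the threshold value of 0.8 are illustrative assumptions, not values fixed by the embodiment:

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def map_relations(text_entities, kg_triples, embed, threshold=0.8):
    # Attach a knowledge-graph relation (head, relation, tail) to a pair of
    # entity mentions found in the first language data only when both mentions
    # are semantically similar enough to the corresponding graph entities.
    mapped = []
    for head, relation, tail in kg_triples:
        for m_head in text_entities:
            if cosine(embed(head), embed(m_head)) <= threshold:
                continue
            for m_tail in text_entities:
                if m_tail == m_head:
                    continue
                if cosine(embed(tail), embed(m_tail)) > threshold:
                    mapped.append((m_head, relation, m_tail))
    return mapped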
In a possible implementation, the processing unit 602 is specifically configured to perform the masked language task on the first language data according to the masking algorithm corresponding to the first language data. The first language data includes one or more of the following: natural language data, domain language data, and domain task language data. The masking algorithms include whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
In a possible implementation, the processing unit 602 is specifically configured to: when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm; when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
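This data-dependent masking choice can be sketched as follows; the tokenization, the entity spans, the TF-IDF scoring function, and the 15% masking ratio are assumptions used only for illustration and are not values taken from the embodiment:

import random

MASK = "[MASK]"

def whole_word_mask(words, ratio=0.15):
    # Mask randomly chosen whole words for general natural language data.
    if not words:
        return []
    chosen = set(random.sample(range(len(words)), max(1, int(len(words) * ratio))))
    return [MASK if i in chosen else w for i, w in enumerate(words)]

def entity_mask(words, entity_spans, ratio=0.15):
    # Mask every token of randomly chosen entities for domain language data.
    masked = list(words)
    if entity_spans:
        k = max(1, int(len(entity_spans) * ratio))
        for start, end in random.sample(entity_spans, min(k, len(entity_spans))):
            for i in range(start, end):
                masked[i] = MASK
    return masked

def tfidf_entity_mask(words, entity_spans, tfidf_score, ratio=0.15):
    # Prefer entities with high TF-IDF scores for domain task language data,
    # i.e. terms that are informative for the task corpus.
    masked = list(words)
    if entity_spans:
        ranked = sorted(entity_spans, key=lambda s: -tfidf_score("".join(words[s[0]:s[1]])))
        for start, end in ranked[:max(1, int(len(ranked) * ratio))]:
            for i in range(start, end):
                masked[i] = MASK
    return masked

def mask_first_language_data(words, data_type, entity_spans=None, tfidf_score=None):
    if data_type == "natural":
        return whole_word_mask(words)
    if data_type == "domain":
        return entity_mask(words, entity_spans or [])
    return tfidf_entity_mask(words, entity_spans or [], tfidf_score or (lambda t: 0.0))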
In a possible implementation, the domain language data includes language data of medical materials, and the domain task language data includes language data of electronic medical records or language data of imaging examinations. The processing unit is further configured to perform downstream tasks based on the updated language representation model, where the downstream tasks include one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
It should be understood that the division of units in the above device is merely a division of logical functions. In actual implementation, the units may be fully or partially integrated into one physical entity, or they may be physically separate. The units in the device may all be implemented in the form of software invoked by a processing element, may all be implemented in hardware, or some units may be implemented as software invoked by a processing element while others are implemented in hardware. For example, each unit may be a separately established processing element, or may be integrated into a chip of the device; alternatively, a unit may be stored in the memory in the form of a program and invoked and executed by a processing element of the device to perform its function. In addition, all or some of these units may be integrated together or implemented independently. The processing element described here may also be called a processor and may be an integrated circuit with signal processing capability. During implementation, the steps of the above method or the above units may be implemented by integrated logic circuits of hardware in a processor element, or in the form of software invoked by a processing element.
It should be noted that, for brevity of description, the above method embodiments are expressed as a series of action combinations. However, those skilled in the art should understand that this application is not limited by the described order of actions. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by this application.
Other reasonable combinations of steps that those skilled in the art can conceive based on the above description also fall within the protection scope of this application. Again, those skilled in the art should be aware that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by this application.
Please refer to FIG. 7, which is a schematic diagram of a computing device provided by an embodiment of this application. As shown in FIG. 7, the computing device 700 includes a processor 710, a memory 720, and an interface 730, which are coupled through a bus (not labeled in the figure). The memory 720 stores instructions; when the instructions in the memory 720 are executed, the computing device 700 performs the method performed by the computing device in the above method embodiments.
The computing device 700 may be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field-programmable gate arrays (FPGAs), or a combination of at least two of these integrated circuit forms. For another example, when the units in the device are implemented by a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke programs. For another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The processor 710 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 720 may include a read-only memory and a random access memory, and provides instructions and data to the processor 710. The memory 720 may also include a non-volatile random access memory. For example, the memory 720 may be provided with multiple partitions, each used to store the private keys of different software modules.
The memory 720 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In addition to a data bus, the bus may further include a power bus, a control bus, a status signal bus, and the like. The bus may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the method performed by the computing device in the above method embodiments.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the method performed by the computing device in the above method embodiments.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is merely a division of logical functions, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (17)

1. A training method for a language representation model, characterized in that the method comprises:
    acquiring first language data;
    performing multi-task learning based on a language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a masked language task and a relation classification task, the masked language task is used to perform a masked language task according to data generated by the language representation model, and the relation classification task is used to perform a relation classification task based on data generated by the language representation model; and
    updating the language representation model according to the target loss function.
2. The method according to claim 1, characterized in that performing multi-task learning based on the language representation model and the first language data to obtain the target loss function comprises:
    performing a masked language task on the first language data based on the language representation model to obtain a first loss function;
    mapping association relationships of knowledge entities in a knowledge graph to knowledge entities in the first language data to obtain second language data;
    performing a relation classification task on the second language data based on the language representation model to obtain a second loss function; and
    determining the target loss function according to the first loss function and the second loss function.
3. The method according to claim 2, characterized in that, before mapping the association relationships of the knowledge entities in the knowledge graph to the knowledge entities in the first language data, the method further comprises:
    performing named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
4. The method according to claim 2 or 3, characterized in that mapping the association relationships of the knowledge entities in the knowledge graph to the knowledge entities in the first language data comprises:
    when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, mapping the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data.
5. The method according to any one of claims 2 to 4, characterized in that performing the masked language task on the first language data comprises:
    performing the masked language task on the first language data according to a masking algorithm corresponding to the first language data, wherein the first language data comprises one or more of the following: natural language data, domain language data, and domain task language data, and the masking algorithms comprise whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
6. The method according to claim 5, characterized in that performing the masked language task on the first language data according to the masking algorithm corresponding to the first language data comprises:
    when the first language data is natural language data, performing the masked language task on the first language data according to a whole-word masking algorithm;
    when the first language data is domain language data, performing the masked language task on the first language data according to an entity masking algorithm; and
    when the first language data is domain task language data, performing the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
7. The method according to claim 5 or 6, characterized in that the domain language data comprises language data of medical materials, the domain task language data comprises language data of electronic medical records or language data of imaging examinations, and the method further comprises:
    performing downstream tasks based on the updated language representation model, wherein the downstream tasks comprise one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
8. A training device for a language representation model, characterized in that the device comprises:
    an acquisition unit, configured to acquire first language data; and
    a processing unit, configured to perform multi-task learning based on a language representation model and the first language data to obtain a target loss function, wherein the multi-task learning comprises a masked language task and a relation classification task, the masked language task is used to perform a masked language task according to data generated by the language representation model, and the relation classification task is used to perform a relation classification task based on data generated by the language representation model;
    wherein the processing unit is further configured to update the language representation model according to the target loss function.
9. The device according to claim 8, characterized in that the processing unit is specifically configured to:
    perform a masked language task on the first language data based on the language representation model to obtain a first loss function;
    map association relationships of knowledge entities in a knowledge graph to knowledge entities in the first language data to obtain second language data;
    perform a relation classification task on the second language data based on the language representation model to obtain a second loss function; and
    determine the target loss function according to the first loss function and the second loss function.
10. The device according to claim 9, characterized in that the processing unit is further configured to:
    perform named entity recognition (NER) on the first language data to obtain the knowledge entities in the first language data.
11. The device according to claim 9 or 10, characterized in that the processing unit is specifically configured to:
    when the semantic similarity between a knowledge entity in the knowledge graph and a knowledge entity in the first language data exceeds a preset threshold, map the association relationship of the knowledge entity in the knowledge graph to the knowledge entity in the first language data.
12. The device according to any one of claims 9 to 11, characterized in that the processing unit is specifically configured to:
    perform the masked language task on the first language data according to a masking algorithm corresponding to the first language data, wherein the first language data comprises one or more of the following: natural language data, domain language data, and domain task language data, and the masking algorithms comprise whole-word masking, entity masking, and entity masking based on term frequency and inverse document frequency.
13. The device according to claim 12, characterized in that the processing unit is specifically configured to:
    when the first language data is natural language data, perform the masked language task on the first language data according to a whole-word masking algorithm;
    when the first language data is domain language data, perform the masked language task on the first language data according to an entity masking algorithm; and
    when the first language data is domain task language data, perform the masked language task on the first language data according to an entity masking algorithm based on term frequency and inverse document frequency (TF-IDF).
14. The device according to claim 11 or 13, characterized in that the domain language data comprises language data of medical materials, the domain task language data comprises language data of electronic medical records or language data of imaging examinations, and the processing unit is further configured to:
    perform downstream tasks based on the updated language representation model, wherein the downstream tasks comprise one or more of the following: structuring electronic medical records, assisted diagnosis, or intelligent consultation.
15. A computing device, characterized in that the computing device comprises a processor coupled to a memory, the memory is configured to store instructions, and when the instructions are executed by the processor, the computing device is caused to perform the method according to any one of claims 1 to 7.
16. A computer-readable storage medium having instructions stored thereon, characterized in that, when the instructions are executed, a computer is caused to perform the method according to any one of claims 1 to 7.
17. A computer program product comprising instructions, characterized in that, when the instructions are executed, a computer is caused to implement the method according to any one of claims 1 to 7.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210318687.8 2022-03-29
CN202210318687.8A CN116933789A (en) 2022-03-29 2022-03-29 Training method and training device for language characterization model

Publications (1)

Publication Number Publication Date
WO2023185082A1

Family

Family ID: 88198919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137523 WO2023185082A1 (en) 2022-03-29 2022-12-08 Training method and training device for language representation model

Country Status (2)

Country Link
CN (1) CN116933789A (en)
WO (1) WO2023185082A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539223A (en) * 2020-05-29 2020-08-14 北京百度网讯科技有限公司 Language model training method and device, electronic equipment and readable storage medium
CN111680145A (en) * 2020-06-10 2020-09-18 北京百度网讯科技有限公司 Knowledge representation learning method, device, equipment and storage medium
CN113705187A (en) * 2021-08-13 2021-11-26 北京百度网讯科技有限公司 Generation method and device of pre-training language model, electronic equipment and storage medium
CN113704388A (en) * 2021-03-05 2021-11-26 腾讯科技(深圳)有限公司 Training method and device for multi-task pre-training model, electronic equipment and medium

Also Published As

Publication number Publication date
CN116933789A (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN106874643B (en) Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors
CN108427707B (en) Man-machine question and answer method, device, computer equipment and storage medium
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
CN111984851B (en) Medical data searching method, device, electronic device and storage medium
CN110675944A (en) Triage method and device, computer equipment and medium
WO2021151328A1 (en) Symptom data processing method and apparatus, and computer device and storage medium
KR102424085B1 (en) Machine-assisted conversation system and medical condition inquiry device and method
CN110427486B (en) Body condition text classification method, device and equipment
CN111128391B (en) Information processing apparatus, method and storage medium
US20210042344A1 (en) Generating or modifying an ontology representing relationships within input data
WO2021114836A1 (en) Text coherence determining method, apparatus, and device, and medium
CN116313120A (en) Model pre-training method, medical application task processing method and related devices thereof
Dong et al. Rare disease identification from clinical notes with ontologies and weak supervision
Cao et al. Chinese electronic medical record named entity recognition based on BERT-WWM-IDCNN-CRF
CN113157887A (en) Knowledge question-answering intention identification method and device and computer equipment
WO2023124837A1 (en) Inquiry processing method and apparatus, device, and storage medium
CN111222325A (en) Medical semantic labeling method and system of bidirectional stack type recurrent neural network
WO2023185082A1 (en) Training method and training device for language representation model
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
EP3564964A1 (en) Method for utilising natural language processing technology in decision-making support of abnormal state of object
Montenegro et al. The hope model architecture: a novel approach to pregnancy information retrieval based on conversational agents
Chen et al. Extraction of entity relations from Chinese medical literature based on multi-scale CRNN
Ren et al. Extraction of transitional relations in healthcare processes from Chinese medical text based on deep learning
CN113761899A (en) Medical text generation method, device, equipment and storage medium
Pavlopoulos et al. Clinical predictive keyboard using statistical and neural language modeling

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22934900
Country of ref document: EP
Kind code of ref document: A1