CN111859951B - Language model training method and device, electronic equipment and readable storage medium - Google Patents

Language model training method and device, electronic equipment and readable storage medium

Info

Publication number
CN111859951B
CN111859951B (application CN202010564362.9A)
Authority
CN
China
Prior art keywords: word, language model, input text, groups, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010564362.9A
Other languages
Chinese (zh)
Other versions
CN111859951A (en)
Inventor
朱丹翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010564362.9A priority Critical patent/CN111859951B/en
Publication of CN111859951A publication Critical patent/CN111859951A/en
Application granted granted Critical
Publication of CN111859951B publication Critical patent/CN111859951B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The application discloses a language model training method and apparatus, an electronic device, and a readable storage medium, relating to the technical fields of deep learning and natural language processing. The specific implementation scheme is as follows: word segmentation information of an original input text is acquired; character segmentation information and word segmentation information are annotated on each token in the original input text to obtain an input text sample; the input text sample is input into a language model to train the language model. Because a semantic information representation of larger granularity is introduced, the language model's ability to learn word sense information is enhanced and its performance is improved, the universality of the language model is not reduced, and the method is friendlier to downstream sequence labeling tasks.

Description

Language model training method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technology, in particular to the technical fields of deep learning and natural language processing, and specifically to a language model training method and apparatus, an electronic device, and a readable storage medium.
Background
In the field of Chinese natural language processing (Natural Language Processing, NLP), performing self-supervised pre-training of a language model on a large amount of unsupervised text and then fine-tuning the language model with supervised task data is the state-of-the-art language model training technique in the current NLP field.
In the prior art, to prevent the performance of the word segmenter from affecting the training effect of the language model, self-supervised pre-training of the language model is performed at character granularity. This makes it difficult for the language model to learn information of larger semantic granularity, such as words. Word semantics are very important in Chinese language expression, so character-granularity learning may impair the language model's learning of word semantics and thereby degrade its performance.
Disclosure of Invention
Aspects of the present application provide a language model training method, apparatus, electronic device, and readable storage medium, so as to enhance the language model's ability to learn word sense information and improve the performance of the language model.
According to a first aspect, there is provided a training method of a language model, including:
acquiring word segmentation information of an original input text;
annotating character segmentation information and word segmentation information on each token in the original input text to obtain an input text sample;
the input text sample is input into a language model to train the language model.
According to a second aspect, there is provided a training apparatus of a language model, comprising:
the acquisition unit is used for acquiring word segmentation information of the original input text;
the labeling unit is used for annotating character segmentation information and word segmentation information on each token in the original input text to obtain an input text sample;
and the language model is used for receiving the input text sample so as to train based on the input text sample.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of the aspects and any possible implementation described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the aspects and any possible implementation described above.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspects and any possible implementation described above.
According to the technical scheme, word segmentation information of the original input text is acquired, character segmentation information and word segmentation information are annotated on each token in the original input text to obtain an input text sample, and the input text sample is then input into a language model to train the language model, so that the language model can learn semantic information at word granularity.
In addition, in the prior art the sequence labeling tasks of language model pre-training label each character, which requires the input text to be split by characters; if the input text were split by words instead, the universality of the language model would be reduced, which is unfriendly to downstream sequence labeling tasks. With the technical scheme provided by the application, word segmentation information is additionally introduced without changing the character-level splitting of the original input text, so the semantic learning ability of the language model is improved, its universality is not reduced, and the method is friendlier to downstream sequence labeling tasks than directly splitting by words.
In addition, by adopting the technical scheme provided by the application, the trained language model has better semantic information expression capability, so that the accuracy of the processing result of the NLP task can be effectively improved when the trained language model is used for the subsequent NLP task.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort. The drawings are only for better understanding of the present solution and are not to be construed as limiting the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 6 is a schematic diagram of an electronic device for implementing a training method for language models of embodiments of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It will be apparent that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that, the terminal in the embodiments of the present application may include, but is not limited to, a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), a personal Computer (Personal Computer, PC), an MP3 player, an MP4 player, a wearable device (e.g., smart glasses, smart watches, smart bracelets, etc.), a smart home device, and other smart devices.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
In the prior art, to prevent the performance of the word segmenter from affecting the training effect of the language model, self-supervised pre-training of the language model is performed at character granularity. This makes it difficult for the language model to learn information of larger semantic granularity, such as words. Word semantics are very important in Chinese language expression, so character-granularity learning may impair the language model's learning of word semantics and thereby degrade its performance.
To address the above problems, the present application provides a language model training method and apparatus, an electronic device, and a readable storage medium, which enhance the language model's ability to learn word sense information and improve the performance of the language model.
Fig. 1 is a schematic diagram according to a first embodiment of the present application.
101. Acquire word segmentation information of the original input text.
102. Annotate character segmentation information and word segmentation information on each token in the original input text to obtain an input text sample.
The character segmentation information identifies each character (token) of the original input text. The word segmentation information is the word segmentation result of the original input text and identifies each word in the original input text.
103. Input the input text sample into a language model to train the language model.
Steps 101 to 103 may constitute an iterative process; the language model is trained by iterating steps 101 to 103 until a preset training completion condition is met, at which point training of the language model is complete.
Optionally, in a possible implementation of this embodiment, the preset training completion condition may be set according to actual requirements; for example, it may be that the number of training iterations of the language model (i.e., the number of times steps 101 to 103 are executed) reaches a first preset threshold, for example 1,000,000.
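As a minimal sketch (function names hypothetical, not the patent's actual implementation), the iterative pre-training loop with the iteration-count completion condition might look like:

```python
def pretrain(get_sample, train_step, max_steps=1_000_000):
    """Iterate steps 101-103 until the preset completion condition is met.

    Here the condition is the one given above: the number of training
    iterations reaching a first preset threshold (e.g. 1,000,000).
    `get_sample` builds an annotated input text sample (steps 101-102);
    `train_step` feeds it to the language model (step 103).
    """
    steps_done = 0
    while steps_done < max_steps:   # preset training completion condition
        sample = get_sample()       # steps 101-102: annotated sample
        train_step(sample)          # step 103: train the language model
        steps_done += 1
    return steps_done
```

In practice `get_sample` and `train_step` would wrap a segmenter and the language model's optimizer step; the loop structure is the only point illustrated here.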
The execution subject of steps 101 to 103 may be, partly or entirely, an application located in the local terminal, a functional unit such as a plug-in or a software development kit (Software Development Kit, SDK) provided in an application located in the local terminal, or a processing engine located in a network-side server; this embodiment places no particular limitation on this.
It will be appreciated that the application may be a native program (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, which is not limited in this embodiment.
In this embodiment, the language model is trained with input text samples annotated with character segmentation information and word segmentation information, so that the language model can learn semantic information at word granularity. Since word granularity carries a richer semantic information representation than character granularity, introducing word-granularity semantic information strengthens, on the basis of character-granularity semantic learning, the language model's modeling of word sense information, enhances its ability to learn word sense information, and improves the performance of the language model.
In addition, in the prior art the sequence labeling tasks of language model pre-training label each character, which requires the input text to be split by characters; if the input text were split by words instead, the universality of the language model would be reduced, which is unfriendly to downstream sequence labeling tasks. With the technical scheme provided by the application, word segmentation information is additionally introduced without changing the character-level splitting of the original input text, so the semantic learning ability of the language model is improved, its universality is not reduced, and the method is friendlier to downstream sequence labeling tasks than directly splitting by words.
In addition, by adopting the technical scheme provided by the application, the trained language model has better semantic information expression capability, so that the accuracy of the processing result of the NLP task can be effectively improved when the trained language model is used for the subsequent NLP task.
Optionally, in one possible implementation of this embodiment, in 101 the original input text may be segmented into at least one word, each of the at least one word including at least one character. The characters may include text characters, letters, digits, operation symbols, punctuation marks and other symbols, some functional symbols, and so on. Marking information is then determined according to whether each character is the first character of the word it belongs to, so that the word segmentation information of the original input text includes marking information indicating whether each character in the at least one word is a first character.
In this embodiment, according to the word segmentation result, the marking information of whether each character is the first character of its word is used as the word segmentation information of each word. The word segmentation result can thus be represented without changing the character-level splitting of the original input text, word segmentation information is conveniently introduced, the universality of the language model (e.g., an ERNIE model) is not reduced, and the method is friendlier to downstream sequence labeling tasks than directly splitting by words.
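The first-character marking scheme just described can be sketched as follows (a simplified illustration under the assumptions above, not the patent's actual code):

```python
def char_segment_labels(words):
    """Derive word segmentation information from a word segmentation result:
    each character is marked 'B' if it is the first character of its word,
    otherwise 'I'.  The character-level splitting of the text is unchanged;
    only marking information is added."""
    labels = []
    for word in words:
        for i, _ch in enumerate(word):
            labels.append('B' if i == 0 else 'I')
    return labels
```

For words of lengths 1, 2, 2, 1, 2, 1, 2, 2, 2 and 1 characters (as in the worked example of the second embodiment), this yields the 16-label sequence B BI BI B BI B BI BI BI B.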
Optionally, in one possible implementation of this embodiment, in 101 the original input text includes at least one sentence. Accordingly, in 102, character segmentation information and word segmentation information may be annotated on each token in the original input text, and a sentence identification (sentence embedding) may be annotated on each sentence in the original input text, to obtain the input text sample. The sentence identification identifies which sentence of the original input text the current sentence is.
In this embodiment, in addition to annotating character segmentation information and word segmentation information on each token in the original input text, a sentence identification is annotated on each sentence in the original input text, so that the language model can learn information of larger semantic granularity at the sentence level, further improving its semantic learning and expression ability. Moreover, annotating each sentence in the original input text with a sentence identification can also be used to train sentence-level tasks (e.g., sentence order, sentence distance, sentence logical relationship) on the original input text.
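A minimal sketch of assembling such an input text sample (character tokens, word-boundary marks, and per-sentence identifications); the function and field names are hypothetical, and `segment_words` stands in for any word segmenter:

```python
def build_input_sample(sentences, segment_words):
    """Build an input text sample as described above: every character keeps
    its own token, word segmentation information marks word boundaries
    ('B' for a word's first character, 'I' otherwise), and each character
    carries the identification of the sentence it belongs to.

    `segment_words` is a hypothetical word segmenter: it takes one
    sentence string and returns its list of words."""
    tokens, word_marks, sentence_ids = [], [], []
    for sent_id, sentence in enumerate(sentences):
        for word in segment_words(sentence):
            for i, ch in enumerate(word):
                tokens.append(ch)                          # character-level token
                word_marks.append('B' if i == 0 else 'I')  # word segmentation info
                sentence_ids.append(sent_id)               # sentence identification
    return {"tokens": tokens,
            "word_marks": word_marks,
            "sentence_ids": sentence_ids}
```

A real implementation would map these three parallel sequences to embeddings before feeding the language model; only the annotation layout is illustrated here.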
Alternatively, in one possible implementation of this embodiment, the language model in the foregoing embodiment of the present application may be any language model, for example, a knowledge-enhanced semantic representation (Enhanced Representation from kNowledge IntEgration, ERNIE) model may be used.
The ERNIE model can learn the semantic representation of complete concepts by modeling prior semantic knowledge, such as entity concepts, in massive data. Pre-training the ERNIE model with input text samples annotated with character segmentation information and word segmentation information brings its representation of semantic knowledge units closer to the real world; the ERNIE model is modeled on both character feature input and word feature input, and therefore has strong semantic representation capability. In this embodiment, using the ERNIE model as the language model exploits this strong semantic representation capability to model words, entities, and entity relationships in massive data and to learn the semantic knowledge of the real world, thereby enhancing the semantic representation capability of the model.
Fig. 2 is a schematic diagram according to a second embodiment of the present application, as shown in fig. 2.
First, word segmentation information of the original input text is acquired. This comprises two steps:
Step one: perform word segmentation on the original input text. Suppose the original input text is a Chinese sentence of 16 characters (roughly, "big brother gets up; will big brother move bricks today"); after word segmentation the following words are obtained: (big) (brother) (get up) (la) (breakfast) (big) (brother) (today) (move brick) (no).
Step two: according to the word segmentation, determine whether each character in each word is the first character of that word, and use this to derive each character's marking information.
Assume the marking information of a first character is B and that of a non-first character is I; the word segmentation information of the original input text then comprises the marking information of whether each character in the original input text is a first character.
For the example text above, with words of 1, 2, 2, 1, 2, 1, 2, 2, 2 and 1 characters respectively, the word segmentation information is:
B BI BI B BI B BI BI BI B
Next, character segmentation information (token) and word segmentation information (seg) are annotated on each token in the original input text, obtaining an input text sample.
Then, the input text sample annotated with character segmentation information and word segmentation information is input into the ERNIE model to train the language model. As shown in fig. 2, the word segmentation information is additionally introduced while the text remains split by characters.
Fig. 3 is a schematic diagram according to a third embodiment of the present application.
On the basis of the first embodiment, after training of the language model is completed, the language model can be further optimized through supervised NLP tasks, further improving its prediction performance on NLP tasks.
In this embodiment, optimization of the language model through supervised NLP tasks may be achieved through the following steps:
201. Perform an NLP task with the trained language model to obtain a processing result.
Optionally, in one possible implementation manner of this embodiment, the NLP task may be any one or more of classification, matching, sequence labeling, and the like, which is not limited in this embodiment. Accordingly, the processing result is a processing result of a specific NLP task, such as a classification result, a matching result, a sequence labeling result, and the like.
Optionally, in one possible implementation of this embodiment, in 201 the trained language model is used in combination with other network models for classification, matching, or sequence labeling, for example a convolutional neural network (CNN), a long short-term memory (LSTM) model, or a bag-of-words (BOW) model, to perform the NLP task and obtain a processing result. For example, the other network models perform classification, matching, or sequence labeling based on the output of the language model, obtaining the corresponding classification, matching, or sequence labeling results.
202. Fine-tune the parameter values in the language model according to the difference between the processing result and the labeling result information corresponding to the processing result.
The labeling result information is a correct processing result manually labeled for the NLP task to be performed in advance.
Steps 201 to 202 may constitute an iterative process; the language model is fine-tuned multiple times by iterating steps 201 to 202 until a preset condition is met, at which point fine-tuning of the language model is complete.
Optionally, in a possible implementation of this embodiment, the preset condition may be set according to actual requirements and may include, for example: the difference between the processing result and the labeling result information is smaller than a second preset threshold; and/or the number of fine-tuning iterations of the language model (i.e., the number of times steps 201 to 202 are executed) reaches a third preset threshold.
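The fine-tuning loop with the two stopping conditions just listed can be sketched as follows (names hypothetical; `run_task_step` stands in for one NLP-task pass plus parameter update):

```python
def fine_tune(run_task_step, diff_threshold, max_rounds):
    """Iterate steps 201-202 until a preset condition is met: the
    difference between the processing result and the labeling result
    information drops below a threshold, and/or the number of
    fine-tuning rounds reaches a threshold.  `run_task_step` performs
    one task pass with fine-tuning and returns the current difference."""
    rounds = 0
    diff = float("inf")
    while rounds < max_rounds and diff >= diff_threshold:
        diff = run_task_step()   # steps 201-202: task pass + fine-tuning
        rounds += 1
    return rounds, diff
```

Either condition alone can also be used, matching the "and/or" in the text: set `diff_threshold` to 0 to stop only on the round count, or `max_rounds` very high to stop only on the difference.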
In this embodiment, the parameter values in the language model can be further optimized by the NLP task with the supervision data (i.e., the labeling result information) under the condition of not changing the overall structure of the language model, so that optimization iteration is conveniently performed on the language model according to each NLP task, and the prediction performance of the language model is improved.
Optionally, in one possible implementation manner of this embodiment, in 201, the performing a natural language processing task with the trained language model may include any one or more of the following, for example:
classifying a text to be processed with the trained language model to obtain the class of the text to be processed, for example which of several articles the text is derived from or which emotion type it belongs to, thereby classifying text content; and/or
matching a text to be processed against other texts with the trained language model to obtain the other texts matching the text to be processed, thereby obtaining content or articles whose content matches the text to be processed; and/or
labeling the content of a text to be processed with the trained language model to obtain labeling results for the corresponding content, such as key content information of each part of the text to be processed, thereby realizing sequence labeling of the text content; and/or
predicting the order of sentences in a text to be processed with the trained language model, thereby realizing sentence-ordering tasks; and/or
predicting the semantic distances between sentences in a text to be processed (e.g., adjacent, from the same article, from different articles) with the trained language model, thereby realizing prediction of sentence distance; and/or
predicting the logical relations between sentences in a text to be processed (e.g., causal, progressive, parallel) with the trained language model, thereby realizing prediction of inter-sentence logical relations.
Therefore, the language model obtained by training based on the embodiment can be used for subsequent arbitrary word-level or sentence-level and article-level tasks, and has better processing performance.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Fig. 4 is a schematic diagram according to a fourth embodiment of the present application. The language model training apparatus 300 of this embodiment may include an acquisition unit 301, a labeling unit 302, and a language model 303. The acquisition unit 301 is configured to acquire word segmentation information of an original input text; the labeling unit 302 is configured to annotate character segmentation information and word segmentation information on each token in the original input text to obtain an input text sample; and the language model 303 is configured to receive the input text sample and train on it.
The execution subject of the training device of the language model of the present embodiment may be an application located in a local terminal, or may be a functional unit such as a plug-in unit or a software development kit (Software Development Kit, SDK) provided in an application located in a local terminal, or may be a processing engine located in a network server, which is not particularly limited in this embodiment.
It will be appreciated that the application may be a native program (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, which is not limited in this embodiment.
In this embodiment, the language model is trained with input text samples annotated with character segmentation information and word segmentation information, so that the language model can learn semantic information at word granularity. Since word granularity carries a richer semantic information representation than character granularity, introducing word-granularity semantic information strengthens, on the basis of character-granularity semantic learning, the language model's modeling of word sense information, enhances its ability to learn word sense information, and improves the performance of the language model.
In addition, in the prior art the sequence labeling tasks of language model pre-training label each character, which requires the input text to be split by characters; if the input text were split by words instead, the universality of the language model would be reduced, which is unfriendly to downstream sequence labeling tasks. With the technical scheme provided by the application, word segmentation information is additionally introduced without changing the character-level splitting of the original input text, so the semantic learning ability of the language model is improved, its universality is not reduced, and the method is friendlier to downstream sequence labeling tasks than directly splitting by words.
In addition, because the language model trained with the technical solution of this application has a stronger ability to express semantic information, using it for subsequent NLP tasks can effectively improve the accuracy of the tasks' processing results.
Optionally, in one possible implementation of this embodiment, the acquiring unit 301 is specifically configured to: segment the original input text into at least one word, each word of the at least one word comprising at least one character; and determine, for each character in the at least one word, whether it is the first character of its word. The word segmentation information of the original input text comprises labeling information indicating whether each character in the at least one word is a first character.
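As a minimal sketch of the first-character labeling described above (the function name and the 1/0 label convention are illustrative assumptions, not the patent's own implementation):

```python
# Illustrative sketch: derive character-level word segmentation labels
# for a pre-segmented text. The 1/0 convention (1 = first character of
# a word) is an assumption for illustration.

def segmentation_labels(words):
    """Return the characters of the text and, for each character,
    a label indicating whether it is the first character of its word."""
    chars, labels = [], []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)
    return chars, labels

chars, labels = segmentation_labels(["百度", "是", "搜索引擎"])
# chars  -> ['百', '度', '是', '搜', '索', '引', '擎']
# labels -> [1, 0, 1, 1, 0, 0, 0]
```

Because the labels are attached per character, the original character-level tokenization of the input text is unchanged; the word boundaries are carried only as side information.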
Optionally, in one possible implementation of this embodiment, the original input text includes at least one sentence, and the labeling unit 302 is specifically configured to: label word segmentation information on each character in the original input text, and label a sentence identifier on each sentence in the original input text, to obtain an input text sample.
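A possible sketch of assembling an input text sample that carries both per-character segmentation labels and per-sentence identifiers (the names and the flat-list representation are assumptions for illustration, not the patent's data format):

```python
# Illustrative sketch: build one input text sample from sentences that
# have already been word-segmented. Each character receives a
# segmentation label (1 = first character of its word) and the id of
# the sentence it belongs to.

def build_sample(sentences):
    """sentences: list of sentences, each a list of words."""
    tokens, seg_labels, sent_ids = [], [], []
    for sid, words in enumerate(sentences):
        for word in words:
            for i, ch in enumerate(word):
                tokens.append(ch)
                seg_labels.append(1 if i == 0 else 0)
                sent_ids.append(sid)
    return tokens, seg_labels, sent_ids

tokens, seg_labels, sent_ids = build_sample([["百度", "是"], ["搜索引擎"]])
# sent_ids -> [0, 0, 0, 1, 1, 1, 1]
```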
Optionally, in one possible implementation of this embodiment, the language model 303 in the foregoing embodiments of the present application may be any language model; for example, an ERNIE model may be used.
Optionally, in a possible implementation manner of this embodiment, the language model 303 is further configured to perform a natural language processing task after training is completed, so as to obtain a processing result.
Fig. 5 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 5, on the basis of the embodiment shown in fig. 4, the training apparatus 300 for a language model of this embodiment may further include: a fine tuning unit 401, configured to fine-tune the parameter values in the language model 303 according to the difference between the processing result and the labeling result information corresponding to the processing result.
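The fine-tuning step can be illustrated with a deliberately tiny stand-in for the language model: a single scalar parameter adjusted by gradient descent on the gap between the task output and its labeled target. This is only a sketch of the idea; the real model, loss function, and optimizer are not specified by the patent at this level of detail.

```python
# Toy sketch of fine-tuning: nudge a parameter to reduce the squared
# difference between the model's output and the labeled target.
# A scalar linear model stands in for the language model.

def fine_tune(param, examples, lr=0.1, epochs=50):
    """examples: list of (input, labeled target) pairs."""
    for _ in range(epochs):
        for x, y in examples:
            pred = param * x
            grad = 2 * (pred - y) * x  # gradient of (pred - y) ** 2
            param -= lr * grad
    return param

w = fine_tune(0.0, [(1.0, 2.0), (2.0, 4.0)])
# w converges to 2.0, the parameter that fits both examples
```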
Optionally, in one possible implementation of this embodiment, when performing a natural language processing task, the language model 303 is specifically configured to: classify the text to be processed; and/or match the text to be processed with other texts; and/or label content in the text to be processed; and/or predict the order of sentences in the text to be processed; and/or predict semantic distances between sentences in the text to be processed; and/or predict logical relations between sentences in the text to be processed.
It should be noted that the methods in the embodiments corresponding to fig. 1 to fig. 3 may be implemented by the training apparatus of the language model provided in the embodiments of fig. 4 to fig. 5. For details, refer to the relevant content in the embodiments corresponding to fig. 1 to fig. 3, which is not repeated here.
According to embodiments of the present application, there is also provided an electronic device and a non-transitory computer-readable storage medium storing computer instructions.
FIG. 6 is a schematic diagram of an electronic device for implementing the training method of the language model of embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI (graphical user interface) on an external input/output device, such as a display device coupled to an interface. In other implementations, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 6.
The memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the training method of the language model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of the language model provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and units, such as program instructions/units (e.g., the acquisition unit 301, the labeling unit 302, and the language model 303 shown in fig. 4) corresponding to the training method of the language model in the embodiment of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., implements the training method of the language model in the above-described method embodiment, by running non-transitory software programs, instructions, and units stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device implementing the training method of the language model provided in the embodiment of the present application, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 optionally includes memory remotely located with respect to processor 501, which may be connected via a network to an electronic device implementing the training method of the language model provided by embodiments of the present application. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the training method of the language model may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 6.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic device implementing the language model training method provided by embodiments of the present application, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, an LCD (liquid crystal display), an LED (light emitting diode) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASIC (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, PLDs (programmable logic devices)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (local area network), WAN (wide area network), internet and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, the language model is trained on input text samples labeled with word segmentation information, so that the language model can learn semantic information at word granularity. Because word granularity carries a richer semantic representation than character granularity, introducing word-granularity semantic information strengthens the model's modeling of word sense information, enhances its ability to learn such information, and thereby improves the performance of the language model.
In addition, in the prior art, the sequence labeling tasks used in language model pre-training label each character of the input text, which must therefore be split character by character; splitting the input text by word instead would reduce the generality of the language model and is unfriendly to downstream sequence labeling tasks. The technical solution of this application additionally introduces word segmentation information without changing the character-level tokenization of the original input text, so it improves the semantic learning ability of the language model without reducing its generality, and is friendlier to downstream sequence labeling tasks than directly tokenizing by word.
In addition, because the language model trained with the technical solution of this application has a stronger ability to express semantic information, using it for subsequent NLP tasks can effectively improve the accuracy of the tasks' processing results.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method of training a language model, comprising:
acquiring word segmentation information of an original input text;
labeling word segmentation information on each character in the original input text to obtain an input text sample;
inputting the input text sample into a language model to train the language model; wherein,
the obtaining word segmentation information of the original input text comprises the following steps:
word segmentation is carried out on the original input text to obtain at least one word; each word of the at least one word includes at least one character;
determining, for each character in the at least one word, whether the character is a first character of the word in which it appears; wherein the word segmentation information of the original input text comprises: labeling information indicating whether each character in the at least one word is a first character;
the original input text includes at least one sentence; and the labeling word segmentation information on each character in the original input text to obtain an input text sample comprises:
labeling word segmentation information on each character in the original input text, and labeling a sentence identifier on each sentence in the original input text, to obtain the input text sample.
2. The method of claim 1, wherein the language model comprises a knowledge-enhanced semantic representation ERNIE model.
3. The method of any of claims 1-2, wherein, after the inputting the input text sample into a language model to train the language model, the method further comprises:
performing natural language processing tasks by using the trained language model to obtain a processing result;
and fine tuning the parameter value in the language model according to the difference between the processing result and the labeling result information corresponding to the processing result.
4. The method of claim 3, wherein the performing natural language processing tasks using a trained language model comprises:
classifying the text to be processed by using the trained language model; and/or,
matching the text to be processed with other texts by using the trained language model; and/or,
labeling the content in the text to be processed by using the trained language model; and/or,
predicting the order of sentences in the text to be processed by using the trained language model; and/or,
predicting semantic distances between sentences in the text to be processed by using the trained language model; and/or,
predicting logical relations between sentences in the text to be processed by using the trained language model.
5. A training apparatus for a language model, comprising:
the acquisition unit is used for acquiring word segmentation information of the original input text;
the labeling unit is configured to label word segmentation information on each character in the original input text to obtain an input text sample;
a language model for receiving the input text sample for training based on the input text sample; wherein,
the acquisition unit is particularly used for
segment the original input text to obtain at least one word, wherein each word of the at least one word includes at least one character;
determine, for each character in the at least one word, whether the character is a first character of the word in which it appears; wherein the word segmentation information of the original input text comprises: labeling information indicating whether each character in the at least one word is a first character;
the original input text includes at least one sentence; and the labeling unit is specifically configured to:
label word segmentation information on each character in the original input text, and label a sentence identifier on each sentence in the original input text, to obtain the input text sample.
6. The apparatus of claim 5, wherein the language model comprises a knowledge-enhanced semantic representation ERNIE model.
7. The apparatus of any of claims 5-6, wherein the language model is further configured to perform a natural language processing task after training is completed to obtain a processing result;
the apparatus further comprises:
and the fine tuning unit is used for fine tuning the parameter value in the language model according to the difference between the processing result and the labeling result information corresponding to the processing result.
8. The apparatus of claim 7, wherein, when performing a natural language processing task, the language model is specifically configured to:
classify the text to be processed; and/or,
match the text to be processed with other texts; and/or,
label the content in the text to be processed; and/or,
predict the order of sentences in the text to be processed; and/or,
predict semantic distances between sentences in the text to be processed; and/or,
predict logical relations between sentences in the text to be processed.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.
CN202010564362.9A 2020-06-19 2020-06-19 Language model training method and device, electronic equipment and readable storage medium Active CN111859951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010564362.9A CN111859951B (en) 2020-06-19 2020-06-19 Language model training method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111859951A CN111859951A (en) 2020-10-30
CN111859951B true CN111859951B (en) 2024-03-26

Family

ID=72987596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010564362.9A Active CN111859951B (en) 2020-06-19 2020-06-19 Language model training method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111859951B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487814B (en) * 2020-11-27 2024-04-02 北京百度网讯科技有限公司 Entity classification model training method, entity classification device and electronic equipment
CN112507101B (en) * 2020-12-18 2024-04-05 北京百度网讯科技有限公司 Method and device for establishing pre-training language model
CN112669816B (en) * 2020-12-24 2023-06-02 北京有竹居网络技术有限公司 Model training method, voice recognition method, device, medium and equipment
CN113011135A (en) * 2021-03-03 2021-06-22 科大讯飞股份有限公司 Arabic vowel recovery method, device, equipment and storage medium
CN113011176A (en) * 2021-03-10 2021-06-22 云从科技集团股份有限公司 Language model training and language reasoning method, device and computer storage medium thereof
CN113220836B (en) * 2021-05-08 2024-04-09 北京百度网讯科技有限公司 Training method and device for sequence annotation model, electronic equipment and storage medium
CN114817469B (en) * 2022-04-27 2023-08-08 马上消费金融股份有限公司 Text enhancement method, training method and training device for text enhancement model
CN116052648B (en) * 2022-08-03 2023-10-20 荣耀终端有限公司 Training method, using method and training system of voice recognition model
CN115600646B (en) * 2022-10-19 2023-10-03 北京百度网讯科技有限公司 Language model training method, device, medium and equipment
CN115688796B (en) * 2022-10-21 2023-12-05 北京百度网讯科技有限公司 Training method and device for pre-training model in natural language processing field
CN115640611B (en) * 2022-11-25 2023-05-23 荣耀终端有限公司 Method for updating natural language processing model and related equipment
CN117744661A (en) * 2024-02-21 2024-03-22 中国铁道科学研究院集团有限公司电子计算技术研究所 Text generation model training method and text generation method based on prompt word engineering

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007087397A (en) * 2005-09-21 2007-04-05 Fujitsu Ltd Morphological analysis program, correction program, morphological analyzer, correcting device, morphological analysis method, and correcting method
CN102929916A (en) * 2012-09-19 2013-02-13 无锡华御信息技术有限公司 Method for backing up document based on document name identification
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
WO2018149326A1 (en) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Natural language question answering method and apparatus, and server
CN109033080A (en) * 2018-07-12 2018-12-18 上海金仕达卫宁软件科技有限公司 Medical terms standardized method and system based on probability transfer matrix
WO2019147804A1 (en) * 2018-01-26 2019-08-01 Ge Inspection Technologies, Lp Generating natural language recommendations based on an industrial language model
CN110110327A (en) * 2019-04-26 2019-08-09 网宿科技股份有限公司 A kind of text marking method and apparatus based on confrontation study
CN110134949A (en) * 2019-04-26 2019-08-16 网宿科技股份有限公司 A kind of text marking method and apparatus based on teacher's supervision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160162467A1 (en) * 2014-12-09 2016-06-09 Idibon, Inc. Methods and systems for language-agnostic machine learning in natural language processing using feature extraction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Resume Information Entity Extraction Method Based on Deep Learning; Huang Sheng; Li Wei; Zhang Jian; Computer Engineering and Design; 2018-12-16 (12); full text *
Text Matching Method Combining Pre-trained Models and Language Knowledge Bases; Zhou Yeheng; Shi Jiahan; Xu Ruifeng; Journal of Chinese Information Processing; 2020-02-15 (02); full text *


Similar Documents

Publication Publication Date Title
CN111859951B (en) Language model training method and device, electronic equipment and readable storage medium
CN111539223B (en) Language model training method and device, electronic equipment and readable storage medium
CN111221983B (en) Time sequence knowledge graph generation method, device, equipment and medium
CN111428008B (en) Method, apparatus, device and storage medium for training a model
CN111737994B (en) Method, device, equipment and storage medium for obtaining word vector based on language model
US11556715B2 (en) Method for training language model based on various word vectors, device and medium
US20210383064A1 (en) Text recognition method, electronic device, and storage medium
US11526668B2 (en) Method and apparatus for obtaining word vectors based on language model, device and storage medium
US11663258B2 (en) Method and apparatus for processing dataset
CN111104514B (en) Training method and device for document tag model
CN111079442B (en) Vectorization representation method and device of document and computer equipment
US20210216819A1 (en) Method, electronic device, and storage medium for extracting spo triples
US20210397791A1 (en) Language model training method, apparatus, electronic device and readable storage medium
US20220019736A1 (en) Method and apparatus for training natural language processing model, device and storage medium
CN111783468B (en) Text processing method, device, equipment and medium
CN111680145A (en) Knowledge representation learning method, device, equipment and storage medium
CN110674314A (en) Sentence recognition method and device
CN111611468B (en) Page interaction method and device and electronic equipment
CN111078878B (en) Text processing method, device, equipment and computer readable storage medium
CN111339759A (en) Method and device for training field element recognition model and electronic equipment
CN111522944A (en) Method, apparatus, device and storage medium for outputting information
CN111831814A (en) Pre-training method and device of abstract generation model, electronic equipment and storage medium
CN111126063B (en) Text quality assessment method and device
CN112329453B (en) Method, device, equipment and storage medium for generating sample chapter
CN111241302B (en) Position information map generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant