CN116166794A - Training method of language model, text classification method, device, equipment and medium

Training method of language model, text classification method, device, equipment and medium

Info

Publication number
CN116166794A
Authority
CN
China
Prior art keywords
text
texts
original
model
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111385246.1A
Other languages
Chinese (zh)
Inventor
翟彬旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111385246.1A priority Critical patent/CN116166794A/en
Publication of CN116166794A publication Critical patent/CN116166794A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method of a language model, a text classification method, a device, equipment and a medium, and belongs to the technical field of text processing. The method comprises the following steps: acquiring a plurality of first texts; for any one first text, acquiring the occurrence probability of each character in that first text according to a first network model; adjusting the first network model based on the occurrence probability of each character in the plurality of first texts to obtain a first language model, where the first language model comprises a first feature extraction model; acquiring semantic features of any one first text according to the first feature extraction model; adjusting the first feature extraction model based on the semantic features of the plurality of first texts to obtain a second feature extraction model; and determining a target language model based on the second feature extraction model. The method and the device reduce the dependence of the model on training data, shorten the time needed to acquire training data, and improve the training speed of the model and the efficiency of text classification.

Description

Training method of language model, text classification method, device, equipment and medium
Technical Field
The embodiment of the application relates to the technical field of text processing, in particular to a training method of a language model, a text classification method, a device, equipment and a medium.
Background
In the technical field of text processing, the task of determining the category to which a text belongs is called a text classification task. Text classification is the most basic task in natural language processing (Natural Language Processing, NLP) and can be completed based on a language model.
The language model is a model for determining the occurrence probability of each character in a text. In the related art, a large amount of texts need to be acquired as training data, and a language model is obtained by training with the training data. When the text is classified, the text to be classified is input into a language model, the language model firstly extracts semantic features of the text, then the occurrence probability of each character in the text is determined based on the semantic features of the text, and the occurrence probability of each character in the text is output. Then, the category to which the text belongs is determined based on the occurrence probability of each character in the text.
In the above-mentioned technique, a large amount of time is required to acquire the training data, and the training process depends heavily on the training data, which limits the training speed of the model and the efficiency of text classification.
Disclosure of Invention
The embodiment of the application provides a training method for a language model, a text classification method, a device, equipment and a medium, which can be used to solve the problems of low model training speed and low text classification efficiency caused by the large amount of time spent acquiring large quantities of training data in the related art.
In one aspect, an embodiment of the present application provides a method for training a language model, where the method includes:
acquiring a plurality of first texts;
for any one first text, acquiring the occurrence probability of each character in the any one first text according to a first network model;
based on the occurrence probability of each character in the plurality of first texts, adjusting the first network model to obtain a first language model, wherein the first language model comprises a first feature extraction model;
acquiring semantic features of any one of the first texts according to the first feature extraction model;
based on the semantic features of the plurality of first texts, adjusting the first feature extraction model to obtain a second feature extraction model;
and determining a target language model based on the second feature extraction model.
In another aspect, an embodiment of the present application provides a text classification method, where the method includes:
acquiring a target text;
obtaining the occurrence probability of each character in at least two reconstructed texts according to a target language model, wherein the reconstructed texts comprise the target texts and candidate text categories, and the target language model is obtained according to the training method of any one of the language models;
determining a target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts;
and determining the candidate text category in the target reconstructed text as the text category of the target text.
In another aspect, an embodiment of the present application provides a training apparatus for a language model, where the apparatus includes:
the acquisition module is used for acquiring a plurality of first texts;
the acquisition module is further used for acquiring the occurrence probability of each character in any one of the first texts according to the first network model;
the adjustment module is used for adjusting the first network model based on the occurrence probability of each character in the plurality of first texts to obtain a first language model, wherein the first language model comprises a first feature extraction model;
the acquisition module is further used for acquiring semantic features of any one of the first texts according to the first feature extraction model;
the adjusting module is further configured to adjust the first feature extraction model based on semantic features of the plurality of first texts to obtain a second feature extraction model;
and the determining module is used for determining a target language model based on the second feature extraction model.
In one possible implementation, the adjustment module is configured to determine a loss value of each first text based on semantic features of the plurality of first texts; determining a loss value of the first feature extraction model based on the loss value of the respective first text; and adjusting the first feature extraction model based on the loss value of the first feature extraction model to obtain a second feature extraction model.
In one possible implementation manner, the any one of the first texts is any one of the original texts or a replacement text corresponding to the any one of the original texts, and the replacement text corresponding to the any one of the original texts is a text obtained by replacing characters in the any one of the original texts;
the adjusting module is used for determining a loss value of any original text based on semantic features of each original text and semantic features of a replacement text corresponding to each original text; and determining the loss value of the replacement text corresponding to any original text based on the semantic features of each original text and the semantic features of the replacement text corresponding to each original text for the replacement text corresponding to any original text.
In one possible implementation manner, the adjusting module is configured to determine a first similarity between the any one original text and the replacement text corresponding to the any one original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the any one original text; determining a second similarity between the any one original text and other original texts based on semantic features of the any one original text and semantic features of the other original texts, the other original texts being original texts other than the any one original text in the respective original texts; determining a third similarity between the any one original text and the replacement text corresponding to the other original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the other original text; and determining a loss value of any original text based on the first similarity, the second similarity and the third similarity.
In one possible implementation manner, the adjusting module is configured to determine a first similarity between the any one original text and the replacement text corresponding to the any one original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the any one original text; determining a fourth similarity between the replacement text corresponding to the any original text and the other original text based on the semantic features of the replacement text corresponding to the any original text and the semantic features of the other original text, wherein the other original text is an original text except the any original text in the respective original texts; determining fifth similarity between the replacement text corresponding to any one original text and the replacement text corresponding to other original texts based on the semantic features of the replacement text corresponding to any one original text and the semantic features of the replacement texts corresponding to other original texts; and determining a loss value of the replacement text corresponding to any one original text based on the first similarity, the fourth similarity and the fifth similarity.
In a possible implementation manner, the adjusting module is configured to determine prediction information of each first text based on semantic features of each first text, where the prediction information of each first text is a probability that each character in the first text obtained through prediction is replaced; the method comprises the steps of obtaining marking information of each first text, wherein the marking information of the first text is information whether each character in the first text is replaced or not, which is obtained through marking; and adjusting the first feature extraction model based on the prediction information of each first text and the labeling information of each first text to obtain a second feature extraction model.
In one possible implementation manner, the determining module is configured to obtain a second text and a category label of the second text; acquiring the occurrence probability of each character in at least two spliced texts according to a second language model, wherein the spliced texts comprise the second texts and candidate text categories, and the second language model comprises the second feature extraction model; determining a target spliced text from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts; and adjusting the second language model based on the candidate text category in the target spliced text and the category label of the second text to obtain the target language model.
In a possible implementation manner, the determining module is configured to splice the second text and the candidate text category to obtain any one of the at least two spliced texts; and inputting any spliced text into the second language model, determining the text vector of the any spliced text by the second language model, and determining the occurrence probability of each character in the any spliced text based on the text vector of the any spliced text.
In one possible implementation, the determining module is configured to input the second text into the second language model, and determine a text vector of the second text by the second language model; splicing the text vector of the second text and the text vector of the candidate text category by the second language model to obtain the text vector of any spliced text of the at least two spliced texts; determining, by the second language model, occurrence probabilities of respective characters in the any one of the spliced texts based on text vectors of the any one of the spliced texts.
In one possible implementation manner, the determining module is configured to determine, based on occurrence probabilities of respective characters in the at least two spliced texts, a confusion degree of respective spliced texts, where the confusion degree of the spliced texts characterizes a smoothness degree of the spliced texts; and determining the spliced text corresponding to the confusion degree meeting the condition as the target spliced text based on the confusion degree of each spliced text.
In a possible implementation manner, the determining module is configured to determine an occurrence probability of each of the spliced texts based on an occurrence probability of each of the characters in the at least two spliced texts; and determining the confusion degree of each spliced text based on the occurrence probability of each spliced text.
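As an illustration of the confusion-degree (perplexity) computation described above, the following minimal sketch derives the occurrence probability of a spliced text as the product of the occurrence probabilities of its characters and normalizes by length; the exact normalization is an assumption, as the text only states that the confusion degree characterizes the smoothness of the spliced text.

```python
import math

def confusion_degree(char_probs: list) -> float:
    """char_probs: occurrence probability of each character in a spliced text."""
    log_prob = sum(math.log(p) for p in char_probs)   # log occurrence probability of the text
    return math.exp(-log_prob / len(char_probs))      # length-normalized inverse probability

# The spliced text whose confusion degree satisfies the condition (for example,
# the smallest value) is determined as the target spliced text.
```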
In one possible implementation, the apparatus further includes:
the acquisition module is further used for acquiring semantic features of any one of the first texts according to a second network model;
the adjusting module is further configured to adjust the second network model based on semantic features of the plurality of first texts, to obtain a third feature extraction model;
and the construction module is used for constructing the first network model based on the third characteristic extraction model.
In another aspect, an embodiment of the present application provides a text classification apparatus, where the apparatus includes:
the acquisition module is used for acquiring the target text;
the obtaining module is further configured to obtain occurrence probabilities of each character in at least two reconstructed texts according to a target language model, where the reconstructed texts include the target text and candidate text types, and the target language model is obtained according to the training method of any one of the language models;
the determining module is used for determining a target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts;
the determining module is further configured to determine a candidate text category in the target reconstructed text as a text category of the target text.
In a possible implementation manner, the obtaining module is configured to splice the target text and the candidate text category to obtain any one of the at least two reconstructed texts; and inputting the any one of the reconstructed texts into the target language model, determining a text vector of the any one of the reconstructed texts by the target language model, and determining the occurrence probability of each character in the any one of the reconstructed texts based on the text vector of the any one of the reconstructed texts.
In one possible implementation manner, the acquiring module is configured to input the target text into the target language model, and determine a text vector of the target text by the target language model; splicing the text vector of the target text and the text vector of the candidate text category by the target language model to obtain the text vector of any one of the at least two reconstructed texts; determining, by the target language model, occurrence probabilities of respective characters in the any one of the reconstructed texts based on the text vectors of the any one of the reconstructed texts.
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where at least one piece of program code is stored in the memory, and the at least one piece of program code is loaded and executed by the processor, so that the electronic device implements any one of the foregoing training methods of a language model or any one of the foregoing text classification methods.
In another aspect, there is provided a computer readable storage medium having at least one program code stored therein, the at least one program code loaded and executed by a processor to cause a computer to implement any one of the above-described language model training methods or any one of the above-described text classification methods.
In another aspect, a computer program or a computer program product is provided, in which at least one computer instruction is stored, where the at least one computer instruction is loaded and executed by a processor, so that the computer implements any one of the above training methods of a language model or any one of the above text classification methods.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
According to the technical scheme, a first language model is trained using a plurality of first texts, where the first language model comprises a first feature extraction model; the first feature extraction model is adjusted using the semantic features of the plurality of first texts acquired by the first feature extraction model to obtain a second feature extraction model; and a target language model is obtained based on the second feature extraction model. This realizes two-stage training of the language model with the first texts, so that a language model with higher accuracy can be trained using only a small number of first texts, reducing the dependence of the model on training data, thereby shortening the time needed to acquire training data and improving the training speed of the model and the efficiency of text classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment of a training method or a text classification method of a language model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training a language model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a BERT model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of processing of original text and alternative text provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a soft code provided by an embodiment of the present application;
FIG. 6 is a flow chart of a text classification method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a target language model for processing information according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training device for language model according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a text classification device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the terms used in the alternative embodiments of the present application will be explained below.
A Language Model (LM), also called a standard language model, is a probability distribution model whose purpose is to evaluate the generation probability p(S) of an arbitrary character string in a language, where S = (w_1, w_2, w_3, …, w_n) is a character string containing n characters and w_i is the i-th character of the string.
Small sample learning (Few Shot Learning, FSL): given a data set DT with a small amount of available supervision information specific to a task T and an auxiliary data set DA not related to T, the goal of small sample learning is to construct a function f for the task T that uses the limited supervision information in DT and the knowledge in DA to map inputs to targets. Briefly, small sample learning aims to learn a model that solves a problem with a small number of samples; when the available annotated data set DT is empty, the task becomes zero sample learning (Zero Shot Learning, ZSL).
Pre-training model: a model, independent of any specific task, obtained from large-scale data through self-supervised learning. Because rich information is learned during pre-training, the model obtained by fine-tuning the pre-training model can be used for specific tasks.
Text classification: the process of converting a natural language text T into a computer-recognizable code through steps such as text preprocessing, feature extraction, text representation, and classifier construction, inputting the code into a classifier F, and assigning the input text to one or more classes in a specified class set G. Common text classification tasks are emotion analysis, news classification, garbage filtering, and the like.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of the language model training method or the text classification method according to an embodiment of the present application. As shown in fig. 1, the implementation environment includes an electronic device 11, and the language model training method or the text classification method provided by the embodiments of the present application may be executed by the electronic device 11. The electronic device 11 may comprise at least one of a terminal device or a server, for example.
The terminal device may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer.
The server may be one server, or a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in the embodiments of the present application. The server may be communicatively connected to the terminal device via a wired network or a wireless network. The server may have functions of data processing, data storage, data transceiving, and the like, and is not limited in the embodiments of the present application.
Alternative embodiments of the present application may be implemented based on artificial intelligence (Artificial Intelligence, AI). Artificial intelligence is a theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate and extend human intelligence, sense the environment, obtain knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of sensing, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and other directions.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques, and the like.
With the research and progress of artificial intelligence technology, artificial intelligence is being researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, robots, smart medical care, smart customer service, the Internet of Vehicles, and smart transportation. It is believed that with the development of technology, artificial intelligence will be applied in more fields and will be of increasing importance.
Based on the above implementation environment, the embodiment of the present application provides a training method for a language model. Taking the flowchart shown in fig. 2 as an example, the method may be executed by the electronic device 11 in fig. 1. As shown in fig. 2, the method includes steps 201 to 206.
In step 201, a plurality of first texts is obtained.
The embodiment of the application does not limit the number, content, length, text type, and the like of the first texts; for example, a first text may be text in multimedia information, or text such as a bullet-screen comment or review on the multimedia information.
Step 202, for any one of the first texts, obtaining the occurrence probability of each character in any one of the first texts according to the first network model.
For any one of the first texts, inputting the any one of the first texts into a first network model, determining semantic features of any one of the first texts by the first network model, and determining the occurrence probability of each character in any one of the first texts based on the semantic features of any one of the first texts. The semantic features of any one of the first texts comprise character semantic features of each character in any one of the first texts, and the embodiment of the application does not limit the model structure and the model size of the first network model.
Optionally, the first network model includes a Generative Pre-Training (GPT) model, and the GPT model may be a GPT-1 model, a GPT-2 model, or the like. In this case, the occurrence probability of any one character in the first text may be the probability of that character occurring given at least one character preceding it in the first text. For example, the occurrence probability of any one character in the first text satisfies P(w_i | w_1, w_2, …, w_{i-1}), where w_i is the i-th character in the first text; that is, the occurrence probability of the i-th character in the first text is the probability of the i-th character given each character before it in the first text.
Optionally, the first network model comprises a converter-based bidirectional encoded representation (Bidirectional Encoder Representations from Transformers, BERT) model. Fig. 3 is a schematic structural diagram of a BERT model according to an embodiment of the present application. For a text composed of character 1, character 2, …, character N, after the text is input into the BERT model, the text vector of each character is determined first, that is, the text vector of character 1, the text vector of character 2, …, the text vector of character N. After that, the text vector of each character is input into the converter, which includes a plurality of conversion sections, each of which can update the text vector of one character based on the text vectors of all the characters; finally, the converter outputs the semantic features of each character, that is, the semantic features of character 1, the semantic features of character 2, …, the semantic features of character N.
When the first network model includes the BERT model, the occurrence probability of any one character in the first text may be the probability of that character occurring given at least one character other than it in the first text. For example, the occurrence probability of any one character in the first text satisfies P(w_i | w_1, …, w_{i-1}, w_{i+1}, …, w_k), where w_i is the i-th character in the first text, k is the number of characters in the first text, and k is a positive integer; that is, the occurrence probability of the i-th character in the first text is the probability of the i-th character given the respective characters other than the i-th character in the first text.
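As an illustration, the following minimal sketch estimates the bidirectional occurrence probability P(w_i | w_1, …, w_{i-1}, w_{i+1}, …, w_k) of each character by masking one position at a time. The use of PyTorch, the HuggingFace transformers library, and the bert-base-chinese checkpoint is an assumption for illustration; the patent does not name a concrete implementation.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def char_probabilities(text: str) -> list:
    """Mask each character in turn and read off its bidirectional occurrence probability."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    probs = []
    for i in range(1, input_ids.size(0) - 1):          # skip [CLS] and [SEP]
        masked = input_ids.clone()
        original_id = masked[i].item()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        # P(w_i | all other characters): softmax over the vocabulary at position i.
        probs.append(torch.softmax(logits, dim=-1)[original_id].item())
    return probs
```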
Step 203, adjusting the first network model based on the occurrence probability of each character in the plurality of first texts to obtain a first language model, where the first language model comprises a first feature extraction model.
In the embodiment of the application, the loss value of the first network model is calculated based on the occurrence probability of each character in the plurality of first texts, and the model parameters of the first network model are updated once based on this loss value to obtain an updated first network model. When a first training ending condition is met, the updated first network model is used as the first language model; when the first training ending condition is not met, the updated first network model is used as the first network model for the next round of training, and the first network model is updated at least once according to the plurality of first texts in the manner of steps 202 to 203 until the first language model is obtained. The embodiment of the present application does not limit the first training ending condition; illustratively, the first training ending condition is met when the number of training iterations reaches a first target number (for example, 500).
Optionally, the loss value of the first network model is calculated based on the occurrence probability of each character in the plurality of first texts according to the formula L_1(x) = Σ_i log P(x_i | x_{i-k}, …, x_{i-1}; θ), where L_1(x) is the loss value of the first network model, θ denotes the model parameters of the first network model, x_i is the i-th character in the first text, P denotes the occurrence probability of a character, and k is a positive integer greater than or equal to 1.
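The following sketch shows how the objective L_1(x) can be computed for one first text with a left-to-right model, together with one adjustment step. The HuggingFace-style interface (a model returning an object with next-token logits) is an assumption, and the window k is taken here to span the full left context.

```python
import torch
import torch.nn.functional as F

def l1_objective(causal_model, input_ids: torch.Tensor) -> torch.Tensor:
    """input_ids: (seq_len,) character ids of one first text.
    Returns L1(x) = sum_i log P(x_i | x_1, ..., x_{i-1}; theta)."""
    logits = causal_model(input_ids=input_ids.unsqueeze(0)).logits[0]  # (seq_len, vocab)
    log_probs = F.log_softmax(logits[:-1], dim=-1)   # position i predicts character i+1
    targets = input_ids[1:].unsqueeze(-1)
    return log_probs.gather(-1, targets).sum()

# One adjustment round: maximize L1 (i.e. minimize its negative), repeated until
# the first training ending condition (for example, 500 iterations) is met.
# loss = -l1_objective(model, ids); loss.backward(); optimizer.step(); optimizer.zero_grad()
```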
In an embodiment of the present application, the first language model includes a first feature extraction model and a regression model. The regression model is not limited in this embodiment, and is illustratively a Softmax model.
In this way, the first language model is obtained from the first network model based on the occurrence probability of each character in the plurality of first texts, so that the first language model learns the occurrence probability of each character in a text, which facilitates text classification based on the occurrence probability of each character in the text.
Step 204, acquiring semantic features of any one of the first texts according to the first feature extraction model.
In the embodiment of the application, any one of the first texts is input into the first feature extraction model, and the semantic features of any one of the first texts are output by the first feature extraction model. Wherein the semantic features of any one of the first texts comprise text semantic features of any one of the first texts and/or character semantic features of the respective characters in any one of the first texts.
Optionally, the first feature extraction model includes an encoder and a converter. Any one of the first texts is input into the first feature extraction model, a text vector of any one of the first texts is determined by the encoder, and semantic features of any one of the first texts are determined by the converter based on the text vector of any one of the first texts. The text vector of any one of the first texts comprises a character vector of each character in any one of the first texts, a position vector of each character in any one of the first texts and a paragraph vector of each character in any one of the first texts.
It should be noted that the text vector of any one character includes the character vector of any one character, the position vector of any one character, and the paragraph vector of any one character, that is, the text vector of any one first text includes the text vector of each character in any one first text.
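A minimal sketch of this text-vector construction is given below: the vector of each character is the sum of its character vector, position vector, and paragraph vector. The vocabulary size, maximum length, and dimension are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class TextEmbedding(nn.Module):
    def __init__(self, vocab_size=21128, max_len=512, num_segments=2, dim=768):
        super().__init__()
        self.char = nn.Embedding(vocab_size, dim)   # character vector of each character
        self.pos = nn.Embedding(max_len, dim)       # position vector of each character
        self.seg = nn.Embedding(num_segments, dim)  # paragraph (segment) vector

    def forward(self, char_ids, segment_ids):
        positions = torch.arange(char_ids.size(-1), device=char_ids.device)
        # Text vector of a character = character vector + position vector + paragraph vector.
        return self.char(char_ids) + self.pos(positions) + self.seg(segment_ids)
```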
Step 205, adjusting the first feature extraction model based on the semantic features of the plurality of first texts to obtain a second feature extraction model.
In the embodiment of the application, the first feature extraction model is updated once based on the semantic features of the plurality of first texts to obtain an updated first feature extraction model. If a second training ending condition is met, the updated first feature extraction model is used as the second feature extraction model; if the second training ending condition is not met, the updated first feature extraction model is used as the first feature extraction model for the next round of training, and the first feature extraction model is updated at least once based on the plurality of first texts in the manner of steps 204 to 205 until the second feature extraction model is obtained. The embodiment of the present application does not limit the second training ending condition; illustratively, the second training ending condition is met when the number of training iterations reaches a second target number (for example, 1500).
In one possible implementation (this implementation is denoted as implementation A1), adjusting the first feature extraction model based on semantic features of the plurality of first texts, to obtain a second feature extraction model, including: determining a penalty value for each first text based on semantic features of the plurality of first texts; determining a loss value of the first feature extraction model based on the loss value of each first text; and adjusting the first feature extraction model based on the loss value of the first feature extraction model to obtain a second feature extraction model.
In the embodiment of the application, the loss value of any one first text is determined based on the semantic features of a plurality of first texts, and in this way, the loss value of each first text can be determined. Then, the sum of the loss values of the respective first texts is taken as the loss value of the first feature extraction model. And updating the model parameters of the first feature extraction model once based on the loss value of the first feature extraction model to obtain an updated first feature extraction model, and acquiring a second feature extraction model based on the updated first feature extraction model.
Optionally, any one of the first texts is any one of the original texts or a replacement text corresponding to any one of the original texts, and the replacement text corresponding to any one of the original texts is a text obtained by replacing characters in any one of the original texts; determining a penalty value for each first text based on semantic features of the plurality of first texts, comprising: for any one original text, determining a loss value of any one original text based on the semantic features of each original text and the semantic features of the replacement text corresponding to each original text; for the replacement text corresponding to any one of the original texts, determining a loss value of the replacement text corresponding to any one of the original texts based on the semantic features of each of the original texts and the semantic features of the replacement text corresponding to each of the original texts.
In the embodiment of the application, the implementation A1 may be implemented based on contrastive learning (Contrastive Learning, CTL). In this case, any one first text is any one original text or the replacement text corresponding to that original text; that is, the plurality of first texts includes at least two original texts and the replacement text corresponding to each original text. The replacement text corresponding to any one original text is a text obtained by replacing at least one character in that original text, and the at least one character may be continuous or discontinuous. For example, the original text is segmented into at least two words, and each character in any one word is replaced to obtain the replacement text corresponding to the original text; any one word of the original text may be an entity.
In one possible implementation, the replacement of at least one character in the original text is accomplished through near-synonym replacement based on a word vector model. That is, the original text is first segmented into at least two words; for a randomly selected word, the word with the maximum cosine similarity to it is queried in the vocabulary, and the queried word is used to replace the selected word to obtain the replacement text corresponding to the original text, so the replacement text corresponding to the original text is a similar text. The cosine similarity between the queried word and the selected word is not lower than a similarity threshold (for example, 0.8), and the word vector model includes but is not limited to a word2vec model, a GloVe model, and a FastText model.
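The following is a minimal sketch of such near-synonym replacement. The use of jieba for word segmentation and gensim's KeyedVectors as the word vector table is an assumption for illustration, and the file name word_vectors.kv is hypothetical.

```python
import random
import jieba
from gensim.models import KeyedVectors

vectors = KeyedVectors.load("word_vectors.kv")  # hypothetical pretrained word vector table

def make_replacement_text(original: str, threshold: float = 0.8) -> str:
    """Replace one randomly selected word with its nearest neighbour in vector space."""
    words = list(jieba.cut(original))
    idx = random.randrange(len(words))
    if words[idx] in vectors:
        neighbour, similarity = vectors.most_similar(words[idx], topn=1)[0]
        # Replace only when the nearest word clears the cosine-similarity threshold.
        if similarity >= threshold:
            words[idx] = neighbour
    return "".join(words)
```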
In this embodiment of the present application, any one of the first texts is any one of the original texts or a substitute text corresponding to any one of the original texts, and the loss value of any one of the first texts is determined based on the semantic features of the plurality of first texts, that is, the loss value of any one of the original texts is determined based on the semantic features of the plurality of first texts or the loss value of the substitute text corresponding to any one of the original texts is determined based on the semantic features of the plurality of first texts. Then, a penalty value of the first feature extraction model is determined based on the penalty value of each original text and the penalty value of the replacement text corresponding to each original text. The loss value of the first feature extraction model is determined as shown in the following formula (1).
L_capt = Σ_{i=1}^{n} [ L(x_i) + L(x̃_i) ]  (1)

where L_capt is the loss value of the first feature extraction model, x_i is an original text, x̃_i is the replacement text corresponding to the original text x_i, n is the number of original texts and of replacement texts, L(x_i) is the loss value of the original text, and L(x̃_i) is the loss value of the replacement text corresponding to the original text.
Determining a penalty value for any one of the original texts based on semantic features of the plurality of first texts is described below. Optionally, determining the loss value of any original text based on the semantic feature of each original text and the semantic feature of the replacement text corresponding to each original text includes: determining a first similarity between any one original text and the replacement text corresponding to any one original text based on the semantic features of any one original text and the semantic features of the replacement text corresponding to any one original text; determining a second similarity between any one original text and other original texts based on the semantic features of any one original text and the semantic features of other original texts, wherein the other original texts are original texts except for any one original text in the respective original texts; determining a third similarity between any one original text and the replacement text corresponding to other original texts based on the semantic features of any one original text and the semantic features of the replacement text corresponding to other original texts; a loss value of any one of the original texts is determined based on the first similarity, the second similarity, and the third similarity.
In the embodiment of the application, when the loss value of any one original text is calculated, the first similarity is determined based on the semantic features of any one original text and the semantic features of the replacement text corresponding to any one original text, the second similarity is determined based on the semantic features of any one original text and the semantic features of other original texts, and the third similarity is determined based on the semantic features of any one original text and the semantic features of the replacement text corresponding to other original texts. Then, a loss value of any one of the original texts is determined based on the first similarity, the second similarity, and the third similarity. The above-described process of calculating the loss value of any one of the original texts is shown in the following formula (2).
L(x_i) = −log [ exp(sim(S_i, S̃_i)/τ) / ( exp(sim(S_i, S̃_i)/τ) + Σ_{j≠i} exp(sim(S_i, S_j)/τ) + Σ_{j≠i} exp(sim(S_i, S̃_j)/τ) ) ]  (2)

where L(x_i) is the loss value of any one original text, exp denotes the exponential function, sim(·,·) denotes the similarity between two semantic features, S_i is the semantic feature of any one original text, S̃_i is the semantic feature of the replacement text corresponding to that original text, S_j is the semantic feature of another original text, S̃_j is the semantic feature of the replacement text corresponding to another original text, i and j are sequence numbers, and τ is a scaling factor.
The embodiments of the present application do not limit the manner in which the scaling factor is calculated, and illustratively, the scaling factor is determined based on the number of training times. In one possible implementation, the calculation formula of the scaling factor is shown in formula (3) below.
Formula (3), given as an image in the original publication, defines the scaling factor τ(t) as a function of the current training iteration t and the second target training count T, where τ(t) is the scaling factor corresponding to the current training iteration.
The loss value of the replacement text corresponding to any one of the original texts is determined based on the semantic features of the plurality of first texts is described below. Optionally, determining the loss value of the replacement text corresponding to any original text based on the semantic feature of each original text and the semantic feature of the replacement text corresponding to each original text includes: determining a first similarity between any one original text and the replacement text corresponding to any one original text based on the semantic features of any one original text and the semantic features of the replacement text corresponding to any one original text; determining fourth similarity between the replacement text corresponding to any one original text and other original texts based on semantic features of the replacement text corresponding to any one original text and semantic features of other original texts, wherein the other original texts are original texts except any one original text in the original texts; determining fifth similarity between the replacement text corresponding to any one original text and the replacement text corresponding to other original texts based on the semantic features of the replacement text corresponding to any one original text and the semantic features of the replacement texts corresponding to other original texts; and determining a loss value of the replacement text corresponding to any one original text based on the first similarity, the fourth similarity and the fifth similarity.
In the embodiment of the application, when the loss value of the replacement text corresponding to any one original text is calculated, the first similarity is determined based on the semantic features of any one original text and the semantic features of the replacement text corresponding to any one original text, the fourth similarity is determined based on the semantic features of the replacement text corresponding to any one original text and the semantic features of other original texts, and the fifth similarity is determined based on the semantic features of the replacement text corresponding to any one original text and the semantic features of the replacement text corresponding to other original texts. Then, a loss value of the replacement text corresponding to any one of the original texts is determined based on the first similarity, the fourth similarity and the fifth similarity. The above-described process of calculating the loss value of the replacement text corresponding to any one of the original texts is shown in the following formula (4).
L(x̃_i) = −log [ exp(sim(S̃_i, S_i)/τ) / ( exp(sim(S̃_i, S_i)/τ) + Σ_{j≠i} exp(sim(S̃_i, S_j)/τ) + Σ_{j≠i} exp(sim(S̃_i, S̃_j)/τ) ) ]  (4)

where L(x̃_i) is the loss value of the replacement text corresponding to any one original text, exp denotes the exponential function, sim(·,·) denotes the similarity between two semantic features, S_i is the semantic feature of any one original text, S̃_i is the semantic feature of the replacement text corresponding to that original text, S_j is the semantic feature of another original text, S̃_j is the semantic feature of the replacement text corresponding to another original text, i and j are sequence numbers, and τ is the scaling factor; the description of the scaling factor can be found above and is not repeated here.
Next, referring to fig. 4, fig. 4 is a schematic diagram illustrating processing of an original text and an alternative text according to an embodiment of the present application. In this embodiment of the present application, the replacement text corresponding to the original text 1 is the replacement text 1, the replacement text corresponding to the original text 2 is the replacement text 2, the replacement text corresponding to the original text 3 is the replacement text 3, and the original text 1, the original text 2 and the original text 3 are different texts.
First, the original texts 1 to 3 and the replacement texts 1 to 3 are respectively input into the first feature extraction model, which outputs the semantic features of the original texts 1 to 3 and of the replacement texts 1 to 3. Next, the semantic features of an original text and the semantic features of its corresponding replacement text are taken as a positive sample pair; for example, the semantic features of original text 1 and the semantic features of replacement text 1 form a positive sample pair. The semantic features of one original text paired with the semantic features of another original text, or with the semantic features of the replacement text corresponding to another original text, form a negative sample pair; for example, the semantic features of original text 1 and the semantic features of original text 2, or the semantic features of original text 1 and the semantic features of replacement text 3, form negative sample pairs. Thereafter, the similarity within each positive sample pair and within each negative sample pair is calculated.
After the similarity between the positive samples and the similarity between the negative samples are calculated, the loss value of the first feature extraction model is calculated using formulas (1) to (4), and a second feature extraction model is acquired from the first feature extraction model based on the loss value of the first feature extraction model.
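The following sketch implements the contrastive loss of formulas (1), (2), and (4) over a batch of n original texts and their replacement texts: each original text and its replacement form the positive pair, and every other text in the batch serves as a negative. The use of cosine similarity between normalized features is an assumption consistent with, but not fixed by, the text.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(orig: torch.Tensor, repl: torch.Tensor, tau: float) -> torch.Tensor:
    """orig, repl: (n, dim) semantic features S_i of the original texts and
    S~_i of their replacement texts; returns the loss of formula (1)."""
    feats = torch.cat([F.normalize(orig, dim=-1), F.normalize(repl, dim=-1)], dim=0)
    sim = torch.exp(feats @ feats.t() / tau)   # pairwise exp(sim(.,.)/tau), shape (2n, 2n)
    n = orig.size(0)
    loss = feats.new_zeros(())
    for i in range(2 * n):
        pos = (i + n) % (2 * n)                # index of the paired (positive) text
        denom = sim[i].sum() - sim[i, i]       # all pairs except the text with itself
        loss = loss - torch.log(sim[i, pos] / denom)   # formulas (2) and (4)
    return loss                                # formula (1): sum over all first texts
```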
According to the method and the device, the first feature extraction model is adjusted based on the semantic features of the original texts and the semantic features of the replacement texts corresponding to the original texts by determining the semantic features of the original texts and the semantic features of the replacement texts corresponding to the original texts, so that the second feature extraction model is obtained, and the semantic features of different texts can be accurately determined by the second feature extraction model. When the replacement text corresponding to the original text and the original text are similar texts, the second feature extraction model can accurately identify and determine semantic features of the similar texts, so that the robustness of the second feature extraction model to noise is improved, and the accuracy of the second feature extraction model is improved.
In another possible implementation (this implementation is denoted as implementation A2), adjusting the first feature extraction model based on semantic features of the plurality of first texts, to obtain a second feature extraction model, including: determining prediction information of each first text based on semantic features of each first text, wherein the prediction information of the first text is the probability of each character in the first text obtained through prediction being replaced; the method comprises the steps of obtaining marking information of each first text, wherein the marking information of the first text is information whether each character in the first text is replaced or not, which is obtained through marking; and adjusting the first feature extraction model based on the prediction information of each first text and the labeling information of each first text to obtain a second feature extraction model.
In the embodiment of the application, the implementation A2 is implemented based on replaced token detection (Replaced Token Detection, RTD). That is, after the semantic features of the respective first texts are acquired based on the first feature extraction model, the semantic features of the respective first texts are input into an activation model, and the activation model outputs the prediction information of the respective first texts. The embodiment of the application does not limit the model structure and the model size of the activation model; illustratively, the activation model is a sigmoid model.
For any one of the first texts, the prediction information of the first text is a predicted probability of each character being replaced in the first text, and the probability of any one character being replaced is equal to or greater than 0 and equal to or less than 1. The embodiment of the application may further obtain labeling information of each first text, where the labeling information of any first text is information whether each character in the first text is replaced or not obtained by labeling, and information whether any character is replaced or not is 0 or 1, where 0 indicates that the character is not replaced, and 1 indicates that the character is replaced.
Next, a loss value of the first feature extraction model is calculated based on the prediction information of each first text and the labeling information of each first text. Wherein, the calculation formula of the loss value of the first feature extraction model is shown in the following formula (5).
Figure BDA0003366804110000161
Wherein L is Disc (x,θ D ) A loss value representing the first feature extraction model, x being any character in the first text, θ D Extracting model parameters of the model for the first feature, E representing the expectation, n being the number of characters in the first text, t being the position of the characters,
Figure BDA0003366804110000162
representation pair character x t And carrying out the replaced character. />
Figure BDA0003366804110000163
And->
Figure BDA0003366804110000164
All are the character x t Marking whether replaced, when->
Figure BDA0003366804110000165
When, i.e. character x t When not replaced, the->
Figure BDA0003366804110000166
Has a value of 1, (-)>
Figure BDA0003366804110000167
The value of (2) is 0; when->
Figure BDA0003366804110000168
When, i.e. character x t When it has been replaced by a replacement of the container,
Figure BDA0003366804110000169
the value of (2) is 0, (-)>
Figure BDA00033668041100001610
Has a value of 1, (-)>
Figure BDA00033668041100001611
Representing the probability that the t-th character in the first text output by the activation model is replaced.
After the loss value of the first feature extraction model is calculated, the first feature extraction model is updated once based on the loss value of the first feature extraction model, an updated first feature extraction model is obtained, and a second feature extraction model is obtained according to the updated first feature extraction model.
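The following minimal sketch computes the loss of formula (5) for one first text, given the activation model's per-character replacement probabilities and the 0/1 labeling information; averaging over characters to stand in for the expectation is an assumption of this sketch:

```python
import math

def rtd_loss(pred_probs, labels, eps=1e-9):
    """Replaced-token-detection loss of formula (5) for one first text.
    pred_probs[t]: predicted probability that character t was replaced;
    labels[t]: 1 if character t was actually replaced, else 0."""
    loss = 0.0
    for p, y in zip(pred_probs, labels):
        if y == 1:               # character was replaced
            loss -= math.log(p + eps)
        else:                    # character was not replaced
            loss -= math.log(1.0 - p + eps)
    return loss / len(labels)
```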
It is understood that a first text in the embodiment of the present application is a text obtained by replacing at least one character in an original text, where the at least one character may be continuous or discontinuous. In one possible implementation, the replacement of at least one character in the original text is accomplished by synonym replacement based on a word vector model. That is, the original text is first segmented into at least two words; for any randomly selected word, a word whose cosine similarity with that word is not greater than a similarity threshold is queried in the vocabulary, and the queried word replaces the selected word to obtain a first text. The embodiment of the application does not limit the similarity threshold; illustratively, the similarity threshold is 0.5, and the word vector model includes, but is not limited to, a word2vec model, a GloVe model, and a FastText model.
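A minimal sketch of this replacement step, assuming a pre-trained word-vector table and a pre-segmented text; the 0.5 threshold follows the example above, and all names are illustrative:

```python
import random
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def make_first_text(words, vectors, threshold=0.5):
    """Replaces one randomly chosen word with a vocabulary word whose cosine
    similarity with it is not greater than the threshold.
    words: the segmented original text; vectors: dict mapping each vocabulary
    word to its embedding (e.g. trained with word2vec, GloVe, or FastText)."""
    i = random.randrange(len(words))
    target = vectors[words[i]]
    candidates = [w for w, v in vectors.items()
                  if w != words[i] and cosine(target, v) <= threshold]
    if candidates:
        words = words[:i] + [random.choice(candidates)] + words[i + 1:]
    return words
```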
In the embodiment of the application, since at least one character in each first text has been replaced, after the second feature extraction model is obtained based on the prediction information and the labeling information of each first text, the second feature extraction model pays more attention to the fluency of sentences, so that it can accurately determine the semantic features of replaced texts, which improves the accuracy of the second feature extraction model.
It should be noted that, in the embodiment of the present application, implementation A1 may be executed first and implementation A2 second. That is, the loss value of each first text is determined based on the semantic features of the plurality of first texts, the loss value of the first feature extraction model is determined based on the loss values of the first texts, and the first feature extraction model is adjusted based on that loss value to obtain an intermediate feature extraction model; then the prediction information of each first text is determined based on the semantic features of each first text, the labeling information of each first text is acquired, and the intermediate feature extraction model is adjusted based on the prediction information and the labeling information of each first text to obtain the second feature extraction model. Alternatively, implementation A2 may be executed first and implementation A1 second. That is, the prediction information of each first text is determined based on the semantic features of each first text, the labeling information of each first text is acquired, and the first feature extraction model is adjusted based on the prediction information and the labeling information of each first text to obtain an intermediate feature extraction model; then the loss value of each first text is determined based on the semantic features of the plurality of first texts, the loss value of the intermediate feature extraction model is determined based on the loss values of the first texts, and the intermediate feature extraction model is adjusted based on that loss value to obtain the second feature extraction model.
Step 206, determining the target language model based on the second feature extraction model.
After the second feature extraction model is determined, the second feature extraction model and the regression model are spliced to obtain the target language model, so that the target language model can be used to classify the target text. In this way, the target language model is obtained through zero shot learning (Zero Shot Learning, ZSL) training, which eliminates the labeling process, accelerates the training of the model, and can improve the efficiency of text classification.
Optionally, a small number of labeled samples may be used to train the language model obtained by splicing the second feature extraction model and the regression model, so as to obtain the target language model. In this way, the target language model is obtained through few shot learning (Few Shot Learning, FSL) training, which reduces the number of samples to be labeled, accelerates the training of the model, and can improve the efficiency of text classification.
Optionally, determining the target language model based on the second feature extraction model includes: acquiring a second text and a category label of the second text; acquiring the occurrence probability of each character in at least two spliced texts according to a second language model, wherein the spliced texts comprise second texts and candidate text categories, and the second language model comprises a second feature extraction model; determining a target spliced text from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts; and adjusting the second language model based on the candidate text category in the target spliced text and the category label of the second text to obtain the target language model.
In the embodiment of the application, the second feature extraction model and the regression model are spliced to obtain the second language model. For any second text, the second text is input into the second language model, the semantic features of each spliced text are determined by the second feature extraction model, and the occurrence probability of each character in the spliced text is determined by the regression model based on those semantic features. The semantic features of a spliced text comprise the character semantic features of each character in the spliced text, and a spliced text comprises the second text and a candidate text category.
The embodiment of the application does not limit the candidate text categories; illustratively, they may be emotion categories such as "good", "general", "bad", "positive", and "negative", content categories such as entertainment, military, animation, and food, or quality categories such as junk text and non-junk text.
In one possible implementation manner, obtaining the occurrence probability of each character in at least two spliced texts according to the second language model includes: splicing the second text and the candidate text category to obtain any spliced text of at least two spliced texts; and inputting any spliced text into the second language model, determining the text vector of any spliced text by the second language model, and determining the occurrence probability of each character in any spliced text based on the text vector of any spliced text.
In the embodiment of the application, a hard coding mode is adopted, the second text and the candidate text category are spliced to obtain the spliced text, the spliced text is input into the second language model, the semantic features of the spliced text are determined by the second feature extraction model, the occurrence probability of each character in the spliced text is determined by the regression model based on the semantic features of the spliced text, and therefore the occurrence probability of each character in the spliced text is output by the second language model.
Wherein the second feature extraction model includes an encoder and a converter. Inputting the spliced text into a second language model, determining a text vector of the spliced text by an encoder, determining semantic features of the spliced text by a converter based on the text vector of the spliced text, and determining occurrence probability of each character in the spliced text by a regression model based on the semantic features of the spliced text. The text vector of the spliced text comprises a character vector of each character in the spliced text, a position vector of each character in the spliced text and a paragraph vector of each character in the spliced text.
Alternatively, the spliced text may contain other text in addition to the second text and the candidate text category, as shown in table 1 below.
TABLE 1
(Table 1 shows example spliced texts, each formed by appending the other text "feels really" and one of the candidate text categories "good", "general", and "bad" to a second text.)
In Table 1, the second text is "The movie is wonderful and attractive.", "The content of the film is mediocre, and the completion is passable.", or "A garbage movie, a waste of time." The other text "feels really" and a candidate text category from "good", "general", and "bad" are spliced after the second text.
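Hard coding reduces to plain string concatenation; a sketch using the Table 1 example (the strings are illustrative):

```python
second_text = "The movie is wonderful and attractive."
other_text = " Feels really "
candidate_categories = ["good", "general", "bad"]

# One spliced text per candidate text category, as in Table 1.
spliced_texts = [second_text + other_text + c + "." for c in candidate_categories]
for s in spliced_texts:
    print(s)
```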
Hard coding directly splices the second text with the candidate text category to obtain a spliced text, and different hard-coded texts need to be constructed for different scenes. For example, for a product-title text such as "men's autumn and winter fleece plus-size sweatshirt", the hard-coded text above is not suitable to splice after it, so a corresponding hard-coded text must be constructed for that text. This makes hard coding less versatile, and hard-coded texts are difficult to design. For this reason, the embodiment of the application also designs a soft coding mode: the text vector of the second text and the text vector of the candidate text category are spliced to obtain the text vector of the spliced text, and the model learns the soft-coded text through repeated training, which improves the versatility of the model. The soft coding mode of the embodiment of the present application is described below.
In another possible implementation manner, obtaining the occurrence probability of each character in at least two spliced texts according to the second language model includes: inputting the second text into a second language model, and determining a text vector of the second text by the second language model; splicing the text vector of the second text and the text vector of the candidate text category by the second language model to obtain the text vector of any spliced text of at least two spliced texts; the probability of occurrence of each character in any one of the spliced texts is determined by the second language model based on the text vector of any one of the spliced texts.
In an embodiment of the present application, the second language model includes a second feature extraction model and a regression model, and the second feature extraction model includes an encoder and a converter. And inputting the second text into a second language model in a soft coding mode, determining a text vector of the second text and a text vector of a candidate text category by an encoder, splicing the text vector of the second text and the text vector of the candidate text category to obtain a text vector of a spliced text, determining semantic features of the spliced text by a converter based on the text vector of the spliced text, and determining occurrence probability of each character in the spliced text by a regression model based on the semantic features of the spliced text.
Wherein the text vector of a text (including, but not limited to, the first text, the second text, the spliced text, and the candidate text category in the alternative embodiments) includes the character vector of each character in the text, the position vector of each character in the text, and the paragraph vector of each character in the text.
Alternatively, the text vector of the spliced text may contain the text vectors of other texts in addition to the text vector of the second text and the text vector of the candidate text category. The embodiment of the present application does not limit the content of the other text, which is illustratively "feels really".
Referring to fig. 5, fig. 5 is a schematic diagram of soft coding according to an embodiment of the present application. In this embodiment of the present application, the second text includes character 1, character 2, and character 3, the second language model includes an encoder including a character encoder, a position encoder, and a paragraph encoder, and the converter includes converter 1, converter 2, and converter L, where L is a positive integer.
Inputting the second text into the second language model, determining character vectors of all characters in the second text by a character encoder, determining position vectors of all characters in the second text by a position encoder, and determining paragraph vectors of all characters in the second text by a paragraph encoder, so as to obtain text vectors of the second text. Based on the same principle, text vectors for other text, as well as text vectors for candidate text categories, are determined by the character encoder, position encoder and paragraph encoder.
Then, the text vector of the second text, the text vectors of the other texts, and the text vectors of the candidate text categories are spliced to obtain the text vector of the spliced text. The text vector of the spliced text is input to the converters; after passing through converter 1, converter 2, and so on up to converter L, the semantic features of the spliced text are output. The semantic features of the spliced text comprise the character semantic features of each character in the spliced text, namely character semantic features 1 to 5.
The second language model in the embodiment of the present application further includes a regression model (not shown in fig. 5) for determining occurrence probabilities of respective characters in the spliced text based on semantic features of the spliced text.
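A sketch of the encoder of Fig. 5 (PyTorch assumed; the vocabulary size, sequence length, and dimension below are placeholders, not values from the application):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Character, position and paragraph encoders; their outputs are summed
    per character to give the text vector of a text, as in Fig. 5."""
    def __init__(self, vocab=21128, max_len=512, paragraphs=2, dim=768):
        super().__init__()
        self.char = nn.Embedding(vocab, dim)        # character encoder
        self.pos = nn.Embedding(max_len, dim)       # position encoder
        self.par = nn.Embedding(paragraphs, dim)    # paragraph encoder

    def forward(self, char_ids, paragraph_ids):
        positions = torch.arange(char_ids.size(1), device=char_ids.device)
        return self.char(char_ids) + self.pos(positions) + self.par(paragraph_ids)

# Soft coding: encode the second text and the candidate text category
# separately, then splice their vector sequences along the length dimension:
# text_vector = torch.cat([enc(text_ids, text_par), enc(cat_ids, cat_par)], dim=1)
```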
In the embodiment of the application, after the occurrence probability of each character in at least two spliced texts is obtained, a target spliced text is determined from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts.
Optionally, determining the target spliced text from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts includes: determining the confusion degree of each spliced text based on the occurrence probability of each character in at least two spliced texts, wherein the confusion degree of the spliced texts represents the smoothness degree of the spliced texts; and determining the spliced text corresponding to the confusion degree meeting the condition as a target spliced text based on the confusion degree of each spliced text.
In the embodiment of the application, for any spliced text, the confusion degree of the spliced text is determined based on the occurrence probability of each character in the spliced text. Confusion (perplexity, PPL) is a text metric used to characterize the fluency of text. The greater the confusion of a spliced text, the less fluent the spliced text; the smaller the confusion, the more fluent the spliced text.
Optionally, determining the confusion degree of each spliced text based on the occurrence probability of each character in at least two spliced texts includes: determining the occurrence probability of each spliced text based on the occurrence probability of each character in at least two spliced texts; and determining the confusion degree of each spliced text based on the occurrence probability of each spliced text.
In the embodiment of the application, for any spliced text, the occurrence probability of the spliced text is determined based on the occurrence probability of each character in the spliced text. The occurrence probability of the spliced text is shown in the following formula (6-1) or (6-2).
$$P(S) = p(w_1, w_2, w_3, \ldots, w_m) = p(w_1)\,p(w_2 \mid w_1)\cdots p(w_m \mid w_1, w_2, \ldots, w_{m-1}) \tag{6-1}$$

$$P(S) = \prod_{i=1}^{k} p(w_i \mid w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k) \tag{6-2}$$

Wherein $P(S)$ and $p(w_1, w_2, w_3, \ldots, w_m)$ both denote the occurrence probability of the spliced text; $p(w_1)$, $p(w_2 \mid w_1)$, $p(w_m \mid w_1, w_2, \ldots, w_{m-1})$ and $p(w_i \mid w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k)$ are occurrence probabilities of characters in the spliced text; and $m$ and $k$ are the numbers of characters in the spliced text.
After the occurrence probabilities of at least two spliced texts are determined, the confusion degree of each spliced text is determined based on the occurrence probabilities of each spliced text. Wherein the confusion of the spliced text is determined according to the following formula (7).
$$PP(W) = P(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \cdots w_N)}} \tag{7}$$

Wherein $PP(W)$ is the confusion degree of the spliced text, $P(w_1 w_2 \cdots w_N)$ is the occurrence probability of the spliced text, and $N$ is the number of characters in the spliced text.
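Formulas (6-1) and (7) combine into a few lines of arithmetic; the sketch below works in log space for numerical stability (an implementation choice, not part of the formulas):

```python
import math

def perplexity(char_probs):
    """Confusion degree (PPL) of a spliced text from the occurrence
    probability of each of its characters, per formulas (6-1) and (7):
    PPL = P(w1 w2 ... wN) ** (-1/N)."""
    n = len(char_probs)
    log_p = sum(math.log(p) for p in char_probs)  # log P(w1 w2 ... wN)
    return math.exp(-log_p / n)

print(perplexity([0.5, 0.5, 0.5]))  # 2.0: lower PPL means more fluent text
```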
After the confusion degrees of the at least two spliced texts are calculated, the spliced text whose confusion degree satisfies the condition is determined as the target spliced text. The embodiment of the application does not limit this condition; illustratively, the confusion degree satisfying the condition is the minimum confusion degree, that is, the spliced text with the minimum confusion degree is taken as the target spliced text. Alternatively, the confusion degree satisfying the condition is a confusion degree not greater than a confusion threshold, that is, a spliced text whose confusion degree is not greater than the confusion threshold is taken as the target spliced text. Thereafter, the candidate text category in the target spliced text is determined as the text category of the second text, as shown in Table 2 below.
TABLE 2
Spliced text | Confusion degree
"The movie is wonderful and attractive. Feels really good." | 1.0116761350214916
"The movie is wonderful and attractive. Feels really general." | 1.1787686347935873
"The movie is wonderful and attractive. Feels really bad." | 1.2385989684455543
In Table 2, the second text is "The movie is wonderful and attractive." One spliced text is "The movie is wonderful and attractive. Feels really good.", whose confusion degree is 1.0116761350214916; another spliced text is "The movie is wonderful and attractive. Feels really general.", whose confusion degree is 1.1787686347935873; and another spliced text is "The movie is wonderful and attractive. Feels really bad.", whose confusion degree is 1.2385989684455543. Based on these three confusion degrees, the spliced text with the minimum confusion degree is determined as the target spliced text, i.e. "The movie is wonderful and attractive. Feels really good." The target spliced text includes the second text "The movie is wonderful and attractive.", the other text "feels really", and the candidate text category "good"; at this point, "good" is taken as the text category of the second text "The movie is wonderful and attractive."
Then, a loss value of the second language model is determined based on the text category of the second text and the category label of the second text, and the second language model is update-trained based on this loss value to obtain an updated second language model. If the third training ending condition is satisfied, the updated second language model is taken as the target language model; if not, the updated second language model is taken as the second language model for the next round of training, and the second language model is update-trained at least once based on the second texts and their category labels until the target language model is obtained. The embodiment does not limit the third training ending condition; illustratively, the third training ending condition is that the number of training iterations reaches a third target number (for example, 50). The process of each update training is as described above and is not repeated here.
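One way to realize this update training is sketched below; treating the negative confusion degree of each spliced text as the logit of its candidate category and applying a cross-entropy loss against the category label is an assumption of this sketch, as is the `model.perplexity` helper (PyTorch assumed):

```python
import torch
import torch.nn.functional as F

def update_step(model, optimizer, second_text, label_idx, candidates):
    """One update of the second language model. model.perplexity(text, cat)
    is a hypothetical helper returning the differentiable confusion degree
    of the spliced text formed from `text` and candidate category `cat`."""
    logits = torch.stack([-model.perplexity(second_text, c) for c in candidates])
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([label_idx]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Third training ending condition (illustrative): stop after 50 updates.
# for _ in range(50):
#     update_step(model, optimizer, text, label, ["good", "general", "bad"])
```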
According to the method and the device, the second language model is trained based on the second texts and their category labels to obtain the target language model, so that the target language model can learn soft-coded texts through the soft coding mode. This improves the versatility of the target language model, enables the model to accurately output the occurrence probability of each character in a spliced text, and allows the text category of a text to be determined accurately based on those occurrence probabilities.
It can be understood that, in the embodiment of the present application, the spliced text corresponding to the confusion degree not greater than the confusion degree threshold may be used as the target spliced text, and the second language model may be adjusted based on the candidate text category in the target spliced text and the category label of the second text, so as to obtain the target language model. According to the method and the device for obtaining the target language model, the size of the confusion threshold can be continuously adjusted in the process of training to obtain the target language model, and the accuracy of the confusion threshold is improved.
It should be noted that, in the embodiment of the present application, based on the confusion degree of each spliced text, a target spliced text is selected from at least two spliced texts, and when the application is performed, the smoothness degree of the spliced text may be evaluated by using other evaluation indexes, so as to select the target spliced text from the spliced texts.
In one possible implementation manner, before obtaining the occurrence probability of each character in any one of the first texts according to the first network model, the method further includes: acquiring semantic features of any one of the first texts according to the second network model; based on semantic features of a plurality of first texts, adjusting the second network model to obtain a third feature extraction model; and constructing a first network model based on the third feature extraction model.
In the embodiment of the application, any one of the first texts is input into the second network model, and the semantic features of any one of the first texts are output by the second network model. Optionally, the second network model comprises an encoder and a converter. Any one of the first texts is input into the second network model, the text vector of any one of the first texts is determined by the encoder, and the semantic features of any one of the first texts are determined by the converter based on the text vector of any one of the first texts. In this way, semantic features of the plurality of first texts can be determined.
Then, based on the semantic features of the plurality of first texts, the second network model is updated once to obtain an updated second network model. If the fourth training ending condition is satisfied, the updated second network model is taken as the third feature extraction model; if not, the updated second network model is taken as the second network model for the next round of training, and the second network model is update-trained at least once based on the plurality of first texts in the manner of the embodiment of the application until the third feature extraction model is obtained. The embodiment of the application does not limit the fourth training ending condition; illustratively, the fourth training ending condition is that the number of training iterations reaches a fourth target number.
It should be noted that, according to the second network model, the semantic features of any one of the first texts are obtained, and based on the semantic features of the plurality of first texts, the second network model is adjusted to obtain an implementation manner of the third feature extraction model, which is similar to the implementation manner of the steps 204-205, so that the description about the steps 204-205 is omitted here.
And after the third feature extraction model is obtained, splicing the third feature extraction model and the third network model to obtain a first network model. Based on the plurality of first texts, a first language model is obtained in the manner of steps 201-203. The first language model comprises a first feature extraction model and a regression model, wherein the first feature extraction model is obtained by adjusting a third feature extraction model, and the regression model is obtained by adjusting a third network model.
In the embodiment of the application, a plurality of first texts are first used for training to obtain the first language model, which includes the first feature extraction model; the first feature extraction model is then adjusted using the semantic features of the plurality of first texts acquired by it, to obtain the second feature extraction model, and the target language model is obtained based on the second feature extraction model. This realizes two-stage training of the language model using the first texts, enabling a language model with higher accuracy to be trained from a small number of first texts and reducing the model's dependence on training data, thereby shortening the time needed to acquire training data and improving the training speed of the model and the efficiency of text classification.
And the accuracy of the feature extraction model can be improved by training the feature extraction model, so that the accuracy of the target language model containing the second feature extraction model is improved, and the accuracy of text classification can be ensured.
Based on the above implementation environment, the embodiment of the present application provides a text classification method, taking the flowchart of the text classification method provided in the embodiment of the present application as shown in fig. 6 as an example, where the method may be executed by the electronic device 11 in fig. 1. As shown in fig. 6, the method includes steps 601 to 604.
In step 601, a target text is acquired.
The embodiment of the application does not limit the content, length, or text type of the target text; illustratively, the target text is text in multimedia information, or text directed at the multimedia information, such as bullet comments (barrage) and ordinary comments.
Step 602, obtaining occurrence probability of each character in at least two reconstructed texts according to the target language model, wherein the reconstructed texts comprise target texts and candidate text categories.
The target language model is obtained according to the training method of the language model provided by the optional embodiments.
In the embodiment of the application, the second language model is obtained after the second feature extraction model is spliced with the regression model, and the target language model is the second language model or is obtained by adjusting the second language model. Therefore, the target language model includes a feature extraction network and a regression network: the feature extraction network is the second feature extraction model or is obtained by adjusting it, and the regression network is the regression model or is obtained by adjusting it.
Inputting the target text into the target language model, determining semantic features of the reconstructed text by the feature extraction network, and determining the occurrence probability of each character in the reconstructed text by the regression network based on the semantic features of the reconstructed text. The semantic features of the reconstructed text comprise character semantic features of all characters in the reconstructed text, and the reconstructed text comprises target text and candidate text categories.
In one possible implementation, obtaining, according to the target language model, occurrence probabilities of respective characters in at least two reconstructed texts includes: splicing the target text and the candidate text category to obtain any one of at least two reconstructed texts; inputting any one of the reconstructed texts into the target language model, determining the text vector of any one of the reconstructed texts by the target language model, and determining the occurrence probability of each character in any one of the reconstructed texts based on the text vector of any one of the reconstructed texts.
In the embodiment of the application, a hard coding mode is adopted to splice the target text and the candidate text category to obtain the reconstructed text, the reconstructed text is input into the target language model, the semantic features of the reconstructed text are determined by the feature extraction network, the occurrence probability of each character in the reconstructed text is determined by the regression network based on the semantic features of the reconstructed text, and therefore the occurrence probability of each character in the reconstructed text is output by the target language model.
Wherein the feature extraction network comprises an encoder and a converter. The method comprises the steps of inputting a reconstructed text into a target language model, determining a text vector of the reconstructed text by an encoder, determining semantic features of the reconstructed text by a converter based on the text vector of the reconstructed text, and determining occurrence probability of each character in the reconstructed text by a regression model based on the semantic features of the reconstructed text.
In another possible implementation manner, obtaining the occurrence probability of each character in at least two reconstructed texts according to the target language model includes: inputting the target text into a target language model, and determining a text vector of the target text by the target language model; splicing the text vector of the target text and the text vector of the candidate text category by the target language model to obtain the text vector of any one of at least two reconstructed texts; the probability of occurrence of each character in any one of the reconstructed texts is determined by the target language model based on the text vector of any one of the reconstructed texts.
In an embodiment of the present application, the target language model includes a feature extraction network and a regression network, and the feature extraction network includes an encoder and a converter. The method comprises the steps of inputting a target text into a target language model in a soft coding mode, determining a text vector of the target text and a text vector of a candidate text category by an encoder, splicing the text vector of the target text and the text vector of the candidate text category to obtain a text vector of a reconstructed text, determining semantic features of the reconstructed text by a converter based on the text vector of the reconstructed text, and determining occurrence probability of each character in the reconstructed text by a regression network based on the semantic features of the reconstructed text.
It should be noted that, the description about the step 602 may be described above as to "obtaining the occurrence probability of each character in at least two spliced texts according to the second language model", and the implementation principles of the two are the same, which is not described herein.
In step 603, a target reconstructed text is determined from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts.
After the occurrence probability of each character in the at least two reconstructed texts is obtained, determining a target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts.
In one possible implementation, determining the target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts includes: determining the confusion degree of each reconstructed text based on the occurrence probability of each character in at least two reconstructed texts, wherein the confusion degree of the reconstructed text represents the smoothness degree of the reconstructed text; and determining the reconstructed text corresponding to the confusion degree meeting the condition as a target reconstructed text based on the confusion degree of each reconstructed text.
In the embodiment of the application, for any reconstructed text, the confusion degree of the reconstructed text is determined based on the occurrence probability of each character in the reconstructed text.
Optionally, determining the confusion degree of each reconstructed text based on the occurrence probability of each character in at least two reconstructed texts includes: determining the occurrence probability of each reconstructed text based on the occurrence probability of each character in at least two reconstructed texts; the confusion degree of each reconstructed text is determined based on the occurrence probability of each reconstructed text.
In the embodiment of the application, for any reconstructed text, the occurrence probability of the reconstructed text is determined according to a formula (6-1) or a formula (6-2), and then the confusion degree of the reconstructed text is determined according to a formula (7) based on the occurrence probability of the reconstructed text.
After the confusion degree of at least two reconstructed texts is calculated, determining the reconstructed text corresponding to the confusion degree meeting the condition as a target reconstructed text. The embodiment of the application does not limit the degree of confusion meeting the condition, and the degree of confusion meeting the condition is exemplified as the minimum degree of confusion, or the degree of confusion meeting the condition is the degree of confusion not greater than a threshold value of degree of confusion.
It should be noted that, the description about the step 603 may be described above as to "determining the target spliced text from at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts", and the implementation principle of the two are the same, which is not described herein.
In step 604, the candidate text category in the target reconstructed text is determined as the text category of the target text.
In the embodiment of the application, the target reconstructed text comprises the target text and a candidate text category, and the candidate text category in the target reconstructed text is taken as the text category of the target text. For example, the target reconstructed text is "The movie is wonderful and attractive. Feels really good."; it includes the target text "The movie is wonderful and attractive.", the other text "feels really", and the candidate text category "good". At this point, "good" is taken as the text category of the target text "The movie is wonderful and attractive."
It should be noted that, the description about step 604 may be referred to above as description about "determining the candidate text category in the target spliced text as the text category of the second text", and the implementation principles of the two are the same, which is not repeated herein.
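Putting steps 601 to 604 together, a sketch of the classification procedure (the `char_probs_fn` callable stands in for the target language model and is an assumption of this sketch, as is the hard-coded other text):

```python
def classify(target_text, candidates, char_probs_fn, other_text=" Feels really "):
    """Steps 601-604: build one reconstructed text per candidate category,
    score each by its confusion degree, and return the candidate category of
    the reconstructed text with the minimum confusion degree.
    char_probs_fn(text) returns the occurrence probability of each character
    in a reconstructed text, as output by the target language model."""
    best_category, best_ppl = None, float("inf")
    for category in candidates:
        reconstructed = target_text + other_text + category + "."
        ppl = perplexity(char_probs_fn(reconstructed))  # perplexity() as sketched above
        if ppl < best_ppl:
            best_category, best_ppl = category, ppl
    return best_category
```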
In the embodiment of the application, the target language model is determined as follows: a plurality of first texts are first used for training to obtain the first language model, which includes the first feature extraction model; the first feature extraction model is then adjusted using the semantic features of the plurality of first texts acquired by it, to obtain the second feature extraction model; and the target language model is determined based on the second feature extraction model. This realizes two-stage training of the language model using the first texts, enabling a language model with higher accuracy to be trained from a small number of first texts and reducing the model's dependence on training data, thereby shortening the time needed to acquire training data and improving the training speed of the model and the efficiency of text classification. By training the first feature extraction model, the accuracy of the feature extraction model can be improved, which improves the accuracy of the target language model and thus ensures the accuracy of text classification.
The foregoing describes the training method and the text classification method of the language model according to the embodiment of the present application from the viewpoint of method steps, and the training method and the text classification method of the language model according to the embodiment of the present application will be described in combination from the viewpoint of scenes. The scene of the embodiment of the application is a scene aiming at an information recommendation type application program, and a plurality of first texts are constructed for information in the information recommendation type application program based on texts in the information, comment texts aiming at the information, barrage texts and the like.
Training the first network model based on a plurality of first texts to obtain a first language model, training a first feature extraction model in the first language model based on the plurality of first texts to obtain a second feature extraction model, and obtaining a target language model based on the second feature extraction model. The process of determining the target language model based on the plurality of first texts is described in steps 201 to 206, which will not be described herein.
Optionally, the target language model includes an encoder, a converter, and a Softmax model. At the time of application, a text category of the target text may be determined using the target language model. Taking the target text as an example of information in the information recommendation application, please refer to fig. 7, fig. 7 is a schematic diagram of processing information by using a target language model according to an embodiment of the present application.
For information in the information recommendation class application, suppose the information includes character 1 and character 2. When the information is input to the target language model, the special character "CLS" is added before it; that is, CLS, character 1, and character 2 are input to the target language model, and the encoder determines the text vector of CLS, the text vector of character 1, and the text vector of character 2. Meanwhile, the candidate text category includes character 3, and the encoder determines the text vector of character 3. The text vector of the reconstructed text comprises the text vectors of character 1, character 2, and character 3.
The converter determines the semantic features of the CLS, the semantic features of the character 1, the semantic features of the character 2 and the semantic features of the character 3 based on the text vectors of the CLS, the character 1, the character 2 and the character 3, and the Softmax model outputs the occurrence probability of the character 1, the occurrence probability of the character 2 and the occurrence probability of the character 3 based on the semantic features of the CLS, the character 1, the character 2 and the character 3. The semantic features of the CLS represent text semantic features of the first text, and the semantic features of the characters 1-3 represent character semantic features of each character in the reconstructed text.
Then, based on the character semantic features of each character in the reconstructed text, the occurrence probability of the reconstructed text is determined according to a formula (6-1) or a formula (6-2), and then the confusion degree of the reconstructed text is determined according to a formula (7) based on the occurrence probability of the reconstructed text. In this way, the confusion degree of at least two reconstructed texts can be determined, then, the reconstructed text corresponding to the confusion degree meeting the condition is determined as the target reconstructed text, and the candidate text category in the target reconstructed text is determined as the text category of the target text. The process of determining the text category of the target text based on the target language model is described in steps 601 to 604, which will not be described herein.
According to the training method for the language model provided by the embodiment of the application, the target language model is obtained by training on the first texts, and the first texts do not need to be labeled, so the training time of the model can be shortened, the iteration efficiency of the model is improved, and the accuracy of the model is high. In the embodiment of the application, the model can be fine-tuned without using second texts and their category labels, that is, the number of second texts can be 0, realizing zero sample learning; or the model can be fine-tuned with a small number of second texts and their category labels, realizing small sample learning. Whether with zero sample learning or small sample learning, the target language model of the embodiment of the application has high accuracy and can ensure the accuracy of text classification.
Next, referring to fig. 8, fig. 8 is a schematic structural diagram of a training device for language model according to an embodiment of the present application, as shown in fig. 8, the device includes:
an obtaining module 801, configured to obtain a plurality of first texts;
the obtaining module 801 is further configured to obtain, for any one of the first texts, occurrence probabilities of respective characters in any one of the first texts according to the first network model;
An adjustment module 802, configured to adjust the first network model based on occurrence probabilities of each character in the plurality of first texts, to obtain a first language model, where the first language model includes a first feature extraction model;
the obtaining module 801 is further configured to obtain semantic features of any one of the first texts according to the first feature extraction model;
the adjustment module 802 is further configured to adjust the first feature extraction model based on semantic features of the plurality of first texts, to obtain a second feature extraction model;
a determining module 803 is configured to determine a target language model based on the second feature extraction model.
In one possible implementation, the adjustment module 802 is configured to determine a loss value of each first text based on semantic features of the plurality of first texts; determining a loss value of the first feature extraction model based on the loss value of each first text; and adjusting the first feature extraction model based on the loss value of the first feature extraction model to obtain a second feature extraction model.
In one possible implementation manner, any one of the first texts is any one of the original texts or a replacement text corresponding to any one of the original texts, and the replacement text corresponding to any one of the original texts is a text obtained by replacing characters in any one of the original texts;
An adjustment module 802, configured to determine, for any one of the original texts, a loss value of the any one of the original texts based on semantic features of each of the original texts and semantic features of the replacement text corresponding to each of the original texts; for the replacement text corresponding to any one of the original texts, determining a loss value of the replacement text corresponding to any one of the original texts based on the semantic features of each of the original texts and the semantic features of the replacement text corresponding to each of the original texts.
In one possible implementation, the adjusting module 802 is configured to determine a first similarity between any one original text and a replacement text corresponding to any one original text based on semantic features of any one original text and semantic features of a replacement text corresponding to any one original text; determining a second similarity between any one original text and other original texts based on the semantic features of any one original text and the semantic features of other original texts, wherein the other original texts are original texts except for any one original text in the respective original texts; determining a third similarity between any one original text and the replacement text corresponding to other original texts based on the semantic features of any one original text and the semantic features of the replacement text corresponding to other original texts; a loss value of any one of the original texts is determined based on the first similarity, the second similarity, and the third similarity.
In one possible implementation, the adjusting module 802 is configured to determine a first similarity between any one original text and a replacement text corresponding to any one original text based on semantic features of any one original text and semantic features of a replacement text corresponding to any one original text; determining fourth similarity between the replacement text corresponding to any one original text and other original texts based on semantic features of the replacement text corresponding to any one original text and semantic features of other original texts, wherein the other original texts are original texts except any one original text in the original texts; determining fifth similarity between the replacement text corresponding to any one original text and the replacement text corresponding to other original texts based on the semantic features of the replacement text corresponding to any one original text and the semantic features of the replacement texts corresponding to other original texts; and determining a loss value of the replacement text corresponding to any one original text based on the first similarity, the fourth similarity and the fifth similarity.
In a possible implementation manner, the adjustment module 802 is configured to determine, based on semantic features of each first text, prediction information of each first text, where the prediction information of each first text is a probability that each character in the first text obtained by prediction is replaced; the method comprises the steps of obtaining marking information of each first text, wherein the marking information of the first text is information whether each character in the first text is replaced or not, which is obtained through marking; and adjusting the first feature extraction model based on the prediction information of each first text and the labeling information of each first text to obtain a second feature extraction model.
In one possible implementation, the determining module 803 is configured to obtain the second text and a category label of the second text; acquiring the occurrence probability of each character in at least two spliced texts according to a second language model, wherein the spliced texts comprise second texts and candidate text categories, and the second language model comprises a second feature extraction model; determining a target spliced text from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts; and adjusting the second language model based on the candidate text category in the target spliced text and the category label of the second text to obtain the target language model.
In a possible implementation manner, the determining module 803 is configured to splice the second text and the candidate text category to obtain any one of the at least two spliced texts; and inputting any spliced text into the second language model, determining the text vector of any spliced text by the second language model, and determining the occurrence probability of each character in any spliced text based on the text vector of any spliced text.
In one possible implementation, the determining module 803 is configured to input the second text into a second language model, and determine a text vector of the second text by the second language model; splicing the text vector of the second text and the text vector of the candidate text category by the second language model to obtain the text vector of any spliced text of at least two spliced texts; the probability of occurrence of each character in any one of the spliced texts is determined by the second language model based on the text vector of any one of the spliced texts.
In one possible implementation, the determining module 803 is configured to determine, based on occurrence probabilities of respective characters in at least two spliced texts, a confusion degree of the respective spliced texts, where the confusion degree of the spliced texts characterizes a smoothness degree of the spliced texts; and determining the spliced text corresponding to the confusion degree meeting the condition as a target spliced text based on the confusion degree of each spliced text.
In one possible implementation, the determining module 803 is configured to determine an occurrence probability of each of the spliced texts based on an occurrence probability of each of the characters in the at least two spliced texts; and determining the confusion degree of each spliced text based on the occurrence probability of each spliced text.
In one possible implementation, the apparatus further includes:
the obtaining module 801 is further configured to obtain semantic features of any one of the first texts according to the second network model;
the adjustment module 802 is further configured to adjust the second network model based on semantic features of the plurality of first texts, to obtain a third feature extraction model;
and the construction module is used for constructing the first network model based on the third feature extraction model.
The device trains with a plurality of first texts to obtain the first language model, which includes the first feature extraction model; it then adjusts the first feature extraction model using the semantic features of the plurality of first texts acquired by that model, to obtain the second feature extraction model, and obtains the target language model based on the second feature extraction model. This realizes two-stage training of the language model using the first texts, enabling a language model with higher accuracy to be trained from a small number of first texts and reducing the model's dependence on training data, thereby shortening the time needed to acquire training data and improving the training speed of the model and the efficiency of text classification. By training the feature extraction model, the accuracy of the feature extraction model can be improved, which improves the accuracy of the target language model containing the second feature extraction model and thus ensures the accuracy of text classification.
It should be understood that, in implementing the functions of the apparatus provided in fig. 8, only the division of the functional modules is illustrated, and in practical application, the functional modules may be allocated to different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Next, referring to fig. 9, fig. 9 is a schematic structural diagram of a training device for a language model according to an embodiment of the present application, as shown in fig. 9, where the training device includes:
an acquisition module 901, configured to acquire a target text;
the obtaining module 901 is further configured to obtain, according to a target language model, occurrence probabilities of respective characters in at least two reconstructed texts, where the reconstructed texts include a target text and a candidate text category, and the target language model is obtained according to a training method of any one of the foregoing language models;
a determining module 902, configured to determine a target reconstructed text from at least two reconstructed texts based on occurrence probabilities of respective characters in the at least two reconstructed texts;
The determining module 902 is further configured to determine a candidate text category in the target reconstructed text as the text category of the target text.
In a possible implementation manner, the obtaining module 901 is configured to splice the target text and the candidate text category to obtain any one of at least two reconstructed texts; inputting any one of the reconstructed texts into the target language model, determining the text vector of any one of the reconstructed texts by the target language model, and determining the occurrence probability of each character in any one of the reconstructed texts based on the text vector of any one of the reconstructed texts.
In one possible implementation, the obtaining module 901 is configured to input the target text into a target language model, and determine a text vector of the target text by the target language model; splicing the text vector of the target text and the text vector of the candidate text category by the target language model to obtain the text vector of any one of at least two reconstructed texts; the probability of occurrence of each character in any one of the reconstructed texts is determined by the target language model based on the text vector of any one of the reconstructed texts.
The target language model in the device is determined as follows: a plurality of first texts are first used for training to obtain the first language model, which includes the first feature extraction model; the first feature extraction model is then adjusted using the semantic features of the plurality of first texts acquired by it, to obtain the second feature extraction model; and the target language model is determined based on the second feature extraction model. This realizes two-stage training of the language model using the first texts, enabling a language model with higher accuracy to be trained from a small number of first texts and reducing the model's dependence on training data, thereby shortening the time needed to acquire training data and improving the training speed of the model and the efficiency of text classification. By training the first feature extraction model, the accuracy of the feature extraction model can be improved, which improves the accuracy of the target language model and thus ensures the accuracy of text classification.
It should be understood that, when the apparatus provided in fig. 9 implements its functions, the division into the functional modules described above is merely illustrative; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process of the apparatus, refer to the method embodiments, which is not repeated here.
Fig. 10 shows a block diagram of a terminal device 1000 according to an exemplary embodiment of the present application. The terminal device 1000 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal device 1000 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal device 1000 includes: a processor 1001 and a memory 1002.
The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. Memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1002 is used to store at least one instruction, the at least one instruction being executed by processor 1001 to implement the training method or the text classification method of the language model provided by the method embodiments in the present application.
In some embodiments, terminal device 1000 can optionally further include a peripheral interface 1003 and at least one peripheral. The processor 1001, the memory 1002, and the peripheral interface 1003 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripherals include at least one of radio frequency circuitry 1004, a display screen 1005, a camera assembly 1006, audio circuitry 1007, a positioning component 1008, and a power supply 1009.
Peripheral interface 1003 may be used to connect at least one I/O (Input/Output)-related peripheral to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
Radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. Radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch screen, the display screen 1005 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 1001 as a control signal for processing. At this time, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, disposed on the front panel of the terminal device 1000; in other embodiments, there may be at least two display screens 1005, respectively disposed on different surfaces of terminal device 1000 or in a folded design; in still other embodiments, the display screen 1005 may be a flexible display disposed on a curved or folded surface of terminal device 1000. The display screen 1005 may even be arranged in a non-rectangular irregular pattern, that is, as a shaped screen. The display screen 1005 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1006 is used to capture images or video. Optionally, camera assembly 1006 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input them to the processor 1001 for processing or to the radio frequency circuit 1004 for voice communication. For stereo acquisition or noise reduction purposes, a plurality of microphones may be disposed at different portions of terminal device 1000. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 1007 may also include a headphone jack.
The positioning component 1008 is used to locate the current geographic location of the terminal device 1000 to enable navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 1009 is used to power the various components in terminal device 1000. The power supply 1009 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery: a wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, terminal device 1000 can further include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyroscope sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
The acceleration sensor 1011 can detect the magnitude of acceleration on each of the three coordinate axes of the coordinate system established by the terminal device 1000. For example, the acceleration sensor 1011 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1001 may control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1011. The acceleration sensor 1011 may also be used to collect game or user motion data.
The gyroscope sensor 1012 may detect the body direction and rotation angle of the terminal device 1000, and may cooperate with the acceleration sensor 1011 to collect the user's 3D motion on the terminal device 1000. Based on the data collected by the gyroscope sensor 1012, the processor 1001 may implement the following functions: motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
Pressure sensor 1013 may be disposed at a side frame of terminal device 1000 and/or at a lower layer of the display screen 1005. When the pressure sensor 1013 is disposed at a side frame of the terminal device 1000, the user's grip signal on the terminal device 1000 can be detected, and the processor 1001 performs left/right hand recognition or a quick operation based on the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is disposed at the lower layer of the display screen 1005, the processor 1001 controls an operability control on the UI according to the user's pressure operation on the display screen 1005. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1014 is used to collect the user's fingerprint; the processor 1001 identifies the user based on the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 itself identifies the user based on the collected fingerprint. Upon recognizing the user as a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. Fingerprint sensor 1014 may be disposed on the front, back, or side of terminal device 1000. When a physical key or vendor Logo is provided on terminal device 1000, fingerprint sensor 1014 may be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1015: when the ambient light intensity is high, the display brightness of the display screen 1005 is turned up; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 may dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
Proximity sensor 1016, also referred to as a distance sensor, is typically disposed on the front panel of terminal device 1000. Proximity sensor 1016 is used to capture the distance between the user and the front face of terminal device 1000. In one embodiment, when proximity sensor 1016 detects that the distance between the user and the front face of terminal device 1000 gradually decreases, the processor 1001 controls the display screen 1005 to switch from the screen-on state to the screen-off state; when proximity sensor 1016 detects that the distance gradually increases, the processor 1001 controls the display screen 1005 to switch from the screen-off state to the screen-on state.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is not limiting and that terminal device 1000 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 11 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more processors 1101 (for example, CPUs) and one or more memories 1102, where the one or more memories 1102 store at least one program code, the at least one program code being loaded and executed by the one or more processors 1101 to implement the training method or the text classification method of the language model provided by the foregoing method embodiments. Of course, the server 1100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server 1100 may also include other components for implementing device functions, which are not described here.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to cause an electronic device to implement a training method or a text classification method of any of the language models described above.
Alternatively, the above-mentioned computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program or computer program product is also provided, in which at least one computer instruction is stored, the at least one computer instruction being loaded and executed by a processor to cause the computer to implement any one of the training methods or text classification methods of the language model described above.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The foregoing description covers only exemplary embodiments of the present application and is not intended to limit it; any modifications, equivalent replacements, or improvements made within the spirit and principle of the present application shall fall within its protection scope.

Claims (20)

1. A method of training a language model, the method comprising:
acquiring a plurality of first texts;
for any one first text, acquiring the occurrence probability of each character in the any one first text according to a first network model;
based on the occurrence probability of each character in the plurality of first texts, adjusting the first network model to obtain a first language model, wherein the first language model comprises a first feature extraction model;
acquiring semantic features of any one of the first texts according to the first feature extraction model;
based on the semantic features of the plurality of first texts, adjusting the first feature extraction model to obtain a second feature extraction model;
and determining a target language model based on the second feature extraction model.
2. The method of claim 1, wherein adjusting the first feature extraction model based on semantic features of the plurality of first texts results in a second feature extraction model, comprising:
determining a penalty value for each first text based on semantic features of the plurality of first texts;
determining a loss value of the first feature extraction model based on the loss value of the respective first text;
and adjusting the first feature extraction model based on the loss value of the first feature extraction model to obtain a second feature extraction model.
3. The method according to claim 2, wherein the any one of the first texts is any one of the original texts or a replacement text corresponding to the any one of the original texts, the replacement text corresponding to the any one of the original texts being a text obtained by replacing characters in the any one of the original texts;
the determining a penalty value for each first text based on semantic features of the plurality of first texts includes:
for any one of the original texts, determining a loss value of the original text based on semantic features of each original text and semantic features of a replacement text corresponding to each original text;
and determining the loss value of the replacement text corresponding to any original text based on the semantic features of each original text and the semantic features of the replacement text corresponding to each original text for the replacement text corresponding to any original text.
4. The method of claim 3, wherein the determining the loss value for any one of the original text based on the semantic features of the respective original text and the semantic features of the corresponding alternate text of the respective original text comprises:
determining a first similarity between the any one original text and the replacement text corresponding to the any one original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the any one original text;
determining a second similarity between the any one original text and other original texts based on semantic features of the any one original text and semantic features of the other original texts, the other original texts being original texts other than the any one original text in the respective original texts;
determining a third similarity between the any one original text and the replacement text corresponding to the other original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the other original text;
and determining a loss value of any original text based on the first similarity, the second similarity and the third similarity.
5. The method of claim 3, wherein the determining the loss value of the alternate text corresponding to the any one of the original text based on the semantic features of the respective original text and the semantic features of the alternate text corresponding to the respective original text comprises:
determining a first similarity between the any one original text and the replacement text corresponding to the any one original text based on the semantic features of the any one original text and the semantic features of the replacement text corresponding to the any one original text;
determining a fourth similarity between the replacement text corresponding to the any original text and the other original text based on the semantic features of the replacement text corresponding to the any original text and the semantic features of the other original text, wherein the other original text is an original text except the any original text in the respective original texts;
determining fifth similarity between the replacement text corresponding to any one original text and the replacement text corresponding to other original texts based on the semantic features of the replacement text corresponding to any one original text and the semantic features of the replacement texts corresponding to other original texts;
and determining a loss value of the replacement text corresponding to any one original text based on the first similarity, the fourth similarity and the fifth similarity.
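Read together, claims 4 and 5 describe a contrastive objective of the InfoNCE family. As an illustrative formalization only, writing $s(\cdot,\cdot)$ for the similarity between semantic features and $\tau$ for a temperature (both notational choices of this sketch, not terms fixed by the claims), the loss value of an original text $x_i$ with replacement text $x_i^{+}$ may take the form:

$$
\mathcal{L}(x_i) = -\log \frac{\exp\big(s(x_i, x_i^{+})/\tau\big)}{\exp\big(s(x_i, x_i^{+})/\tau\big) + \sum_{j \neq i} \Big[\exp\big(s(x_i, x_j)/\tau\big) + \exp\big(s(x_i, x_j^{+})/\tau\big)\Big]}
$$

Here the numerator carries the first similarity, while the sum over $j \neq i$ collects the second and third similarities of claim 4; the loss of claim 5 would be obtained symmetrically by scoring $x_i^{+}$ against the batch using the fourth and fifth similarities.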
6. The method of claim 1, wherein adjusting the first feature extraction model based on semantic features of the plurality of first texts results in a second feature extraction model, comprising:
determining prediction information of each first text based on the semantic features of each first text, wherein the prediction information of each first text is the predicted probability that each character in the first text has been replaced;
obtaining labeling information of each first text, wherein the labeling information of each first text indicates, as obtained through labeling, whether each character in the first text has been replaced;
and adjusting the first feature extraction model based on the prediction information of each first text and the labeling information of each first text to obtain a second feature extraction model.
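Claim 6 matches the replaced-character detection objective familiar from ELECTRA-style training. Below is a minimal, illustrative sketch under that reading; detector_head and the toy tensor shapes are assumptions of this sketch, not names from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64
detector_head = nn.Linear(DIM, 1)              # maps a semantic feature to one logit

features = torch.randn(8, 32, DIM)             # (batch, seq_len, DIM) from the extractor
labels = torch.randint(0, 2, (8, 32)).float()  # labeling information: 1 = replaced

logits = detector_head(features).squeeze(-1)   # per-character replacement logits
loss = F.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                                # the gradient adjusts the model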
7. The method of any of claims 1 to 6, wherein the determining a target language model based on the second feature extraction model comprises:
acquiring a second text and a category label of the second text;
acquiring the occurrence probability of each character in at least two spliced texts according to a second language model, wherein the spliced texts comprise the second texts and candidate text categories, and the second language model comprises the second feature extraction model;
determining a target spliced text from the at least two spliced texts based on the occurrence probability of each character in the at least two spliced texts;
and adjusting the second language model based on the candidate text category in the target spliced text and the category label of the second text to obtain the target language model.
8. The method of claim 7, wherein the obtaining, according to the second language model, the occurrence probability of each character in the at least two concatenated texts comprises:
splicing the second text and the candidate text category to obtain any spliced text of the at least two spliced texts;
and inputting any spliced text into the second language model, determining the text vector of the any spliced text by the second language model, and determining the occurrence probability of each character in the any spliced text based on the text vector of the any spliced text.
9. The method of claim 7, wherein the obtaining, according to the second language model, the occurrence probability of each character in the at least two concatenated texts comprises:
inputting the second text into the second language model, and determining a text vector of the second text by the second language model;
splicing the text vector of the second text and the text vector of the candidate text category by the second language model to obtain the text vector of any spliced text of the at least two spliced texts;
determining, by the second language model, the occurrence probability of each character in the any one spliced text based on the text vector of the any one spliced text.
10. The method of claim 7, wherein the determining the target stitched text from the at least two stitched texts based on the probability of occurrence of each character in the at least two stitched texts comprises:
determining the confusion degree of each spliced text based on the occurrence probability of each character in the at least two spliced texts, wherein the confusion degree of each spliced text represents the smoothness degree of the spliced text;
and determining the spliced text corresponding to the confusion degree meeting the condition as the target spliced text based on the confusion degree of each spliced text.
11. The method of claim 10, wherein the determining the confusion of each of the spliced texts based on the occurrence probability of each of the characters in the at least two spliced texts comprises:
determining the occurrence probability of each spliced text based on the occurrence probability of each character in the at least two spliced texts;
and determining the confusion degree of each spliced text based on the occurrence probability of each spliced text.
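The confusion degree of claims 10 and 11 matches the standard notion of perplexity. As an illustrative formalization (the notation is ours), if a spliced text $t$ consists of characters $c_1, \dots, c_n$ with occurrence probabilities $p(c_k \mid c_{<k})$, then:

$$
P(t) = \prod_{k=1}^{n} p(c_k \mid c_{<k}), \qquad \mathrm{PPL}(t) = P(t)^{-1/n} = \exp\Big(-\frac{1}{n}\sum_{k=1}^{n}\log p(c_k \mid c_{<k})\Big)
$$

A lower perplexity indicates a more fluent spliced text, so under this reading the spliced text whose confusion degree meets the condition would be the one with the smallest perplexity.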
12. The method according to any one of claims 1 to 6, further comprising, before the obtaining, according to the first network model, the occurrence probability of each character in the any one of the first texts:
acquiring semantic features of any one of the first texts according to a second network model;
based on the semantic features of the plurality of first texts, adjusting the second network model to obtain a third feature extraction model;
and constructing the first network model based on the third feature extraction model.
13. A method of text classification, the method comprising:
acquiring a target text;
obtaining the occurrence probability of each character in at least two reconstructed texts according to a target language model, wherein the reconstructed texts comprise the target texts and candidate text categories, and the target language model is obtained according to the method of any one of claims 1 to 12;
determining a target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts;
and determining the candidate text category in the target reconstructed text as the text category of the target text.
14. The method of claim 13, wherein the obtaining, according to the target language model, the occurrence probability of each character in the at least two reconstructed texts comprises:
splicing the target text and the candidate text category to obtain any one of the at least two reconstructed texts;
and inputting the any one of the reconstructed texts into the target language model, determining a text vector of the any one of the reconstructed texts by the target language model, and determining the occurrence probability of each character in the any one of the reconstructed texts based on the text vector of the any one of the reconstructed texts.
15. The method of claim 13, wherein the obtaining, according to the target language model, the occurrence probability of each character in the at least two reconstructed texts comprises:
inputting the target text into the target language model, and determining a text vector of the target text by the target language model;
splicing the text vector of the target text and the text vector of the candidate text category by the target language model to obtain the text vector of any one of the at least two reconstructed texts;
determining, by the target language model, occurrence probabilities of respective characters in the any one of the reconstructed texts based on the text vectors of the any one of the reconstructed texts.
16. A training apparatus for a language model, the apparatus comprising:
an acquisition module, used for acquiring a plurality of first texts;
the acquisition module is further used for acquiring the occurrence probability of each character in any one of the first texts according to the first network model;
an adjustment module, used for adjusting the first network model based on the occurrence probability of each character in the plurality of first texts to obtain a first language model, wherein the first language model comprises a first feature extraction model;
the acquisition module is further used for acquiring semantic features of any one of the first texts according to the first feature extraction model;
the adjusting module is further configured to adjust the first feature extraction model based on semantic features of the plurality of first texts to obtain a second feature extraction model;
and a determining module, used for determining a target language model based on the second feature extraction model.
17. A text classification device, the device comprising:
an obtaining module, used for obtaining a target text;
the obtaining module is further configured to obtain occurrence probabilities of characters in at least two reconstructed texts according to a target language model, where the reconstructed texts include the target text and candidate text types, and the target language model is obtained according to the method of any one of claims 1 to 12;
a determining module, used for determining a target reconstructed text from the at least two reconstructed texts based on the occurrence probability of each character in the at least two reconstructed texts;
the determining module is further configured to determine a candidate text category in the target reconstructed text as a text category of the target text.
18. An electronic device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to cause the electronic device to implement the method of training a language model according to any one of claims 1 to 12 or the method of classifying text according to any one of claims 13 to 15.
19. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to cause a computer to implement the method of training a language model according to any one of claims 1 to 12 or the method of classifying text according to any one of claims 13 to 15.
20. A computer program product, characterized in that at least one computer instruction is stored in the computer program product, which is loaded and executed by a processor, to cause the computer to implement the training method of the language model according to any one of claims 1 to 12 or the text classification method according to any one of claims 13 to 15.