CN112463942A - Text processing method and device, electronic equipment and computer readable storage medium

Text processing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN112463942A
CN112463942A
Authority
CN
China
Prior art keywords
text
sample
model
original text
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011446124.4A
Other languages
Chinese (zh)
Inventor
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011446124.4A
Publication of CN112463942A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The embodiment of the application discloses a text processing method and device, electronic equipment and a computer readable storage medium. The method comprises the following steps: acquiring original text data, wherein the original text data at least comprises a target text needing to be subjected to a reference resolution task; extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result; and analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text. The text processing method, the text processing device, the electronic equipment and the computer readable storage medium can simplify the processing process of the reference resolution task and improve the processing efficiency.

Description

Text processing method and device, electronic equipment and computer readable storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a text processing method and device, electronic equipment and a computer readable storage medium.
Background
In the field of artificial intelligence technology, Natural Language Processing (NLP) has long been one of the major research directions. When using language, people often use a pronoun or an abbreviation in place of a word mentioned earlier, or simply omit that word; in linguistics this is called the reference (anaphora) phenomenon. Reference resolution is a fundamental research topic in the NLP field: it resolves unclear references in a text so that an electronic device can better understand the semantics the text expresses. Existing reference resolution schemes are complex, take a long time to process, and are inefficient.
Disclosure of Invention
The embodiment of the application discloses a text processing method and device, electronic equipment and a computer readable storage medium, which can simplify the processing process of a reference resolution task and improve the processing efficiency.
The embodiment of the application discloses a text processing method, which comprises the following steps:
acquiring original text data, wherein the original text data at least comprises a target text needing to be subjected to a reference resolution task;
extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeled sample corresponding to the original text sample;
analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, an extraction sample result obtained by processing the original text sample with the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
The embodiment of the application discloses a text processing device, includes:
the acquisition module is used for acquiring original text data, and the original text data comprises a target text needing to be subjected to a reference resolution task;
the extraction module is used for extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeled sample corresponding to the original text sample;
and the generating module is used for analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, the extraction sample result obtained by processing the original text sample with the trained extraction model, and the reference resolution sample result corresponding to the original text sample.
The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor is enabled to realize the method.
An embodiment of the application discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method as described above.
The text processing method, the apparatus, the electronic device and the computer readable storage medium disclosed in the embodiments of the present application extract the key information contained in the original text data through the extraction model and label each piece of key information to obtain an extraction result, where the key information at least includes a candidate entity in the original text data and an insertion position of the candidate entity in the target text, and the extraction model is obtained by training according to a first training sample that includes an original text sample and a labeled sample corresponding to the original text sample. The original text data and the extraction result are then analyzed through the generation model to obtain a reference resolution text corresponding to the target text, where the generation model is obtained by training according to a second training sample that includes the original text sample, an extraction sample result obtained by processing the original text sample with the trained extraction model, and a reference resolution sample result corresponding to the original text sample. In this way, preliminary information extraction and reference annotation can be performed on the original text data through the extraction model, and the extraction result is processed through the generation model to obtain the final reference resolution text; decomposing the complex reference resolution task into two relatively simple tasks simplifies the processing of the reference resolution task and improves processing efficiency. Moreover, because the generation model obtains the reference resolution text based on both the extraction result of the extraction model and the original text data, the obtained reference resolution text is more accurate, and the quality of the reference resolution task can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1A is a diagram illustrating an example of an application scenario for a text processing method;
FIG. 1B is a diagram illustrating an electronic device performing a reference resolution task, in accordance with an embodiment;
FIG. 2 is a flow diagram of a method of text processing in one embodiment;
FIG. 3A is a diagram illustrating an original text sample and an annotated sample in one embodiment;
FIG. 3B is a diagram that illustrates generating a reference resolution text, in one embodiment;
FIG. 4 is a schematic flow diagram of generating a reference resolution text by a generative model in one embodiment;
FIG. 5 is a flowchart of a text processing method in another embodiment;
FIG. 6 is a block diagram of a text processing apparatus in one embodiment;
fig. 7 is a block diagram of an electronic device in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the examples and figures of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first training sample may be referred to as a second training sample, and similarly, a second training sample may be referred to as a first training sample, without departing from the scope of the present application. The first training sample and the second training sample are both training samples, but they are not the same training sample.
Reference resolution refers to grouping the different mentions (references) that represent the same Entity into an equivalence set, i.e., establishing a reference chain between the mentions and the entity. Reference resolution plays an important role in tasks such as machine reading comprehension, information extraction and multi-turn dialogue, and helps an electronic device understand the meaning expressed in a text more easily. In the related art, the reference resolution task is mainly performed in two ways:
the first method is as follows: and performing a reference resolution task by using the rule. Logic rules are formed through a large amount of manually constructed domain and language knowledge, and the text is subjected to reference resolution by using the logic rules. This approach requires a lot of manual intervention, and the system has a very low degree of automation, low processing efficiency, and poor portability.
The second way is reference resolution based on an end-to-end model. An end-to-end model, such as a Convolutional Neural Network (CNN) model, searches for pronouns in the text, then searches for candidate entities, and links entities to pronouns through one-to-one matching and ranking to perform the reference resolution task. In an end-to-end model the output of each stage is used as the input of the next stage and the stages are executed serially, so a wrong prediction in an earlier stage affects the later stages and the error propagates, influencing the final result and making it uncontrollable. In addition, an end-to-end model has high complexity, long processing time and low processing efficiency.
The embodiment of the application provides a text processing method and device, electronic equipment and a computer readable storage medium, which are applicable to scenes such as machine reading understanding, intelligent man-machine multi-turn conversation and the like, can simplify the processing process of a reference resolution task, improve the processing efficiency, enable the obtained reference resolution text to be more accurate, and improve the quality of the reference resolution task.
Fig. 1A is an application scenario diagram of a text processing method in one embodiment. As shown in fig. 1A, the text processing method may be applied to a multi-turn dialog scenario; the application scenario may include a user and an electronic device 10, and the user may have a conversation via the electronic device 10. The conversation may be with a user of another electronic device or with an intelligent voice program on the electronic device 10. The electronic device 10 may take the dialog text of one or more rounds of dialog as the raw text data. The electronic device 10 may extract the key information contained in the original text data through the extraction model and label each piece of key information to obtain an extraction result, thereby performing preliminary information extraction and reference annotation on the original text data with the extraction model, and may then analyze the original text data and the extraction result through the generation model to obtain the reference resolution text corresponding to the target text.
FIG. 1B is a diagram illustrating an electronic device performing a reference resolution task, in accordance with an embodiment. As shown in fig. 1B, in the multi-turn dialog scenario, user A has performed 3 rounds of dialog with user B via the electronic device 10. Round 1: "Will you play basketball?"; round 2: "Of course"; round 3: "Change the day and play together". The electronic device 10 may take the 3 rounds of dialog text as the original text data and input the original text data into the extraction model to obtain the extraction result, and then input the original text data and the extraction result together into the generation model; the generation model rewrites the 3rd round of dialog text according to the extraction result and the original text data to obtain the reference resolution text "Change the day and play basketball together".
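The two-stage process illustrated in FIG. 1B can be summarized in pseudocode form. The following is only a minimal sketch of the flow; the `extract` and `generate` interfaces are assumed placeholders for the extraction model and generation model described later, not an actual API.

```python
# Illustrative two-stage reference-resolution pipeline (hypothetical interfaces).
def resolve_references(original_text_data, extraction_model, generation_model):
    # Stage 1: the extraction model labels candidate entities, candidate
    # pronouns and insertion positions in the original text data.
    extraction_result = extraction_model.extract(original_text_data)

    # Stage 2: the generation model rewrites the target text using both the
    # original text data and the extraction result, producing the final
    # reference-resolution text.
    resolved_text = generation_model.generate(original_text_data, extraction_result)
    return resolved_text

# Example from the dialog above (round 3 is the target text):
#   input : "Will you play basketball? / Of course / Change the day and play together"
#   output: "Change the day and play basketball together"
```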
As shown in fig. 2, in an embodiment, a text processing method is provided, which is applicable to the above electronic device, where the electronic device may include a terminal device such as a mobile phone, a smart wearable device, a tablet Computer, a Personal Computer (PC), and a vehicle-mounted terminal, and may also include a service device such as a server and a server cluster, which is not limited in this embodiment. The method may comprise the steps of:
step 210, obtaining original text data, where the original text data at least includes a target text that needs to be subjected to a reference resolution task.
In some embodiments, the raw text data may include dialog text that may refer to dialog text between a user of the electronic device and users of other electronic devices, and may also refer to dialog text between a user of the electronic device and an automatically responding language program, and then the target text may refer to the most recent turn of dialog text. The original text data may also include a certain paragraph of text in an article, or a certain paragraph of text published on a social media, and the content of the original text data may be determined according to different application scenarios.
In some embodiments, the original text data may further include context data of a target text, in addition to the target text that needs to be subjected to the reference resolution task, and the context data may be used to reflect a context of the target text, so as to help the electronic device to better recognize semantics of the original text data, and the entities referred to in the target text may be more accurately analyzed through the context data of the target text, thereby improving accuracy of the reference resolution task.
As a specific embodiment, in a multi-turn dialog scenario, the original text data may include all rounds of dialog text in the current dialog scenario, where the current dialog scenario may refer to the dialog window currently opened by the user on the electronic device, and the context data of the target text may include all rounds of dialog text before the latest round. Optionally, the context data of the target text may also include the N rounds of dialog text before the latest round, where N may be a preset positive integer; for example, the 3 rounds of dialog text before the latest round are used as the context data, or the 5 or 6 rounds before the latest round, but this is not limited. Optionally, the context data of the target text may also include the dialog text within a preset time period before the latest round, for example within 3 minutes or within 10 minutes before the latest round, but this is not limited either.
As another specific implementation, in the context of reading and understanding an article, the original text data may include text of a paragraph in which the target text is located, and the context data of the target text may include other text in the paragraph except for the target text. Optionally, the context data of the target text may also include M sentences of text preceding the target text in the paragraph, where M may be a preset positive integer, for example, but not limited to, 2 sentences of text, 3 sentences of text, etc. preceding the target text in the paragraph.
It should be noted that the scenario for executing the reference resolution task is not limited to the several scenarios listed above, and the target text and the context data may be set and selected according to actual requirements, which is not limited in the embodiment of the present application.
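As an illustration of the context-window choices described above, the sketch below selects either the last N dialog rounds or the rounds within a preset time window as context data. The function name, parameters and data layout are assumptions made for illustration only.

```python
from datetime import timedelta

def select_context(turns, n_turns=3, time_window_minutes=None):
    """turns: list of (timestamp, text) ordered by time; the last element is the target text."""
    target_time, target_text = turns[-1]
    history = turns[:-1]
    if time_window_minutes is not None:
        # Keep only the rounds inside the preset time window before the target.
        cutoff = target_time - timedelta(minutes=time_window_minutes)
        context = [text for ts, text in history if ts >= cutoff]
    else:
        # Otherwise keep the N rounds immediately before the target text.
        context = [text for _, text in history[-n_turns:]]
    return context, target_text
```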
And step 220, extracting the key information contained in the original text data through the extraction model, and labeling each key information to obtain an extraction result.
The key information may at least include candidate entities in the original text data and insertion positions of the candidate entities in the target text, where the candidate entities may refer to entities that the pronouns refer to when the target text is subjected to the reference resolution, or entities that are omitted from the target text. Further, the key information may further include candidate pronouns in the original text data, the candidate pronouns refer to words in the target text that refer to other entities, and the candidate pronouns and the candidate entities may or may not have a reference relationship.
For example, the original text data includes "Will you play basketball? Of course. Change the day and play together!", where the candidate entity may include "basketball" and the candidate pronoun may include "change the day", and the two have no reference relationship. If the raw text data includes "Do you know Tom? I do not know him", the candidate entity may include "Tom" and the candidate pronoun may include "him", and the two have a reference relationship.
The extraction model can be a classification model with the capability of carrying out the reference resolution task, or a sequence labeling model with the capability of carrying out the reference resolution task, and the like. The extraction model can perform word segmentation on the original text data, identify the part of speech of each word in the original text data, and determine the reference relationship among the words.
The extraction model can identify the antecedents and anaphors in the original text data and the insertion positions of the antecedents in the target text, where an antecedent refers to the word at the starting position of a reference relationship, i.e., the word that an anaphor points to, and an antecedent and its anaphor form a reference chain. Illustratively, the raw text data includes "Do you know Xiaoqiang? I do not know him", where "Xiaoqiang" and "him" have a reference relationship; "him" is the anaphor and "Xiaoqiang" is the antecedent. The candidate entities may include the antecedents in the original text data, and the candidate pronouns may include the anaphors in the original text data.
Alternatively, in the original text data, one anaphor may correspond to one or more antecedents, one antecedent may also correspond to one or more anaphors, and if the target text is a zero-anaphora text (meaning that the target text contains no pronouns), an antecedent may have no corresponding anaphor. By identifying the antecedents and anaphors that have a reference relationship in the original text data and the insertion positions of the antecedents in the target text, the extraction model can extract the key information from the original text data and establish the reference relationship between the candidate entities and the candidate pronouns.
The electronic equipment can input the acquired original text data into an extraction model, the extraction model can extract text features in the original text data, and carry out reference resolution on the target text according to the text features, and mark out candidate entities, candidate pronouns and insertion positions of the candidate entities in the target text, which are contained in the original text data, so as to obtain an extraction result. The insertion position of the candidate entity in the target text can refer to the position of the candidate entity added into the target text when the target text is subjected to the reference resolution. If the candidate entity and the candidate pronouns have the reference relationship, the insertion position of the candidate entity in the target text can be the position of the candidate pronouns, and if the target text does not contain the candidate pronouns or the candidate pronouns and the candidate entity do not have the reference relationship, the insertion position of the candidate entity in the target text is the position of the omitted pronouns in the target text.
The extracted model may be obtained by training according to a first training sample, where the first training sample may include an original text sample and a labeled sample corresponding to the original text sample, and the labeled sample may label key information in the original text sample.
For example, the original text sample is "I like Zhou Jielun the best. Me too. I like his Qilixiang." The corresponding labeled sample may include candidate entity: "Zhou Jielun"; candidate pronoun: "his"; insertion position: at "his". Alternatively, the insertion position of the candidate entity may be directly represented by the candidate pronoun at the insertion point, or by the character position of the insertion, for example by the dialog round number (or sentence number) and the character index; for instance, the insertion position above is the 7th character in the 3rd round, which can be directly represented in the format (3, 7).
The extraction model is trained using the first training sample: the first training sample is input into the extraction model to be trained, a predicted extraction result corresponding to the original text sample is obtained through the extraction model, the predicted extraction result is compared with the labeled sample corresponding to the original text sample, and the parameters of the extraction model are adjusted according to the comparison result until the distance between the predicted extraction result obtained by the extraction model and the labeled sample is smaller than a preset first distance threshold, at which point training of the extraction model is complete.
Because the labeled sample of the first training sample includes the labeled candidate entities, candidate pronouns and insertion positions of the candidate entities, the extraction model can learn from the first training sample the ability to identify candidate entities and candidate pronouns and to perform reference resolution, so that the extraction result the extraction model predicts for an original text sample fits the corresponding labeled sample and has reference resolution capability. The extraction model labels the original text data to obtain the extraction result, which can serve as a preliminary reference resolution result of the original text data; based on the extraction result, the electronic device can easily determine how to perform reference resolution on the target text.
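One simple way to realize the "distance" between a predicted extraction result and the labeled sample, when both are tag sequences of equal length, is the fraction of mismatched tags. The threshold value and function names below are assumptions for illustration, not the patent's definition.

```python
def tag_sequence_distance(predicted_tags, labeled_tags):
    """Fraction of positions whose predicted tag differs from the labeled tag."""
    assert len(predicted_tags) == len(labeled_tags)
    mismatches = sum(p != g for p, g in zip(predicted_tags, labeled_tags))
    return mismatches / max(len(labeled_tags), 1)

FIRST_DISTANCE_THRESHOLD = 0.05  # assumed value for illustration

def training_converged(predicted_tags, labeled_tags):
    # Training stops once the distance drops below the preset first threshold.
    return tag_sequence_distance(predicted_tags, labeled_tags) < FIRST_DISTANCE_THRESHOLD
```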
In some embodiments, different types of key information may be labeled with different labels, wherein a candidate entity may be labeled with a first label, a candidate pronoun may be labeled with a second label, and an insertion location may be labeled with a third label, for example, a candidate entity is labeled with label es, a candidate pronoun is labeled with label ps, and an insertion location is labeled with label in. Further, the labeled sample may include a label sequence corresponding to the original text sample, the label sequence may include labels corresponding to respective characters in the original text sample, and a label of the insertion position, and characters other than the key information may be labeled with a uniform label, for example, labels such as "+", "-", etc., but are not limited thereto.
FIG. 3A is a diagram illustrating an original text sample and an annotated sample, in one embodiment. As shown in fig. 3A, the original text sample is "Do you like Kobe? Not really, he plays ball just average!", and the corresponding labeled sample marks the remaining characters with a uniform label such as "*". "Kobe" is a candidate entity, labeled "AA", and "he" is a candidate pronoun, labeled "B"; since "Kobe" needs to be inserted at the position of "he" during resolution, the insertion label "in" may be added before/after "he".
The extraction model may generate a sequence of tags corresponding to the original text data and input the sequence of tags as an extraction result into the generative model.
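A sketch of how such a tag sequence might be built from character-level span annotations, using the es/ee/ps/pe/in tags introduced above. The helper function and its span format are assumptions; spans are assumed to cover at least two characters so that start and end tags do not collide.

```python
def build_tag_sequence(text, entity_spans, pronoun_spans, insert_positions,
                       other_tag="."):
    """entity_spans / pronoun_spans: list of (start, end) character indices;
    insert_positions: character indices before which an entity is inserted."""
    tags = [other_tag] * len(text)
    for start, end in entity_spans:
        tags[start] = "es"            # candidate entity start
        tags[end - 1] = "ee"          # candidate entity end
    for start, end in pronoun_spans:
        tags[start] = "ps"            # candidate pronoun start
        tags[end - 1] = "pe"          # candidate pronoun end
    for pos in insert_positions:
        # Insertion positions may share a position with another tag, in which
        # case they are combined into one tag (e.g. "pe/in").
        tags[pos] = "in" if tags[pos] == other_tag else tags[pos] + "/in"
    return tags
```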
And step 230, analyzing the original text data and the extraction result through the generated model to obtain a reference resolution text corresponding to the target text.
The electronic equipment can input the extraction result obtained by extracting the model and the original text data into a generation model, and further analyze the original text data and the extraction result through the generation model to obtain a final reference resolution text. The extraction result is a reference resolution result preliminarily obtained by the extraction model, and the generated model can be combined with the original text data to further analyze the extraction result to obtain a more accurate reference resolution text. Further, the generated model can be combined with context data in the original text data to rewrite the extracted result so as to obtain the reference resolution text which is more in line with the context and the semantics.
The generation model is obtained by training according to a second training sample, and the second training sample may include the original text sample, an extraction sample result obtained by processing the original text sample with the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
After the extraction model has been trained with the first training sample, the original text sample can be input into the trained extraction model to obtain an extraction sample result corresponding to the original text sample; the extraction sample result, the original text sample and the reference resolution sample result are then input into the generation model to be trained, and a predicted reference resolution result corresponding to the original text sample is obtained through the generation model. The predicted reference resolution result can be compared with the reference resolution sample result corresponding to the original text sample, the reference resolution sample result being the actual reference resolution result of the original text sample, and the parameters of the generation model are adjusted according to the comparison result until the distance between the predicted reference resolution result and the reference resolution sample result is smaller than a preset second distance threshold, at which point training of the generation model is complete. This makes the reference resolution text produced by the generation model fit the actual reference resolution result and improves the accuracy of the generation model.
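A minimal training loop for the generation model under the second training sample might look as follows. A PyTorch-style interface is assumed; in particular, `gen_model.loss(...)` is a hypothetical method standing in for whatever loss the generation model computes between its prediction and the reference resolution sample result.

```python
def train_generation_model(gen_model, optimizer, second_training_samples,
                           extraction_model, epochs=3):
    """second_training_samples: list of (original_text_sample, resolution_sample_result)."""
    gen_model.train()
    for _ in range(epochs):
        for original_sample, resolution_sample in second_training_samples:
            # The extraction sample result comes from the already-trained
            # extraction model, as described above.
            extracted_sample = extraction_model.extract(original_sample)

            optimizer.zero_grad()
            # Assumed interface: the model returns a scalar loss comparing its
            # predicted reference-resolution text with the sample result.
            loss = gen_model.loss(original_sample, extracted_sample, resolution_sample)
            loss.backward()
            optimizer.step()
```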
FIG. 3B is a diagram that illustrates generating a reference resolution text, in one embodiment. As shown in fig. 3B, the electronic device may input the original text data into the extraction model 310, obtain an extraction result through the extraction model 310, input the original text data and the extraction result output by the extraction model 310 into the generation model 320, and obtain the reference resolution text through the generation model 320.
In the embodiment of the application, preliminary information extraction and reference annotation can be performed on the original text data through the extraction model, and the extraction result is processed through the generation model to obtain the final reference resolution text. Moreover, because the generation model obtains the reference resolution text based on both the extraction result of the extraction model and the original text data, the obtained reference resolution text is more accurate, and the quality of the reference resolution task can be improved.
In some embodiments, the step of obtaining raw text data may include: obtaining the dialog text of the latest dialog round and the N rounds of dialog text before the latest round, where the dialog text of the latest round is the target text on which the reference resolution task needs to be performed, and generating an original text sequence from the N rounds of dialog text and the target text. The characters in the original text sequence are ordered according to the time order of the obtained dialog text, and generating the text sequence facilitates subsequent processing of the characters.
Optionally, after acquiring the dialog text of the latest dialog round and the N rounds of dialog text before it, the electronic device may order the dialog texts from front to back according to their round numbers and add interval tags between different rounds of dialog text to generate the original text sequence. The interval tag is used to separate different rounds of dialog text, for example by separating them with the tag [sep]; meanwhile, a text start tag can be added before the first acquired dialog round and a text end tag can be added after the last dialog round.
For example, the dialog text acquired by the electronic device is:
A: Will you play basketball?
B: Of course
A: Change the day and play together
The original text sequence generated may be: [begin] Will you play basketball? [sep] Of course [sep] Change the day and play together [end]
Alternatively, adding an interval tag between different rounds of dialog text may mean adding a round start tag before the first character of each round of dialog text and a round end tag after its last character. For example, the round start tag may be PAD and the round end tag may be END, and the generated original text sequence may be: PAD Will you play basketball? END PAD Of course END PAD Change the day and play together END.
By adding the interval labels among different dialog texts, the subsequent extraction model and the subsequent generation model can directly acquire each dialog, so that the extraction model and the generation model can be used for analyzing characters in the original text sequence more quickly and accurately, and the processing efficiency is improved.
Further, after the electronic device sorts the dialog text, it may identify non-Chinese characters in the sorted dialog text, such as digits, punctuation marks and other non-Chinese characters, and delete them or replace them with a unified placeholder character, such as the "#" sign, to obtain the original text sequence. The electronic device can then input the original text sequence into the extraction model and the generation model to obtain the reference resolution text. By normalizing the dialog text in this way and removing unimportant characters, the processing load of the subsequent extraction model and generation model can be reduced and the processing speed improved.
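The sequence construction and normalization steps above can be sketched as follows. The [sep], [begin]/[end] and PAD/END markers follow the examples in the text, while the regular expression and the "#" replacement character are assumptions.

```python
import re

def build_original_text_sequence(dialog_turns, use_round_tags=False):
    """dialog_turns: dialog texts ordered from earliest to latest round (Chinese text assumed)."""
    normalized = []
    for turn in dialog_turns:
        # Replace non-Chinese characters (digits, punctuation, etc.) with a
        # unified placeholder, as described above.
        normalized.append(re.sub(r"[^\u4e00-\u9fff]", "#", turn))

    if use_round_tags:
        # Per-round start/end tags: PAD ... END around every round.
        return "".join(f"PAD{t}END" for t in normalized)
    # Interval tag between rounds, plus overall begin/end tags.
    return "[begin]" + "[sep]".join(normalized) + "[end]"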
In some embodiments, the step of analyzing the original text data and the extraction result through the generation model to obtain the reference resolution text corresponding to the target text includes: obtaining text features of the original text data through the generation model, and processing the target text according to the text features and the extraction result to obtain a reference resolution text that meets a semantic smoothness requirement. Semantic smoothness describes characteristics of the text such as language logic, word-use accuracy and fluency; the higher the semantic smoothness, the more the reference resolution text conforms to the language logic used by humans.
When the target text is subjected to reference resolution, one candidate pronoun may refer to several candidate entities at the same time, and embedding the several candidate entities into the target text may produce text whose logic does not flow, so that the semantic smoothness of the reference resolution text is low. For example, the original text data is "Do you think apples or pears are more delicious? Both are delicious", where the insertion positions of the candidate entities "apples" and "pears" are both before "Both"; if the candidate entities are embedded directly, the result "Apples pears both are delicious" is not smooth.
The generation model can analyze the original text data to obtain its text features and rewrite the extraction result in combination with those text features, rather than simply embedding the candidate entities at the labeled insertion positions according to the extraction result, so that the output reference resolution text meets the semantic smoothness requirement. For example, the original text data is "Do you think apples or pears are more delicious? Both are delicious", the candidate entities labeled by the extraction result include "apples" and "pears", and the insertion positions are both before "Both"; the generation model can rewrite the extraction result to obtain the reference resolution text "Both apples and pears are delicious", which meets the semantic smoothness requirement. The output reference resolution text is therefore more accurate and conforms to language logic.
The extraction result marks the insertion position of the candidate entity in the target text, the generation model can embed the candidate entity into the corresponding insertion position in the target text, and if one insertion position only corresponds to one candidate entity and the insertion position has no corresponding candidate pronouns (no candidate pronouns beside the insertion position), the candidate entity is directly embedded into the insertion position. If an insertion position only corresponds to one candidate entity and the insertion position has a corresponding candidate pronoun, the candidate pronoun can be replaced by the candidate entity.
In some embodiments, after the generation model embeds the candidate entity into the corresponding insertion position in the target text, the semantic compliance of the target text in which the candidate entity is embedded may be calculated, and the semantic compliance may be used to reflect the language logicality, compliance, rationality, and the like of the target text in which the candidate entity is embedded. Alternatively, the semantic compliance may be expressed by a score value or the like, and the higher the value, the greater the compliance is. The generated model is trained on the actual reference resolution sample result of the original text sample, and the actual reference resolution sample result is a smooth text, so that the generated model can learn the characteristics of the smooth and coherent text, and the smoothness of the target text embedded with the candidate entity can be evaluated.
If the semantic smoothness of the target text embedded with the candidate entity is lower than the semantic smoothness threshold, the generation model can add connective characters between the embedded candidate entity and the adjacent characters, so that the semantic smoothness of the target text embedded with the candidate entity is not lower than the semantic smoothness threshold, and the target text embedded with the candidate entity can then be output as the reference resolution text. The connective characters may be obtained from a dictionary and may include, but are not limited to, conjunctions, prepositions, copulas and the like, for example adding connective characters such as "and", "with" or "is" between adjacent candidate entities.
For example, the original text data is "do you know a novel XX lam? It was not heard. Y novel, inserting the candidate entity 'XX' into the front of the candidate entity 'Y' by the generation model to obtain the novel of the text 'XXY', calculating to obtain that the semantic currency of the text is lower than a semantic currency threshold value, and further adding coherent characters between the 'XX' and the 'Y' to obtain the novel referring resolution text 'XX is the novel of the Y' meeting the requirement of the semantic currency.
In some embodiments, if an insertion position corresponds to at least two candidate entities, the generative model may embed the at least two candidate entities in the same insertion position of the target text according to the text features and the extraction result, and calculate a first semantic compliance between the at least two candidate entities. The generative model may determine whether the first semantic compliance is below a semantic compliance threshold, and if so, may add consecutive characters between the embedded adjacent candidate entities, such that the first semantic compliance between the at least two candidate entities is not lower than the semantic compliance threshold.
After the consecutive characters are added, the generation model can recalculate the first semantic compliance between the two candidate entities with the consecutive characters added, and continuously judge whether the first semantic compliance is lower than a semantic compliance threshold value, if the first semantic compliance is lower than the semantic compliance threshold value, the consecutive characters can be obtained from the dictionary again and added between the adjacent embedded candidate entities until the first semantic compliance between at least two candidate entities is not lower than the semantic compliance threshold value.
In some embodiments, if an insertion position corresponds to at least two candidate entities, the generative model may embed the at least two candidate entities in the same insertion position of the target text according to the text features and the extraction result, and calculate a second semantic compliance between the at least two candidate entities and other characters in the target text. Wherein the other characters in the target text can be adjacent characters or adjacent words and the like of the at least two embedded candidate entities. And if the second semantic compliance is lower than the semantic compliance threshold, adjusting the embedding sequence of the at least two candidate entities at the same inserting position so that the second semantic compliance between the at least two candidate entities and other characters in the target text is not lower than the semantic compliance threshold.
For example, the original text data is "how can you speak? The method cannot be used. What is what? "wherein," voice "is the first candidate entity," can't send out "is the second candidate entity, directly imbed the candidate entity behind" what "for what", the text that gets is "can't send out for what voice", the semantic smoothness is low, unsatisfied with the requirement, then can adjust the order between "voice" and "can't send out", get the text is cleared up for what that satisfies the semantic smoothness requirement is sent out for what.
In the embodiment of the application, the generation model can calculate the semantic compliance of the target text embedded with the candidate entity, and the target text is rewritten according to the semantic compliance to obtain the reference resolution text meeting the requirement of the semantic compliance, so that the accuracy of the generated reference resolution text can be improved, the reference resolution text is more in line with human expression logic, and the intelligence of a machine is improved.
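A simplified sketch of the fluency-driven rewriting logic described in this section: embed the candidate entities at the insertion position, check smoothness, and repair by adding connectives or reordering. The smoothness scorer is only a placeholder (in the patent this ability is learned implicitly by the generation model), and the connective list and threshold are assumptions.

```python
CONNECTIVES = [" and ", " with ", " is "]   # assumed connective characters
SMOOTHNESS_THRESHOLD = 0.5                  # assumed semantic smoothness threshold

def rewrite_with_smoothness(target_text, insert_pos, entities, smoothness_fn):
    """Embed candidate entities at one insertion position and repair smoothness.

    smoothness_fn(text) -> float is a placeholder for the semantic smoothness
    score the generation model produces internally."""
    candidate = target_text[:insert_pos] + "".join(entities) + target_text[insert_pos:]
    if smoothness_fn(candidate) >= SMOOTHNESS_THRESHOLD:
        return candidate

    # Try adding connective characters between adjacent embedded entities.
    for conn in CONNECTIVES:
        joined = conn.join(entities)
        candidate = target_text[:insert_pos] + joined + target_text[insert_pos:]
        if smoothness_fn(candidate) >= SMOOTHNESS_THRESHOLD:
            return candidate

    # Fall back to adjusting the embedding order of the entities.
    reordered = "".join(reversed(entities))
    return target_text[:insert_pos] + reordered + target_text[insert_pos:]
```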
As shown in fig. 4, in an embodiment, the step of obtaining the text features of the original text data through the generation model and processing the target text according to the text features and the extraction result to obtain the reference resolution text meeting the semantic smoothness requirement may include the following steps:
step 402, obtaining a first feature vector corresponding to each character in the original text data through the generation model, and obtaining a second feature vector corresponding to each character according to a label corresponding to each character in the extraction result.
In some embodiments, the generative model may be a model constructed based on a Long Short-Term Memory network (LSTM), which is a recurrent neural network, or a model constructed based on the self-attention mechanism, such as a Transformer model. The self-attention mechanism simulates human visual attention: when perceiving a scene, human vision generally does not observe everything from beginning to end all at once, but focuses on a specific part as needed; and when people find that what they want to observe often appears in a certain part of a scene, they learn to pay attention to that part when similar scenes appear again in the future.
The generating model can comprise an encoder and a decoder, the encoder has a character representation function, the electronic equipment can input an original text sequence and a label sequence output by the extraction model into the encoder, the encoder converts each character in the original text sequence into a corresponding first feature vector, and converts a label corresponding to each character in the label sequence into a second feature vector.
Optionally, the first feature vector and the second feature vector may be embedded vectors (Embedding), which refers to converting each character into a vector representation with a fixed length, so as to facilitate digital processing. The encoder may convert each character in the original text sequence to a first feature vector in numerical terms and each tag in the sequence of tags to a second feature vector in numerical terms. The length of the embedded vector may be an artificially set length, such as 300, 200, 256, etc., but is not limited thereto.
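A minimal PyTorch-style sketch of the character and tag embeddings described above; the vocabulary sizes and the embedding length of 256 are assumed values for illustration.

```python
import torch
import torch.nn as nn

# Assumed vocabulary sizes; in practice they come from the training corpus and tag set.
CHAR_VOCAB_SIZE, TAG_VOCAB_SIZE, EMB_DIM = 6000, 8, 256

char_embedding = nn.Embedding(CHAR_VOCAB_SIZE, EMB_DIM)  # first feature vectors
tag_embedding = nn.Embedding(TAG_VOCAB_SIZE, EMB_DIM)    # second feature vectors

char_ids = torch.tensor([[12, 57, 103]])   # ids of characters in the sequence
tag_ids = torch.tensor([[0, 0, 3]])        # ids of their tags (e.g. ".", ".", "es")

first_vectors = char_embedding(char_ids)   # shape: (1, 3, 256)
second_vectors = tag_embedding(tag_ids)    # shape: (1, 3, 256)
```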
And 404, fusing the first feature vector and the second feature vector of each character through the generation model to obtain a target feature vector of each character, and rewriting the target text according to the target feature vector of each character to obtain the reference resolution text meeting the requirement of semantic smoothness.
In one embodiment, the decimation model may generate a sequence of tags corresponding to the original text sequence based on a predefined set of tags. The tag set may include at least a candidate entity start tag, a candidate entity end tag, a candidate pronoun start tag, a candidate pronoun end tag, and an insertion position tag, but is not limited thereto.
For example, the original text sequence is:
will you play basketball with PAD? END PAD will be played together with END PAD after another day;
the tag sequence generated may be:
。。。。es ee。。。。。。。。。。ps pe。。。in。;
wherein es is the candidate entity start tag, ee is the candidate entity end tag, ps is the candidate pronoun start tag, pe is the candidate pronoun end tag, in is the insertion position tag, and the other characters in the original text sequence are uniformly labeled with the "。" tag.
In some embodiments, for each character in the original text sequence, taking a first character as an example, the first character may be any character in the original text sequence (one tag in the original text sequence may be used as one character), the encoder may obtain a first feature vector corresponding to the first character and a second feature vector corresponding to the tag corresponding to the first character, and concatenate the first feature vector and the second feature vector to obtain a target feature vector of the first character. The encoder can obtain the coding information corresponding to the input original text sequence according to the target characteristic vector of each character, the coding information is input into the decoder, each output character is predicted in sequence by the decoder, and an output text sequence is obtained, wherein the output text sequence is the reference resolution text.
Because the tag sequence also comprises an insertion position tag, if the insertion position tag is adjacent to the candidate pronoun starting tag or the candidate pronoun ending tag, the insertion position tag and the adjacent candidate pronoun starting tag or candidate pronoun ending tag can be used as a common tag, and a corresponding second feature vector is obtained through calculation.
If the insertion position tag is adjacent to a uniform tag, the insertion position tag and the adjacent uniform tag can be treated as one common tag and the corresponding second feature vector calculated. For example, the tag sequence may be: 。。。。es ee。。。。。。。。。。ps pe。。。in。; the insertion position tag "in" and the preceding uniform tag "。" can be treated as a common tag and a corresponding second feature vector calculated, in which case the second feature vector corresponds to the character "play" under that preceding uniform tag; alternatively, the insertion position tag and the following uniform tag can be treated as a common tag and a corresponding second feature vector calculated, which is not limited here.
As another embodiment, the extraction result already includes a reference relationship, and the insertion position corresponding to the insertion position tag in the original text sequence can be determined according to the insertion position tag in the tag sequence, and the characters of the candidate entity to be inserted are embedded into the insertion position to obtain a new text sequence, where the new text sequence is in a one-to-one correspondence relationship with the tag sequence, and then the target feature vector of each character can be obtained according to the first feature vector corresponding to each character of the new text sequence and the second feature vector corresponding to the tag.
For example, the original text sequence is: PAD Who do you think is more handsome, Tom or Roger? END PAD I think they are both handsome END; the corresponding tag sequence is: . . . . es ee. es ee. . . . . . . . pe/ps. . in pe ps. . . . .; the new text sequence generated may be: PAD Who do you think is more handsome, Tom or Roger? END PAD I think Tom and Roger they are both handsome END, where the tags corresponding to the embedded characters "Tom and Roger" may be the insertion position tag "in". The target feature vector of each character in the new text sequence can thus be obtained from the first feature vector of each character in the new text sequence and the second feature vector of its corresponding tag.
As another embodiment, since the insertion position in the tag sequence lies before or after a certain character, the insertion position can be understood as having no corresponding character in the original text sequence. After the generation model obtains the tag sequence, it can, according to the insertion position tag in the tag sequence, embed a substitute character, such as "&", "^" or another symbol, at the corresponding insertion position in the target text, and then perform feature vector conversion, so that a target feature vector is obtained for every character in the original text sequence. This facilitates further processing by the encoder to obtain the encoded information, which can be understood as the text features corresponding to the original text sequence; the encoded information can be hidden state features or self-attention features.
After the encoder passes the encoded information to the decoder, the decoder can obtain the current output sequence from the encoded information and the previous output sequence, and output each character of the reference resolution text in turn in time order until the stop symbol of the output sequence, thereby obtaining the complete reference resolution text. For example, the original text sequence is: PAD Who do you think is more handsome, Tom or Roger? END PAD I think they are both handsome END. The decoder first obtains an output sequence from the encoded information input by the encoder: <sos>, where <sos> is the sequence start symbol; it then obtains the next output sequence from the output sequence <sos> and the encoded information: <sos> I; then, from the output sequence <sos> I and the encoded information: <sos> I think, ... and so on, until the output sequence is <sos> I think Tom and Roger are both handsome <eos>, at which point output stops; <eos> is the sequence stop symbol, and the obtained reference resolution text is: I think Tom and Roger are both handsome.
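The fusion, encoding and step-by-step decoding described in the preceding paragraphs could be realized, for example, with an LSTM encoder-decoder over the concatenated character and tag embeddings. This is one possible sketch under assumed dimensions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class RewriteModel(nn.Module):
    """Sketch of a generation model: fuse char/tag embeddings, encode, decode."""
    def __init__(self, char_vocab, tag_vocab, emb_dim=256, hidden=512):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, emb_dim)
        self.tag_emb = nn.Embedding(tag_vocab, emb_dim)
        self.encoder = nn.LSTM(2 * emb_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, char_vocab)

    def encode(self, char_ids, tag_ids):
        # Target feature vector = concatenation of first and second feature vectors.
        fused = torch.cat([self.char_emb(char_ids), self.tag_emb(tag_ids)], dim=-1)
        _, state = self.encoder(fused)
        return state

    def greedy_decode(self, state, sos_id, eos_id, max_len=64):
        # Sequentially predict each output character until the stop symbol.
        token = torch.tensor([[sos_id]])
        output = []
        for _ in range(max_len):
            emb = self.char_emb(token)
            dec_out, state = self.decoder(emb, state)
            token = self.out(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
            if token.item() == eos_id:
                break
            output.append(token.item())
        return output
```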
The generative model is trained based on the actual reference resolution sample result of the original text sample, and the actual reference resolution sample result is a smooth text, so that the generative model can output the reference resolution text meeting the requirement of semantic smoothness.
In the embodiment of the application, the generation model can rewrite the extraction result in combination with the context data in the original text data and the extraction result of the extraction model; the characters of the generated reference resolution text come from the context data and the target text of the original text data and are strongly guided by the extraction result, so the obtained reference resolution text is more accurate and the quality of the reference resolution task can be improved.
As shown in fig. 5, in one embodiment, another text processing method is provided, which can be applied to the electronic device described above, and the method can include the following steps:
step 502, inputting the first training sample into a pre-trained natural language processing model, and labeling the original text sample of the first training sample through the natural language processing model to obtain a prediction extraction result.
The natural language processing model is pre-trained on text data in a corpus, has strong character representation capability, and can accurately convert characters into corresponding numerically represented feature vectors. In the embodiment of the application, the electronic device may perform secondary training on the pre-trained natural language processing model using the first training sample, where the first training sample may include an original text sample and a labeled sample; the pre-trained natural language processing model extracts the key information of the original text sample and labels the original text sample to obtain a predicted extraction result.
Step 504, comparing the prediction extraction result with the labeled sample in the first training sample, and calculating the result loss.
Optionally, after the natural language processing model produces the predicted extraction result, a loss function may be used to calculate the loss of the predicted extraction result with respect to the labeled sample, where the loss reflects the error between the predicted extraction result and the labeled sample. Further, the predicted extraction result may be compared with the labeled sample corresponding to the original text sample, and the distance between them may be calculated with an algorithm such as the Euclidean distance or the Manhattan distance, which is not limited in the embodiment of the present application.
Step 506, adjusting parameters of the natural language processing model according to the result loss so as to train to obtain the extraction model.
If the distance between the predicted extraction result and the labeled sample is greater than a first distance threshold, the predicted extraction result does not meet expectations; the parameters of the natural language processing model can then be adjusted and the adjusted natural language processing model trained again with a new first training sample. Training continues until the distance between the predicted extraction result and the labeled sample is smaller than the first distance threshold, that is, until the result loss meets expectations, at which point the trained model is taken as the extraction model.
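One possible way to implement the result loss and parameter adjustment described in steps 504 and 506, sketched with PyTorch under the assumption that the model exposes per-character label logits; the Euclidean distance used as the loss follows the optional metric mentioned above, and all interfaces are assumptions made for this sketch.

import torch

def training_step(model, optimizer, text, gold_label_ids, distance_threshold=0.1):
    # gold_label_ids: LongTensor of tag ids for the labeled sample (assumed format)
    logits = model(text)                                   # assumed shape: (seq_len, num_labels)
    pred = logits.softmax(-1)
    gold = torch.nn.functional.one_hot(gold_label_ids, logits.size(-1)).float()
    loss = (pred - gold).norm(dim=-1).mean()               # Euclidean distance as the result loss
    if loss.item() >= distance_threshold:                  # prediction does not yet meet expectations
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()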
Because the extraction model is obtained by further training on the basis of the pre-trained natural language processing model, training can be completed with only a small amount of labeled data (namely, the first training samples) rather than a large number of training samples, and the extraction model retains strong text representation capability. This achieves the desired extraction effect while reducing training difficulty and improving training efficiency.
Step 508, inputting the original text sample into the trained extraction model to obtain an extracted sample result corresponding to the original text sample, and training the generation model according to the original text sample, the corresponding extracted sample result and the corresponding reference resolution sample result.
The electronic device can input the original text sample into the trained extraction model to obtain the extracted sample result corresponding to the original text sample, and train the generation model by taking the original text sample, the corresponding extracted sample result and the corresponding reference resolution sample result as a second training sample, so that the predicted reference resolution text output by the generation model approaches the actual reference resolution sample result.
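A sketch of how the second training samples could be assembled before training the generation model; the dictionary keys and function interfaces below are assumptions for illustration only.

def build_second_training_samples(extraction_model, original_text_samples, resolution_results):
    # resolution_results: the actual reference resolution sample results (fluent texts)
    second_samples = []
    for text, target in zip(original_text_samples, resolution_results):
        extracted = extraction_model(text)        # extracted sample result (tag sequence)
        second_samples.append({"source": text, "tags": extracted, "target": target})
    return second_samples

# The generation model would then be trained so that its predicted reference
# resolution text for ("source", "tags") approaches "target".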
Step 510, obtaining raw text data, where the raw text data at least includes a target text that needs to be subjected to a reference resolution task.
The description of step 510 may refer to the related descriptions in the above embodiments, and is not repeated herein.
Step 512, respectively generating third feature vectors corresponding to the characters contained in the original text data through the extraction model, and performing label prediction on the characters according to the third feature vectors and the label set to obtain a label sequence corresponding to the original text data.
The extraction model is obtained by further training on the basis of the pre-trained natural language processing model, so it has strong text representation capability. After the electronic device inputs the original text data into the extraction model, the extraction model can convert each character contained in the original text data into a corresponding third feature vector, which represents the character numerically. Based on the third feature vector of each character, the extraction model can identify whether the character belongs to the key information and, if so, to which category (such as candidate entity or candidate pronoun), and it can also identify the words in the original text data that have a reference relationship. Each character can then be labeled according to the labels predefined in the label set, and label prediction on the characters yields the label sequence corresponding to the original text data.
Optionally, the tag set at least includes a candidate entity start tag, a candidate entity end tag, a candidate pronoun start tag, a candidate pronoun end tag, an insertion position tag, and the like, and by tagging characters, key information and a reference relationship in original text data can be more accurately described, so that the processing efficiency and the generation effect of the generation model are improved.
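For illustration, the tag set and the per-character label prediction could look roughly as follows; the concrete tag names and the classifier interface are assumptions, the application only requires that the listed tag types exist.

TAG_SET = ["O",                        # character not part of any key information
           "ENT-START", "ENT-END",     # candidate entity start / end tags
           "PRO-START", "PRO-END",     # candidate pronoun start / end tags
           "INSERT"]                   # insertion position tag

def predict_label_sequence(characters, third_feature_vectors, classify):
    # classify(vector) -> index into TAG_SET (assumed per-character classifier)
    return [(c, TAG_SET[classify(v)]) for c, v in zip(characters, third_feature_vectors)]

# In the earlier dialogue example, the characters of "Tom" and "Roger" would carry
# entity start/end tags, "they" pronoun start/end tags, and its position an insertion tag.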
Step 514, acquiring text features of the original text data through the generation model, and processing the target text according to the text features and the label sequence to obtain the reference resolution text meeting the requirement of semantic smoothness.
The description of step 514 can refer to the related descriptions in the above embodiments, and is not repeated herein.
In the embodiment of the application, the extraction model can be obtained by training on the basis of the pre-trained natural language processing model, so the extraction model has strong text representation capability, the training process is simple, a large number of training samples is not needed, the training cost is reduced, and the training efficiency is improved.
As shown in fig. 6, in an embodiment, a text processing apparatus 600 is provided, which can be applied to the electronic device described above, and the text processing apparatus 600 can include an obtaining module 610, an extracting module 620, and a generating module 630.
The obtaining module 610 is configured to obtain raw text data, where the raw text data includes a target text that needs to be subjected to a reference resolution task.
And the extraction module 620 is configured to extract key information included in the original text data through an extraction model, and label each key information to obtain an extraction result, where the key information at least includes a candidate entity in the original text data and an insertion position of the candidate entity in the target text, and the extraction model is obtained by training according to a first training sample, where the first training sample includes an original text sample and a labeled sample corresponding to the original text sample.
The generating module 630 is configured to analyze the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, where the generation model is obtained by training according to a second training sample, and the second training sample includes the original text sample, an extracted sample result obtained by processing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
In the embodiment of the application, preliminary information extraction and reference labeling can be performed on the original text data through the extraction model, and the extraction result is then processed through the generation model to obtain the final reference resolution text. Because the generation model obtains the reference resolution text based on both the extraction result of the extraction model and the original text data, the obtained reference resolution text is more accurate, and the quality of the reference resolution task can be improved.
In one embodiment, the obtaining module 610 is further configured to obtain the dialog text of the latest round of dialog and N rounds of dialog text before the latest round of dialog, where N is a positive integer and the dialog text of the latest round of dialog is the target text that needs to be subjected to the reference resolution task, and to generate an original text sequence according to the N rounds of dialog text and the target text.
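A minimal sketch of assembling the original text sequence from the previous N rounds and the target text, following the PAD/END separator convention used in the decoding example above; the function name and argument layout are assumptions.

def build_original_text_sequence(previous_rounds, target_text, sep="PAD", end="END"):
    # previous_rounds: list of (utterance, reply) pairs from the N earlier dialog rounds
    parts = []
    for utterance, reply in previous_rounds:
        parts.extend([utterance, sep, reply, end])
    parts.extend([sep, target_text])      # the target text of the latest round comes last
    return " ".join(parts)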
In an embodiment, the generating module 630 is further configured to obtain a text feature of the original text data through the generating model, and process the target text according to the text feature and the extraction result to obtain a reference resolution text meeting the requirement of semantic smoothness.
In one embodiment, the generating module 630 includes a smoothness calculating unit and an adjusting unit.
The smoothness calculating unit is configured to calculate a second semantic smoothness between at least two candidate entities and the other characters in the target text if the at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result.
The adjusting unit is configured to adjust the embedding order of the at least two candidate entities at the same insertion position if the second semantic smoothness is lower than a semantic smoothness threshold, so that the second semantic smoothness between the at least two candidate entities and the other characters in the target text is not lower than the semantic smoothness threshold.
In one embodiment, the smoothness calculating unit is further configured to calculate a first semantic smoothness between at least two candidate entities if the at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result.
The adjusting unit is further configured to add connecting characters between embedded adjacent candidate entities if the first semantic smoothness is lower than the semantic smoothness threshold, so that the first semantic smoothness between the at least two candidate entities is not lower than the semantic smoothness threshold.
In the embodiment of the application, the generation model can calculate the semantic smoothness of the target text after the candidate entities are embedded, and rewrite the target text according to the semantic smoothness to obtain a reference resolution text meeting the requirement of semantic smoothness. This improves the accuracy of the generated reference resolution text, makes it conform better to human expression logic, and improves the intelligence of the machine.
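As a rough illustration only of the two adjustments above, assuming a fluency() scorer (for example a language-model score normalized to [0, 1]) is available; the scorer, the threshold value and the English connective are assumptions and not part of the disclosure.

from itertools import permutations

def embed_entities(target_text, position, entities, fluency, threshold=0.5, connective=" and "):
    # pick the embedding order with the best second semantic smoothness
    best_order, best_score = list(entities), -1.0
    for order in permutations(entities):
        candidate = target_text[:position] + "".join(order) + target_text[position:]
        score = fluency(candidate)
        if score > best_score:
            best_order, best_score = list(order), score
    joined = "".join(best_order)
    if fluency(joined) < threshold:           # first semantic smoothness between entities too low
        joined = connective.join(best_order)  # add connecting characters between adjacent entities
    return target_text[:position] + joined + target_text[position:]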
In one embodiment, the generating module 630, in addition to the smoothness calculating unit and the adjusting unit, further includes a feature obtaining unit and a rewriting unit.
The feature acquisition unit is configured to obtain a first feature vector corresponding to each character in the original text data through the generation model, and to obtain a second feature vector corresponding to each character according to the label corresponding to the character in the extraction result.
The rewriting unit is configured to fuse the first feature vector and the second feature vector of each character through the generation model to obtain a target feature vector of each character, and to rewrite the target text according to the target feature vector of each character to obtain a reference resolution text meeting the requirement of semantic smoothness.
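A sketch of one possible fusion of the two feature vectors per character, written with PyTorch; the embedding sizes and the additive fusion are assumptions chosen only to make the idea concrete.

import torch.nn as nn

class CharTagFusion(nn.Module):
    def __init__(self, num_chars, num_tags, dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, dim)    # first feature vector (character)
        self.tag_emb = nn.Embedding(num_tags, dim)      # second feature vector (label)

    def forward(self, char_ids, tag_ids):
        # fuse the two vectors of every character into its target feature vector
        return self.char_emb(char_ids) + self.tag_emb(tag_ids)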
In the embodiment of the application, the generation model can rewrite the extraction result in combination with the context data in the original text data. Because the generated reference resolution text is derived from the context data and the target text in the original text data and is strongly guided by the extraction result, the obtained reference resolution text is more accurate, and the quality of the reference resolution task can be improved.
In one embodiment, the text processing apparatus 600 includes a first training module and a second training module in addition to the obtaining module 610, the extracting module 620 and the generating module 630.
The first training module is configured to train the pre-trained natural language processing model according to the first training sample to obtain the extraction model.
The first training module comprises a prediction unit, a comparison unit and an adjustment unit.
The prediction unit is configured to input the first training sample into a pre-trained natural language processing model and label the original text sample of the first training sample through the natural language processing model to obtain a predicted extraction result, where the natural language processing model is obtained by pre-training on text data in a corpus.
The comparison unit is configured to compare the predicted extraction result with the labeled sample in the first training sample and calculate the result loss.
And the adjusting unit is used for adjusting the parameters of the natural language processing model according to the result loss so as to train to obtain the extraction model.
The second training module is configured to input the original text sample into the trained extraction model to obtain an extracted sample result corresponding to the original text sample, and to train the generation model according to the original text sample, the corresponding extracted sample result and the corresponding reference resolution sample result.
In an embodiment, the extracting module 620 is further configured to generate third feature vectors corresponding to each character included in the original text data through the extracting model, and perform label prediction on each character according to the third feature vectors and a label set to obtain a label sequence corresponding to the original text data, where the label set at least includes a candidate entity start label, a candidate entity end label, a candidate pronoun start label, a candidate pronoun end label, and an insertion position label.
In the embodiment of the application, the extraction model can be obtained by training on the basis of the pre-trained natural language processing model, so the extraction model has strong text representation capability, the training process is simple, a large number of training samples is not needed, the training cost is reduced, and the training efficiency is improved.
Fig. 7 is a block diagram of an electronic device in one embodiment. As shown in fig. 7, electronic device 700 may include one or more of the following components: a processor 710, a memory 720 coupled to the processor 710, wherein the memory 720 may store one or more computer programs that may be configured to be executed by the one or more processors 710 to implement the methods as described in the various embodiments above.
Processor 710 may include one or more processing cores. The processor 710 interfaces with various components throughout the electronic device 700 using various interfaces and circuitry, and performs various functions of the electronic device 700 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 720 and invoking data stored in the memory 720. Alternatively, the processor 710 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 710 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing display content; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 710, but may instead be implemented by a communication chip.
The Memory 720 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 720 may be used to store instructions, programs, code sets, or instruction sets. The memory 720 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The storage data area may also store data created during use by the electronic device 700, and the like.
It is understood that the electronic device 700 may include more or fewer structural elements than those shown in the above structural block diagrams, for example, a power module, a physical button, a WiFi (Wireless Fidelity) module, a speaker, a Bluetooth module, a sensor, etc., which is not limited herein.
The embodiment of the application discloses a computer readable storage medium, which stores a computer program, wherein the computer program realizes the method described in the above embodiment when being executed by a processor.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, and the computer program, when executed by a processor, implements the method as described in the embodiments above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a ROM, etc.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are all alternative embodiments and that the acts and modules involved are not necessarily required for this application.
In various embodiments of the present application, it should be understood that the size of the serial number of each process described above does not mean that the execution sequence is necessarily sequential, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present application essentially, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the above-described methods of the embodiments of the present application.
The text processing method, the text processing apparatus, the electronic device, and the computer-readable storage medium disclosed in the embodiments of the present application are described in detail above, and specific examples are applied herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only used to help understand the method and the core ideas of the present application. Meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (11)

1. A method of text processing, comprising:
acquiring original text data, wherein the original text data at least comprises a target text needing to be subjected to a reference resolution task;
extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeled sample corresponding to the original text sample;
analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, an extracted sample result obtained by processing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
2. The method according to claim 1, wherein the analyzing the original text data and the extraction result through the generation model to obtain the reference resolution text corresponding to the target text comprises:
and acquiring text features of the original text data through a generating model, and processing the target text according to the text features and the extraction result to obtain a reference resolution text meeting the requirement of semantic currency.
3. The method according to claim 2, wherein the acquiring of the text features of the original text data through the generation model and the processing of the target text according to the text features and the extraction result to obtain the reference resolution text meeting the requirement of semantic smoothness comprises:
obtaining a first feature vector corresponding to each character in the original text data through the generation model, and obtaining a second feature vector corresponding to each character according to the label corresponding to the character in the extraction result;
and fusing the first feature vector and the second feature vector of each character through the generation model to obtain a target feature vector of each character, and rewriting the target text according to the target feature vector of each character to obtain a reference resolution text meeting the requirement of semantic smoothness.
4. The method according to claim 2 or 3, wherein the processing the target text according to the text features and the extraction result to obtain a reference resolution text meeting the requirement of semantic smoothness comprises:
if at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result, calculating a first semantic smoothness between the at least two candidate entities;
and if the first semantic smoothness is lower than a semantic smoothness threshold, adding connecting characters between embedded adjacent candidate entities so that the first semantic smoothness between the at least two candidate entities is not lower than the semantic smoothness threshold.
5. The method according to claim 2 or 3, wherein the processing the target text according to the text features and the extraction result to obtain a reference resolution text meeting the requirement of semantic smoothness comprises:
if at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result, calculating a second semantic smoothness between the at least two candidate entities and the other characters in the target text;
and if the second semantic smoothness is lower than a semantic smoothness threshold, adjusting the embedding order of the at least two candidate entities at the same insertion position so that the second semantic smoothness between the at least two candidate entities and the other characters in the target text is not lower than the semantic smoothness threshold.
6. The method according to any one of claims 1 to 3, wherein the extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result comprises:
respectively generating third feature vectors corresponding to each character contained in the original text data through the extraction model, and performing label prediction on each character according to the third feature vectors and a label set to obtain a label sequence corresponding to the original text data, wherein the label set at least comprises a candidate entity start label, a candidate entity end label, a candidate pronoun start label, a candidate pronoun end label and an insertion position label.
7. The method of any of claims 1 to 3, wherein said obtaining raw text data comprises:
obtaining the dialog text of the latest round of dialog and N rounds of dialog text before the latest round of dialog, wherein N is a positive integer, and the dialog text of the latest round of dialog is the target text that needs to be subjected to the reference resolution task;
and generating an original text sequence according to the N rounds of dialog text and the target text.
8. The method of claim 1, wherein prior to said obtaining raw text data, the method further comprises:
inputting a first training sample into a pre-trained natural language processing model, and labeling an original text sample of the first training sample through the natural language processing model to obtain a predicted extraction result, wherein the natural language processing model is obtained by pre-training on text data in a corpus;
comparing the predicted extraction result with the labeled samples in the first training sample, and calculating the result loss;
and adjusting parameters of the natural language processing model according to the result loss so as to train to obtain an extraction model.
9. A text processing apparatus, comprising:
the acquisition module is used for acquiring original text data, and the original text data comprises a target text needing to be subjected to a reference resolution task;
the extraction module is used for extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeled sample corresponding to the original text sample;
and the generating module is used for analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, an extracted sample result obtained by processing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
10. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, causes the processor to carry out the method of any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202011446124.4A 2020-12-11 2020-12-11 Text processing method and device, electronic equipment and computer readable storage medium Pending CN112463942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011446124.4A CN112463942A (en) 2020-12-11 2020-12-11 Text processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011446124.4A CN112463942A (en) 2020-12-11 2020-12-11 Text processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112463942A true CN112463942A (en) 2021-03-09

Family

ID=74801422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011446124.4A Pending CN112463942A (en) 2020-12-11 2020-12-11 Text processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112463942A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766320A (en) * 2016-08-23 2018-03-06 中兴通讯股份有限公司 A kind of Chinese pronoun resolution method for establishing model and device
CN111984766A (en) * 2019-05-21 2020-11-24 华为技术有限公司 Missing semantic completion method and device
CN111401035A (en) * 2020-02-18 2020-07-10 平安科技(深圳)有限公司 Zero-reference resolution method, device, equipment and medium based on big data
CN111339780A (en) * 2020-05-14 2020-06-26 北京金山数字娱乐科技有限公司 Word processing method and device based on multitask model
CN111695054A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Text processing method and device, information extraction method and system, and medium
CN111967224A (en) * 2020-08-18 2020-11-20 深圳市欢太科技有限公司 Method and device for processing dialog text, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周炫余 et al.: "A Survey of Coreference Resolution in Discourse", Journal of Wuhan University (Science Edition), no. 01 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076078A (en) * 2021-03-11 2021-07-06 山东大学 Hybrid drive-based conversational information acquisition method
CN113076078B (en) * 2021-03-11 2022-03-22 山东大学 Hybrid drive-based conversational information acquisition method
CN112989008A (en) * 2021-04-21 2021-06-18 上海汽车集团股份有限公司 Multi-turn dialog rewriting method and device and electronic equipment
CN113326367A (en) * 2021-06-30 2021-08-31 四川启睿克科技有限公司 Task type dialogue method and system based on end-to-end text generation
CN116776886A (en) * 2023-08-15 2023-09-19 浙江同信企业征信服务有限公司 Information extraction method, device, equipment and storage medium
CN116776886B (en) * 2023-08-15 2023-12-05 浙江同信企业征信服务有限公司 Information extraction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112100349B (en) Multi-round dialogue method and device, electronic equipment and storage medium
US20210233521A1 (en) Method for speech recognition based on language adaptivity and related apparatus
CN108711420B (en) Multilingual hybrid model establishing method, multilingual hybrid model establishing device, multilingual hybrid model data obtaining device and electronic equipment
CN110534092B (en) Speech phoneme recognition method and device, storage medium and electronic device
CN110795552B (en) Training sample generation method and device, electronic equipment and storage medium
WO2022078146A1 (en) Speech recognition method and apparatus, device, and storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN113205817B (en) Speech semantic recognition method, system, device and medium
CN111967224A (en) Method and device for processing dialog text, electronic equipment and storage medium
CN109241330A (en) The method, apparatus, equipment and medium of key phrase in audio for identification
CN111341293B (en) Text voice front-end conversion method, device, equipment and storage medium
CN112101045B (en) Multi-mode semantic integrity recognition method and device and electronic equipment
CN109448704A (en) Construction method, device, server and the storage medium of tone decoding figure
CN113901191A (en) Question-answer model training method and device
CN112214591A (en) Conversation prediction method and device
US20230127787A1 (en) Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN114639386A (en) Text error correction and text error correction word bank construction method
CN109933773A (en) A kind of multiple semantic sentence analysis system and method
WO2021179703A1 (en) Sign language interpretation method and apparatus, computer device, and storage medium
CN115132182B (en) Data identification method, device, equipment and readable storage medium
CN111968646A (en) Voice recognition method and device
CN110852063B (en) Word vector generation method and device based on bidirectional LSTM neural network
CN114077650A (en) Training method and device of spoken language understanding model
CN116913278B (en) Voice processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination