CN112463942B - Text processing method, text processing device, electronic equipment and computer readable storage medium - Google Patents

Text processing method, text processing device, electronic equipment and computer readable storage medium

Info

Publication number
CN112463942B
Authority
CN
China
Prior art keywords
text
sample
original text
extraction
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011446124.4A
Other languages
Chinese (zh)
Other versions
CN112463942A (en)
Inventor
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd and Shenzhen Huantai Technology Co Ltd
Priority to CN202011446124.4A
Publication of CN112463942A
Application granted
Publication of CN112463942B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a text processing method, a text processing device, electronic equipment and a computer readable storage medium. The method comprises the following steps: acquiring original text data, wherein the original text data at least comprises a target text on which a reference resolution task is to be performed; extracting key information contained in the original text data through an extraction model, and labeling each piece of key information to obtain an extraction result; and analyzing the original text data and the extraction result through a generation model to obtain a reference-resolved text corresponding to the target text. The text processing method, the text processing device, the electronic equipment and the computer readable storage medium can simplify the processing flow of the reference resolution task and improve processing efficiency.

Description

Text processing method, text processing device, electronic equipment and computer readable storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a text processing method, a text processing device, electronic equipment and a computer readable storage medium.
Background
In the field of artificial intelligence technology, natural language processing (Natural Language Processing, NLP) has long been one of the directions of intensive research. In everyday language use, a pronoun or an abbreviation is often used in place of a word that has appeared earlier, or the earlier word is simply omitted; this is called the "reference phenomenon" in linguistics. Reference resolution is a fundamental task in the NLP field; it resolves unexplained references in text so that an electronic device can better understand the semantics the text expresses. Existing reference resolution schemes are complex, time-consuming, and inefficient.
Disclosure of Invention
The embodiment of the application discloses a text processing method, a text processing device, electronic equipment and a computer readable storage medium, which can simplify the processing flow of a reference resolution task and improve processing efficiency.
The embodiment of the application discloses a text processing method, which comprises the following steps:
Acquiring original text data, wherein the original text data at least comprises a target text on which a reference resolution task is to be performed;
Extracting key information contained in the original text data through an extraction model, and labeling each piece of key information to obtain an extraction result, wherein the key information at least comprises candidate entities in the original text data and insertion positions of the candidate entities in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeling sample corresponding to the original text sample;
analyzing the original text data and the extraction result through a generation model to obtain a reference-resolved text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, an extraction sample result obtained by passing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
The embodiment of the application discloses a text processing device, which comprises:
the acquisition module is used for acquiring original text data, wherein the original text data comprises a target text on which a reference resolution task needs to be performed;
the extraction module is used for extracting key information contained in the original text data through an extraction model and labeling each piece of key information to obtain an extraction result, wherein the key information at least comprises candidate entities in the original text data and insertion positions of the candidate entities in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeling sample corresponding to the original text sample;
The generation module is used for analyzing the original text data and the extraction result through a generation model to obtain a reference-resolved text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, an extraction sample result obtained by passing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the computer program when executed by the processor causes the processor to realize the method.
The embodiment of the application discloses a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the method as described above.
According to the text processing method, the text processing device, the electronic equipment and the computer readable storage medium disclosed by the embodiment of the application, key information contained in the original text data is extracted through the extraction model, and each piece of key information is labeled to obtain the extraction result; the key information at least comprises the candidate entities in the original text data and the insertion positions of the candidate entities in the target text; the extraction model is obtained by training according to the first training sample, which comprises the original text sample and the labeling sample corresponding to the original text sample. The original text data and the extraction result are then analyzed through the generation model to obtain the reference-resolved text corresponding to the target text; the generation model is obtained by training according to the second training sample, which comprises the original text sample, the extraction sample result obtained by passing the original text sample through the trained extraction model, and the reference resolution sample result corresponding to the original text sample. The extraction model thus performs preliminary information extraction and reference annotation on the original text data, and the generation model processes the extraction result to obtain the final reference-resolved text; the complex reference resolution task is decomposed into two relatively simple tasks, which simplifies the processing flow of the reference resolution task and improves processing efficiency. Moreover, since the generation model obtains the reference-resolved text based on both the extraction result of the extraction model and the original text data, the obtained reference-resolved text is more accurate, and the quality of the reference resolution task can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A is an application scenario diagram of a text processing method in one embodiment;
FIG. 1B is a schematic diagram of an electronic device performing a reference resolution task in one embodiment;
FIG. 2 is a flow diagram of a method of text processing in one embodiment;
FIG. 3A is a schematic diagram of an original text sample and a labeling sample in one embodiment;
FIG. 3B is a schematic diagram of generating a reference-resolved text in one embodiment;
FIG. 4 is a flow diagram of generating a reference-resolved text through the generation model in one embodiment;
FIG. 5 is a flow chart of a text processing method in another embodiment;
FIG. 6 is a block diagram of a text processing device in one embodiment;
Fig. 7 is a block diagram of an electronic device in one embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present application and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It will be understood that the terms first, second, etc. as used herein may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another element. For example, a first training sample may be referred to as a second training sample, and similarly, a second training sample may be referred to as a first training sample, without departing from the scope of the application. Both the first training sample and the second training sample are training samples, but they are not the same training samples.
Reference resolution refers to grouping the different mentions (Mention) that represent the same entity (Entity) into an equivalence set, i.e., establishing a reference chain between the mentions and the entity. Reference resolution plays an important role in machine reading comprehension, information extraction, multi-round dialogue, and other tasks, and can help electronic devices more easily understand the meaning expressed in text. In the related art, the reference resolution task is mainly performed in two ways:
Mode one: performing the reference resolution task with rules. Logic rules are built from a large amount of manually constructed domain and linguistic knowledge and are used to resolve the references in the text. This approach requires extensive manual participation, the degree of automation is very low, processing efficiency is low, and portability is poor.
Mode two: performing the reference resolution task with an end-to-end model. An end-to-end model, such as a convolutional neural network (Convolutional Neural Networks, CNN), first searches the text for pronouns, then searches the text for candidate entities, and links entities to pronouns through one-to-one matching and ranking to execute the reference resolution task. When the reference resolution task is carried out, the output of the previous stage in the end-to-end model is taken as the input of the next stage and the stages are executed serially, so an erroneous prediction at an earlier stage affects the next stage, the error propagates and affects the final result, and the result becomes uncontrollable. In addition, the complexity of the end-to-end model is high, the processing time is long, and the processing efficiency is low.
The embodiment of the application provides a text processing method, a text processing device, electronic equipment and a computer readable storage medium, which are applicable to scenarios such as machine reading comprehension and intelligent multi-round human-machine dialogue, can simplify the processing flow of the reference resolution task, improve processing efficiency, make the obtained reference-resolved text more accurate, and improve the quality of the reference resolution task.
FIG. 1A is an application scenario diagram of a text processing method in one embodiment. As shown in FIG. 1A, the text processing method may be applied to a multi-round dialogue scenario. The application scenario may include a user and an electronic device 10, and the user may converse through the electronic device 10. The conversation may be with a user of another electronic device or with an intelligent voice program on the electronic device 10. The electronic device 10 may obtain the dialogue text of one or more rounds of dialogue and take the dialogue text as the original text data. The electronic device 10 may extract the key information contained in the original text data through the extraction model and label each piece of key information to obtain the extraction result, using the extraction model to perform preliminary information extraction and reference annotation on the original text data, and may then analyze the original text data and the extraction result through the generation model to obtain the reference-resolved text corresponding to the target text.
FIG. 1B is a schematic diagram of an electronic device performing a reference resolution task in one embodiment. As shown in FIG. 1B, in the multi-round dialogue scenario, user A has had 3 rounds of dialogue with user B via the electronic device 10. Round 1: "Can you play basketball?" Round 2: "Of course." Round 3: "Let's play together another day!" The electronic device 10 may take the 3 rounds of dialogue text as the original text data, input the original text data into the extraction model to obtain the extraction result, and input the original text data and the extraction result together into the generation model; the generation model rewrites the 3rd round of dialogue text according to the extraction result and the original text data to obtain the reference-resolved text "Let's play basketball together another day".
As shown in fig. 2, in an embodiment, a text processing method is provided, which may be applied to the above electronic device, where the electronic device may include a terminal device such as a mobile phone, an intelligent wearable device, a tablet computer, a personal computer (Personal Computer, PC), a vehicle-mounted terminal, and may also include a service device such as a server, a server cluster, and the embodiment of the present application is not limited thereto. The method may comprise the steps of:
Step 210: obtain original text data, the original text data including at least a target text on which a reference resolution task is to be performed.
In some embodiments, the original text data may include dialogue text, which may refer to dialogue text between a user of the electronic device and a user of another electronic device, or to dialogue text between a user of the electronic device and an automatic-reply voice program, and the target text may refer to the last round of dialogue text. The original text data may also include a paragraph of text in an article, or a piece of text published on social media, etc.; the content of the original text data may be determined according to different application scenarios.
In some embodiments, the original text data may include, in addition to the target text required to perform the reference resolution task, context data of the target text, where the context data may be used to reflect a context of the target text, so that the electronic device may be assisted in better identifying semantics of the original text data, and entities referred to in the target text may be more accurately analyzed by the context data of the target text, thereby improving accuracy of the reference resolution task.
In one embodiment, in a multi-round dialogue scenario, the original text data may include the dialogue text of all rounds in the current dialogue scenario, which may refer to the dialogue window the user has currently opened on the electronic device, and the context data of the target text may include the dialogue text of all rounds preceding the last round of dialogue text. Alternatively, the context data of the target text may include the N rounds of dialogue text before the last round of dialogue, where N may be a predetermined positive integer; for example, the 3 rounds of dialogue text before the last round may be taken as the context data, or the 5 rounds, 6 rounds, etc. before the last round, but not limited thereto. Alternatively, the context data of the target text may include the dialogue text within a preset time period before the last round of dialogue, for example, the dialogue text within 3 minutes or within 10 minutes before the last round, but is not limited thereto.
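To make the context-selection choices above concrete, the following Python sketch assembles the original text data from dialogue rounds using either a fixed number of preceding rounds or a time window. It is only an illustration; the function and field names (build_original_text_data, max_rounds, time_window_s) are assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DialogueRound:
    index: int        # round number, 1-based
    text: str         # utterance text of this round
    timestamp: float  # acquisition time in seconds

def build_original_text_data(rounds: List[DialogueRound],
                             max_rounds: Optional[int] = None,
                             time_window_s: Optional[float] = None):
    """Return (target_text, context_texts) for the reference resolution task.

    The last round is the target text; the context is either the N rounds
    before it (max_rounds) or the rounds within a time window before it.
    """
    if not rounds:
        raise ValueError("at least one dialogue round is required")
    target = rounds[-1]
    previous = rounds[:-1]
    if max_rounds is not None:
        context = previous[-max_rounds:]          # last N rounds before the target
    elif time_window_s is not None:
        context = [r for r in previous
                   if target.timestamp - r.timestamp <= time_window_s]
    else:
        context = previous                        # all rounds in the current dialogue
    return target.text, [r.text for r in context]

if __name__ == "__main__":
    rounds = [
        DialogueRound(1, "Can you play basketball?", 0.0),
        DialogueRound(2, "Of course.", 5.0),
        DialogueRound(3, "Let's play together another day!", 12.0),
    ]
    target_text, context = build_original_text_data(rounds, max_rounds=2)
    print(target_text)  # -> "Let's play together another day!"
    print(context)      # -> the two preceding rounds
```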
As another specific embodiment, in a scenario where an article is read and understood, the original text data may include text of a paragraph in which the target text is located, and the context data of the target text may include other text in the paragraph than the target text. Alternatively, the context data of the target text may also include M texts preceding the target text in the paragraph, where M may be a predetermined positive integer, for example, but not limited to, 2 texts preceding the target text, 3 texts, etc. in the paragraph.
It should be noted that, the scenario of performing the reference resolution task is not limited to the above-listed scenarios, and the target text and the context data may be set and selected according to the actual requirements, which is not limited in the embodiment of the present application.
Step 220: extract key information contained in the original text data through the extraction model, and label each piece of key information to obtain an extraction result.
The key information may at least include candidate entities in the original text data and insertion positions of the candidate entities in the target text, where the candidate entities may refer to entities referred to by the pronouns when the target text is subjected to reference resolution, or entities omitted from the target text. Further, the key information may further include candidate pronouns in the original text data, where the candidate pronouns refer to words in the target text referring to other entities, and the candidate pronouns may have a reference relationship with the candidate entities or may not have a reference relationship.
For example, the original text data includes "Can you play basketball? Of course. Let's play together another day!", where a candidate entity may include "basketball" and a candidate pronoun may include "another day", but the two do not have a reference relationship. If the original text data includes "Do you know Tom? I don't know him.", the candidate entities may include "Tom", the candidate pronouns may include "him", and the two do have a reference relationship.
The extraction model may be a classification model with the ability to perform a reference resolution task, a sequence labeling model with the ability to perform a reference resolution task, or the like. The extraction model may segment the original text data into words, identify the part of speech of each word in the original text data, and determine the reference relationships between the words.
The extraction model can identify the antecedents and the anaphors in the original text data, as well as the insertion positions of the antecedents in the target text. The anaphor refers to the word that points back in a reference relation, the antecedent is the word the anaphor refers to, and the antecedent and the anaphor can form a reference chain. Illustratively, the original text data includes "Do you know Xiaoqiang? I don't know him.", where "Xiaoqiang" and "him" have a reference relationship, "him" is the anaphor and "Xiaoqiang" is the antecedent. The candidate entities may include the antecedents in the original text data, and the candidate pronouns may include the anaphors in the original text data.
Optionally, in the original text data, one anaphor may correspond to one or more antecedents, and one antecedent may also correspond to one or more anaphors; if the target text is a zero-anaphora text (meaning that no pronoun is included in the target text), an antecedent may also have no corresponding anaphor. By identifying, in the original text data, the antecedents, the anaphors, and the insertion positions in the target text of the antecedents that have reference relationships, the extraction model can extract the key information from the original text data and establish the reference relationships between the candidate entities and the candidate pronouns.
The electronic equipment can input the obtained original text data into the extraction model; the extraction model can extract text features from the original text data, perform reference resolution on the target text according to the text features, and label the candidate entities, the candidate pronouns, and the insertion positions of the candidate entities in the target text contained in the original text data, so as to obtain the extraction result. The insertion position of a candidate entity in the target text refers to the position at which the candidate entity is added to the target text when the target text is reference-resolved. If the candidate entity and a candidate pronoun have a reference relationship, the insertion position of the candidate entity in the target text can be the position of that candidate pronoun; if the target text does not contain a candidate pronoun, or a candidate pronoun and the candidate entity do not have a reference relationship, the insertion position of the candidate entity in the target text is the position in the target text where the referenced entity was omitted.
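One way to picture the extraction result described above is as a structured record linking each candidate entity to its candidate pronoun (if any) and to its insertion position in the target text. The sketch below is purely illustrative; the field names and the (round, start, end) span convention are assumptions, not the patent's data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A span is (round_number, start_char, end_char), matching the (round, character)
# style of position used elsewhere in this description.
Span = Tuple[int, int, int]

@dataclass
class ReferenceLink:
    entity_text: str                 # candidate entity (antecedent), e.g. "Tom"
    entity_span: Span                # where the entity occurs in the original text data
    pronoun_text: Optional[str]      # candidate pronoun (anaphor), None for zero anaphora
    pronoun_span: Optional[Span]
    insertion_position: Span         # where the entity should be inserted in the target text

@dataclass
class ExtractionResult:
    links: List[ReferenceLink] = field(default_factory=list)

# Example: "Do you know Tom? I don't know him." -> "him" refers to "Tom".
result = ExtractionResult(links=[
    ReferenceLink(entity_text="Tom", entity_span=(1, 12, 15),
                  pronoun_text="him", pronoun_span=(2, 13, 16),
                  insertion_position=(2, 13, 16)),
])
print(result.links[0].entity_text, "->", result.links[0].pronoun_text)
```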
The extraction model can be obtained by training according to a first training sample, the first training sample can comprise an original text sample and a labeling sample corresponding to the original text sample, and the labeling sample can label key information in the original text sample.
For example, the original text sample is "I like singer A best. Me too. I like his work a best." The corresponding labeling sample may comprise candidate entity: singer A; candidate pronoun: his; insertion position: his. Alternatively, the insertion position of the candidate entity may be represented directly by the candidate pronoun at which the insertion occurs, or by the character position of the insertion, for example using the dialogue round number (or sentence number) together with the character index; the insertion position above may be the 7th character of the 3rd round, which can be written directly in a format such as (3, 7).
The extraction model is trained with the first training sample: the first training sample is input into the extraction model to be trained, the extraction model predicts a predicted extraction result corresponding to the original text sample, the predicted extraction result is compared with the labeling sample corresponding to the original text sample, and the parameters of the extraction model are adjusted according to the comparison result until the distance between the predicted extraction result obtained by the extraction model and the labeling sample is smaller than a preset first distance threshold, at which point training of the extraction model is complete.
Because the labeling sample of the first training sample includes the labeled candidate entities, candidate pronouns and insertion positions of the candidate entities, the extraction model can learn, based on the first training sample, the ability to identify candidate entities and candidate pronouns and to perform reference resolution, so that the extraction result it predicts for an original text sample fits the corresponding labeling sample; the extraction model thereby acquires reference resolution capability. The extraction model labels the original text data to obtain the extraction result, which can serve as a preliminary reference resolution result of the original text data; based on the extraction result, the electronic device can easily determine how to perform reference resolution on the target text.
In some embodiments, different types of key information may be labeled with different labels: candidate entities may be labeled with a first label, candidate pronouns with a second label, and insertion positions with a third label; for example, candidate entities may be labeled with the label es, candidate pronouns with the label ps, and insertion positions with the label in. Further, the labeling sample may include a label sequence corresponding to the original text sample, where the label sequence includes a label corresponding to each character in the original text sample and a label at the insertion position, and the other characters apart from the key information may be labeled with a unified label, for example labels such as "×", "-", etc., but not limited thereto.
FIG. 3A is a schematic diagram of an original text sample and a labeling sample in one embodiment. As shown in FIG. 3A, the original text sample is "Do you like player C? I do, he plays really well!" In the corresponding labeling sample, a candidate entity may be denoted by the label A, a candidate pronoun by the label B, and in denotes the insertion position of the candidate entity. "Player C" is a candidate entity and is labeled "A A"; "he" is a candidate pronoun and is labeled "B"; since "player C" needs to be inserted at the position of "he" during reference resolution, the insertion tag "in" may be added before/after "he".
The extraction model may generate a tag sequence corresponding to the original text data and input the tag sequence as the extraction result into the generation model.
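The label sequence just described can be produced mechanically once the entity spans, pronoun spans, and insertion positions are known. A minimal sketch, assuming the es/ee/ps/pe/in tag set used later in this description and a unified "。" tag for all other characters; the function and argument names are illustrative only.

```python
from typing import List, Tuple

def build_tag_sequence(chars: List[str],
                       entity_spans: List[Tuple[int, int]],
                       pronoun_spans: List[Tuple[int, int]],
                       insert_positions: List[int],
                       other_tag: str = "。") -> List[str]:
    """Return one tag per character: es/ee, ps/pe, in, or the unified tag."""
    tags = [other_tag] * len(chars)
    for start, end in entity_spans:          # end index is inclusive
        tags[start] = "es"
        tags[end] = "ee"
    for start, end in pronoun_spans:
        tags[start] = "ps"
        tags[end] = "pe"
    for pos in insert_positions:             # character adjacent to the insertion point
        tags[pos] = "in"
    return tags

# Toy example in the spirit of FIG. 3A: entity "Tom", pronoun "him",
# insertion position right after the pronoun.
chars = list("Do you know Tom? I don't know him.")
tags = build_tag_sequence(chars,
                          entity_spans=[(12, 14)],
                          pronoun_spans=[(30, 32)],
                          insert_positions=[33])
for c, t in zip(chars, tags):
    print(c, t)
```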
Step 230: analyze the original text data and the extraction result through the generation model to obtain a reference-resolved text corresponding to the target text.
The electronic device may input the extraction result obtained by the extraction model, together with the original text data, into the generation model, which further analyzes them to obtain the final reference-resolved text. The extraction result is the preliminary reference resolution result obtained by the extraction model; the generation model can combine it with the original text data for further analysis, so as to obtain a more accurate reference-resolved text. Further, the generation model may rewrite the extraction result in combination with the context data in the original text data to obtain a reference-resolved text that better fits the context and semantics.
The generation model is obtained by training according to a second training sample, and the second training sample can comprise the original text sample, an extraction sample result obtained by passing the original text sample through the trained extraction model, and a reference resolution sample result corresponding to the original text sample.
After the extraction model has been trained with the first training sample, the original text sample can be input into the trained extraction model to obtain the extraction sample result corresponding to the original text sample; the extraction sample result, the original text sample and the reference resolution sample result are then input into the generation model to be trained, and the generation model outputs a predicted reference resolution result corresponding to the original text sample. The predicted reference resolution result can be compared with the reference resolution sample result corresponding to the original text sample (the reference resolution sample result being the actual reference resolution result of the original text sample), and the parameters of the generation model are adjusted according to the comparison result until the distance between the predicted reference resolution result obtained by the generation model and the reference resolution sample result is smaller than a preset second distance threshold, at which point training of the generation model is complete. The reference-resolved text predicted by the generation model then fits the actual reference resolution result, improving the accuracy of the generation model.
FIG. 3B is a schematic diagram of generating the reference-resolved text in one embodiment. As shown in FIG. 3B, the electronic device may input the original text data into the extraction model 310, obtain the extraction result through the extraction model 310, then input the original text data together with the extraction result output by the extraction model 310 into the generation model 320, and obtain the reference-resolved text through the generation model 320.
In the embodiment of the application, preliminary information extraction and reference annotation can be performed on the original text data through the extraction model, and the extraction result is then processed through the generation model to obtain the final reference-resolved text. This extraction-plus-generation reference resolution scheme decomposes the complex reference resolution task into two relatively simple tasks, which simplifies the processing flow of the reference resolution task and improves processing efficiency. Moreover, because the generation model obtains the reference-resolved text based on both the extraction result of the extraction model and the original text data, the obtained reference-resolved text is more accurate, improving the quality of the reference resolution task.
In some embodiments, the step of obtaining the original text data may include: acquiring the dialogue text of the last round of dialogue and the N rounds of dialogue text before the last round, where the dialogue text of the last round is the target text on which the reference resolution task is to be performed, and generating an original text sequence from the N rounds of dialogue text and the target text. The characters in the original text sequence can be ordered according to the time order in which the dialogue texts were acquired, and generating a text sequence makes further processing of the characters convenient.
Optionally, after acquiring the dialogue text of the last round and the N rounds of dialogue text before it, the electronic device may sort the dialogue texts from front to back by round number and add interval labels between different rounds of dialogue text to generate the original text sequence. The interval label is used to separate different rounds of dialogue text, for example the rounds may be separated by the label [sep]; meanwhile, a text start label can be added before the first acquired round of dialogue, and a text end label can be added after the last round of dialogue.
For example, the dialogue text acquired by the electronic device is:
A: Can you play basketball?
B: Of course.
A: Let's play together another day!
The generated original text sequence may be: [begin] Can you play basketball? [sep] Of course [sep] Let's play together another day [end]
Alternatively, adding interval labels between different rounds of dialogue text may consist of adding a round start tag before the first character of each round of dialogue text and a round end tag after its last character. For example, the round start tag may be PAD and the round end tag may be END; taking the above dialogue text as an example, the generated original text sequence may be: PAD Can you play basketball? END PAD Of course END PAD Let's play together another day END.
By adding interval labels between different rounds of dialogue text, the subsequent extraction model and generation model can directly locate each round of dialogue, which helps the extraction model and the generation model analyze the characters in the original text sequence faster and more accurately, improving processing efficiency.
Further, after sorting the dialogue text, the electronic device may identify non-Chinese characters in the sorted dialogue text, for example numerals, punctuation, etc., and may delete these non-Chinese characters or replace them with a unified character, for example replacing them uniformly with the "#" symbol, to obtain the original text sequence. The electronic device can input the original text sequence into the extraction model and the generation model to obtain the reference-resolved text. Normalizing the dialogue text and removing unimportant characters in this way reduces the processing load of the subsequent extraction model and generation model and improves processing speed.
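A minimal preprocessing sketch along these lines: dialogue rounds are wrapped in round start/end tags and non-Chinese characters are uniformly replaced with "#". The tag strings, the regular expression, and the function names are assumptions chosen for illustration.

```python
import re
from typing import List

ROUND_START = "PAD"   # round start tag, as in the example above
ROUND_END = "END"     # round end tag

# Anything outside the CJK Unified Ideographs block counts as "non-Chinese" here.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")

def normalize_round(text: str, replacement: str = "#") -> str:
    """Replace digits, punctuation, and other non-Chinese characters uniformly."""
    return NON_CHINESE.sub(replacement, text)

def build_original_text_sequence(rounds: List[str], normalize: bool = True) -> str:
    """Wrap each dialogue round (assumed already in acquisition order) with the
    round start/end tags to form the original text sequence."""
    pieces = []
    for text in rounds:
        if normalize:
            text = normalize_round(text)
        pieces.append(f"{ROUND_START} {text} {ROUND_END}")
    return " ".join(pieces)

if __name__ == "__main__":
    rounds = ["你会打篮球吗？", "当然啦。", "改天一起打吧！"]
    print(build_original_text_sequence(rounds))
    # -> PAD 你会打篮球吗# END PAD 当然啦# END PAD 改天一起打吧# END  (punctuation replaced by '#')
```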
In some embodiments, the step of analyzing the original text data and the extraction result through the generation model to obtain the reference-resolved text corresponding to the target text includes: obtaining text features of the original text data through the generation model, and processing the target text according to the text features and the extraction result to obtain a reference-resolved text that meets the semantic smoothness requirement. Semantic smoothness describes characteristics of the text such as logical coherence, word accuracy, and fluency; the higher the semantic smoothness, the better the reference-resolved text conforms to the linguistic conventions used by humans.
When reference resolution is performed on a target text, one candidate pronoun may refer to several candidate entities at the same time, and embedding all of the referenced candidate entities directly into the target text can break the logic of the sentence, so that the semantic smoothness of the reference-resolved text is low. For example, the original text data is "Do you think apples or pears are better? I like both." The candidate entities include "apple" and "pear", and the insertion position of the candidate entities is before "both"; if the candidate entities are embedded directly, the result "I like apple pear both" is not smooth.
The generation model can analyze the original text data to obtain its text features and rewrite the extraction result in combination with those text features, rather than simply embedding the candidate entities at the labeled insertion positions according to the extraction result, so that the output reference-resolved text meets the semantic smoothness requirement. Continuing the example above, the candidate entities labeled in the extraction result include "apple" and "pear" and the insertion position is before "both"; the generation model can rewrite the extraction result to obtain the reference-resolved text "I like both apples and pears", which meets the semantic smoothness requirement. The output reference-resolved text is therefore more accurate and better conforms to language logic.
The extraction result marks the insertion positions of the candidate entities in the target text, and the generation model can embed each candidate entity at its corresponding insertion position in the target text. If an insertion position corresponds to only one candidate entity and has no corresponding candidate pronoun, the candidate entity is directly embedded at the insertion position. If an insertion position corresponds to only one candidate entity and has a corresponding candidate pronoun (i.e., the insertion position is adjacent to the candidate pronoun), the candidate pronoun may be replaced with the candidate entity.
In some embodiments, after the generation model embeds the candidate entity at the corresponding insertion position in the target text, the semantic smoothness of the target text with the embedded candidate entity can be calculated; the semantic smoothness reflects the language logic, fluency, and reasonableness of the target text after embedding. Optionally, the semantic smoothness may be represented by a score or the like, with a higher value indicating greater smoothness. Because the generation model is trained on the actual reference resolution sample results of the original text samples, which are smooth texts, the generation model can learn the characteristics of smooth text and can therefore evaluate the smoothness of the target text with the embedded candidate entity.
If the semantic smoothness of the target text with the embedded candidate entity is lower than a semantic smoothness threshold, the generation model can add connective characters between the embedded candidate entity and its adjacent characters so that the semantic smoothness of the target text with the embedded candidate entity is not lower than the semantic smoothness threshold, after which the target text with the embedded candidate entity can be output as the reference-resolved text. The connective characters may be retrieved from a dictionary and may include, but are not limited to, conjunctions, prepositions, auxiliary words, etc., for example adding "and" between adjacent candidate entities.
For example, the original text data is "Do you know the novel XX? Never heard of it. Y's novel." After the candidate entity "XX" is inserted before "Y", the generation model obtains the text "XX Y's novel" and calculates that its semantic smoothness is lower than the semantic smoothness threshold; connective characters are then added between "XX" and "Y", yielding the reference-resolved text "XX is Y's novel", which meets the semantic smoothness requirement.
In some embodiments, if one insertion position corresponds to at least two candidate entities, the generation model may embed the at least two candidate entities at the same insertion position of the target text according to the text features and the extraction result, and calculate a first semantic smoothness between the at least two candidate entities. The generation model may determine whether the first semantic smoothness is below the semantic smoothness threshold, and if so, may add connective characters between the embedded adjacent candidate entities so that the first semantic smoothness between the at least two candidate entities is not below the semantic smoothness threshold.
After adding the connective characters, the generation model can recalculate the first semantic smoothness of the two candidate entities with the added connective characters and again determine whether it is below the semantic smoothness threshold; if so, the generation model can retrieve connective characters from the dictionary again and add them between the embedded adjacent candidate entities, until the first semantic smoothness of the at least two candidate entities is not below the semantic smoothness threshold.
In some embodiments, if one insertion position corresponds to at least two candidate entities, the generation model may embed the at least two candidate entities at the same insertion position of the target text according to the text features and the extraction result, and calculate a second semantic smoothness between the at least two candidate entities and the other characters in the target text. The other characters in the target text may be the characters or words adjacent to the embedded candidate entities. If the second semantic smoothness is lower than the semantic smoothness threshold, the embedding order of the at least two candidate entities at the same insertion position is adjusted so that the second semantic smoothness between the at least two candidate entities and the other characters in the target text is not lower than the semantic smoothness threshold.
For example, the original text data is "How do I send a voice message? It cannot be sent. What is going on?", where "voice message" is the first candidate entity and "cannot be sent" is the second candidate entity. Embedding them directly after "why" in that order yields text with low semantic smoothness that does not meet the requirement, so the order of "voice message" and "cannot be sent" can be adjusted to obtain the reference-resolved text "Why can the voice message not be sent", which meets the semantic smoothness requirement.
In the embodiment of the application, the generation model can calculate the semantic smoothness of the target text with the embedded candidate entities and rewrite the target text according to the semantic smoothness, so as to obtain a reference-resolved text that meets the semantic smoothness requirement. This improves the accuracy of the generated reference-resolved text, makes it better conform to human expression logic, and makes the machine appear more intelligent.
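The behaviour described in this passage, scoring the candidate-embedded target text, inserting connective characters, and reordering co-inserted entities until the score clears a threshold, can be pictured with the following sketch. The smoothness scorer here is only a stand-in for whatever score the generation model produces; the connective dictionary, threshold, and all names are assumptions for illustration, not the patent's implementation.

```python
from itertools import permutations
from typing import Callable, List

CONNECTIVES = ["和", "是", "的"]   # assumed small dictionary of connective characters

def rewrite_for_smoothness(entities: List[str],
                           prefix: str,
                           suffix: str,
                           smoothness: Callable[[str], float],
                           threshold: float = 0.5) -> str:
    """Embed the entities between prefix and suffix, trying entity orders and
    connective characters until the smoothness score reaches the threshold.
    Returns the best candidate found (even if it stays below the threshold)."""
    best_text, best_score = None, float("-inf")
    for order in permutations(entities):                 # adjust embedding order
        for conn in [""] + CONNECTIVES:                  # optionally add a connective
            candidate = prefix + conn.join(order) + suffix
            score = smoothness(candidate)
            if score > best_score:
                best_text, best_score = candidate, score
            if score >= threshold:
                return candidate
    return best_text

if __name__ == "__main__":
    # Toy stand-in for a learned smoothness scorer: prefer texts containing "和" (and).
    toy_score = lambda text: 1.0 if "和" in text else 0.1
    print(rewrite_for_smoothness(["苹果", "梨"], prefix="", suffix="都喜欢",
                                 smoothness=toy_score))
    # -> 苹果和梨都喜欢  ("I like both apples and pears")
```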
As shown in FIG. 4, in one embodiment, the step of obtaining the text features of the original text data through the generation model and processing the target text according to the text features and the extraction result to obtain a reference-resolved text meeting the semantic smoothness requirement may include the following steps:
Step 402: obtain a first feature vector corresponding to each character in the original text data through the generation model, and obtain a second feature vector for each character according to the label corresponding to that character in the extraction result.
In some embodiments, the generation model may be a model constructed based on an LSTM (Long Short-Term Memory) network, which is a recurrent neural network, or a model constructed based on a self-attention mechanism, such as a Transformer model. The self-attention mechanism imitates the attention of human vision: when perceiving something, human vision usually does not scan the whole scene from beginning to end, but tends to observe the particular part it needs to attend to; and when a person finds that what they want to observe often appears in a certain part of a scene, the person learns to pay attention to that part when a similar scene appears again.
The generating model may include an encoder and a decoder, the encoder may have a character characterization function, the electronic device may input the original text sequence and a tag sequence output by the extracting model into the encoder, the encoder may convert each character in the original text sequence into a corresponding first feature vector, and may convert a tag corresponding to each character in the tag sequence into a second feature vector.
Optionally, the first feature vector and the second feature vector may be embedding vectors (Embedding), where an embedding vector is a fixed-length vector representation of a character, which facilitates numerical processing. The encoder may convert each character in the original text sequence into a numerically represented first feature vector and each tag in the tag sequence into a numerically represented second feature vector. The length of the embedding vector may be set manually, for example 300, 200, 256, etc., but is not limited thereto.
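In PyTorch terms, the first and second feature vectors can be pictured as lookups into two embedding tables, one over the character vocabulary and one over the tag set. This sketch assumes PyTorch purely for illustration; the patent does not prescribe a framework, and the vocabularies and dimensions below are invented.

```python
import torch
import torch.nn as nn

# Illustrative vocabularies; a real system would build these from the training corpus.
char_vocab = {"<pad>": 0, "PAD": 1, "END": 2, "你": 3, "会": 4, "打": 5, "篮": 6, "球": 7}
tag_vocab = {"。": 0, "es": 1, "ee": 2, "ps": 3, "pe": 4, "in": 5}

EMB_DIM = 256   # embedding length, e.g. 200/256/300 as mentioned above

char_embedding = nn.Embedding(len(char_vocab), EMB_DIM)   # first feature vectors
tag_embedding = nn.Embedding(len(tag_vocab), EMB_DIM)     # second feature vectors

chars = torch.tensor([[char_vocab["你"], char_vocab["会"], char_vocab["打"]]])
tags = torch.tensor([[tag_vocab["。"], tag_vocab["。"], tag_vocab["in"]]])

first_vecs = char_embedding(chars)   # shape (1, 3, 256)
second_vecs = tag_embedding(tags)    # shape (1, 3, 256)
print(first_vecs.shape, second_vecs.shape)
```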
Step 404: fuse the first feature vector and the second feature vector of each character through the generation model to obtain a target feature vector for each character, and rewrite the target text according to the target feature vectors of the characters to obtain a reference-resolved text meeting the semantic smoothness requirement.
In one embodiment, the extraction model may generate a tag sequence corresponding to the original text sequence based on a predefined set of tags. The tag set may include, but is not limited to, at least a candidate entity start tag, a candidate entity end tag, a candidate pronoun start tag, a candidate pronoun end tag, and an insert position tag.
For example, the original text sequence is:
PAD 你会打篮球吗？END PAD 当然啦 END PAD 改天一起打吧 END ("Can you play basketball? Of course. Let's play together another day.");
The tag sequence generated may be:
。。。。es ee 。。。。。。。。。。ps pe 。。。in 。;
where es is the candidate entity start tag, ee is the candidate entity end tag, ps is the candidate pronoun start tag, pe is the candidate pronoun end tag, in is the insertion position tag, and the other characters in the original text sequence are labeled with the unified tag "。".
In some embodiments, for each character in the original text sequence, taking a first character as an example (the first character may be any character in the original text sequence, and a tag in the original text sequence may also be treated as a character), the encoder may obtain the first feature vector corresponding to the first character and the second feature vector corresponding to the tag of the first character, and concatenate the first feature vector and the second feature vector to obtain the target feature vector of the first character. From the target feature vectors of all the characters, the encoder can obtain the encoding information corresponding to the input original text sequence; the encoding information is input into the decoder, each character output by the decoder is predicted in sequence, and the resulting output text sequence is the reference-resolved text.
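Continuing the illustrative PyTorch sketch above, concatenating the two embeddings gives the target feature vector of each character, which an encoder (here an LSTM, one of the model types mentioned earlier) turns into encoding information for the decoder. The module structure and dimensions are assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Fuses character and tag embeddings and encodes the sequence."""

    def __init__(self, n_chars: int, n_tags: int, emb_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, emb_dim)   # first feature vector
        self.tag_emb = nn.Embedding(n_tags, emb_dim)     # second feature vector
        self.lstm = nn.LSTM(2 * emb_dim, hidden, batch_first=True)

    def forward(self, char_ids: torch.Tensor, tag_ids: torch.Tensor):
        # Concatenation of the two embeddings is the target feature vector per character.
        fused = torch.cat([self.char_emb(char_ids), self.tag_emb(tag_ids)], dim=-1)
        outputs, state = self.lstm(fused)   # outputs: per-character encoding information
        return outputs, state

encoder = SketchEncoder(n_chars=100, n_tags=6)
char_ids = torch.randint(0, 100, (1, 12))   # a 12-character original text sequence
tag_ids = torch.randint(0, 6, (1, 12))      # its tag sequence
encodings, state = encoder(char_ids, tag_ids)
print(encodings.shape)   # (1, 12, 512)
```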
Because the tag sequence further includes an insertion position tag, if the insertion position tag is adjacent to the candidate pronoun start tag or the candidate pronoun end tag, the insertion position tag and the adjacent candidate pronoun start tag or candidate pronoun end tag can be used as a common tag, and a corresponding second feature vector is calculated.
If the insertion position tag is adjacent to a unified tag, the insertion position tag and the adjacent unified tag can also be treated as a joint tag, and the corresponding second feature vector is calculated. For example, in the tag sequence "。。。。es ee 。。。。。。。。。。ps pe 。。。in 。", the insertion position tag "in" is adjacent to unified tags "。"; it may be treated as a joint tag together with the preceding unified tag, in which case the second feature vector corresponds to the character labeled by that preceding unified tag (here the character "打", "play"), or it may be treated as a joint tag together with the following unified tag and the corresponding second feature vector calculated accordingly, which is not limited here.
As another implementation, since the extraction result already contains the reference relationships, the insertion position in the original text sequence can be determined from the insertion position tag in the tag sequence, and the characters of the candidate entity to be inserted can be embedded at that insertion position to obtain a new text sequence whose characters correspond one-to-one with the tag sequence; the target feature vector of each character can then be obtained from the first feature vector of each character in the new text sequence and the second feature vector of its corresponding tag.
For example, the original text sequence is: PAD 你觉得汤姆和罗杰哪个更帅？END PAD 我觉得他们都很帅 END ("Which of Tom and Roger do you think is more handsome? I think they are both handsome."); the corresponding tag sequence is: 。。。。es ee 。es ee 。。。。。。。。。。in ps pe 。。。。; the new text sequence generated may be PAD 你觉得汤姆和罗杰哪个更帅？END PAD 我觉得汤姆罗杰他们都很帅 END, where the tags corresponding to the embedded characters "汤姆罗杰" ("Tom and Roger") may be the insertion position tag "in". The target feature vector of each character in the new text sequence can then be obtained from the first feature vector of each character in the new text sequence and the second feature vector of its corresponding tag.
As another implementation, since the insertion position in the tag sequence lies before or after a certain character, the insertion position can be understood as having no corresponding character in the original text sequence. After the generation model obtains the tag sequence, it may first embed a placeholder character, for example a symbol such as "&", at the corresponding insertion position in the target text, and then perform feature vector conversion. This ensures that a target feature vector is obtained for each character in the original text sequence, so that the encoder can conveniently process them further to obtain the encoding information. The encoding information can be understood as the text features corresponding to the original text sequence, and may be hidden state features, self-attention features, etc.
After the encoder inputs the encoding information into the decoder, the decoder can obtain the current output sequence according to the encoding information and the previous output sequence, and sequentially outputs each character of the reference-resolved text in time order until the sequence stop symbol is output, thereby obtaining the complete reference-resolved text. For example, the original text sequence is: PAD 你觉得汤姆和罗杰哪个更帅？END PAD 我觉得他们都很帅 END. The decoder first obtains the output sequence <sos> (the sequence start symbol) from the encoding information input by the encoder, then obtains the output sequence <sos>我 from the output sequence <sos> and the encoding information, then obtains the output sequence <sos>我觉得 from the output sequence <sos>我 and the encoding information, and so on, until the output sequence <sos>我觉得汤姆和罗杰都很帅<eos> is produced and output stops, where <eos> is the sequence stop symbol. The obtained reference-resolved text is: 我觉得汤姆和罗杰都很帅 ("I think both Tom and Roger are handsome").
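The step-by-step decoding just described (start from <sos>, feed back what has been produced so far, stop at <eos>) can be sketched independently of any particular decoder as a greedy loop. The next_token function below is a stand-in for the real decoder conditioned on the encoding information; all names are illustrative.

```python
from typing import Callable, List

def greedy_decode(next_token: Callable[[List[str]], str],
                  sos: str = "<sos>", eos: str = "<eos>",
                  max_len: int = 64) -> List[str]:
    """Repeatedly ask the decoder for the next character given the output so far,
    stopping at the sequence stop symbol or at max_len."""
    output = [sos]
    for _ in range(max_len):
        token = next_token(output)       # decoder uses encoding info + previous output
        output.append(token)
        if token == eos:
            break
    return output

if __name__ == "__main__":
    target = list("我觉得汤姆和罗杰都很帅") + ["<eos>"]

    def toy_next_token(prev: List[str]) -> str:
        # Stand-in decoder that just replays a fixed answer character by character.
        return target[len(prev) - 1]

    print("".join(greedy_decode(toy_next_token)[1:-1]))
    # -> 我觉得汤姆和罗杰都很帅 ("I think both Tom and Roger are handsome")
```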
Because the generation model is trained based on the actual reference resolution sample results of the original text samples, which are smooth texts, the generation model is able to output reference-resolved texts that meet the semantic smoothness requirement.
In the embodiment of the application, the generation model can rewrite the extraction result in combination with the context data in the original text data and the extraction result of the extraction model; the generated reference-resolved text is drawn from the context data and the target text in the original text data under the strong guidance of the extraction result, so the obtained reference-resolved text is more accurate and the quality of the reference resolution task can be improved.
As shown in fig. 5, in one embodiment, another text processing method is provided, which can be applied to the electronic device, and the method can include the following steps:
Step 502: input the first training sample into a pre-trained natural language processing model, and label the original text sample of the first training sample through the natural language processing model to obtain a predicted extraction result.
The natural language processing model is pre-trained on the text data in a corpus, has strong character characterization capability, and can accurately convert characters into corresponding numerical feature vectors. In the embodiment of the application, the electronic device can perform secondary training on the pre-trained natural language processing model using the first training sample, where the first training sample can include an original text sample and a labeling sample; the pre-trained natural language processing model extracts the key information of the original text sample and labels the original text sample to obtain the predicted extraction result.
Step 504, comparing the predicted extraction result with the labeling sample in the first training sample, and calculating the result loss.
Optionally, after the natural language processing model obtains the predicted extraction result, a loss function may be used to calculate the loss of the predicted extraction result relative to the labeling sample; this loss quantifies the error between the predicted extraction result and the labeling sample. Further, the predicted extraction result may be compared with the labeling sample corresponding to the original text sample, and the distance between the two may be calculated with the Euclidean distance, the Manhattan distance, or another algorithm, which is not limited in the embodiment of the present application.
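For illustration, the two distance measures mentioned above could be computed as follows, assuming the predicted extraction result and the labeling sample are represented as equal-length numeric vectors (an assumption made purely for this sketch).

```python
import math

def euclidean_distance(pred, target):
    """L2 distance between the predicted extraction result and the labeling sample."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))

def manhattan_distance(pred, target):
    """L1 distance between the same two vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target))

# e.g. result loss of a toy predicted extraction result against its label
print(euclidean_distance([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]))  # ≈ 0.141
print(manhattan_distance([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]))  # 0.2
```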
Step 506, adjusting parameters of the natural language processing model according to the result loss, so as to train and obtain the extraction model.
If the distance between the predicted extraction result and the labeling sample is greater than the first distance threshold, the predicted extraction result does not meet expectations; the parameters of the natural language processing model can be adjusted, and the adjusted natural language processing model can be trained again with new first training samples, until the distance between the predicted extraction result and the labeling sample is less than the first distance threshold, that is, until the result loss meets expectations, at which point the extraction model is obtained.
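A minimal sketch of this threshold-controlled secondary training loop, with a small linear layer standing in for the pre-trained natural language processing model, random vectors standing in for the first training samples, and the Euclidean distance used directly as the result loss; all names, shapes, and the iteration cap are placeholders of this sketch.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 8)                        # placeholder for the pre-trained model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
first_distance_threshold = 0.05

def next_first_training_sample():
    """Yield a (original text sample features, labeling sample) pair."""
    return torch.randn(16), torch.randn(8)

for _ in range(10_000):                         # cap on training iterations
    features, labeling_sample = next_first_training_sample()
    predicted = model(features)                 # predicted extraction result
    result_loss = torch.dist(predicted, labeling_sample)   # Euclidean distance
    if result_loss.item() < first_distance_threshold:
        break                                   # result loss meets expectations
    optimizer.zero_grad()
    result_loss.backward()
    optimizer.step()                            # adjust the model parameters
# `model` now plays the role of the trained extraction model
```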
Because the extraction model is obtained by further training a pre-trained natural language processing model, training can be completed with only a small amount of labeled data (namely the first training samples) rather than a large number of training samples, while the extraction model retains strong text characterization capability. The extraction effect is thus guaranteed while the training difficulty is reduced and the training efficiency is improved.
Step 508, inputting the original text sample into the trained extraction model to obtain an extraction sample result corresponding to the original text sample, and training the generation model according to the original text sample, the corresponding extraction sample result, and the corresponding reference resolution sample result.
The electronic device can input the original text sample into the trained extraction model to obtain the extraction sample result corresponding to the original text sample, and train the generation model with the original text sample, the corresponding extraction sample result, and the corresponding reference resolution sample result as the second training sample, so that the predicted reference resolution text output by the generation model approaches the actual reference resolution sample result.
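The assembly of the second training sample and one training step of the generation model might look as follows; `extraction_model`, `generation_model`, `loss_fn`, and `optimizer` are placeholders for the actual components, which the embodiment does not pin down.

```python
def build_second_training_sample(original_text_sample, reference_resolution_sample,
                                 extraction_model):
    """Run the trained extraction model once and bundle the second training sample."""
    extraction_sample_result = extraction_model(original_text_sample)  # e.g. a tag sequence
    return {
        "original_text_sample": original_text_sample,
        "extraction_sample_result": extraction_sample_result,
        "reference_resolution_sample_result": reference_resolution_sample,
    }

def train_generation_step(generation_model, sample, loss_fn, optimizer):
    """One update so the predicted reference resolution text approaches the sample result."""
    predicted = generation_model(sample["original_text_sample"],
                                 sample["extraction_sample_result"])
    loss = loss_fn(predicted, sample["reference_resolution_sample_result"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```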
Step 510, obtaining original text data, the original text data including at least a target text on which a reference resolution task needs to be performed.
The description of step 510 may refer to the related descriptions in the above embodiments, and will not be repeated here.
Step 512, generating, through the extraction model, third feature vectors corresponding to the characters contained in the original text data, and performing label prediction on the characters according to the third feature vectors and the label set to obtain a label sequence corresponding to the original text data.
The extraction model is obtained by further training a pre-trained natural language processing model, so it has strong text characterization capability. After the electronic device inputs the original text data into the extraction model, the extraction model converts each character contained in the original text data into a corresponding third feature vector, which represents the character numerically. Based on the third feature vector of each character, the extraction model can identify whether the character belongs to key information and, if so, which category of key information it belongs to (such as candidate entity, candidate pronoun, and the like), and can also identify the words in the original text data that have a reference relationship. Each character can therefore be marked with the labels predefined in the label set, and label prediction on the characters yields the label sequence corresponding to the original text data.
Optionally, the label set at least includes a candidate entity start label, a candidate entity end label, a candidate pronoun start label, a candidate pronoun end label, an insertion position label, and the like. By labeling the characters with these labels, the key information and reference relationships in the original text data can be described more accurately, which improves the processing efficiency and generation effect of the generation model.
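As a sketch of the label prediction step, the following head maps each character's third feature vector to one tag from the label set; the tag abbreviations and the generic "o" tag are assumptions taken from the earlier example, the feature dimension is arbitrary, and argmax decoding is used purely for simplicity.

```python
import torch
import torch.nn as nn

# Assumed tag set: entity start/end, pronoun start/end, insertion position, other.
TAGS = ["o", "es", "ee", "ps", "pe", "in"]

class ExtractionHead(nn.Module):
    """Maps each character's third feature vector to a distribution over the tag set."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, len(TAGS))

    def forward(self, third_feature_vectors: torch.Tensor) -> list[str]:
        logits = self.classifier(third_feature_vectors)   # [seq_len, num_tags]
        tag_ids = logits.argmax(dim=-1)                    # label prediction per character
        return [TAGS[int(i)] for i in tag_ids]

# Usage: third feature vectors produced by the pre-trained encoder (random here).
head = ExtractionHead()
tag_sequence = head(torch.randn(12, 128))   # one tag per character
```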
Step 514, obtaining the text features of the original text data through the generation model, and processing the target text according to the text features and the tag sequence to obtain the reference resolution text meeting the semantic smoothness requirement.
The description of step 514 may refer to the related description in the above embodiments, and will not be repeated here.
In the embodiment of the application, the extraction model can be obtained based on the training of the pre-trained natural language processing model, so that the extraction model has strong text representation capability, the training process is simple, a large number of training samples are not needed, the training cost is reduced, and the training efficiency is improved.
As shown in fig. 6, in one embodiment, a text processing apparatus 600 is provided and may be applied to the above electronic device, where the text processing apparatus 600 may include an obtaining module 610, an extracting module 620, and a generating module 630.
An obtaining module 610 is configured to obtain original text data, where the original text data includes target text on which a reference resolution task needs to be performed.
The extraction module 620 is configured to extract key information contained in the original text data through an extraction model and label each piece of key information to obtain an extraction result, where the key information at least includes a candidate entity in the original text data and an insertion position of the candidate entity in the target text; the extraction model is obtained by training according to a first training sample, and the first training sample includes an original text sample and a labeling sample corresponding to the original text sample.
The generating module 630 is configured to analyze the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, where the generation model is obtained by training according to a second training sample, and the second training sample includes the original text sample, the extraction sample result obtained by passing the original text sample through the trained extraction model, and the reference resolution sample result corresponding to the original text sample.
In the embodiment of the application, preliminary information extraction and reference labeling can be performed on the original text data through the extraction model, and the extraction result is processed through the generation model to obtain the final reference resolution text. The embodiment of the application adopts an extraction + generation reference resolution scheme that decomposes the complex reference resolution task into two relatively simple tasks, which simplifies the processing of the reference resolution task and improves the processing efficiency. Moreover, the generation model obtains the reference resolution text based on both the extraction result of the extraction model and the original text data, so the obtained reference resolution text is more accurate and the quality of the reference resolution task can be improved.
In one embodiment, the obtaining module 610 is further configured to obtain the dialogue text in the last round of dialogue and N rounds of dialogue texts before the last round, where N is a positive integer and the dialogue text in the last round of dialogue is the target text on which the reference resolution task needs to be performed, and to generate an original text sequence according to the N rounds of dialogue texts and the target text.
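For illustration, the original text sequence could be assembled as below; the PAD/END delimiters follow the example sequences earlier in the description, and treating the last round of dialogue as the target text is taken directly from this embodiment (whether these exact markers are used in practice is an assumption of the sketch).

```python
def build_original_text_sequence(previous_rounds: list[str], target_text: str) -> str:
    """Concatenate the N previous rounds of dialogue with the target text (last round)."""
    rounds = previous_rounds + [target_text]
    return " ".join(f"PAD {text} END" for text in rounds)

# Usage matching the running example
sequence = build_original_text_sequence(
    ["Which do you think is more beautiful, Thomson or Rojie?"],
    "I think they are all very beautiful",
)
```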
In one embodiment, the generating module 630 is further configured to obtain text features of the original text data through the generating model, and process the target text according to the text features and the extraction result, so as to obtain the reference resolved text meeting the semantic smoothness requirement.
In one embodiment, the generating module 630 includes a smoothness calculating unit and an adjusting unit.
And the smoothness calculation unit is used for calculating the second semantic smoothness between the at least two candidate entities and other characters in the target text if the at least two candidate entities are embedded in the same insertion position of the target text according to the text characteristics and the extraction result.
And the adjusting unit is used for adjusting the embedding sequence of the at least two candidate entities at the same inserting position if the second semantic smoothness is lower than the semantic smoothness threshold so that the second semantic smoothness between the at least two candidate entities and other characters in the target text is not lower than the semantic smoothness threshold.
In one embodiment, the smoothness calculating unit is further configured to calculate a first semantic smoothness between at least two candidate entities if the at least two candidate entities are embedded in the same insertion position of the target text according to the text feature and the extraction result.
And the adjusting unit is further configured to add a coherent character between embedded adjacent candidate entities if the first semantic smoothness is lower than the semantic smoothness threshold, so that the first semantic smoothness between the at least two candidate entities is not lower than the semantic smoothness threshold.
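A rough sketch of how the two adjustments described above (adding a coherent character between adjacent candidate entities and adjusting their embedding order) could be combined; the smoothness scorer is a placeholder, since the embodiment would score smoothness with the generation model itself, and the threshold, connector, and insertion index are assumptions.

```python
from itertools import permutations

SMOOTHNESS_THRESHOLD = 0.5

def semantic_smoothness(text: str) -> float:
    """Placeholder scorer; a real system would use the generation model's own scoring."""
    return 1.0 if " and " in text else 0.4

def embed_entities(target_text: str, position: int, entities: list[str],
                   coherent_char: str = " and ") -> str:
    """Embed candidate entities at one insertion position, adding a coherent
    character between adjacent entities and reordering them if needed."""
    candidate = target_text
    for ordering in permutations(entities):          # adjust the embedding order if needed
        candidate = (target_text[:position]
                     + coherent_char.join(ordering) + " "
                     + target_text[position:])
        if semantic_smoothness(candidate) >= SMOOTHNESS_THRESHOLD:
            break
    return candidate

# e.g. inserting "Thomson" and "Rojie" before the pronoun in the target text
print(embed_entities("I think they are all very beautiful", 8, ["Thomson", "Rojie"]))
```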
In the embodiment of the application, the generation model can calculate the semantic smoothness of the target text in which the candidate entities are embedded and rewrite the target text according to the semantic smoothness, so as to obtain a reference resolution text meeting the semantic smoothness requirement. The accuracy of the generated reference resolution text can thus be improved, so that it better conforms to human expression logic and improves the intelligence of the machine.
In one embodiment, the generating module 630 includes a feature acquiring unit and a rewriting unit in addition to the smoothness calculating unit and the adjusting unit.
The feature acquisition unit is used for acquiring a first feature vector corresponding to each character in the original text data through the generation model, and acquiring a second feature vector corresponding to each character according to the label corresponding to each character in the extraction result;
And the rewriting unit is used for fusing the first feature vector and the second feature vector of each character through the generation model to obtain target feature vectors of each character, and rewriting the target text according to the target feature vectors of each character to obtain the reference resolution text meeting the semantic smoothness requirement.
In the embodiment of the application, the generation model can rewrite the extraction result in combination with the context data in the original text data; the generated reference resolution text comes from the context data and the target text in the original text data and is strongly guided by the extraction result, so the obtained reference resolution text is more accurate and the quality of the reference resolution task can be improved.
In one embodiment, the text processing device 600 includes a first training module and a second training module in addition to the obtaining module 610, the extracting module 620, and the generating module 630.
The first training module is used for training the pre-trained natural language processing model according to the first training sample to obtain the extraction model.
The first training module comprises a prediction unit, a comparison unit and an adjustment unit.
The prediction unit is used for inputting the first training sample into a pre-trained natural language processing model and labeling the original text sample of the first training sample through the natural language processing model to obtain a predicted extraction result, where the natural language processing model is obtained by pre-training on text data in a corpus.
And the comparison unit is used for comparing the predicted extraction result with the labeling sample in the first training sample and calculating the result loss.
And the adjusting unit is used for adjusting the parameters of the natural language processing model according to the result loss so as to train and obtain the extraction model.
The second training module is used for inputting the original text sample into the trained extraction model to obtain an extraction sample result corresponding to the original text sample, and training the generation model according to the original text sample, the corresponding extraction sample result, and the corresponding reference resolution sample result.
In one embodiment, the extraction module 620 is further configured to generate third feature vectors corresponding to each character included in the original text data through the extraction model, and perform label prediction on each character according to the third feature vectors and a label set to obtain a label sequence corresponding to the original text data, where the label set at least includes a candidate entity start label, a candidate entity end label, a candidate pronoun start label, a candidate pronoun end label, and an insertion position label.
In the embodiment of the application, the extraction model can be obtained based on the training of the pre-trained natural language processing model, so that the extraction model has strong text representation capability, the training process is simple, a large number of training samples are not needed, the training cost is reduced, and the training efficiency is improved.
Fig. 7 is a block diagram of an electronic device in one embodiment. As shown in fig. 7, the electronic device 700 may include one or more of the following components: processor 710, memory 720 coupled to processor 710, wherein memory 720 may store one or more computer programs that may be configured to implement methods as described in the various embodiments above when executed by one or more processors 710.
Processor 710 may include one or more processing cores. The processor 710 uses various interfaces and lines to connect the various portions of the overall electronic device 700, performs the various functions of the electronic device 700, and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 720 and invoking data stored in the memory 720. Alternatively, the processor 710 may be implemented in hardware in at least one of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 710 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 710 and may instead be implemented by a separate communication chip.
The memory 720 may include random access memory (RAM) or read-only memory (ROM). Memory 720 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 720 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, and the like), instructions for implementing the various method embodiments described above, and so on. The stored-data area may store data created by the electronic device 700 during use, and the like.
It is to be appreciated that the electronic device 700 may include more or fewer structural elements than those described in the above structural block diagram, for example a power module, physical keys, a WiFi (Wireless Fidelity) module, a speaker, a Bluetooth module, sensors, and the like, and is not limited herein.
The embodiment of the application discloses a computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method as described in the above embodiments.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, which when executed by a processor, implements a method as described in the above embodiments.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium, and where the program, when executed, may include processes in the embodiments of the methods described above. Wherein the storage medium may be a magnetic disk, an optical disk, a ROM, etc.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments and that the acts and modules referred to are not necessarily required for the present application.
In various embodiments of the present application, it should be understood that the sequence numbers of the foregoing processes do not imply that the execution sequences of the processes should be determined by the functions and internal logic of the processes, and should not be construed as limiting the implementation of the embodiments of the present application.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present application, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and in particular may be a processor in a computer device) to execute all or part of the steps of the methods of the various embodiments of the present application.
The foregoing has described in detail a text processing method, apparatus, electronic device and computer readable storage medium according to embodiments of the present application, and specific examples have been applied to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only for aiding in understanding the method and core idea of the present application. Meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims (10)

1. A text processing method, comprising:
Acquiring original text data, wherein the original text data at least comprises a target text required to perform a reference resolution task;
Extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeling sample corresponding to the original text sample;
Analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, the extraction sample result obtained by the original text sample through the trained extraction model, and the reference resolution sample result corresponding to the original text sample;
The extraction result comprises a label sequence corresponding to the original text data;
Extracting key information contained in the original text data through an extraction model, and labeling each key information to obtain an extraction result, wherein the extraction result comprises the following steps:
generating third feature vectors corresponding to all characters contained in the original text data through the extraction model, and carrying out label prediction on all the characters according to the third feature vectors and a label set to obtain a label sequence corresponding to the original text data, wherein the label set at least comprises a candidate entity starting label, a candidate entity ending label, a candidate pronoun starting label, a candidate pronoun ending label and an inserting position label.
2. The method according to claim 1, wherein the analyzing the original text data and the extraction result through a generation model to obtain the reference resolution text corresponding to the target text includes:
and obtaining text features of the original text data through a generation model, and processing the target text according to the text features and the extraction result to obtain a reference resolution text meeting the semantic smoothness requirement.
3. The method according to claim 2, wherein the obtaining text features of the original text data through a generation model and processing the target text according to the text features and the extraction result to obtain a reference resolution text meeting a semantic smoothness requirement includes:
Obtaining a first feature vector corresponding to each character in the original text data through a generation model, and obtaining a second feature vector corresponding to each character according to the labels corresponding to each character in the extraction result;
And fusing the first feature vector and the second feature vector of each character through the generation model to obtain target feature vectors of each character, and rewriting the target text according to the target feature vectors of each character to obtain a reference resolution text meeting the semantic smoothness requirement.
4. A method according to claim 2 or 3, wherein said processing said target text according to said text features and said extraction results to obtain reference resolved text meeting semantic smoothness requirements comprises:
If at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result, calculating a first semantic smoothness between the at least two candidate entities;
and if the first semantic smoothness is lower than a semantic smoothness threshold, adding a coherent character between the embedded adjacent candidate entities so that the first semantic smoothness between the at least two candidate entities is not lower than the semantic smoothness threshold.
5. A method according to claim 2 or 3, wherein said processing said target text according to said text features and said extraction results to obtain reference resolved text meeting semantic smoothness requirements comprises:
If at least two candidate entities are embedded in the same insertion position of the target text according to the text features and the extraction result, calculating second semantic smoothness between the at least two candidate entities and other characters in the target text;
And if the second semantic smoothness is lower than a semantic smoothness threshold, adjusting the embedding sequence of the at least two candidate entities at the same insertion position so that the second semantic smoothness between the at least two candidate entities and other characters in the target text is not lower than the semantic smoothness threshold.
6. A method according to any one of claims 1 to 3, wherein said obtaining raw text data comprises:
The method comprises the steps of obtaining dialogue texts in a last round of dialogue and N rounds of dialogue texts before the last round of dialogue, wherein N is a positive integer, and the dialogue texts in the last round of dialogue are target texts required to be subjected to a reference resolution task;
And generating an original text sequence according to the N rounds of dialogue texts and the target text.
7. The method of claim 1, wherein prior to said obtaining raw text data, the method further comprises:
Inputting a first training sample into a pre-trained natural language processing model, and labeling an original text sample of the first training sample through the natural language processing model to obtain a predicted extraction result, wherein the natural language processing model is obtained by pre-training according to text data in a corpus;
comparing the predicted extraction result with a labeling sample in the first training sample, and calculating result loss;
And adjusting parameters of the natural language processing model according to the result loss to train and obtain an extraction model.
8. A text processing apparatus, comprising:
the acquisition module is used for acquiring original text data, wherein the original text data comprises target text which needs to be subjected to a reference resolution task;
the extraction module is used for extracting key information contained in the original text data through an extraction model and labeling each key information to obtain an extraction result, wherein the key information at least comprises a candidate entity in the original text data and an insertion position of the candidate entity in the target text, the extraction model is obtained by training according to a first training sample, and the first training sample comprises an original text sample and a labeling sample corresponding to the original text sample;
the generation module is used for analyzing the original text data and the extraction result through a generation model to obtain a reference resolution text corresponding to the target text, wherein the generation model is obtained by training according to a second training sample, and the second training sample comprises the original text sample, the extraction sample result obtained by the original text sample through the trained extraction model, and the reference resolution sample result corresponding to the original text sample;
The extraction result comprises a label sequence corresponding to the original text data;
The extraction module is further configured to generate third feature vectors corresponding to each character included in the original text data through the extraction model, and perform label prediction on each character according to the third feature vectors and a label set to obtain a label sequence corresponding to the original text data, where the label set at least includes a candidate entity start label, a candidate entity end label, a candidate pronoun start label, a candidate pronoun end label, and an insertion position label.
9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to implement the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method according to any of claims 1 to 7.
CN202011446124.4A 2020-12-11 2020-12-11 Text processing method, text processing device, electronic equipment and computer readable storage medium Active CN112463942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011446124.4A CN112463942B (en) 2020-12-11 2020-12-11 Text processing method, text processing device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011446124.4A CN112463942B (en) 2020-12-11 2020-12-11 Text processing method, text processing device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112463942A CN112463942A (en) 2021-03-09
CN112463942B true CN112463942B (en) 2024-08-20

Family

ID=74801422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011446124.4A Active CN112463942B (en) 2020-12-11 2020-12-11 Text processing method, text processing device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112463942B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076078B (en) * 2021-03-11 2022-03-22 山东大学 Hybrid drive-based conversational information acquisition method
CN112949263A (en) * 2021-04-15 2021-06-11 北京小米移动软件有限公司 Text adjusting method and device, electronic equipment and storage medium
CN112989008A (en) * 2021-04-21 2021-06-18 上海汽车集团股份有限公司 Multi-turn dialog rewriting method and device and electronic equipment
CN113326367B (en) * 2021-06-30 2023-06-16 四川启睿克科技有限公司 Task type dialogue method and system based on end-to-end text generation
CN116776886B (en) * 2023-08-15 2023-12-05 浙江同信企业征信服务有限公司 Information extraction method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695054A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Text processing method and device, information extraction method and system, and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766320A (en) * 2016-08-23 2018-03-06 中兴通讯股份有限公司 A kind of Chinese pronoun resolution method for establishing model and device
CN111984766B (en) * 2019-05-21 2023-02-24 华为技术有限公司 Missing semantic completion method and device
CN111401035A (en) * 2020-02-18 2020-07-10 平安科技(深圳)有限公司 Zero-reference resolution method, device, equipment and medium based on big data
CN111339780B (en) * 2020-05-14 2020-11-06 北京金山数字娱乐科技有限公司 Word processing method and device based on multitask model
CN111967224A (en) * 2020-08-18 2020-11-20 深圳市欢太科技有限公司 Method and device for processing dialog text, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695054A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Text processing method and device, information extraction method and system, and medium

Also Published As

Publication number Publication date
CN112463942A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112463942B (en) Text processing method, text processing device, electronic equipment and computer readable storage medium
CN111862977B (en) Voice conversation processing method and system
CN108711420B (en) Multilingual hybrid model establishing method, multilingual hybrid model establishing device, multilingual hybrid model data obtaining device and electronic equipment
CN110263150B (en) Text generation method, device, computer equipment and storage medium
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111967224A (en) Method and device for processing dialog text, electronic equipment and storage medium
CN110288972B (en) Speech synthesis model training method, speech synthesis method and device
CN111341293B (en) Text voice front-end conversion method, device, equipment and storage medium
CN110459202B (en) Rhythm labeling method, device, equipment and medium
CN112163067A (en) Sentence reply method, sentence reply device and electronic equipment
CN109448704A (en) Construction method, device, server and the storage medium of tone decoding figure
CN112016320A (en) English punctuation adding method, system and equipment based on data enhancement
CN112216267B (en) Prosody prediction method, device, equipment and storage medium
CN114242033A (en) Speech synthesis method, apparatus, device, storage medium and program product
US11322133B2 (en) Expressive text-to-speech utilizing contextual word-level style tokens
CN114882862A (en) Voice processing method and related equipment
CN116343747A (en) Speech synthesis method, speech synthesis device, electronic device, and storage medium
CN116341651A (en) Entity recognition model training method and device, electronic equipment and storage medium
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN115132182B (en) Data identification method, device, equipment and readable storage medium
CN114708848A (en) Method and device for acquiring size of audio and video file
CN110110048B (en) Query guiding method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant