CN112347790A - Text processing method and device, computer equipment and storage medium - Google Patents

Text processing method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN112347790A
CN112347790A (application CN202011232565.4A)
Authority
CN
China
Prior art keywords
characters, text, processed, feature, vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011232565.4A
Other languages
Chinese (zh)
Other versions
CN112347790B (en)
Inventor
马东民
邱学侃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lexuebang Network Technology Co., Ltd.
Original Assignee
Beijing Lexuebang Network Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lexuebang Network Technology Co., Ltd.
Priority to CN202011232565.4A
Publication of CN112347790A
Application granted
Publication of CN112347790B
Legal status: Active
Anticipated expiration

Classifications

    • G06F40/30 Handling natural language data; Semantic analysis
    • G06F40/289 Natural language analysis; Recognition of textual entities; Phrasal analysis, e.g. finite state techniques or chunking
    • G06N3/044 Neural networks; Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Neural networks; Combinations of networks
    • G06N3/08 Neural networks; Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a text processing method, apparatus, computer device, and storage medium, wherein the method comprises: acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed; processing the feature sentence vector and the feature word vectors through an attention mechanism to obtain correlation information between each of the plurality of characters and the feature sentence vector; and obtaining importance information for each of the plurality of characters based on that correlation information. Because the importance of each character is derived from a feature sentence vector characterizing the whole text and feature word vectors characterizing the individual characters, the method can determine the importance of each character in the text without relying on context-association constraints, and therefore achieves higher accuracy.

Description

Text processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of natural language text processing technologies, and in particular, to a text processing method, apparatus, computer device, and storage medium.
Background
At present, screening and determining important words in sentences is widely applied in fields such as customer-service reply, information retrieval, and automatic dialogue. For example, during information retrieval, the importance of each word in the retrieval text input by a user is judged, keywords are determined based on the importance of the different words, and retrieval is carried out according to those keywords, so that results better matching the user's retrieval requirements are obtained.
At present, a supervised sequence-labeling approach is usually adopted to analyze the importance of words in sentences. However, this approach is accurate only for text with strong context-association constraints and short sentences; for text with weak context association or long sentences, the analysis error is large, so the applicable scenarios of this approach are quite limited.
Disclosure of Invention
The embodiment of the disclosure at least provides a text processing method, a text processing device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a text processing method, including:
acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively;
processing the feature sentence vector and the feature word vectors corresponding to the plurality of characters through an attention mechanism to obtain correlation information between each of the plurality of characters and the feature sentence vector;
and obtaining importance degree information corresponding to the characters respectively based on the relevance information between the characters and the feature sentence vectors respectively.
In an optional implementation manner, the obtaining a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively includes:
acquiring a text to be processed; wherein the text to be processed comprises a plurality of characters;
determining a sentence vector corresponding to the text to be processed based on the text to be processed, and determining word vectors corresponding to a plurality of characters respectively based on the plurality of characters in the text to be processed;
and performing combined feature extraction on the sentence vector and the word vectors corresponding to the characters respectively to obtain a feature sentence vector corresponding to the text to be processed and a feature word vector corresponding to the characters in the text to be processed respectively.
In an optional implementation manner, the performing joint feature extraction on the sentence vector and the word vectors corresponding to the plurality of characters respectively includes:
and performing combined feature extraction on the sentence vectors and the word vectors corresponding to the characters respectively by using a pre-trained neural network to obtain feature sentence vectors corresponding to the text to be processed and feature word vectors corresponding to the characters in the text to be processed respectively.
In an optional implementation manner, the processing, by the attention mechanism, the feature sentence vector and the feature word vectors corresponding to the plurality of characters, to obtain the correlation information between the plurality of characters and the feature sentence vector, includes:
for each character in the plurality of characters, determining a distance between the feature word vector corresponding to that character and the feature sentence vector;
and determining correlation information between each character and the feature sentence vector based on the distance.
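The disclosure does not fix a particular distance measure; a minimal sketch, assuming cosine similarity as the attention-style score between each character's feature word vector and the feature sentence vector, could look like:

```python
import numpy as np

def correlation_scores(word_vecs: np.ndarray, sent_vec: np.ndarray) -> np.ndarray:
    """Per-character correlation with the sentence, here measured as cosine
    similarity between each feature word vector and the feature sentence
    vector (the concrete distance is an assumption, not fixed by the text)."""
    word_norms = np.linalg.norm(word_vecs, axis=1)
    sent_norm = np.linalg.norm(sent_vec)
    return (word_vecs @ sent_vec) / (word_norms * sent_norm)

# Toy vectors: 3 characters in a 4-dimensional feature space
words = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0]])
sent = np.array([1.0, 1.0, 0.0, 0.0])
scores = correlation_scores(words, sent)  # highest for the character closest to the sentence
```

Dot-product or scaled dot-product attention would fit the same interface; the choice only changes how "distance" maps to a score.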
In an alternative embodiment, the training the neural network comprises:
obtaining a sample text;
obtaining sample correlation degree information between characters in the sample text and sample sentence vectors of the sample text respectively by using a neural network to be trained;
determining the loss of the neural network to be trained based on the sample correlation degree information, and training the neural network to be trained based on the loss;
and obtaining the neural network through multi-round training of the neural network to be trained.
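The claims leave the loss and optimizer open; a minimal multi-round training sketch, assuming a logistic scorer and a binary cross-entropy loss over hypothetical per-character importance labels, might be:

```python
import numpy as np

def train(num_rounds: int = 100, lr: float = 0.5):
    """Multi-round training sketch. The linear scorer, synthetic labels, and
    BCE loss are illustrative assumptions; the disclosure only states that a
    loss is computed from sample correlation information and used to train."""
    rng = np.random.default_rng(0)
    dim = 4
    w = rng.normal(size=dim)                 # parameters of the network to be trained
    X = rng.normal(size=(16, dim))           # sample character feature vectors
    y = (X[:, 0] > 0).astype(float)          # hypothetical importance labels
    losses = []
    for _ in range(num_rounds):              # one "round" per iteration
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted correlation/importance
        losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))
        w -= lr * (X.T @ (p - y) / len(y))   # gradient step on the BCE loss
    return w, losses

w, losses = train()
```

In practice the "network" would be the joint feature extractor plus attention scorer, trained with a deep-learning framework rather than hand-written gradients.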
In an optional implementation manner, the obtaining, based on the information of the correlation between each of the plurality of characters and the feature sentence vector, importance degree information corresponding to each of the plurality of characters includes:
and normalizing the correlation information between the characters and the feature sentence vectors to obtain importance degree information corresponding to the characters.
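The claim only says "normalizing"; one common choice is a softmax over the correlation scores, sketched below:

```python
import numpy as np

def importance_weights(scores: np.ndarray) -> np.ndarray:
    """Normalise per-character correlation scores into importance weights
    that are non-negative and sum to 1. Softmax is an assumed choice;
    the claim does not name a specific normalisation."""
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

weights = importance_weights(np.array([2.0, 0.5, 1.0]))
```

Min-max scaling or L1 normalisation would also satisfy the claim wording; softmax has the convenient property of emphasising the highest-scoring characters.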
In an optional implementation manner, the obtaining, based on the information of the correlation between each of the plurality of characters and the feature sentence vector, importance degree information corresponding to each of the plurality of characters includes:
determining a target sentence vector representing the semantics of the text to be processed based on the relevancy information between the characters and the feature sentence vectors and the word vectors corresponding to the characters;
and determining importance degree information corresponding to the characters respectively based on the target sentence vector and the word vectors corresponding to the characters respectively.
In an optional implementation manner, the determining, based on the information of the correlation degree between each of the plurality of characters and the feature sentence vector and the word vector corresponding to each of the plurality of characters, a target sentence vector characterizing the semantics of the text to be processed includes:
for each character in the plurality of characters, determining an intermediate word vector corresponding to each character based on the correlation information between each character and the feature sentence vector;
and summing the intermediate word vectors corresponding to the characters respectively to obtain the target sentence vector.
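The two steps above can be sketched as follows, assuming the "intermediate word vector" is each character's word vector scaled by its correlation weight:

```python
import numpy as np

def target_sentence_vector(word_vecs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale each character's word vector by its correlation weight to get the
    intermediate word vectors, then sum them into the target sentence vector
    (the scaling rule is an assumption; the claim leaves it open)."""
    intermediate = word_vecs * weights[:, None]  # one intermediate vector per character
    return intermediate.sum(axis=0)

vecs = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
target = target_sentence_vector(vecs, np.array([0.75, 0.25]))
```

This is the standard attention-weighted sum: characters with higher correlation contribute more to the semantics of the target sentence vector.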
In an optional implementation manner, determining, based on the target sentence vector and the word vectors corresponding to the plurality of characters, importance degree information corresponding to the plurality of characters, respectively includes:
for each character in the plurality of characters, splicing the word vector corresponding to that character with the target sentence vector to obtain a target vector corresponding to that character;
and obtaining importance degree information corresponding to the characters by utilizing a probability model based on the target vectors corresponding to the characters respectively.
In an alternative embodiment, the probabilistic model includes at least one of:
conditional random field models, maximum entropy models, and markov models.
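The splicing step can be sketched as below; a plain logistic scorer stands in for the probability model (the claim names CRF, maximum-entropy, and Markov models, which are heavier than this illustration):

```python
import numpy as np

def target_vectors(word_vecs: np.ndarray, target_sent: np.ndarray) -> np.ndarray:
    """Splice each character's word vector with the target sentence vector
    to obtain one target vector per character."""
    n = word_vecs.shape[0]
    return np.concatenate([word_vecs, np.tile(target_sent, (n, 1))], axis=1)

def importance_from_targets(targets: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stand-in probability model: a logistic scorer over the target vectors
    (a hypothetical substitute for the CRF/maximum-entropy/Markov models)."""
    return 1.0 / (1.0 + np.exp(-(targets @ w)))

words = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
sent = np.array([0.5, 0.5])
targets = target_vectors(words, sent)  # shape (3, 4): word dims + sentence dims
```

A real CRF would additionally model dependencies between adjacent characters' labels; the per-character scorer here only shows the input the probability model would consume.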
In an optional embodiment, the method further comprises:
determining a target search term based on the importance degree information corresponding to the characters respectively;
and determining a reply text corresponding to the text to be processed based on the target search word.
In a second aspect, an embodiment of the present disclosure further provides a text processing apparatus, including:
the acquisition module is used for acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively;
the processing module is used for processing the feature sentence vector and the feature word vectors corresponding to the plurality of characters through an attention mechanism to obtain correlation information between each of the plurality of characters and the feature sentence vector;
and the first determining module is used for obtaining importance degree information corresponding to the characters respectively based on the relevance information between the characters and the feature sentence vectors respectively.
In an optional implementation manner, when acquiring a feature sentence vector of a text to be processed and a feature word vector corresponding to each of a plurality of characters in the text to be processed, the acquiring module is configured to:
acquiring a text to be processed; wherein the text to be processed comprises a plurality of characters;
determining a sentence vector corresponding to the text to be processed based on the text to be processed, and determining word vectors corresponding to a plurality of characters respectively based on the plurality of characters in the text to be processed;
and performing combined feature extraction on the sentence vector and the word vectors corresponding to the characters respectively to obtain a feature sentence vector corresponding to the text to be processed and a feature word vector corresponding to the characters in the text to be processed respectively.
In an optional implementation manner, when performing joint feature extraction on the sentence vector and the word vectors corresponding to the plurality of characters, the obtaining module is configured to:
and performing combined feature extraction on the sentence vectors and the word vectors corresponding to the characters respectively by using a pre-trained neural network to obtain feature sentence vectors corresponding to the text to be processed and feature word vectors corresponding to the characters in the text to be processed respectively.
In an optional implementation manner, when the feature sentence vector and the feature word vectors corresponding to the plurality of characters respectively are processed by an attention mechanism to obtain the correlation information between the plurality of characters and the feature sentence vector, the processing module is configured to:
for each character in the plurality of characters, determining a distance between the feature word vector corresponding to that character and the feature sentence vector;
and determining correlation information between each character and the feature sentence vector based on the distance.
In an optional embodiment, the system further comprises a model training module, configured to:
obtaining a sample text;
obtaining sample correlation degree information between characters in the sample text and sample sentence vectors of the sample text respectively by using a neural network to be trained;
determining the loss of the neural network to be trained based on the sample correlation degree information, and training the neural network to be trained based on the loss;
and obtaining the neural network through multi-round training of the neural network to be trained.
In an optional implementation manner, when obtaining importance degree information corresponding to each of the plurality of characters based on the correlation information between each of the plurality of characters and the feature sentence vector, the first determining module is configured to:
and normalizing the correlation information between the characters and the feature sentence vectors to obtain importance degree information corresponding to the characters.
In an optional implementation manner, when obtaining importance degree information corresponding to each of the plurality of characters based on the correlation information between each of the plurality of characters and the feature sentence vector, the first determining module is configured to:
determining a target sentence vector representing the semantics of the text to be processed based on the relevancy information between the characters and the feature sentence vectors and the word vectors corresponding to the characters;
and determining importance degree information corresponding to the characters respectively based on the target sentence vector and the word vectors corresponding to the characters respectively.
In an optional implementation manner, when determining a target sentence vector characterizing the semantic meaning of the text to be processed based on the relevancy information between the plurality of characters and the feature sentence vector and the word vectors corresponding to the plurality of characters, the first determining module is configured to:
for each character in the plurality of characters, determining an intermediate word vector corresponding to each character based on the correlation information between each character and the feature sentence vector;
and summing the intermediate word vectors corresponding to the characters respectively to obtain the target sentence vector.
In an optional implementation manner, the first determining module, when determining the importance degree information corresponding to each of the plurality of characters based on the target sentence vector and the word vectors corresponding to each of the plurality of characters, is configured to:
for each character in the plurality of characters, splicing the word vector corresponding to that character with the target sentence vector to obtain a target vector corresponding to that character;
and obtaining importance degree information corresponding to the characters by utilizing a probability model based on the target vectors corresponding to the characters respectively.
In an alternative embodiment, the probabilistic model includes at least one of:
conditional random field models, maximum entropy models, and markov models.
In an optional implementation manner, the apparatus further includes a second determining module, configured to:
determining a target search term based on the importance degree information corresponding to the characters respectively;
and determining a reply text corresponding to the text to be processed based on the target search word.
In a third aspect, an embodiment of the present disclosure further provides a computer device comprising a processor and a memory, where the memory stores machine-readable instructions executable by the processor; when the machine-readable instructions are executed by the processor, the processor performs the steps in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or any possible implementation of the first aspect.
For descriptions of the effects of the text processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the text processing method above, which is not repeated here.
In the embodiments of the present disclosure, a feature sentence vector of the text to be processed and feature word vectors corresponding to a plurality of characters in that text are obtained, and the feature sentence vector and feature word vectors are processed through an attention mechanism to obtain correlation information between each character and the feature sentence vector, from which importance information for each character is obtained. Because the importance of each character is derived from a feature sentence vector characterizing the whole text and feature word vectors characterizing the individual characters, the importance of each character can be obtained without relying on context-association constraints, with higher accuracy.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art may derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a text processing method provided by an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a specific method for obtaining a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed, according to an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a method of training a neural network provided by an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a specific method for determining sample relevancy information between characters in a sample text and sample sentence vectors of the sample text, respectively, according to an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a specific method for determining relevancy information between a plurality of characters in a text to be processed and a feature sentence vector according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a specific method for determining a target sentence vector characterizing the semantics of a text to be processed according to an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a specific method for determining importance level information corresponding to each of a plurality of characters according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a text processing apparatus provided by an embodiment of the present disclosure;
fig. 9 shows a schematic diagram of a computer device structure provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments, as generally described and illustrated herein, may be arranged and designed in a wide variety of configurations. Thus, the following detailed description is not intended to limit the scope of the disclosure as claimed, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the disclosure.
Research shows that screening and determining important words in sentences has wide application in many fields. For example, during information retrieval, the importance of each word in the retrieval text input by a user can be judged, and words of higher importance taken as keywords, so that retrieval results matching the user's requirements are obtained according to those keywords; or, in automatic question answering, the key points or intent of a user's question can be determined by judging the importance of each word in the question or answer information, and a reply text automatically matched for the user on that basis. Currently, when text is processed, a supervised sequence-labeling approach is usually adopted to analyze the importance of words in a sentence, for example a joint Bidirectional Long Short-Term Memory and Conditional Random Field model (Bi-LSTM + CRF) built on a Recurrent Neural Network (RNN) as the base model. The RNN processes the word vector of each character in turn, generating an intermediate vector that is then used when processing the next character's word vector; it therefore depends on good context-association constraints. For long text, or sentences with weak context-association constraints, the intermediate vector carried to a distant character has little semantic association with that character, so the importance analysis for such characters is inaccurate.
Based on this research, the present disclosure provides a text processing method in which a feature sentence vector characterizing the text to be processed and feature word vectors characterizing the plurality of characters in it are obtained; correlation information between each character and the text is determined from these vectors; and the importance of each character is obtained from that correlation information. In this way, the importance of each of the plurality of characters can be obtained without relying on context-association constraints, with higher accuracy.
The above-mentioned drawbacks are results obtained by the inventors after practice and careful study; therefore, the discovery of the above problems, and the solutions proposed below, should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, the text processing method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the method is generally a computer device with certain computing capability, for example: a terminal device such as User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device; or a server or other processing device. In some possible implementations, the text processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
In some possible implementations, the text processing method provided in the embodiments of the present disclosure is applicable to the internet field; in particular, in internet online education, it is applicable to scenarios where the online customer service of an online-education enterprise processes user session requests, which is not limited herein.
The following describes a text processing method provided in the embodiments of the present disclosure.
Referring to fig. 1, a flowchart of a text processing method provided in an embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:
S101: acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed;
S102: processing the feature sentence vector and the feature word vectors corresponding to the plurality of characters through an attention mechanism to obtain correlation information between each of the plurality of characters and the feature sentence vector;
S103: obtaining importance degree information corresponding to each of the plurality of characters based on the correlation information between the characters and the feature sentence vector.
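The three steps can be sketched end to end; dot-product attention and softmax normalisation below are assumptions, since the disclosure leaves both choices open:

```python
import numpy as np

def per_character_importance(word_vecs: np.ndarray, sent_vec: np.ndarray) -> np.ndarray:
    """S102: score each character's feature word vector against the feature
    sentence vector (dot product as the attention score, an assumption);
    S103: normalise the scores into per-character importance weights."""
    scores = word_vecs @ sent_vec        # correlation information
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()                   # importance information

# Toy stand-ins for the S101 feature vectors
words = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
sent = np.array([1.0, 1.0])
imp = per_character_importance(words, sent)  # largest weight on the third character
```

S101's feature vectors would come from the pre-trained joint feature extractor described later; any embedding of compatible dimensionality works for the sketch.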
In the embodiments of the present disclosure, a feature sentence vector of the text to be processed and feature word vectors corresponding to a plurality of characters in that text are obtained, and the feature sentence vector and feature word vectors are processed through an attention mechanism to obtain correlation information between each character and the feature sentence vector, from which importance information for each character is obtained. Because the importance of each character is derived from a feature sentence vector characterizing the whole text and feature word vectors characterizing the individual characters, the importance of each character can be obtained without relying on context-association constraints, with higher accuracy.
The following describes details of S101 to S103.
For the above S101, the text to be processed is, for example, a sentence, and the sentence may include one or more clauses. When the text to be processed includes a plurality of clauses, the clauses may be shorter segments obtained by performing sense-group segmentation on the sentence at punctuation marks such as commas and pause marks, and each clause may include one or more characters. Alternatively, the text to be processed may be divided not according to its punctuation marks but according to the expressed meaning of its characters.
In addition, the text to be processed can be regarded as a complete sentence, and the sentence division is not performed on the text.
Specifically, when the text processing method provided by the embodiments of the present disclosure is applied to different scenes, the texts to be processed differ accordingly. For example, when the text processing method is applied to information retrieval, a retrieval sentence input by a user may be used as the text to be processed: when a user, after browsing a relevant course on the official website of an online education enterprise, searches for a knowledge point of the mathematics discipline by inputting "what is a weighted sum operation?", the input question "what is a weighted sum operation?" is used as the text to be processed. When the text processing method is applied to customer service replies, dialogue information input by a user may be used as the text to be processed: when a user communicating with customer service about an online course sends the message "hello, is there a lesson today", that message is used as the text to be processed. Here, the text to be processed "hello, is there a lesson today" includes two clauses divided by the comma: "hello" and "is there a lesson today".
In some possible embodiments, to improve accuracy, symbols in the text to be processed, such as "?", may be retained; alternatively, the symbols may be converted into corresponding words, for example converting "?" into "do"; alternatively, questions, negations, and the like may be specially attended to in order to determine the user's true meaning. For example, "is there a lesson today?", "whether there is a lesson today", "there is a lesson today, right", and "isn't there a lesson today" may all be expressed as "is there a lesson today", which is not described in detail here.
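As a non-limiting sketch, the two symbol-handling options above (keeping symbols, or converting them into corresponding words) might be implemented as follows; the "?"-to-"do" mapping is only the illustrative example from this disclosure:

```python
def handle_symbols(text: str, keep_symbols: bool = True) -> str:
    """Either keep symbols in the text to be processed, or convert them
    into corresponding words. The mapping below is purely illustrative."""
    symbol_to_word = {"?": " do"}  # illustrative mapping from the example above
    if keep_symbols:
        return text
    for symbol, word in symbol_to_word.items():
        text = text.replace(symbol, word)
    return text
```

Either form of the text may then be passed on to the vector encoding described below.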
In addition, emotion analysis may be performed on the text to be processed and the analysis result used to form the final text to be processed. For example, the user's emotion, such as anger, is judged from the information input by the user, and after the user's emotional state is determined, the input information is supplemented with it to obtain the final text to be processed. For instance, "Without any reminder, I simply didn't know there was a lesson today, so I missed a lesson; how will you take responsibility!" may be supplemented as "angry; missed a lesson because there was no reminder", and the like, which is not limited in any way.
In one possible embodiment, the text to be processed is not limited to a single language, and may include, for example, Chinese, English, Japanese, Korean, or other languages, or a combination of languages. For example, the text to be processed may include "How to update my information".
In a possible implementation manner, the text to be processed may also be text information obtained by analyzing audio/video information, picture information, and the like, which is not described in detail herein.
Referring to fig. 2, an embodiment of the present disclosure provides a specific method for obtaining a feature sentence vector of a to-be-processed text and a feature word vector corresponding to each of a plurality of characters in the to-be-processed text, including:
s201: acquiring a text to be processed; wherein the text to be processed comprises a plurality of characters.
For example, in the case of applying the text processing method to information retrieval, a retrieval sentence input by a user in a retrieval box may be acquired; when the text processing method is applied to the customer service reply, the communication information sent by the user in the dialog box can be directly extracted, or the communication information sent by the user is extracted from a database for storing the communication information of the user. The specific obtaining method may be determined according to actual conditions, and is not described herein again.
At this time, the acquired text to be processed includes a plurality of characters. For example, when the text to be processed contains "hello, is there a lesson today", the two clauses contained in it, "hello" and "is there a lesson today", may be determined, as well as the plurality of characters "you", "good", "present", "day", "have", "lesson", and "do".
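The clause and character segmentation described above can be sketched as follows; the example uses the original Chinese sentence behind this disclosure's running example ("你好，今天有课吗", i.e. "hello, is there a lesson today"), and the punctuation set is an assumption for illustration:

```python
import re

def split_text(text: str):
    """Split the text to be processed into clauses (at commas and pause
    marks) and into its individual characters, as described for S201."""
    clauses = [c for c in re.split("[,，、]", text) if c]
    characters = [ch for ch in text if ch not in ",，、 "]
    return clauses, characters
```

For "你好，今天有课吗" this yields the two clauses and the seven characters used throughout the examples below.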
S202: the method comprises the steps of determining a sentence vector corresponding to a text to be processed based on the text to be processed, and determining word vectors corresponding to a plurality of characters respectively based on the plurality of characters in the text to be processed.
When determining a sentence vector corresponding to the text to be processed and word vectors corresponding to a plurality of characters in the text to be processed, vector encoding may, for example, be performed on the text to be processed and on the plurality of characters to determine the sentence vector and the plurality of word vectors, where the obtained sentence vector and word vectors have the same dimension. The vector encoding includes at least one of: One-Hot Encoding and Label Encoding (Label Encoder, LE).
Illustratively, when the text to be processed contains "hello, is there a lesson today", vector encoding may be used to determine the sentence vectors corresponding to the two clauses "hello" and "is there a lesson today" contained in the text, which may be expressed, for example, as CLS_1 and CLS_2. For the sentence vector CLS_1 corresponding to the clause "hello", the corresponding characters include "you" and "good", and the word vectors corresponding to these characters may be represented, for example, as E_1 and E_2; for the sentence vector CLS_2 corresponding to the clause "is there a lesson today", the corresponding characters include "present", "day", "have", "lesson", and "do", and the word vectors corresponding to these characters may be represented, for example, as E_3, E_4, ..., E_7.
In addition, "hello, is there a lesson today" may be treated directly as a single clause, that is, as one complete character string. In this case, the sentence vector of the clause may be expressed as CLS_3, the characters corresponding to the clause include "you", "good", "present", "day", "have", "lesson", and "do", and the word vectors corresponding to these characters may be represented, for example, as E_1, E_2, ..., E_7.
At this time, the unstructured and non-computable text data can be converted into structured and computable vector data, which is beneficial for further calculation processing of the text to be processed.
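One-Hot Encoding, one of the schemes named above, can be sketched as follows; this is a minimal illustration, not the disclosure's exact encoder:

```python
import numpy as np

def one_hot_encode(characters):
    """One-Hot vector encoding: each distinct character maps to a basis
    vector whose dimension is the vocabulary size, turning text into
    structured, computable vector data."""
    vocab = sorted(set(characters))
    index = {ch: i for i, ch in enumerate(vocab)}
    vectors = np.zeros((len(characters), len(vocab)))
    for pos, ch in enumerate(characters):
        vectors[pos, index[ch]] = 1.0
    return vectors
```

Each row contains exactly one 1, so the encoding is trivially invertible, at the cost of carrying no semantic similarity between characters (which is why the joint feature extraction below is needed).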
S203: and performing combined feature extraction on the sentence vector and the word vectors corresponding to the characters respectively to obtain a feature sentence vector corresponding to the text to be processed and a feature word vector corresponding to the characters in the text to be processed respectively.
When the sentence vector and the word vectors corresponding to a plurality of characters are subjected to the joint feature extraction, for example, the following method may be adopted: and performing combined feature extraction on the sentence vectors and the word vectors corresponding to the characters respectively by using a pre-trained neural network to obtain feature sentence vectors corresponding to the text to be processed and feature word vectors corresponding to the characters in the text to be processed respectively.
The neural network may include, for example, at least one of: Bidirectional Encoder Representations from Transformers (BERT), the Global Vectors for Word Representation model (GloVe), and the word-vector generation model Word2Vec (Word to Vector).
When joint feature extraction is performed on the sentence vector and the word vectors corresponding to the plurality of characters, the obtained feature sentence vector is influenced by the sentence vector and by the word vectors of all characters, so that it contains not only the overall features of the text to be processed but also the character features of every character. Similarly, each obtained feature word vector is influenced by its own word vector, by the word vectors of the other characters, and by the sentence vector, so that in addition to the features of its corresponding character it also contains the features of the other characters and the overall features of the text to be processed. In this way, associations among different characters, and between the characters and the text to be processed, are established.
Then, when the feature sentence vector and the feature word vectors corresponding to the plurality of characters are processed through the attention mechanism to obtain the relevancy information between the plurality of characters and the feature sentence vector, the obtained relevancy information has higher accuracy, and accordingly the importance degree information of each character obtained based on the relevancy information also has higher accuracy.
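The "every output depends on every input" property described above can be illustrated with a minimal, weight-free self-attention sketch. This is not the disclosure's trained network (e.g. BERT uses learned projections and many layers); it only shows the mixing mechanism:

```python
import numpy as np

def joint_feature_extraction(sentence_vector, word_vectors):
    """Stack the sentence vector and word vectors into one sequence and let
    each row attend to every row, so the feature sentence vector mixes in
    every character's features and each feature word vector mixes in the
    overall text features."""
    x = np.vstack([sentence_vector, word_vectors])
    scores = x @ x.T / np.sqrt(x.shape[1])                 # pairwise similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                      # row-wise softmax
    out = w @ x                                            # every row mixes all rows
    return out[0], out[1:]            # feature sentence vector, feature word vectors
```

Because every softmax weight is strictly positive, every output vector provably receives a contribution from every input vector, which is exactly the association property the text describes.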
Referring to fig. 3, an embodiment of the present disclosure further provides a method for training a neural network, including:
s301: sample text is obtained.
For example, the sample text may include text information collected in the network, such as comment information under a web lesson learning software; alternatively, stored historical communication information of the user may be included. The specific method for obtaining the sample text can be determined according to actual needs, and is not described herein again.
When the sample text contains only one character, that character can either be directly determined to be an important character, or its importance cannot be judged in isolation; therefore, when determining the sample text, the sample text includes at least two characters.
For example, in the case where the sample text includes "ask for a question, how to renew a fee", two clauses "ask for a question" and "how to renew a fee" included in the sample text may be determined; and a plurality of sample characters "please", "ask", "like", "what", "continue", and "fee".
S302: and obtaining sample correlation degree information between characters in the sample text and sample sentence vectors of the sample text respectively by using the neural network to be trained.
Referring to fig. 4, an embodiment of the present disclosure provides a specific method for determining sample relevancy information between characters in a sample text and sample sentence vectors of the sample text, where the method includes:
s401: based on the sample text and a plurality of sample characters in the sample text, a sample sentence vector corresponding to the sample text and a sample word vector corresponding to the characters in the sample text respectively can be determined by using vector coding.
For example, when the sample text includes "ask for a question, how to renew a fee", vector encoding may be used to determine the sample sentence vectors corresponding to the two clauses "ask for a question" and "how to renew a fee" contained in the sample text, which may be expressed, for example, as cls_1 and cls_2; and the sample word vectors corresponding to the plurality of sample characters "please", "ask", "like", "what", "continue", and "fee" contained in the sample text, which may be represented as e_1, e_2, ..., e_6.
S402: based on a sample sentence vector corresponding to the sample text and sample word vectors corresponding to at least two characters respectively, determining a sample characteristic sentence vector corresponding to the sample sentence vector and sample characteristic word vectors corresponding to the sample characters respectively by using a neural network to be trained.
Illustratively, when the sample text includes "ask for a question, how to renew a fee", based on the sample sentence vectors cls_1 and cls_2 and the sample word vectors e_1, e_2, ..., e_6, the neural network to be trained may extract sample features from the sample sentence vectors and sample word vectors in a parallel-processing manner, determining the sample feature sentence vectors corresponding to the sample sentence vectors, which may be expressed, for example, as cls_1' and cls_2'; and the sample feature word vectors respectively corresponding to the sample characters, which may be represented, for example, as o_1, o_2, ..., o_6.
When the neural network to be trained is used to determine the sample feature sentence vectors and the plurality of sample feature word vectors, any sample feature sentence vector or sample feature word vector is obtained by synthesizing all sample sentence vectors and all sample word vectors of the sample text; that is, each of them contains the features of all clauses and characters in the sample text. Taking the sample feature word vector o_1 as an example: when the neural network to be trained determines the sample feature word vector o_1 corresponding to the sample word vector e_1, the neural network simultaneously synthesizes the vector features of all the other sample sentence vectors cls_1 and cls_2 and sample word vectors e_2, e_3, ..., e_6.
At this time, since any sample feature sentence vector and sample feature word vector obtained both contain the features of all clauses and characters in the sample text, the influence of the context semantic association degree and the sentence length in the sentence can be reduced.
S403: and processing sample characteristic word vectors corresponding to the at least two characters respectively through an attention mechanism to obtain sample correlation degree information between the at least two characters and the characteristic sentence vectors respectively.
In one possible implementation, an attention model (Attention) may be utilized to process the sample feature word vectors corresponding to the at least two characters, determining the sample relevancy information between the at least two characters and the feature sentence vectors. The at least two characters may be obtained by a ranking method; for example, the ranking method includes a pairwise ranking method (Pairwise Ranking, PR).
At this time, the sample text may be represented as s, and the relative importance between two given characters in the sample text s may be predetermined; that is, for the two given characters, one may be determined to be more important than the other. The two characters may be represented, for example, as ti+ and tj-, where ti and tj denote two different characters, "+" denotes the higher importance, and "-" denotes the lower importance.
In this case, the sample correlation information obtained by processing the sample feature word vector by the attention mechanism may be represented as a (s, ti +) and a (s, tj-), for example.
Receiving the above S302, the method for training a neural network further includes:
s303: and determining the loss of the neural network to be trained based on the sample correlation degree information, and training the neural network to be trained based on the loss.
In one possible embodiment, the loss of the neural network to be trained can be determined, for example, using the following equation (1):
L(s, ti+, tj-) = exp(a(s, ti+)) / (exp(a(s, ti+)) + exp(a(s, tj-)))    (1)
where L(s, ti+, tj-) represents the loss term of the neural network to be trained for the character pair. By equation (1), L(s, ti+, tj-) always lies in (0, 1): when the neural network scores ti+ higher than tj-, i.e. a(s, ti+) > a(s, tj-), L(s, ti+, tj-) is greater than 0.5, representing that the network's judgment of the characters' importance is correct; when the network scores ti+ lower than tj-, L(s, ti+, tj-) is less than 0.5, representing that the network's judgment of the characters' importance is wrong.
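Equation (1) can be transcribed directly; it is a two-way softmax over the pair's attention scores. (In practice, pairwise ranking objectives of this family typically minimize the negative log of this quantity, but the code below follows the formula exactly as given.)

```python
import math

def pair_loss(a_pos: float, a_neg: float) -> float:
    """Equation (1): a softmax over the attention scores a(s, ti+) and
    a(s, tj-) of a character pair. The value always lies in (0, 1) and
    exceeds 0.5 exactly when the network scores ti+ above tj-."""
    return math.exp(a_pos) / (math.exp(a_pos) + math.exp(a_neg))
```

Note the symmetry pair_loss(x, y) + pair_loss(y, x) = 1: swapping the pair flips a correct judgment into a wrong one.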
S304: and obtaining the neural network through multi-round training of the neural network to be trained.
In a possible embodiment, the neural network to be trained may be adjusted and optimized according to the determined loss, where the direction of optimization is to make the loss L(s, ti+, tj-) correspond to a correct importance judgment for as many character pairs as possible. The neural network may then be trained for multiple rounds with a plurality of sample texts to obtain the trained neural network; the detailed process of the multi-round training is not repeated here.
For example, when the text to be processed contains "is there a lesson today", joint feature extraction may be performed on the corresponding sentence vector CLS_2 and word vectors E_3, E_4, ..., E_7 to obtain the feature sentence vector corresponding to the text to be processed, which may be represented, for example, as CLS_2', and the feature word vectors corresponding to the plurality of characters in the text to be processed, which may be represented, for example, as O_3, O_4, ..., O_7. The manner of this joint feature extraction is similar to that performed on the sample sentence vectors and sample word vectors in S402, and is not described here again.
After the neural network is obtained through the training in the process, after the text to be processed is obtained, firstly, the text to be processed is converted into a sentence vector and a word vector, and then the sentence vector and the word vector of the text to be processed are input into the neural network, so that a feature sentence vector of the text to be processed and a feature word vector corresponding to a plurality of characters in the text to be processed are obtained.
For the above step S102, after the feature sentence vector corresponding to the text to be processed and the feature word vectors corresponding to the characters in the text to be processed are obtained in step S101, the feature sentence vector and the feature word vectors corresponding to the characters can be processed by the attention mechanism, so as to obtain the correlation information between the characters and the feature sentence vector.
Referring to fig. 5, an embodiment of the present disclosure further provides a specific method for determining relevance information between a plurality of characters in a text to be processed and a feature sentence vector, where the specific method includes:
s501: and determining the distance between the characteristic word vector and the characteristic sentence vector corresponding to each character for each character in the plurality of characters.
Wherein, the distance between the characteristic word vector corresponding to each character and the characteristic sentence vector comprises at least one of the following: euclidean distance, mahalanobis distance, hamming distance, and manhattan distance. In the case of determining the distance between the feature word vector corresponding to each character and the feature sentence vector, a corresponding distance calculation method may be determined, which is not described herein again.
S502: based on the distance, relevancy information between each character and the feature sentence vector is determined.
In specific implementation, taking the Euclidean distance between the feature word vector of each character and the feature sentence vector as an example: when the Euclidean distance between any character and the feature sentence vector is large, the relevancy between that character and the feature sentence vector is low; when the Euclidean distance is small, the relevancy between that character and the feature sentence vector is high.
In one possible implementation, the relevancy information between each character and the feature sentence vector may be determined based on the relative magnitudes of the Euclidean distances between the feature word vectors and the feature sentence vector. For example, taking the maximum of these Euclidean distances as a metric, the proportional relationship between each character's Euclidean distance and this metric may be determined as the relevancy information between that character and the feature sentence vector.
For example, when the text to be processed contains "hello, is there a lesson today", the obtained relevancy information between each character in the text to be processed and the feature sentence vector may be represented as R_1, R_2, ..., R_7.
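Steps S501-S502 can be sketched as follows under one reading of the normalization described above (relevancy as one minus the distance's proportion of the maximum distance; the exact mapping is an assumption):

```python
import numpy as np

def relevancy_from_distance(feature_word_vectors, feature_sentence_vector):
    """Compute the Euclidean distance between each feature word vector and
    the feature sentence vector, then normalize against the maximum
    distance so that a smaller distance yields a higher relevancy."""
    dists = np.linalg.norm(feature_word_vectors - feature_sentence_vector, axis=1)
    return 1.0 - dists / dists.max()   # in [0, 1]; the farthest character gets 0
```

Any of the other distances named above (Mahalanobis, Hamming, Manhattan) could be substituted for the Euclidean distance without changing the rest of the pipeline.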
For the above S103, the importance degree information may include, for example, numerical values representing importance degrees, such as 0.1 and 0.4, where 0.1 represents a lower importance degree than 0.4; alternatively, it may include a series of levels characterizing the importance degree, e.g., 1, 2, 3.
When the importance degree information includes levels representing importance, the meaning of the levels may be preset. For example, when the importance levels include 1, 2, and 3, a larger level may be set to represent a higher importance degree, i.e., importance information of 3 represents higher importance; alternatively, a larger level may be set to represent a lower importance degree, i.e., importance information of 3 represents lower importance.
In another embodiment, the value of the degree of correlation may be directly determined as the importance degree information.
In another embodiment, a plurality of relevancy intervals may be predetermined, each relevancy interval corresponding to an importance level value; when the relevance between a certain feature word vector and a feature sentence vector falls into a relevance interval, determining the importance degree value corresponding to the relevance interval as the importance degree information of the feature word vector.
For the above S103, when determining the importance degree information corresponding to each of the plurality of characters based on the relevancy information between the plurality of characters and the feature sentence vector, for example and without limitation, at least one of the following (1) and (2) may be adopted:
(1): and normalizing the correlation information between the characters and the characteristic sentence vectors to obtain importance degree information corresponding to the characters.
In one possible embodiment, for example, a normalization function (Softmax) may be used to normalize the correlation information between each of the plurality of characters and the feature sentence vector, and obtain importance information of the plurality of characters. In this case, the obtained importance level information of the plurality of characters includes a plurality of numerical values corresponding to the respective characters.
For example, when the text to be processed is "hello, is there a lesson today", the obtained importance information may assign, for example, larger values (such as 0.2) to the characters of "today" and "have lesson" and smaller values (such as 0.05 or 0.1) to the remaining characters, representing that the most important words in the text to be processed are "today" and "have lesson".
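Option (1), the Softmax normalization, can be sketched directly:

```python
import numpy as np

def importance_by_softmax(relevancy):
    """Normalize the relevancy information of all characters with Softmax
    so the resulting importance values are positive and sum to 1."""
    e = np.exp(relevancy - np.max(relevancy))   # subtract max for numerical stability
    return e / e.sum()
```

The ordering of the characters is preserved: the character with the highest relevancy receives the highest importance value.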
(2): determining a target sentence vector representing the semantics of the text to be processed based on the relevancy information between the characters and the characteristic sentence vector and the word vectors corresponding to the characters; and determining importance degree information corresponding to the characters respectively based on the target sentence vector and the word vectors corresponding to the characters respectively.
Referring to fig. 6, an embodiment of the present disclosure provides a specific method for determining a target sentence vector representing semantics of a text to be processed, including:
s601: and for each character in the plurality of characters, determining an intermediate word vector corresponding to each character based on the correlation information between each character and the characteristic sentence vector.
In a possible implementation, when the relevancy information between each character in the text to be processed and the feature sentence vector is R_1, R_2, ..., R_7 and the word vectors corresponding to the characters are E_1, E_2, ..., E_7, the word vector corresponding to each character may be weighted by the relevancy information of that character to obtain the intermediate word vector corresponding to each character, which may be represented, for example, as C_1, C_2, ..., C_7. The intermediate word vector C_i (i ∈ [1, 7]) corresponding to any character may be obtained, for example, by the following formula (2):

C_i = R_i · E_i    (2)
s602: and summing the intermediate word vectors corresponding to the plurality of characters respectively to obtain a target sentence vector.
After the intermediate word vectors corresponding to the plurality of characters are determined, they may be summed to obtain the target sentence vector representing the semantics of the text to be processed, which may be represented, for example, as CLS_aim.
For example, when the target sentence vector is represented by a matrix of the intermediate word vectors, the target sentence vector corresponding to the text to be processed "hello, is there a lesson today" may be expressed as the following formula (3):

CLS_aim = [C_1, C_2, ..., C_7]    (3)
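Formulas (2) and (3), with the summation of S602, can be sketched in a few lines:

```python
import numpy as np

def target_sentence_vector(relevancy, word_vectors):
    """Weight each word vector E_i by its relevancy R_i to get the
    intermediate word vector C_i = R_i * E_i (formula (2)), then sum the
    intermediate vectors into the target sentence vector CLS_aim."""
    intermediate = relevancy[:, None] * word_vectors   # C_i = R_i * E_i
    return intermediate.sum(axis=0)                    # CLS_aim
```

Characters with higher relevancy thus contribute proportionally more to the semantics captured by CLS_aim.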
in the case of determining the target sentence vector, importance degree information corresponding to each of the plurality of characters may be determined.
Referring to fig. 7, an embodiment of the present disclosure provides a specific method for determining importance degree information corresponding to a plurality of characters based on a target sentence vector and word vectors corresponding to the plurality of characters, where the specific method includes:
s701: splicing a word vector and a target sentence vector corresponding to each character to obtain a target vector corresponding to each character;
s702: and obtaining importance degree information corresponding to the characters by utilizing a probability model based on the target vectors corresponding to the characters respectively.
After the target sentence vector is determined, the word vector corresponding to each character may be spliced with the target sentence vector to obtain the target vector corresponding to each character; for example, the target vector corresponding to the i-th character may be represented as the concatenation of the word vector E_i and the target sentence vector CLS_aim.
When the probability model is used to determine the corresponding importance degree information from the target vectors of the plurality of characters, the probability model includes at least one of: a Conditional Random Field model (CRF), a Maximum Entropy Markov Model (MEMM), and a Markov Model (MM). In this case, the result output by the probability model may include, for example, importance levels 1, 2, and 3.
For example, when a larger preset importance level represents higher importance and the text to be processed is "hello, is there a lesson today", the obtained importance information may assign, for example, level 3 to the characters of "today" and "have lesson" and levels 1 or 2 to the remaining characters, representing that the more important words in the text to be processed are "today" and "have lesson".
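S701-S702 can be sketched as follows. The disclosure feeds the spliced vectors to a probability model such as a CRF; here a simple similarity score with fixed thresholds stands in for that model, purely for illustration:

```python
import numpy as np

def importance_levels(word_vectors, target_vector):
    """Splice each character's word vector with the target sentence vector
    (S701), then map each character to an importance level in {1, 2, 3}
    (S702). The threshold rule is a toy stand-in for the probability model."""
    n = len(word_vectors)
    spliced = np.hstack([word_vectors, np.tile(target_vector, (n, 1))])
    scores = word_vectors @ target_vector
    span = scores.max() - scores.min()
    norm = (scores - scores.min()) / (span if span > 0 else 1.0)  # scale to [0, 1]
    levels = 1 + (norm > 1 / 3).astype(int) + (norm > 2 / 3).astype(int)
    return spliced, levels
```

A real CRF or MEMM would additionally model transitions between adjacent characters' levels, which the per-character thresholds here do not.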
After determining the more important words in the text to be processed, the more important words can be used as the target search words, and the reply text corresponding to the text to be processed is determined by using the target search words.
For example, when it is determined that the more important words in the text to be processed "hello, is there a lesson today" are "today" and "have lesson", then "today" and "have lesson" may be used as the target search terms. A standard question with a high matching degree with the target search terms, such as "today's lesson arrangement", may then be determined from a preset standard question-and-answer library as the sentence closest to the text to be processed, and the answer corresponding to the standard question "today's lesson arrangement" in that library, for example "there is a math lesson today", is used as the reply text corresponding to the text to be processed.
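The matching step above can be sketched with a toy scoring rule (term-containment count); both the library contents and the scoring rule are hypothetical illustrations, not the disclosure's matching algorithm:

```python
def reply_for(search_terms, qa_library):
    """Score each standard question in a question->answer library by how
    many target search terms it contains, and return the answer of the
    best-matching standard question, or None when nothing matches."""
    def score(question):
        return sum(term in question for term in search_terms)
    best = max(qa_library, key=score)
    return qa_library[best] if score(best) > 0 else None
```

A production system would typically use a semantic similarity score between the target search terms and the standard questions rather than literal containment.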
After the reply text is determined, the reply text can be sent to the terminal equipment used by the user, so that the system can automatically match the reply text for the user after the user sends a retrieval sentence or communication information, and the user can receive the reply text replied by the system.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, a text processing apparatus corresponding to the text processing method is also provided in the embodiments of the present disclosure, and because the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the text processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 8, a schematic diagram of a text processing apparatus provided in an embodiment of the present disclosure is shown, where the apparatus includes: an acquisition module 81, a processing module 82, and a first determination module 83; wherein:
the obtaining module 81 is configured to obtain a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed, respectively;
the processing module 82 is configured to process the feature sentence vector and the feature word vectors corresponding to the multiple characters, respectively, through an attention mechanism, so as to obtain correlation information between the multiple characters and the feature sentence vector, respectively;
the first determining module 83 is configured to obtain importance degree information corresponding to each of the plurality of characters based on the correlation information between each of the plurality of characters and the feature sentence vector.
In an optional implementation manner, when acquiring a feature sentence vector of a text to be processed and a feature word vector corresponding to each of a plurality of characters in the text to be processed, the acquiring module 81 is configured to:
acquiring a text to be processed; wherein the text to be processed comprises a plurality of characters;
determining a sentence vector corresponding to the text to be processed based on the text to be processed, and determining word vectors corresponding to a plurality of characters respectively based on the plurality of characters in the text to be processed;
and performing combined feature extraction on the sentence vector and the word vectors corresponding to the characters respectively to obtain a feature sentence vector corresponding to the text to be processed and a feature word vector corresponding to the characters in the text to be processed respectively.
In an optional implementation manner, when performing joint feature extraction on the sentence vector and the word vectors corresponding to the multiple characters, the obtaining module 81 is configured to:
and performing combined feature extraction on the sentence vectors and the word vectors corresponding to the characters respectively by using a pre-trained neural network to obtain feature sentence vectors corresponding to the text to be processed and feature word vectors corresponding to the characters in the text to be processed respectively.
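A minimal sketch of what such joint extraction can look like (the single shared tanh layer below is a toy stand-in for the pre-trained neural network, not the disclosed architecture; all shapes and values are illustrative):

```python
import numpy as np

def joint_feature_extract(sent_vec, word_vecs, weight):
    """Run the sentence vector and the word vectors through one shared
    layer, so the sentence and character features are extracted jointly."""
    stacked = np.vstack([sent_vec, word_vecs])   # sentence row first
    feats = np.tanh(stacked @ weight)            # shared transformation
    return feats[0], feats[1:]                   # feature sentence vector, feature word vectors

rng = np.random.default_rng(0)
sent_vec = rng.normal(size=4)
word_vecs = rng.normal(size=(3, 4))              # 3 characters, 4-dim word vectors
weight = rng.normal(size=(4, 4))
feat_sent, feat_words = joint_feature_extract(sent_vec, word_vecs, weight)
```

Because the sentence vector and the word vectors pass through the same layer, the resulting feature sentence vector and feature word vectors live in a common space, which is what makes the later distance comparison meaningful.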
In an optional implementation manner, when the feature sentence vector and the feature word vectors corresponding to the plurality of characters respectively are processed by the attention mechanism to obtain the correlation information between the plurality of characters and the feature sentence vector, the processing module 82 is configured to:
for each character in the plurality of characters, determining a distance between the feature word vector corresponding to that character and the feature sentence vector;
and determining correlation information between each character and the feature sentence vector based on the distance.
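One simple way to realize this distance-based scoring (the dot product below is just one common attention-scoring choice; the disclosure does not fix a particular distance measure):

```python
import numpy as np

def correlation_scores(feat_word_vecs, feat_sent_vec):
    """Correlation of each character with the sentence, taken here as the
    dot product between its feature word vector and the feature sentence
    vector."""
    return feat_word_vecs @ feat_sent_vec

# Toy example: 3 characters with 2-dim feature word vectors.
feat_word_vecs = np.array([[2.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
feat_sent_vec = np.array([1.0, 1.0])
scores = correlation_scores(feat_word_vecs, feat_sent_vec)  # one score per character
```

A Euclidean or cosine distance could be substituted in the same place; only the scoring function changes.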
In an alternative embodiment, the apparatus further includes a model training module 84, configured to:
obtaining a sample text;
obtaining sample correlation degree information between characters in the sample text and sample sentence vectors of the sample text respectively by using a neural network to be trained;
determining the loss of the neural network to be trained based on the sample correlation degree information, and training the neural network to be trained based on the loss;
and obtaining the neural network through multi-round training of the neural network to be trained.
In an optional implementation manner, when obtaining the importance degree information corresponding to each of the plurality of characters based on the correlation information between each of the plurality of characters and the feature sentence vector, the first determining module 83 is configured to:
and normalizing the correlation information between the characters and the feature sentence vectors to obtain importance degree information corresponding to the characters.
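A sketch of the normalization step, assuming a softmax (the disclosure only says "normalizing"; softmax is one standard option that yields positive weights summing to 1):

```python
import numpy as np

def importance_weights(scores):
    """Normalize correlation scores into importance weights via softmax;
    the maximum score is subtracted first for numerical stability."""
    shifted = np.exp(scores - np.max(scores))
    return shifted / shifted.sum()

weights = importance_weights(np.array([2.0, 1.0, 0.1]))
```

The resulting weights preserve the ordering of the raw correlation scores, so the character with the highest correlation to the sentence receives the highest importance.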
In an optional implementation manner, when obtaining the importance degree information corresponding to each of the plurality of characters based on the correlation information between each of the plurality of characters and the feature sentence vector, the first determining module 83 is configured to:
determining a target sentence vector representing the semantics of the text to be processed based on the relevancy information between the characters and the feature sentence vectors and the word vectors corresponding to the characters;
and determining importance degree information corresponding to the characters respectively based on the target sentence vector and the word vectors corresponding to the characters respectively.
In an optional implementation manner, when determining the target sentence vector characterizing the semantic meaning of the text to be processed based on the relevancy information between the plurality of characters and the feature sentence vector and the word vectors corresponding to the plurality of characters, the first determining module 83 is configured to:
for each character in the plurality of characters, determining an intermediate word vector corresponding to each character based on the correlation information between each character and the feature sentence vector;
and summing the intermediate word vectors corresponding to the characters respectively to obtain the target sentence vector.
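The two steps above — scale each character's word vector by its correlation weight to obtain the intermediate word vectors, then sum them — can be sketched as follows (toy shapes and values):

```python
import numpy as np

def target_sentence_vector(word_vecs, weights):
    """Intermediate word vectors are the word vectors scaled by each
    character's correlation weight; their sum is the target sentence
    vector representing the semantics of the text."""
    intermediate = word_vecs * weights[:, None]  # broadcast weight per row
    return intermediate.sum(axis=0)

word_vecs = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
weights = np.array([0.75, 0.25])
target_vec = target_sentence_vector(word_vecs, weights)
```

This is the standard attention-pooling pattern: the target sentence vector leans toward the word vectors of the characters judged most relevant.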
In an optional implementation manner, the first determining module 83, when determining the importance degree information corresponding to each of the plurality of characters based on the target sentence vector and the word vectors corresponding to each of the plurality of characters, is configured to:
for each character in the characters, splicing the word vector corresponding to each character with the target sentence vector to obtain a target vector corresponding to each character;
and obtaining importance degree information corresponding to the characters by utilizing a probability model based on the target vectors corresponding to the characters respectively.
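The splicing step can be sketched as follows (the per-character target vectors would then feed the probability model, e.g. a conditional random field; the model scoring itself is not shown):

```python
import numpy as np

def splice_target_vectors(word_vecs, target_vec):
    """Concatenate each character's word vector with the shared target
    sentence vector, giving one target vector per character."""
    repeated = np.tile(target_vec, (word_vecs.shape[0], 1))
    return np.hstack([word_vecs, repeated])

word_vecs = np.zeros((3, 4))   # 3 characters, 4-dim word vectors
target_vec = np.ones(4)        # 4-dim target sentence vector
target_vectors = splice_target_vectors(word_vecs, target_vec)  # shape (3, 8)
```

Each spliced vector thus carries both the character's own representation and the sentence-level context.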
In an alternative embodiment, the probabilistic model includes at least one of:
conditional random field models, maximum entropy models, and Markov models.
In an optional implementation manner, the apparatus further includes a second determining module 85, configured to:
determining a target search term based on the importance degree information corresponding to the characters respectively;
and determining a reply text corresponding to the text to be processed based on the target search word.
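A hypothetical selection rule for the first step — keep the most important terms as the target search terms (the disclosure does not fix the rule; top-k by importance is used here purely for illustration):

```python
def pick_search_terms(terms, importance, k=2):
    """Return the k terms with the highest importance weights."""
    ranked = sorted(zip(terms, importance), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:k]]

terms = ["how", "are", "you", "today", "lesson"]
importance = [0.05, 0.05, 0.05, 0.45, 0.40]
search_terms = pick_search_terms(terms, importance)
```

An importance threshold could be used instead of a fixed k; either way, the selected terms then drive the question-and-answer library lookup described earlier.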
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 9, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and the computer device includes:
a processor 91 and a memory 92; the memory 92 stores machine-readable instructions executable by the processor 91, and the processor 91 is configured to execute the machine-readable instructions stored in the memory 92; when the machine-readable instructions are executed, the processor 91 performs the following steps:
acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively;
processing the characteristic sentence vectors and the characteristic word vectors corresponding to the characters respectively through an attention mechanism to obtain correlation degree information between the characters and the characteristic sentence vectors respectively;
and obtaining importance degree information corresponding to the characters respectively based on the relevance information between the characters and the feature sentence vectors respectively.
The memory 92 includes an internal memory 921 and an external memory 922. The internal memory 921 temporarily stores operation data of the processor 91 and data exchanged with the external memory 922, such as a hard disk; the processor 91 exchanges data with the external memory 922 through the internal memory 921.
The specific execution process of the instruction may refer to the steps of the text processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the text processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the text processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described here again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only one kind of logical division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and should be covered within its scope of protection. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (14)

1. A method of text processing, comprising:
acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively;
processing the characteristic sentence vectors and the characteristic word vectors corresponding to the characters respectively through an attention mechanism to obtain correlation degree information between the characters and the characteristic sentence vectors respectively;
and obtaining importance degree information corresponding to the characters respectively based on the relevance information between the characters and the feature sentence vectors respectively.
2. The method according to claim 1, wherein the obtaining a feature sentence vector of the text to be processed and a feature word vector corresponding to each of a plurality of characters in the text to be processed comprises:
acquiring a text to be processed; wherein the text to be processed comprises a plurality of characters;
determining a sentence vector corresponding to the text to be processed based on the text to be processed, and determining word vectors corresponding to a plurality of characters respectively based on the plurality of characters in the text to be processed;
and performing combined feature extraction on the sentence vector and the word vectors corresponding to the characters respectively to obtain a feature sentence vector corresponding to the text to be processed and a feature word vector corresponding to the characters in the text to be processed respectively.
3. The method of claim 2, wherein the performing joint feature extraction on the sentence vector and the word vectors corresponding to the characters respectively comprises:
and performing combined feature extraction on the sentence vectors and the word vectors corresponding to the characters respectively by using a pre-trained neural network to obtain feature sentence vectors corresponding to the text to be processed and feature word vectors corresponding to the characters in the text to be processed respectively.
4. The method according to claim 1, wherein the processing the feature sentence vector and the feature word vectors corresponding to the characters, respectively, by an attention mechanism to obtain the correlation information between the characters and the feature sentence vector, respectively, comprises:
for each character in the plurality of characters, determining a distance between the feature word vector corresponding to that character and the feature sentence vector;
and determining correlation information between each character and the feature sentence vector based on the distance.
5. The method of claim 3, wherein the training the neural network comprises:
obtaining a sample text;
obtaining sample correlation degree information between characters in the sample text and sample sentence vectors of the sample text respectively by using a neural network to be trained;
determining the loss of the neural network to be trained based on the sample correlation degree information, and training the neural network to be trained based on the loss;
and obtaining the neural network through multi-round training of the neural network to be trained.
6. The method according to any one of claims 1 to 5, wherein obtaining importance degree information corresponding to each of the plurality of characters based on the relevance degree information between each of the plurality of characters and the feature sentence vector comprises:
and normalizing the correlation information between the characters and the feature sentence vectors to obtain importance degree information corresponding to the characters.
7. The method according to any one of claims 1 to 5, wherein obtaining importance degree information corresponding to each of the plurality of characters based on the relevance degree information between each of the plurality of characters and the feature sentence vector comprises:
determining a target sentence vector representing the semantics of the text to be processed based on the relevancy information between the characters and the feature sentence vectors and the word vectors corresponding to the characters;
and determining importance degree information corresponding to the characters respectively based on the target sentence vector and the word vectors corresponding to the characters respectively.
8. The method according to claim 7, wherein the determining a target sentence vector characterizing the semantics of the text to be processed based on the information about the correlation degree between each of the plurality of characters and the feature sentence vector and the word vectors corresponding to each of the plurality of characters comprises:
for each character in the plurality of characters, determining an intermediate word vector corresponding to each character based on the correlation information between each character and the feature sentence vector;
and summing the intermediate word vectors corresponding to the characters respectively to obtain the target sentence vector.
9. The method of claim 7, wherein determining the importance level information corresponding to each of the plurality of characters based on the target sentence vector and the word vectors corresponding to each of the plurality of characters comprises:
for each character in the characters, splicing the word vector corresponding to each character with the target sentence vector to obtain a target vector corresponding to each character;
and obtaining importance degree information corresponding to the characters by utilizing a probability model based on the target vectors corresponding to the characters respectively.
10. The method of claim 9, wherein the probabilistic model comprises at least one of:
conditional random field models, maximum entropy models, and Markov models.
11. The text processing method according to claim 1, further comprising:
determining a target search term based on the importance degree information corresponding to the characters respectively;
and determining a reply text corresponding to the text to be processed based on the target search word.
12. A text processing apparatus, comprising:
the acquisition module is used for acquiring a feature sentence vector of a text to be processed and feature word vectors corresponding to a plurality of characters in the text to be processed respectively;
the processing module is used for processing the characteristic sentence vectors and the characteristic word vectors corresponding to the characters respectively through an attention mechanism to obtain correlation degree information between the characters and the characteristic sentence vectors respectively;
and the first determining module is used for obtaining importance degree information corresponding to the characters respectively based on the relevance information between the characters and the feature sentence vectors respectively.
13. A computer device, comprising: a processor, a memory storing machine-readable instructions executable by the processor, the processor for executing the machine-readable instructions stored in the memory, the processor performing the steps of the text processing method of any of claims 1 to 11 when the machine-readable instructions are executed by the processor.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the text processing method according to any one of claims 1 to 11.
CN202011232565.4A 2020-11-06 2020-11-06 Text processing method, device, computer equipment and storage medium Active CN112347790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011232565.4A CN112347790B (en) 2020-11-06 2020-11-06 Text processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011232565.4A CN112347790B (en) 2020-11-06 2020-11-06 Text processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112347790A true CN112347790A (en) 2021-02-09
CN112347790B CN112347790B (en) 2024-01-16

Family

ID=74429005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011232565.4A Active CN112347790B (en) 2020-11-06 2020-11-06 Text processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112347790B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829818A (en) * 2018-06-12 2018-11-16 中国科学院计算技术研究所 A kind of file classification method
WO2019214145A1 (en) * 2018-05-10 2019-11-14 平安科技(深圳)有限公司 Text sentiment analyzing method, apparatus and storage medium
CN110489757A (en) * 2019-08-26 2019-11-22 北京邮电大学 A kind of keyword extracting method and device
CN111428012A (en) * 2020-03-02 2020-07-17 平安科技(深圳)有限公司 Intelligent question-answering method, device, equipment and storage medium based on attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019214145A1 (en) * 2018-05-10 2019-11-14 平安科技(深圳)有限公司 Text sentiment analyzing method, apparatus and storage medium
CN108829818A (en) * 2018-06-12 2018-11-16 中国科学院计算技术研究所 A kind of file classification method
CN110489757A (en) * 2019-08-26 2019-11-22 北京邮电大学 A kind of keyword extracting method and device
CN111428012A (en) * 2020-03-02 2020-07-17 平安科技(深圳)有限公司 Intelligent question-answering method, device, equipment and storage medium based on attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI WEIKANG: "Exploring ways of combining Chinese character vectors and word vectors in deep learning", Journal of Chinese Information Processing (《中文信息学报》), vol. 31, no. 6 *
黄然大悟: "A summary of deep learning NLP models and their applications", Retrieved from the Internet <URL:https://blog.csdn.net/huanghaocs/article/details/88932072?ops_request_misc> *

Also Published As

Publication number Publication date
CN112347790B (en) 2024-01-16

Similar Documents

Publication Publication Date Title
CN110502621B (en) Question answering method, question answering device, computer equipment and storage medium
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
Hirschberg et al. Advances in natural language processing
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
US8364470B2 (en) Text analysis method for finding acronyms
CN110737758A (en) Method and apparatus for generating a model
US20150095017A1 (en) System and method for learning word embeddings using neural language models
CN117149984B (en) Customization training method and device based on large model thinking chain
CN113342958B (en) Question-answer matching method, text matching model training method and related equipment
CN114595327A (en) Data enhancement method and device, electronic equipment and storage medium
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN117520523B (en) Data processing method, device, equipment and storage medium
CN115878752A (en) Text emotion analysis method, device, equipment, medium and program product
Sakiba et al. A memory-efficient tool for bengali parts of speech tagging
CN110750967B (en) Pronunciation labeling method and device, computer equipment and storage medium
CN113095065A (en) Chinese character vector learning method and device
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN112347267B (en) Text processing method, device, computer equipment and storage medium
CN112559711A (en) Synonymous text prompting method and device and electronic equipment
CN117312509A (en) Knowledge base question-answering method and device based on large language model under massive information
CN115795007A (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and storage medium
CN112347790B (en) Text processing method, device, computer equipment and storage medium
CN115712713A (en) Text matching method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant