CN113535927A - Method, medium, device and computing equipment for acquiring similar texts - Google Patents

Method, medium, device and computing equipment for acquiring similar texts Download PDF

Info

Publication number
CN113535927A
CN113535927A
Authority
CN
China
Prior art keywords
text
standard
similar
mapping
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110871649.0A
Other languages
Chinese (zh)
Inventor
杨萌
冯旻伟
尹竞成
黄旭
阮良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd filed Critical Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202110871649.0A
Publication of CN113535927A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The disclosure provides a method, medium, device and computing equipment for acquiring similar texts. The text features of a standard text are determined based not only on the vector set mapped from the standard text itself, but also on the vector set mapped from the semantic role labeling results of the individual words in the standard text. The determined text features are then input into a similar text generation model to obtain at least one similar text.

Description

Method, medium, device and computing equipment for acquiring similar texts
Technical Field
Embodiments of the disclosure relate to the field of information technology, and in particular to a method, medium, apparatus, and computing device for acquiring similar texts.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In some application scenarios, several similar texts need to be acquired based on a standard text. For example, when a user interacts with a customer service system, the specific wording of the questions the user enters is non-standardized. To improve the intelligence of the customer service system, it is therefore often necessary to deploy in it a number of similar questions acquired from standard questions, and to match all similar questions corresponding to the same standard question to the same standard answer.
However, an effective technical solution for acquiring similar texts is currently lacking.
Disclosure of Invention
In this context, embodiments of the present disclosure are intended to provide a method, medium, apparatus, and computing device for obtaining similar text, so as to obtain more effective similar text based on standard text.
In a first aspect of embodiments of the present disclosure, a method for acquiring similar texts is provided, including:
acquiring a standard text;
determining text features of the standard text, including: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping the labeling result into a vector set; and determining the text features of the standard text according to the two vector sets obtained by the mapping;
inputting the text features of the standard text into a similar text generation model, and outputting at least one similar text.
In one embodiment of the present disclosure, the similar text generation model is constructed using a SimBERT algorithm; or the similar text generation model is constructed by a multi-head attention mechanism algorithm; or the similar text generation model is constructed by adopting a recurrent neural network algorithm.
In another embodiment of the present disclosure, mapping the annotation result to a vector set includes:
if the standard text comprises at least two sentences each having an independent semantic structure, mapping, for each such sentence, the part of the labeling result corresponding to that sentence into a vector corresponding to that sentence;
and forming a vector set from the vectors corresponding to the respective sentences, or combining the vectors corresponding to the respective sentences into a single vector.
In yet another embodiment of the present disclosure, mapping the annotation result to a set of vectors includes:
for each sentence with an independent semantic structure included in the standard text, where the sentence comprises N words, mapping the part of the labeling result corresponding to the sentence into an N-dimensional vector; the dimensions of the N-dimensional vector correspond one-to-one to the words in the sentence, and the value of any dimension is determined based on the semantic role of the word corresponding to that dimension.
In another embodiment of the present disclosure, determining the text feature of the standard text according to two vector sets obtained by mapping includes:
and forming a new vector set by the two vector sets obtained by mapping, wherein the new vector set is used as the text characteristic of the standard text.
In yet another embodiment of the present disclosure, the similar text generation model is trained by:
acquiring a training sample set, wherein each training sample comprises a first type of text and a second type of text; the first type of text and the second type of text included in the same training sample have the same content meaning;
for each training sample, determining text features of each text included in the training sample, including: mapping the text into a set of vectors; performing semantic role labeling on each word in the text, and mapping a labeling result into a vector set; determining the text characteristics of the text according to the two vector sets obtained by mapping;
and training a similar text generation model by taking the text features of the first type of text included in the training sample as model input and the text features of the second type of text included in the training sample as model output.
In still another embodiment of the present disclosure, the method further includes:
calling a first translation tool to translate the standard text of the first language version into a text of a second language version;
and calling a second translation tool to translate the text of the second language version back to the text of the first language version as similar text.
In still another embodiment of the present disclosure, the method further includes:
taking the standard text as a search object, and calling a search engine to search;
and taking a plurality of search results which are specified by the search engine and have similar meanings with the search object as similar texts.
In another embodiment of the present disclosure, applied to a customer service system, the method further includes:
displaying a cold start configuration interface;
acquiring a plurality of similar questions corresponding to a standard question input into the configuration interface;
and, during the cold start process, associating at least some of the acquired similar questions with the standard answer corresponding to the standard question.
In still another embodiment of the present disclosure, the method further includes:
displaying the obtained at least one similar question through the configuration interface;
in response to a selection signal for the presented similar text, determining a selected similar question;
the associating of at least some of the acquired similar questions with the standard answer corresponding to the standard question includes:
associating the selected similar questions with the standard answer corresponding to the standard question.
In yet another embodiment of the present disclosure, presenting the obtained at least one similar text includes:
calculating the similarity between each acquired similar question and the standard question;
and displaying the acquired at least one similar text as a list, in descending order of similarity.
In a second aspect of the disclosed embodiments, there is provided an apparatus for acquiring similar text, comprising:
the standard text acquisition module is used for acquiring a standard text;
the text characteristic determining module is used for determining the text characteristic of the standard text and comprises the following steps: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping a labeling result into a vector set; determining the text characteristics of the standard text according to the two vector sets obtained by mapping;
and the similar text acquisition module is used for inputting the text features of the standard text into a similar text generation model and outputting at least one similar text.
In a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the method of acquiring similar text of any of the embodiments of the present disclosure.
In a fourth aspect of embodiments of the present disclosure, there is provided a computing device comprising a memory, a processor; the memory is used for storing computer instructions executable on the processor, and the processor is used for realizing the method for acquiring similar texts of any embodiment of the disclosure when executing the computer instructions.
According to the method, medium, apparatus and computing device for acquiring similar texts, the text features of the standard text are determined based not only on the vector set mapped from the standard text, but also on the vector set mapped from the semantic role labeling results of the words in the standard text. The determined text features are input into a similar text generation model to obtain at least one similar text.
Text features determined in this way contain both information about the literal expression of the standard text and information about its core semantic structure. The similar text obtained by inputting these text features into the similar text generation model is therefore not only similar to the standard text in literal expression but also inherits, as far as possible, the core semantic structure of the standard text, making it a more effective similar text.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 illustratively provides an application scenario in which a user interacts with a customer service system;
FIG. 2 schematically provides a method flow for obtaining similar text;
FIG. 3 is an exemplary illustration of a specific classification table of core arguments and additional arguments;
FIG. 4 illustrates a cold start configuration interface;
FIG. 5 is an exemplary illustration of an apparatus for obtaining similar text;
FIG. 6 is a schematic diagram of a computer-readable storage medium provided by the present disclosure;
fig. 7 is a schematic structural diagram of a computing device provided by the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts. Any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the disclosure, a method, a medium, a device and a computing device for acquiring similar texts are provided.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
The inventors have found that, for natural language processing tasks that generate similar texts from a standard text, the key is how "similarity" is defined. In some application scenarios, similarity requires that the semantic information conveyed by the standard text and by the similar text can be understood without a large deviation; it is not enough to ensure that the similar text resembles the standard text in literal expression. A similar text that resembles the standard text only in literal expression may still lose the core semantics of the standard text, resulting in a large semantic deviation.
A specific application scenario is, for example, one in which a user interacts with a customer service system. In this scenario, the wording of the questions the user enters is non-standardized; to improve the intelligence of the customer service system, a number of similar questions obtained from standard questions are typically deployed in it, and all similar questions corresponding to the same standard question are matched to the same standard answer.
FIG. 1 illustratively provides an application scenario in which a user interacts with a customer service system. As shown in FIG. 1, the user enters the question to be asked, for example "how to know my points", into the customer service system. From a business point of view, the customer service system is expected to recognize the non-standardized question "how to know my points" as similar to the standard question "query user points", and to feed back to the user the standard answer corresponding to "query user points". This requires that the similar questions deployed in advance for the standard question "query user points" include "how to know my points", i.e. that "how to know my points" can be acquired as a similar question from the standard question "query user points".
If the similar question obtained from the standard question "query user points" were "query user bill", or the similar question obtained from the standard question "how to obtain points" were "know my points", the generated similar questions would resemble the standard questions only in literal expression while losing their core semantics. This easily causes the customer service system to match a non-standardized question entered by the user to a standard question with a large semantic deviation, and then to feed back the standard answer corresponding to that mismatched standard question, giving the user the bad experience of receiving irrelevant answers. Alternatively, the customer service system may simply fail to understand the user's non-standardized question, giving the user the bad experience of getting no answer at all.
Therefore, how to enable the acquired similar texts to retain the core semantics of the standard texts is crucial.
Therefore, in the technical solution provided by the disclosure, it is considered that the standard text has a certain core semantic structure; if the similar text inherits this core semantic structure as far as possible, there will be no excessive semantic deviation between the similar text and the standard text. In technical terms, semantic role labeling can be used to obtain a representation of the core semantic structure of the standard text, and this representation can then be consulted when acquiring the similar text.
In one or more embodiments provided by the present disclosure, the text features of the standard text are determined based not only on the vector set mapped from the standard text, but also on the vector set mapped from the semantic role labeling results of the words in the standard text. The determined text features are input into a similar text generation model to obtain at least one similar text.
Text features determined in this way contain both information about the literal expression of the standard text and information about its core semantic structure. The similar text obtained by inputting these text features into the similar text generation model is therefore not only similar to the standard text in literal expression but also inherits its core semantic structure as far as possible. Such a similar text does not deviate greatly from the standard text in core semantics and is more effective.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Fig. 2 exemplarily provides a method flow for obtaining similar texts, which includes the following steps:
s200: and acquiring a standard text.
S202: text features of the standard text are determined.
S204: inputting the text features of the standard text into a similar text generation model, and outputting at least one similar text.
In one or more embodiments of the present disclosure, the similar text is generated by means of the similar text generation model. In these embodiments, the text features of the standard text are generally required as the model input, and the model output is the similar text.
The step of determining text features of the standard text may comprise: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping a labeling result into a vector set; and determining the text characteristics of the standard text according to the two vector sets obtained by mapping.
Wherein the set of vectors comprises at least one vector. In some embodiments, if the vector set includes a plurality of vectors, each vector in the vector set may be regarded as a row (or a column) to obtain a matrix.
The vector set mapped by the standard text contains information of the literal expression aspect of the standard text. The vector set mapped by the semantic role labeling result of each word in the standard text contains the information in the core semantic structure of the standard text.
The effect of mapping the standard text into a vector set is to obtain a mathematical representation of the literal expression of the standard text. The literal expression of the standard text can be vectorized using a common text mapping algorithm.
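As an illustration of this step, the following is a minimal, hypothetical sketch rather than the patent's actual algorithm: each token of a pre-segmented standard text is mapped to a one-hot vector over a small vocabulary, and the resulting list of vectors is the "vector set". A production system would more likely use a pretrained embedding such as word2vec or BERT; only the data shape is illustrated here.

```python
# Hypothetical sketch: map a text into a "vector set" by giving each
# distinct token a stable integer id and emitting a one-hot vector per
# token. Real systems would use learned embeddings instead of one-hot
# vectors; this only illustrates the shape of the mapping output.

def build_vocab(tokens):
    """Assign a stable integer id to each distinct token, in first-seen order."""
    return {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}

def text_to_vector_set(tokens, vocab):
    """Map each token to a one-hot vector; the result is the 'vector set'."""
    size = len(vocab)
    vectors = []
    for tok in tokens:
        v = [0] * size
        v[vocab[tok]] = 1
        vectors.append(v)
    return vectors

tokens = ["query", "user", "points"]   # a pre-segmented standard text (hypothetical)
vocab = build_vocab(tokens)
vector_set = text_to_vector_set(tokens, vocab)
```

Each vector in the set can later be treated as one row of a matrix, matching the description of text features below.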
Semantic role labeling assigns a semantic role to each word in a sentence. The object of semantic role labeling is a single sentence, which has an independent semantic structure; since the standard text may include one or more sentences, performing semantic role labeling on the standard text actually means performing it on each sentence of the standard text separately.
It should be noted that before semantic role labeling is performed on a sentence, the sentence is usually segmented into words. For example, open-source word segmentation tools such as HanLP or the Stanford Word Segmenter can be used.
For example, the standard text "ask for the Stardew Valley mobile Chinese version download address, do not want Baidu Cloud" includes two sentences; segmenting both sentences yields:
"ask | Stardew | Valley | mobile | Chinese | version | download | address | , | not | want | Baidu Cloud".
Semantic role labeling technologies generally support labeled semantic roles including predicates, core arguments, and additional arguments. Wherein, the predicate is generally a verb or an adjective; core arguments are words directly related to predicates, usually acting as subjects or objects in a sentence; the additional argument is other words in the statement except for the predicate and the core argument.
When semantic role labeling is performed, the label corresponding to the predicate may be set to PRED, the label corresponding to a core argument to ARG, and the label corresponding to an additional argument to ARGM.
In addition, core arguments and additional arguments may also be divided at a finer granularity. For example, there may be multiple categories of core arguments, multiple categories of additional arguments.
FIG. 3 illustratively provides a detailed classification table of core arguments and additional arguments. The label corresponding to a core argument is set to ARG-N, where N is a number from 0 to 5 and different numbers distinguish different types of core arguments (see FIG. 3 for details). The label corresponding to an additional argument is set to ARGM-XXX, where XXX is a three-letter identifier and different identifiers distinguish different types of additional arguments (see FIG. 3 for details).
In some embodiments, the semantic role labeling can be performed on the sentences by refining to the level of the specific core argument kind and the additional argument kind shown in fig. 3.
In other embodiments, semantic role labeling may be performed only at the level of whether a word belongs to a core argument or an additional argument, rather than refined to the specific core argument and additional argument types shown in FIG. 3.
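The two labeling granularities just described can be sketched as follows. This is an illustrative reconstruction: the tag names follow the document's examples (PRED, ARG-N, ARGM-XXX), and the "OTHER" category for punctuation is an assumption, not part of the patent.

```python
# Sketch: collapse fine-grained semantic role tags such as ARG0..ARG5 and
# ARGM-ADV to the coarse categories ARG / ARGM / PRED described above.
# "OTHER" (e.g. for punctuation) is a hypothetical catch-all.

def coarse_tag(fine_tag):
    """Collapse a fine-grained semantic role tag to ARG, ARGM, or PRED."""
    if fine_tag == "PRED":
        return "PRED"
    if fine_tag.startswith("ARGM"):   # must be checked before the ARG prefix
        return "ARGM"
    if fine_tag.startswith("ARG"):
        return "ARG"
    return "OTHER"
```

Note that the ARGM prefix must be tested before the ARG prefix, since every ARGM tag also starts with "ARG".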
Continuing the example, semantic role labeling is performed on the standard text "ask | Stardew | Valley | mobile | Chinese | version | download | address | , | not | want | Baidu Cloud". According to the word segmentation result there are 12 segmentation positions in total (the comma also counts as a position), and the labeling result may include:
labeling result 1: (ask, PRED, 0, 1);
labeling result 2: (Stardew Valley mobile Chinese version download address, ARG1, 1, 8);
labeling result 3: (not, ARGM-ADV, 9, 10);
labeling result 4: (want, PRED, 10, 11);
labeling result 5: (Baidu Cloud, ARG1, 11, 12).
(ask, PRED, 0, 1) indicates that after the 0th position, the 1st position is "ask" and its label is "PRED" (i.e., it is the predicate).
(Stardew Valley mobile Chinese version download address, ARG1, 1, 8) indicates that after the 1st position, the labels of the 2nd to 8th positions are all "ARG1" (i.e., core argument serving as the subject).
(not, ARGM-ADV, 9, 10) indicates that after the 9th position (the comma), the 10th position is "not" and its label is "ARGM-ADV" (i.e., additional argument serving as an adverbial).
(want, PRED, 10, 11) indicates that after the 10th position, the 11th position is "want" and its label is "PRED".
(Baidu Cloud, ARG1, 11, 12) indicates that after the 11th position, the 12th position is "Baidu Cloud" and its label is "ARG1".
The labeling result 1 and the labeling result 2 belong to the labeling result corresponding to the first sentence in the standard text, and the labeling results 3 to 5 belong to the labeling results corresponding to the second sentence in the standard text.
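The grouping of labeling results into per-sentence parts can be sketched as follows. This is a hedged reconstruction: the results are represented as hypothetical (span_text, tag, start, end) tuples with approximate English translations of the tokens, and the sentence boundary is taken to be position 9 (the comma), following the running example.

```python
# Sketch: represent the labeling results above as (span_text, tag, start,
# end) tuples and split them into per-sentence groups at a boundary
# position. Positions follow the word-segmentation indexing of the example.

results = [
    ("ask", "PRED", 0, 1),
    ("Stardew Valley mobile Chinese version download address", "ARG1", 1, 8),
    ("not", "ARGM-ADV", 9, 10),
    ("want", "PRED", 10, 11),
    ("Baidu Cloud", "ARG1", 11, 12),
]

def split_by_sentence(results, boundary):
    """Group labeling results into those ending at/before and those starting
    at/after the sentence boundary position."""
    first = [r for r in results if r[3] <= boundary]
    second = [r for r in results if r[2] >= boundary]
    return first, second

sent1, sent2 = split_by_sentence(results, 9)
# sent1 holds labeling results 1-2; sent2 holds labeling results 3-5
```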
In the step of mapping the labeling result into a vector set, various methods can be adopted as long as the labeling result can be converted into a mathematical representation in the form of a vector set.
In some embodiments, if the standard text includes at least two sentences having independent semantic structures, the part of the annotation result corresponding to the sentence can be mapped to the vector corresponding to the sentence for each sentence having an independent semantic structure. Then, vectors corresponding to each sentence with an independent semantic structure may be combined into a vector set, or vectors corresponding to each sentence with an independent semantic structure may be combined into one vector.
In some embodiments, for each sentence with an independent semantic structure included in the standard text, where the sentence comprises N words, the part of the labeling result corresponding to the sentence is mapped into an N-dimensional vector; the dimensions of the N-dimensional vector correspond one-to-one to the words in the sentence, and the value of any dimension is determined based on the semantic role of the word corresponding to that dimension.
For example, continuing the example above, each single word (or single punctuation mark) in the standard text is treated as one vector dimension: if the label corresponding to a dimension is ARG, it is mapped to 1; if ARGM, to 2; if PRED, to 3; and if the dimension corresponds to a punctuation mark (such as the comma), to 0. The standard text "ask for the Stardew Valley mobile Chinese version download address, do not want Baidu Cloud" can thus be mapped into the following vector:
(3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,2,3,1,1,1)。
This vector includes 21 dimensions, corresponding one-to-one to the 21 words (or punctuation marks) in the standard text. This single vector may serve as the vector set of the standard text.
As another example, the two sentences in the standard text may each be mapped with the same mapping rule as in the previous example, yielding two 21-dimensional vectors that form the vector set corresponding to the standard text:
(3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0) and (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 1, 1, 1).
Since the first sentence includes 15 words, the 16th to 21st dimensions of the vector corresponding to the first sentence are all 0. Since the second sentence includes 5 words, the 1st to 16th dimensions of the vector corresponding to the second sentence are all 0.
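The mapping rule used in the two examples above can be sketched as follows. This is an illustrative reconstruction: the per-position tag list is a hypothetical expansion of the running example's labeling results (15 positions in sentence one, the comma, then 5 positions in sentence two), not the output of an actual labeling tool.

```python
# Sketch of the mapping rule above: ARG -> 1, ARGM -> 2, PRED -> 3,
# punctuation -> 0, applied per word/punctuation position. The tag list
# below is a hypothetical per-position expansion of the example.

TAG_VALUE = {"ARG": 1, "ARGM": 2, "PRED": 3, "PUNCT": 0}

def tags_to_vector(position_tags):
    """Map a per-position semantic role tag sequence to the numeric vector."""
    return [TAG_VALUE[t] for t in position_tags]

position_tags = (["PRED"] + ["ARG"] * 14            # sentence one: 15 positions
                 + ["PUNCT"]                         # the comma
                 + ["ARGM", "PRED"] + ["ARG"] * 3)   # sentence two: 5 positions

vector = tags_to_vector(position_tags)
# vector has 21 dimensions: 3, fourteen 1s, 0, then 2, 3, 1, 1, 1
```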
After the standard text is mapped into a vector set and a labeling result obtained by performing semantic role labeling on the standard text is mapped into the vector set, text features to be input into a similar text generation model can be determined according to the two vector sets. There are various ways of determining the text features according to the two vector sets, as long as the mathematical representations corresponding to the two vector sets can be combined to obtain a new mathematical representation.
In some embodiments, the two vector sets obtained by mapping may be combined into a new vector set as a text feature of the standard text. For example, each vector in the two vector sets can be directly used as a row (or a column) to form a matrix as a text feature.
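The combination step above can be sketched as follows, under the assumption that every vector in both sets shares one dimensionality. This is a plain list-of-lists illustration; a real implementation would more likely stack the rows into a tensor for the model.

```python
# Sketch: form the text feature by treating every vector from both vector
# sets as one row of a matrix. A production system would likely use a
# numeric tensor; plain lists are used here to keep the sketch minimal.

def combine_vector_sets(set_a, set_b):
    """Concatenate two vector sets row-wise into one feature matrix."""
    assert all(len(v) == len(set_a[0]) for v in set_a + set_b), \
        "all vectors must have the same dimensionality"
    return set_a + set_b   # each vector becomes one matrix row

text_vectors = [[0.1, 0.2, 0.3]]   # hypothetical: mapped from the text itself
role_vectors = [[3.0, 1.0, 1.0]]   # hypothetical: mapped from the labeling result
feature_matrix = combine_vector_sets(text_vectors, role_vectors)
```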
And inputting the obtained text features into a similar text generation model to generate at least one similar text.
It should be noted that the similar text generation model applied in the method flow shown in FIG. 2 may be constructed and trained in advance; that is, before the model is used to implement the method flow shown in FIG. 2, a model construction phase and a model training phase may be carried out first.
Model construction and model training are described herein.
In the model building stage, the similar text generation model can be built by adopting various text generation algorithms. In some embodiments, a SimBERT algorithm may be employed to construct a similar text generation model.
Alternatively, a multi-head attention mechanism algorithm may be used to construct the similar text generation model, or a recurrent neural network algorithm (e.g., an LSTM or GRU algorithm) may be used; further options are not enumerated here.
In the model training stage, a training sample set can be obtained, wherein each training sample comprises a first type of text and a second type of text; the first type of text and the second type of text included in the same training sample have the same content meaning.
Next, for each training sample, text features of each text included in the training sample may be determined, including: mapping the text into a set of vectors; performing semantic role labeling on each word in the text, and mapping a labeling result into a vector set; and determining the text characteristics of the text according to the two vector sets obtained by mapping.
Then, the text features of the first type of text included in the training sample can be used as model input, and the text features of the second type of text included in the training sample can be used as model output, so that the similar text generation model can be trained.
It should be noted that, in training the similar text generation model, the text features of the first type of text in a training sample are used as the model input and the text features of the second type of text in the same training sample as the model output, so that the model learns the similarity rules, at the level of the core semantic structure, between first-type and second-type texts that share the same content meaning. The trained similar text generation model can therefore generate, from an input standard text, similar texts that conform to the learned similarity rules.
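A minimal sketch of assembling such input/output training pairs from a sample set; the dict keys and the `featurize` callable are assumptions for illustration, standing in for the feature-determination steps described above:

```python
def build_training_pairs(samples, featurize):
    """For each training sample, the first-type text's features become the
    model input and the second-type text's features the expected output."""
    return [
        (featurize(sample["first_text"]), featurize(sample["second_text"]))
        for sample in samples
    ]

# Hypothetical usage with a trivial featurizer (real features would be the
# vector sets described above):
pairs = build_training_pairs(
    [{"first_text": "how do I pay", "second_text": "what are the payment steps"}],
    featurize=lambda text: text.split(),
)
```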
In addition, in some embodiments provided by the present disclosure, in addition to generating several similar texts using a similar text generation model, more similar texts may be obtained using other approaches.
For example, a first translation tool may be invoked to translate the standard text in a first language version to text in a second language version. Then, a second translation tool is invoked to translate the text in the second language version back to the text in the first language version as similar text.
The first translation tool and the second translation tool may be the same translation tool or different translation tools. The first language version is a different language version than the second language version.
Of course, more translation tools (e.g., a third translation tool) may be used, or the standard text in the first language version may be translated into more other language versions (e.g., a third language version) and translated back to obtain more similar text.
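The back-translation route above can be sketched as follows; the `translate` callable stands in for whichever translation tools are invoked, and its signature is an assumption for illustration:

```python
def back_translate(standard_text, translate, source_lang, pivot_lang):
    """Translate the standard text into a pivot language and back again;
    the round-tripped text is returned as a candidate similar text."""
    pivot_text = translate(standard_text, src=source_lang, dst=pivot_lang)
    return translate(pivot_text, src=pivot_lang, dst=source_lang)

# A fake lookup table standing in for real first/second translation tools:
TABLE = {
    ("zh", "en", "如何付款"): "how to pay",
    ("en", "zh", "how to pay"): "怎样进行付款",
}

def fake_translate(text, src, dst):
    return TABLE[(src, dst, text)]

similar = back_translate("如何付款", fake_translate, "zh", "en")
```

The same helper can be chained through additional pivot languages to obtain more candidates.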
In another example, a search engine may be invoked to perform a search with the standard text as the search object, and several search results that the search engine designates as similar in meaning to the search object are taken as similar texts. If the technical solution of the present disclosure is to be applied in a scenario where a user interacts with a customer service system, the search engine may be that of a question-and-answer community website.
After a plurality of similar texts are obtained through one or more of these routes, the similarity between the standard text and each similar text can be calculated, and the similar texts sorted in descending order of similarity. In some implementations, similar texts whose similarity falls below a specified threshold may be discarded.
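The filtering and ranking step can be sketched as follows; the 0.5 threshold and the `similarity_fn` callable are illustration assumptions:

```python
def rank_similar_texts(similar_texts, similarity_fn, threshold=0.5):
    """Score each candidate against the standard text, drop candidates
    below the threshold, and return the rest in descending order of
    similarity."""
    scored = [(text, similarity_fn(text)) for text in similar_texts]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy similarity function for illustration only:
ranked = rank_similar_texts(["a", "bb", "cccc"], similarity_fn=lambda t: len(t) / 4)
```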
The present disclosure provides a technical solution for calculating similarity between a standard text and a similar text, as follows:
The standard text and each similar text may be vectorized separately. Taking the standard text as an example, the standard text is first segmented into words; then, on one hand, the tf-idf value (term frequency-inverse document frequency) of each word in the standard text is calculated, and on the other hand, the word vector of each word is looked up in a word vector dictionary. It should be noted that if a word is not in the word vector dictionary, its word vector may be set to an all-zero vector. Taking the tf-idf value of each word in the standard text as the weight of that word's vector, the weighted sum of the word vectors yields the vector corresponding to the standard text. Vectors corresponding to the similar texts can be obtained in the same way.
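A sketch of this vectorization, using a simplified tf-idf formulation; the exact tf-idf variant, its smoothing, and the word-vector dictionary are assumptions for illustration:

```python
import math
from collections import Counter

def tfidf_values(words, doc_freq, n_docs):
    """Simplified tf-idf per distinct word: term frequency within the
    text times a smoothed inverse document frequency."""
    counts = Counter(words)
    return {
        w: (counts[w] / len(words)) * math.log(n_docs / (1 + doc_freq.get(w, 0)))
        for w in counts
    }

def text_to_vector(words, word_vectors, tfidf, dims):
    """Weighted sum of word vectors, with tf-idf values as the weights.
    A word missing from the dictionary contributes an all-zero vector."""
    result = [0.0] * dims
    for w in words:
        vector = word_vectors.get(w, [0.0] * dims)
        for i, component in enumerate(vector):
            result[i] += tfidf.get(w, 0.0) * component
    return result

# Toy corpus statistics and a one-word dictionary for illustration:
words = ["pay", "order"]
weights = tfidf_values(words, doc_freq={"pay": 1}, n_docs=10)
vec = text_to_vector(words, {"pay": [1.0, 0.0]}, weights, dims=2)
```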
The distance (e.g., the cosine distance) between the standard text's vector and a similar text's vector is then calculated; this cosine distance can serve as a characterization of the similarity between the standard text and the similar text.
In addition, the edit distance between the standard text and the similar text can be obtained, and the cosine distance and the edit distance combined to determine the similarity characterization between the standard text and the similar text. For example, a weighted sum of the cosine distance and the edit distance may be calculated as the similarity characterization.
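A sketch of combining the two signals; the weights and the normalization of the edit distance into a [0, 1] similarity are assumptions, since the disclosure only states that a weighted sum may be calculated:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def edit_distance(s, t):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (cs != ct))) # substitution
        prev = curr
    return prev[-1]

def combined_similarity(vec_a, vec_b, text_a, text_b, w_cos=0.7, w_edit=0.3):
    """Weighted sum of cosine similarity and a normalized edit similarity;
    the 0.7/0.3 weights are arbitrary illustration values."""
    edit_sim = 1 - edit_distance(text_a, text_b) / max(len(text_a), len(text_b), 1)
    return w_cos * cosine_similarity(vec_a, vec_b) + w_edit * edit_sim
```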
In addition, in a scenario where a user interacts with the customer service system, the customer service system can display a cold start configuration interface. The customer service system can then retrieve a number of similar questions corresponding to a standard question entered into the configuration interface and, during the cold start process, associate at least some of the retrieved similar questions with the standard answer corresponding to that standard question.
The cold start of the customer service system refers to a process of storing a plurality of similar questions for each standard question before the customer service system provides service for a user.
In some embodiments, the customer service system may present the retrieved at least one similar question via the cold start configuration interface. The customer service system may determine the selected similar questions in response to a selection signal for the presented similar questions, and associate the selected similar questions with the standard answer corresponding to the standard question.
In some embodiments, the customer service system may calculate the similarity of each retrieved similar question to the standard question, and then display the retrieved similar questions as a list in descending order of similarity.
FIG. 4 illustratively provides a cold start configuration interface. When the customer service system needs to perform a cold start, the cold start configuration interface shown in fig. 4 can be displayed to an administrator. The interface provides a function of searching for several similar questions based on a standard question; the search routes may include a model generation route and a search engine route, and may also include a back-translation route across language versions. The administrator may select the appropriate similar questions and click the corresponding control to associate the selected similar questions with the standard question.
Fig. 5 exemplarily provides an apparatus for acquiring similar texts, including:
a standard text acquisition module 501 for acquiring a standard text;
the text feature determining module 502 determines text features of the standard text, including: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping a labeling result into a vector set; determining the text characteristics of the standard text according to the two vector sets obtained by mapping;
the similar text obtaining module 503 inputs the text features of the standard text into a similar text generation model, and outputs at least one similar text.
In some embodiments, the similar text generation model is constructed using a SimBERT algorithm;
or the similar text generation model is constructed by a multi-head attention mechanism algorithm;
or the similar text generation model is constructed by adopting a recurrent neural network algorithm.
In some embodiments, the text feature determining module 502, if the standard text includes at least two sentences having independent semantic structures, for each sentence having an independent semantic structure, maps a portion of the annotation result corresponding to the sentence into a vector corresponding to the sentence; and forming vector sets by vectors respectively corresponding to the statements with the independent semantic structures, or combining the vectors respectively corresponding to the statements with the independent semantic structures into one vector.
In some embodiments, the text feature determining module 502 maps, for each sentence with an independent semantic structure included in the standard text, the sentence including N words, a part of the labeling result corresponding to the sentence into an N-dimensional vector; and the dimensions in the N-dimensional vector correspond to the words in the sentence one by one, and the value of any dimension is determined based on the semantic role of the word corresponding to the dimension.
In some embodiments, the text feature determining module 502 combines the two vector sets obtained by mapping into a new vector set, which is used as the text feature of the standard text.
The similar text generation model is trained by the following method:
acquiring a training sample set, wherein each training sample comprises a first type of text and a second type of text; the first type of text and the second type of text included in the same training sample have the same content meaning;
for each training sample, determining text features of each text included in the training sample, including: mapping the text into a set of vectors; performing semantic role labeling on each word in the text, and mapping a labeling result into a vector set; determining the text characteristics of the text according to the two vector sets obtained by mapping;
and training a similar text generation model by taking the text features of the first type of text included in the training sample as model input and the text features of the second type of text included in the training sample as model output.
In some embodiments, the similar text acquiring module 503 invokes a first translation tool to translate the standard text of the first language version into the text of the second language version;
and calling a second translation tool to translate the text of the second language version back to the text of the first language version as similar text.
In some embodiments, the similar text obtaining module 503 takes the standard text as a search object, and invokes a search engine to perform a search; and taking a plurality of search results which are specified by the search engine and have similar meanings with the search object as similar texts.
In some embodiments, the apparatus is applied to a customer service system, and the apparatus further comprises:
a cold start module 504 that presents a cold start configuration interface; acquiring a plurality of similar problems corresponding to the standard problems input into the configuration interface; and during the cold starting process, associating at least part of the acquired similar questions with the standard answers corresponding to the standard questions.
In some embodiments, the cold start module 504, via the configuration interface, presents the retrieved at least one similar question; in response to a selection signal for the presented similar text, determining a selected similar question; associating at least part of the obtained similar questions with the standard answers corresponding to the standard questions, wherein the method comprises the following steps:
and associating the selected similar questions with the standard answers corresponding to the standard question texts.
In some embodiments, the cold start module 504 calculates the similarity between each retrieved similar question and the standard question, and displays the retrieved similar questions as a list in descending order of similarity.
It should be noted that although in the above detailed description several units/modules or sub-units/sub-modules of the apparatus are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Fig. 6 is a schematic diagram of a computer-readable storage medium 140 provided by the present disclosure. A computer program is stored on the medium 140, and when the computer program is executed by a processor, the method for acquiring similar texts of any embodiment of the present disclosure is implemented.
The present disclosure also provides a computing device comprising a memory, a processor; the memory is used for storing computer instructions executable on the processor, and the processor is used for realizing the method for acquiring similar texts of any embodiment of the disclosure when executing the computer instructions.
Fig. 7 is a schematic structural diagram of a computing device provided by the present disclosure, and as shown in fig. 7, the computing device 15 may include, but is not limited to: a processor 151, a memory 152, and a bus 153 that connects the various system components, including the memory 152 and the processor 151.
Wherein the memory 152 stores computer instructions executable by the processor 151 to enable the processor 151 to perform a method of acquiring similar texts according to any of the embodiments of the present disclosure. The memory 152 may include a random access memory unit RAM 1521, a cache memory unit 1522, and/or a read only memory unit ROM 1523. The memory 152 may further include: a program tool 1525 having a set of program modules 1524, the program modules 1524 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, one or more combinations of which may comprise an implementation of a network environment.
The bus 153 may include, for example, a data bus, an address bus, a control bus, and the like. The computing device 15 may also communicate with an external device 155 through the I/O interface 154; the external device 155 may be, for example, a keyboard, a Bluetooth device, etc. The computing device 15 may also communicate with one or more networks, which may be, for example, local area networks, wide area networks, public networks, etc., through the network adapter 156. The network adapter 156 may also communicate with other modules of the computing device 15 via the bus 153, as shown in FIG. 7.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed; the division into aspects is for convenience of presentation only and does not mean that features in these aspects cannot be combined to benefit. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method of obtaining similar text, comprising:
acquiring a standard text;
determining text features of the standard text, including: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping a labeling result into a vector set; determining the text characteristics of the standard text according to the two vector sets obtained by mapping;
inputting the text features of the standard text into a similar text generation model, and outputting at least one similar text.
2. The method of claim 1, wherein the similar text generation model is constructed using a SimBERT algorithm;
or the similar text generation model is constructed by a multi-head attention mechanism algorithm;
or the similar text generation model is constructed by adopting a recurrent neural network algorithm.
3. The method of claim 1, mapping the annotation result to a set of vectors, comprising:
if the standard text comprises at least two sentences with independent semantic structures, mapping the part corresponding to the sentence in the labeling result into a vector corresponding to the sentence aiming at each sentence with an independent semantic structure;
and forming vector sets by vectors respectively corresponding to the statements with the independent semantic structures, or combining the vectors respectively corresponding to the statements with the independent semantic structures into one vector.
4. The method of any of claims 1-3, mapping the annotated results to a set of vectors, comprising:
for each statement with an independent semantic structure included in the standard text, wherein the statement comprises N words, and mapping a part corresponding to the statement in the labeling result into an N-dimensional vector; and the dimensions in the N-dimensional vector correspond to the words in the sentence one by one, and the value of any dimension is determined based on the semantic role of the word corresponding to the dimension.
5. The method of claim 1, determining the text feature of the standard text according to the two vector sets obtained by mapping, comprising:
and forming a new vector set by the two vector sets obtained by mapping, wherein the new vector set is used as the text characteristic of the standard text.
6. The method of claim 1, wherein the similar text generation model is trained by:
acquiring a training sample set, wherein each training sample comprises a first type of text and a second type of text; the first type of text and the second type of text included in the same training sample have the same content meaning;
for each training sample, determining text features of each text included in the training sample, including: mapping the text into a set of vectors; performing semantic role labeling on each word in the text, and mapping a labeling result into a vector set; determining the text characteristics of the text according to the two vector sets obtained by mapping;
and training a similar text generation model by taking the text features of the first type of text included in the training sample as model input and the text features of the second type of text included in the training sample as model output.
7. The method of claim 1, further comprising:
calling a first translation tool to translate the standard text of the first language version into a text of a second language version;
and calling a second translation tool to translate the text of the second language version back to the text of the first language version as similar text.
8. An apparatus for obtaining similar text, comprising:
the standard text acquisition module is used for acquiring a standard text;
the text characteristic determining module is used for determining the text characteristic of the standard text and comprises the following steps: mapping the standard text into a vector set; performing semantic role labeling on each word in the standard text, and mapping a labeling result into a vector set; determining the text characteristics of the standard text according to the two vector sets obtained by mapping;
and the similar text acquisition module inputs the text characteristics of the standard text into a similar text generation model and outputs at least one similar text.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. A computing device comprising a memory, a processor; the memory is for storing computer instructions executable on the processor for implementing the method of any one of claims 1 to 7 when the computer instructions are executed.
CN202110871649.0A 2021-07-30 2021-07-30 Method, medium, device and computing equipment for acquiring similar texts Pending CN113535927A (en)

Publications (1)

Publication Number Publication Date
CN113535927A true CN113535927A (en) 2021-10-22



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination