CN112364149B - User question obtaining method and device and electronic equipment - Google Patents

User question obtaining method and device and electronic equipment

Info

Publication number
CN112364149B
Authority
CN
China
Prior art keywords
target
data
text data
sentence
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110033442.6A
Other languages
Chinese (zh)
Other versions
CN112364149A (en)
Inventor
黄诗雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yunqu Information Technology Co ltd
Original Assignee
Guangzhou Yunqu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yunqu Information Technology Co ltd filed Critical Guangzhou Yunqu Information Technology Co ltd
Priority to CN202110033442.6A priority Critical patent/CN112364149B/en
Publication of CN112364149A publication Critical patent/CN112364149A/en
Application granted granted Critical
Publication of CN112364149B publication Critical patent/CN112364149B/en

Classifications

    (All within section G: Physics; class G06: Computing, calculating or counting; subclass G06F: Electric digital data processing.)
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/3346: Query execution using probabilistic model
    • G06F16/35: Clustering; Classification
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a user question obtaining method, a user question obtaining apparatus, and an electronic device. The method includes: acquiring original text dialogue data; acquiring target text data from the original text dialogue data, the target text data being the text data corresponding to a target role category; and obtaining a target user question from the target text data, the target user question being a question asked in the original text dialogue data by a target user whose user role is the target role category. The method can conveniently and accurately obtain the questions asked by the target user in the original text dialogue data.

Description

User question obtaining method and device and electronic equipment
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a user question obtaining method and apparatus, and an electronic device.
Background
In daily life, users generally consult enterprise customer service by voice call about pre-sale, after-sale, and other product questions; during such calls, customer service staff are expected to answer user questions quickly and accurately.
In practice, to enable customer service staff to answer user questions quickly and accurately, the staff usually listen to users' historical call recordings (with the users' authorization to use their call voice), manually label and sort them so as to extract frequently asked questions, build a user question bank, and train customer service staff on that question bank.
In the course of making this application, the inventor found that the existing, largely manual approach to extracting user questions is time-consuming and labor-intensive; moreover, because manual labeling can be wrong, the extracted results may be inaccurate. A user question obtaining method is therefore needed to solve the above problems.
Disclosure of Invention
In a first aspect of the present disclosure, a user question obtaining method is provided, including:
acquiring original text dialogue data;
acquiring target text data from the original text dialogue data, wherein the target text data is text data corresponding to a target role category;
and obtaining a target user question according to the target text data, wherein the target user question is a question asked by a target user in the original text dialogue data, and the user role of the target user is the target role category.
Optionally, the obtaining target text data from the original text dialogue data includes:
performing data preprocessing on the original text dialogue data to obtain preprocessed text data;
and inputting the preprocessed text data into a target role classification model to obtain the target text data, wherein the target role classification model is used for predicting the probability that the sentence belongs to the target role category.
Optionally, the inputting the preprocessed text data into a target role classification model to obtain the target text data includes:
predicting the probability corresponding to each statement in the preprocessed text data according to the target role classification model;
and extracting sentences of which the corresponding probability is not less than a preset probability threshold value from the preprocessed text data to obtain the target text data.
Optionally, the obtaining a target user question according to the target text data includes:
obtaining a first sentence to be determined from the sentences of the target text data by using a preset text abstract extraction algorithm, wherein the first sentence to be determined is a sentence of which the corresponding sentence weight meets a preset condition;
acquiring a preset key vocabulary, wherein the preset key vocabulary is used for determining whether a sentence is a user problem;
acquiring a sentence containing any word in the preset key words from the first sentence to be determined as a second sentence to be determined;
and obtaining sentences with sentence position ordering meeting a preset position condition from the second sentence to be determined as the target user question, wherein the sentence positions are positions of the corresponding sentences in the target text data.
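The keyword and position filtering in the steps above can be sketched in Python. The preset key vocabulary, the sample sentences, and the "earliest positions" condition are all illustrative assumptions; the patent leaves the concrete key vocabulary and position condition open:

```python
def filter_by_keywords(sentences, key_words):
    # Second to-be-determined sentences: those containing any preset key word.
    return [s for s in sentences if any(k in s for k in key_words)]

def pick_by_position(sentences, position_of, top_n=2):
    # Keep the sentences whose position in the target text data satisfies the
    # preset position condition (here, illustratively: the earliest top_n positions).
    return sorted(sentences, key=position_of)[:top_n]

key_words = ["how", "why", "what", "?"]          # hypothetical preset key vocabulary
target_text = [
    "hello thanks for calling",
    "how do I renew my subscription ?",
    "it expired yesterday",
    "why was I charged twice ?",
]
first_candidates = target_text[1:]               # pretend these passed the summary step
second = filter_by_keywords(first_candidates, key_words)
questions = pick_by_position(second, target_text.index)
```

Ordering by sentence position favors questions raised early in the call, which is where a consulting user typically states their problem.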
Optionally, the obtaining a first sentence to be determined from the sentence of the target text data by using a preset text summarization extraction algorithm includes:
respectively calculating the similarity between any two sentences in the target text data to construct a sentence similarity matrix;
constructing a statement weight map according to the statement similarity matrix, wherein the weight of an edge between two adjacent statements in the statement weight map is the similarity between the two statements;
obtaining the weight of the sentence in the target text data according to the sentence weight graph;
and sequencing the weights of all sentences in the target text data to obtain the sentences meeting the preset conditions as the first to-be-determined sentences.
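The four steps above describe a TextRank-style procedure. A minimal sketch, assuming the classic word-overlap sentence similarity and a PageRank-style power iteration (the patent does not fix either choice; the sentences are illustrative):

```python
import math

def similarity(s1, s2):
    # Word-overlap similarity between two sentences, normalised by their lengths.
    w1, w2 = set(s1.split()), set(s2.split())
    common = len(w1 & w2)
    if common == 0 or len(w1) < 2 or len(w2) < 2:
        return 0.0
    return common / (math.log(len(w1)) + math.log(len(w2)))

def sentence_weights(sentences, d=0.85, iters=50):
    n = len(sentences)
    # Sentence similarity matrix -> weighted graph on sentences
    # (edge weight = similarity between the two sentences).
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    out_sum = [sum(row) for row in sim]
    w = [1.0] * n
    for _ in range(iters):  # PageRank-style power iteration over the weight graph
        w = [(1 - d) + d * sum(sim[j][i] / out_sum[j] * w[j]
                               for j in range(n) if out_sum[j] > 0)
             for i in range(n)]
    return w

sents = ["how do I renew my plan",
         "I want to renew my plan today",
         "goodbye friend"]
ws = sentence_weights(sents)
```

Sorting the resulting weights and keeping the top-ranked sentences yields the first to-be-determined sentences.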
Optionally, the target role classification model is obtained by training through the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of sample text data, and each sample text data corresponds to a unique role category;
calculating prior probability corresponding to each role category as first probability according to the training data set;
under the setting that the features are mutually independent, counting the probability that vocabulary words in the training data set appear in the sample text data corresponding to other role categories as a second probability, wherein the other role categories are the role categories other than the target role category;
and obtaining the target role classification model according to the first probability and the second probability.
Optionally, the second probability is obtained by:
acquiring any vocabulary from the vocabularies in the training data set as a first vocabulary;
under the setting that the first vocabulary at least appears once in the sample text data corresponding to the other role categories, calculating a feature weight of the first vocabulary in the sample text data by using a preset feature value extraction algorithm, and taking the feature weight as a second sub-probability corresponding to the first vocabulary;
and obtaining the second probability according to the second sub-probability.
Optionally, the performing data preprocessing on the original text dialogue data to obtain preprocessed text data includes:
performing word segmentation processing and vocabulary part-of-speech tagging processing on sentences in the original text dialogue data to obtain first text data, wherein the vocabulary part-of-speech tagging processing is processing for tagging part-of-speech of vocabularies obtained after the word segmentation processing;
performing data cleaning processing on the first text data by using a preset data cleaning rule to obtain second text data, wherein the preset data cleaning rule is at least one of the following contents in a filtering statement: presetting a stop vocabulary, a part of speech, a numerical value and a symbol;
and executing text alignment processing on the second text data according to the part of speech of the sentence end vocabulary of the sentence in the second text data to obtain the preprocessed text data.
In a second aspect of the present disclosure, there is also provided a user question obtaining apparatus, including:
the original text dialogue data acquisition module is used for acquiring original text dialogue data;
the target text data acquisition module is used for acquiring target text data from the original text dialogue data, wherein the target text data is text data corresponding to a target role category;
and the target user question obtaining module is used for obtaining a target user question according to the target text data, wherein the target user question is a question asked by a target user in the original text dialogue data, and the user role of the target user is the target role category.
In a third aspect of the present disclosure, there is also provided an electronic device, which includes the apparatus of the second aspect of the present disclosure; or,
the electronic device includes: a memory for storing executable instructions; and the processor is used for operating the electronic equipment to execute the method of the first aspect of the disclosure according to the control of the instruction.
One advantageous effect of the present disclosure is that according to the embodiments of the present disclosure, when a user question needs to be extracted from a user call voice, an electronic device, for example, a server, may perform text conversion on the user voice call data to obtain original text dialogue data, and based on a target role category corresponding to a user who wants to extract the user question, obtain target text data corresponding to the target role category from the original text dialogue data, and then, the electronic device may reliably, conveniently, and accurately obtain the target user question according to the target text data.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic block diagram showing a hardware configuration of an electronic device that can be used to implement the user question acquisition method of an embodiment.
Fig. 2 is a flowchart illustrating a user question obtaining method according to an embodiment of the present disclosure.
Fig. 3 is a schematic block diagram of a user question obtaining apparatus provided in an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration >
Fig. 1 is a block diagram of a hardware configuration of a server that can be used to implement a user question acquisition method according to one embodiment.
As shown in fig. 1, the server 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600. The processor 1100 may be, for example, a central processing unit (CPU). The memory 1200 includes, for example, a ROM (read-only memory), a RAM (random access memory), and a nonvolatile memory such as a hard disk. The interface device 1300 includes, for example, a USB interface and a serial interface. The communication device 1400 is capable of wired or wireless communication. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 may include, for example, a touch screen and a keyboard.
In this embodiment, the server 1000 may be used to participate in implementing a method according to any embodiment of the present disclosure.
As applied to any embodiment of the present disclosure, the memory 1200 of the server 1000 is configured to store instructions for controlling the processor 1100 to operate in support of implementing a method according to any embodiment of the present disclosure. A skilled person can design the instructions according to the disclosed solution. How instructions control the operation of a processor is well known in the art and is not described in detail here.
It should be understood by those skilled in the art that although a plurality of devices of the server 1000 are shown in fig. 1, the server 1000 of the disclosed embodiments may involve only some of them, for example only the processor 1100 and the memory 1200. This is well known in the art and is not described in further detail here.
< method examples >
As described in the background art, manually extracting user questions from user call voice is at least time-consuming, labor-intensive, and of limited accuracy. To solve this problem, an embodiment of the present disclosure provides a user question obtaining method; please refer to fig. 2, a flowchart of the method, which may be implemented by an electronic device such as the server 1000 in fig. 1.
As shown in FIG. 2, the method of the present embodiment may include steps S2100-S2300, which are described in detail below.
In step S2100, original text dialogue data is acquired.
In this embodiment, the original text dialogue data may be text dialogue data between a plurality of users respectively corresponding to at least two different character categories.
The original text dialogue data can be text dialogue data in a dialogue directly taking text as a dialogue mode; alternatively, the speech dialogue data in a dialogue using speech as a dialogue mode may be acquired, and then the speech recognition processing may be performed on the speech dialogue data to obtain text dialogue data.
In this embodiment, the dialog mode corresponding to the original text dialog data is not specially limited; in addition, the users corresponding to the original text dialogue data can be 2 users, and the 2 users respectively correspond to different role categories; alternatively, the number of users may be 2 or more, and the 2 or more users may correspond to at least two different role categories.
The role category, i.e., the user role category, is used to identify the type of role to which the user belongs. For example, in the case where the two parties of the conversation are user 1 who uses a business product and user 2 who solves a user question, respectively, the role category of user 1 may be "user" and the role category of user 2 may be "customer service".
Specifically, taking voice dialogue data, i.e. user call voice, as an example: to obtain the original text dialogue data, the user call voice from which user questions are to be extracted may be uploaded by the user to the electronic device implementing the method, for example a server; the electronic device then performs speech recognition processing on the user call voice to obtain the original text dialogue data described in this embodiment.
It should be noted that, in the specific implementation, the electronic device may also obtain the user call voice to be processed from the preset storage directory according to the preset time interval, so as to save the user operation and improve the user experience.
Step S2200 is to obtain target text data from the original text dialogue data, where the target text data is text data corresponding to a target role category.
In this embodiment, the target role category corresponds to a user who initiates a conversation and raises a question.
Specifically, after the original text dialogue data is obtained in step S2100, in order to quickly obtain the target user question corresponding to the target character category from the dialogue data, the text data corresponding to the target character category, that is, the text data issued by the question maker, may be obtained from the original text dialogue data, so as to reduce the data processing amount and improve the processing speed and accuracy.
In a specific implementation, the step of obtaining target text data from the original text dialogue data includes the following steps S2201 to S2202, which will be described in detail below.
Step S2201, performing data preprocessing on the original text dialogue data to obtain preprocessed text data.
Specifically, the original text dialogue data may contain meaningless content such as filler words, interjections, and transitional sentences, and a user may also utter one complete sentence split into several sub-sentences. Therefore, to improve processing speed and accuracy, after the original text dialogue data is obtained, data preprocessing may first be performed on it to wash out the meaningless content and to apply text alignment processing to sub-sentences that belong to the same complete sentence.
In one embodiment, the performing data preprocessing on the original text dialogue data to obtain preprocessed text data includes: performing word segmentation processing and vocabulary part-of-speech tagging processing on sentences in the original text dialogue data to obtain first text data, wherein the vocabulary part-of-speech tagging processing is processing for tagging part-of-speech of vocabularies obtained after the word segmentation processing; performing data cleaning processing on the first text data by using a preset data cleaning rule to obtain second text data, wherein the preset data cleaning rule is at least one of the following contents in a filtering statement: presetting a stop vocabulary, a part of speech, a numerical value and a symbol; and executing text alignment processing on the second text data according to the part of speech of the sentence end vocabulary of the sentence in the second text data to obtain the preprocessed text data.
In this embodiment, to obtain the first text data, word segmentation and part-of-speech tagging may be performed on the sentences in the original text dialogue data, for example with the Jieba segmenter, tagging each vocabulary word obtained from segmentation with its part of speech. Then, based on the preset data cleaning rule, content such as stop words, words of non-important parts of speech, numerical values, and special symbols can be filtered out of the sentences; the stop words may come from a stop-word list preset by the user, and the non-important parts of speech, numerical values, and special symbols to filter may likewise be preset by the user, with no special limitation here.
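A minimal sketch of the cleaning rule, operating on (word, part-of-speech) pairs such as jieba.posseg.cut() would produce; the stop list, the tag sets, and the sample sentence are hypothetical placeholders:

```python
STOP_WORDS = {"uh", "um", "like"}   # hypothetical preset stop vocabulary
DROP_POS = {"x", "u"}               # hypothetical non-important POS tags

def clean_sentence(tagged):
    """Apply the preset data-cleaning rule to one sentence given as
    (word, pos) pairs, e.g. the output of jieba.posseg.cut()."""
    kept = []
    for word, pos in tagged:
        if word in STOP_WORDS:
            continue                # filter preset stop vocabulary
        if pos in DROP_POS:
            continue                # filter non-important parts of speech
        if word.isdigit():
            continue                # filter bare numerical values
        if not any(ch.isalnum() for ch in word):
            continue                # filter punctuation / special symbols
        kept.append((word, pos))
    return kept

tagged = [("uh", "e"), ("renew", "v"), (",", "x"), ("my", "r"),
          ("2", "m"), ("plan", "n"), ("!", "x")]
cleaned = clean_sentence(tagged)
```

Each filter corresponds to one branch of the preset data cleaning rule; any subset of the branches can be enabled, matching the "at least one of" wording of the claim.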
It should be noted that, in the specific implementation, it is needless to say that other word segmentation methods may be used to perform word segmentation processing and word part-of-speech tagging processing on the sentence in the original text dialogue data, and no particular limitation is imposed here.
In this embodiment, the text alignment processing is performed on the second text data according to the part of speech of the sentence-end word of each sentence: after word segmentation and part-of-speech tagging of the sentences in the second text data, whenever the last word of a sentence is tagged as a conjunction, that sentence is merged with the next sentence.
For example, if the last word of a sentence S1 is "because", whose part of speech is a conjunction, then S1 can undergo text alignment processing with its next sentence S2, combining the two sentences into one sentence.
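The alignment step might be sketched as follows, assuming jieba-style POS tags where "c" marks a conjunction; the sentences and tags are illustrative:

```python
CONJUNCTION_POS = {"c"}   # jieba-style tag for conjunctions (assumption)

def align_sentences(tagged_sentences):
    """Merge a sentence into the next one whenever its last word is a
    conjunction, so that split-up clauses become one complete sentence."""
    merged, buffer = [], []
    for sent in tagged_sentences:
        buffer.extend(sent)
        last_pos = sent[-1][1] if sent else ""
        if last_pos not in CONJUNCTION_POS:   # sentence is complete
            merged.append(buffer)
            buffer = []
    if buffer:                                # trailing unfinished clause
        merged.append(buffer)
    return merged

sents = [
    [("I", "r"), ("called", "v"), ("because", "c")],   # ends in a conjunction
    [("my", "r"), ("order", "n"), ("is", "v"), ("late", "a")],
    [("thanks", "v")],
]
aligned = align_sentences(sents)
```

Here the first two sub-sentences are merged into one complete sentence because the first ends with the conjunction "because", while the third stays separate.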
It should be noted that, in practice, a sentence containing a user question is usually longer than a certain threshold. Therefore, after preprocessing, to further reduce the data processing amount and improve efficiency, a preset sentence length threshold may also be obtained, and sentences in the preprocessed text data whose length does not exceed that threshold may be filtered out. The threshold can be set according to actual needs, for example to 4.
Step S2202, inputting the preprocessed text data into a target role classification model to obtain the target text data.
After performing data preprocessing on the original text dialogue data to obtain preprocessed text data in step S2201, in order to identify and obtain target text data corresponding to a target character category, in this embodiment, the preprocessed text data is input into a target character classification model obtained by pre-training to obtain the target text data.
The target role classification model can be a role classification classifier used to predict the probability that a sentence belongs to the target role category; in this embodiment the model may be a Naive Bayes Model (NBM), and how to train the model is described below.
The target role classification model can be obtained by training through the following steps: acquiring a training data set, wherein the training data set comprises a plurality of sample text data, and each sample text data corresponds to a unique role category; calculating the prior probability corresponding to each role category as a first probability according to the training data set; under the setting that the feature conditions are mutually independent, counting the probability that vocabulary words in the training data set occur in the sample text data corresponding to other role categories as a second probability, wherein the other role categories are the role categories other than the target role category; and obtaining the target role classification model according to the first probability and the second probability.
That is, in this embodiment, the training data set used for training to obtain the target role classification model may include a plurality of sample text data, each sample text data corresponds to a unique role category, and each sample text data may include a plurality of sentences; in specific implementation, the training data set can obtain sample text data respectively corresponding to unique role categories by collecting historical text dialogue data of a user and classifying and sorting text data respectively corresponding to each role category in the historical text dialogue data of the user.
In this embodiment, the historical text dialogue data of the user is taken as data including two types of role categories, namely "user" and "customer service", as an example to explain how to construct the training data set.
For example, if the document 1 and the document 2 are historical text dialogue data of users at different times, the statements corresponding to the character category of "user" in the document 1 and the document 2 can be extracted and sorted by manual sorting to obtain first sample text data; extracting and sorting sentences corresponding to the role category of customer service in the document 1 and the document 2 respectively to obtain second sample text data; and then, according to the first and second sample text data, a training data set for training a target role classification model can be constructed.
Of course, besides the method for constructing the training data set provided in this embodiment, other methods may also be used in a specific implementation; details are not repeated here.
After the training data set is obtained, the target character classification model can be trained according to the training data set. In this embodiment, the target character classification model may be a classifier obtained by training based on bayesian theorem and the setting that feature conditions are independent from each other.
In practice, the Bayes formula can be specifically expressed as:

P(B|A) = P(A|B) P(B) / P(A)

The formula means the following: the probability P(B|A) that a sentence belongs to category B given that it contains feature A can be obtained by multiplying P(A|B), the probability that feature A occurs given that the sentence is in category B, by P(B), the probability of category B occurring on its own, and dividing by P(A), the probability of feature A occurring on its own. Since P(B) and P(A) can be obtained a priori, classifying a sentence can be reduced to solving P(A|B), i.e. the probability that the sentence contains feature A given that it belongs to category B.
In practice, a sentence to be classified may contain several features simultaneously, e.g. several words; that is, the sentence may contain features A1, A2, …, An, in which case

P(B|A1, A2, …, An) = P(A1, A2, …, An|B) · P(B) / P(A1, A2, …, An)

Since the joint conditional probability P(A1, A2, …, An|B) may be close to zero and is difficult to estimate directly, under the assumption that the feature conditions are mutually independent it can be factorized as

P(A1, A2, …, An|B) = P(A1|B) · P(A2|B) · … · P(An|B)

so that the result becomes computable.
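As an illustration of the factorization just described (a hedged sketch, not the patent's implementation; all probability values are invented for the example):

```python
# Sketch of the naive Bayes factorization: under the conditional-independence
# assumption, P(A1,...,An|B) is approximated by the product of the per-feature
# probabilities P(Ai|B). All numbers below are invented for illustration.
from functools import reduce

def joint_given_class(per_feature_probs):
    # Product of the per-feature conditional probabilities P(Ai|B).
    return reduce(lambda acc, p: acc * p, per_feature_probs, 1.0)

def posterior(prior_b, per_feature_probs, evidence):
    # Bayes rule: P(B|A1..An) = P(A1..An|B) * P(B) / P(A1..An).
    return joint_given_class(per_feature_probs) * prior_b / evidence

# Example: P(A1|B)=0.2, P(A2|B)=0.5, prior P(B)=0.4, evidence P(A1,A2)=0.05
print(round(posterior(0.4, [0.2, 0.5], 0.05), 3))  # 0.8
```

Each per-feature probability can be estimated from counts, which is what makes the factorized form tractable compared with the joint.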
In this embodiment, it is not necessary to solve the probability of occurrence of each feature in every category; instead, the problem of solving "the probability that a sentence belongs to the target role category on the condition that it contains the feature vocabulary" is converted into solving "the probability that a sentence has the corresponding feature vocabulary on the condition that it belongs to a category other than the target role category". Specifically, after the training data set pre-labeled by the user is obtained, the prior probability corresponding to each role category can first be calculated as the first probability; then, if the target role category is denoted c, the probability that a sentence belonging to the complementary category (denoted c', i.e., any category other than c) contains a vocabulary wi, namely P(wi|c'), can be obtained by counting the words in the text data of the training data set, and taken as the second probability; after the values of the first probability and the second probability are obtained, the target role classification model can be obtained.
In one embodiment, the second probability may be obtained by: acquiring any vocabulary from the vocabularies in the training data set as a first vocabulary; under the setting that the first vocabulary at least appears once in the sample text data corresponding to the other role categories, calculating a feature weight of the first vocabulary in the sample text data by using a preset feature value extraction algorithm, and taking the feature weight as a second sub-probability; and obtaining the second probability according to the second sub-probability.
In this embodiment, the preset feature value extraction algorithm may be one based on the term frequency-inverse document frequency (TF-IDF) index, which can be expressed by the following formula:

w(ik, D) = tf(ik, D) · log(N / n(ik))

where tf(ik, D) is the frequency of vocabulary ik in sample text data D, N is the total number of sample text data in the training data set, and n(ik) is the number of sample text data in the training data set that contain vocabulary ik. Generally, across a training data set composed of several sample text data, the more sample texts a vocabulary appears in, the smaller its discriminative power and the lower its corresponding feature weight; within a single sample text data, the higher the frequency of a vocabulary, the greater its discriminative power and the higher its corresponding feature weight. Therefore, when calculating the second probability, it can be obtained by calculating the feature weight with which each vocabulary in the training data set appears in the sample text data corresponding to the other role categories.
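A minimal sketch of the TF-IDF weight described above (function and variable names are illustrative, not from the patent):

```python
# TF-IDF sketch: weight = tf(term, doc) * log(N / n_docs_containing_term).
# The toy corpus below is invented for illustration.
import math
from collections import Counter

def tfidf(term, doc, corpus):
    tf = Counter(doc)[term] / len(doc)           # frequency of term in this doc
    n_ik = sum(1 for d in corpus if term in d)   # docs containing the term
    return tf * math.log(len(corpus) / n_ik)     # scaled by rarity in corpus

docs = [["refund", "order", "order"], ["refund", "delay"], ["greeting"]]
print(round(tfidf("order", docs[0], docs), 3))   # 0.732
```

Note that "order" (present in only one document) outweighs "refund" (present in two), matching the intuition that corpus-wide frequency lowers discriminative power.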
That is, the second probability can be obtained by taking the TF-IDF feature weight above, computed over the sample text data corresponding to the other role categories, as P(wi|c'). It should be noted that, when calculating the second probability in this embodiment, in order to prevent unrecorded vocabulary from having an excessive influence on the prediction result, each vocabulary is assumed to appear at least once in all the sample text data (a smoothing in the spirit of Laplace smoothing).
In addition, in this embodiment, since the assumption of conditional independence discards the position information of the vocabulary, the vocabulary features can be extracted based on a preset n-gram model when extracting the vocabulary, where the preset n-gram model may be a bi-gram or tri-gram model, which is not particularly limited here.
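A short sketch of n-gram feature extraction, which retains some of the word-order information that the independence assumption would otherwise discard (n=2 gives the bi-gram features mentioned above, n=3 tri-grams; the tokens are illustrative):

```python
# Extract contiguous n-gram features from a token list.
def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(["how", "do", "i", "refund"]))
# [('how', 'do'), ('do', 'i'), ('i', 'refund')]
```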
After the target role classification model is obtained through the above training, the preprocessed text data can be classified according to the model to obtain the target text data; that is, inputting the preprocessed text data into the target role classification model to obtain the target text data includes: predicting, according to the target role classification model, the probability corresponding to each sentence in the preprocessed text data; and extracting, from the preprocessed text data, the sentences whose corresponding probability is not less than a preset probability threshold to obtain the target text data.
Specifically, after the first probability and the second probability are obtained, when the sentences in the target text data are role-classified, let the sentence to be classified be d and the target role category be c; then, in the case that sentence d contains vocabularies w1, w2, …, wm, the probability that sentence d belongs to the target role category c can be obtained by calculating the probability P(wi|c') for each such vocabulary and combining these values with the first probability, where {w1, w2, …, wm} is a subset of the total vocabulary in the training data set.
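A hedged sketch of the subsequent filtering step: a scoring function (standing in for the trained role classification model, whose exact form the patent leaves to the embodiment) assigns each sentence a probability, and sentences at or above a preset threshold are kept as target text data. The scores below are invented:

```python
# Keep sentences whose predicted role probability meets a preset threshold.
def filter_by_role(sentences, score_fn, threshold=0.5):
    return [s for s in sentences if score_fn(s) >= threshold]

# Stub scorer standing in for the trained classification model.
scores = {"when will my order arrive": 0.9,
          "hello, how can I help you": 0.1}
print(filter_by_role(list(scores), scores.get, threshold=0.5))
# ['when will my order arrive']
```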
It should be noted that, in this embodiment, the target role classification model is a model obtained based on naive bayes, and in specific implementation, the model may also be other types of classification models, for example, a model based on a classification decision tree and a random forest classification, which is not particularly limited herein.
After step S2200, step S2300 is executed to obtain a target user question according to the target text data, where the target user question is a question asked by a target user in the original text dialogue data, and the user role of the target user is the target role category.
In one embodiment, the obtaining the target user question according to the target text data includes the following steps S2301-S2304, which are described in detail below.
Step S2301, obtaining a first sentence to be determined from the sentences of the target text data by using a preset text abstract extraction algorithm, where the first sentence to be determined is a sentence whose corresponding sentence weight satisfies a preset condition.
In this embodiment, the preset text abstract extraction algorithm may be a preset TextRank algorithm, and obtaining a first sentence to be determined from the sentences of the target text data by using the preset text abstract extraction algorithm includes: constructing a sentence similarity matrix by respectively calculating the similarity between any two sentences in the target text data; constructing a sentence weight graph according to the sentence similarity matrix, where the weight of an edge between two adjacent sentences in the sentence weight graph is the similarity between the two sentences; obtaining the weights of the sentences in the target text data according to the sentence weight graph; and sorting the weights of the sentences in the target text data to obtain the sentences meeting the preset condition as the first sentences to be determined.
That is, the target text data may be split into several sentences d1, d2, …, dj by using at least one symbol in a preset delimiter set (e.g., full stops, question marks, and exclamation marks); then, for each sentence di, a word vector can be obtained for each vocabulary in the sentence, for example by using a Chinese GloVe model, and the sentence vector of di is obtained by averaging the word vectors of all the words in the sentence. After that, a sentence similarity matrix is constructed by calculating the similarity between any two sentences in the target text data, a sentence weight graph is constructed, and the abstract sentences in the target text data, i.e., the sentence set containing the target user question, can then be obtained according to the sentence weight graph. It should be noted that, when calculating the similarity between sentences, it may be obtained by calculating the cosine similarity between sentence vectors, or by other methods, which will not be described again here.
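The pipeline above can be sketched as follows (a hedged illustration: the toy word vectors stand in for a real Chinese GloVe model, and the damping factor and iteration count are conventional TextRank defaults, not values from the patent):

```python
# Sentence vectors = mean of word vectors; cosine similarity gives the edge
# weights of the sentence graph; PageRank-style iteration yields the
# sentence weights used for ranking abstract sentences.
import numpy as np

def sentence_vector(sentence, word_vecs):
    return np.mean([word_vecs[w] for w in sentence], axis=0)

def cosine_sim_matrix(vecs):
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)                   # no self-edges in the graph
    return sim

def textrank_weights(sim, d=0.85, iters=50):
    n = sim.shape[0]
    w = np.ones(n) / n
    deg = np.where(sim.sum(axis=0) == 0, 1.0, sim.sum(axis=0))
    for _ in range(iters):
        w = (1 - d) / n + d * sim @ (w / deg)    # TextRank update rule
    return w

# Toy word vectors (illustrative, not a trained GloVe model).
word_vecs = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
sents = [["a"], ["b"], ["c"]]
vecs = np.array([sentence_vector(s, word_vecs) for s in sents])
weights = textrank_weights(cosine_sim_matrix(vecs))
```

Here sentence 1 ("b") is similar to both neighbors, so it accumulates the highest weight, which is the behavior the ranking step relies on.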
It should be noted that, the above is a method for extracting a text abstract from target text data provided in this embodiment, and in a specific implementation, other methods may also be used to extract a text abstract from target text data, which is not limited herein.
After step S2301, step S2302 is executed to obtain a preset key vocabulary, where the preset key vocabulary is used to determine whether a sentence is a user question, so as to reduce the occurrence of non-question sentences; the preset key vocabulary may be a set of words used to identify user questions, and may be set as needed.
Step S2303, a sentence including any vocabulary in the preset key vocabularies is acquired from the first sentence to be determined as a second sentence to be determined.
Step S2304, obtaining, from the second to-be-determined sentence, a sentence with a sentence position order satisfying a preset position condition as the target user question, where the sentence position is a position of the corresponding sentence in the target text data.
In this embodiment, the preset position condition may be having the minimum value among the sentence positions, that is, the first question sentence in the target text dialogue data is selected as the target user question.
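Steps S2302 to S2304 can be sketched as follows (a hedged illustration: the keywords and sentences are invented, and a real key vocabulary would be configured as the text says):

```python
# Among candidate summary sentences, keep those containing a preset key
# vocabulary, then select the one whose position in the target text data
# is smallest (the first question sentence).
KEYWORDS = {"how", "why", "when", "what"}

def pick_target_question(candidates):
    # candidates: (position, sentence) pairs produced by the summary step.
    questions = [(pos, s) for pos, s in candidates
                 if KEYWORDS & set(s.lower().split())]
    return min(questions)[1] if questions else None

cands = [(7, "Thanks for your patience"),
         (3, "When will my refund arrive"),
         (5, "How do I change my address")]
print(pick_target_question(cands))  # When will my refund arrive
```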
It should be noted that, in the specific implementation, the intermediate processing result of each step or sub-step may be output and provided to the user for viewing during the implementation of the method, and the intermediate processing result modified by the user may be acquired, and the subsequent processing may be executed according to the modified intermediate processing result. For example, after obtaining the target text data in step S2200, the electronic device may present the target text data to the user for viewing, so that the user can confirm whether the target text data is correct, and may perform the subsequent step S2300 based on the target text data modified by the user.
In addition, the method provided by the embodiment can be used for providing a user question bank for enterprise arrangement, so that an enterprise can better train customer service personnel, and the customer service personnel can quickly and accurately answer user questions in the voice call process; of course, the method can be applied to other scenarios, and is not limited herein.
In summary, according to the user question obtaining method provided in the embodiments of the present disclosure, when a user question needs to be extracted from user voice call data, an electronic device, for example, a server, may perform text conversion on the user voice call data to obtain original text dialogue data, and then obtain target text data corresponding to a target role category from the original text dialogue data based on the target role category corresponding to a user who wants to extract the user question, and then the electronic device may obtain the target user question reliably, conveniently, and accurately according to the target text data.
< apparatus embodiment >
Corresponding to the above method embodiments, in this embodiment, a user question obtaining apparatus is further provided, as shown in fig. 3, the apparatus 3000 may include an original text conversation data obtaining module 3100, a target text data obtaining module 3200, and a target user question obtaining module 3300.
The original text dialogue data acquisition module 3100 is configured to acquire original text dialogue data.
The target text data obtaining module 3200 is configured to obtain target text data from the original text dialogue data, where the target text data is text data corresponding to a target role category.
In one embodiment, the target text data obtaining module 3200, when obtaining the target text data from the original text dialogue data, may be configured to: performing data preprocessing on the original text dialogue data to obtain preprocessed text data; and inputting the preprocessed text data into a target role classification model to obtain the target text data.
In this embodiment, when the target text data obtaining module 3200 performs data preprocessing on the original text dialogue data to obtain preprocessed text data, the target text data obtaining module may be configured to: performing word segmentation processing and vocabulary part-of-speech tagging processing on sentences in the original text dialogue data to obtain first text data, wherein the vocabulary part-of-speech tagging processing is processing for tagging part-of-speech of vocabularies obtained after the word segmentation processing; performing data cleaning processing on the first text data by using a preset data cleaning rule to obtain second text data, wherein the preset data cleaning rule is at least one of the following contents in a filtering statement: presetting a stop vocabulary, a part of speech, a numerical value and a symbol; and executing text alignment processing on the second text data according to the part of speech of the sentence end vocabulary of the sentence in the second text data to obtain the preprocessed text data.
In this embodiment, when the target text data obtaining module 3200 inputs the preprocessed text data into the target role classification model to obtain the target text data, the target text data obtaining module may be configured to: predicting the probability corresponding to each statement in the preprocessed text data according to the target role classification model; and extracting sentences of which the corresponding probability is not less than a preset probability threshold value from the preprocessed text data to obtain the target text data.
The target user question obtaining module 3300 is configured to obtain a target user question according to the target text data, where the target user question is a question asked by a target user in the original text dialogue data, and a user role of the target user is the target role category.
In one embodiment, the target user question obtaining module 3300, when obtaining the target user question according to the target text data, may be configured to: obtaining a first sentence to be determined from the sentences of the target text data by using a preset text abstract extraction algorithm, wherein the first sentence to be determined is a sentence of which the corresponding sentence weight meets a preset condition; acquiring a preset key vocabulary, wherein the preset key vocabulary is used for determining whether a sentence is a user problem; acquiring a sentence containing any word in the preset key words from the first sentence to be determined as a second sentence to be determined; and obtaining sentences with sentence position ordering meeting a preset position condition from the second sentence to be determined as the target user question, wherein the sentence positions are positions of the corresponding sentences in the target text data.
In this embodiment, the target user question obtaining module 3300, when obtaining the first to-be-determined sentence from the sentences of the target text data by using a preset text abstract extraction algorithm, may be configured to: construct a sentence similarity matrix by respectively calculating the similarity between any two sentences in the target text data; construct a sentence weight graph according to the sentence similarity matrix, where the weight of an edge between two adjacent sentences in the sentence weight graph is the similarity between the two sentences; obtain the weights of the sentences in the target text data according to the sentence weight graph; and sort the weights of the sentences in the target text data to obtain the sentences meeting the preset condition as the first sentences to be determined.
< device embodiment >
Corresponding to the above method embodiments, in this embodiment, an electronic device is further provided, which may include the user question obtaining apparatus 4000 according to any embodiment of the present disclosure, and is configured to implement the user question obtaining method according to any embodiment of the present disclosure.
As shown in fig. 4, the electronic device 4000 may further comprise a processor 4200 and a memory 4100, the memory 4100 being configured to store executable instructions; the processor 4200 is configured to operate the electronic device according to the control of the instructions to perform a user question obtainment method according to any embodiment of the present disclosure.
The various modules of the above apparatus 3000 may be implemented by the processor 4200 executing the instructions to perform a user question obtainment method according to any embodiment of the present disclosure.
The electronic device 4000 may be a server, or may be other types of devices, such as a terminal device, and the like, which is not limited herein, and for example, the electronic device 4000 may be the server 1000 in fig. 1, and the like.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present disclosure is defined by the appended claims.

Claims (8)

1. A user question acquisition method, comprising:
acquiring original text dialogue data;
acquiring target text data from the original text dialogue data, wherein the target text data is text data corresponding to a target role category;
obtaining a target user question according to the target text data, wherein the target user question is a question asked by a target user in the original text dialogue data, and a user role of the target user is the target role category;
wherein the obtaining of the target text data from the original text dialogue data comprises: performing data preprocessing on the original text dialogue data to obtain preprocessed text data; inputting the preprocessed text data into a target role classification model to obtain the target text data, wherein the target role classification model is used for predicting the probability that a statement belongs to a target role category;
the target role classification model is obtained by training through the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of sample text data, and each sample text data corresponds to a unique role category;
calculating prior probability corresponding to each role category as first probability according to the training data set;
under the setting based on mutual independence among the characteristics, counting the probability of the vocabularies in the training data set appearing in sample text data corresponding to other role categories as a second probability, wherein the other role categories are role categories except the target role category;
and obtaining the target role classification model according to the first probability and the second probability.
2. The method of claim 1, wherein inputting the preprocessed text data into a target character classification model to obtain the target text data comprises:
predicting the probability corresponding to each statement in the preprocessed text data according to the target role classification model;
and extracting sentences of which the corresponding probability is not less than a preset probability threshold value from the preprocessed text data to obtain the target text data.
3. The method of claim 1, wherein obtaining a target user question from the target text data comprises:
obtaining a first sentence to be determined from the sentences of the target text data by using a preset text abstract extraction algorithm, wherein the first sentence to be determined is a sentence of which the corresponding sentence weight meets a preset condition;
acquiring a preset key vocabulary, wherein the preset key vocabulary is used for determining whether a sentence is a user problem;
acquiring a sentence containing any word in the preset key words from the first sentence to be determined as a second sentence to be determined;
and obtaining sentences with sentence position ordering meeting a preset position condition from the second sentence to be determined as the target user question, wherein the sentence positions are positions of the corresponding sentences in the target text data.
4. The method of claim 3, wherein obtaining the first sentence to be determined from the sentences of the target text data by using a preset text summarization algorithm comprises:
constructing a sentence similarity matrix by respectively calculating the similarity between any two sentences in the target text data;
constructing a statement weight map according to the statement similarity matrix, wherein the weight of an edge between two adjacent statements in the statement weight map is the similarity between the two statements;
obtaining the weight of the sentence in the target text data according to the sentence weight graph;
and sequencing the weights of all sentences in the target text data to obtain the sentences meeting the preset conditions as the first to-be-determined sentences.
5. The method of claim 1, wherein the second probability is obtained by:
acquiring any vocabulary from the vocabularies in the training data set as a first vocabulary;
under the setting that the first vocabulary at least appears once in the sample text data corresponding to the other role categories, calculating a feature weight of the first vocabulary in the sample text data by using a preset feature value extraction algorithm, and taking the feature weight as a second sub-probability corresponding to the first vocabulary;
and obtaining the second probability according to the second sub-probability.
6. The method of claim 1, wherein the performing data pre-processing on the raw textual dialogue data to obtain pre-processed textual data comprises:
performing word segmentation processing and vocabulary part-of-speech tagging processing on sentences in the original text dialogue data to obtain first text data, wherein the vocabulary part-of-speech tagging processing is processing for tagging part-of-speech of vocabularies obtained after the word segmentation processing;
performing data cleaning processing on the first text data by using a preset data cleaning rule to obtain second text data, wherein the preset data cleaning rule is at least one of the following contents in a filtering statement: presetting a stop vocabulary, a part of speech, a numerical value and a symbol;
and executing text alignment processing on the second text data according to the part of speech of the sentence end vocabulary of the sentence in the second text data to obtain the preprocessed text data.
7. A user question acquisition apparatus, comprising:
the original text dialogue data acquisition module is used for acquiring original text dialogue data;
the target text data acquisition module is used for acquiring target text data from the original text dialogue data, wherein the target text data is text data corresponding to a target role category;
a target user question obtaining module, configured to obtain a target user question according to the target text data, where the target user question is a question asked by a target user in the original text dialogue data, and a user role of the target user is the target role category;
wherein the acquiring of the target text data from the original text dialogue data comprises: performing data preprocessing on the original text dialogue data to obtain preprocessed text data; and inputting the preprocessed text data into a target role classification model to obtain the target text data, wherein the target role classification model is used for predicting the probability that a sentence belongs to the target role category;
the target role classification model is trained through the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of sample text data, and each sample text data corresponds to a unique role category;
calculating, according to the training data set, the prior probability corresponding to each role category as a first probability;
under an assumption that features are mutually independent, counting, as a second probability, the probability that words in the training data set appear in the sample text data corresponding to other role categories, wherein the other role categories are role categories other than the target role category; and obtaining the target role classification model according to the first probability and the second probability.
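The training steps above describe a naive Bayes classifier: the first probability is the class prior and the second probability is a per-word conditional probability under the feature-independence assumption. A minimal sketch follows; the dialogue samples and role labels are illustrative, and Laplace smoothing is added for robustness even though the claim does not specify it.

```python
from collections import Counter, defaultdict
import math

def train(samples):
    """samples: list of (tokens, role) pairs.
    Returns class priors, smoothed P(word | role), and the vocabulary."""
    role_counts = Counter(role for _, role in samples)
    total = sum(role_counts.values())
    priors = {r: c / total for r, c in role_counts.items()}  # first probability

    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, role in samples:
        word_counts[role].update(tokens)
        vocab.update(tokens)

    # Second probability: P(word | role), with Laplace smoothing.
    cond = {}
    for role, counts in word_counts.items():
        n = sum(counts.values())
        cond[role] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return priors, cond, vocab

def predict(tokens, priors, cond, vocab):
    """Pick the role maximizing log P(role) + sum of log P(word | role),
    which is valid only because words are assumed mutually independent."""
    best, best_lp = None, float("-inf")
    for role in priors:
        lp = math.log(priors[role])
        for w in tokens:
            if w in vocab:
                lp += math.log(cond[role][w])
        if lp > best_lp:
            best, best_lp = role, lp
    return best

# Illustrative training data with two role categories.
samples = [
    (["how", "refund", "order"], "customer"),
    (["why", "charge", "twice"], "customer"),
    (["please", "provide", "order", "number"], "agent"),
    (["we", "will", "refund", "you"], "agent"),
]
priors, cond, vocab = train(samples)
print(predict(["how", "order", "refund"], priors, cond, vocab))
```

In the apparatus, a sentence whose predicted role matches the target role category would be retained as target text data; everything else is filtered out before question extraction.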
8. An electronic device, comprising the apparatus of claim 7; or
the electronic device comprises:
a memory for storing executable instructions; and
a processor configured, under the control of the instructions, to control the electronic device to perform the method according to any one of claims 1 to 6.
CN202110033442.6A 2021-01-12 2021-01-12 User question obtaining method and device and electronic equipment Active CN112364149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110033442.6A CN112364149B (en) 2021-01-12 2021-01-12 User question obtaining method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112364149A CN112364149A (en) 2021-02-12
CN112364149B true CN112364149B (en) 2021-04-23

Family

ID=74534686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110033442.6A Active CN112364149B (en) 2021-01-12 2021-01-12 User question obtaining method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112364149B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180029A (en) * 2016-03-09 2017-09-19 阿里巴巴集团控股有限公司 A kind of information processing method and device based on consultation service
CN108630193A (en) * 2017-03-21 2018-10-09 北京嘀嘀无限科技发展有限公司 Audio recognition method and device
CN110096570A (en) * 2019-04-09 2019-08-06 苏宁易购集团股份有限公司 A kind of intension recognizing method and device applied to intelligent customer service robot
CN111126071A (en) * 2019-12-02 2020-05-08 支付宝(杭州)信息技术有限公司 Method and device for determining questioning text data and data processing method of customer service group

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7054413B2 (en) * 2018-03-27 2022-04-13 株式会社日立製作所 Customer support system and customer support method
CN110555160A (en) * 2018-03-30 2019-12-10 优酷网络技术(北京)有限公司 Data processing method and device for recommendation system and electronic equipment
CN111917878B (en) * 2020-08-03 2023-01-10 腾讯科技(深圳)有限公司 Message processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112364149A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN113807098B (en) Model training method and device, electronic equipment and storage medium
CN107729313B (en) Deep neural network-based polyphone pronunciation distinguishing method and device
CN107679234A (en) Customer service information providing method, device, electronic equipment, storage medium
US20180025121A1 (en) Systems and methods for finer-grained medical entity extraction
CN109408824B (en) Method and device for generating information
CN111191445B (en) Advertisement text classification method and device
CN110597994A (en) Event element identification method and device
CN110032639A (en) By the method, apparatus and storage medium of semantic text data and tag match
CN111177186B (en) Single sentence intention recognition method, device and system based on question retrieval
US9036806B1 (en) Predicting the class of future customer calls in a call center
CN111651996A (en) Abstract generation method and device, electronic equipment and storage medium
US10067935B2 (en) Prediction and optimized prevention of bullying and other counterproductive interactions in live and virtual meeting contexts
CN113553412B (en) Question-answering processing method, question-answering processing device, electronic equipment and storage medium
CN110019758B (en) Core element extraction method and device and electronic equipment
US20150212976A1 (en) System and method for rule based classification of a text fragment
CN113434642B (en) Text abstract generation method and device and electronic equipment
CN114416943A (en) Training method and device for dialogue model, electronic equipment and storage medium
CN110442868A (en) Text handling method, device and electronic equipment
CN111783424A (en) Text clause dividing method and device
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN110929499B (en) Text similarity obtaining method, device, medium and electronic equipment
CN117216275A (en) Text processing method, device, equipment and storage medium
CN112364149B (en) User question obtaining method and device and electronic equipment
CN109144284B (en) Information display method and device
US10769213B2 (en) Detection of document similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Huang Shiya

Inventor after: Tang Zhancheng

Inventor after: Hei Yudong

Inventor before: Huang Shiya

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210212

Assignee: Zhongrong Deloitte (Guangzhou) Information Technology Co.,Ltd.

Assignor: GUANGZHOU YUNQU INFORMATION TECHNOLOGY Co.,Ltd.

Contract record no.: X2024980004132

Denomination of invention: User question acquisition methods, devices, and electronic devices

Granted publication date: 20210423

License type: Common License

Record date: 20240409
