CN111177350A

CN111177350A - Method, device and system for forming dialogue scripts of an intelligent voice robot

Info

Publication number: CN111177350A
Application number: CN201911329798.3A
Authority: CN
Inventors: 刘宗全; 苏绥绥; 常富洋
Original assignee: Beijing Qilu Information Technology Co Ltd
Current assignee: Beijing Qilu Information Technology Co Ltd
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2020-05-19

Abstract

The invention discloses a method, a device, a system and a computer-readable medium for forming dialogue scripts of an intelligent voice robot, which extract and generate new scripts from historical dialogue data. The method comprises the following steps: recording historical dialogue data between the intelligent voice robot and customers into a dialogue database, and extracting question sentences from the dialogue database to generate a question set; performing cluster analysis on the question set to form a plurality of topics, each topic comprising its corresponding question sentences; auditing the resulting topics and, if a topic passes the audit, storing the question sentences corresponding to the topic in a standard question bank, then processing them and supplementing them into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, storing the question sentences corresponding to the topic in a to-be-deleted question bank. With this technical scheme, the historical dialogue data is cluster-analyzed and new scripts are formed after auditing, which solves the problems that existing scripts must be supplemented manually and are limited in number.

Description

Method, device and system for forming dialogue scripts of an intelligent voice robot

Technical Field

The invention relates to the technical field of intelligent recognition, and in particular to a method, a device and a system for forming dialogue scripts of an intelligent voice robot.

Background

The customer service center is the main bridge for communication between enterprises and users, and the main channel for improving user satisfaction. In the past, customer service centers relied mainly on human agents, with professional customer service personnel serving users. With the development of computer information processing technology, more and more customer service centers have begun to use intelligent voice robots for services such as return visits and telephone questionnaire surveys.

At present, the intelligent voice robot communicates with users mainly according to dialogue scripts. However, the scripts available in a system are always limited, manual supplementation of scripts lags seriously, and new scripts cannot be formed in time from the dialogues between the intelligent voice robot and users.

Disclosure of Invention

The invention aims to solve the problem that the scripts of existing intelligent voice robots depend on manual supplementation, so that new scripts cannot be supplemented effectively.

In order to solve the above technical problem, a first aspect of the present invention provides a script forming method for an intelligent voice robot, the method comprising:

recording historical dialogue data between the intelligent voice robot and customers into a dialogue database, and extracting question sentences from the dialogue database to generate a question set;

performing cluster analysis on the question set to form a plurality of topics, wherein each topic comprises its corresponding question sentences;

auditing the classified topics; if a topic passes the audit, storing the question sentences corresponding to the topic in a standard question bank, and processing and supplementing them into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, storing the question sentences corresponding to the topic in a to-be-deleted question bank.

According to a preferred embodiment of the invention, the historical dialogue data between the intelligent voice robot and customers is dialogue data in which a fallback script was used.

According to a preferred embodiment of the present invention, clustering the set of question sentences to form a plurality of topics comprises:

converting the question in the question set into a text, segmenting the text of the question, and converting the segmented text into a vector;

and clustering the vectors to form a plurality of topics.

According to a preferred embodiment of the present invention, a word2vec model is used to convert the segmented text into vectors.

According to a preferred embodiment of the present invention, a deep learning-based TextCNN model is used for clustering the vectors.

According to a preferred embodiment of the invention, the method further comprises: if a topic does not pass the audit, feeding the failed topic back to the cluster analysis, and directly deleting the question sentences related to the topic and the answers corresponding to those question sentences in subsequent cluster analysis.

According to a preferred embodiment of the present invention, processing and supplementing the question-answer knowledge base comprises: filtering the text content of the question sentences corresponding to the topic, and generating a new script template from the filtered question sentences.

According to a preferred embodiment of the present invention, the new script template includes an intent tag, question content, and logical relationships.

A second aspect of the present invention provides a script forming device for an intelligent voice robot, the device comprising:

a question extraction module, used for recording historical dialogue data between the intelligent voice robot and customers into a dialogue database and extracting question sentences from the dialogue database to generate a question set;

a cluster analysis module, used for performing cluster analysis on the question set to form a plurality of topics, each topic comprising its corresponding question sentences;

a topic auditing module, used for auditing the classified topics; if a topic passes the audit, the question sentences corresponding to the topic are stored in a standard question bank, and are processed and supplemented into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, the question sentences corresponding to the topic are stored in a to-be-deleted question bank.

According to a preferred embodiment of the present invention, the cluster analysis module converts the question sentences in the question set into text, segments the text, converts the segmented text into vectors, and clusters the vectors to form a plurality of topics.

According to a preferred embodiment of the present invention, a word2vec model is used to convert the segmented text into vectors.

According to a preferred embodiment of the invention, the clustering of the vectors uses a TextCNN model based on deep learning.

According to a preferred embodiment of the invention, the device further comprises a feedback module which, if a topic does not pass the audit, feeds the failed topic back to the cluster analysis, so that the question sentences related to the topic and the answers corresponding to those question sentences are directly deleted in subsequent cluster analysis.

A third aspect of the present invention provides a script forming system for an intelligent voice robot, including:

a storage unit for storing a computer executable program;

and a processing unit for reading the computer executable program in the storage unit so as to execute the script forming method of the intelligent voice robot.

A fourth aspect of the present invention provides a computer-readable medium storing a computer-readable program, wherein the computer-readable program is configured to execute the script forming method of the intelligent voice robot.

With this technical scheme, the dialogue data between the intelligent voice robot and users is cluster-analyzed and new scripts are formed after auditing, so that the script library of the system is effectively supplemented and the problems that existing scripts must be supplemented manually and are limited in number are solved.

Drawings

In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained more clear, the following will describe in detail the embodiments of the present invention with reference to the accompanying drawings. It should be noted, however, that the drawings described below are only illustrations of exemplary embodiments of the invention, from which other embodiments can be derived by those skilled in the art without inventive step.

FIG. 1 is a flowchart of a script forming method of an intelligent voice robot in an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a script forming device of an intelligent voice robot in an embodiment of the present invention;

FIG. 3 is a structural block diagram of a script forming system of an intelligent voice robot in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.

The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.

In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.

The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.

The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit devices and/or microcontroller devices.

The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.

FIG. 1 is a flowchart of the script forming method of an intelligent voice robot according to the present invention. As shown in FIG. 1, the method has the following steps:

S101, recording historical dialogue data between the intelligent voice robot and customers into a dialogue database, and extracting question sentences from the dialogue database to generate a question set.

In this embodiment, the intelligent voice robot talks with a large number of users by telephone every day and generates a large amount of dialogue data, which can serve as the basis for discovering new scripts.

In this embodiment, a question judgment model based on deep learning is used to judge whether an utterance input by a user is a question. For the judgment, the sentence is first segmented into words. For example, the sentence "What time shall we meet tomorrow?" is segmented into "we", "tomorrow", "what time", "meet", and the sentence-final question particle with its question mark. The segmented words are input into the question judgment model, which outputs the judgment result. The question judgment model is trained by supervised learning.

On the basis of the above technical scheme, further, the historical dialogue data between the intelligent voice robot and customers is dialogue data in which a fallback script was used.

In this embodiment, when the intelligent voice robot communicates with a user, it responds to the user's questions or answers according to its scripts. However, when the intelligent voice robot cannot recognize the topic of the user's question or has no corresponding answer, it may reply with a fallback script, such as "Please call XXXX to ask this question" or "I cannot answer this question right now; please leave your phone number and I will get back to you later". In general, questions answered with a fallback script are likely to be new questions that need to be supplemented into the script library. Therefore, dialogues in which a fallback script was used are preferably selected as the historical dialogue records from which scripts are extracted, so that the scripts needing supplementation can be extracted more accurately.

In this embodiment, the new scripts to be supplemented mainly target questions that the intelligent voice robot cannot answer, so extracting the question sentences from the dialogue data improves the efficiency of discovering new scripts.
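The extraction step can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the `Turn` record, the `FALLBACK_REPLIES` set and the keyword heuristic in `looks_like_question` are all assumptions made for the example, and the heuristic merely stands in for the trained deep-learning question judgment model described above.

```python
from dataclasses import dataclass

# Hypothetical fallback replies; a real system would list its own.
FALLBACK_REPLIES = {"Please call XXXX to ask this question."}


@dataclass
class Turn:
    speaker: str  # "user" or "robot"
    text: str


def looks_like_question(text: str) -> bool:
    """Crude stand-in for the trained question judgment model."""
    stripped = text.strip().lower()
    starters = ("what ", "when ", "where ", "who ", "why ", "how ", "can ")
    return stripped.endswith("?") or stripped.startswith(starters)


def extract_question_set(dialogue: list[Turn]) -> list[str]:
    """Collect user questions that the robot answered with a fallback reply."""
    questions = []
    for current, following in zip(dialogue, dialogue[1:]):
        if (current.speaker == "user"
                and following.speaker == "robot"
                and following.text in FALLBACK_REPLIES
                and looks_like_question(current.text)):
            questions.append(current.text)
    return questions


dialogue = [
    Turn("robot", "Hello, this is a return visit call."),
    Turn("user", "Can I still use my coupon?"),
    Turn("robot", "Please call XXXX to ask this question."),
    Turn("user", "Fine, thanks."),
    Turn("robot", "Goodbye."),
]
print(extract_question_set(dialogue))  # ['Can I still use my coupon?']
```

Restricting the scan to user turns followed by a fallback reply keeps the question set focused on questions the robot could not answer.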

S102, performing cluster analysis on the question set to form a plurality of topics, wherein each topic comprises its corresponding question sentences.

In this embodiment, a large amount of historical dialogue data is input into the model, the extracted question data is classified by the model, and questions with the same or similar content are grouped into one topic, such as coupon enquiries or new product enquiries. This classification facilitates the subsequent operations.

On the basis of the above technical scheme, further, performing cluster analysis on the question set to form a plurality of topics comprises:

converting the question sentences in the question set into text, segmenting the text of the question sentences, and converting the segmented text into vectors;

and clustering the vectors to form a plurality of topics.

On the basis of the above technical scheme, further, a word2vec model is used to convert the segmented text into vectors.

Text vectorization means representing text with numerical features, because computers cannot directly understand human languages and writing. For a computer to process text, the text information must be mapped into a numerical semantic space, which may be called a word vector space. Many algorithms convert text into vectors, such as TF-IDF, BOW, One-Hot and word2vec. In this embodiment, text vectorization uses the word2vec algorithm. The word2vec model is an unsupervised learning model, so the mapping of text information to the semantic space can be learned by training on an unlabeled corpus.

The intelligent voice robot communicates with users every day and accumulates a large amount of historical dialogue data. This data can be used as the corpus for training the word2vec model, and training on the historical dialogue data enables the word2vec model to vectorize text better.
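The idea of mapping segmented text into a semantic space can be illustrated with a toy example. The embedding table below is invented and merely stands in for a trained word2vec model, and averaging word vectors into one sentence vector is a common simplification assumed here, not something the embodiment prescribes.

```python
# Toy 3-dimensional embedding table standing in for a trained word2vec model.
# The words and values are made up for illustration only.
TOY_EMBEDDINGS = {
    "coupon": [1.0, 0.0, 0.0],
    "discount": [0.9, 0.1, 0.0],
    "new": [0.0, 1.0, 0.0],
    "product": [0.0, 0.9, 0.1],
}
DIMENSION = 3


def sentence_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not known:
        return [0.0] * DIMENSION
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


coupon_question = sentence_vector(["coupon", "discount"])
product_question = sentence_vector(["new", "product"])
query = sentence_vector(["discount"])

# Questions about similar subjects end up closer together in the vector space.
print(cosine(coupon_question, query) > cosine(product_question, query))  # True
```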

On the basis of the above technical scheme, further, a TextCNN model based on deep learning is used for clustering the vectors.

Many algorithm models can likewise be used for cluster analysis, for example LDA, LSI, SVM and the Chameleon algorithm. In this embodiment, a TextCNN model based on deep learning is used. The TextCNN model includes an input layer, a convolutional layer, a pooling layer, and a fully connected layer.

The input layer of the TextCNN model takes a text sequence of fixed length. The input sequence length L is specified by analyzing the lengths of the corpus samples; a sample sequence shorter than L is padded, and a sequence longer than L is truncated. The final input to the input layer is the word vector corresponding to each word in the text sequence.
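The padding and truncation rule can be sketched as a small helper. The function name `fit_to_length` and the `<pad>` symbol are assumptions chosen for this illustration.

```python
PAD_TOKEN = "<pad>"  # hypothetical padding symbol


def fit_to_length(tokens, length, pad_token=PAD_TOKEN):
    """Truncate sequences longer than `length` and pad shorter ones."""
    if length < 0:
        raise ValueError("length must be non-negative")
    clipped = tokens[:length]
    return clipped + [pad_token] * (length - len(clipped))


print(fit_to_length(["when", "does", "the", "coupon", "expire"], 4))
# ['when', 'does', 'the', 'coupon']
print(fit_to_length(["coupon", "expire"], 4))
# ['coupon', 'expire', '<pad>', '<pad>']
```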

A number of convolution kernels of different sizes are typically used in the model. The height of the convolution kernel, i.e. the window value, is generally 2-8.

Max pooling is used in the pooling layer of the model. This not only reduces the number of model parameters, but also ensures that a fixed-length input to the fully connected layer is obtained from the variable-length output of the convolutional layer.

The core function of the convolutional layer and the pooling layer in the classification model is feature extraction: primary features are extracted from the fixed-length input text sequence using local word-order information and are then combined into high-level features. Through the convolution and pooling operations, the feature engineering step of traditional machine learning is omitted.
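How kernels of different heights followed by max pooling yield a fixed-length feature vector can be shown with a small pure-Python sketch. The word vectors, kernel weights and the ReLU activation are assumptions made for the example; a real TextCNN would learn its kernels and use a tensor library.

```python
def convolve(sequence, kernel):
    """Slide a kernel of height len(kernel) over a sequence of word vectors."""
    height = len(kernel)
    feature_map = []
    for start in range(len(sequence) - height + 1):
        window = sequence[start:start + height]
        total = sum(weight * value
                    for kernel_row, vector in zip(kernel, window)
                    for weight, value in zip(kernel_row, vector))
        feature_map.append(max(0.0, total))  # ReLU activation (an assumption)
    return feature_map


def extract_features(sequence, kernels):
    """Max-over-time pooling: keep one value per kernel.

    Assumes the sequence is at least as long as the tallest kernel,
    which holds once inputs are padded to a fixed length.
    """
    return [max(convolve(sequence, kernel)) for kernel in kernels]


# Five words with made-up 2-dimensional word vectors.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],                # window height 2
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],    # window height 3
]

print(len(convolve(sequence, kernels[0])))   # 4 positions for height 2
print(len(convolve(sequence, kernels[1])))   # 3 positions for height 3
print(extract_features(sequence, kernels))   # [2.0, 4.0]
```

The two feature maps have different lengths, yet pooling leaves exactly one value per kernel, which is why the fully connected layer always receives an input of the same size.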

The fully connected layer serves as a classifier that classifies the input texts and groups them into different topics.
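A fully connected classification layer can be sketched as a weighted sum per topic followed by a softmax. The softmax normalization, the weights and the topic names below are assumptions for illustration; the text only states that this layer acts as the classifier.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(features, weights, biases, topics):
    """Score each topic with one weight row, then pick the most probable."""
    scores = [sum(w * f for w, f in zip(row, features)) + bias
              for row, bias in zip(weights, biases)]
    probabilities = softmax(scores)
    best = max(range(len(topics)), key=probabilities.__getitem__)
    return topics[best], probabilities


topics = ["coupon enquiry", "new product enquiry"]  # hypothetical topics
weights = [[1.0, 0.0], [0.0, 1.0]]                  # made-up weights
biases = [0.0, 0.0]

topic, probabilities = classify([2.0, 4.0], weights, biases, topics)
print(topic)  # new product enquiry
```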

The TextCNN model is generally trained by supervised learning, i.e., using a labeled corpus. Training optimizes the model and adjusts its parameters so that its classification becomes more accurate.

The historical dialogue data is labeled manually and divided into three groups: training samples, validation samples and test samples. The TextCNN model is first trained with the training samples to determine approximate parameter values; the parameters are then tuned with the validation samples; finally, the test samples are used to judge whether the model meets the requirements. If it does not, the model is trained again with new samples.
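The three-way division of labeled samples can be sketched as below. The 70/15/15 proportions, the fixed seed and the function name `split_samples` are assumptions; the text does not specify how the groups are sized.

```python
import random


def split_samples(samples, train_percent=70, validation_percent=15, seed=0):
    """Shuffle labeled samples and split them into three groups.

    Whatever remains after the training and validation groups
    becomes the test group.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    n_validation = len(shuffled) * validation_percent // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])


# Hypothetical manually labeled (question, topic) pairs.
labeled = [(f"question {i}", f"topic {i % 3}") for i in range(20)]
train_set, validation_set, test_set = split_samples(labeled)
print(len(train_set), len(validation_set), len(test_set))  # 14 3 3
```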

S103, auditing the classified topics; if a topic passes the audit, storing the question sentences corresponding to the topic in a standard question bank, and processing and supplementing them into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, storing the question sentences corresponding to the topic in a to-be-deleted question bank.

In this embodiment, the classified topics are audited to judge whether a new script needs to be generated for each topic. If so, the topic passes the audit and the question sentences under the topic are stored in the standard question bank; if the topic already has a corresponding script or no new script needs to be generated, the topic fails the audit and its question sentences are stored in the to-be-deleted question bank.
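The routing of audited topics into the two question banks can be sketched as follows. The function `audit_topics`, the dictionary-based banks and the reviewer callback are assumptions for this illustration; the callback stands in for whatever audit decision is actually made.

```python
def audit_topics(topics, needs_new_script):
    """Route each topic's questions according to the audit decision.

    `topics` maps a topic name to its list of question sentences.
    `needs_new_script` is a callable standing in for the reviewer.
    """
    standard_question_bank = {}
    to_be_deleted_bank = {}
    for topic, questions in topics.items():
        if needs_new_script(topic):
            standard_question_bank[topic] = list(questions)
        else:
            to_be_deleted_bank[topic] = list(questions)
    return standard_question_bank, to_be_deleted_bank


# Hypothetical data: one topic already has a script, the other does not.
existing_scripts = {"coupon enquiry"}
topics = {
    "coupon enquiry": ["Can I still use my coupon?"],
    "new product enquiry": ["When is the new product released?"],
}

passed, rejected = audit_topics(topics, lambda topic: topic not in existing_scripts)
print(sorted(passed))    # ['new product enquiry']
print(sorted(rejected))  # ['coupon enquiry']
```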

The topics may be audited by machine or manually. In this embodiment, in order to audit the topics more accurately, the method adopts

On the basis of the above technical scheme, the method further comprises: if a topic does not pass the audit, feeding the failed topic back to the cluster analysis, and directly deleting the question sentences related to the topic and the answers corresponding to those question sentences in subsequent cluster analysis.

In this embodiment, a topic that fails the audit either already has a corresponding script or does not need a new script, so such a topic need not be audited again in subsequent cluster analysis, which avoids wasting resources. The failed topics are therefore fed back to the cluster analysis, which then directly deletes these topics and their corresponding question sentences.
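The feedback step amounts to filtering later clustering output against the set of rejected topics. The record layout and the function name `drop_rejected_topics` are assumptions made for this sketch.

```python
def drop_rejected_topics(records, rejected_topics):
    """Remove question/answer records whose topic previously failed the audit."""
    return [record for record in records
            if record["topic"] not in rejected_topics]


# Hypothetical output of a later clustering run.
records = [
    {"question": "Can I still use my coupon?",
     "answer": "Please call XXXX to ask this question.",
     "topic": "coupon enquiry"},
    {"question": "When is the new product released?",
     "answer": "Please call XXXX to ask this question.",
     "topic": "new product enquiry"},
]
rejected_topics = {"coupon enquiry"}  # fed back from the audit step

remaining = drop_rejected_topics(records, rejected_topics)
print([record["topic"] for record in remaining])  # ['new product enquiry']
```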

On the basis of the above technical scheme, further, processing and supplementing the question-answer knowledge base comprises: filtering the text content of the question sentences corresponding to the topic, and generating a new script template from the filtered question sentences.

On the basis of the above technical scheme, further, the new script template comprises an intent tag, question content and logical relationships.

In this embodiment, the question sentences under a topic that has passed the audit are summarized, the topic words and key sentences are extracted, and sentences with no practical meaning are deleted to form a new script template. After the new script template is generated, an intent tag is set for it; when the script is used later, the dialogue between the intelligent voice robot and the user is stored as historical dialogue data and assigned the tag corresponding to the script, which facilitates later management and maintenance. Logical relationships between preceding and following statements are also set in the form of a logic tree, so that different user answers lead to different logic nodes.
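One possible in-memory shape for such a template is sketched below. The class names `ScriptNode` and `ScriptTemplate`, the field names and all of the sample prompts are hypothetical; the text only states that a template carries an intent tag, question content and a logic tree.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptNode:
    """One node of the logic tree: a robot prompt plus answer-keyed branches."""
    prompt: str
    branches: dict = field(default_factory=dict)  # user answer -> ScriptNode

    def next_node(self, user_answer):
        """Return the node for this answer, or None if no branch matches."""
        return self.branches.get(user_answer)


@dataclass
class ScriptTemplate:
    intent_tag: str
    question_content: str
    root: ScriptNode


has_coupon = ScriptNode("Your coupon is valid until the end of the month.")
no_coupon = ScriptNode("You can claim a coupon in the app.")
root = ScriptNode("Do you already have a coupon?",
                  {"yes": has_coupon, "no": no_coupon})
template = ScriptTemplate("coupon_enquiry", "Can I still use my coupon?", root)

print(template.root.next_node("no").prompt)  # You can claim a coupon in the app.
```

Keying branches by the user's answer is the simplest way to make different answers lead to different nodes; a production system would more likely match on a recognized intent than on a literal string.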

As shown in FIG. 2, an embodiment of the present invention further discloses a script forming device of an intelligent voice robot, the device including:

a question extraction module 201, used for recording historical dialogue data between the intelligent voice robot and customers into a dialogue database, and extracting question sentences from the dialogue database to generate a question set.

In this embodiment, the intelligent voice robot talks with a large number of users by telephone every day and generates a large amount of dialogue data, which can serve as the basis for discovering new scripts. On the basis of the above technical scheme, further, the historical dialogue data between the intelligent voice robot and customers is dialogue data in which a fallback script was used.

a cluster analysis module 202, used for performing cluster analysis on the question set to form a plurality of topics, where each topic includes its corresponding question sentences.

On the basis of the above technical scheme, further, the cluster analysis module 202 converts the question sentences in the question set into text, segments the text, converts the segmented text into vectors, and clusters the vectors to form a plurality of topics.

a topic auditing module 203, used for auditing the classified topics; if a topic passes the audit, the question sentences corresponding to the topic are stored in a standard question bank, and are processed and supplemented into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, the question sentences corresponding to the topic are stored in a to-be-deleted question bank.

As shown in FIG. 3, an embodiment of the present invention further discloses a script forming system of an intelligent voice robot. The system shown in FIG. 3 is only an example and should not impose any limitation on the functions or scope of application of the embodiments of the present invention.

A script forming system 300 of an intelligent voice robot includes a storage unit 320 for storing a computer executable program, and a processing unit 310 for reading the computer executable program in the storage unit to execute the steps of the various embodiments of the present invention.

The script forming system 300 of the intelligent voice robot in this embodiment further includes a bus 330 connecting different system components (including the storage unit 320 and the processing unit 310), a display unit 340, and the like.

The storage unit 320 stores a computer readable program, which may be the code of a source program or of a read-only program. The program may be executed by the processing unit 310 such that the processing unit 310 performs the steps of the various embodiments of the present invention. For example, the processing unit 310 may perform the steps shown in FIG. 1.

The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)3201 and/or a cache storage unit 3202, and may further include a read only memory unit (ROM) 3203. The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The script forming system 300 of the intelligent voice robot may also communicate with one or more external devices 370 (e.g., a keyboard, a display, a network device, a Bluetooth device, etc.), enabling a user to interact with the processing unit 310 via these external devices 370 through an input/output (I/O) interface 350, and may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through a network adapter 360. The network adapter 360 may communicate with other modules of the script forming system 300 over the bus 330. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in the script forming system 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

FIG. 4 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in fig. 4, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory unit (RAM), a read-only memory unit (ROM), an erasable programmable read-only memory unit (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory unit (CD-ROM), an optical storage unit, a magnetic storage unit, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely:

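S101, recording historical dialogue data between the intelligent voice robot and customers into a dialogue database, and extracting question sentences from the dialogue database to generate a question set;

S102, performing cluster analysis on the question set to form a plurality of topics, wherein each topic comprises its corresponding question sentences;

S103, auditing the classified topics; if a topic passes the audit, storing the question sentences corresponding to the topic in a standard question bank, and processing and supplementing them into a question-answer knowledge base to form a new script; and if the topic does not pass the audit, storing the question sentences corresponding to the topic in a to-be-deleted question bank.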

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a data processing device (such as a personal computer, a server, or a network device) to execute the above-mentioned method according to the present invention.

The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using general purpose data processing equipment such as a micro-processing unit or a digital signal processing unit (DSP).

While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be included within its scope of protection.

Claims

1. A script forming method for an intelligent voice robot, the method comprising:

2. The script forming method of claim 1, wherein the historical dialogue data between the intelligent voice robot and customers is dialogue data in which a fallback script was used.

3. The script forming method of any one of claims 1 to 2, wherein performing cluster analysis on the question set to form a plurality of topics comprises:

converting the question sentences in the question set into text, segmenting the text of the question sentences, and converting the segmented text into vectors;

and clustering the vectors to form a plurality of topics.

4. The script forming method of any one of claims 1 to 3, wherein a word2vec model is used to convert the segmented text into vectors.

5. The script forming method of any one of claims 1 to 4, wherein clustering the vectors uses a deep learning-based TextCNN model.

6. The script forming method of any one of claims 1 to 5, further comprising: if a topic does not pass the audit, feeding the failed topic back to the cluster analysis, and directly deleting the question sentences related to the topic and the answers corresponding to those question sentences in subsequent cluster analysis.

7. The script forming method of any one of claims 1 to 6, wherein processing and supplementing the question-answer knowledge base comprises: filtering the text content of the question sentences corresponding to the topic, and generating a new script template from the filtered question sentences.

8. A script forming device of an intelligent voice robot, the device comprising:

9. A script forming system for an intelligent voice robot, comprising:

a storage unit for storing a computer executable program;

a processing unit for reading the computer executable program in the storage unit to execute the script forming method of the intelligent voice robot of any one of claims 1 to 7.

10. A computer-readable medium storing a computer-readable program for executing the script forming method of the intelligent voice robot of any one of claims 1 to 7.