CA3155717A1 - Method of realizing configurable intelligent voice robot, device and storage medium

Publication number: CA3155717A1
Authority: CA (Canada)
Prior art keywords: conversation, scenario, feature, user, word
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CA3155717A
Other languages: French (fr)
Inventors: Gang Wang, Jian Lin
Current assignee: 10353744 Canada Ltd
Original assignee: 10353744 Canada Ltd
Application filed by 10353744 Canada Ltd
Abstract

Pertaining to the field of artificial intelligence, the present application discloses a method of realizing a configurable intelligent voice robot, together with a corresponding device and storage medium. The method comprises: obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios; generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of that scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and the scenario features of the various conversation scenarios, wherein the word vector space model is employed by the intelligent voice robot to perform word vector similarity calculation between a user's conversation and the scenario features of the various conversation scenarios.

Description

METHOD OF REALIZING CONFIGURABLE INTELLIGENT VOICE ROBOT, DEVICE AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
The present application relates to the field of artificial intelligence, and more particularly to a method of realizing a configurable intelligent voice robot, and a corresponding device and storage medium.
Description of Related Art
At present, intelligent voice robots are widely applied in telemarketing and customer service systems, where telemarketing to designated clients is completed by robots through IVR (interactive voice response) services. Telemarketing takes a simple form and a complicated form: the simple form means one-sided marketing promotion, such as broadcasting marketing content in a single sentence, while the complicated form further realizes several rounds of voiced question-and-answer (hereinafter "Q&A") interactions with clients, as well as judgement of and feedback on clients' potential intentions. Intelligent voice robots can address problems that have long perplexed the field, namely that traditional human telemarketing is high in recruitment cost, long in training cycle, inconsistent in business level, and unstable in service quality, since large-scale, repetitive work is now completed by backstage robots based on natural language models, helping enterprises reduce nearly 80% of general outbound manpower cost.
However, as the inventor has found during the process of achieving the present invention, currently available intelligent voice robots capable of the complicated form of scenario Q&A interactions generally suffer from the technical problems discussed below.

During the process of a scenario Q&A interaction, the intelligent voice robot makes a corresponding reply by recognizing the intended scenario in a user's conversation. The traditional intention recognition algorithm, which is based on a text classification model, employs an offline training mode: the model judges a newly added sample by learning from historical labelled corpora and classifies the sample into one of the label classes it has already learnt. Such an algorithmic model, which is based on Bayesian theory, can only process known classifications, and would classify any class that never appeared in the historical corpora into one of the known classifications, thus necessarily leading to classification errors.
In order to deal with this problem in practical application, developers can generally only add marked corpora of new classes to the historical labelled corpora and then retrain the model. Such a mode is not only low in efficiency and high in cost, but also cannot guarantee forward convergence of the model; that is to say, introducing new corpora for relearning might lower the classification precision of the original classes. Moreover, since the algorithm relies on conditional probability at the bottom, and the discriminant probabilities of the various classes are not mutually independent, customized developments are respectively required when intelligent voice robots are developed for different scenario classes, and migration and reuse between them are impossible, so that the scenario development cost of the intelligent voice robots is rendered unduly high.
SUMMARY OF THE INVENTION
In order to overcome the technical problems mentioned in the above Description of Related Art, the present application provides a method of realizing a configurable intelligent voice robot, and corresponding device and storage medium. The technical solutions of the present application are as follows.
According to the first aspect, there is provided a method of realizing a configurable intelligent voice robot, and the method comprises:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
Further, the step of generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario includes:
obtaining, with respect to each said conversation scenario, discrete representation of the sample corpus of the conversation scenario on the basis of a preset domain dictionary;
employing a feature selection algorithm to extract feature words of the conversation scenario on the basis of the discrete representation of the sample corpus of the conversation scenario; and mapping to transform the feature words of the conversation scenario to a corresponding dictionary index, and generating the feature word sequence of the conversation scenario;
preferably, the feature selection algorithm is a chi-square statistic feature selection algorithm.
Moreover, the method further comprises:
storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table;
preferably, the method further comprises:
receiving a configuration feature word input with respect to any conversation scenario; and maintaining the scenario feature of the conversation scenario in the scenario feature relation table
on the basis of the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
Preferably, the step of receiving a configuration feature word input with respect to any conversation scenario includes:
receiving a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
Further, the step of maintaining the scenario feature of the conversation scenario in the scenario feature relation table on the basis of the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word includes:
merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table, and adding the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
Further, the word vector space model is trained and obtained by:
employing domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained BERT word vector space, and obtaining the word vector space model.
Moreover, the method further comprises:
receiving, with respect to any conversation scenario, a state transition graph input by a first user for the conversation scenario, and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario;
and generating a script file for containing state transition logical relation on the basis of the state transition matrix of the conversation scenario, and generating a finite state machine on the basis of the script file, to return a corresponding pattern when the intended scenario of the user's
conversation is recognized.
Moreover, the method further comprises:
preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation after the well-configured intelligent voice robot has received the user's conversation, performing mapping transformation on the plural segmented terms, and obtaining a feature word sequence of the user's conversation;
employing the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios on the basis of the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios; and performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios, and recognizing intention of the user's conversation on the basis of a similarity calculation result, to return a pattern to which the intention corresponds.
According to the second aspect, there is provided a device for realizing a configurable intelligent voice robot, and the device comprises:
an obtaining module, for obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
a generating module, for generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and a configuring module, for configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
Further, the generating module includes:
a representing unit, for obtaining, with respect to each said conversation scenario, discrete representation of the sample corpus of the conversation scenario on the basis of a preset domain dictionary;
a screening unit, for employing a feature selection algorithm to extract feature words of the conversation scenario on the basis of the discrete representation of the sample corpus of the conversation scenario; and a generating unit, for mapping to transform the feature words of the conversation scenario to a corresponding dictionary index, and generating the feature word sequence of the conversation scenario;
preferably, the feature selection algorithm is a chi-square statistic feature selection algorithm.
Moreover, the device further comprises:
a storing module, for storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table;
preferably, the device further comprises:
a receiving module, for receiving a configuration feature word input with respect to any conversation scenario; and a maintaining module, for maintaining the scenario feature of the conversation scenario in the scenario feature relation table on the basis of the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
Preferably, the receiving module is employed for receiving a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
Further, the maintaining module is employed for merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table, and adding the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
Further, the device further comprises a training module for employing domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained BERT word vector space, and obtaining the word vector space model.
Moreover, the device further comprises a state machine configuring module for:
receiving, with respect to any conversation scenario, a state transition graph input by a first user for the conversation scenario, and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario;
and generating a script file for containing state transition logical relation on the basis of the state transition matrix of the conversation scenario, and generating a finite state machine on the basis of the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
Moreover, the device further comprises an intended scenario recognizing module that includes:
an obtaining unit, for preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation after the well-configured intelligent voice robot has received the user's conversation, performing mapping transformation on the plural segmented terms, and obtaining a feature word sequence of the user's conversation;
a constructing unit, for employing the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios on the basis of the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios; and a matching unit, for performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios, and recognizing intention of the user's conversation on the basis of a similarity calculation result, to return a pattern to which the intention corresponds.
According to the third aspect, there is provided computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following operational steps are realized when the processor executes the computer program:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
According to the fourth aspect, there is provided a computer-readable storage medium storing a computer program thereon, and the following operational steps are realized when the computer program is executed by a processor:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
The present application provides a method of realizing a configurable intelligent voice robot, and a corresponding device and storage medium. By obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios; generating, with respect to each said conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and the scenario features of the various conversation scenarios, wherein the word vector space model is employed by the intelligent voice robot to perform word vector similarity calculation between a user's conversation and the scenario features of the various conversation scenarios, an intended scenario of the user's conversation is recognized. By employing the technical solution provided by the present application, it becomes possible to handle any business scenario of the same business field with one set of general intelligent voice robot solutions, to avoid the awkward situation in which different voice robots must be developed when specific IVR (Interactive Voice Response) scenario conversations are carried out for different products or different customer groups of the same business line, and to achieve cross-task reuse between common conversation scenarios, whereby the development cost of marketing robots and the threshold for customized services are greatly lowered.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Apparently, the accompanying drawings introduced below are directed merely to some embodiments of the present application, and persons ordinarily skilled in the art may acquire other drawings based on these drawings without any creative effort.
Fig. 1 is a view illustrating the whole framework for the realization of a configurable intelligent voice robot provided by the present application;
Fig. 2 is a flowchart schematically illustrating the method of realizing a configurable intelligent voice robot in an embodiment;
Fig. 3 is a flowchart schematically illustrating step 202 in the method shown in Fig. 2;
Fig. 4 is a flowchart schematically illustrating scenario feature maintenance in an embodiment;
Fig. 5 is a flowchart schematically illustrating logical entity configuration in an embodiment;
Fig. 6 is a view illustrating the drawing board effect of frontstage state transition in an embodiment;
Fig. 7 is a view schematically illustrating a json file form expressing state transition matrix logical relation in an embodiment;
Fig. 8 is a flowchart schematically illustrating intended scenario recognition in an embodiment;
Fig. 9 is a view schematically illustrating the structure of the device for realizing a configurable intelligent voice robot in an embodiment; and
Fig. 10 is a view illustrating the internal structure of computer equipment provided by an embodiment of the present application.
DETAILED DESCRIPTION OF THE INVENTION
In order to make more lucid and clear the objectives, technical solutions, and advantages of the present application, the technical solutions in the embodiments of the present application will be clearly and comprehensively described below with reference to accompanying drawings in the embodiments of the present application. Apparently, the embodiments as described are merely partial, rather than the entire, embodiments of the present application. All other embodiments obtainable by persons ordinarily skilled in the art based on the embodiments in the present application shall all be covered by the protection scope of the present application.
As should be noted, unless definitely required otherwise by the context, such terms as "comprising", "including", "containing" and their various forms as used throughout the Description and Claims should be understood to mean inclusion rather than exclusion or exhaustion; in other words, these terms mean "including, but not limited to".
In addition, as should be noted, such terms as "first" and "second" as used in the description of the present application are merely meant for descriptive purposes, and shall not be understood to indicate or imply relative importance. In addition, unless otherwise specified, such terms as "plural" and "plurality of" as used in the description of the present application mean "two or more".
As noted in the Description of Related Art, prior-art technology requires respectively customized developments of intelligent voice robots for different scenario classes, between which migration and reuse are impossible, so that the scenario development cost of the intelligent voice robots is rendered unduly high. In view of this, the present application creatively proposes a method of realizing a configurable intelligent voice robot, whereby unified interface services can be provided and the scenarios are configurable; the business is supplied with a visual operation interface through a matched "intelligent voice platform", the scenario development efficiency and business participation experience of intelligent voice robots are greatly enhanced, the problem that the robot logical entity cannot be reused is solved to a certain extent, and a business-friendly and open scenario configuration interface is provided, so that the business can directly participate in the training and generating processes of the robot entity.
Fig. 1 is a view illustrating the whole framework for the realization of a configurable intelligent voice robot provided by the present application. With reference to Fig. 1, the framework mainly involves internal basic configuration, external configuration, intention recognition and the FSM (finite state machine) robot logical entity. The internal basic configuration can be realized by backstage algorithm-developing personnel through a relevant algorithm and includes internal basic features (namely scenario features), internal basic patterns, and internal basic logic; the external configuration can be correspondingly configured by frontstage business personnel through the frontend of the intelligent voice robot backstage administration system, and includes externally configurable features, externally configurable patterns, and conversation configurable logic. After the intelligent voice robot has been configured and generated by means of the technical solution of the present application, the intelligent voice robot can recognize the intended scenario from the input content of the framework, namely from text content recognized and transcribed from a client's voice through an ASR (automatic speech recognition) module, and return corresponding pattern content according to the finite state machine logical entity, so as to output voice content transformed from the pattern through the TTS (Text-To-Speech) technique.
The technical solution of the present application is described in detail below through a plurality of embodiments.
In one embodiment, there is provided a method of realizing a configurable intelligent voice robot, the method can be applied to any computer equipment, such as a server, and the server can be embodied as an independent server or a server cluster consisting of a plurality of servers. As shown in Fig. 2, the method can comprise the following steps.

201 - obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios.
Here, the plural conversation scenarios are contained in a preset conversation scenario list, and the conversation scenario list is used to record one or more conversation scenario(s) of a specific business field.
Specifically, the sample corpora of the various conversation scenarios can be obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios, here, the specific domain corpora indicate corpora of a specific business field, such as customer service corpora of consumption loan telemarketing. As should be understood, the specific obtaining process is not restricted in the embodiments of the present application.
The aforementioned conversation scenarios can be obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge. Exemplarily, with respect to the consumption loan telemarketing field, it is possible to sort out common conversation scenarios of telemarketing activities by analyzing telemarketing dialogue logs of consumption loan products under the guidance of business personnel, such as "problems related to credit", "problems related to limit", "problems related to interests" and "operation consultation", etc., and statements in customer service conversation logs are marked according to the several scenarios that have been summarized and classified. For instance, with respect to the application of a consumption loan telemarketing activity, conversation scenarios as shown in the following Table 1 can be abstracted:
Table 1: Consumption Loan Telemarketing Conversation Scenarios

Serial Number | Conversation Scenario
1 | credit related
2 | interests related
3 | limit related
4 | operation consultation
5 | contact again
6 | affirm
7 | negate
8 | terminate request
9 | unknown

In practical application, each conversation scenario can be abstracted as a conversation state, and the dialogue process between customer service and customer can be abstracted as transitions between conversation states. If each conversation state is taken as a node, a directed edge between conversation states is precisely the process by which one state is transferred to another state, and the entire dialogue process can then be abstracted as a graph consisting of nodes and directed edges.
202 - generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words.
Specifically, content transformation based on a bag-of-words (BOW) model can be performed on the sample corpus of each conversation scenario to obtain a discrete representation of the sample corpus of each conversation scenario; a feature selection algorithm is thereafter employed to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario; the feature words of each conversation scenario are subsequently mapped and transformed to dictionary indexes of a preset domain dictionary, and a feature word sequence of each conversation scenario is obtained.
The aforementioned BOW model divides a corpus text into separate terms; all terms are imagined to be placed in a bag, elements such as word order, grammar and syntax are ignored, and the bag is only regarded as a collection of vocabularies, where each term appearing in the text is independent and does not rely on whether other terms appear. The BOW model can be embodied as a one-hot (also referred to as one-hot coding), TF-IDF or N-gram model.

Exemplarily, with respect to five feature words of a "credit related" conversation scenario, namely ["credit", "People's Bank of China", "report", "personal credit", "risk control"], the index serial numbers to which these are mapped and transformed in the corresponding domain dictionary are [12, 223, 166, 17, 62], and a feature word sequence of the "credit related" conversation scenario is thereby obtained.
Preferably, after the step of generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, the method can further comprise:
storing the various conversation scenarios and the scenario features of the various conversation scenarios in a scenario feature relation table.
Specifically, the name, feature words and feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table. The scenario feature relation table is used to store correspondence relations between conversation scenarios and scenario features (including feature words and feature word sequences).
Exemplarily, a scenario feature relation table of a consumption loan telemarketing activity can be as shown in Table 2.
Table 2: Scenario Feature Relation Table of a Consumption Loan Telemarketing Activity

Serial Number | Conversation Scenario | Feature Word | Feature Word Sequence
1 | credit related | ['credit', 'People's Bank of China'] | [12, 223]
2 | interests related | ['interests', 'interest rate', 'amount'] | [3, 5, 9]
3 | limit related | ['limit', 'loan amount', 'loaned amount', 'amount'] | [2, 12, 13, 9]
4 | operation consultation | ['operate', 'handle', 'set up', 'configure'] | [8, 103, 198, 210]
... | ... | ... | ...
In practical application, the above scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
203 - configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
In this embodiment, the intelligent voice robot is configured on the basis of the word vector space model and the correspondence relations between the scenario features of the various conversation scenarios, and it is thus possible, in practical application, to recognize the intended scenario of the user's conversation based on the word vector space model and the scenario features of the various conversation scenarios. Specifically speaking, when the intelligent voice robot is conversing with the user, it is possible to obtain the user's conversation text by recognizing and transforming the user's conversation voice via the automatic speech recognition (ASR) technique, extract feature information out of the user's conversation text, hence perform word vector similarity calculation on the user's conversation and scenario features of the various conversation scenarios according to the word vector space model, and recognize the intended scenario of the user's conversation according to the word vector similarity calculation result.
In an example, the word vector space model can be trained and obtained by:
employing domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained BERT word vector space, and obtaining the word vector space model.
Here, the domain corpora indicate corpora of a specific business field to which the various conversation scenarios pertain, such as customer service corpora of consumption loan telemarketing.
In this embodiment, pre-trained embedding, which is based on large-scale corpora and high computational cost and cannot be achieved with the current hardware resources, is obtained by introducing the large-scale BERT (Bidirectional Encoder Representations from Transformers) word vector space (768 dimensions) pretrained in google bert serving. On this basis, the BERT word vector space is retrained by bringing in the enterprise's own business customer-service corpora, and calibration of the BERT word vectors is realized, so that the space better conforms to the specific business scenarios.
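As an illustration of how such a 768-dimension word vector space can be obtained and queried, the following is a minimal sketch assuming the HuggingFace transformers library and a generic Chinese BERT checkpoint in place of the google bert serving space mentioned above; all names in it are illustrative rather than part of the present application.

```python
# Minimal sketch (assumption: HuggingFace transformers with a generic
# Chinese BERT checkpoint standing in for the pretrained word vector space).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")   # 768-dimensional word vector space
model.eval()

def word_vector(term: str) -> torch.Tensor:
    """Return a 768-d vector for a term by averaging its token embeddings."""
    inputs = tokenizer(term, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]     # (seq_len, 768)
    return hidden[1:-1].mean(dim=0)                       # drop [CLS]/[SEP], average the rest

# Calibration to the business domain would further pre-train this checkpoint
# (e.g. masked-language-model training on the customer-service corpora)
# before its vectors are used for scenario matching.
```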
By employing the technical solution of this embodiment, it becomes possible to handle any business scenario of the same business field with one set of general intelligent voice robot solutions, to avoid the awkward situation in which different voice robots must be developed when specific IVR (Interactive Voice Response) scenario conversations are carried out for different products or different customer groups of the same business line, to achieve cross-task reuse between common conversation scenarios, and to make the business scenarios extensible, whereby the development cost of marketing robots and the threshold for customized services are greatly lowered.
In one embodiment, as shown in Fig. 3, the above step 202 of generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario can include the following.
301 - obtaining, with respect to each conversation scenario, discrete representation of the sample corpus of the conversation scenario on the basis of a preset domain dictionary.
Specifically, the domain dictionary can be constructed from the fully segmented corpora from which stop-words have been removed; the domain dictionary includes all effective vocabularies appearing in the corpora, and on the basis of the domain dictionary, content transformation is performed on all sample corpora of a target conversation scenario according to the BOW model to obtain the discrete representation, i.e., to transform the corpora to the following mode of expression:
[index of terms in a dictionary, word frequencies of terms in a document]
For instance, given a basic dictionary that contains six terms, ["I", "you", "dislike", "love", "Nanjing", "hometown"], the statement "I / love / my / hometown / Nanjing" becomes [(0,2),(3,1),(4,1),(5,1)] after transcription by the bag of words.
302 - employing a feature selection algorithm to extract feature words of the conversation scenario on the basis of the discrete representation of the sample corpus of the conversation scenario.
Specifically, after word of bag transformation, discrete representation of the sample corpus of the target conversation scenario is obtained, and the feature selection algorithm can be used thereafter to extract feature words of the target conversation scenario.
Preferably, the feature selection algorithm is a chi-square statistic feature selection algorithm.
In this embodiment, the chi-square statistic (CHI) technique can be used to extract feature words of the target conversation scenario. During specific implementation, feature word extraction of the target conversation scenario can be performed according to the form shown in the following Table 3 and the CHI calculation formula.
Table 3: Classes and Document Ownership

Document Ownership | Number of Documents of Class C | Number of Documents of non-Class C
number of documents including t | a | b
number of documents not including t | c | d

CHI calculation formula:

χ²(t, c) = N × (ad − cb)² / [(a + c) × (b + d) × (a + b) × (c + d)]

where c is a certain class, namely a "conversation scenario", t is a certain term, and N is the total number of texts in the training corpora.
The above χ² statistic is generally used for chi-square hypothesis testing in statistics, to judge the uniformity or goodness of fit between an actual distribution and a theoretical distribution, with its null hypothesis H0 being "no marked difference between the observed frequency and the expected frequency". Accordingly, the smaller the chi-square statistic, the closer the observed frequency is to the expected frequency, and the higher the relevancy between the two.
Therefore, χ² can be regarded as a measure of the distance between an observed object and an expected object: the smaller the distance, the higher the relevancy between them. The "observed object" in the present application is a term, and the expected object is a conversation scenario; if the term and the conversation scenario are highly relevant, the statistical distributions of the two should be close to each other over the entire samples. Consequently, through the χ² statistic, the relevancies between all vocabularies in the dictionary and the various classes can be calculated relatively quickly and accurately on the basis of large quantities of corpora, and a preset number of terms (the preset number can be set as 5) with the smallest χ² is selected according to the relevancy sorting result to serve as the feature set of the conversation scenario, completing the feature mapping of the various scenarios/classes in the scenario list.
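A hedged sketch of this statistic, computed from the a/b/c/d document counts of Table 3, is given below; the ranking direction is left as a parameter because the text keeps the terms with the smallest χ², whereas the common CHI feature selection convention keeps the largest.

```python
# Sketch of the chi-square statistic and term selection described above;
# term_counts is an assumed {term: (a, b, c, d)} mapping per scenario.
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """chi2(t, c) = N * (a*d - c*b)^2 / ((a + c) * (b + d) * (a + b) * (c + d))"""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(term_counts, k=5, smallest_first=True):
    """Rank terms by their chi-square statistic and keep the top k.
    smallest_first=True follows the selection rule stated in the text;
    flip it to follow the more common largest-first CHI convention."""
    scored = {t: chi_square(*abcd) for t, abcd in term_counts.items()}
    return sorted(scored, key=scored.get, reverse=not smallest_first)[:k]
```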
303 - mapping to transform the feature words of the conversation scenario to a corresponding dictionary index, and generating the feature word sequence of the conversation scenario.
Specifically, the feature words of the target conversation scenario are mapped and transformed through the dictionary index of the domain dictionary, to obtain the feature word sequence of the target conversation scenario.
Exemplarily, suppose that the five feature words extracted for the "credit related" conversation scenario in step 302 are ["credit", "People's Bank of China", "report", "personal credit", "risk control"]; through dictionary index mapping transformation, the index serial numbers in the corresponding dictionary are obtained as [12, 223, 166, 17, 62], namely the feature word sequence of the target conversation scenario.
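Step 303 can be sketched as a simple dictionary lookup; the assumed in-memory term-to-index mapping and the example words merely repeat the "credit related" example above.

```python
# Minimal sketch of step 303 (assumption: the domain dictionary is available
# as an in-memory term -> index mapping).
domain_dictionary = {"credit": 12, "People's Bank of China": 223, "report": 166,
                     "personal credit": 17, "risk control": 62}

def to_feature_word_sequence(feature_words, dictionary):
    """Map feature words to their dictionary indexes; unknown words are skipped."""
    return [dictionary[w] for w in feature_words if w in dictionary]

feature_words = ["credit", "People's Bank of China", "report", "personal credit", "risk control"]
print(to_feature_word_sequence(feature_words, domain_dictionary))  # [12, 223, 166, 17, 62]
```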
In the state of the art, such word vector models as Word2vec, Glove or ELMo are usually employed on sample corpora to generate word vectors, but because the abstract dimensions of terms remain hidden in the word vectors so generated, it is impossible for the user to maintain and extend them. In the embodiments of the present application, by contrast, with respect to each conversation scenario, a discrete representation of the sample corpus of the conversation scenario is obtained on the basis of the preset domain dictionary, and the feature words of the target conversation scenario are extracted by means of a feature selection algorithm and mapped and transformed to a feature word sequence. Since the feature words and the feature word sequence of the target conversation scenario can be expressed as specific terms, the user can perform visual maintenance and extension, which gives the business personnel and the algorithm personnel the possibility to cooperate in developing the scenario features of intelligent voice robots.
In one embodiment, as shown by the flowchart of Fig. 4 schematically illustrating scenario feature maintenance, the method further comprises the following.
401 - receiving a configuration feature word input with respect to any conversation scenario.
Specifically, a configuration feature word input by the user with respect to the target conversation scenario through the system frontend is received. The user here indicates business personnel, such as customer service personnel; during specific implementation, the frontend can provide the business personnel with a "feature relation extension" function to maintain the business field, whereby the business personnel selects a specific conversation scenario option at the frontend and keys into the input box a vocabulary highly related to the scenario, summarized and extracted by the business personnel themselves according to business knowledge and experience, to serve as the configuration feature word of the target conversation scenario; on receiving the configuration feature word, the system updates it into an externally input feature set of the target conversation scenario after review at the backstage.
The step of receiving a configuration feature word input with respect to any conversation scenario includes:
receiving a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
In this embodiment, since configuration feature words coming from the outside might suffer from unstable quality due to differences in the experience and levels of business personnel, it is possible to open the external feature configuration permission only to well-experienced, selected business personnel by constructing a feature configuration permission list.
402 - maintaining the scenario feature of the conversation scenario in the scenario feature relation table on the basis of the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
In one example, the process of realizing step 402 may include:
merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table, and adding the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
Specifically, after such basic processes as duplicate removal and rectification have been performed on the configuration feature word input with respect to the target conversation scenario, the configuration feature word is directly added to the feature words of the target conversation scenario in the scenario feature relation table, and feature word sequence mapping is carried out.
The final merging result is as shown in the following Table 4.
Table 4: Scenario Feature Relation Table after Merging Internal and External Configurations

Serial Number | Conversation Scenario | Feature Word | Feature Word Sequence | Externally Input Vocabulary | Merged Feature Word | Merged Feature Word Sequence
1 | credit related | ['credit', 'People's Bank'] | [12, 223] | ['People's Bank of China'] | ['credit', 'People's Bank', 'People's Bank of China'] | [12, 223]
2 | interests related | ['interests', 'interest rate', 'amount of money'] | [3, 5, 9] | ['rate', 'interest rate'] | ['interests', 'interest rate', 'amount of money', 'rate'] | [3, 5, 9, 102]
3 | limit related | ['limit', 'loan amount', 'loaned amount', 'amount of money'] | [2, 12, 13, 9] | ['loan limit', 'amount'] | ['limit', 'loan amount', 'loaned amount', 'amount of money', 'loan limit', 'amount'] | [2, 12, 13, 9, 4, 86, 321]
4 | operation consultation | ['operate', 'handle', 'set up', 'configure'] | [8, 103, 198, 210] | [] | ['operate', 'handle', 'set up', 'configure'] | [8, 103, 198, 210]
... | ... | ... | ... | ... | ... | ...
As should be noted, if an externally input configuration feature word is not contained in the domain dictionary, this configuration feature word is ignored; for instance, the term "People's Bank of China" in Table 4 is not in the domain dictionary, so this term is directly ignored when the feature word sequences are merged. In this embodiment, by maintaining the scenario feature of the conversation scenario in the scenario feature relation table on the basis of the configuration feature word of the conversation scenario and the configuration feature word sequence obtained by mapping transformation of the configuration feature word, it becomes possible to overcome the traditional "frontstage and backstage" operation mode in which "businesses define requirements, and technologies develop applications", and to greatly alleviate the problem of "detachment of development from production, and mismatch of requirement and response" pending in the traditional robot development pattern, by realizing cooperative development of marketing robots by business personnel and technical personnel through the pattern of "common definition of internal and external configurations".
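A minimal sketch of the merge and ignore rules described above follows; the data structures are assumptions, and the sample row repeats the "interests related" entry of Table 4.

```python
# Sketch of merging externally configured feature words into a scenario's
# entry: only words present in the domain dictionary contribute to the merged
# feature word sequence (e.g. an out-of-dictionary word like
# "People's Bank of China" is skipped in the sequence).
def merge_configuration(entry, config_words, dictionary):
    """entry: {'feature_words': [...], 'feature_word_sequence': [...]} for one scenario."""
    for word in config_words:
        if word in entry["feature_words"]:
            continue                                   # de-duplicate
        entry["feature_words"].append(word)
        if word in dictionary:                         # out-of-dictionary words are ignored
            entry["feature_word_sequence"].append(dictionary[word])
    return entry

row = {"feature_words": ["interests", "interest rate", "amount of money"],
       "feature_word_sequence": [3, 5, 9]}
merge_configuration(row, ["rate", "interest rate"],
                    {"interests": 3, "interest rate": 5, "amount of money": 9, "rate": 102})
print(row["feature_word_sequence"])                    # [3, 5, 9, 102]
```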
In one embodiment, such as shown in the flowchart schematically illustrating logical entity configuration in Fig. 5, the method further comprises the following.
501 - receiving, with respect to any conversation scenario, a state transition graph input by a first user for the conversation scenario, and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario.
The first user indicates business personnel, the second user indicates algorithm developing technical personnel.
502 - generating a script file containing the state transition logical relation on the basis of the state transition matrix of the conversation scenario, and generating a finite state machine on the basis of the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
The finite state machine (FSM), also referred to as a finite state automaton and abbreviated as a state machine, is a mathematical model that expresses a finite number of states and such behaviours as transitions and actions amongst those states. Its main function is to describe the sequence of states an object experiences within its life cycle, and how it responds to various events coming from the outside by transitioning between states. A state machine includes four elements, namely the current state, the condition, the action, and the next state. The current state is the state the machine is currently in; the condition is also referred to as an "event": when a condition is satisfied, an action is triggered or one round of state transition is performed; the action is what is executed after the condition has been satisfied, and after the action has been executed to completion the machine can transition to a new state or maintain the original state; the action is not indispensable, and after the condition has been satisfied it is also possible to execute no action and transition directly to a new state; the next state indicates the new state to be transitioned to after the condition has been satisfied. The "next state" is relative to the "current state": once a "next state" is activated, it becomes the new "current state".
The logical relation of the finite state machine can be expressed as a state transition matrix as shown in the following Table 5.
Table 5: FSM State Transition Matrix

Current State \ Next State | State A | State B | State C | ...
State A | Triggering Condition: Event 1.1 | Triggering Condition: Event 1.2 | Triggering Condition: Event 1.3 | ...
State B | Triggering Condition: Event 2.1 | Triggering Condition: Event 2.2 | Nonexistent | ...
State C | Triggering Condition: Event 3.1 | Nonexistent | Triggering Condition: Event 3.3 | ...
... | ... | ... | ... | ...
The "action" here indicates a changing process in which the "current state" is transformed to the "next state" after the "triggering condition" has been satisfied.
In practical application, after the business personnel have drawn a basic state transition graph by dragging on the drawing board of the frontend platform, a possibly incomplete state transition matrix is generated at the backstage. The drawing board effect of frontstage state transition can be as shown with reference to Fig. 6. Considering the restriction of the business side's technical capability, it is very difficult to guarantee the basic requirement of "logical completeness" of the state transition matrix, so backstage technical personnel usually modify the newly generated state transition matrices daily at a fixed time, so as to make them conform to the states shown in Fig. 5. A finite state machine model can be abstracted as the following combination of patterns:
START event END;
When the intended scenario of the user's conversation is recognized, it is possible to return the corresponding pattern content through the logical entity of the finite state machine. The "conversation scenario" is a "condition/event". For instance, the current state is the "greeting" state, the condition/event is the "limit related" conversation scenario, and the next state of "consulting on limit" is transitioned to after the action has been triggered and executed. In the next round of dialogue, "consulting on limit" is the current state, the condition/event is "negate", and the next state of "transaction not concluded" is transitioned to after the action has been triggered and executed. A complete customer service dialogue process is thus completed. The state transition logic is as follows:
(greeting) -- 'limit related' --> (consulting on limit) -- 'negate' --> (transaction not concluded);
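The transition logic above can be sketched as a minimal finite state machine; the state and event names repeat the example in the text, while the class and method names are illustrative.

```python
# Minimal finite-state-machine sketch for the transition logic above.
class FiniteStateMachine:
    def __init__(self, initial_state, transitions):
        # transitions: {(current_state, event): next_state}
        self.state = initial_state
        self.transitions = transitions

    def fire(self, event):
        """Transition on a recognized intended scenario (the 'condition/event')."""
        key = (self.state, event)
        if key in self.transitions:                    # 'Nonexistent' cells simply have no entry
            self.state = self.transitions[key]
        return self.state

fsm = FiniteStateMachine("greeting", {
    ("greeting", "limit related"): "consulting on limit",
    ("consulting on limit", "negate"): "transaction not concluded",
})
fsm.fire("limit related")   # -> consulting on limit
fsm.fire("negate")          # -> transaction not concluded
```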
The fully repaired FSM state transition matrix above is automatically translated by the program into a script file in json format.
In practical application, with respect to a consumption loan activity, the json script file expressing its state transition matrix logical relation can be as shown with reference to Fig. 7.
The json script file will subsequently be read and input by the program into a state machine object when a finite state machine instance is generated, to validate the logic. The state machine instance thus generated is stored in Redis, indexed by the uuid transferred by the frontend, so as to facilitate subsequent program access when the IVR service is started. The user may also perform a persistence operation on the state machine as required, so as to facilitate long-term, stable invocation of the state machine. If the user selects the task type "single task" at the frontend, the finite state machine instance of this task stored in Redis will be cleared away as an invalidated object within a preset time period (such as 24 hours) after the IVR marketing service has been triggered, so as to economize on storage resources.
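A hedged sketch of storing and retrieving a generated state machine in Redis, keyed by the uuid transferred from the frontend and expiring after 24 hours for single-task activities, is shown below; the key prefix and serialization format are assumptions.

```python
# Sketch of Redis storage for a translated FSM script, keyed by the frontend
# uuid; key naming and JSON serialization are assumptions for illustration.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def store_fsm(task_uuid: str, fsm_script: dict, single_task: bool) -> None:
    key = f"fsm:{task_uuid}"                           # index value derived from the frontend uuid
    r.set(key, json.dumps(fsm_script))                 # the translated json script file
    if single_task:
        r.expire(key, 24 * 3600)                       # cleared as invalid after the preset period

def load_fsm(task_uuid: str) -> dict:
    raw = r.get(f"fsm:{task_uuid}")
    return json.loads(raw) if raw else {}
```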
Compared with the traditional development mode, in which marketing robot instances are trained offline according to business scenario requirements and which relies on solidified configuration files and state transition logic, so that once an environmental change occurs at the business side, such as a need to modify the logic or the pattern, the reconstruction cost and risk at the R&D side are huge, use of the technical solution of the present embodiment ensures a timely response by the robot to updates of the configuration information through the mode of frontend configuration and backstage translation; moreover, the configuration file is decoupled from the program application, so that updating the core assembly of the robot, namely the FSM, is made simpler and handier. Furthermore, the pattern section also relies on the frontstage for maintenance, thereby realizing timely validation.

In addition, in practical application, the developing personnel complete the abstraction of the business logic and the construction of the state transition graph according to product requirements on their own initiative, but direct participation and supervision by the business are lacking in the entire process of robot development, while the final application effect of the robot depends greatly on the understanding of the business background and the commercial environment; this causes, to a greater extent, detachment of development from production and mismatch between requirement and response. The technical solution of the present embodiment, however, makes it possible to overcome the traditional "frontstage and backstage" operation mode in which "businesses define requirements, and technologies develop applications", and to greatly alleviate the problem of "detachment of development from production, and mismatch of requirement and response" pending in the traditional robot development pattern, by realizing cooperative development of intelligent voice robots by business personnel and technical personnel through the pattern of "common definition of internal and external configurations".
In one embodiment, as shown in the flowchart schematically illustrating intended scenario recognition in Fig. 8, the method further comprises the following.
801 - preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation after the well-configured intelligent voice robot has received the user's conversation, performing mapping transformation on the plural segmented terms, and obtaining a feature word sequence of the user's conversation.
Specifically, after the intelligent voice robot has been configured and generated, the intelligent voice robot can converse with the user; the user's conversation can be text content recognized and transcribed from the user's conversation speech through automatic speech recognition (ASR) technology, and the text content is word-segmented to obtain a plurality of segmented terms, where the word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words. The plural segmented terms are mapped and transformed, through the index of the domain dictionary, to the same form of expression as the "feature word sequence" of a conversation scenario in the scenario feature relation table, namely to obtain a feature word sequence of the user's conversation.
802 - employing the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios on the basis of the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios.
In an example, the process of realizing step 802 can include:
employing the word vector space model to respectively map the feature word sequence of the user's conversation and the feature word sequences of the various conversation scenarios in the scenario feature relation table, and generating a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios.
Specifically, each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain a feature vector of 768 dimensions, and all the element vectors are summed and averaged (or the maximum or median thereof is taken) to obtain a 1x768 vector that serves as the feature expression of the user's conversation input.
Correspondingly, the feature word sequences of the various conversation scenarios in the scenario feature relation table are processed in the same way and respectively converted to 1x768 feature vectors.
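A minimal sketch of the vector construction of step 802, assuming a hypothetical lookup table bert_vectors that maps dictionary indices to pre-computed 768-dimensional BERT word vectors (random values stand in for the domain-calibrated BERT space here); mean pooling is the default, with max and median pooling shown as the alternatives mentioned above.

    import numpy as np

    EMBED_DIM = 768

    # Hypothetical lookup: dictionary index -> pre-computed 768-dimensional BERT word vector.
    rng = np.random.default_rng(0)
    bert_vectors = {index: rng.normal(size=EMBED_DIM) for index in range(200)}

    def sequence_to_feature_vector(feature_word_sequence, pooling="mean"):
        """Map each element of a feature word sequence into the BERT word vector
        space and pool all element vectors into a single 1x768 feature vector."""
        vectors = [bert_vectors[i] for i in feature_word_sequence if i in bert_vectors]
        if not vectors:
            return np.zeros(EMBED_DIM)
        stacked = np.stack(vectors)
        if pooling == "max":                 # maximum pooling, as mentioned above
            return stacked.max(axis=0)
        if pooling == "median":              # median pooling, as mentioned above
            return np.median(stacked, axis=0)
        return stacked.mean(axis=0)          # default: sum and average

    user_vector = sequence_to_feature_vector([17, 42, 103, 88])
    print(user_vector.shape)                 # (768,)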
803 - performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios, and recognizing intention of the user's conversation on the basis of a similarity calculation result, to return a pattern to which the intention corresponds.
Specifically, with respect to each conversation scenario, a cosine similarity between the feature vector of the user's conversation input and the scenario feature vector of the conversation scenario is calculated; the greater the value is, the higher the similarity, and the higher also the relevancy between the user's conversation and the conversation scenario. All the conversation scenarios are arranged in descending order according to the cosine similarity calculation result, the conversation scenario with the highest similarity is returned to serve as the judgement result of the intended scenario of the user's current input, and a corresponding response pattern under the intended scenario is returned according to the state in which the finite state machine currently is, wherein the response pattern will be transformed to speech content through Text-To-Speech (TTS) technology to be broadcast to the user. Forms of the response pattern can be as shown with reference to Table 6.
Table 6: State Pattern Relation Table

Serial Number | State | Pattern
1 | Greeting | Hello, this is Xiao Su of the Customer Service Center.
2 | Consulting on limit | This product is loanable for a revolving line of credit at the maximum of 300 thousand RMB; borrow and return as you go, the loan can be completed within 30 minutes at the quickest, and direct cash withdrawal to Your bank card is enabled for use after opening.
3 | Consulting on interests | Hello, the expense standards are unified at our end. You can repay the loan by 5 instalments, 10 instalments or 15 instalments, and You may also borrow and return as you go, as ours are more flexible compared with banks on the market and other finance companies.
4 | Consulting on credit | This loan will be brought into the personal credit investigation system of the Central Bank; maintaining an excellent credit helps You enjoy subsequent, other services of the State and the banks.
5 | Transaction not concluded | You may enquire relevant activities on our applets. I thank You for Your listening. Have a good day! Goodbye.
... | ... | ...
Use of the technical solution of the present embodiment can enhance the precision of the result of recognizing the intention of the user's conversation, by employing the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios on the basis of the feature word sequence of the user's conversation and the feature word sequences of the various conversation scenarios, and by performing similarity matching between the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios, so as to recognize the intended scenario of the user's conversation.
As should be understood, although the various steps in the aforementioned flowcharts are displayed sequentially as indicated by arrows, these steps are not necessarily executed in the sequences indicated by the arrows. Unless otherwise explicitly noted herein, execution of these steps is not restricted to any particular sequence, as these steps can also be executed in sequences other than those indicated in the drawings. Moreover, at least some of the steps in the flowcharts may include plural sub-steps or phases; these sub-steps or phases are not necessarily completed at the same timing, but can be executed at different timings, and they are also not necessarily performed sequentially, but can be performed in turns or alternately with other steps or with at least some of the sub-steps or phases of other steps.
In one embodiment, there is provided a device for realizing a configurable intelligent voice robot, and the device is employed for executing the method of realizing a configurable intelligent voice robot as recited in the above embodiment. As shown in Fig. 9, the realizing device can comprise:
an obtaining module 901, for obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
a generating module 902, for generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and a configuring module 903, for configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
In one embodiment, the generating module 902 can include:
a representing unit, for obtaining, with respect to each conversation scenario, discrete representation of the sample corpus of the conversation scenario on the basis of a preset domain dictionary;
a screening unit, for employing a feature selection algorithm to extract feature words of the conversation scenario on the basis of the discrete representation of the sample corpus of the conversation scenario; and a generating unit, for mapping and transforming the feature words of the conversation scenario to a corresponding dictionary index, and generating the feature word sequence of the conversation scenario;
preferably, the feature selection algorithm is a chi-square statistic feature selection algorithm.
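By way of illustration, a minimal sketch of the generating module's work using scikit-learn, where CountVectorizer supplies a discrete bag-of-words style representation and chi2 supplies the chi-square statistic feature selection; the sample corpora, labels and the choice of library are assumptions made for the example only.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import chi2

    # Tiny invented sample corpora, labelled by conversation scenario.
    corpus = [
        "what is the maximum loan limit",
        "how large a credit line can I get",
        "what interest do I pay on instalments",
        "how do I repay the interest",
    ]
    scenario_labels = np.array([0, 0, 1, 1])   # 0: consulting on limit, 1: consulting on interests

    # Discrete representation of the sample corpora over the vocabulary (domain dictionary).
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(corpus)

    # Chi-square statistic between every term and the "consulting on interests" class.
    scores, _p_values = chi2(X, scenario_labels == 1)
    terms = vectorizer.get_feature_names_out()          # requires scikit-learn >= 1.0
    top = sorted(zip(terms, scores), key=lambda item: item[1], reverse=True)[:3]
    print(top)                                          # highest-scoring terms become feature words

    # The feature word sequence is the sequence of corresponding dictionary indices.
    feature_word_sequence = [int(vectorizer.vocabulary_[term]) for term, _ in top]
    print(feature_word_sequence)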
In one embodiment, the device can further comprise:
a storing module 904, for storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table;
a receiving module 905, for receiving a configuration feature word input with respect to any conversation scenario; and a maintaining module 906, for maintaining the scenario feature of the conversation scenario in the scenario feature relation table on the basis of the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
Preferably, the receiving module 905 is employed for receiving a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
In one embodiment, the maintaining module 906 is employed for merging the configuration feature word of the conversation scenario into the feature words of the conversation scenario in the scenario feature relation table, and adding the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
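A minimal sketch of the maintaining module's merge step, assuming the scenario feature relation table is held as an in-memory dict and that the domain dictionary is available as a term-to-index map; the names and values are illustrative only.

    # Illustrative domain dictionary and scenario feature relation table.
    DOMAIN_DICTIONARY = {"loan": 17, "limit": 42, "interest": 88, "repay": 103}
    scenario_feature_table = {
        "Consulting on limit": {"feature_words": ["loan", "limit"],
                                "feature_word_sequence": [17, 42]},
    }

    def maintain_scenario_feature(scenario, configuration_feature_word):
        """Merge an externally configured feature word into the scenario's feature words
        and append its dictionary index to the feature word sequence; a word that is not
        contained in the domain dictionary is ignored."""
        index = DOMAIN_DICTIONARY.get(configuration_feature_word)
        if index is None:
            return                                   # not in the domain dictionary: ignore
        entry = scenario_feature_table[scenario]
        if configuration_feature_word not in entry["feature_words"]:
            entry["feature_words"].append(configuration_feature_word)
            entry["feature_word_sequence"].append(index)

    maintain_scenario_feature("Consulting on limit", "repay")
    print(scenario_feature_table["Consulting on limit"])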
In one embodiment, the device can further comprise a training module for employing domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained BERT word vector space, and obtaining the word vector space model.
In one embodiment, the device further comprises a state machine configuring module 907 for:
receiving, with respect to any conversation scenario, a state transition graph input by a first user for the conversation scenario, and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario;
and generating a script file for containing state transition logical relation on the basis of the state transition matrix of the conversation scenario, and generating a finite state machine on the basis of the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
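A minimal sketch of how a json script generated from a state transition matrix could drive a finite state machine that returns the response pattern of a recognized intended scenario; the json layout, field names and patterns below are assumptions made for the example, not the schema used by the present disclosure.

    import json

    # Hypothetical json script produced from the state transition matrix.
    SCRIPT = json.loads("""
    {
      "initial": "Greeting",
      "transitions": [
        {"from": "Greeting",            "event": "Consulting on limit",       "to": "Consulting on limit"},
        {"from": "Consulting on limit", "event": "Consulting on interests",   "to": "Consulting on interests"},
        {"from": "Consulting on limit", "event": "Transaction not concluded", "to": "End"}
      ],
      "patterns": {
        "Greeting": "Hello, this is Xiao Su of the Customer Service Center.",
        "Consulting on limit": "This product offers a revolving line of credit ...",
        "Consulting on interests": "You can repay in 5, 10 or 15 instalments ...",
        "End": "Thank You for Your listening. Goodbye."
      }
    }
    """)

    class FiniteStateMachine:
        """Minimal FSM: current state plus a (state, event) -> next state table,
        with a response pattern attached to each state."""

        def __init__(self, script):
            self.state = script["initial"]
            self.table = {(t["from"], t["event"]): t["to"] for t in script["transitions"]}
            self.patterns = script["patterns"]

        def on_event(self, intended_scenario):
            """Transition on the recognized intended scenario and return the pattern
            of the resulting state (staying put if no transition is defined)."""
            self.state = self.table.get((self.state, intended_scenario), self.state)
            return self.patterns[self.state]

    fsm = FiniteStateMachine(SCRIPT)
    print(fsm.on_event("Consulting on limit"))   # pattern of the "Consulting on limit" state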
In one embodiment, the device further comprises an intended scenario recognizing module 908 that includes:
an obtaining unit, for preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation after the well-configured intelligent voice robot has received the user's conversation, performing mapping transformation on the plural segmented terms, and obtaining a feature word sequence of the user's conversation;
a constructing unit, for employing the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios on the basis of the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios; and a matching unit, for performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios, and recognizing intention of the user's conversation on the basis of a similarity calculation result, to return a pattern to which the intention corresponds.
Specific definitions relevant to the device for realizing an intelligent voice robot may be inferred from the aforementioned definitions of the method of realizing an intelligent voice robot, and no repetition is made in this context. The various modules in the aforementioned device for realizing an intelligent voice robot can be wholly or partly realized via software, hardware, or a combination of software and hardware. The various modules can be embedded in the form of hardware in, or independent of, a processor in a computer equipment, and can also be stored in the form of software in a memory in a computer equipment, so as to facilitate the processor to invoke and perform the operations corresponding to the aforementioned various modules.
In one embodiment, a computer equipment is provided, the computer equipment can be a server, and its internal structure can be as shown in Fig. 10. The computer equipment comprises a processor, a memory, and a network interface connected to each other via a system bus. The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores therein an operating system and a computer program. The internal memory provides an environment for the running of the operating system and the computer program in the nonvolatile storage medium. The network interface of the computer equipment is employed to connect to and communicate with other equipment via a network. When executed by the processor, the computer program realizes a method of realizing an intelligent voice robot.
As understandable to persons skilled in the art, the structure illustrated in Fig. 10 is merely a block diagram of a partial structure relevant to the solution of the present application, and does not constitute any restriction on the computer equipment to which the solution of the present application is applied, as the specific computer equipment may comprise more or fewer component parts than those illustrated in Fig. 10, or may combine certain component parts, or may have a different layout of component parts.
In one embodiment, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios; generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
In one embodiment, there is provided a computer-readable storage medium storing thereon a computer program, and the following steps are realized when the computer program is executed by a processor:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios; generating, with respect to each conversation scenario, a scenario feature of the conversation scenario on the basis of the sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring the intelligent voice robot on the basis of a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is employed for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, so as to recognize an intended scenario of the user's conversation.
In summary, as compared with prior-art technology, the technical solutions according to the embodiments of the present application can achieve the following technical effects.
1. Use of the configurable intelligent voice robot according to the present application can save approximately 60% of R&D cost by comprehensive estimation. In particular, for individual business fields in which marketing scenarios are relatively concentrated, the cost saved by the present application as compared with the traditional pattern is positively correlated with the number of scenarios.
2. The frontend configuration validation pattern employed by the present application enables the robot to be updated securely, stably, and handily according to changes in the business environment, with validation possible on the next day, whereby great convenience is brought to the business side while running stability is enhanced at the technical side.
3. In conjunction with the frontend platform, the business personnel can directly participate in the R&D process of core assemblies of the marketing robot, whereby satisfaction and the sense of participation are greatly enhanced for the business personnel.
4. In cooperation with the frontend voice servicing platform, the robot provides the business side with a whole set of closed-loop solutions from "requirement submission" to "intelligent voice robot generation" to "IVR telemarketing initiation" and then to "feedback of adjustment requirement", thus enabling one-step response to customers' intelligent marketing IVR requirements.
As comprehensible to persons ordinarily skilled in the art, the entire or partial flows in the methods according to the aforementioned embodiments can be completed via a computer program instructing relevant hardware; the computer program can be stored in a nonvolatile computer-readable storage medium, and the computer program, when executed, can include the flows as embodied in the aforementioned various methods. Any reference to the memory, storage, database or other media used in the various embodiments provided by the present application can include nonvolatile and/or volatile memories. The nonvolatile memory can include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable and programmable ROM
(EEPROM) or a flash memory. The volatile memory can include a random access memory (RAM) or an external cache memory. To serve as explanation rather than restriction, the RAM is obtainable in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM
(RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Technical features of the aforementioned embodiments can be combined arbitrarily; for the sake of brevity, not all possible combinations of the technical features in the aforementioned embodiments are exhaustively described, but all such combinations should be considered to fall within the scope recorded in the Description as long as they are not mutually contradictory.
The foregoing embodiments are merely directed to several modes of execution of the present application, and their descriptions are relatively specific and detailed, but they should not be hence misunderstood as restrictions to the inventive patent scope. As should be pointed out, persons with ordinary skill in the art may further make various modifications and improvements without departing from the conception of the present application, and all these should pertain to the protection scope of the present application. Accordingly, the patent protection scope of the present application shall be based on the attached Claims.


Claims (290)

Claims:
1. A device comprising:
an obtaining module, configured to obtain sample corpora of various conversation scenarios in a plurality of conversation scenarios;
a generating module, configured to generate a scenario feature of the conversation scenario based on a sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and a configuring module, configured to configure an intelligent voice robot based on a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, to recognize an intended scenario of the user's conversation.
2. The device of claim 1, wherein the generating module further comprises:
a representing unit, configured to obtain discrete representation of the sample corpus of the conversation scenario based on a preset domain dictionary;
a screening unit, configured to apply a feature selection algorithm to extract feature words of the conversation scenario based on the discrete representation of the sample corpus of the conversation scenario; and a generating unit, configured to:
map and transform the feature words of the conversation scenario to a corresponding dictionary index;
generate the feature word sequence of the conversation scenario, wherein the feature selection algorithm is a chi-square statistic feature selection algorithm.
3. The device of claim 1, further comprises:
a storing module, configured to store the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table;
4. The device of claim 3, further comprises:
a receiving module, configured to receive a configuration feature word input with respect to any conversation scenario; and a maintaining module, configured to maintain the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
5. The device of claim 4, wherein the receiving module is configured to receive a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
6. The device of claim 4, wherein the maintaining module is configured to:
merge the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table;
and add the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
7. The device of claim 1, further comprises a training module configured to use a domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained bidirectional encoder representations from transformers (BERT) word vector space, and obtain the word vector space model.
8. The device of claim 1, further comprises a state machine configuring module configured to:

receive a state transition graph input by a first user for the conversation scenario and receive supplementary information input by a second user for a state transition graph, to generate a state transition matrix of the conversation scenario; and generate a script file for containing state transition logical relation based on the state transition matrix of the conversation scenario and generate a finite state machine (FSM) based on the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
9. The device of claim 1, further comprises an intended scenario recognizing module comprising:
an obtaining unit, configured to:
preprocess the user's conversation to obtain a plurality of segmented terms in the user's conversation;
perform mapping transformation on the plural segmented terms;
obtain a feature word sequence of the user's conversation;
a constructing unit, configured to implement the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios based on the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios;
a matching unit, configured to:
perform similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios; and recognize intention of the user's conversation based on a similarity calculation result, to return a pattern to which the intention corresponds.
10. The device of any one of claims 1 to 9, wherein the plurality of conversation scenarios are contained in a preset conversation scenario list, wherein the conversation scenario list is to record one or more conversation scenarios of a specific business field.
11. The device of any one of claims 1 to 10, wherein the sample corpora of the various conversation scenarios are obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios.
12. The device of any one of claims 1 to 11, wherein the conversation scenarios are obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge.
13. The device of any one of claims 1 to 12, wherein each conversation scenario is abstracted as a conversation state, and dialogue process is abstracted as a transition between conversation states.
14. The device of any one of claims 1 to 13, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
15. The device of any one of claims 1 to 14, wherein content transformation based on a word of bag (WOB) model is performed on the sample corpus of each conversation scenario, wherein discrete representation of the sample corpus of each conversation scenario is obtained, wherein the feature selection algorithm is to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario.
16. The device of any one of claims 1 to 15, wherein the WOB model divides a corpus text into separate terms, and it is imaged that all terms are placed in a bag, such elements as their word orders, grammars and syntaxes are ignored, wherein the bag is only regarded as a collection of plural vocabularies, wherein each term as it appears in the corpus text is independent.
17. The device of any one of claims 1 to 16, wherein the WOB model includes one-hot, TF-IDF and N-gram models.
18. The device of any one of claims 1 to 17, wherein name, the feature words and the feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table.
19. The device of any one of claims 1 to 18, wherein the scenario feature relation table stores correspondence relations between conversation scenarios and scenario features, including feature words and feature word sequences.
20. The device of any one of claims 1 to 19, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
21. The device of any one of claims 1 to 20, wherein when the intelligent voice robot is conversing with the user, obtain user's conversation text by recognizing and transforming user's conversation voice via an automatic speech recognition (ASR) technique, and extract feature information out of the user's conversation text.
22. The device of any one of claims 1 to 21, wherein a pre-trained embedding is obtained by introducing a large-scale BERT word vector space, pre-trained in Google BERT serving.
23. The device of any one of claims 1 to 22, wherein the BERT word vector space is retrained by bringing in own business customer service corpora, and calibration of the BERT word vector is realized, to conform to specific business scenarios.
24. The device of any one of claims 1 to 23, wherein total segmentation corpora where stop-words are removed are to construct the preset domain dictionary, wherein the preset domain dictionary includes total effective vocabularies appearing in the corpora, and the preset domain dictionary is to perform content transformation on all sample corpora of a target conversation scenario based on the WOB model to obtain the discrete representation.
25. The device of any one of claims 1 to 24, wherein the chi-square statistic (CHI) technique is to extract feature words of the target conversation scenario.
26. The device of any one of claims 1 to 25, wherein CHI calculation formula comprises:
wherein c is a certain class, namely a "conversation scenario", t is a certain term, and N is a total number of texts in a training corpora.
27. The device of claim 26, wherein χ² is for chi-square hypothesis testing in statistics, to judge the uniformity or goodness of fit between an actual distribution and a theoretical distribution, with a null hypothesis H0 being "no marked difference between observed frequency and desired frequency".
28. The device of any one of claims 26 to 27, wherein the smaller the chi-square statistic is, the closer the observed frequency is to the desired frequency, and the higher the relevancy between them.
29. The device of any one of claims 26 to 28, wherein χ² is a measure of the distance between an observed object and a desired object, wherein the smaller the distance is, the higher the relevancy between them.
30. The device of any one of claims 26 to 29, wherein the observed object is the term, and the desired object is the conversation scenario, wherein if the term and the conversation scenario are highly relevant, statistic distributions of the two are close to each other in the entire samples.
31. The device of any one of claims 26 to 30, wherein, through the χ² statistic, relevancies between all vocabularies in the domain dictionary and the various classes are calculated quickly and accurately based on quantities of corpora, and a preset number of terms are selected according to the χ² relevancy sorting result to serve as a feature set of the conversation scenarios, to complete feature mapping between the various scenarios and the various classes in the conversation scenario list.
32. The device of any one of claims 1 to 31, wherein the configuration feature word is input by the user with respect to the target conversation scenario through a system frontend.
33. The device of any one of claims 1 to 32, wherein the system frontend provides business personnel with a feature relation extension function to maintain the business field.
34. The device of any one of claims 1 to 33, wherein, on receiving the configuration feature word, it is updated into an externally input feature set of the target conversation scenario.
35. The device of any one of claims 1 to 34, wherein if the externally input configuration feature word is not contained in the domain dictionary, the configuration feature word is ignored.
36. The device of any one of claims 1 to 35, wherein the first user is business personnel, the second user is an algorithm developing technical personnel.
37. The device of any one of claims 1 to 36, wherein the FSM is a mathematical model that expresses a finite number of states and behaviors as transitions and actions amongst these states.
38. The device of any one of claims 1 to 37, wherein the FSM is to describe the state sequences experienced by an object within the object's life cycle, and how to respond to various events coming from outside to transition between the states.
39. The device of any one of claims 1 to 38, wherein the FSM includes current state, condition, action, and next state.
40. The device of any one of claims 1 to 39, wherein the current state is the state currently being in, wherein the condition is also referred to as an event, and wherein when the condition is satisfied, an action is triggered or one round of state transition is performed.
41. The device of any one of claims 1 to 40, wherein the action indicates the action executed after the condition has been satisfied, wherein after the action has been executed to completion, a new state is transitioned to, or the original state is maintained.
42. The device of any one of claims 1 to 41, wherein the action is not indispensable, wherein after the condition has been satisfied, it is possible to execute no action and directly transition to the new state.
43. The device of any one of claims 1 to 42, wherein the next state is the new state to be transitioned after the condition has been satisfied.
44. The device of any one of claims 1 to 43, wherein the next state is relative to the current state, once the next state is activated, it becomes a new current state.
45. The device of any one of claims 1 to 44, wherein the FSM model is abstracted as comprising: START → event → END.
46. The device of any one of claims 1 to 45, wherein the FSM state transition matrix, once completed, is automatically translated to the script file of json format.
47. The device of any one of claims 1 to 46, wherein the json script file is read and input by the program to a state machine object when a finite state machine instance is generated, to validate logic.
48. The device of any one of claims 1 to 47, wherein the finite state machine instance as generated is stored in Redis, indexed by the uuid transferred by the frontend, to facilitate subsequent program access when the interactive voice response (IVR) service starts.
49. The device of any one of claims 1 to 48, wherein the user performs persistent operation on the FSM.
50. The device of any one of claims 1 to 49, wherein when the user selects the task type as a single task at the frontend, the finite state machine instance of the single task stored in Redis is cleared away as an invalidated object within a preset time period after the IVR marketing service has been triggered.
51. The device of any one of claims 1 to 50, wherein the intelligent voice robot converses with the user, wherein the user's conversation is text content recognized and transcribed from user's conversation speech through ASR technology.
52. The device of any one of claims 1 to 51, wherein the text content is word-segmented to obtain the plurality of segmented terms, and word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words.
53. The device of any one of claims 1 to 52, wherein plural segmented terms are mapped and transformed through index of the domain dictionary to form of expression of the conversation scenario in the scenario feature relation table.
54. The device of any one of claims 1 to 53, wherein each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain the feature vector of 768 dimensions, and all elements are summated and averaged to obtain a 1x768 vector to serve as feature expression input by the user's conversation.
55. The device of any one of claims 1 to 54, wherein the feature word sequences of the various conversation scenarios in the scenario feature relation table are likewise operated on and converted to 1x768 feature vectors.
56. The device of any one of claims 1 to 55, wherein a cosine similarity between the feature vector input by the user's conversation and the scenario feature vector of the conversation scenario is calculated, wherein the greater the cosine similarity calculation result is, the higher the similarity, and the higher the relevancy between the user's conversation and the conversation scenario.
57. The device of any one of claims 1 to 56, wherein by arranging all the conversation scenarios in a descending order according to the cosine similarity calculation result, the conversation scenario with the highest cosine similarity calculation result is returned to serve as a judgement result of the intended scenario of current input by the user, and a corresponding response pattern under the intended scenario is returned according to the state in which the FSM is currently.
58. The device of any one of claims 1 to 57, wherein the response pattern is transformed to speech content through Text-To-Speech (TTS) technology and is broadcast to the user.
59. A system comprising:
an obtaining module, configured to obtain sample corpora of various conversation scenarios in a plurality of conversation scenarios;
a generating module, configured to generate a scenario feature of the conversation scenario based on a sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and a configuring module, configured to configure an intelligent voice robot based on a preset word vector space model and scenario features of the various conversation scenarios, wherein the word vector space model is for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, to recognize an intended scenario of the user's conversation.
60. The system of claim 59, wherein the generating module further comprises:
a representing unit, configured to obtain discrete representation of the sample corpus of the conversation scenario based on a preset domain dictionary;
a screening unit, configured to apply a feature selection algorithm to extract feature words of the conversation scenario based on the discrete representation of the sample corpus of the conversation scenario; and a generating unit, configured to:
map and transform the feature words of the conversation scenario to a corresponding dictionary index;

generate the feature word sequence of the conversation scenario, wherein the feature selection algorithm is a chi-square statistic feature selection algorithm.
61. The system of claim 59, further comprises:
a storing module, configured to store the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table;
62. The system of claim 61, further comprises:
a receiving module, configured to receive a configuration feature word input with respect to any conversation scenario; and a maintaining module, configured to maintain the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
63. The system of claim 62, wherein the receiving module is configured to receive a configuration feature word input by a user having feature configuration permission with respect to the conversation scenario.
64. The system of claim 62, wherein the maintaining module is configured to:
merge the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table;
and add the configuration feature word sequence of the merged configuration feature word to the feature word sequence of the conversation scenario.
65. The system of claim 59, further comprises a training module configured to use a domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained bidirectional encoder representations from transformers (BERT) word vector space, and obtain the word vector space model.
66. The system of claim 59, further comprises a state machine configuring module configured to:
receive a state transition graph input by a first user for the conversation scenario and receive supplementary information input by a second user for a state transition graph, to generate a state transition matrix of the conversation scenario; and generate a script file for containing state transition logical relation based on the state transition matrix of the conversation scenario and generate a finite state machine (FSM) based on the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
67. The system of claim 59, further comprises an intended scenario recognizing module comprising:
an obtaining unit, configured to:
preprocess the user's conversation to obtain a plurality of segmented terms in the user's conversation;
perform mapping transformation on the plural segmented terms;
obtain a feature word sequence of the user's conversation;
a constructing unit, configured to implement the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios based on the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios;
a matching unit, configured to:
perform similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios; and recognize intention of the user's conversation based on a similarity calculation result, to return a pattern to which the intention corresponds.
68. The system of any one of claims 59 to 67, wherein the plurality of conversation scenarios are contained in a preset conversation scenario list, wherein the conversation scenario list is to record one or more conversation scenarios of a specific business field.
69. The system of any one of claims 59 to 68, wherein the sample corpora of the various conversation scenarios are obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios.
70. The system of any one of claims 59 to 69, wherein the conversation scenarios are obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge.
71. The system of any one of claims 59 to 70, wherein each conversation scenario is abstracted as a conversation state, and dialogue process is abstracted as a transition between conversation states.
72. The system of any one of claims 59 to 71, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein the entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
73. The system of any one of claims 59 to 72, wherein content transformation based on a word of bag (WOB) model is performed on the sample corpus of each conversation scenario, wherein discrete representation of the sample corpus of each conversation scenario is obtained, wherein the feature selection algorithm is to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario.
74. The system of any one of claims 59 to 73, wherein the WOB model divides a corpus text into separate terms, and it is imaged that all terms are placed in a bag, such elements as their word orders, grammars and syntaxes are ignored, wherein the bag is only regarded as a collection of plural vocabularies, wherein each term as it appears in the corpus text is independent.
75. The system of any one of claims 59 to 74, wherein the WOB model includes one-hot, TF-IDF and N-gram models.
76. The system of any one of claims 59 to 75, wherein name, the feature words and the feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table.
77. The system of any one of claims 59 to 76, wherein the scenario feature relation table stores correspondence relations between conversation scenarios and scenario features, including feature words and feature word sequences.
78. The system of any one of claims 59 to 77, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
79. The system of any one of claims 59 to 78, wherein when the intelligent voice robot is conversing with the user, obtain user's conversation text by recognizing and transforming user's conversation voice via an automatic speech recognition (ASR) technique, and extract feature information out of the user's conversation text.
80. The system of any one of claims 59 to 79, wherein a pre-trained embedding is obtained by introducing a large-scale BERT word vector space, pre-trained in Google BERT serving.
81. The system of any one of claims 59 to 80, wherein the BERT word vector space is retrained by bringing in own business customer service corpora, and calibration of the BERT word vector is realized, to conform to specific business scenarios.
82. The system of any one of claims 59 to 81, wherein total segmentation corpora where stop-words are removed are to construct the preset domain dictionary, wherein the preset domain dictionary includes total effective vocabularies appearing in the corpora, and the preset domain dictionary is to perform content transformation on all sample corpora of a target conversation scenario based on the WOB model to obtain the discrete representation.
83. The system of any one of claims 59 to 82, wherein the chi-square statistic (CHI) technique is to extract feature words of the target conversation scenario.
84. The system of any one of claims 59 to 83, wherein CHI calculation formula comprises:
wherein c is a certain class, namely a "conversation scenario", t is a certain term, and N is a total number of texts in a training corpora.
85. The system of claim 84, wherein χ² is for chi-square hypothesis testing in statistics, to judge the uniformity or goodness of fit between an actual distribution and a theoretical distribution, with a null hypothesis H0 being "no marked difference between observed frequency and desired frequency".
86. The system of any one of claims 84 to 85, wherein the smaller the chi-square statistic is, the closer the observed frequency is to the desired frequency, and the higher the relevancy between them.
87. The system of any one of claims 84 to 86, wherein χ² is a measure of the distance between an observed object and a desired object, wherein the smaller the distance is, the higher the relevancy between them.
88. The system of any one of claims 84 to 87, wherein the observed object is the term, and the desired object is the conversation scenario, wherein if the term and the conversation scenario are highly relevant, statistic distributions of the two are close to each other in the entire samples.
89. The system of any one of claims 84 to 88, wherein, through the χ² statistic, relevancies between all vocabularies in the domain dictionary and the various classes are calculated quickly and accurately based on quantities of corpora, and a preset number of terms are selected according to the χ² relevancy sorting result to serve as a feature set of the conversation scenarios, to complete feature mapping between the various scenarios and the various classes in the conversation scenario list.
90. The system of any one of claims 59 to 89, wherein the configuration feature word is input by the user with respect to the target conversation scenario through a system frontend.
91. The system of any one of claims 59 to 90, wherein the system frontend provides business personnel with a feature relation extension function to maintain the business field.
92. The system of any one of claims 59 to 91, wherein, on receiving the configuration feature word, it is updated into an externally input feature set of the target conversation scenario.
93. The system of any one of claims 59 to 92, wherein if the externally input configuration feature word is not contained in the domain dictionary, the configuration feature word is ignored.
94. The system of any one of claims 59 to 93, wherein the first user is business personnel, the second user is an algorithm developing technical personnel.
95. The system of any one of claims 59 to 94, wherein the FSM is a mathematical model that expresses a finite number of states and behaviors as transitions and actions amongst these states.
96. The system of any one of claims 59 to 95, wherein the FSM is to describe the state sequences experienced by an object within the object's life cycle, and how to respond to various events coming from outside to transition between the states.
97. The system of any one of claims 59 to 96, wherein the FSM includes current state, condition, action, and next state.
98. The system of any one of claims 59 to 97, wherein the current state is the state currently being in, wherein the condition is also referred to as an event, and wherein when the condition is satisfied, an action is triggered or one round of state transition is performed.
99. The system of any one of claims 59 to 98, wherein the action indicates the action executed after the condition has been satisfied, wherein after the action has been executed to completion, a new state is transitioned to, or the original state is maintained.
100. The system of any one of claims 59 to 99, wherein the action is not indispensable, wherein after the condition has been satisfied, it is possible to execute no action and directly transition to the new state.
101. The system of any one of claims 59 to 100, wherein the next state is the new state to be transitioned after the condition has been satisfied.
102. The system of any one of claims 59 to 101, wherein the next state is relative to the current state, once the next state is activated, it becomes a new current state.
103. The system of any one of claims 59 to 102, wherein the FSM model is abstracted as comprising: START → event → END.
104. The system of any one of claims 59 to 103, wherein the FSM state transition matrix, once completed, is automatically translated to the script file of json format.
105. The system of any one of claims 59 to 104, wherein the json script file is read and input by the program to a state machine object when a finite state machine instance is generated, to validate logic.
106. The system of any one of claims 59 to 105, wherein the finite state machine instance as generated is stored in Redis, indexed by the uuid transferred by the frontend, to facilitate subsequent program access when the interactive voice response (IVR) service starts.
107. The system of any one of claims 59 to 106, wherein the user performs persistent operation on the FSM.
108. The system of any one of claims 59 to 107, wherein when the user selects the task type as a single task at the frontend, the finite state machine instance of the single task stored in Redis is cleared away as an invalidated object within a preset time period after the IVR marketing service has been triggered.
109. The system of any one of claims 59 to 108, wherein the intelligent voice robot converses with the user, wherein the user's conversation is text content recognized and transcribed from user's conversation speech through ASR technology.
110. The system of any one of claims 59 to 109, wherein the text content is word-segmented to obtain the plurality of segmented terms, and word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words.
111. The system of any one of claims 59 to 110, wherein plural segmented terms are mapped and transformed through index of the domain dictionary to form of expression of the conversation scenario in the scenario feature relation table.
112. The system of any one of claims 59 to 111, wherein each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain the feature vector of 768 dimensions, and all elements are summated and averaged to obtain a 1x768 vector to serve as feature expression input by the user's conversation.
113. The system of any one of claims 59 to 112, wherein the feature word sequences of the various conversation scenarios in the scenario feature relation table are likewise operated on and converted to 1x768 feature vectors.
114. The system of any one of claims 59 to 113, wherein a cosine similarity between the feature vector input by the user's conversation and the scenario feature vector of the conversation scenario is calculated, wherein the greater the cosine similarity calculation result is, the higher the similarity, and the higher the relevancy between the user's conversation and the conversation scenario.
115. The system of any one of claims 59 to 114, wherein by arranging all the conversation scenarios in a descending order according to the cosine similarity calculation result, the conversation scenario with the highest cosine similarity calculation result is returned to serve as a judgement result of the intended scenario of current input by the user, and a corresponding response pattern under the intended scenario is returned according to the state in which the FSM is currently.
116. The system of any one of claims 59 to 115, wherein the response pattern is transformed to speech content through Text-To-Speech (TTS) technology and is broadcast to the user.
117. A method comprising:
obtaining sample corpora of various conversation scenarios in a plurality of conversation scenarios;
generating a scenario feature of the conversation scenario based on a sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configuring an intelligent voice robot with a preset word vector space model and the scenario features of the various conversation scenarios, wherein the word vector space model is for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, to recognize an intended scenario of the user's conversation.
118. The method of claim 117, wherein generating the scenario feature of the conversation scenario based on the sample corpus of the conversation scenario comprises:
obtaining discrete representation of the sample corpus of the conversation scenario based on a preset domain dictionary;

employing a feature selection algorithm to extract feature words of the conversation scenario based on the discrete representation of the sample corpus of the conversation scenario;
mapping and transforming the feature words of the conversation scenario to a corresponding dictionary index; and generating the feature word sequence of the conversation scenario, wherein the feature selection algorithm is a chi-square statistic feature selection algorithm.
119. The method of claim 117, further comprises:
storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table.
120. The method of claim 117, further comprises:
receiving a configuration feature word input with respect to the conversation scenario;
and maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
121. The method of claim 120, further comprises receiving the configuration feature word input by the user having feature configuration permission with respect to the conversation scenario.
122. The method of claim 120, wherein maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and the configuration feature word sequence obtained by mapping transformation of the configuration feature word comprises:

merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table; and adding the configuration feature word sequence of a merged configuration feature word to the feature word sequence of the conversation scenario.
123. The method of claim 117, wherein training and obtaining the word vector space model comprises:
applying a domain corpora of a domain to which the various conversation scenarios pertain to train a pre-trained bidirectional encoder representations from transformers (BERT) word vector space; and obtaining the word vector space model.
124. The method of claim 117, further comprises:
receiving a state transition graph input by a first user for the conversation scenario and receiving supplementary information input by a second user for a state transition graph, to generate a state transition matrix of the conversation scenario; and generating a script file for containing state transition logical relation based on the state transition matrix of the conversation scenario and generating a finite state machine (FSM) based on the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
125. The method of any one of claims 117 to 124, further comprises:
preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation;
performing mapping transformation on the segmented terms;
obtaining a feature word sequence of the user's conversation;

using the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios based on the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios;
performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios; and recognizing intention of the user's conversation based on a similarity calculation result, to return a pattern to which the intention corresponds.
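A non-limiting sketch of the recognition flow of claim 125 follows; jieba segmentation, the embed callable (any mapping from a list of terms to per-term word vectors, for example a BERT word vector space), and the variable names are assumptions for illustration.

# Hedged sketch: preprocess the user's conversation, build its feature vector, and pick the
# conversation scenario whose feature vector is most similar by cosine similarity.
import jieba
import numpy as np

def recognize_scenario(user_text, scenario_vectors, stop_words, embed):
    # Preprocess: word segmentation and removal of stop-words.
    terms = [t for t in jieba.lcut(user_text) if t.strip() and t not in stop_words]
    # Feature vector of the user's conversation: mean of the per-term word vectors.
    user_vec = np.mean(embed(terms), axis=0)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # Similarity against each conversation scenario's feature vector; the highest wins.
    scores = {name: cosine(user_vec, vec) for name, vec in scenario_vectors.items()}
    return max(scores, key=scores.get)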
126. The method of any one of claims 117 to 125, wherein the plurality of conversation scenarios are contained in a preset conversation scenario list, wherein the conversation scenario list is to record one or more conversation scenarios of a specific business field.
127. The method of any one of claims 117 to 126, wherein the sample corpora of the various conversation scenarios are obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios.
128. The method of any one of claims 117 to 127, wherein the conversation scenarios are obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge.
129. The method of any one of claims 117 to 128, wherein each conversation scenario is abstracted as a conversation state, and dialogue process is abstracted as a transition between conversation states.
130. The method of any one of claims 117 to 129, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
130. The method of any one of claims 117 to 129, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
131. The method of any one of claims 117 to 130, wherein content transformation based on a bag-of-words (WOB) model is performed on the sample corpus of each conversation scenario, wherein discrete representation of the sample corpus of each conversation scenario is obtained, wherein the feature selection algorithm is to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario.
132. The method of any one of claims 117 to 131, wherein the WOB model divides a corpus text into separate terms, and it is imagined that all terms are placed in a bag, while such elements as their word orders, grammars and syntaxes are ignored, wherein the bag is only regarded as a collection of plural vocabularies, wherein each term as it appears in the corpus text is independent.
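For illustration only, discrete bag-of-words style representations such as those named in claim 133 (one-hot, TF-IDF, N-gram) might be produced as follows with scikit-learn; the toy documents and parameter values are assumptions.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["apply for installment", "repay the installment early"]      # hypothetical, pre-segmented texts
onehot = CountVectorizer(binary=True).fit_transform(docs)             # binary (one-hot style) counts
tfidf = TfidfVectorizer().fit_transform(docs)                          # TF-IDF weights
ngram = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)        # unigram + bigram (N-gram) counts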
134. The method of any one of claims 117 to 133, wherein name, the feature words and the feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table.
135. The method of any one of claims 117 to 134, wherein the scenario feature relation table stores correspondence relations between conversation scenarios and scenario features, including feature words and feature word sequences.
136. The method of any one of claims 117 to 135, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
136. The method of any one of claims 117 to 135, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
138. The method of any one of claims 117 to 137, wherein pre-trained embedding is obtained by introducing large-scale BERT word vector space, pretrained in google bert serving.
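By way of illustration only: if the BERT word vector space is served through bert-as-service as the claims suggest, per-term 768-dimensional vectors might be obtained as sketched below; the server address and the helper name are assumptions.

from bert_serving.client import BertClient
import numpy as np

bc = BertClient(ip="localhost")          # assumes a bert-serving server is already running

def embed_terms(terms):
    # One 768-dimensional vector per term; usable as the embed callable in the earlier sketch.
    return np.asarray(bc.encode(list(terms)))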
139. The method of any one of claims 117 to 138, wherein the BERT word vector space is retrained by bringing in own business customer service corpora, and calibration of the BERT word vector is realized, to conform to specific business scenarios.
140. The method of any one of claims 117 to 139, wherein total segmentation corpora where stop-words are removed are to construct the preset domain dictionary, wherein the preset domain dictionary includes total effective vocabularies appearing in the corpora, and the preset domain dictionary is to perform content transformation on all sample corpora of a target conversation scenario based on the WOB model to obtain the discrete representation.
141. The method of any one of claims 117 to 140, wherein the chi-square statistic (CHI) technique is to extract feature words of the target conversation scenario.
142. The method of any one of claims 117 to 141, wherein CHI calculation formula comprises:
wherein c is a certain class, namely a "conversation scenario", t is a certain term, and N is the total number of texts in the training corpus.
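The formula itself did not survive extraction; for reference, a commonly used form of the chi-square statistic for a term t and a class c, consistent with the variables defined above, is reproduced below, where A, B, C and D are the usual contingency document counts (editorial notation, not claim language).

% A: documents of class c containing t;     B: documents not of class c containing t;
% C: documents of class c not containing t; D: documents not of class c not containing t;
% N = A + B + C + D is the total number of texts in the training corpus.
\chi^{2}(t, c) = \frac{N\,(AD - CB)^{2}}{(A + C)(B + D)(A + B)(C + D)}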
143. The method of claim 142, wherein the χ² statistic is used for chi-square hypothesis testing in statistics, to judge uniformity or goodness of fit between an actual distribution and a theoretical distribution, wherein the null hypothesis H0 is "no marked difference between the observed frequency and the desired frequency".
144. The method of any one of claims 141 to 143, wherein the smaller the chi-square statistic is, the closer the observed frequency is to the desired frequency, and the higher the relevancy between them.
145. The method of any one of claims 141 to 144, wherein χ² is a measure of distance between an observed object and a desired object, wherein the smaller the distance is, the higher the relevancy between them.
146. The method of any one of claims 141 to 143, wherein the observed object is the term, and the desired object is the conversation scenario, wherein if the term and the conversation scenario are highly relevant, the statistical distributions of the two are close to each other over the entire samples.
147. The method of any one of claims 141 to 143, wherein through the χ² statistic, relevancies between all vocabularies in the domain dictionary and the various classes are calculated quickly and accurately based on quantities of corpora, and a preset number of terms are selected according to the χ² relevancy sorting result to serve as a feature set of the conversation scenarios, to complete feature mapping of the various scenarios and the various classes in the conversation scenario list.
148. The method of any one of claims 117 to 147, wherein the configuration feature word is input by the user with respect to the target conversation scenario through a system frontend.
149. The method of any one of claims 117 to 148, wherein the system frontend provides business personnel with a feature relation extension function to maintain the business field.
150. The method of any one of claims 117 to 149, wherein on receiving the configuration feature word, the configuration feature word is updated into an externally input feature set of the target conversation scenario.
151. The method of any one of claims 117 to 150, wherein if the externally input configuration feature word is not contained in the domain dictionary, the configuration feature word is ignored.
152. The method of any one of claims 117 to 151, wherein the first user is business personnel, and the second user is algorithm development technical personnel.
153. The method of any one of claims 117 to 152, wherein the FSM is a mathematical model that expresses a finite number of states and behaviors as transitions and actions amongst these states.
154. The method of any one of claims 117 to 153, wherein the FSM is to describe the state sequences experienced by an object within the object's life cycle, and how the object responds to various events coming from outside to transition between the states.
155. The method of any one of claims 117 to 154, wherein the FSM includes current state, condition, action, and next state.
156. The method of any one of claims 117 to 155, wherein the current state is the state currently being in, wherein the condition is also referred to as an event, and when the condition is satisfied, an action is triggered or one round of state transition is performed.
157. The method of any one of claims 117 to 156, wherein the action indicates the action executed after the condition has been satisfied, and after the action has been executed to completion, a transition is made to a new state or the original state is maintained.
158. The method of any one of claims 117 to 157, wherein the action is not indispensable, and after the condition has been satisfied, it is possible to execute no action and to transition directly to the new state.
159. The method of any one of claims 117 to 158, wherein the next state is the new state to be transitioned to after the condition has been satisfied.
160. The method of any one of claims 117 to 159, wherein the next state is relative to the current state, and once the next state is activated, it becomes the new current state.
161. The method of any one of claims 117 to 160, wherein the FSM model is abstracted as comprising:
162. The method of any one of claims 117 to 161, wherein the FSM state transition matrix, once fully completed, is automatically translated to the script file in json format.
163. The method of any one of claims 117 to 162, wherein the json script file is read by the program and input to a state machine object when a finite state machine instance is generated, to validate the logic.
164. The method of any one of claims 117 to 163, wherein the finite state machine instance as generated is stored in Redis, indexed according to the uuid transferred by the frontend, to facilitate subsequent program access when the interactive voice response (IVR) service starts.
165. The method of any one of claims 117 to 164, wherein the user performs a persistence operation on the FSM.
166. The method of any one of claims 117 to 165, wherein when the user selects the task type as a single task at the frontend, the finite state machine instance of the single task as stored in Redis is cleared away as an invalidated object within a preset time period after the IVR marketing service has been triggered.
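For illustration only, the Redis caching described in claims 164 to 166 might look as sketched below with redis-py; the key prefix, TTL value and use of pickle are assumptions rather than claim language.

import pickle
import redis

r = redis.Redis(host="localhost", port=6379)

def cache_fsm(task_uuid, fsm, single_task=True, ttl_seconds=3600):
    payload = pickle.dumps(fsm)
    if single_task:
        # Single-task instances are invalidated automatically after the preset period.
        r.setex(f"fsm:{task_uuid}", ttl_seconds, payload)
    else:
        # Persistence operation: keep the instance until it is explicitly removed.
        r.set(f"fsm:{task_uuid}", payload)

def load_fsm(task_uuid):
    payload = r.get(f"fsm:{task_uuid}")
    return pickle.loads(payload) if payload is not None else None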
167. The method of any one of claims 117 to 166, wherein the intelligent voice robot converses with the user, wherein the user's conversation is text content recognized and transcribed from user's conversation speech through ASR technology.
168. The method of any one of claims 117 to 167, wherein the text content is word-segmented to obtain the plurality of segmented terms, and word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words.
169. The method of any one of claims 117 to 168, wherein the plural segmented terms are mapped and transformed through the index of the domain dictionary into the form of expression of the conversation scenario in the scenario feature relation table.
170. The method of any one of claims 117 to 169, wherein each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain the feature vector of 768 dimensions, and all elements are summated and averaged to obtain a 1x768 vector to serve as feature expression input by the user's conversation.
171. The method of any one of claims 117 to 170, wherein the feature word sequences of the various conversation scenarios in the scenario feature relation table are converted, through the same operation, to 1x768 feature vectors.
172. The method of any one of claims 117 to 171, wherein a cosine similarity between the feature vector input by the user's conversation and the scenario feature vector of the conversation scenario is calculated, wherein the greater the cosine similarity calculation result is, the higher the similarity, and the cosine similarity calculation result represents the relevancy between the user's conversation and the conversation scenario.
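For reference, the cosine similarity referred to in claim 172, between the 1x768 user feature vector u and a scenario feature vector s, has the usual form (editorial notation, not claim language):

\cos(\mathbf{u}, \mathbf{s}) = \frac{\mathbf{u}\cdot\mathbf{s}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{s}\rVert} = \frac{\sum_{i=1}^{768} u_{i} s_{i}}{\sqrt{\sum_{i=1}^{768} u_{i}^{2}}\;\sqrt{\sum_{i=1}^{768} s_{i}^{2}}}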
173. The method of any one of claims 117 to 172, wherein by arranging all the conversation scenarios in a descending order according to the cosine similarity calculation results, the conversation scenario with the highest cosine similarity calculation result is returned to serve as a judgement result of the intended scenario of the user's current input, and a corresponding response pattern under the intended scenario is returned according to the state in which the FSM currently is.
174. The method of any one of claims 117 to 173, wherein the response pattern is transformed to speech content through Text-To-Speech (TTS) technology and is broadcast to the user.
175. A computer equipment comprising:
a memory, including a nonvolatile storage medium and an internal memory, wherein the nonvolatile storage medium stores therein an operating system and a computer program, and the internal memory provides an environment for running of the operating system and the computer program in the nonvolatile storage medium;
a processor, configured to provide computing and controlling capabilities;
a network interface, wherein the memory, the processor and the network interface are connected to each other via a system bus, and the network interface is configured to connect to other equipment via a network for communication; and
a computer program stored on the memory and operable on the processor, wherein the processor, when executing the computer program, is configured to:
obtain sample corpora of various conversation scenarios in a plurality of conversation scenarios;

generate a scenario feature of the conversation scenario based on a sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configure an intelligent voice robot with a preset word vector space model and the scenario features of the various conversation scenarios, wherein the word vector space model is for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, to recognize an intended scenario of the user's conversation.
176. The equipment of claim 175, wherein generating the scenario feature of the conversation scenario based on the sample corpus of the conversation scenario comprises:
obtaining discrete representation of the sample corpus of the conversation scenario based on a preset domain dictionary;
employing a feature selection algorithm to extract feature words of the conversation scenario based on the discrete representation of the sample corpus of the conversation scenario;
mapping and transforming the feature words of the conversation scenario to a corresponding dictionary index; and generating the feature word sequence of the conversation scenario, wherein the feature selection algorithm is a chi-square statistic feature selection algorithm.
177. The equipment of claim 175, further comprises:
storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table.
178. The equipment of claim 175, further comprises:

receiving a configuration feature word input with respect to the conversation scenario;
and maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
179. The equipment of claim 178, further comprises receiving the configuration feature word input by the user having feature configuration permission with respect to the conversation scenario.
180. The equipment of claim 178, wherein maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and the configuration feature word sequence obtained by mapping transformation of the configuration feature word comprises:
merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table; and adding the configuration feature word sequence of a merged configuration feature word to the feature word sequence of the conversation scenario.
181. The equipment of claim 175, wherein the word vector space model is trained and obtained by a process comprising:
applying domain corpora of a domain to which the various conversation scenarios pertain, to train a pre-trained bidirectional encoder representations from transformers (BERT) word vector space; and obtaining the word vector space model.
182. The equipment of claim 175, further comprises:

receiving a state transition graph input by a first user for the conversation scenario and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario; and generating a script file containing a state transition logical relation based on the state transition matrix of the conversation scenario and generating a finite state machine (FSM) based on the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
183. The equipment of any one of claims 175 to 182, further comprises:
preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation;
performing mapping transformation on the segmented terms;
obtaining a feature word sequence of the user's conversation;
using the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios based on the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios;
performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios; and recognizing intention of the user's conversation based on a similarity calculation result, to return a pattern to which the intention corresponds.
184. The equipment of any one of claims 175 to 183, wherein the plurality of conversation scenarios are contained in a preset conversation scenario list, wherein the conversation scenario list is to record one or more conversation scenarios of a specific business field.
185. The equipment of any one of claims 175 to 184, wherein the sample corpora of the various conversation scenarios are obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios.
186. The equipment of any one of claims 175 to 185, wherein the conversation scenarios are obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge.
187. The equipment of any one of claims 175 to 186, wherein each conversation scenario is abstracted as a conversation state, and dialogue process is abstracted as a transition between conversation states.
188. The equipment of any one of claims 175 to 187, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
189. The equipment of any one of claims 175 to 188, wherein content transformation based on a bag-of-words (WOB) model is performed on the sample corpus of each conversation scenario, wherein discrete representation of the sample corpus of each conversation scenario is obtained, wherein the feature selection algorithm is to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario.
190. The equipment of any one of claims 175 to 189, wherein the WOB model divides a corpus text into separate terms, and it is imagined that all terms are placed in a bag, while such elements as their word orders, grammars and syntaxes are ignored, wherein the bag is only regarded as a collection of plural vocabularies, wherein each term as it appears in the corpus text is independent.
191. The equipment of any one of claims 175 to 190, wherein the WOB model includes a one-hot, TF-IDF and N-gram model.
192. The equipment of any one of claims 175 to 191, wherein name, the feature words and the feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table.
193. The equipment of any one of claims 175 to 192, wherein the scenario feature relation table stores correspondence relations between conversation scenarios and scenario features, including feature words and feature word sequences.
194. The equipment of any one of claims 175 to 193, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
195. The equipment of any one of claims 175 to 194, wherein when the intelligent voice robot is conversing with the user, the user's conversation text is obtained by recognizing and transforming the user's conversation voice via an automatic speech recognition (ASR) technique, and feature information is extracted out of the user's conversation text.
196. The equipment of any one of claims 175 to 195, wherein pre-trained embedding is obtained by introducing large-scale BERT word vector space, pretrained in google bert serving.
197. The equipment of any one of claims 175 to 196, wherein the BERT word vector space is retrained by bringing in own business customer service corpora, and calibration of the BERT word vector is realized, to conform to specific business scenarios.
198. The equipment of any one of claims 175 to 197, wherein total segmentation corpora where stop-words are removed are to construct the preset domain dictionary, wherein the preset domain dictionary includes total effective vocabularies appearing in the corpora, and the preset domain dictionary is to perform content transformation on all sample corpora of a target conversation scenario based on the WOB model to obtain the discrete representation.
199. The equipment of any one of claims 175 to 198, wherein the chi-square statistic (CHI) technique is to extract feature words of the target conversation scenario.
200. The equipment of any one of claims 175 to 199, wherein CHI calculation formula comprises:
wherein c is a certain class, namely a "conversation scenario", t is a certain term, and N is the total number of texts in the training corpus.
201. The equipment of claim 200, wherein the χ² statistic is used for chi-square hypothesis testing in statistics, to judge uniformity or goodness of fit between an actual distribution and a theoretical distribution, wherein the null hypothesis H0 is "no marked difference between the observed frequency and the desired frequency".
202. The equipment of any one of claims 200 to 201, wherein the smaller the chi-square statistic is, the closer the observed frequency is to the desired frequency, and the higher the relevancy between them.
203. The equipment of any one of claims 200 to 202, wherein χ² is a measure of distance between an observed object and a desired object, wherein the smaller the distance is, the higher the relevancy between them.
204. The equipment of any one of claims 200 to 203, wherein the observed object is the term, and the desired object is the conversation scenario, wherein if the term and the conversation scenario are highly relevant, the statistical distributions of the two are close to each other over the entire samples.
205. The equipment of any one of claims 200 to 204, wherein through the χ² statistic, relevancies between all vocabularies in the domain dictionary and the various classes are calculated quickly and accurately based on quantities of corpora, and a preset number of terms are selected according to the χ² relevancy sorting result to serve as a feature set of the conversation scenarios, to complete feature mapping of the various scenarios and the various classes in the conversation scenario list.
206. The equipment of any one of claims 175 to 205, wherein the configuration feature word is input by the user with respect to the target conversation scenario through a system frontend.
207. The equipment of any one of claims 175 to 206, wherein the system frontend provides business personnel with a feature relation extension function to maintain the business field.
208. The equipment of any one of claims 175 to 207, wherein on receiving the configuration feature word, the configuration feature word is updated into an externally input feature set of the target conversation scenario.
209. The equipment of any one of claims 175 to 208, wherein if the externally input configuration feature word is not contained in the domain dictionary, the configuration feature word is ignored.
210. The equipment of any one of claims 175 to 209, wherein the first user is business personnel, and the second user is algorithm development technical personnel.
211. The equipment of any one of claims 175 to 210, wherein the FSM is a mathematical model that expresses a finite number of states and behaviors as transitions and actions amongst these states.
212. The equipment of any one of claims 175 to 211, wherein the FSM is to describe the state sequences experienced by an object within the object's life cycle, and how the object responds to various events coming from outside to transition between the states.
213. The equipment of any one of claims 175 to 212, wherein the FSM includes current state, condition, action, and next state.
214. The equipment of any one of claims 175 to 213, wherein the current state is the state currently being in, wherein the condition is also referred to as an event, and when the condition is satisfied, an action is triggered or one round of state transition is performed.
215. The equipment of any one of claims 175 to 214, wherein the action indicates the action executed after the condition has been satisfied, and after the action has been executed to completion, a transition is made to a new state or the original state is maintained.
216. The equipment of any one of claims 175 to 215, wherein the action is not indispensable, and after the condition has been satisfied, it is possible to execute no action and to transition directly to the new state.
217. The equipment of any one of claims 175 to 216, wherein the next state is the new state to be transitioned to after the condition has been satisfied.
218. The equipment of any one of claims 175 to 217, wherein the next state is relative to the current state, and once the next state is activated, it becomes the new current state.
219. The equipment of any one of claims 175 to 218, wherein the FSM model is abstracted as comprising:
220. The equipment of any one of claims 175 to 219, wherein the FSM state transition matrix, once fully completed, is automatically translated to the script file in json format.
221. The equipment of any one of claims 175 to 220, wherein the json script file is read by the program and input to a state machine object when a finite state machine instance is generated, to validate the logic.
222. The equipment of any one of claims 175 to 221, wherein the finite state machine instance as generated is stored in Redis, indexed according to the uuid transferred by the frontend, to facilitate subsequent program access when the interactive voice response (IVR) service starts.
223. The equipment of any one of claims 175 to 222, wherein the user performs a persistence operation on the FSM.
224. The equipment of any one of claims 175 to 223, wherein when the user selects the task type as a single task at the frontend, the finite state machine instance of the single task as stored in Redis is cleared away as an invalidated object within a preset time period after the IVR marketing service has been triggered.
225. The equipment of any one of claims 175 to 224, wherein the intelligent voice robot converses with the user, wherein the user's conversation is text content recognized and transcribed from user's conversation speech through ASR technology.
226. The equipment of any one of claims 175 to 225, wherein the text content is word-segmented to obtain the plurality of segmented terms, and word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words.
227. The equipment of any one of claims 175 to 226, wherein the plural segmented terms are mapped and transformed through the index of the domain dictionary into the form of expression of the conversation scenario in the scenario feature relation table.
228. The equipment of any one of claims 175 to 227, wherein each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain the feature vector of 768 dimensions, and all elements are summated and averaged to obtain a 1x768 vector to serve as feature expression input by the user's conversation.
229. The equipment of any one of claims 175 to 228, wherein the feature word sequences of the various conversation scenarios in the scenario feature relation table are converted, through the same operation, to 1x768 feature vectors.
230. The equipment of any one of claims 175 to 229, wherein a cosine similarity between the feature vector input by the user's conversation and the scenario feature vector of the conversation scenario is calculated, wherein the greater the cosine similarity calculation result is, the higher the similarity, and the cosine similarity calculation result represents the relevancy between the user's conversation and the conversation scenario.
231. The equipment of any one of claims 175 to 230, wherein by arranging all the conversation scenarios in a descending order according to the cosine similarity calculation results, the conversation scenario with the highest cosine similarity calculation result is returned to serve as a judgement result of the intended scenario of the user's current input, and a corresponding response pattern under the intended scenario is returned according to the state in which the FSM currently is.
232. The equipment of any one of claims 175 to 231, wherein the response pattern is transformed to speech content through Text-To-Speech (TTS) technology and is broadcast to the user.
233. A computer readable physical memory having stored thereon a computer program which, when executed by a computer, configures the computer to:
obtain sample corpora of various conversation scenarios in a plurality of conversation scenarios;
generate a scenario feature of the conversation scenario based on a sample corpus of the conversation scenario, wherein the scenario feature includes feature words of the conversation scenario and a feature word sequence obtained by mapping transformation of the feature words; and configure an intelligent voice robot with a preset word vector space model and the scenario features of the various conversation scenarios, wherein the word vector space model is for the intelligent voice robot to perform word vector similarity calculation on a user's conversation and the scenario features of the various conversation scenarios, to recognize an intended scenario of the user's conversation.
234. The memory of claim 233, wherein generating the scenario feature of the conversation scenario based on the sample corpus of the conversation scenario comprises:
obtaining discrete representation of the sample corpus of the conversation scenario based on a preset domain dictionary;
employing a feature selection algorithm to extract feature words of the conversation scenario based on the discrete representation of the sample corpus of the conversation scenario;
mapping and transforming the feature words of the conversation scenario to a corresponding dictionary index; and generating the feature word sequence of the conversation scenario, wherein the feature selection algorithm is a chi-square statistic feature selection algorithm.
235. The memory of claim 233, further comprises:
storing the various conversation scenarios and the scenario features of the conversation scenarios in a scenario feature relation table.
236. The memory of claim 233, further comprises:
receiving a configuration feature word input with respect to the conversation scenario;
and maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and a configuration feature word sequence obtained by mapping transformation of the configuration feature word.
237. The memory of claim 236, further comprises receiving the configuration feature word input by the user having feature configuration permission with respect to the conversation scenario.
238. The memory of claim 236, wherein maintaining the scenario feature of the conversation scenario in the scenario feature relation table based on the configuration feature word of the conversation scenario and the configuration feature word sequence obtained by mapping transformation of the configuration feature word comprises:
merging the configuration feature word of the conversation scenario to the feature words of the conversation scenario in the scenario feature relation table; and adding the configuration feature word sequence of a merged configuration feature word to the feature word sequence of the conversation scenario.
239. The memory of claim 233, wherein the word vector space model is trained and obtained by a process comprising:
applying domain corpora of a domain to which the various conversation scenarios pertain, to train a pre-trained bidirectional encoder representations from transformers (BERT) word vector space; and obtaining the word vector space model.
240. The memory of claim 233, further comprises:
receiving a state transition graph input by a first user for the conversation scenario and receiving supplementary information input by a second user for the state transition graph, to generate a state transition matrix of the conversation scenario; and generating a script file containing a state transition logical relation based on the state transition matrix of the conversation scenario and generating a finite state machine (FSM) based on the script file, to return a corresponding pattern when the intended scenario of the user's conversation is recognized.
241. The memory of any one of claims 233 to 240, further comprises:
preprocessing the user's conversation to obtain a plurality of segmented terms in the user's conversation;
performing mapping transformation on the segmented terms;
obtaining a feature word sequence of the user's conversation;
using the word vector space model to construct a feature vector of the user's conversation and scenario feature vectors of the various conversation scenarios based on the feature word sequence of the user's conversation and feature word sequences of the various conversation scenarios;
performing similarity calculation on the feature vector of the user's conversation and the scenario feature vectors of the various conversation scenarios; and recognizing intention of the user's conversation based on a similarity calculation result, to return a pattern to which the intention corresponds.
242. The memory of any one of claims 233 to 241, wherein the plurality of conversation scenarios are contained in a preset conversation scenario list, wherein the conversation scenario list is to record one or more conversation scenarios of a specific business field.
243. The memory of any one of claims 233 to 242, wherein the sample corpora of the various conversation scenarios are obtained by classifying and marking specific domain corpora according to classes of the conversation scenarios.
244. The memory of any one of claims 233 to 243, wherein the conversation scenarios are obtained by performing scenario abstraction on the specific domain corpora, and the scenario abstraction is a process from data to information and then to knowledge.
245. The memory of any one of claims 233 to 244, wherein each conversation scenario is abstracted as a conversation state, and dialogue process is abstracted as a transition between conversation states.
246. The memory of any one of claims 233 to 245, wherein the conversation state is taken as a node, an oriented line between the conversation states is a process in which one state is transferred to another state, wherein entire dialogue process is abstracted as a graph consisting of nodes and oriented lines.
247. The memory of any one of claims 233 to 246, wherein content transformation based on a bag-of-words (WOB) model is performed on the sample corpus of each conversation scenario, wherein discrete representation of the sample corpus of each conversation scenario is obtained, wherein the feature selection algorithm is to extract feature words of each conversation scenario on the basis of the discrete representation of the sample corpus of each conversation scenario.
248. The memory of any one of claims 233 to 247, wherein the WOB model divides a corpus text into separate terms, and it is imagined that all terms are placed in a bag, while such elements as their word orders, grammars and syntaxes are ignored, wherein the bag is only regarded as a collection of plural vocabularies, wherein each term as it appears in the corpus text is independent.
249. The memory of any one of claims 233 to 248, wherein the WOB model includes a one-hot, TF-IDF and N-gram model.
250. The memory of any one of claims 233 to 249, wherein name, the feature words and the feature word sequence of each conversation scenario are correspondingly stored in the scenario feature relation table.
251. The memory of any one of claims 233 to 250, wherein the scenario feature relation table stores correspondence relations between conversation scenarios and scenario features, including feature words and feature word sequences.
252. The memory of any one of claims 233 to 251, wherein the scenario feature relation table is stored in a server, maintained offline by backstage algorithm technical personnel based on periodical text data mining operations, and isolated from frontstage business personnel.
253. The memory of any one of claims 233 to 252, wherein when the intelligent voice robot is conversing with the user, the user's conversation text is obtained by recognizing and transforming the user's conversation voice via an automatic speech recognition (ASR) technique, and feature information is extracted out of the user's conversation text.
254. The memory of any one of claims 233 to 253, wherein pre-trained embedding is obtained by introducing large-scale BERT word vector space, pretrained in google bert serving.
255. The memory of any one of claims 233 to 254, wherein the BERT word vector space is retrained by bringing in own business customer service corpora, and calibration of the BERT word vector is realized, to conform to specific business scenarios.
256. The memory of any one of claims 233 to 255, wherein total segmentation corpora where stop-words are removed are to construct the preset domain dictionary, wherein the preset domain dictionary includes total effective vocabularies appearing in the corpora, and the preset domain dictionary is to perform content transformation on all sample corpora of a target conversation scenario based on the WOB model to obtain the discrete representation.
257. The memory of any one of claims 233 to 256, wherein the chi-square statistic (CHI) technique is to extract feature words of the target conversation scenario.
258. The memory of any one of claims 233 to 257, wherein CHI calculation formula comprises:
wherein c is a certain class, namely a "conversation scenario", t is a certain term, and N is the total number of texts in the training corpus.
259. The memory of claim 258, wherein the χ² statistic is used for chi-square hypothesis testing in statistics, to judge uniformity or goodness of fit between an actual distribution and a theoretical distribution, wherein the null hypothesis H0 is "no marked difference between the observed frequency and the desired frequency".
260. The memory of any one of claims 258 to 259, wherein the smaller the chi-square statistic is, the closer the observed frequency is to the desired frequency, and the higher the relevancy between them.
261. The memory of any one of claims 258 to 260, wherein χ² is a measure of distance between an observed object and a desired object, wherein the smaller the distance is, the higher the relevancy between them.
262. The memory of any one of claims 258 to 261, wherein the observed object is the term, and the desired object is the conversation scenario, wherein if the term and the conversation scenario are highly relevant, the statistical distributions of the two are close to each other over the entire samples.
263. The memory of any one of claims 258 to 262, wherein through the χ² statistic, relevancies between all vocabularies in the domain dictionary and the various classes are calculated quickly and accurately based on quantities of corpora, and a preset number of terms are selected according to the χ² relevancy sorting result to serve as a feature set of the conversation scenarios, to complete feature mapping of the various scenarios and the various classes in the conversation scenario list.
264. The memory of any one of claims 233 to 263, wherein the configuration feature word is input by the user with respect to the target conversation scenario through a system frontend.
265. The memory of any one of claims 233 to 264, wherein the system frontend provides business personnel with a feature relation extension function to maintain the business field.
266. The memory of any one of claims 233 to 265, wherein on receiving the configuration feature word, the configuration feature word is updated into an externally input feature set of the target conversation scenario.
267. The memory of any one of claims 233 to 266, wherein if the externally input configuration feature word is not contained in the domain dictionary, the configuration feature word is ignored.
268. The memory of any one of claims 233 to 267, wherein the first user is business personnel, and the second user is algorithm development technical personnel.
269. The memory of any one of claims 233 to 268, wherein the FSM is a mathematical model that expresses a finite number of states and behaviors as transitions and actions amongst these states.
270. The memory of any one of claims 233 to 269, wherein the FSM is to describe the state sequences experienced by an object within the object's life cycle, and how the object responds to various events coming from outside to transition between the states.
271. The memory of any one of claims 233 to 270, wherein the FSM includes current state, condition, action, and next state.
272. The memory of any one of claims 233 to 271, wherein the current state is the state currently being in, wherein the condition is also referred to as an event, and when the condition is satisfied, an action is triggered or one round of state transition is performed.
273. The memory of any one of claims 233 to 272, wherein the action indicates the action executed after the condition has been satisfied, and after the action has been executed to completion, a transition is made to a new state or the original state is maintained.
274. The memory of any one of claims 233 to 273, wherein the action is not indispensable, and after the condition has been satisfied, it is possible to execute no action and to transition directly to the new state.
275. The memory of any one of claims 233 to 274, wherein the next state is the new state to be transitioned to after the condition has been satisfied.
276. The memory of any one of claims 233 to 275, wherein the next state is relative to the current state, and once the next state is activated, it becomes the new current state.
277. The memory of any one of claims 233 to 276, wherein the FSM model is abstracted as comprising:
278. The memory of any one of claims 233 to 277, wherein the FSM state transition matrix, once fully completed, is automatically translated to the script file in json format.
279. The memory of any one of claims 233 to 278, wherein the json script file is read by the program and input to a state machine object when a finite state machine instance is generated, to validate the logic.
280. The memory of any one of claims 233 to 279, wherein the finite state machine instance as generated is stored in Redis, indexed according to the uuid transferred by the frontend, to facilitate subsequent program access when the interactive voice response (IVR) service starts.
281. The memory of any one of claims 233 to 280, wherein the user performs a persistence operation on the FSM.
282. The memory of any one of claims 233 to 281, wherein when the user selects the task type as a single task at the frontend, the finite state machine instance of the single task as stored in Redis is cleared away as an invalidated object within a preset time period after the IVR marketing service has been triggered.
283. The memory of any one of claims 233 to 282, wherein the intelligent voice robot converses with the user, wherein the user's conversation is text content recognized and transcribed from user's conversation speech through ASR technology.
284. The memory of any one of claims 233 to 283, wherein the text content is word-segmented to obtain the plurality of segmented terms, and word-segmentation process includes character purification, rectification, word segmentation, and removal of stop-words.
285. The memory of any one of claims 233 to 284, wherein the plural segmented terms are mapped and transformed through the index of the domain dictionary into the form of expression of the conversation scenario in the scenario feature relation table.
286. The memory of any one of claims 233 to 285, wherein each element in the feature word sequence of the user's conversation is mapped to the BERT word vector space to obtain the feature vector of 768 dimensions, and all elements are summated and averaged to obtain a 1x768 vector to serve as feature expression input by the user's conversation.
287. The memory of any one of claims 233 to 286, wherein the feature word sequences of the various conversation scenarios in the scenario feature relation table are converted, through the same operation, to 1x768 feature vectors.
288. The memory of any one of claims 233 to 287, wherein a cosine similarity between the feature vector input by the user's conversation and the scenario feature vector of the conversation scenario is calculated, wherein the greater the cosine similarity calculation result is, the higher the similarity, and the cosine similarity calculation result represents the relevancy between the user's conversation and the conversation scenario.
289. The memory of any one of claims 233 to 288, wherein by arranging all the conversation scenarios in a descending order according to the cosine similarity calculation results, the conversation scenario with the highest cosine similarity calculation result is returned to serve as a judgement result of the intended scenario of the user's current input, and a corresponding response pattern under the intended scenario is returned according to the state in which the FSM currently is.
290. The memory of any one of claims 233 to 289, wherein the response pattern is transformed to speech content through Text-To-Speech (TTS) technology and is broadcast to the user.
CA3155717A 2021-04-19 2022-04-19 Method of realizing configurable intelligent voice robot, device and storage medium Pending CA3155717A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110417581.9 2021-04-19
CN202110417581.9A CN114333813A (en) 2021-04-19 2021-04-19 Implementation method and device for configurable intelligent voice robot and storage medium

Publications (1)

Publication Number Publication Date
CA3155717A1 true CA3155717A1 (en) 2022-10-19

Family

ID=81044444

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3155717A Pending CA3155717A1 (en) 2021-04-19 2022-04-19 Method of realizing configurable intelligent voice robot, device and storage medium

Country Status (2)

Country Link
CN (1) CN114333813A (en)
CA (1) CA3155717A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975964B (en) * 2024-03-22 2024-10-01 联众智慧科技股份有限公司 Intelligent robot voice interaction method and system based on Bert model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668580A (en) * 2022-10-26 2023-08-29 荣耀终端有限公司 Scene recognition method, electronic device and readable storage medium
CN116668580B (en) * 2022-10-26 2024-04-19 荣耀终端有限公司 Scene recognition method, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN114333813A (en) 2022-04-12
