CN112202978A

CN112202978A - Intelligent outbound call system, method, computer system and storage medium

Info

Publication number: CN112202978A
Application number: CN202010857779.4A
Authority: CN
Inventors: 朱频频
Original assignee: Weizhi Technology Zhangjiakou Co ltd
Current assignee: Weizhi Technology Zhangjiakou Co ltd
Priority date: 2020-08-24
Filing date: 2020-08-24
Publication date: 2021-01-08

Abstract

The application provides an intelligent outbound call system, a method, a computer system and a storage medium. The system comprises: a dialing service unit for dialing a user to form a session, the dialing service unit being configured with scene information related to the dialing task; a voice service unit for converting between voice and text in the session; and a session service unit for enabling a corresponding scene template according to the scene information, calling an intention engine to analyze the user text to obtain a corresponding user intention, determining a broadcast dialog at the flow nodes in the template, and outputting the corresponding voice to the user. In this scheme, scene templates are preset so that one can be selected and enabled according to different service requirements, which allows the outbound service scene to be matched flexibly to the service requirements; meanwhile, an accurate conversation effect is achieved through refined flow nodes and intention analysis, which solves the problems in the prior art.

Description

Intelligent outbound call system, method, computer system and storage medium

Technical Field

The embodiment of the application relates to the technical field of customer service outbound, in particular to an intelligent outbound system, an intelligent outbound method, a computer system and a storage medium.

Background

Outbound calling means that a customer service center system actively initiates calls to customers. Outbound services are widely applied in various fields. For example, in the financial industry, outbound calls are often used for telephone collection, payment reminders, banking business, and the like; in the education and training industry, outbound calls can be used to communicate with and follow up with potential students, or to inform students who have successfully made an appointment of information such as class time and location; in various sales industries, outbound calls may be used for after-sales return visits.

Automatic outbound systems already exist, in which a machine automatically places the outbound call and conducts a session with the user. However, the scenes to which existing automatic outbound systems apply are either generalized or specialized: a generalized automatic outbound system lacks conversation accuracy, while a specialized automatic outbound system lacks flexibility in the types of service it can handle. Existing systems therefore cannot both adapt flexibly to different scenes and converse accurately.

Disclosure of Invention

In view of this, embodiments of the present application provide an intelligent outbound system, an intelligent outbound method, a computer system, and a storage medium, which solve the technical problems in the prior art.

The embodiment of the application provides an intelligent outbound system, including:

the dialing service unit is used for executing dialing tasks related to services to form a session between the intelligent outbound system and a dialed user telephone; the dialing service unit is configured with scene information related to the dialing task;

the voice service unit is used for converting the user voice in the conversation into a user text or converting the broadcast text into broadcast voice for the user;

and the session service unit is used for accepting the call of the dialing service unit, enabling a corresponding scene template according to the scene information, and calling the service of an intention engine to analyze the user text to obtain the corresponding user intention, so that a plurality of process nodes in the scene template form a process driven by the user intention; a broadcast dialog is determined at at least some of the process nodes reached by the process, and the broadcast text corresponding to the broadcast dialog is converted by the voice service unit into broadcast voice for the user.

Optionally, the types of the plurality of process nodes include any one or more of the following combinations:

the intention node is used for obtaining a user intention through the service of the intention engine when a flow arrives, and selecting a path to a next flow node according to the user intention;

the reply node is used for outputting a preset broadcast conversation when the flow arrives;

the word slot node is used for extracting key information in the user intention through the service of the intention engine when the flow arrives;

and the calling node is used for using the external resource through a preset application interface.

Optionally, the intention node and/or the word slot node are further configured to output a corresponding broadcast dialog when the queried user gives no response.

Optionally, the case in which the queried user gives no response includes any one of the following: no reply before a timeout; the reply is rejected.

Optionally, the combination formed by the connection between the intention node and the word slot node includes at least one of the following:

1) a single word slot node connected behind an intention node for extracting one or more kinds of key information from the user intention obtained by the intention node;

2) a plurality of word slot nodes connected in series behind an intention node, each used for extracting one or more kinds of key information from the user intention obtained by the intention node.

Optionally, the key information includes: named entity information.

Optionally, the types of services provided by the intent engine include one or more of:

the semantic search prediction service is used for outputting a prediction result obtained by semantic search based on template matching;

the machine learning prediction service is used for outputting semantic prediction results based on the machine learning model;

the named entity recognition prediction service is used for outputting a prediction result of the named entity recognition based on a regression or machine learning semantic model;

a knowledge graph prediction service for outputting a prediction result of semantic search based on a knowledge graph;

a reading understanding prediction service for outputting a prediction result of a semantic search based on a reading understanding of a text.

Optionally, the intention node and the word slot node in the plurality of process nodes have intention service attribute information for defining a service type of an intention engine used by the intention node.

Optionally, the dialing service unit is further configured with user information related to the dialing task; and the user related content in the broadcast text is filled by linking the user information.

Optionally, the intelligent outbound system includes: a system voice exchange service unit, which is connected with the dialing service unit, the voice service unit and the session service unit, and which can dial the user telephone by connecting to the telephone service network.

Optionally, the system voice exchange service unit is bridged with an external voice exchange device to connect to the telephone service network through the external voice exchange device.

Optionally, the voice service unit includes: a speech-to-text subunit and a text-to-speech subunit.

Optionally, the voice service unit further includes:

a media resource control subunit, used for connecting the speech-to-text subunit and the text-to-speech subunit, respectively, in a pluggable way.

Optionally, the pluggable connection includes: a remote procedure call connection.

The embodiment of the application also provides an intelligent outbound method which is applied to the intelligent outbound system; the intelligent outbound method comprises the following steps:

executing a dialing task to form a conversation between the intelligent outbound system and the dialed subscriber's telephone;

converting the user voice in the conversation into user text;

enabling a corresponding scene template according to scene information related to the dialing task, calling a service of an intention engine to analyze the user text to obtain a corresponding user intention, so that a plurality of process nodes in the scene template form a process driven by the user intention, and determining a broadcast dialog at at least some of the process nodes reached by the process;

and converting the broadcast text corresponding to the broadcast dialog into broadcast voice for the user.

Optionally, the key information includes: named entity information.

Optionally, the user-related content in the broadcast text is filled by linking with preconfigured user information.

The embodiment of the application also provides a computer system for the intelligent outbound system, which comprises a memory and a processor, wherein the memory is stored with a computer program capable of running on the processor, and the processor executes any step of the intelligent outbound method when running the computer program.

The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program executes the steps of any one of the intelligent outbound methods when running.

Compared with the prior art, the technical scheme of the embodiment of the application has the following beneficial effects:

In one aspect, the intelligent outbound system in the embodiment of the application presets scene templates that are selected and enabled according to different service requirements, and analyzes the user intention in the session so that a flow is formed among the plurality of flow nodes of the scene template and the corresponding broadcast dialogs are determined. The outbound service scene can thus be adapted flexibly to the service requirements, while an accurate session effect is achieved through refined flow nodes and intention analysis, which solves the problems in the prior art.

In another aspect, the intelligent outbound system can analyze the user intention by using one or more of traditional models, deep learning models and the like, and therefore has better scene universality and conversation accuracy.

In a further aspect, in the intelligent outbound system, the modules for voice/text interconversion can be divided into independent modules in the system architecture and connected in a pluggable way, so that the separate functional modules can serve different service requirements and achieve higher utilization efficiency.

Drawings

Fig. 1 is a schematic view of an application scenario of an intelligent outbound call system in an embodiment of the present application.

Fig. 2 is a functional module diagram of an intelligent outbound call system in the embodiment of the present application.

Fig. 3 is a functional block diagram of a voice service unit in an embodiment of the present application.

Fig. 4 is a schematic diagram of a graphical interface of scene management in the embodiment of the present application.

Fig. 5 is a schematic diagram of a scene flow structure in the embodiment of the present application.

Fig. 6 is a schematic diagram of a flow node application in the embodiment of the present application.

FIG. 7a is a schematic diagram of a graphical interface for configuring an intent node in an embodiment of the present application.

FIG. 7b is a graphical interface diagram of a configuration reply node in an embodiment of the present application.

Fig. 7c is a schematic diagram of a graphical interface for configuring a word slot node in the embodiment of the present application.

Fig. 7d is a schematic diagram of a graphical interface of a configuration calling node in the embodiment of the present application.

FIG. 8 is a schematic diagram of a combined application of an intention node and a word-slot node in an embodiment of the present application.

FIG. 9 is a diagram illustrating a combined application of an intention node and a word slot node in an embodiment of the present application.

Fig. 10 is a functional block diagram of an intelligent outbound call system in yet another embodiment of the present application.

Fig. 11 is a hardware architecture diagram of the intelligent outbound system in the embodiment of the present application.

Fig. 12 is a flowchart illustrating an intelligent outbound method according to an embodiment of the present application.

Fig. 13 is a schematic structural diagram of a computer system in the embodiment of the present application.

Detailed Description

The intelligent outbound system is used for automatically placing calls to customers and conversing with them for a business purpose.

In some examples, businesses requiring outbound calls, such as stock finance, education and training, etc., may have business requirements for telemarketing and return visits, while banks, for example, may also have collection requirements for loan business. In the past, outbound work was done by human customer service agents. In telemarketing, agents need to dial a large number of unfamiliar numbers during initial screening, but the call completion rate is extremely low, the proportion of interested customers is very small, and there is a large amount of repetitive work. In the collection business scene, the agent's work is to notify the answering party of information they need to know, and the classification of the answering party's feedback is also relatively simple: acknowledged and agreed, or refused. In notification scenes such as collection, the required number of dialogue turns is generally small, the guidance and purpose are clear, and the call content is highly repetitive.

Therefore, having the intelligent outbound system replace human customer service agents in outbound work that involves a large amount of repetition can greatly improve working efficiency and effectively reduce problems such as errors and negative emotions caused by manual operation.

Fig. 1 shows a schematic diagram of a communication system applied to an intelligent outbound call system 11 in the embodiment of the present application.

To implement the outbound function, the intelligent outbound system 11 needs to be connected to a telephone network so that it can communicate with the called user's telephone device 14. In this embodiment, the intelligent outbound system 11 is connected to a Public Switched Telephone Network (PSTN) 13 through an external Private Branch eXchange (PBX) 12, and the user's telephone device 14 is reachable through the PSTN 13. The telephone device 14 may be a fixed phone, a mobile phone, or another communication device capable of accessing the PSTN 13, such as a tablet computer, a smart watch, a smart band, or smart glasses into which a Subscriber Identity Module (SIM) card is inserted.

In some examples, the intelligent outbound system 11 may be implemented by one computer device or by a plurality of computer devices connected to each other to form a processing system. Wherein the computer devices may be, for example, servers, and the plurality of computer devices may constitute a cooperating server group.

The functions to be implemented by the intelligent outbound system 11 include automatic outbound calling and automatic conversation. To illustrate the working process of the system: a user list is obtained and, for example, a group of user telephones is called one by one. When a call is connected, a predetermined dialog is used according to the task, for example the opening common in telemarketing: "Hello, we understand that you are a valued customer of our X, and would like to recommend xxxxx to you". Furthermore, if the user answers, the answer can be subjected to semantic analysis to obtain the user's intention, and the system then continues to respond until the flow ends and the call is hung up.

Although automatic outbound systems in the prior art basically operate on this principle, placing outbound calls by machine and responding to the user in conversation, their application scenes are either generalized or specialized: a generalized automatic outbound system lacks conversation accuracy, and a specialized automatic outbound system lacks flexibility in the types of service it can handle. To solve this problem, the embodiment of the application provides a corresponding solution.

As shown in fig. 2, a functional module diagram of the intelligent outbound system in the embodiment of the present application is shown. The intelligent outbound system 20 may be used in the embodiment of fig. 1. The intelligent outbound system 20 specifically includes: a dial-up service unit 21, a voice service unit 22, and a session service unit 23.

The dialing service unit 21 is configured to perform a service-related dialing task to form a session between the intelligent outbound system 20 and a dialed telephone device of the user.

In some examples, the dialing service unit 21 may be configured with scene information related to the dialing task for indicating a corresponding scene, such as a collection scene.

In some examples, the dialing service unit 21 may further be configured with user information (e.g., the user's name, telephone number, occupation, etc.) so as to implement a user list service, that is, to determine a list of users to be dialed and place calls to the users on that list.

In some examples, the dialing service unit 21 may also be configured with a policy service, wherein the policy service may determine a time schedule for dialing the user (e.g., avoiding holidays), announcements associated with personal information of the user, and so on.

In some examples, the dialing service unit 21 may also be configured with displayed-number information, so that a corresponding displayed number, e.g., 100XX, 955XX, etc., is shown on the dialed telephone device.
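As a minimal sketch of the configuration items described above (scene information, user information, policy service, and the number shown to the dialed device), a dialing task could be represented as follows. All class names, field names, and sample values here are hypothetical illustrations and are not defined by the application.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UserInfo:
    # Hypothetical user record; the text lists name, telephone number, occupation.
    name: str
    phone: str
    occupation: str = ""


@dataclass
class DialPolicy:
    # Hypothetical policy: dates on which no call should be placed (e.g. holidays).
    blocked_dates: set = field(default_factory=set)

    def may_dial(self, day: date) -> bool:
        return day not in self.blocked_dates


@dataclass
class DialTask:
    scene_id: str                      # scene information, e.g. a scene template ID
    users: list                        # user list to be dialed one by one
    policy: DialPolicy = field(default_factory=DialPolicy)
    display_number: str = ""           # number shown to the dialed telephone device

    def users_to_dial(self, day: date) -> list:
        # Return the users to call on the given day, or nothing if the policy forbids it.
        return list(self.users) if self.policy.may_dial(day) else []


# Sample values are invented for illustration only.
task = DialTask(
    scene_id="ab003",
    users=[UserInfo("Zhang", "13800000000")],
    policy=DialPolicy(blocked_dates={date(2020, 10, 1)}),
    display_number="955XX",
)
```

Keeping the policy separate from the task is only one possible design; it lets the same user list be reused under different dialing schedules.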

In the session, the user inputs voice and the intelligent outbound system also outputs voice to the user, while inside the intelligent outbound system the voice input by the user needs to be converted into text for semantic analysis; this conversion is handled by the voice service unit 22.

The voice service unit 22 is configured to convert the user voice in the session into a user text, or convert the broadcast text into broadcast voice for the user.

In some examples, converting user speech into user text may be accomplished by an Automatic Speech Recognition (ASR) algorithm, which generally includes speech feature extraction (i.e., encoding) and parsing of the extracted feature vectors to obtain text (i.e., decoding). The speech feature extraction algorithm may be, for example, Linear Predictive Coding (LPC), Perceptual Linear Prediction (PLP) coefficients, filterbank-based feature extraction (Fbank), Linear Predictive Cepstral Coefficients (LPCC), or Mel-Frequency Cepstral Coefficients (MFCC), and decoding can be implemented with a trained neural network model, such as a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, a Feedforward Sequential Memory Network (FSMN), etc.

In some examples, converting the broadcast text into broadcast voice for the user may be implemented by a speech synthesis (Text To Speech, TTS) algorithm, such as WaveNet, a generative model of raw audio; Tacotron, an end-to-end speech synthesis model; DeepVoice, a neural text-to-speech system; VoiceLoop, which fits and synthesizes speech based on a phonological loop; and the like.

In the prior art, ASR and TTS are integrated into a single module. Once a service requires ASR or TTS, the whole module is called even though only one of ASR and TTS may actually be working, and the other is nevertheless occupied by the current service, so that subsequent services have to wait and the running efficiency is low.

To this end, as shown in fig. 3, a functional module diagram of the voice service unit in the embodiment of the present application is exemplarily shown.

In this example, the voice service unit 30 includes a speech-to-text subunit 31 and a text-to-speech subunit 32 that are separate from each other. The speech-to-text subunit 31 may be implemented by the ASR module, and the text-to-speech subunit 32 may be implemented by the TTS module. The two are independent of each other and can work in parallel: for example, the speech-to-text subunit 31 receives service requirement 1 and provides the ASR service while, in parallel, the text-to-speech subunit 32 receives service requirement 2 and provides the TTS service. This effectively improves the operating efficiency of the whole intelligent outbound system and reduces delay.
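The benefit of keeping the two subunits independent can be illustrated with a small sketch in which each subunit has its own worker pool, so that one ASR request and one TTS request are served at the same time. The functions `speech_to_text` and `text_to_speech` are placeholders invented for this illustration; they do not perform real recognition or synthesis.

```python
from concurrent.futures import ThreadPoolExecutor


def speech_to_text(audio: bytes) -> str:
    # Placeholder ASR: a real subunit would run feature extraction and decoding.
    return f"<text for {len(audio)} bytes of audio>"


def text_to_speech(text: str) -> bytes:
    # Placeholder TTS: a real subunit would synthesize a waveform.
    return text.encode("utf-8")


# Each subunit owns its worker pool, so an ASR request never queues behind a TTS request.
asr_pool = ThreadPoolExecutor(max_workers=2)
tts_pool = ThreadPoolExecutor(max_workers=2)

asr_future = asr_pool.submit(speech_to_text, b"\x00" * 320)   # service requirement 1
tts_future = tts_pool.submit(text_to_speech, "Hello")          # service requirement 2

user_text = asr_future.result()
broadcast_audio = tts_future.result()

asr_pool.shutdown()
tts_pool.shutdown()
```

With a single combined module, the second request would have to wait for the first; with separate pools, the two requests only compete for resources inside their own subunit.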

Optionally, the voice service unit 30 may further include a media resource control subunit 33, to which the speech-to-text subunit 31 and the text-to-speech subunit 32 are each connected in a pluggable manner, and which interacts with the outside to selectively call the speech-to-text subunit 31 or the text-to-speech subunit 32 according to service requirements. The media resource control subunit 33 may communicate with the outside through the Media Resource Control Protocol (MRCP), an application-layer network communication protocol used by a voice server to provide various voice services to clients.

In some examples, the pluggable connection comprises a Remote Procedure Call (RPC) connection. In a further example, the pluggable connection may use gRPC, that is, Google RPC, a high-performance open-source RPC framework released by Google that uses HTTP/2 as its transport protocol. Because gRPC supports secondary development, using it allows more attention to be paid to the service-level content and less to the underlying communication, which gRPC implements.
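The pluggable relationship between the media resource control subunit and the two conversion subunits can be sketched as a small registry. This is an in-process stand-in only: no MRCP or gRPC call is made, and the class `MediaResourceController` with its `plug`, `unplug`, and `call` methods is a hypothetical name chosen for this illustration.

```python
from typing import Callable, Dict


class MediaResourceController:
    """Hypothetical stand-in for the media resource control subunit."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable] = {}

    def plug(self, kind: str, handler: Callable) -> None:
        # Attach a subunit under a service kind such as "asr" or "tts".
        self._services[kind] = handler

    def unplug(self, kind: str) -> None:
        # Detach a subunit; the other subunits are unaffected.
        self._services.pop(kind, None)

    def call(self, kind: str, payload):
        # Route an external request to whichever subunit is currently plugged in.
        if kind not in self._services:
            raise LookupError(f"no {kind!r} subunit is plugged in")
        return self._services[kind](payload)


controller = MediaResourceController()
controller.plug("tts", lambda text: text.encode("utf-8"))     # placeholder TTS
controller.plug("asr", lambda audio: audio.decode("utf-8"))   # placeholder ASR
```

In a deployment that follows the text, each handler would presumably be a remote stub rather than a local function, so that a subunit can be replaced without touching the controller.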

The session service unit 23 is configured to accept the call of the dialing service unit 21, enable a corresponding scene template according to the scene information, and invoke the service of the intention engine 24 to analyze the user text to obtain a corresponding user intention, so that a plurality of process nodes in the scene template form a process driven by the user intention. A broadcast dialog is determined at at least some of the process nodes reached by the process, and the voice service unit 22 converts the broadcast text corresponding to the broadcast dialog into broadcast voice for the user.

In the above example, scene templates are innovatively introduced so that a corresponding scene template can be selected for each service scene. A scene template is more than a simple configuration parameter: combinations of various process nodes can be preset in it, and the service of the intention engine is called while the process nodes run to obtain the user intention, which drives the automatic formation of the process. This not only allows a scene template to be selected flexibly for different service scene requirements, but also improves session accuracy through the cooperation of the process nodes and the corresponding intention engine.

Fig. 4 is a schematic diagram illustrating a graphical interface of scene management in the embodiment of the present application.

In the graphical interface, a plurality of preconfigured scene templates are shown by way of example, including anti-fraud, collection, activation invitation, and the like. Optionally, a "+" area is provided for the user to operate (for example, by clicking) so as to create a scene template; an interface for deleting a scene template may also be provided.

In some examples, the scene information configured in the dialing service unit 21 may be, for example, identification information of a scene template, such as an ID number. The scene information can be configured, according to business requirements, in other graphical interactive interfaces that are related to the dialing service unit and to the outbound task to be executed. For example, if the business requirement is collection, a collection outbound task is generated, and the collection outbound task corresponds to the collection scene; if the ID number of the collection scene is ab003, the scene information is ab003.
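The lookup from scene information to scene template can be sketched as a simple registry keyed by ID. Only the ID "ab003" for the collection scene comes from the text; the other ID, the dictionary layout, and the function name `enable_scene_template` are assumptions made for illustration.

```python
# Hypothetical registry of preconfigured scene templates, keyed by scene ID.
SCENE_TEMPLATES = {
    "ab001": {"name": "anti-fraud", "entry_node": "greeting"},        # invented ID
    "ab003": {"name": "collection", "entry_node": "check_identity"},  # ID from the text
}


def enable_scene_template(scene_info: str) -> dict:
    """Return the scene template identified by the scene information."""
    try:
        return SCENE_TEMPLATES[scene_info]
    except KeyError:
        raise ValueError(f"unknown scene information: {scene_info!r}") from None


template = enable_scene_template("ab003")
```

Because the dialing task carries only the ID, templates can be added or edited in the management interface without changing the dialing configuration.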

The flow nodes in each scene template are introduced below. Fig. 5 is a schematic diagram showing a scene flow structure in the embodiment of the present application.

In fig. 5, each process node 51 is shown, and the process nodes 51 are associated according to a preset requirement and represented by a connection line between the process nodes 51; the direction in which the flow proceeds is shown from left to right in the figure, but this is by way of example only and not by way of limitation, and may be from top to bottom, from bottom to top, or from right to left.

In other words, a flow node 51 is connected on its left by a connecting line to its preceding flow node 51, and on its right by a connecting line to its succeeding flow node 51. If a flow node 51 has multiple branches leading to succeeding flow nodes, the branch to follow may be selected according to the user's intention.
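The structure described above, flow nodes joined by connecting lines with the branch chosen by the user's intention, can be sketched as a small graph. The node names, intention labels, and the `FlowNode` and `walk` helpers are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowNode:
    name: str
    # Maps a user intention to the name of the succeeding node (hypothetical layout).
    branches: Dict[str, str] = field(default_factory=dict)


def walk(nodes: Dict[str, FlowNode], start: str, intents: List[str]) -> List[str]:
    """Follow one branch per user intention and return the visited node names."""
    path = [start]
    current = nodes[start]
    for intent in intents:
        nxt = current.branches.get(intent)
        if nxt is None:          # no connecting line for this intention: stop here
            break
        path.append(nxt)
        current = nodes[nxt]
    return path


nodes = {
    "check_identity": FlowNode("check_identity", {"is_self": "remind", "busy": "goodbye"}),
    "remind": FlowNode("remind", {"agree": "goodbye"}),
    "goodbye": FlowNode("goodbye"),
}
```

For example, `walk(nodes, "check_identity", ["is_self", "agree"])` visits the identity check, the reminder, and the closing node in turn, while an intention with no matching branch leaves the flow where it is.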

The plurality of flow nodes 51 may be of different types, each implementing its assigned logic, so as to form a flow. In some examples, the types of the plurality of flow nodes 51 include any one or more of the following:

1) The intention node, which is used for obtaining the user intention through the service of the intention engine when the flow arrives, and selecting a path to the next flow node according to the user intention.

For example, as shown in fig. 6, in a collection scene, suppose the flow reaches the "check identity" intention node after the user has been asked "Hello, may I ask if this is Mr. xx?" and a corresponding reply, for example "I am not", has been obtained. Through the service of the intention engine, the "check identity" intention node determines that the user text "I am not" best matches the preset intention classification "not the person in question", judges the user intention to be "not the person in question", and leads to the next flow node through the corresponding connecting line.

Alternatively, the "check identity" intention node determines that the user text "I am busy now" matches the preset intention classification "busy", judges the user intention to be "busy", and leads to the next flow node through the corresponding connecting line.
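The routing performed by an intention node can be sketched as follows. The keyword overlap used here is a deliberately simple stand-in for the service of the intention engine, which the text describes only in general terms; the keyword lists, intention labels, and next-node names are all assumptions made for illustration.

```python
# Hypothetical keyword lists standing in for the intention engine's service.
INTENT_KEYWORDS = {
    "not_self": {"not", "wrong"},
    "busy": {"busy", "later"},
    "is_self": {"yes", "speaking"},
}

# Hypothetical connecting lines from the intention node to its successors.
NEXT_NODE = {"not_self": "apologize", "busy": "call_back", "is_self": "remind"}


def classify_intent(user_text: str):
    """Return the intention whose keywords overlap the text most, or None."""
    words = set(user_text.lower().replace(",", " ").replace(".", " ").split())
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def next_node(user_text: str):
    """Select the succeeding flow node from the recognized user intention."""
    return NEXT_NODE.get(classify_intent(user_text))
```

Returning `None` when nothing matches leaves room for the node to fall back to a preset dialog instead of guessing a branch.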

In some examples, the intention node may also be configured to output a corresponding broadcast dialog when the queried user gives no response. Optionally, the case in which the queried user gives no response includes any one of the following: no reply before a timeout; the reply is rejected.

For example, a "timeout dialog" may be set for the case of no reply before the timeout, and the time threshold for determining the timeout may be set according to actual requirements, for example 5 seconds. If the user is first asked "Hello, may I ask if this is Mr. Zhang?" and no response is received after 5 seconds, the "timeout dialog" may be a repeated query, "Hello, may I ask if this is Mr. Zhang?", or "It may not be convenient for you to take a call right now; I will call you again later, thank you".

A "rejection dialog" may be set for the rejected-reply case, for example, if the user does not reply, a repeated query, "May I ask if this is Mr. Zhang?", or "It may not be convenient for you to take a call right now; I will call you again later, thank you", etc.

Although in the above examples the timeout dialog and the rejection dialog repeat the query, it is understood that in practical applications both dialogs can be changed according to different requirements and are not limited to the content above.
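The no-response handling described above can be sketched as a small decision function. The 5-second threshold and the repeated query come from the text; the convention that `None` means nothing was heard and an empty string means the reply was rejected, as well as the function and key names, are assumptions of this sketch.

```python
def no_response_dialog(reply, waited_seconds: float, config: dict):
    """Return the dialog to broadcast when the user gives no usable response.

    reply is None when nothing was heard and "" when the reply was rejected;
    both conventions are assumptions made for this illustration.
    """
    if reply is None and waited_seconds >= config["timeout_seconds"]:
        return config["timeout_dialog"]
    if reply == "":
        return config["rejection_dialog"]
    return None   # a usable reply arrived, or the timeout has not yet elapsed


node_config = {
    "timeout_seconds": 5,
    "timeout_dialog": "Hello, may I ask if this is Mr. Zhang?",
    "rejection_dialog": "Sorry, could you say that again?",   # illustrative wording
}
```

Keeping both dialogs in the node configuration matches the idea that they can be changed per node according to different requirements.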

In some examples, one or more of the intention classifications that the intention node can identify, the timeout dialog, the rejection dialog, etc., may be presented through a graphical user interface for the user to set.

For example, in the graphical user interface of fig. 7a, the user can edit and set the related items, such as the node name, intention classifications, timeout dialogs, rejection dialogs, and inquiry dialogs, including their contents and the addition or deletion of entries (in the figure, "+" indicates addition and the trash can symbol indicates deletion). The node name can be user-defined, for example related to the function of the node, such as "check identity"; one or more user intentions, such as "missed", "busy", "determined", may be filled in under the intention classification item; the inquiry dialog is, for example, "Dear customer, I am calling from the bank's credit business department. May I ask if this is Mr. X himself?"; the timeout dialogs include, for example, "I could not hear you clearly, could you say that again?" and "Hello, can you hear me?"; the rejection dialog may be the same as the timeout dialog.

It should be noted that, when a new intention classification is added to the intention node, and the service of the intention engine predicts the user intention through a trained machine learning model, the newly added intention classification may be fed into the intention engine as a new classification label to train the machine learning model. For example, user text that has been correctly labeled in advance with the new classification label is input to the machine learning model as training data; such text may be extracted from user voice in the same scene, entered manually by an experienced person for the same scene, or imported from existing user-side response data in another scene. According to the difference between the prediction result and the classification label, namely the loss, the parameters of the machine learning model are automatically adjusted to reduce the loss until convergence, so that the machine learning model learns the relationship between the new classification label and the corresponding user text.
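The idea of adding a new classification label and training until the model fits the labeled user text can be illustrated with a toy example. The application does not specify the model or the loss; the sketch below swaps in a plain multiclass perceptron over bag-of-words features, which adjusts its weights on every wrong prediction until no training mistakes remain. The labels and sentences are invented for illustration.

```python
from collections import defaultdict


def features(text: str):
    # Bag-of-words features: the lower-cased tokens of the user text.
    return text.lower().split()


def predict(weights, labels, text):
    def score(label):
        return sum(weights[label][w] for w in features(text))
    return max(labels, key=score)


def train(examples, max_epochs=100):
    """Perceptron-style training: update on each wrong prediction until none remain."""
    labels = []
    for _, label in examples:
        if label not in labels:
            labels.append(label)
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(max_epochs):
        mistakes = 0
        for text, gold in examples:
            guess = predict(weights, labels, text)
            if guess != gold:
                mistakes += 1
                for w in features(text):
                    weights[gold][w] += 1.0    # pull the correct label closer
                    weights[guess][w] -= 1.0   # push the wrong label away
        if mistakes == 0:                      # converged on the training data
            break
    return weights, labels


examples = [
    ("i am not him", "not_self"),
    ("i am busy now", "busy"),
    ("yes this is me", "is_self"),
]
# A new intention classification is added together with correctly labeled user text.
examples.append(("please call me later", "call_later"))
weights, labels = train(examples)
```

A production intention engine would presumably use a far richer model and much more data; the point here is only that a new label becomes predictable once labeled text for it is included in training.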

2) The reply node, which is used for outputting the preset broadcast dialog when the flow arrives.

For example, in fig. 6, the node preceding the "check identity" intention node can be a "check identity" reply node, in which a reply dialog is preset: "Hello, may I ask if this is Mr. xx?".

The reply node may not need to recognize the user's intention; it may output the preset reply dialog simply when the flow arrives, i.e., when it is selected through the corresponding connecting line.

FIG. 7b illustrates a graphical user interface for configuring parameters in a reply node. In the graphical user interface of fig. 7b, the user can set the name of the reply node and a preset reply dialog; for example, the node name is "check identity" and the reply dialog is "Hello, may I ask if this is Mr. xx?".

3) And the word slot node is used for extracting key information in the user intention through the service of the intention engine when the flow arrives.

In some examples, the key information may include named entity information. The named entities (namespaces) are names of people, organizations, places, and other entities identified by names. The broader entities also include numbers, dates, currencies, places, and the like.

In some examples, the word slot node may also be preset with a corresponding query dialog for guiding the user to respond with the required key information. For example, where a user intention of "buying a train ticket" has already been acquired and the corresponding destination needs to be obtained, the preset query dialog may be output: "Where would you like to buy a train ticket to?". If the user responds "I want to buy a train ticket to Beijing", it can be determined that the user intention is to buy a train ticket, and the key information "Beijing" can be extracted from the response by the word slot node calling the service of the intention engine, so that the more precise user intention, namely buying a train ticket to Beijing, is obtained.

Therefore, once the user intention is obtained, named entities can be further extracted from it to drive a more detailed flow. This improves the precision of the conversation, better matches the actual needs of the user, and can yield a better user experience in practice, for example higher user satisfaction. In a telephone sales scene, it also helps to improve the precision of targeted marketing and to achieve a higher sales success rate.

FIG. 7c illustrates a graphical user interface for configuring parameters in a word slot node. In the graphical user interface of fig. 7c, the user can set, for example, a node name (such as "collection place"), a word slot (indicating the category of key information to be extracted), a query dialog, and the like.

In the word slot node, a timeout period, a timeout dialog, a rejection dialog, and the like may also be set to cope with a situation where no user response is received.
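As a rough illustration of the word slot node described above, the following Python sketch models a node that holds a query dialog and a rejection dialog and extracts one piece of key information from a user response. All names (`WordSlotNode`, `extract_city`, `KNOWN_CITIES`) are hypothetical, and the simple gazetteer lookup only stands in for the named entity recognition service of the intention engine; it is not the method of the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical gazetteer standing in for the intention engine's NER service.
KNOWN_CITIES = ("Beijing", "Shanghai", "Zhangjiakou")


def extract_city(user_text: str) -> Optional[str]:
    """Return the first entry of KNOWN_CITIES that appears in the text, or None."""
    lowered = user_text.lower()
    for city in KNOWN_CITIES:
        if city.lower() in lowered:
            return city
    return None


@dataclass
class WordSlotNode:
    name: str                      # node name shown in the configuration interface
    slot: str                      # category of key information to extract
    query_dialog: str              # question used to guide the user
    rejection_dialog: str          # played when nothing usable is extracted
    extractor: Callable[[str], Optional[str]]

    def handle(self, user_text: str) -> dict:
        """Try to fill the slot from one user response."""
        value = self.extractor(user_text)
        if value is None:
            return {"filled": False, "say": self.rejection_dialog}
        return {"filled": True, "slot": self.slot, "value": value}


node = WordSlotNode(
    name="destination",
    slot="destination",
    query_dialog="Where would you like to buy a train ticket to?",
    rejection_dialog="Sorry, I did not catch the destination.",
    extractor=extract_city,
)
result = node.handle("I want to buy a train ticket to Beijing")
```

A timeout period and timeout dialog could be added as further fields in the same way; they are omitted here to keep the sketch short.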

The configuration interface also needs to reference user information. For example, "Mr. Zhang", or the phrases "your account is xxx" and "your tail number is yyy" that are common in outbound calls, all involve user information. In some examples, the dialing service unit is further configured with user information related to the dialing task, and the user-related content in the broadcast text corresponding to the broadcast dialog used by a node is populated by linking to the user information; e.g., "Mr. Zhang" is filled in through a user-name function call such as "{user name}".
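The population of user-related content can be pictured with the short Python sketch below. It assumes, purely for illustration, that placeholders are written in curly braces inside the broadcast text and that the user information is available as a dictionary; the function name `fill_broadcast_text` and the key names are hypothetical.

```python
import re


def fill_broadcast_text(template: str, user_info: dict) -> str:
    """Replace each {key} placeholder with the matching user information.

    Placeholders with no matching key are left untouched so that a missing
    field is visible instead of being silently dropped.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1).strip()
        return str(user_info.get(key, match.group(0)))

    return re.sub(r"\{([^{}]+)\}", substitute, template)


text = fill_broadcast_text(
    "Hello, is this {user name}? Your account is {account}.",
    {"user name": "Mr. Zhang", "account": "xxx"},
)
```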

4) The calling node is used for using external resources through a preset application interface.

In some examples, an Application Programming Interface (API) is a set of predefined functions, or a convention governing how different components of a software system interface with one another. Optionally, the preset application interface may be a Representational State Transfer (REST) application interface; a RESTful API can provide one unified set of interfaces serving Web, iOS, Android, and other clients. Many platforms, such as a microblog open platform or a WeChat public platform, do not need an explicit front end and only need a set of interfaces that provide services, for which a RESTful API is well suited.

Fig. 7d is a schematic diagram of a graphical user interface for configuring a calling node in the embodiment of the present application. The configurable items of the calling node include the node name, the calling address, the request mode, input parameter settings (values required by the called function), output parameter settings (values required by the calling function), and the like.
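The configurable items listed above can be sketched as a small data structure from which an HTTP request is built. This is only an assumption about how such a configuration might be used: `CallNodeConfig`, `build_request`, `pick_outputs`, and the address `https://example.invalid/trains` are hypothetical, and the request is constructed with Python's standard `urllib.request.Request` but not actually sent.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass, field


@dataclass
class CallNodeConfig:
    name: str
    call_address: str                                   # URL of the external interface
    request_method: str = "GET"                         # request mode
    input_params: dict = field(default_factory=dict)    # values the called function needs
    output_params: list = field(default_factory=list)   # fields the flow needs back


def build_request(cfg: CallNodeConfig) -> urllib.request.Request:
    """Build (but do not send) the HTTP request described by the node config."""
    method = cfg.request_method.upper()
    if method == "GET":
        query = urllib.parse.urlencode(cfg.input_params)
        url = f"{cfg.call_address}?{query}" if query else cfg.call_address
        return urllib.request.Request(url, method="GET")
    body = json.dumps(cfg.input_params).encode("utf-8")
    return urllib.request.Request(
        cfg.call_address,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def pick_outputs(cfg: CallNodeConfig, response_json: dict) -> dict:
    """Keep only the configured output parameters from a decoded response."""
    return {key: response_json.get(key) for key in cfg.output_params}


cfg = CallNodeConfig(
    name="query trains",
    call_address="https://example.invalid/trains",
    input_params={"from": "Shanghai", "to": "Beijing"},
    output_params=["trains"],
)
req = build_request(cfg)
```

Sending the request (for example with `urllib.request.urlopen`) and handling errors or timeouts are left out, since they depend on the external resource.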

It should be noted that, in order to indicate the start and the end of the process, the type of the process node may further include a start node and an end node, which are located at the head and the tail of the process, respectively.

As can be seen from the above example, different types of nodes may be combined with each other to implement different logic. The intention node and the word slot node are driven by the user intention to select the flow direction, and multi-step communication with the user in the conversation process can be achieved through the logical combination of the intention node and the word slot node.

The combinations formed by connecting intention nodes and word slot nodes include at least one of the following:

1) A single word slot node connected after an intention node, used for extracting one or more kinds of key information from the user intention obtained by the intention node.

For example, as shown in fig. 8, assume that a word slot node B is connected after an intention node A. If the intention node A determines that the user intention is a door-to-door service, the flow enters the word slot node B, which may obtain the relevant time, so that the intention of a door-to-door service at time xx is obtained.

For example, the word slot node B outputs the query dialog "On which day would you like the door-to-door delivery?", and if the user answers "8/20/2020", it is clear that the user needs the door-to-door delivery on 8/20/2020. Alternatively, the word slot node B may obtain time information and location information together, for example by asking "On which day would you like the door-to-door delivery, and is the address xxxx?".

For example, FIG. 9 shows an intention node C followed by a word slot node D and a word slot node E connected in series. In practice, the user intention may involve several kinds of key information, such as location and time, which may be obtained step by step through a plurality of sequential word slot nodes. For example, if the user intention obtained is "buy a train ticket" and the corresponding departure place and destination need to be obtained, a preset query dialog may be output: "From where to where would you like to buy a train ticket?". If the user responds "I want to buy a train ticket from Shanghai to Beijing", it can be determined that the user intention is to buy a train ticket, and the key information of the departure place and destination can be extracted from the response by the word slot node D calling the service of the intention engine, so that the more precise user intention, namely buying a train ticket from Shanghai to Beijing, is obtained. The word slot node E then asks the user on which day the ticket is wanted; if the key information "tomorrow" is extracted from the user response, the user wants to buy a train ticket from Shanghai to Beijing for tomorrow. In subsequent flow nodes, more detailed content, such as the time of day, may continue to be queried through word slot nodes; external data, such as train management data, may be called through the calling node to obtain tomorrow's train services from Shanghai to Beijing, which are then reported to the user.

It is understood that, in actual operation, the intention node and the word slot node may encounter situations in which no intention or word slot is hit. For example, when the query is "On which day would you like the door-to-door service?" and the answer obtained, such as "Saturday", cannot be matched, the user may be asked again until a recognizable answer is obtained; alternatively, manual service may be introduced as the case may be, and intent classifications or word slots may optionally be added accordingly.
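The step-by-step collection of key information through sequential word slot nodes, together with re-asking when a response cannot be matched, can be sketched as follows. The sketch makes several assumptions that are not part of the application: the extractors are simple regular-expression and keyword stand-ins for the intention engine, the retry limit `max_retries` is invented, and the outcome string `"transfer_to_manual"` is a hypothetical marker for handing over to manual service.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

Extractor = Callable[[str], Optional[dict]]


def extract_route(text: str) -> Optional[dict]:
    """Hypothetical stand-in for slot node D: departure place and destination."""
    match = re.search(r"from (\w+) to (\w+)", text, re.IGNORECASE)
    if match is None:
        return None
    return {"origin": match.group(1), "destination": match.group(2)}


def extract_day(text: str) -> Optional[dict]:
    """Hypothetical stand-in for slot node E: travel day."""
    lowered = text.lower()
    for day in ("today", "tomorrow"):
        if day in lowered:
            return {"day": day}
    return None


def run_slot_chain(
    slot_nodes: List[Tuple[str, Extractor]],
    replies: Iterable[str],
    max_retries: int = 2,
) -> Tuple[dict, str]:
    """Visit the slot nodes in order, re-asking up to max_retries times each."""
    collected: dict = {}
    reply_iter = iter(replies)
    for _query_dialog, extractor in slot_nodes:
        for _attempt in range(max_retries + 1):
            reply = next(reply_iter, None)
            if reply is None:            # the user stopped answering
                return collected, "transfer_to_manual"
            found = extractor(reply)
            if found:
                collected.update(found)
                break
        else:                            # every attempt failed to hit the slot
            return collected, "transfer_to_manual"
    return collected, "done"


chain = [
    ("From where to where would you like to buy a train ticket?", extract_route),
    ("On which day would you like to travel?", extract_day),
]
slots, outcome = run_slot_chain(
    chain,
    ["I want to buy a train ticket from Shanghai to Beijing", "er", "tomorrow"],
)
```

In this run the second node fails once on "er" and succeeds on the repeated question, which mirrors the re-asking behaviour described above.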

In a specific implementation, the reply node may be connected before the intention node; for example, the reply node "core" first outputs the preset broadcast dialog "Hello, is this Mr. xx?" to the user. The reply node may also be connected after the intention node; for example, after the user intention is determined to be "not needed", a preset reply dialog such as "OK, goodbye!" is output.

The principles of the intent engine are illustrated below. In particular implementations, the intent engine may be implemented by one or more intent prediction models, and the types of services provided by the intent engine include one or more of:

1) The semantic search prediction service is used for outputting a prediction result obtained by semantic search based on template matching.

In some examples, the semantic search based on template matching may be used where the user corpus available for training machine learning models is small. For example, once the intention classifications have been preset, a word bank may be set for each intention classification and populated with keywords related to that classification, so that the user text can be matched against the word banks; the user intention may then be predicted from statistics such as word frequency, optionally combined with weights assigned to the matched keywords.
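A minimal sketch of such word-bank matching is given below. The intent names, keywords, and weights in `INTENT_LEXICON` are invented for illustration, and the scoring rule (sum of keyword frequency times keyword weight) is just one simple way to combine word frequency with weights.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical word banks: one per preset intention classification,
# mapping each keyword to an illustrative weight.
INTENT_LEXICON: Dict[str, Dict[str, float]] = {
    "buy_train_ticket": {"train": 1.0, "ticket": 1.0, "buy": 0.5},
    "not_needed": {"no": 1.0, "not": 1.0, "busy": 0.5},
}


def predict_intent(
    user_text: str, lexicon: Dict[str, Dict[str, float]] = INTENT_LEXICON
) -> Optional[str]:
    """Score each intent by weighted keyword frequency; None if nothing matches."""
    tokens = Counter(re.findall(r"[a-z']+", user_text.lower()))
    scores = {
        intent: sum(tokens[word] * weight for word, weight in words.items())
        for intent, words in lexicon.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Because no training is involved, new intention classifications can be supported by adding a word bank, which is why this kind of service is attractive when little corpus is available.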

2) The machine learning prediction service is used for outputting a semantic prediction result based on a machine learning model.

In some examples, the machine learning semantic model may be a Natural Language Processing (NLP) based semantic model. Examples include traditional machine learning models (e.g., Bayesian models, naive Bayes, support vector machines) and deep learning models such as Recurrent Neural Networks (RNN), Long Short-Term Memory models (LSTM), bidirectional Long Short-Term Memory models (BiLSTM), and Convolutional Neural Networks (CNN). Optionally, in NLP semantic recognition, a word representation model or the like may be used to convert the text into feature vectors in advance before semantic recognition is performed; for example, using context-dependent word vectors benefits processing efficiency and reduces data dimensionality. The simplest one-hot word vector representation suffers from the curse of dimensionality and the semantic gap. Later pre-trained models improve on it: static word vector representation models such as Word2vec and FastText greatly improve efficiency, and dynamic word vector representation models such as ELMo, GPT, and BERT further address polysemy effectively. In addition, an attention algorithm, such as a self-attention algorithm, may optionally be combined to give higher weight to the features of key information in the text; the word feature vectors may be further encoded by the attention algorithm to obtain vectors that reflect the contextual semantics more accurately, making the semantic prediction result more accurate.

When the machine learning model is trained, a user text corpus with ground-truth labels is input into the model, the output prediction result is compared with the ground-truth label to calculate the loss, and the parameters of the machine learning model are adjusted according to the loss; training is complete when the output of the machine learning model becomes stable, and the trained semantic prediction model is obtained.
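The training loop described above (predict, compute the loss against the ground-truth label, adjust the parameters, stop when the result is stable) can be illustrated with a deliberately tiny model. The sketch below swaps in a binary logistic regression over bag-of-words counts, which is far simpler than the models named in the text; the vocabulary, the four labeled samples, the learning rate, and the stopping tolerance are all invented for illustration.

```python
import math
from typing import List, Tuple


def featurize(text: str, vocab: List[str]) -> List[float]:
    """Bag-of-words counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(word)) for word in vocab]


def predict(text: str, vocab: List[str], weights: List[float], bias: float) -> float:
    """Probability that the text belongs to the positive intent class."""
    x = featurize(text, vocab)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[str, int]], vocab: List[str],
          lr: float = 0.5, tol: float = 1e-4, max_epochs: int = 1000):
    """Gradient descent on the mean cross-entropy loss until it stops changing."""
    weights, bias = [0.0] * len(vocab), 0.0
    prev_loss, loss = float("inf"), float("inf")
    n = len(samples)
    for _ in range(max_epochs):
        loss, grad_b = 0.0, 0.0
        grad_w = [0.0] * len(vocab)
        for text, label in samples:
            p = predict(text, vocab, weights, bias)
            loss -= label * math.log(p) + (1 - label) * math.log(1 - p)
            for i, xi in enumerate(featurize(text, vocab)):
                grad_w[i] += (p - label) * xi
            grad_b += p - label
        loss /= n
        # Adjust the parameters in the direction that reduces the loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        if abs(prev_loss - loss) < tol:   # output has become stable
            break
        prev_loss = loss
    return weights, bias, loss


VOCAB = ["not", "no", "need", "needed", "yes", "buy"]
SAMPLES = [                      # label 1 = "not needed", label 0 = other
    ("not needed thanks", 1),
    ("no need", 1),
    ("yes this is me", 0),
    ("i want to buy a ticket", 0),
]
weights, bias, final_loss = train(SAMPLES, VOCAB)
```

Adding a new intent classification would, in this simplified picture, mean adding labeled samples for the new label and repeating the same loop.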

3) The named entity recognition prediction service is used for outputting a prediction result of named entity recognition based on a regression or machine learning semantic model.

Named Entity Recognition (NER) is a fundamental task in NLP technology and an important basic tool for many NLP tasks such as information extraction, question answering systems, syntactic analysis, and machine translation. The task of named entity recognition is to identify, in the text to be processed, three major classes (entity, time, and number) and seven minor classes (person name, organization name, place name, time, date, currency, and percentage).

The NER models include, for example, Hidden Markov Models (HMM), Conditional Random Field models (CRF), their variants, or their combinations with other models, such as a combined LSTM and CRF model.

Since named entity recognition is a sequence tagging task, the training data can be obtained by annotating the corpus with sequence tags, for example in the BIO scheme, in which each element of the corpus is tagged as "B-X", "I-X", or "O". Here "B-X" indicates that the segment containing the element is of type X and the element is at the beginning of the segment, "I-X" indicates that the segment containing the element is of type X and the element is inside the segment (not at its beginning), and "O" indicates that the element does not belong to any type. The obtained training data is then input into the NER model for training, and the trained NER model is obtained.
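To make the BIO tagging convention concrete, the following sketch decodes a tagged token sequence back into typed segments. The function name `decode_bio`, the tag types `LOC` and `TIME`, and the example sentence are illustrative assumptions; a stray "I-X" tag that does not continue a segment of the same type is simply treated as outside any segment.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn parallel token/tag lists into (type, text) segments."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new segment begins; close any segment that was open.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open segment of the same type.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag with no matching open segment.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["from", "Shanghai", "to", "Hong", "Kong", "tomorrow"]
tags = ["O", "B-LOC", "O", "B-LOC", "I-LOC", "B-TIME"]
entities = decode_bio(tokens, tags)
```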

4) The knowledge graph prediction service is used for outputting a prediction result of semantic search based on a knowledge graph.

In some examples, a knowledge graph is a way of expressing knowledge and is often expressed as triples. A triple of a knowledge graph takes the form <subject, predicate, object>: the value of the subject is generally an entity, a fact, or a concept; the value of the predicate is usually a relationship or an attribute; and the value of the object can be an entity, an event, a concept, or an ordinary value.
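The triple representation can be illustrated with a few lines of Python. The triples below (train identifiers `T1` and `T2` and the predicates `departs_from` and `arrives_at`) are made-up sample data, and the pattern-matching helper `find` is only one simple way to search such a graph.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Made-up sample graph for illustration only.
KG = [
    Triple("T1", "departs_from", "Shanghai"),
    Triple("T1", "arrives_at", "Beijing"),
    Triple("T2", "departs_from", "Shanghai"),
    Triple("T2", "arrives_at", "Hangzhou"),
]


def find(triples: List[Triple], subject: Optional[str] = None,
         predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def trains_between(triples: List[Triple], origin: str, destination: str) -> List[str]:
    """Subjects that both depart from origin and arrive at destination."""
    departing = {t.subject for t in find(triples, predicate="departs_from", obj=origin)}
    arriving = {t.subject for t in find(triples, predicate="arrives_at", obj=destination)}
    return sorted(departing & arriving)
```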

Generating a knowledge graph from the data and performing semantic search prediction on the knowledge graph can be considerably more efficient than processing the data directly, extracting features, and then performing the prediction calculation. The knowledge graph need not be generated within the intelligent outbound system; it may exist externally and be imported into the system. In particular, an existing knowledge graph closely related to the customer service or outbound tasks of the relevant business is more conducive to improving the operating efficiency of the intelligent outbound system.

In the knowledge graph prediction service, NLP-based semantic recognition models similar to those of the machine learning prediction service above, such as the deep learning models LSTM and BiLSTM, can also be adopted; corpora taken from the knowledge graph and their corresponding ground-truth classification labels are used as training data and input into the corresponding model to train it, so as to obtain a trained semantic prediction model based on the knowledge graph.

5) The reading comprehension prediction service is used for outputting a prediction result of semantic search based on reading comprehension of text.

In some embodiments, similar to the knowledge graph prediction, reading comprehension prediction can be performed using existing documents, such as working data related to customer service calls, Standard Operating Procedure (SOP) documents, and other related data documents. In practical implementation, the reading comprehension model may adopt the machine learning models of the foregoing embodiment, or machine learning models better suited to reading comprehension, such as a Deep LSTM or an Attentive Reader model; corpora from the documents and their ground-truth classification labels are used as training data and input into the corresponding model to train it, so as to obtain a trained semantic prediction model based on reading comprehension.

Among the intention prediction services 1) to 5) described above, machine learning models are used in 2) to 5). In a possible implementation, the intention node may use services 1), 2), 4), and 5) to acquire the user intention, and the word slot node may use service 3), the named entity recognition prediction service, to extract key information. Moreover, after a new intention classification label is added to the intention node, the label may be transmitted to the intention engine for training the machine learning models of 2), 4), and 5); optionally, if a new word slot is added to the word slot node, the new word slot may also be transmitted to the intention engine for training the model of 3).

Optionally, the intention node and the word slot node among the plurality of process nodes each have intention service attribute information defining the service types of the intention engine used by that node. For example, the intention node has intention service attribute information A, which defines that it uses any one or more of services 1), 2), 4), and 5) above; the word slot node has intention service attribute information B, which defines that it uses service 3) above.

The session service unit can monitor the process progress among the process nodes, and can call the corresponding semantic prediction service of the intention engine to obtain the user intention or the key information in the user intention according to the intention service attribute information of the intention node or the word slot node reached by the process.

In some examples, the intention service attribute information of the intention node or the word slot node may define that multiple services may be used, such as 1), 2), 4), and 5). In that case, different priority weights are preferably given to the services according to the characteristics of the actual business scene; for example, taking prediction accuracy and efficiency into account, the priority from high to low is 4) > 5) > 2) > 1). Alternatively, where the corpus is scarce or the machine learning models have not yet been trained, 1) may be given the highest priority.
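One simple way to realize such a priority order is to try the services in sequence and accept the first one that returns a result, as in the sketch below. This fall-through strategy, the function name `predict_with_priority`, and the stub services are assumptions made for illustration; the application itself speaks only of priority weights and does not prescribe how they are applied.

```python
from typing import Callable, Dict, List, Optional, Tuple

Service = Callable[[str], Optional[str]]


def predict_with_priority(
    user_text: str, services: Dict[int, Service], priority: List[int]
) -> Tuple[Optional[int], Optional[str]]:
    """Try services in priority order; return (service id, intent) of the first hit."""
    for service_id in priority:
        service = services.get(service_id)
        if service is None:          # service not configured for this node
            continue
        intent = service(user_text)
        if intent is not None:
            return service_id, intent
    return None, None


# Stub services keyed by the numbering used above; service 5) is deliberately
# not configured, and 2) and 4) behave like untrained models that return nothing.
stub_services: Dict[int, Service] = {
    1: lambda text: "not_needed" if "no" in text.lower().split() else None,
    2: lambda text: None,
    4: lambda text: None,
}

default_choice = predict_with_priority("No thanks", stub_services, [4, 5, 2, 1])
scarce_corpus_choice = predict_with_priority("No thanks", stub_services, [1, 4, 5, 2])
```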

Referring to fig. 10, a functional module diagram of an intelligent outbound call system in another embodiment of the present application is shown.

Existing automatic outbound systems are directly connected to an external voice exchange (PBX), and therefore if the application is different, a different PBX needs to be developed.

To this end, in the example of the present application, the intelligent outbound system 100 further includes a system voice exchange service unit 101, which is connected with the dialing service unit 102, the voice service unit 103, and the session service unit 104, and can place calls to users by connecting to a telephone service network.

It should be noted that the system voice exchange service unit 101 serves as a PBX inside the system and can be flexibly configured to adapt to the functional units of different applications, which avoids the problem of repeatedly developing the external PBX 106.

The dialing service unit 102 receives the outbound task and, when executing it, sends a corresponding instruction to the system voice exchange service unit 101, so as to access the PSTN 107 through the external PBX 106 and dial the target user to form a session. The dialing service unit 102 calls the session service unit 104 through the system voice exchange service unit 101 and provides the scene information related to the task as well as the user information. The session service unit 104 enables the corresponding scene template according to the scene information and starts executing the flow; for example, the reply node after the start node outputs a guiding dialog such as "Hello, is this Mr. xx?". The system voice exchange service unit 101 interacts with the media resource control subunit 1031 of the voice service unit 103 (for example, via MRCP), the voice input from the user's telephone is converted into user text by the voice-to-text subunit 1032 of the voice service unit 103, and the user text is transmitted to the session service unit 104 through the system voice exchange service unit 101; the session service unit 104 invokes the intention engine 105 to analyze the user text for the user intention and continue the flow. When the flow node reached in the scene template needs to broadcast voice to the user, the broadcast text corresponding to the broadcast dialog is transmitted to the media resource control subunit 1031 through the system voice exchange service unit 101, the text-to-voice subunit 1033 is called to convert the broadcast text into broadcast voice, and the broadcast voice is output to the external PBX 106 through the system voice exchange service unit 101 and played to the user through the PSTN 107 and the user's telephone device 108.

Optionally, the system voice exchange service unit 101, the dialing service unit 102, and the session service unit 104 may exchange data based on an Internet communication protocol (e.g., the Hypertext Transfer Protocol, HTTP).

The system voice exchange service unit 101 is bridged with an external voice exchange device so as to connect to the telephone service network through the external voice exchange device.

Fig. 11 is a schematic diagram showing a hardware architecture of the intelligent outbound system in the embodiment of the present application.

The hardware architecture of the intelligent outbound call system comprises: an application server group 111, a background server group 112, and a data server group 113.

In some examples, the application servers in the application server group 111 are used for deploying the entire intelligent outbound system and allow the user management terminal 110 of an administrator or an ordinary user to access and manage the intelligent outbound system; optionally, to isolate the intelligent outbound system securely, a firewall may be disposed between the user management terminal 110 and the application server group 111. The application servers in the application server group 111 can distribute tasks among themselves according to a load balancing rule.

In some examples, the background server group 112 may be used to implement the functional units of the software system, such as the dialing service unit, the voice service unit, and the session service unit of the intelligent outbound system in fig. 10. The background servers in the background server group 112 may form a cluster and either cooperate on the same outbound task or execute different outbound tasks respectively.

The background server group 112 accesses the PSTN 115 through an external PBX 114 and calls the user's telephone device 116. In practical applications, a firewall may be disposed between PBX 114 and PSTN 115.

In some examples, the data server group 113 mainly runs databases, such as relational databases (e.g., PostgreSQL) and distributed databases (e.g., MongoDB), to provide access to outbound-call-related data. The data server group 113 may include at least two data servers forming a dual-server hot standby: one data server serves as the master and the other as the slave, and the two are connected through a network. Under normal conditions the master is in the working state and the slave is in the monitoring state; once the slave finds that the master is abnormal, it takes over from the master within a short time and fully assumes the master's functions, which helps to ensure data security and the stable operation of the entire intelligent outbound system.

It should be noted that the hardware architecture in fig. 11 is only an implementation example and can be changed according to actual requirements. If the data volume of the intelligent outbound system is small, each server group may be replaced by a single server, and the entire intelligent outbound system may even be implemented on one server.

Fig. 12 shows the intelligent outbound method in the embodiment of the present application, which is applied to the intelligent outbound system in the foregoing embodiment; that is, the intelligent outbound system can serve as the execution subject. For details of the intelligent outbound method in this embodiment, reference may be made to the above embodiment of the intelligent outbound system, and repeated technical details are omitted here.

The intelligent outbound method specifically comprises the following steps:

step S121: executing a dialing task to form a conversation between the intelligent outbound system and the dialed subscriber's telephone;

step S122: converting the user voice in the conversation into user text;

step S123: enabling a corresponding scene template according to scene information related to the dialing task, calling a service of an intention engine to analyze the user text to obtain a corresponding user intention, enabling a plurality of process nodes in the scene template to form a process driven by the user intention, and determining a broadcast dialog at least part of process nodes reached by the process;

step S124: converting the broadcast text corresponding to the broadcast dialog into broadcast voice for the user.
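Steps S122 to S124 can be pictured for a single conversational turn with the sketch below; step S121 (dialing) is outside its scope. It is a heavy simplification under stated assumptions: the scene template is reduced to a dictionary from user intention to broadcast dialog, and the speech-to-text, intention-analysis, and text-to-speech functions are trivial stubs with hypothetical names rather than real services.

```python
from typing import Callable, Dict


def handle_turn(
    user_voice: bytes,
    scene_info: str,
    scene_templates: Dict[str, Dict[str, str]],
    speech_to_text: Callable[[bytes], str],
    analyze_intent: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Process one user utterance and return the broadcast voice."""
    user_text = speech_to_text(user_voice)                  # step S122
    template = scene_templates[scene_info]                  # step S123: enable scene template
    intent = analyze_intent(user_text)                      # step S123: call the intention engine
    broadcast_text = template.get(intent, template["fallback"])
    return text_to_speech(broadcast_text)                   # step S124


# Illustrative stubs: "voice" is just UTF-8 text here.
templates = {
    "ticket_sales": {
        "buy_train_ticket": "Where would you like to go?",
        "not_needed": "OK, goodbye!",
        "fallback": "Sorry, could you repeat that?",
    }
}
fake_stt = lambda voice: voice.decode("utf-8")
fake_tts = lambda text: text.encode("utf-8")
fake_intent = lambda text: "not_needed" if "no" in text.lower().split() else "unknown"

reply = handle_turn(b"no thanks", "ticket_sales", templates, fake_stt, fake_intent, fake_tts)
```

In the system described above, the choice of the next broadcast dialog is made by the flow of process nodes rather than by a flat lookup; the dictionary is used here only to keep the example short.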

The word slot node is used for extracting key information in the user intention through the service of the intention engine when the flow arrives.

Optionally, the key information includes: named entity information.

The computer system may be implemented by a server or server group; where the hardware requirements are not high, it may also be implemented by a processing system formed by communicatively connecting processing devices with lower processing capability, such as desktop computers, notebook computers, mobile phones, and tablet computers.

The computer system may be loaded with, for example, the intelligent outbound system in fig. 2 and fig. 10; the computer system may also be implemented as the background server or background server group in fig. 11.

The computer system 130 comprises a memory 131 and a processor 132. The memory 131 stores a computer program operable on the processor 132, and the processor 132 executes the computer program to perform the steps of the intelligent outbound method, such as that in the embodiment of fig. 12.

In some examples, the processor 132 may be a combination that implements a computing function, such as a combination comprising one or more microprocessors, digital signal processors (DSPs), ASICs, or the like; the memory 131 may comprise high-speed RAM and may further include non-volatile memory, such as at least one disk memory.

When the computer system 130 needs to communicate with the outside, it may further include a communicator 133, which may include, for example, one or more of a wired network card, a wireless network card, and a 2G/3G/4G/5G module, so as to exchange information with the outside, such as the Internet.

Embodiments of the present application may also provide a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps of the intelligent outbound method, for example that in fig. 12.

That is, the intelligent outbound method in the above-described embodiments may be implemented as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the intelligent outbound method described herein. Further, when a general-purpose computer accesses code for implementing the intelligent outbound method illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods in the foregoing embodiments.

In addition, in the intelligent outbound system of the embodiment of the application, one or more types of traditional models, deep learning models, and the like can be used for analyzing the user intention, so that the system has good generality across scenes while maintaining conversation accuracy.

Furthermore, in the intelligent outbound system of the embodiment of the application, the modules for voice/text interconversion can be separated into independent modules in the system architecture and connected in a pluggable manner, so that the separated functional modules can meet different business requirements and achieve higher use efficiency.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs. The procedures or functions according to the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer program may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium.

For example, the functional units in the embodiments of fig. 2, 3, 10, etc. may be implemented in software; or in a combination of hardware and software, for example as a computer program stored in the memory and run by the processor of the computer system embodiment; or by a hardware circuit.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated components can be realized in the form of hardware or in the form of software functional units. The integrated components described above, if implemented in the form of software functional units and sold or used as separate products, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.

For example, in the embodiments of fig. 2, 3, and 10, each functional unit may be implemented by a single independent program, or may be implemented by different program segments in a program, and in some implementation scenario templates, the functional units may be located in one physical device, or may be located in different physical devices but communicatively coupled to each other.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing units, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

The logic and/or steps represented in the flowcharts or otherwise described herein may be considered, for example, an ordered listing of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.

For example, in the intelligent outbound method of the embodiment of fig. 12 and the like, the order of the steps may be changed in a specific scenario and is not limited to the above description.

Although the embodiments of the present invention are disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the embodiments of the invention as defined in the appended claims.

Claims

1. An intelligent outbound system, comprising:

and a session service unit, configured to receive a call from the dialing service unit, start a corresponding scene template according to the scene information, and call an intention engine to analyze the user text to obtain a corresponding user intention, so that a plurality of flow nodes in the scene template form a flow driven by the user intention; a broadcast utterance is determined at at least some of the flow nodes reached by the flow, and the broadcast text corresponding to the broadcast utterance is converted by the voice service unit into broadcast voice for the user.

2. The intelligent outbound system of claim 1 wherein the types of said plurality of flow nodes comprise any one or more of the following in combination:

3. The intelligent outbound system of claim 2 wherein the intent nodes and/or word slot nodes are further configured to output a corresponding broadcast utterance if the user does not respond to a query.

4. The intelligent outbound system of claim 3 wherein the user not responding to the query comprises any of: no reply before a timeout; a refusal to reply.

5. The intelligent outbound system of claim 2 wherein the connection combinations between the intent node and the word slot node include at least one of:

6. The intelligent outbound system of claim 2 or 5 wherein the critical information comprises: named entity information.

7. The intelligent outbound system of claim 1 or 2 wherein the types of services provided by the intent engine include one or more of:

8. The intelligent outbound system of claim 7 wherein the intent nodes and word slot nodes among the plurality of flow nodes have intent service attribute information defining the service type of the intent engine that they use.

9. The intelligent outbound system of claim 1 wherein said dialing service unit is further configured with user information related to said dialing task, and the user-related content in the broadcast text is filled in by linking to the user information.

10. The intelligent outbound system of claim 1 further comprising: a system voice exchange service unit, which is connected to the dialing service unit, the voice service unit and the session service unit, and which can dial the user's telephone by connecting to a telephone service network.

11. The intelligent outbound system of claim 10 wherein said system voice exchange service unit is bridged with an external voice switching device so as to connect to said telephone service network through said external voice switching device.

12. The intelligent outbound system of claim 1 wherein said voice service unit comprises: a speech-to-text subunit and a text-to-speech subunit.

13. The intelligent outbound system of claim 12 wherein said voice service unit further comprises:

14. The intelligent outbound system of claim 13 wherein said pluggable connection comprises: a remote procedure call connection.

15. An intelligent outbound method, applied to an intelligent outbound system, the intelligent outbound method comprising the following steps:

converting the user voice in the conversation into user text;

16. The intelligent outbound method of claim 15 wherein the types of said plurality of flow nodes comprise any one or more of the following in combination:

17. The intelligent outbound method of claim 16 wherein the intent nodes and/or word slot nodes are further configured to output a corresponding broadcast utterance if the user does not respond to a query.

18. The intelligent outbound method of claim 17 wherein the user not responding to the query comprises any of: no reply before a timeout; a refusal to reply.

19. The intelligent outbound method of claim 16 wherein the connection combinations between the intent node and the word slot node include at least one of:

20. The intelligent outbound method of claim 16 or 19 wherein the critical information comprises: named entity information.

21. The intelligent outbound method of claim 15 or 16, wherein the types of services provided by the intent engine include one or more of the following:

22. The intelligent outbound method of claim 21 wherein the intent nodes and word slot nodes among the plurality of flow nodes have intent service attribute information defining the service type of the intent engine that they use.

23. The intelligent outbound method of claim 15, wherein the user-related content in the broadcast text is filled in by linking to preconfigured user information.

24. A computer system for an intelligent outbound system comprising a memory and a processor, said memory having stored thereon a computer program operable on said processor, wherein said processor, when executing said computer program, performs the steps of the intelligent outbound method of any of claims 15 to 23.

25. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when run, performs the steps of the intelligent outbound method according to any of claims 15 to 23.
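
By way of non-limiting illustration, the intent-driven flow recited in claims 1, 3, 4 and 9 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `FlowNode`, `SceneTemplate`, `keyword_intent_engine` and `run_session` are hypothetical, the keyword matcher merely stands in for the intention engine (whose implementation is not specified here), and a reply of `None` is assumed to model a timeout without reply. Speech conversion and dialing are omitted; the sketch only shows how flow nodes in a scene template could be traversed according to the user intention while user information fills the broadcast text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


def keyword_intent_engine(user_text: Optional[str]) -> str:
    """Hypothetical stand-in for the intention engine (assumption, not from the source)."""
    if user_text is None:          # assumed model of "timeout, no reply"
        return "no_response"
    text = user_text.lower()
    if "yes" in text:
        return "affirm"
    if "no" in text:
        return "deny"
    return "unknown"


@dataclass
class FlowNode:
    name: str
    broadcast_text: str                                        # may hold {placeholders} for user information
    transitions: Dict[str, str] = field(default_factory=dict)  # user intention -> next node name
    no_response_text: Optional[str] = None                     # utterance output when the user does not respond


@dataclass
class SceneTemplate:
    scene: str
    start: str
    nodes: Dict[str, FlowNode]


def run_session(
    template: SceneTemplate,
    user_info: Dict[str, str],
    user_turns: List[Optional[str]],
    intent_engine: Callable[[Optional[str]], str] = keyword_intent_engine,
    max_steps: int = 20,
) -> List[str]:
    """Walk the flow nodes, returning the broadcast texts in the order they are output."""
    outputs: List[str] = []
    turns = iter(user_turns)
    node = template.nodes[template.start]
    for _ in range(max_steps):     # guard against templates that loop forever
        # Fill user-related content in the broadcast text from the user information.
        outputs.append(node.broadcast_text.format(**user_info))
        if not node.transitions:   # terminal node: the flow ends here
            break
        intent = intent_engine(next(turns, None))
        if intent == "no_response" and node.no_response_text is not None:
            # Output the no-response utterance once, then listen again.
            outputs.append(node.no_response_text.format(**user_info))
            intent = intent_engine(next(turns, None))
        next_name = node.transitions.get(intent)
        if next_name is None:      # no transition for this intention: stop
            break
        node = template.nodes[next_name]
    return outputs


# Hypothetical scene template for a payment reminder.
template = SceneTemplate(
    scene="payment_reminder",
    start="greet",
    nodes={
        "greet": FlowNode(
            name="greet",
            broadcast_text="Hello {name}, your payment of {amount} is due. Can you pay today?",
            transitions={"affirm": "thanks", "deny": "followup"},
            no_response_text="Sorry, I did not hear you. Can you pay today?",
        ),
        "thanks": FlowNode(name="thanks", broadcast_text="Thank you, {name}. Goodbye."),
        "followup": FlowNode(name="followup", broadcast_text="Understood. We will call again later."),
    },
)

user_info = {"name": "Ms. Li", "amount": "200 yuan"}
outputs = run_session(template, user_info, [None, "Yes, I can"])
for line in outputs:
    print(line)
```

In this sketch the first user turn is a timeout, so the no-response utterance of the starting node is output before the second turn is analyzed and the flow moves on. Keeping the transitions as a plain intention-to-node mapping is one simple design choice; a real scene template would likely carry richer node types and attributes.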