CN112489645A - Intelligent voice interaction method, system and storage medium - Google Patents

Intelligent voice interaction method, system and storage medium

Info

Publication number
CN112489645A
CN112489645A (application number CN202011223378.XA)
Authority
CN
China
Prior art keywords
voice
voice interaction
strategy
intelligent
call
Prior art date
Legal status
Pending
Application number
CN202011223378.XA
Other languages
Chinese (zh)
Inventor
曹玉龙
梁凯丰
Current Assignee
Beijing Zhongkai Xintong Information Technology Co ltd
Original Assignee
Beijing Zhongkai Xintong Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhongkai Xintong Information Technology Co ltd
Priority to CN202011223378.XA
Publication of CN112489645A
Legal status: Pending (Current)

Classifications

    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 15/1815: Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L 15/1822: Parsing for meaning understanding
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H04M 3/42221: Conversation recording systems
    • H04M 3/527: Centralised call answering arrangements not requiring operator intervention
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to an intelligent voice interaction method, an intelligent voice interaction system and a storage medium. The intelligent voice interaction method comprises the following steps: connecting a call to be processed; performing semantic analysis on the text recognized by ASR speech recognition to obtain an AI recognition result; selecting a corresponding voice interaction strategy according to the AI recognition result, the voice interaction strategy comprising at least one of a topic determination strategy, a repeated-voice prohibition strategy, a topic conversion strategy, a context association strategy and a security interaction control strategy; and playing a corresponding response voice according to the voice interaction strategy to perform voice interaction with the calling party. Through the plurality of voice interaction strategies, intelligent voice interaction based on AI recognition is carried out naturally and flexibly during the conversation, so that the voice robot can hold an interactive conversation much as a human would.

Description

Intelligent voice interaction method, system and storage medium
Technical Field
The invention relates to the field of communication, in particular to an intelligent voice interaction method, an intelligent voice interaction system and a storage medium.
Background
In the course of developing operators' voice services, many call centers, customer service centers and call pickup service centers have appeared. A call center provides outbound services and is mainly used for product sales promotion; a customer service center provides customer service and is mainly used for telephone support; a call pickup service center answers incoming calls on a user's behalf and is mainly used for handling harassing calls and missed calls. These call processing systems have been vigorously developing voice robot products, and such systems already provide the following capabilities: call-in/call-out capability, and voice-to-text/text-to-voice conversion capability. With only these two capabilities, however, the voice interaction of the voice robot is stiff and mechanical, and differs greatly from a normal conversation between people.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent voice interaction method, system and storage medium that improve the voice interaction effect and make the voice robots of call control systems increasingly "human-like".
The technical scheme for solving the technical problems is as follows: an intelligent voice interaction method, comprising:
putting through a call to be processed;
performing semantic analysis on the text recognized by ASR speech recognition to obtain an AI recognition result;
selecting a corresponding voice interaction strategy according to the AI identification result; the voice interaction strategy comprises at least one of a topic determination strategy, a repeated voice prohibition strategy, a topic conversion strategy, a context association strategy and a security interaction control strategy;
and playing corresponding response voice according to the voice interaction strategy to perform voice interaction with a calling party.
The invention has the following beneficial effects: a call to be processed is connected, and AI semantic analysis is performed on the text recognized by ASR speech recognition to obtain an AI recognition result; a corresponding voice interaction strategy is then matched according to the AI recognition result, and the corresponding voice interaction process is executed according to that strategy, wherein the voice interaction strategies include topic determination, repeated-voice prohibition, topic conversion, context association and security interaction control strategies.
On the basis of the technical scheme, the invention can be further improved as follows:
further, the topic determination strategy comprises:
obtaining each conversation topic from each AI identification result;
selecting a topic associated with the calling party from the conversation topics as a target topic;
and carrying out voice interaction according to the target topic.
The beneficial effect of adopting the further scheme is that: all interactive conversations need to be developed around the topic of a calling party, so that the conversation is ensured to be continued, and more information is obtained.
Further, the prohibiting a repeated voice policy includes:
determining a response voice that has been played to the calling party;
and when the similarity between the response voice to be played and a response voice that has already been played to the calling party is greater than a threshold value, replacing the response voice to be played.
The beneficial effect of adopting the further scheme is that: repeated voice does not appear in the intelligent voice interaction process, ensuring that the conversation flows naturally and smoothly without dragging.
Further, the topic conversion strategy comprises:
when the privacy information related to the calling party is included in each AI identification result, guiding to switch to other topics;
when the voice interaction of the current conversation topic is determined to be finished according to the voice interaction process, guiding to finish the current topic;
when the calling intention of the calling party is determined according to the voice interaction process, switching to the topic corresponding to the calling intention.
The beneficial effect of adopting the further scheme is that: the user can actively convert the topic in the conversation process, and the voice robot can also actively guide the topic conversion, so that a better interactive conversation process is realized.
Further, the context association policy includes:
recording and analyzing each piece of voice information of the calling party;
and performing context association on the voice information, determining the calling intention of the calling party, and playing corresponding response voice according to the calling intention.
The beneficial effect of adopting the further scheme is that: each utterance in the user interaction process is recorded and associated with what comes before and after, ensuring that the conversation topics are advanced in an orderly, in-depth manner and increasing the other party's willingness to talk.
Further, the voice interaction control strategy comprises:
when it is determined from the AI recognition results that the voice information of the calling party includes negative speech, recording the transcribed text information and the audio recording corresponding to the voice information, and ending the call with the calling party;
and reporting the transcribed text information and the audio recording.
The beneficial effect of adopting the further scheme is that: and carrying out safety control on the conversation, controlling the conversation within a reasonable and legal range when some bad contents appear in the conversation, and reporting the conversation of the bad contents in time.
Further, the intelligent voice interaction method further comprises the following steps:
and setting voice character characteristics for expressing emotional colors in the voice interaction process.
The beneficial effect of adopting the further scheme is that: the voice robot needs to have definite and continuous character characteristics in the conversation process, and the emotional colors are expressed through different character characteristics, so that the voice interaction process is more natural and flexible, and the voice conversation is not like a machine but like a human.
Further, after playing the corresponding response voice according to the voice interaction policy and performing voice interaction with the calling party, the method includes:
analyzing each AI identification result in the voice interaction process, and determining the call tag information of the voice interaction process;
and sending the call tag information to other statistical analysis systems for data statistical analysis.
The beneficial effect of adopting the further scheme is that: in the intelligent voice interaction process, various AI identification results can appear in each call, and according to the change condition of the AI identification results, the final call tag information of the call is intelligently calculated, so that other systems can conveniently perform data statistical analysis or perform secondary processing on data.
In order to solve the above problem, an embodiment of the present invention further provides an intelligent voice interaction system, where the intelligent voice interaction system includes: the system comprises an AS server, an MS server, an AI server and an interactive strategy server;
the AS server is used for connecting the call to be processed;
the AI server is used for performing semantic analysis on the characters recognized by the ASR voice to obtain an AI recognition result;
the interaction strategy server is used for selecting a corresponding voice interaction strategy according to the AI identification result;
and the MS server is also used for playing corresponding response voice according to the voice interaction strategy to perform voice interaction with a calling party.
In order to solve the above problem, an embodiment of the present invention further provides a storage medium, where the storage medium includes one or more computer programs stored therein, and the one or more computer programs are executable by one or more processors to implement the steps of the intelligent voice interaction method described above.
Drawings
Fig. 1 is a flowchart of an intelligent voice interaction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an intelligent voice interaction system according to an embodiment of the present invention;
fig. 3 is a network structure diagram of a call center and an intelligent voice interaction system according to an embodiment of the present invention;
fig. 4 is a service flowchart of an outbound call of the intelligent voice interactive system according to an embodiment of the present invention;
FIG. 5 is a flowchart of another intelligent voice interaction method according to an embodiment of the present invention;
fig. 6 is a network structure diagram of a customer service center/pickup call service center and an intelligent voice interaction system according to an embodiment of the present invention;
fig. 7 is a service flow chart of an intelligent voice interaction system for receiving an incoming call in accordance with an embodiment of the present invention;
fig. 8 is a flowchart of another intelligent voice interaction method according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, fig. 1 is a flowchart of an intelligent voice interaction method provided in an embodiment of the present invention. The intelligent voice interaction method is applied to an intelligent voice interaction system and helps the voice robots of call centers, customer service centers and call pickup service centers use a voice flow intelligently driven by AI recognition results, so that the voice interaction effect is greatly improved and the voice robots of these call control systems become increasingly "human-like". The voice interaction method includes:
S101, connecting a call to be processed;
S102, performing semantic analysis on the text recognized by ASR speech recognition to obtain an AI recognition result;
S103, selecting a corresponding voice interaction strategy according to the AI recognition result, the voice interaction strategy comprising at least one of a topic determination strategy, a repeated-voice prohibition strategy, a topic conversion strategy, a context association strategy and a voice interaction control strategy;
and S104, playing the corresponding response voice according to the voice interaction strategy to perform voice interaction with the calling party.
In this embodiment, a call to be processed is connected; AI semantic analysis is first performed on the text recognized by ASR speech recognition to obtain an AI recognition result; a corresponding voice interaction strategy is then matched according to the AI recognition result, and the corresponding voice interaction process is executed according to that strategy. The voice interaction strategies include topic determination, repeated-voice prohibition, topic conversion, context association and security interaction control strategies.
In this embodiment, the intelligent voice interaction system performs voice interaction with the calling party through a call center, a customer service center or a call pickup service center. Specifically, during voice interaction with the calling party, the call center, customer service center or call pickup service center collects the calling party's voice, transcribes it into text through ASR speech recognition and passes the text to the intelligent voice interaction system. The intelligent voice interaction system performs semantic analysis on the sentences formed by the text, determines the semantic content they express, extracts keywords and obtains the AI recognition result, thereby capturing what the calling party is expressing. The corresponding response voice is then selected according to the voice interaction strategy and played back to the calling party through the call center, customer service center or call pickup service center. In some embodiments, the intelligent voice interaction system can also perform voice interaction with the calling party directly.
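By way of a non-limiting illustration, the following Python sketch shows one possible shape of the recognize, analyze, select-strategy, respond loop described above; all function names, recognition labels and canned replies (for example analyze_semantics, select_strategy, DialogueState) are assumptions made for illustration and are not part of this disclosure.

```python
# Non-limiting sketch of one interaction turn: ASR text -> semantic analysis ->
# strategy selection -> response playback. All names and labels are illustrative.

from dataclasses import dataclass, field


@dataclass
class DialogueState:
    history: list = field(default_factory=list)   # transcribed caller utterances
    played: list = field(default_factory=list)    # response voices already played
    ended: bool = False


def analyze_semantics(text: str) -> list[str]:
    """Toy AI semantic analysis: map keywords in the ASR text to recognition labels."""
    labels = []
    if "loan" in text.lower():
        labels.append("tentative loan promotion")
    if "wechat" in text.lower():
        labels.append("wants to add WeChat")
    if any(w in text.lower() for w in ("scam", "idiot")):
        labels.append("negative speech")
    return labels or ["unknown"]


def select_strategy(labels: list[str], state: DialogueState) -> str:
    """Pick one of the interaction strategies from the AI recognition result."""
    if "negative speech" in labels:
        return "security_control"
    if "wants to add WeChat" in labels:
        return "topic_conversion"      # steer away from private contact information
    if len(state.history) > 1:
        return "context_association"
    return "topic_determination"


def respond(strategy: str, state: DialogueState) -> str:
    """Play (here: return) the response voice chosen by the strategy."""
    responses = {
        "topic_determination": "Hello, could you tell me more about this loan?",
        "topic_conversion": "Let's not exchange contacts; please go on about the offer.",
        "context_association": "Earlier you mentioned a loan; what are the terms?",
        "security_control": "Thank you, goodbye.",
    }
    reply = responses[strategy]
    if strategy == "security_control":
        state.ended = True
    state.played.append(reply)
    return reply


def handle_utterance(caller_text: str, state: DialogueState) -> str:
    """One turn of the loop: record the utterance, analyze it, select, respond."""
    state.history.append(caller_text)
    labels = analyze_semantics(caller_text)
    return respond(select_strategy(labels, state), state)


if __name__ == "__main__":
    state = DialogueState()
    print(handle_utterance("I want to promote a loan with no review", state))
    print(handle_utterance("Can you add my WeChat?", state))
```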
In this embodiment, the topic determination strategy comprises obtaining each conversation topic from the AI recognition results, selecting the topic associated with the calling party from those conversation topics as the target topic, and carrying out voice interaction according to the target topic. In other words, all interactive conversation is developed around the calling party's topic: the topic associated with the calling party is selected so that the conversation continues and more information is obtained. For example, after the ASR speech is transcribed into text, semantic analysis yields one or more AI recognition results. A call promoting a loan, for instance, may yield AI recognition results such as "tentative loan promotion", "explicit loan promotion", "APP-download promotion", "low-interest inducement", "fast-loan inducement" and "no-review inducement". These AI recognition results provide a range of candidate topics, and the intelligent voice interaction system intelligently selects a conversation topic based on the one or more AI recognition results; it determines that "tentative loan promotion" is the topic associated with the calling party and cuts into the intelligent voice interaction around "tentative loan promotion". It should be understood that there may be one or more target topics around the calling party. When there are several, multiple topics can be linked according to their relevance: for example, if "tentative loan promotion" is associated with "no-review inducement", then the topic "tentative loan promotion" can be determined and, within it, the intelligent voice interaction revolves around "no-review inducement". At this point the corresponding response voice is played according to the voice interaction strategy, such as "Please describe the review-free loan amount in detail."
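A minimal sketch of such topic determination is given below; the priority weights, label names and the determine_topic helper are illustrative assumptions rather than a prescribed implementation.

```python
# Toy topic-determination strategy: pick the caller-associated topic from the
# AI recognition results. All labels, weights and names are illustrative.

# How strongly each recognition label is associated with the calling party's intent.
TOPIC_WEIGHTS = {
    "tentative loan promotion": 3,
    "explicit loan promotion": 3,
    "no-review inducement": 2,
    "low-interest inducement": 2,
    "APP-download promotion": 1,
}

# Sub-topics worth drilling into once a main topic is chosen.
RELATED_SUBTOPICS = {
    "tentative loan promotion": ["no-review inducement", "low-interest inducement"],
}


def determine_topic(ai_results: list[str]) -> tuple[str, list[str]]:
    """Return (target topic, associated sub-topics) for one set of recognition results."""
    known = [r for r in ai_results if r in TOPIC_WEIGHTS]
    if not known:
        return "unknown", []
    target = max(known, key=TOPIC_WEIGHTS.get)
    subtopics = [s for s in RELATED_SUBTOPICS.get(target, []) if s in ai_results]
    return target, subtopics


if __name__ == "__main__":
    results = ["tentative loan promotion", "no-review inducement"]
    topic, subs = determine_topic(results)
    print(topic, subs)  # -> tentative loan promotion ['no-review inducement']
```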
In this embodiment, the repeated-voice prohibition strategy comprises: determining the response voices that have already been played to the calling party; and when the similarity between a response voice about to be played and a response voice already played to the calling party is greater than a threshold, replacing the response voice to be played. In other words, repeated voice does not appear during intelligent voice interaction, which keeps the conversation natural and smooth without dragging. For example, when the calling party's call is answered, a polite response such as "Hello, please speak" is played first, and in the subsequent intelligent voice interaction similar responses such as "Yes, please speak" or "Hello, please speak" are not played again.
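One possible realization of this repeated-voice prohibition, using a simple character-level similarity from Python's standard library, is sketched below; the 0.8 threshold and the helper names are assumptions, since the embodiment only requires comparison against "a threshold".

```python
# Toy repeated-voice prohibition: replace a candidate response if it is too
# similar to anything already played. The similarity measure and threshold
# are illustrative assumptions.

from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed value; the embodiment only requires "a threshold"


def too_similar(candidate: str, played: list[str]) -> bool:
    """True if the candidate response resembles any already-played response."""
    return any(
        SequenceMatcher(None, candidate, old).ratio() > SIMILARITY_THRESHOLD
        for old in played
    )


def pick_response(candidates: list[str], played: list[str]) -> str:
    """Return the first candidate that is not a near-repeat of earlier responses."""
    for candidate in candidates:
        if not too_similar(candidate, played):
            return candidate
    return candidates[-1]  # fall back rather than stay silent


if __name__ == "__main__":
    played = ["Hello, please speak."]
    options = ["Hello, please speak.", "How can I help you today?"]
    print(pick_response(options, played))  # -> How can I help you today?
```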
In this embodiment, the topic conversion strategy comprises: when the AI recognition results include privacy-related information about the calling party, guiding the conversation to other topics. That is, the intelligent voice interaction system can actively switch topics: through suitable guiding speech, topics such as exchanging WeChat IDs or other private contact information are no longer discussed, and the conversation is steered to other topics.
When it is determined from the voice interaction process that the voice interaction on the current conversation topic is complete, the system guides the conversation to a close. Topic conversion can thus mean ending a topic: during the multi-turn interaction, once the calling party's intention has been clarified through semantic analysis, the current conversation topic is judged complete. For example, when the intelligent voice interaction system judges that the loan-promotion topic has been covered and no further responses are needed, it can switch topics and close the conversation politely by playing something like "Thank you, I need to think it over; I will contact you again."
When the calling intention of the calling party is determined from the voice interaction process, the conversation is switched to the topic corresponding to that intention. Topic conversion also includes actively guiding a topic deeper: for example, after an incoming call whose AI recognition result is "tentative loan promotion" is found during the intelligent voice interaction process to be a loan promotion, the intelligent voice interaction system can actively convert the topic to "loan promotion" and guide it into more depth.
In some embodiments, topic conversion can also be used to soothe the calling party. For example, when the intelligent voice interaction system determines that the calling party is contacting the called user about a business matter, such as the topic "property declaration", it can reassure the calling party with a gentle answer such as "This will be arranged and handled for you right away."
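The topic-conversion cases above (steering away from private information, soothing a business caller, closing a finished topic, and drilling into the identified intention) could be combined as in the following illustrative sketch; the label sets and canned replies are assumptions.

```python
# Illustrative topic-conversion decision. Label sets and replies are assumptions.

from typing import Optional

PRIVACY_LABELS = {"wants to add WeChat", "asks for ID number"}
BUSINESS_LABELS = {"property declaration"}


def convert_topic(ai_results: list[str],
                  topic_finished: bool,
                  intention: Optional[str]) -> str:
    """Return the guiding reply for the current turn."""
    labels = set(ai_results)
    if labels & PRIVACY_LABELS:
        return "Let's not exchange private contact details; please continue about the offer."
    if labels & BUSINESS_LABELS:
        return "This will be arranged and handled for you right away."
    if topic_finished:
        return "Thank you, I need to think it over; I will contact you again."
    if intention:
        return f"Could you tell me more about the {intention}?"
    return "Please go on."


if __name__ == "__main__":
    print(convert_topic(["wants to add WeChat"], topic_finished=False,
                        intention="loan promotion"))
```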
In this embodiment, the context association strategy comprises: recording and analyzing each utterance of the calling party; associating those utterances with their context, determining the calling party's intention, and playing the corresponding response voice according to that intention. A conversation is necessarily connected front to back; otherwise the earlier and later remarks would not fit together. The intelligent voice interaction system therefore records every utterance in the interaction and links it with what came before and after, ensuring that the conversation topics are advanced in an orderly, in-depth manner and increasing the other party's willingness to talk. For example, if an earlier utterance of the calling party mentions "financial products" and a later one mentions "low risk", then based on speech recognition the system can ask the calling party, "Could you describe the low-risk financial products in detail?"
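A minimal sketch of such context association, assuming a small keyword list and a sliding window over recent utterances (both assumptions), is given below.

```python
# Illustrative context association: keep a sliding window of recent utterances,
# carry keywords across turns, and phrase a follow-up from the combined context.

from collections import deque

TRACKED_KEYWORDS = ("financial products", "low risk", "loan", "interest rate")


class ContextTracker:
    def __init__(self, window: int = 10):
        self.utterances = deque(maxlen=window)  # recent caller utterances

    def add(self, utterance: str) -> None:
        self.utterances.append(utterance.lower())

    def keywords_so_far(self) -> list[str]:
        """Keywords mentioned anywhere in the recent conversation history."""
        return [k for k in TRACKED_KEYWORDS
                if any(k in u for u in self.utterances)]

    def follow_up(self) -> str:
        found = self.keywords_so_far()
        if "financial products" in found and "low risk" in found:
            return "Could you describe the low-risk financial products in detail?"
        if found:
            return f"Earlier you mentioned {found[0]}; could you say more about that?"
        return "Please go on."


if __name__ == "__main__":
    ctx = ContextTracker()
    ctx.add("I'd like to introduce some financial products.")
    ctx.add("They are all low risk.")
    print(ctx.follow_up())
```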
In this embodiment, the intelligent voice interaction method further comprises setting a voice persona for the voice interaction process, such as an easy-going persona, a humorous persona, a calm persona or a subdued persona. The specific voice persona can be determined by the intelligent voice interaction system according to the user's character, or chosen by the user. That is, the voice robot needs a definite and consistent persona during the conversation, so that the voice interaction sounds less like a "machine" and more like a "human". The voice persona may be set, for example, after the pending call is connected and a voice connection with the calling party is established. In some embodiments, semantic analysis can also be performed on the text recognized by ASR speech recognition, and the voice persona set after the AI recognition result is obtained; during the voice interaction the persona can also change with the AI recognition results, for example switching from the calm persona to the subdued persona when the AI recognition result indicates bad news.
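One simple way to realize persona setting and switching is sketched below; the persona names and the label-to-persona mapping are illustrative assumptions.

```python
# Illustrative persona setting and switching: keep one consistent persona and
# change it only when a recognition label carries emotional weight.

PERSONAS = {"easy-going", "humorous", "calm", "subdued"}

# Assumed mapping from emotionally loaded recognition labels to personas.
EMOTION_SWITCH = {
    "bad news": "subdued",
    "complaint": "calm",
    "small talk": "humorous",
}


def choose_persona(current: str, ai_results: list[str]) -> str:
    """Keep the current persona unless a recognition label calls for a switch."""
    for label in ai_results:
        if label in EMOTION_SWITCH:
            return EMOTION_SWITCH[label]
    return current if current in PERSONAS else "calm"


if __name__ == "__main__":
    print(choose_persona("calm", ["bad news"]))  # -> subdued
```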
In this embodiment, the voice interaction control strategy comprises: when it is determined from the AI recognition results that the calling party's voice contains negative speech, recording the transcribed text and the audio recording corresponding to that voice, ending the call with the calling party, and reporting the transcribed text and the audio recording. When the AI recognition result includes a negative statement, the system responds politely, ends the conversation with the calling party, and at the same time records the call's text and preserves the audio as evidence. For incoming calls, the intelligent interaction system can actively upload the incoming text and recording in the background, through a reporting interface, to the operator's harmful-information handling center, which performs call quality inspection and follow-up handling according to its judgment of the call. In some embodiments, for outbound calls, there may be no need to actively report to the operator's harmful-information handling center.
In some embodiments, the calling number and the call time of such a calling party are still made available to the called party, but the transcribed text and the recording of the call are actively withheld and not delivered, so that the called party is protected from disturbance by the negative speech.
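An illustrative sketch of this security interaction control is given below; the negative-word list, report fields and the decision to report only incoming calls follow the description above, while all names and the report handling are assumptions.

```python
# Illustrative security interaction control: on negative speech, keep the
# transcript and recording as evidence, end the call, and (for incoming calls)
# build a report for the operator's harmful-information handling center.

import json
from dataclasses import dataclass

NEGATIVE_WORDS = ("scam", "threat", "abuse")  # assumed detection list


@dataclass
class CallEvidence:
    caller_number: str
    transcript: str
    recording_path: str


def contains_negative_speech(ai_results: list[str], transcript: str) -> bool:
    return ("negative speech" in ai_results
            or any(w in transcript.lower() for w in NEGATIVE_WORDS))


def handle_negative_call(evidence: CallEvidence, incoming: bool) -> str:
    """End the call and build a report; only incoming calls are reported upstream."""
    report = {
        "caller": evidence.caller_number,
        "transcript": evidence.transcript,
        "recording": evidence.recording_path,
        "action": "call_terminated",
        "reported_to_operator": incoming,  # outbound calls need not be reported
    }
    return json.dumps(report)


if __name__ == "__main__":
    ev = CallEvidence("+8610000000", "this is a scam ...", "/tmp/rec_001.wav")
    if contains_negative_speech(["loan promotion"], ev.transcript):
        print(handle_negative_call(ev, incoming=True))
```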
In this embodiment, after the corresponding response voice is played according to the voice interaction strategy and the voice interaction is carried out, the method includes: analyzing the AI recognition results produced during the voice interaction and determining the call tag information for that interaction; and sending the call tag information to other statistical analysis systems for data statistics and analysis. In other words, during intelligent voice interaction each call produces multiple AI recognition results, and the final call tag of the call is computed intelligently from how those results evolve, which makes it convenient for other systems to perform statistical analysis or secondary processing of the data. For example, if an intelligent voice interaction produces AI recognition results such as "tentative loan promotion", "wants to add WeChat", "no-review inducement", "loan promotion", "wants to keep talking" and "requests a call back", the intelligent voice interaction system judges through analysis that the accurate AI recognition result of the call is "loan promotion" and uses "loan promotion" as the final tag of the call. Subsequent systems, such as the operator's harmful-information handling center, can then use the final-tag statistics and call analysis to review and block calling numbers that frequently harass users.
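A minimal sketch of computing the final call tag from the sequence of AI recognition results is given below; the priority ordering is an assumption, since the embodiment does not prescribe the aggregation rule.

```python
# Illustrative final-tag computation: reduce the AI recognition results produced
# during one call to a single call tag. The priority ordering is an assumption.

TAG_PRIORITY = ["negative speech", "loan promotion", "tentative loan promotion", "other"]


def final_call_tag(ai_results: list[str]) -> str:
    """Pick the highest-priority label that appeared during the call."""
    seen = set(ai_results)
    for tag in TAG_PRIORITY:
        if tag in seen:
            return tag
    return "other"


if __name__ == "__main__":
    results = ["tentative loan promotion", "wants to add WeChat",
               "no-review inducement", "loan promotion", "requests a call back"]
    print(final_call_tag(results))  # -> loan promotion
```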
With the above method for intelligent voice interaction based on AI recognition results, the voice robot can quickly identify conversation topics through an advanced AI semantic analysis system and intelligent voice interaction strategies; convert conversation topics flexibly as the conversation proceeds; combine the earlier and later turns of the interaction so that the conversation is smoother and more natural, with no repeated speech; take on a specific persona that makes the conversation more vivid; and report harmful content in the conversation in time, forcibly intercepting it when necessary, so that the voice robot's conversation stays within a reasonable and legal range. Using this intelligent voice interaction strategy system thoroughly solves the stiff, rigid conversation process of voice robots in prior-art systems, making the voice robot's conversation flexible, natural and more humanized, so that the robot sounds less like a machine and more like a person.
In this embodiment, an intelligent voice interaction system is further provided, as shown in fig. 2, the intelligent voice interaction system includes: AS server 201, AI server 202 and interaction policy server 203 and MS server 204;
the AS server 201 is used for interfacing the call to be processed;
the AI server 202 is used for performing semantic analysis on the characters recognized by the ASR speech to obtain an AI recognition result;
the interaction strategy server 203 is used for selecting a corresponding voice interaction strategy according to the AI identification result;
the MS server 204 is further configured to play the corresponding response voice according to the voice interaction policy to perform voice interaction with the calling party.
It should be noted that the intelligent voice interaction system can directly connect to the voice trunks of call centers, customer service centers and call pickup service centers, so that the intelligent voice interaction capability can be provided to the call center, the customer service center and the call pickup service center.
As shown in fig. 3, fig. 3 is a network structure diagram of a call center and an intelligent voice interaction system; wherein the intelligent voice interaction system in fig. 3 comprises:
An AI semantic analysis server: provides AI semantic analysis capability; it performs AI semantic analysis on the text transcribed by ASR and obtains one or more corresponding AI recognition results.
An intelligent interaction policy server: provides the intelligent interaction strategy capability; it selects an intelligent interaction strategy based on the AI recognition results, chooses the response voice for the interaction, and controls topic determination, repetition prohibition, topic conversion, context association, persona setting, security control and other functions during the interaction.
A database server: provides centralized database capability for the servers in the intelligent voice interaction system; it can support databases such as MySQL, Oracle and Informix, as well as in-memory databases.
A voice file server: provides the response voices required by the intelligent interaction strategy; the voices can be pre-recorded, or the server can connect directly to the TTS of the call center, customer service center or call pickup service center and use TTS-synthesized response voices.
An AS server: provides SIP signaling processing capability and controls the voice interaction process according to the selected intelligent interaction strategy. The AS server also connects directly to the SIP trunks of the call center, customer service center and call pickup service center.
An MS server: provides SIP media processing capability and completes voice interaction functions such as announcement playback and digit collection according to the AS server's instructions. The MS server also connects directly to the SIP trunks of the call center, customer service center and call pickup service center and supports RTP playback.
A trunk gateway: used to interface with ISUP trunks when the call center, customer service center or call pickup service center cannot provide SIP trunks and can only provide ISUP trunks.
An interface server: receives the ASR speech-transcription text from the call center, customer service center or call pickup service center and forwards it to the AI semantic analysis server for processing. The interface server is also responsible for connecting to other statistical analysis systems and can send the calling number, call time, call text, final call tag and other information to those systems for secondary data processing. It is likewise responsible for connecting to systems such as the operator's harmful-information handling center and actively reports the text and recordings of negative calls.
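For orientation only, the components listed above can be summarized as in the following sketch; the class and role strings are descriptive assumptions, not an implementation of the servers themselves.

```python
# Purely illustrative summary of the components described above and the path a
# transcription takes through them.

from dataclasses import dataclass


@dataclass
class Component:
    name: str
    role: str


SYSTEM = [
    Component("AI semantic analysis server", "ASR text -> AI recognition results"),
    Component("intelligent interaction policy server", "recognition results -> strategy + response"),
    Component("database server", "store transcripts, recordings, tags"),
    Component("voice file server", "pre-recorded or TTS response audio"),
    Component("AS server", "SIP signaling control over the trunk"),
    Component("MS server", "SIP media: playback, digit collection, RTP"),
    Component("trunk gateway", "ISUP trunk interworking when SIP is unavailable"),
    Component("interface server", "receive ASR text, export tags, report negative calls"),
]


if __name__ == "__main__":
    for c in SYSTEM:
        print(f"{c.name}: {c.role}")
```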
As shown in fig. 4, fig. 4 is a service flow diagram of an outbound call through the intelligent voice interaction system. The call center sends the ASR speech-transcription text to the intelligent voice interaction system for AI semantic analysis, and the intelligent voice interaction strategy is determined according to the AI recognition result, the strategy comprising at least one of the topic determination strategy, repeated-voice prohibition strategy, topic conversion strategy, context association strategy and security interaction control strategy described above. After the corresponding voice interaction strategy is determined, the voice interaction flow is carried out with AS signaling control and MS media processing: the MS media processing obtains the voice file, and the AS signaling control drives the call center's voice trunk to play the response voice to the user. During MS media processing the outbound-call recording is also collected, and after the corresponding voice interaction strategy is selected the outbound-call text is collected; the recording and text are then packaged into data information to facilitate subsequent statistical analysis and secondary processing of the data. When the intelligent voice interaction strategy decides to end the call, or the security control strategy is triggered (negative content appears), the call is terminated.
As shown in fig. 5, fig. 5 is a method for intelligent voice interaction, where the method for intelligent voice interaction includes:
s501, the call center accesses the call needing to be processed into the intelligent voice interaction system.
The call center accesses the intelligent voice interaction system through the AS (SIP trunk) or through an ISUP trunk.
And S502, carrying out AI semantic analysis on the ASR voice transcription characters by an AI semantic analysis server of the intelligent voice interaction system to obtain 1 or more recognition results.
Wherein the ASR voice transcription words can be sent to the intelligent voice interaction system through the interface server.
And S503, the intelligent interaction strategy server of the intelligent voice interaction system selects a corresponding voice interaction process according to the AI recognition result analysis.
S504, the AS server of the intelligent voice interaction system performs signaling interaction and controls the call center's voice trunk, and the MS server plays the response voice to the user through the call center's voice trunk.
The response voice can be obtained directly from the voice file server, or the TTS server of the call center can be connected to synthesize the voice file.
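One possible way to resolve a response voice, preferring a pre-recorded file and falling back to TTS as described above, is sketched below; the directory path and the TTS callable are assumptions.

```python
# Toy response-voice resolution: prefer a pre-recorded file, fall back to TTS.

import os
from typing import Callable

VOICE_DIR = "/srv/voice_files"  # assumed location of pre-recorded prompts


def resolve_response_voice(prompt_id: str,
                           synthesize: Callable[[str], bytes],
                           text: str) -> bytes:
    """Return audio bytes for a prompt: pre-recorded if present, otherwise TTS."""
    path = os.path.join(VOICE_DIR, f"{prompt_id}.wav")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    return synthesize(text)  # delegate to the call center's TTS


if __name__ == "__main__":
    fake_tts = lambda text: f"<tts:{text}>".encode()
    print(resolve_response_voice("greeting_001", fake_tts, "Hello, please speak."))
```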
S505, the database server of the intelligent voice interaction system records the call's ASR speech-transcription text, the per-call and full-session recordings, the call's AI recognition results, and the call's final tag.
S506-1, the intelligent interaction strategy server of the intelligent voice interaction system judges that the voice interaction is finished, the called user does not need to continue to have a conversation, and the outbound call is actively ended.
S506-2, the intelligent interaction strategy server of the intelligent voice interaction system finds that the called user has negative speech, and actively ends the outbound call.
S506-3, the database server of the intelligent voice interaction system records relevant information of the negative speech.
For outbound calls, when the called party uses negative speech, the call does not need to be actively reported to the operator's harmful-information handling center.
In some embodiments, when negative speech occurs, the interface server of the intelligent voice interaction system may also notify the outbound center of the relevant information, and the outbound center decides whether to continue outbound marketing to that user.
And S507, the interface server of the intelligent voice interaction system finishes the outbound call through the outbound center.
And S508, the interface server of the voice interactive system sends the outbound call information to other statistical analysis systems for data statistical analysis or data secondary processing.
As shown in fig. 6, 7 and 8: fig. 6 is a network structure diagram of a customer service center / call pickup service center and the intelligent voice interaction system, where the intelligent voice interaction system of fig. 6 is the same as that of fig. 3 and is not described again here. Fig. 7 is a service flow chart of the intelligent voice interaction system answering an incoming call. The customer service center / call pickup service center sends the ASR speech-transcription text to the intelligent voice interaction system for AI semantic analysis, and the intelligent voice interaction strategy is determined according to the AI recognition result, the strategy comprising at least one of the topic determination strategy, repeated-voice prohibition strategy, topic conversion strategy, context association strategy and security interaction control strategy described above. After the corresponding voice interaction strategy is determined, the voice interaction flow is carried out with AS signaling control and MS media processing: the MS media processing obtains the voice file, and the AS signaling control drives the voice trunks of the customer service center and call pickup service center to play the response voice to the user. The incoming-call recording is collected during MS media processing, and the incoming-call text is collected after the voice interaction strategy is selected; the recording and text are then packaged into data information to facilitate subsequent statistical analysis and secondary processing. When the intelligent voice interaction strategy decides to end the call, or the security control strategy is triggered (negative content appears), the call is terminated; in addition, when the incoming call's text and recording contain negative content, they are actively reported to the operator's harmful-information handling center.
Fig. 8 is a diagram illustrating an intelligent voice interaction method, which includes:
S801, the customer service center / call pickup service center routes the call to be processed into the intelligent voice interaction system.
The customer service center / call pickup service center accesses the intelligent voice interaction system by connecting directly to the AS/MS (SIP trunk) or by connecting through the trunk gateway (ISUP trunk).
And S802, carrying out AI semantic analysis on the ASR voice transcription characters by an AI semantic analysis server of the intelligent voice interaction system to obtain 1 or more recognition results.
And S803, the intelligent interaction strategy server of the intelligent voice interaction system selects the corresponding voice interaction process according to the AI recognition result analysis.
S804, the AS server of the intelligent voice interaction system performs signaling interaction to control the voice trunk of the customer service center / call pickup service center, and the MS server plays the response voice to the user through that voice trunk.
The voice file is acquired through the voice file server, or the TTS of the call center is connected to acquire a TTS-synthesized voice file.
S805, the database server of the intelligent voice interaction system records the call's ASR speech-transcription text, the per-call and full-session recordings, the incoming call's AI recognition results, and the incoming call's final tag.
S806-1, the intelligent interaction strategy server of the intelligent voice interaction system judges that the voice interaction is finished, the called user does not need to continue to have a conversation, and the call is actively ended.
S806-2, the intelligent interaction strategy server of the intelligent voice interaction system finds that the calling user has negative speech, and actively ends the call.
S806-3, the database server of the intelligent voice interaction system records the relevant information of the negative speech, and the relevant information of the call of the negative speech is actively reported to the bad information processing center of the operator.
S807, the AS server of the intelligent voice interaction system directly ends the incoming call.
If the incoming call contains negative speech, the intelligent voice interaction system can also notify the customer service center / call pickup service center of the relevant information and withhold the call's transcribed text and recording so that they do not disturb the called user.
And S808, the interface server of the intelligent voice interaction system sends the incoming call information to other statistical analysis systems for data statistical analysis or data secondary processing.
The present embodiment further provides a storage medium, where the storage medium includes one or more computer programs stored therein, and the one or more computer programs can be executed by one or more processors to implement the steps of the intelligent voice interaction method described above, which are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained in this patent by applying specific examples, and the descriptions of the embodiments above are only used to help understanding the principles of the embodiments of the present invention; the present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An intelligent voice interaction method is characterized by comprising the following steps:
putting through a call to be processed;
performing semantic analysis on the text recognized by ASR speech recognition to obtain an AI recognition result;
selecting a corresponding voice interaction strategy according to the AI identification result; the voice interaction strategy comprises at least one of a topic determination strategy, a repeated voice prohibition strategy, a topic conversion strategy, a context association strategy and a security interaction control strategy;
and playing corresponding response voice according to the voice interaction strategy to perform voice interaction with a calling party.
2. The intelligent voice interaction method of claim 1, wherein the topic determination strategy comprises:
obtaining each conversation topic from each AI identification result;
selecting a topic associated with the calling party from the conversation topics as a target topic;
and carrying out voice interaction according to the target topic.
3. The intelligent voice interaction method of claim 1, wherein the inhibit repeating voice policy comprises:
determining a response voice that has been played to the calling party;
and when the similarity between the response voice to be played and a response voice that has already been played to the calling party is greater than a threshold value, replacing the response voice to be played.
4. The intelligent voice interaction method of claim 1, wherein the topic conversion strategy comprises:
when the privacy information related to the calling party is included in each AI identification result, guiding to switch to other topics;
when the voice interaction of the current conversation topic is determined to be finished according to the voice interaction process, guiding to finish the current topic;
when the calling intention of the calling party is determined according to the voice interaction process, switching to the topic corresponding to the calling intention.
5. The intelligent voice interaction method of claim 1, wherein the context association policy comprises:
recording and analyzing each piece of voice information of the calling party;
and performing context association on the voice information, determining the calling intention of the calling party, and playing corresponding response voice according to the calling intention.
6. The intelligent voice interaction method according to claim 1, wherein the voice interaction control strategy comprises:
when it is determined from the AI recognition results that the voice information of the calling party includes negative speech, recording the transcribed text information and the audio recording corresponding to the voice information, and ending the call with the calling party;
and reporting the transcribed text information and the audio recording.
7. The intelligent voice interaction method according to any one of claims 1-6, further comprising:
and setting voice character characteristics for expressing emotional colors in the voice interaction process.
8. The intelligent voice interaction method according to claim 7, wherein after playing the corresponding response voice according to the voice interaction policy and performing voice interaction with a calling party, the method comprises:
analyzing each AI identification result in the voice interaction process, and determining the call tag information of the voice interaction process;
and sending the call tag information to other statistical analysis systems for data statistical analysis.
9. An intelligent voice interaction system, comprising: the system comprises an AS server, an MS server, an AI server and an interactive strategy server;
the AS server is used for connecting the call to be processed;
the AI server is used for performing semantic analysis on the characters recognized by the ASR voice to obtain an AI recognition result;
the interaction strategy server is used for selecting a corresponding voice interaction strategy according to the AI identification result;
and the MS server is also used for playing corresponding response voice according to the voice interaction strategy to perform voice interaction with a calling party.
10. A storage medium comprising one or more computer programs stored thereon that are executable by one or more processors to perform the steps of the intelligent voice interaction method of any one of claims 1 to 8.
CN202011223378.XA 2020-11-05 2020-11-05 Intelligent voice interaction method, system and storage medium Pending CN112489645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011223378.XA CN112489645A (en) 2020-11-05 2020-11-05 Intelligent voice interaction method, system and storage medium

Publications (1)

Publication Number Publication Date
CN112489645A true CN112489645A (en) 2021-03-12

Family

ID=74928289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011223378.XA Pending CN112489645A (en) 2020-11-05 2020-11-05 Intelligent voice interaction method, system and storage medium

Country Status (1)

Country Link
CN (1) CN112489645A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105578439A (en) * 2016-01-23 2016-05-11 广州市讯飞樽鸿信息技术有限公司 Incoming call transfer intelligent answering method and system for call transfer platform
CN105592196A (en) * 2016-01-23 2016-05-18 广州市讯飞樽鸿信息技术有限公司 Incoming call intelligent response method and system based on intelligent terminal
CN110931137A (en) * 2018-09-19 2020-03-27 京东方科技集团股份有限公司 Machine-assisted dialog system, method and device
CN109413286A (en) * 2018-10-22 2019-03-01 北京移数通电讯有限公司 A kind of intelligent customer service voice response system and method
JP2020077272A (en) * 2018-11-09 2020-05-21 株式会社タカラトミー Conversation system and conversation program
CN111294471A (en) * 2020-02-06 2020-06-16 广州市讯飞樽鸿信息技术有限公司 Intelligent telephone answering method and system
CN111739516A (en) * 2020-06-19 2020-10-02 中国—东盟信息港股份有限公司 Speech recognition system for intelligent customer service call

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284494A (en) * 2021-05-25 2021-08-20 平安普惠企业管理有限公司 Voice assistant recognition method, device, equipment and computer readable storage medium
CN113284494B (en) * 2021-05-25 2023-12-01 北京基智科技有限公司 Voice assistant recognition method, device, equipment and computer readable storage medium
CN114706966A (en) * 2022-03-23 2022-07-05 平安普惠企业管理有限公司 Voice interaction method, device and equipment based on artificial intelligence and storage medium
CN115002283A (en) * 2022-07-18 2022-09-02 北京烽火万家科技有限公司 Virtual digital man-machine telephone answering system and method

Similar Documents

Publication Publication Date Title
CN112489645A (en) Intelligent voice interaction method, system and storage medium
US10129402B1 (en) Customer satisfaction analysis of caller interaction event data system and methods
US10110741B1 (en) Determining and denying call completion based on detection of robocall or telemarketing call
CA2320569C (en) System and method for automatically detecting problematic calls
US8094790B2 (en) Method and software for training a customer service representative by analysis of a telephonic interaction between a customer and a contact center
EP2297933B1 (en) Method and system for handling a telephone call
US8094803B2 (en) Method and system for analyzing separated voice data of a telephonic communication between a customer and a contact center by applying a psychological behavioral model thereto
EP1172804B1 (en) Voice filter for the substitution of recognized words of an oral presentation
CN102868836B (en) For real person talk skill system and its implementation of call center
US20060265089A1 (en) Method and software for analyzing voice data of a telephonic communication and generating a retention strategy therefrom
US20190098135A1 (en) System, Method, and Apparatus for Determining a Status of a Call Recipient in a Call System
CN111654582A (en) Intelligent outbound method and device
CN110392168A (en) Call processing method, device, server, storage medium and system
US11522993B2 (en) Systems and methods for rapid analysis of call audio data using a stream-processing platform
CN112565529A (en) Intelligent telephone answering method, system and storage medium
US8189762B2 (en) System and method for interactive voice response enhanced out-calling
CA2600523A1 (en) Systems and methods for analyzing communication sessions
US20090103711A1 (en) Methods and systems for determining inappropriate threats during a telephonic communication between a customer and a contact center
CN113727051A (en) Bidirectional video method, system, equipment and storage medium based on virtual agent
CN113327582B (en) Voice interaction method and device, electronic equipment and storage medium
CN112714217A (en) Telephone traffic quality inspection method, device, storage medium and server
CN107888747B (en) A kind of missed call message leaving method and device
US20240177711A1 (en) Real-time provision of guidance to sales-focused agents of a contact center based on identifiable background sounds
CN114584656B (en) Streaming voice response method and device and voice call robot thereof
CN117711399B (en) Interactive AI intelligent robot control method and intelligent robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination