CN106409283B - Man-machine mixed interaction system and method based on audio - Google Patents


Info

Publication number
CN106409283B
CN106409283B (application CN201610791966.0A)
Authority
CN
China
Prior art keywords
information, unit, recognition module, module, intervention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610791966.0A
Other languages
Chinese (zh)
Other versions
CN106409283A (en)
Inventor
俞凯
石开宇
郑达
陈露
常成
曹迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201610791966.0A
Publication of CN106409283A
Application granted
Publication of CN106409283B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10: Prosody rules derived from text; Stress or intonation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an audio-based man-machine hybrid interaction system in which a speech recognition module is connected with a semantic recognition module and transmits the text information corresponding to the user's speech; an exception handling module is connected with both the speech recognition module and the semantic recognition module, the speech recognition module transmitting the text information to it and the semantic recognition module transmitting the semantic parsing result to it; and the exception handling module is connected with a speech synthesis module and transmits intervention information. The invention also discloses an audio-based man-machine hybrid interaction method in which the speech recognition module converts speech information into text information and outputs it to the semantic recognition module; the semantic recognition module extracts the user's purpose and the corresponding key information from the text information; and the exception handling module judges, from the text information of the speech recognition module and the semantic information of the semantic recognition module, whether the current man-machine conversation is abnormal, and replies with an exception handling message. The technical scheme of the invention provides a unified man-machine conversation experience.

Description

Man-machine mixed interaction system and method based on audio
Technical Field
The invention relates to the technical field of information processing, in particular to an audio-based man-machine hybrid interaction system and method.
Background
As shown in fig. 1, current audio-based man-machine dialog systems use the machine's reply as the final reply to the user. When the machine decision system cannot determine the user's intention, most dialog systems present a reply such as "please say that again" and ask the user for new input; some man-machine dialog systems introduce a manual intervention method based on a call center.
At present, man-machine conversation exception handling is mainly implemented through a call center. When the machine cannot handle the user's input audio, or the user explicitly indicates that manual service is needed, a human call center is asked to intervene: a one-to-one voice connection is established between the user and an operator, the operator communicates with the user directly to obtain the user's requirements, and the corresponding instructions are issued through the call-center platform.
The manual intervention mode of the existing call center has three main problems. Low efficiency: the operator must hold a one-to-one voice connection with the user and cannot serve anyone else while waiting for the user's input. High cost: a large-scale call center requires a series of telecommunication devices and the corresponding service integration, and the low efficiency means more operators are needed, which indirectly raises labor cost. Strong dependence on the network environment: transmitting audio directly over the network requires a stable connection; fluctuations in the network environment degrade the audio quality, which hurts the conversation experience and can even interrupt the man-machine conversation.
Therefore, those skilled in the art are working to develop an audio-based man-machine hybrid interaction system and method that combine human intervention replies with machine replies, so as to unify the flow of the man-machine conversation and improve the user experience.
Disclosure of Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is how to improve the efficiency of man-machine conversation and the user experience during customer service.
To achieve the above object, the present invention provides an audio-based man-machine hybrid interaction system comprising a speech recognition module, a speech synthesis module, a semantic recognition module and an exception handling module, wherein the speech recognition module is configured to be connected to the semantic recognition module and transmit the text information corresponding to the speech; the exception handling module is configured to be connected to the speech recognition module and the semantic recognition module, the speech recognition module being configured to transmit the text information to the exception handling module and the semantic recognition module being configured to transmit the semantic parsing result to the exception handling module; and the exception handling module is configured to be connected to the speech synthesis module and transmit intervention information.
Further, the speech recognition module comprises a signal processing and feature extraction unit, an acoustic model, a language model and a decoder, wherein the signal processing and feature extraction unit is configured to be connected with the acoustic model and transmit acoustic feature information, and the decoder is configured to be connected with the acoustic model and the language model and output a recognition result.
Further, the speech synthesis module comprises a text analysis unit, a prosody control unit and a synthesized speech unit, wherein the text analysis unit is configured to receive and process text information and transmit the processing result to the prosody control unit and the synthesized speech unit, the prosody control unit is configured to be connected to the synthesized speech unit and transmit pitch, duration, intensity, pause and intonation information, and the synthesized speech unit is configured to synthesize the output speech from the analysis result of the text analysis unit and the control parameters of the prosody control unit.
Further, the semantic recognition module comprises a domain labeling unit, an intention judging unit and an information extraction unit, wherein the domain labeling unit is connected to the intention judging unit and transmits domain information, the intention judging unit is connected to the information extraction unit and transmits user intention information, and the information extraction unit outputs the semantic parsing result.
Further, the exception handling module comprises an abnormality detection unit, a database query unit and an intervener unit, wherein the abnormality detection unit is configured to receive the outputs of the speech recognition module and the semantic recognition module and decide whether to take intervention measures, the database query unit is configured to receive the intervention signal of the abnormality detection unit and the semantic information of the semantic recognition module and to query and output an intervention message, and the intervener unit is configured to let a human intervener perform the necessary selection and modification of the intervention message output by the database query unit and finally output the reply message to the user.
The invention also provides a man-machine mixed interaction method based on the audio, which comprises the following steps:
step 1, providing a voice recognition module, a voice synthesis module, a semantic recognition module and an exception handling module;
step 2, the speech recognition module converts speech information into text information and outputs it to the semantic recognition module;
step 3, the semantic recognition module extracts the user's purpose and the corresponding key information from the text information;
step 4, the exception handling module judges, from the text information of the speech recognition module and the semantic information of the semantic recognition module, whether the current man-machine conversation is abnormal, and replies with an exception handling message.
Further, in step 2, the method specifically comprises the following steps:
step 2.1, extracting features from the input audio stream for processing by the acoustic model, while reducing the influence of environmental noise, channel and speaker factors on the features;
step 2.2, the decoder searches, according to the acoustic model, the language model and the dictionary, for the word string that outputs the audio stream with maximum probability, and takes it as the speech recognition result.
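Step 2.2 is the standard maximum a posteriori decoding rule. Written out (the patent does not spell out the formula), the decoder searches for the word string W* that maximizes the posterior probability of the observed acoustic features O, with the dictionary constraining how word strings expand into the phone sequences scored by the acoustic model:

```latex
W^{*} = \operatorname*{arg\,max}_{W} P(W \mid O)
      = \operatorname*{arg\,max}_{W} \underbrace{P(O \mid W)}_{\text{acoustic model}}\,\underbrace{P(W)}_{\text{language model}}
```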
Further, in step 3, the method specifically comprises the following steps:
step 3.1, marking the domain to which the current conversation belongs by using the symbolic keywords in the text information;
step 3.2, judging the user intention within that domain based on rules;
step 3.3, extracting the specific key information according to the domain and the user intention, in combination with rules.
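To make steps 3.1 to 3.3 concrete, here is a minimal rule-based sketch in Python; the keyword tables and the extraction template are illustrative assumptions, not taken from the patent:

```python
# A minimal sketch of the rule-based pipeline of step 3 (domain labeling ->
# intent judgment -> key-information extraction). The keyword tables and the
# template below are illustrative assumptions, not taken from the patent.
DOMAIN_KEYWORDS = {
    "navigation": ["go to", "navigate"],
    "music": ["play", "song"],
}
INTENT_RULES = {
    "navigation": [("navigate", ["go to", "navigate"])],
    "music": [("play_music", ["play", "song"])],
}

def parse(text):
    # Step 3.1: label the domain via symbolic keywords in the text.
    domain = next((d for d, kws in DOMAIN_KEYWORDS.items()
                   if any(kw in text for kw in kws)), "unknown")
    # Step 3.2: judge the user intention with rules specific to that domain.
    intent = next((name for name, kws in INTENT_RULES.get(domain, [])
                   if any(kw in text for kw in kws)), "unknown")
    # Step 3.3: extract key information with a preset template
    # (here: the text after the trigger phrase is taken as the slot value).
    key_info = {}
    if intent == "navigate" and "go to" in text:
        key_info["destination"] = text.split("go to", 1)[1].strip()
    return domain, intent, key_info

print(parse("i want to go to a playful place"))
# -> ('navigation', 'navigate', {'destination': 'a playful place'})
```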
Further, in step 4, the method specifically comprises the following steps:
step 4.1, the abnormality detection unit judges whether the current man-machine conversation is abnormal according to the text information of the speech recognition module and the semantic information of the semantic recognition module; if it is abnormal, the intervener unit takes over the man-machine conversation;
step 4.2, the database query unit queries the database according to the semantic information and obtains intervention messages with recommendation degrees; if a message's recommendation degree is high, that message is used to intervene directly, and if it is low, an intervener is asked to intervene manually;
step 4.3, when the machine algorithm cannot find an intervention message with a high recommendation degree, the intervener intervenes to select and modify an intervention message, which is then sent to the client.
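A minimal sketch of this routing decision follows; the numeric threshold is an assumption, since the patent only distinguishes "higher" and "lower" recommendation degrees:

```python
# A sketch of the step-4 routing: use the intervention message directly when
# its recommendation degree is high enough, otherwise hand the candidates to
# a human intervener. The 0.8 threshold is an assumption.
RECOMMENDATION_THRESHOLD = 0.8

def route_intervention(candidates, intervener_queue):
    """candidates: list of (message_text, recommendation_degree) tuples."""
    if candidates:
        text, degree = max(candidates, key=lambda c: c[1])
        if degree >= RECOMMENDATION_THRESHOLD:
            return text                    # step 4.2: machine intervenes directly
    intervener_queue.append(candidates)    # step 4.2/4.3: request manual intervention
    return None                            # the reply will come from the intervener
```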
Further, the key information comprises the dialogue domain and dialogue keywords, the dialogue keywords comprising content keywords and emotion keywords.
Compared with the prior art, the invention has the technical effects that:
1. Improved efficiency: the time during which an intervener waits for user input is fully utilized, so one intervener can serve several users at the same time, raising intervention efficiency.
2. Reduced cost: the intervention platform can be built with existing computers and servers, without purchasing the series of telecommunication devices a call center requires.
3. Rich working scenarios: because the intervener interface adopts a B/S (Browser/Server) architecture, the intervener only needs to open a browser and log into the corresponding website to perform intervention operations; there is no need to answer calls at a fixed station, and the intervention service can be performed on mobile terminals such as a tablet (PAD), a smartphone or a personal notebook.
4. Low network requirements: text transmission carries little data, which lowers the demand on the network; at the same time, the speech the user hears is synthesized locally and is not affected by network conditions.
5. Unified man-machine conversation experience: the intervener is transparent to the user, whose experience is that of talking to a sufficiently intelligent "machine" able to seamlessly continue the current man-machine conversation.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
Fig. 1 is a schematic diagram of an intervention mode of a conventional traffic center;
FIG. 2 is a block diagram of a system of the present invention;
FIG. 3 is a system flow diagram of a preferred embodiment of the present invention;
FIG. 4 is a diagram illustrating a role dialog flow according to a preferred embodiment of the present invention.
Detailed Description
The invention is realized by the following technical scheme:
as shown in fig. 2, the present invention relates to an audio-based human-machine dialog exception handling system, which comprises: speech recognition module, speech synthesis module, semantic recognition module and exception handling module, wherein: the voice recognition module is connected with the semantic recognition module and transmits text information corresponding to voice, the voice recognition module and the semantic recognition module are both connected with the exception processing module and respectively transmit the text information and a semantic analysis result, and the exception processing module is connected with the voice synthesis module and transmits intervention information.
The voice recognition module comprises: signal processing and feature extraction unit, acoustic model, language model and decoder, wherein: the signal processing and feature extracting unit is connected with the acoustic model and transmits acoustic feature information, and the decoder is connected with the acoustic model and the language model and outputs a recognition result to the outside.
The speech synthesis module comprises: a text analysis unit, a prosody control unit, and a synthesized speech unit, wherein: the text analysis unit receives and processes the text information, transmits the processing result to the rhythm control unit and the synthesized voice unit, the rhythm control unit is connected with the synthesized voice unit and transmits the pitch, the duration, the intensity, the pause, the intonation and other information of the target, and the synthesized voice unit receives the analysis result of the text analysis unit and the control parameters of the rhythm control unit and outputs the synthesized voice to the outside.
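As a concrete picture of that interface, here is a minimal sketch of the parameter set the prosody control unit could hand to the synthesized speech unit. The field names, types and units are assumptions for illustration; the patent only names pitch, duration, intensity, pause and intonation.

```python
# A sketch of the prosody-control interface: the prosody control unit fills
# these parameters and the synthesized speech unit combines them with the
# text analysis result. Field names and units are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProsodyParams:
    pitch_hz: float        # target fundamental frequency
    duration_scale: float  # relative lengthening/shortening of phones
    intensity_db: float    # target loudness
    pause_ms: int          # pause length at phrase boundaries
    intonation: str        # e.g. "declarative" or "interrogative"

def synthesize(text_analysis_result: str, prosody: ProsodyParams) -> bytes:
    """Placeholder for the synthesized speech unit: turns the text analysis
    result plus the prosody control parameters into audio samples."""
    raise NotImplementedError  # actual waveform generation is out of scope
```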
The semantic recognition module comprises a domain labeling unit, an intention judging unit and an information extraction unit, wherein: the domain labeling unit is connected to the intention judging unit and transmits domain information; the intention judging unit is connected to the information extraction unit and transmits user intention information; and the information extraction unit outputs the semantic parsing result.
The exception handling module comprises an abnormality detection unit, a database query unit and an intervener unit, wherein: the abnormality detection unit receives the outputs of the speech recognition module and the semantic recognition module and decides whether intervention measures are taken; the database query unit receives the intervention signal of the abnormality detection unit and the semantic information of the semantic recognition module and queries and outputs intervention messages; and the intervener unit lets a human intervener perform the necessary selection and modification of the intervention messages output by the database query unit and finally outputs the reply message to the user.
The invention relates to a man-machine conversation exception handling method of the system, which specifically comprises the following steps:
step 1, providing a voice recognition module, a voice synthesis module, a semantic recognition module and an exception handling module.
Step 2, the speech recognition module converts the speech information into text information and outputs it to the semantic recognition module. The specific steps comprise:
2.1 front-end processing of the audio stream extracts features from the input signal for acoustic model processing, while reducing as much as possible the influence of environmental noise, channel, speaker and other factors on the features.
2.2 the decoder searches, based on the acoustic model, the language model and the dictionary, for the word string that outputs the input signal with maximum probability, and takes it as the speech recognition result.
Step 3, the semantic recognition module extracts the user's purpose and the corresponding key information from the text information. The specific steps comprise:
3.1 the domain to which the current conversation belongs is marked using the symbolic keywords in the text information.
3.2 within that specific domain, the user's intention is judged based on rules.
3.3 according to the domain and the user intention, and in combination with rules such as a preset template, the specific key information is extracted.
Step 4, the exception handling module judges, from the text information of the speech recognition module and the semantic information of the semantic recognition module, whether the current man-machine conversation is abnormal, and performs exception handling and message reply. The specific steps comprise:
4.1 the abnormality detection unit judges whether the current man-machine conversation is abnormal according to the text information of the speech recognition module and the semantic information of the semantic recognition module. If no abnormality is detected, the local client continues to handle the conversation; if an abnormality occurs, the intervention server takes over the man-machine conversation.
4.2 the database query unit queries the database according to the semantic information and obtains recommended intervention messages; if a message's recommendation degree is high, that message is used to intervene directly, and if it is low, an intervener is asked to intervene manually.
4.3 when the machine algorithm cannot find an intervention message with a high recommendation degree, the intervener intervenes to select and modify an intervention message, which is then sent to the client.
During man-machine conversation exception handling, after the user's speech input has passed through the machine's speech recognition and semantic parsing, the recognition result and the parsing result are transmitted in text form to the intervener, who, after receiving them, can choose to send a dialogue message or a command message. Dialogue messages are transmitted to the machine as text, after which the speech synthesis (TTS) system synthesizes speech and plays it to the user; command messages are commands executed directly by the machine.
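The two message kinds thus take different paths on the machine side. Below is a minimal dispatch sketch; the JSON envelope and the field names (`type`, `text`, `command`, `params`) are illustrative assumptions, not part of the patent.

```python
# Dialogue messages go to TTS; command messages are executed directly.
# The message schema is an assumption for illustration.
import json

def handle_intervention_message(raw_text, tts, executor):
    msg = json.loads(raw_text)
    if msg["type"] == "dialogue":
        tts.speak(msg["text"])                  # synthesized and played to the user
    elif msg["type"] == "command":
        executor.run(msg["command"], msg.get("params", {}))  # e.g. navigate to a POI
```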
As shown in fig. 3 and fig. 4, the embodiment comprises three stages, introduced in turn below: user input -> intervention message generation -> client pushes the intervention message.
1) user input
While the user speaks, the speech recognition system converts the user's audio into text, semantic parsing is performed on that text (the parsing result includes the user's current dialogue domain, the key information of the service request, and so on), and finally the text and the parsing result are transmitted in text form to the exception handling module through the POST method of the HTTP protocol.
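A sketch of this hand-off follows; the endpoint URL and the payload field names are assumptions, since the patent only specifies text over HTTP POST.

```python
# Stage 1: POST the recognition text and the semantic parse to the
# exception handling module. URL and field names are assumptions.
import requests

payload = {
    "user_id": "A",
    "asr_text": "I want to go to a playful place",
    "semantics": {"domain": "navigation", "intent": "navigate",
                  "slots": {"destination_tag": "playful"}},
}
resp = requests.post("http://exception-handler.example/api/requests",
                     json=payload, timeout=5)
resp.raise_for_status()   # the module now runs anomaly detection on this input
```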
2) Intervention message generation
When an abnormality occurs, the exception handling module queries the database using the text of the speech recognition and the semantic slots of the semantic recognition to obtain candidate intervention messages. If a candidate's recommendation degree is high, it is used for intervention directly; if it is low, an intervener is asked to intervene manually. On the interface the intervener sees auxiliary data provided by the exception handling module, such as the recognition result of the user's input and the semantic parsing result, and with this information can screen and modify the candidate intervention messages more accurately and quickly. Intervention messages are divided into dialogue messages and command messages; both are transmitted as text over a uniform WebSocket protocol and differ only in their content and in how the machine processes them.
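A sketch of this transport, using the `websockets` library; the URI and the message schema are assumptions, since the patent only specifies text messages over a uniform WebSocket protocol:

```python
# Stage 2: both message kinds travel as text frames over one WebSocket
# connection. URI and schema are assumptions for illustration.
import asyncio
import json
import websockets

async def send_intervention(uri, message):
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps(message))   # small text frame, low network load
        return await ws.recv()               # client's receipt confirmation

ack = asyncio.run(send_intervention(
    "ws://client.example/intervene",
    {"type": "dialogue", "text": "What kind of entertainment would you like?"}))
```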
3) Client push intervention messages
Immediately upon receiving an intervention message, the client returns a receipt confirmation to the intervener and caches the message in a message queue. The client monitors the current man-machine conversation state and tries to take a message from the queue and push it to the user when conditions are met. The push opportunities are: (1) when an intervention message arrives, and (2) when the broadcast of a TTS-synthesized voice message finishes. The conditions that must hold are: (1) the message queue is not empty, and (2) the client's audio player is currently idle. If the push succeeds, a "message pushed" confirmation is returned to the intervener.
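Putting the queue, the two push opportunities and the two push conditions together, a minimal client-side sketch (the `player`, `tts` and `server` collaborator interfaces are assumptions):

```python
# Stage 3: queue on arrival, confirm receipt at once, and push only when the
# queue is non-empty and the audio player is idle.
from collections import deque

class InterventionClient:
    def __init__(self, player, tts, server):
        self.queue = deque()
        self.player = player    # exposes is_idle() -> bool
        self.tts = tts          # exposes speak(text)
        self.server = server    # exposes ack(kind, msg)

    def on_message(self, msg):
        """Called when an intervention message arrives."""
        self.queue.append(msg)
        self.server.ack("received", msg)   # immediate receipt confirmation
        self.try_push()                    # push opportunity 1: message arrival

    def on_tts_finished(self):
        """Called when a TTS broadcast finishes."""
        self.try_push()                    # push opportunity 2: player freed

    def try_push(self):
        # Push conditions: queue not empty AND audio player currently idle.
        if self.queue and self.player.is_idle():
            msg = self.queue.popleft()
            self.tts.speak(msg["text"])
            self.server.ack("pushed", msg)  # confirmation after successful push
```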
For example:
1. User A issues a voice instruction: "I want to go to a playful place."
2. The speech recognition module converts the voice input into text.
3. The semantic parsing module processes the text and obtains the user intention "navigation" with the label "playful" for the navigation destination.
4. The abnormality detection unit in the exception handling module receives user A's service request, comprising the complete speech recognition result "I want to go to a playful place" and the semantic parsing results "navigation" and "playful", and detects that the current conversation state is abnormal.
5. The database query unit in the exception handling module queries the database according to "navigation" and "playful" and obtains candidate messages such as "Would you like to go to the playful snack street in Suzhou?" and "Found 5 playful places for you." Both messages have a low recommendation degree, so manual intervention by the intervener unit is required. Using the database query results, the semantic parsing result and the speech recognition text provided by the exception handling module, the intervener selects and modifies the intervention message, changing it to "What kind of entertainment would you like?", and this text message is sent to the user.
6. After receiving the intervention message, the client stores the intervention message in a message queue, sends feedback that the message is received to the exception handling module, and tries to push the message.
7. Once the conditions are met, the speech synthesis system synthesizes and broadcasts the intervention message; the user hears the audio "What kind of entertainment would you like?", and the client sends a "message pushed" feedback to the exception handling module.
8. The user makes a further speech input: "I want to sing."
9. The ASR system converts the speech input into text.
10. Semantic parsing obtains the user intention "navigation" with the navigation target "KTV".
11. The abnormality detection unit obtains user A's specific service requirement, comprising the complete speech recognition result "I want to sing" and the semantic parsing results "navigation" and "KTV".
12. The database query unit searches the database according to "navigation", "KTV" and the user's related information and obtains a candidate intervention message "xxx is recommended; would you like to go?" with a high recommendation degree; it therefore bypasses the intervener unit and sends that text message directly.
13. The user confirms that he wants to go.
14. The exception handling system pushes a command-type intervention message containing the command type "navigate" and the POI information of the destination.
15. The client takes the command-type "navigate" message and the corresponding POI information out of the message queue and performs the navigation operation; the client sends a "message pushed" feedback to the exception handling module, and the interaction ends.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that those skilled in the art can obtain, based on the prior art, through logical analysis, reasoning or limited experiments in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (8)

1. An audio-based man-machine hybrid interaction system, characterized by comprising a speech recognition module, a speech synthesis module, a semantic recognition module and an exception handling module, wherein the speech recognition module is connected to the semantic recognition module and transmits the text information corresponding to the speech; the exception handling module is connected to the speech recognition module and the semantic recognition module, the speech recognition module being configured to transmit the text information to the exception handling module and the semantic recognition module being configured to transmit the semantic parsing result to the exception handling module; the exception handling module is configured to be connected to the speech synthesis module and transmit intervention information; and the speech synthesis module is configured to convert the intervention information transmitted by the exception handling module into speech, play it to the user, and await the user's further feedback;
the abnormality processing module comprises an abnormality detection unit, a database query unit and an intervener unit, wherein the abnormality detection unit is configured to receive the outputs of the voice recognition module and the semantic recognition module and decide whether to take intervention measures, the database query unit is configured to receive an intervention signal of the abnormality detection unit, receive semantic information of the semantic recognition module, query and output the intervention information with high recommendation degree to the voice synthesis module; the intervener unit is configured to perform necessary preference and modification on the intervention information with low recommendation degree output by the database query unit by using an intervener, and then transmit the intervention information to the voice synthesis module to obtain a reply message to be further fed back by the user.
2. The audio-based human-computer hybrid interaction system according to claim 1, wherein the speech recognition module comprises a signal processing and feature extraction unit, an acoustic model, a language model, and a decoder, wherein the signal processing and feature extraction unit is configured to be connected to the acoustic model and to transmit acoustic feature information, and the decoder is configured to be connected to the acoustic model and the language model and to output a recognition result.
3. The audio-based human-computer hybrid interactive system according to claim 1, wherein the speech synthesis module comprises a text analysis unit, a prosody control unit and a synthesized speech unit, wherein the text analysis unit is configured to receive and process text information, transmit the processing result to the prosody control unit and the synthesized speech unit, the prosody control unit is configured to be connected to the synthesized speech unit and transmit pitch, duration, intensity, pause and intonation information, and the synthesized speech unit is configured to receive the analysis result of the text analysis unit and the control parameters of the prosody control unit to synthesize the output speech.
4. The audio-based human-computer hybrid interaction system according to claim 1, wherein the semantic recognition module comprises a domain labeling unit, an intention judging unit, and an information extracting unit, wherein the domain labeling unit is configured to be connected with the intention judging unit and transmit domain information, the intention judging unit is configured to be connected with the information extracting unit and transmit user intention information, and the information extracting unit outputs a result of semantic analysis.
5. A man-machine mixed interaction method based on audio is characterized by comprising the following steps:
step 1, providing a voice recognition module, a voice synthesis module, a semantic recognition module and an exception handling module;
step 2, the voice recognition module converts voice information into character information and outputs the character information to the semantic recognition module;
step 3, the semantic recognition module extracts the user purpose and corresponding key information from the character information;
step 4, the exception handling module judges, from the text information of the speech recognition module and the semantic information of the semantic recognition module, whether the current man-machine conversation is abnormal, and replies with exception handling information;
wherein, in the step 4, the method specifically comprises the following steps:
step 4.1, the abnormality detection unit judges whether the current man-machine conversation is abnormal according to the text information of the speech recognition module and the semantic information of the semantic recognition module; if it is abnormal, the intervener unit takes over the man-machine conversation;
step 4.2, the database query unit queries the database according to the semantic information and obtains intervention information with recommendation degrees; if the recommendation degree of the intervention information is high, the intervention information is used to intervene directly and is sent to the client, and the method returns to step 2 to await the user's further feedback; if the recommendation degree is low, an intervener is requested to intervene manually;
step 4.3, when the machine algorithm cannot find intervention information with a high recommendation degree, the intervener intervenes to select and modify the intervention information, the modified intervention information is then sent to the client, and the method returns to step 2 to await the user's further feedback.
6. The audio-based human-computer hybrid interaction method according to claim 5, wherein in the step 2, the method specifically comprises the following steps:
step 2.1, extracting features from the input audio stream for processing by the acoustic model, while reducing the influence of environmental noise, channel and speaker factors on the features;
step 2.2, the decoder searches, according to the acoustic model, the language model and the dictionary, for the word string that outputs the audio stream with maximum probability, and takes it as the speech recognition result.
7. The audio-based human-computer hybrid interaction method according to claim 5, wherein in step 3, the method specifically comprises the following steps:
step 3.1, marking the domain to which the current conversation belongs by using the symbolic keywords in the text information;
step 3.2, judging the user intention within that domain based on rules;
step 3.3, extracting the specific key information according to the domain and the user intention, in combination with rules.
8. The audio-based human-computer hybrid interaction method according to claim 5 or 7, wherein the key information comprises a dialogue domain and dialogue keywords, the dialogue keywords comprising content keywords and emotion keywords.
CN201610791966.0A 2016-08-31 2016-08-31 Man-machine mixed interaction system and method based on audio Active CN106409283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610791966.0A CN106409283B (en) 2016-08-31 2016-08-31 Man-machine mixed interaction system and method based on audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610791966.0A CN106409283B (en) 2016-08-31 2016-08-31 Man-machine mixed interaction system and method based on audio

Publications (2)

Publication Number Publication Date
CN106409283A CN106409283A (en) 2017-02-15
CN106409283B (en) 2020-01-10

Family

ID=58001464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610791966.0A Active CN106409283B (en) 2016-08-31 2016-08-31 Man-machine mixed interaction system and method based on audio

Country Status (1)

Country Link
CN (1) CN106409283B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107204185B (en) * 2017-05-03 2021-05-25 深圳车盒子科技有限公司 Vehicle-mounted voice interaction method and system and computer readable storage medium
CN107122807B (en) * 2017-05-24 2021-05-21 努比亚技术有限公司 Home monitoring method, server and computer readable storage medium
CN107733780B (en) * 2017-09-18 2020-07-03 上海量明科技发展有限公司 Intelligent task allocation method and device and instant messaging tool
CN109697226A (en) * 2017-10-24 2019-04-30 上海易谷网络科技股份有限公司 Text silence seat monitoring robot interactive method
CN107992587A (en) * 2017-12-08 2018-05-04 北京百度网讯科技有限公司 A kind of voice interactive method of browser, device, terminal and storage medium
CN110069607B (en) * 2017-12-14 2024-03-05 株式会社日立制作所 Method, apparatus, electronic device, and computer-readable storage medium for customer service
US10983526B2 (en) 2018-09-17 2021-04-20 Huawei Technologies Co., Ltd. Method and system for generating a semantic point cloud map
CN110970017B (en) * 2018-09-27 2023-06-23 北京京东尚科信息技术有限公司 Man-machine interaction method and system and computer system
CN111125384B (en) * 2018-11-01 2023-04-07 阿里巴巴集团控股有限公司 Multimedia answer generation method and device, terminal equipment and storage medium
CN110602334A (en) * 2019-09-03 2019-12-20 上海航动科技有限公司 Intelligent outbound method and system based on man-machine cooperation
CN110926493A (en) * 2019-12-10 2020-03-27 广州小鹏汽车科技有限公司 Navigation method, navigation device, vehicle and computer readable storage medium
CN111540353B (en) * 2020-04-16 2022-11-15 重庆农村商业银行股份有限公司 Semantic understanding method, device, equipment and storage medium
CN112509575B (en) * 2020-11-26 2022-07-22 上海济邦投资咨询有限公司 Financial consultation intelligent guiding system based on big data
CN112735427B (en) * 2020-12-25 2023-12-05 海菲曼(天津)科技有限公司 Radio reception control method and device, electronic equipment and storage medium
CN116453540B (en) * 2023-06-15 2023-08-29 山东贝宁电子科技开发有限公司 Underwater frogman voice communication quality enhancement processing method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920948A (en) * 2005-08-24 2007-02-28 富士通株式会社 Voice recognition system and voice processing system
CN101276584A (en) * 2007-03-28 2008-10-01 株式会社东芝 Prosody-pattern generating apparatus, speech synthesizing apparatus, and computer program product and method thereof
CN102509483A (en) * 2011-10-31 2012-06-20 苏州思必驰信息科技有限公司 Distributive automatic grading system for spoken language test and method thereof
CN102982799A (en) * 2012-12-20 2013-03-20 中国科学院自动化研究所 Speech recognition optimization decoding method integrating guide probability
CN104678868A (en) * 2015-01-23 2015-06-03 贾新勇 Business and equipment operation and maintenance monitoring system
CN105227790A (en) * 2015-09-24 2016-01-06 北京车音网科技有限公司 A kind of voice answer method, electronic equipment and system
CN105723362A (en) * 2013-10-28 2016-06-29 余自立 Natural expression processing method, processing and response method, device, and system

Also Published As

Publication number Publication date
CN106409283A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN106409283B (en) Man-machine mixed interaction system and method based on audio
KR102108500B1 (en) Supporting Method And System For communication Service, and Electronic Device supporting the same
US20060235694A1 (en) Integrating conversational speech into Web browsers
CN111128126A (en) Multi-language intelligent voice conversation method and system
CN101576901B (en) Method for generating search request and mobile communication equipment
CN102196207A (en) Method, device and system for controlling television by using voice
CN101207655A (en) Method and system switching between voice and text exchanging forms in a communication conversation
US11404052B2 (en) Service data processing method and apparatus and related device
CN102724309A (en) Vehicular voice network music system and control method thereof
KR20180091707A (en) Modulation of Packetized Audio Signal
CN105117391A (en) Translating languages
JP2018510407A (en) Q & A information processing method, apparatus, storage medium and apparatus
CN111833875B (en) Embedded voice interaction system
KR20140112360A (en) Vocabulary integration system and method of vocabulary integration in speech recognition
CN108882101B (en) Playing control method, device, equipment and storage medium of intelligent sound box
CN105206272A (en) Voice transmission control method and system
CN106991106A (en) Reduce as the delay caused by switching input mode
CN110119514A (en) The instant translation method of information, device and system
CN102847325A (en) Toy control method and system based on voice interaction of mobile communication terminal
CN112866086A (en) Information pushing method, device, equipment and storage medium for intelligent outbound
CN111094924A (en) Data processing apparatus and method for performing voice-based human-machine interaction
CN116431316B (en) Task processing method, system, platform and automatic question-answering method
JP2022101663A (en) Human-computer interaction method, device, electronic apparatus, storage media and computer program
CN111554280A (en) Real-time interpretation service system for mixing interpretation contents using artificial intelligence and interpretation contents of interpretation experts
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200619

Address after: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: 200240 Dongchuan Road, Shanghai, No. 800, No.

Patentee before: SHANGHAI JIAO TONG University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20201105

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: AI SPEECH Ltd.

Address before: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

TR01 Transfer of patent right
CP01 Change in the name or title of a patent holder

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee before: AI SPEECH Ltd.

CP01 Change in the name or title of a patent holder