CN111524508A - Voice conversation system and voice conversation implementation method

Voice conversation system and voice conversation implementation method

Info

Publication number
CN111524508A
Authority
CN
China
Prior art keywords
voice
data
server
client
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910108497.1A
Other languages
Chinese (zh)
Inventor
王欣
马天泽
林锋
邵鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NIO Co Ltd
Original Assignee
NIO Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NIO Co Ltd
Priority to CN201910108497.1A
Publication of CN111524508A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L 69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L 69/162 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a voice conversation implementation method and a voice conversation implementation system. The method implements a voice conversation between a client and a server and comprises the following steps: a first transmission step of transmitting voice data from the client to the server; a conversion step in which the server performs voice recognition and semantic understanding on the voice data and generates text data; and a second transmission step of transmitting the text data from the server to the client. According to the invention, the client needs only a single communication with the server to have the voice data recognized and semantically understood, and the voice recognition accuracy in specific scenes can be improved.

Description

Voice conversation system and voice conversation implementation method
Technical Field
The present invention relates to human-machine interaction technology, and in particular to a voice dialog system and a voice dialog implementation method.
Background
NLU (natural language understanding) and ASR (automatic speech recognition) are important components of a dialog system. ASR converts the user's speech input into text; NLU performs semantic understanding on that text and recognizes the user's intention, so that the system can carry out the corresponding task and give a spoken response.
In the prior art, the NLU and ASR functions are independent of each other, each being provided in a separate module. Fig. 5 is a block diagram of the architecture of a current voice dialog system.
As shown in fig. 5, the communication process of the current voice dialog system involves two communications. In the first communication, the client sends the voice input to an ASR system, which converts the voice data into text and returns it to the client; in the second communication, the client sends the obtained text to the NLU system, which performs semantic understanding to obtain a corresponding response and returns the response to the client.
The client therefore needs to communicate twice to obtain a response, which complicates the communication flow.
Disclosure of Invention
In view of the above problems, the present invention is directed to a voice dialog system and a voice dialog implementation method capable of simplifying the communication flow.
The invention discloses a voice dialog implementation method for implementing a voice dialog between a client and a server, comprising the following steps:
a first transmission step of transmitting voice data from the client to the server;
a conversion step in which the server performs voice recognition and semantic understanding on the voice data and generates text data; and
a second transmission step of transmitting the text data from the server to the client.
Optionally, in the first transmission step, communication between the client and the server is established by means of a long socket connection.
Optionally, the converting step comprises the sub-steps of:
extracting features from the voice data and inputting the extracted features into an acoustic model to obtain a score sequence;
searching in a static decoder based on the score sequence to obtain text data corresponding to the voice data, wherein corpus data is preset in the static decoder, and the corpus data comprises scene-based scene corpus data; and
post-processing the text data output by the decoder to obtain text data in a predetermined format.
Optionally, in the process of searching in the static decoder based on the score sequence to obtain the text data corresponding to the speech data, the static decoder searches in the scene corpus data only when matching with data in the scene corpus data is required.
Optionally, in the first transmission step, decision supplementary information for the scene decision is further sent to the server together with the voice data.
The present invention provides a voice conversation implementation system for implementing a voice conversation between a client and a server, comprising: a client and a server, wherein the server is connected with the client,
wherein the client is used for transmitting voice data to the server and receiving text data from the server,
the server is used for carrying out voice recognition and semantic understanding on the voice data, generating text data and transmitting the text data to the client.
Optionally, the communication between the client and the server is established in a socket long connection manner.
Optionally, the server includes:
a voice recognizer for extracting features from the voice data and inputting the extracted features into an acoustic model to obtain a score sequence;
a static decoder for searching based on the score sequence to obtain text data corresponding to the voice data, wherein corpus data is preset in the static decoder, and the corpus data comprises scene-based scene corpus data; and
an output module for post-processing the text data output by the decoder to obtain text data in a predetermined format.
Optionally, in the process in which the static decoder searches to obtain the text data corresponding to the voice data, the static decoder searches the scene corpus data only when it needs to match against data in the scene corpus data.
Optionally, the client sends decision supplementary information for the scene decision to the server together with the voice data.
The computer-readable medium of the present invention has a computer program stored thereon, which, when executed by a processor, implements the above voice dialog implementation method.
The computer device of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above voice dialog implementation method when executing the computer program.
As described above, according to the voice dialog system and the voice dialog implementation method of the present invention, by integrating voice recognition and semantic understanding into one service, the server can perform voice recognition and semantic understanding on the voice data and reply directly to the client, so that the client needs only a single communication with the server. Moreover, the voice recognition accuracy in specific scenes can be improved by adding the two semantic-understanding processes of scene decision and scene decoding network search. Furthermore, the client and the server communicate over a long socket connection whose state is maintained by the dialog state, the connection being kept until the dialog ends, which avoids the resource waste caused by frequently establishing new connections.
Other features and advantages of the methods and apparatus of the present invention will become more apparent from the accompanying drawings and from the following detailed description of embodiments that illustrate certain principles of the invention.
Drawings
Fig. 1 is a flowchart showing a voice conversation realization method according to an embodiment of the present invention.
Fig. 2 is a flowchart showing a specific procedure of the conversion step S200.
Fig. 3 shows the data protocol for communication between the client 100 and the server 200.
Fig. 4 is a block diagram showing an architecture of a voice conversation realization system according to an embodiment of the present invention.
Fig. 5 is a block diagram of the architecture of a current voice dialog system.
Detailed Description
The following description is of some of the several embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
Fig. 1 is a flowchart showing a voice conversation realization method according to an embodiment of the present invention.
As shown in fig. 1, the voice dialog implementation method according to an embodiment of the present invention implements a voice dialog between a client 100 and a server 200 and includes the following steps:
first transmission step S100: transmitting voice data from the client 100 to the server 200;
a conversion step S200: the server 200 performs voice recognition and semantic understanding on the voice data and generates text data; and
second transmission step S300: the text data is transmitted from the server 200 to the client 100.
According to the present application, both speech recognition and semantic understanding are accomplished in the conversion step S200. Thus, instead of requiring two separate communications with remote services as in the prior art, the client 100 can obtain the text data from the server 200 simply by sending the voice data to the server 200 once.
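As an illustration of this single round trip, the following minimal client-side sketch sends one utterance and reads back the understood text. The host name, port, and raw byte framing are placeholders for illustration only; the actual wire format is described later with reference to fig. 3.

```python
# Minimal client-side sketch of the single round trip (steps S100 and S300).
# The host, port, and raw framing are illustrative placeholders; this is a
# single-shot simplification, while the patent's long-connection protocol with
# an explicit end flag is sketched later with reference to fig. 3.
import socket

def ask_server(voice_bytes: bytes, host: str = "dialog.example.com", port: int = 9000) -> str:
    """Send one utterance to the server and receive the recognized-and-understood text."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(voice_bytes)          # first transmission step S100
        sock.shutdown(socket.SHUT_WR)      # tell the server the voice data is complete
        chunks = []
        while True:                        # second transmission step S300: read the text reply
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```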
The specific contents of the conversion step S200 will be explained here.
Fig. 2 is a flowchart showing a specific procedure of the conversion step S200.
As shown in fig. 2, the converting step S200 includes the following sub-steps: step S201, step S202, and step S203.
Next, these steps will be specifically described.
Step S201: features are extracted from the voice data, and the extracted features are input into an acoustic model to obtain a score sequence of each state at each moment. The feature extraction, the acoustic model, and the computation of the score sequence can use conventional processing steps; since they are not the key points of the present invention, details are not repeated here.
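The sketch below only illustrates the shape of this step under common assumptions (framed log-spectral features and a frame-level acoustic model); it is not the patent's implementation, which treats these as conventional processing steps.

```python
# Schematic sketch of step S201: framing, simple log-power spectral features,
# and a frame-level acoustic model producing one score per state per time step.
# The feature type and the `model` object are assumptions for illustration.
import numpy as np

def extract_features(waveform: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split the waveform into overlapping frames and compute a log-power spectrum per frame."""
    frames = [waveform[i:i + frame_len]
              for i in range(0, len(waveform) - frame_len + 1, hop)]
    return np.stack([np.log(np.abs(np.fft.rfft(f)) ** 2 + 1e-10) for f in frames])

def acoustic_scores(features: np.ndarray, model) -> np.ndarray:
    """Return the score sequence: a (num_frames x num_states) matrix of acoustic scores."""
    return np.stack([model.predict(frame) for frame in features])
```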
Step S202: based on the resulting score sequence, a search is performed, for example in a static decoder (WFST), to obtain the result corresponding to the score sequence, referred to herein as the search result. The static decoder comprises a state probability model and a language model, where the language model is generated by training on collected corpus data and a dictionary. Based on the score sequence, the highest-scoring path that satisfies the language model constraints is searched for in the probability model, so as to obtain the optimal solution, i.e. the result that best matches the score sequence.
It should be noted that the corpus in the language model according to the present application includes scene corpus information, which may relate to various factors such as the user's address book, the user's specific speech habits, place names, and so on; any information helpful for understanding the semantics of a specific user may be covered here.
Based on this, step S202 is further explained as follows: during the search in the static decoder based on the score sequence, not only is automatic speech recognition performed, but more accurate information is also obtained from the corpus information, such as the scene corpus information, during recognition, so as to give the best matching search result, i.e. the text that best matches the speech input. It should be understood that not every voice input needs to be matched against the scene corpus information: if a definite result can be obtained without using the scene corpus information, the scene corpus information need not be searched; in the example below, however, the scene corpus information is consulted because the semantics are ambiguous.
Therefore, compared with the prior art shown in FIG. 5, in which ASR performs speech recognition independently of NLU, the present application fuses NLU and ASR together, so that the semantic understanding part of NLU can already be used during the ASR stage, giving a search result, i.e. the original text data, that corresponds more accurately to the speech input.
By way of example, the most basic speech recognition may be performed first, and a further search may then be performed in the scene corpus information as described above to obtain the optimal solution; this is particularly beneficial when the speech admits multiple interpretations.
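The following sketch illustrates this two-stage idea in simplified form: the general search produces ranked candidates, and the scene corpus is consulted only when the top candidate is not confident enough. The `best_paths` helper, the confidence threshold, and the substring match are illustrative stand-ins for the WFST search, not the patent's interfaces.

```python
# Simplified illustration of step S202: general decoding first, scene corpus
# only when the result is ambiguous. `language_model.best_paths`, the threshold
# and the substring match are illustrative stand-ins for the WFST search.
def decode(score_sequence, language_model, scene_corpus, confidence_threshold=0.8):
    """Return the best-matching text; consult the scene corpus only for ambiguous inputs."""
    candidates = language_model.best_paths(score_sequence)   # [(text, confidence), ...], best first
    best_text, confidence = candidates[0]
    if confidence >= confidence_threshold:
        return best_text                                     # no scene lookup needed
    for text, _ in candidates:                               # ambiguous: re-check against scene corpus
        if any(entry in text for entry in scene_corpus):
            return text
    return best_text
```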
Step S203: the search result (the original text data) of step S202 is post-processed to obtain a text result in a predetermined format.
Here, the voice dialog implementation method of the present invention is explained with an example.
For example, suppose that the address book of user A contains a contact named Chen Yi written with one pair of characters, while the address book of user B contains a contact whose name is pronounced the same but written differently, and that the address books of users A and B are used as scene corpora. When both user A and user B say "call Chen Yi", the score sequences of their voice inputs are obtained first (corresponding to step S201); then a search is performed in the static decoder based on the score sequences, and because of the scene corpora the search result for user A is accurately resolved to the contact in user A's address book, while the search result for user B is resolved to the contact in user B's address book (corresponding to step S202). Finally, the search results are post-processed to obtain text results in the predetermined format (corresponding to step S203), and each text result is sent by the server to the client.
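Reusing the decode() sketch above, this example can be played through as follows. The concrete written forms of the homophone (陈一 versus 陈毅) are assumptions, since the translated text does not preserve the original characters.

```python
# Illustrative walk-through of the example; the contact names are assumed homophones.
class StubLanguageModel:
    def best_paths(self, score_sequence):
        # Two equally plausible transcriptions of "call Chen Yi" (low confidence).
        return [("打电话给陈一", 0.5), ("打电话给陈毅", 0.5)]

lm = StubLanguageModel()
scores = []                                          # placeholder score sequence
print(decode(scores, lm, scene_corpus={"陈一"}))      # user A's address book -> 打电话给陈一
print(decode(scores, lm, scene_corpus={"陈毅"}))      # user B's address book -> 打电话给陈毅
```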
In particular, in the present application, the client 100 and the server 200 communicate by establishing a long socket connection. A socket is one endpoint of a bidirectional communication connection through which two programs on a network exchange data. A long socket connection means that the client and the server use only one socket object throughout the communication process and keep the socket connection open for a long time. The data protocol for communication between the client 100 and the server 200 is shown in fig. 3.
As shown in fig. 3, the communication data includes a header portion, a voice data portion, and an end flag.
The header portion includes the header length and the supplementary information needed for the scene decision in semantic understanding, such as vehicle ID information, the current location, the Bluetooth connection status, the current navigation status, and so on (not mentioned in the example of fig. 2, but in practice this information can also serve as scene corpus information). For example, when the user wants to search for a nearby restaurant, the client 100 sends a query to the server 200: according to the data protocol shown in fig. 3, the client 100 transmits the current location information to the server 200 in the header and then transmits the audio data corresponding to "search for a restaurant in the vicinity", and after recognizing the intention the server 200 can perform the search using the current location information from the header.
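A possible framing of the fig. 3 protocol is sketched below. The exact byte layout is not specified in the text, so the 4-byte length prefix, the JSON-encoded header, and the single-byte end flag are assumptions for illustration.

```python
# Hypothetical packing of the fig. 3 frame: header length + header (decision
# supplementary information) + voice data + end flag. The byte layout is assumed.
import json
import struct

END_FLAG = b"\x00"   # assumed single-byte end marker

def build_request(voice_bytes: bytes, supplementary: dict) -> bytes:
    """Pack one request frame for the long socket connection."""
    header = json.dumps(supplementary).encode("utf-8")
    return struct.pack(">I", len(header)) + header + voice_bytes + END_FLAG

frame = build_request(
    voice_bytes=b"...pcm audio for 'search for a restaurant in the vicinity'...",
    supplementary={
        "vehicle_id": "VIN-123",            # example decision supplementary information
        "location": [31.23, 121.47],
        "bluetooth_connected": True,
        "navigating": False,
    },
)
```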
Moreover, the socket connection state is maintained through the dialog state, and the connection is kept until the dialog ends, which avoids the resource waste caused by frequently establishing new connections.
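The sketch below shows one way to tie the socket lifetime to the dialog state: the connection is reused for every turn and closed only when the dialog is finished. All collaborators are passed in as parameters because the patent does not name concrete interfaces for them.

```python
# Sketch of a server-side session loop over one long socket connection.
# `read_frame`, `recognize_and_understand`, and `dialogue_finished` are
# illustrative parameters, not interfaces named by the patent.
def serve_session(conn, read_frame, recognize_and_understand, dialogue_finished):
    """Handle one long-lived connection; close it only when the dialog state ends."""
    session_state = {}
    while not dialogue_finished(session_state):
        request = read_frame(conn)                    # header + voice data + end flag
        if request is None:                           # client closed the connection
            break
        reply_text = recognize_and_understand(request, session_state)
        conn.sendall(reply_text.encode("utf-8"))      # second transmission step S300
    conn.close()                                      # release the socket once the dialog is over
```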
The voice dialog implementation method according to the present invention has been explained above; the voice dialog implementation system according to the present invention is explained next.
Fig. 4 is a block diagram showing an architecture of a voice conversation realization system according to an embodiment of the present invention.
As shown in fig. 4, the voice conversation realization system according to an embodiment of the present invention is used to realize a voice conversation between a client 100 and a server 200.
The client 100 is used to transmit voice data to the server 200 and receive text data from the server 200. The server 200 is configured to perform speech recognition and semantic understanding on the voice data, generate text data, and transmit the text data to the client 100.
The client 100 includes:
a sending module 110, configured to send voice data; and
a receiving module 120, configured to receive text data.
The server 200 includes:
the voice recognizer 210 is configured to perform feature extraction on the voice data and input the voice data into an acoustic model to obtain a score sequence of each state at each time;
a static decoder 220 that performs a search based on the score sequence to obtain text data corresponding to the speech data, wherein corpus data including scene corpus data based on a scene is preset in the static decoder; and
an output module 230 for post-processing the search result to obtain a text result in a predetermined format, wherein the output module 230 is, for example, a communication component.
In the process of searching by the static decoder 220 to obtain the text data corresponding to the speech data, the static decoder 220 searches the scene corpus data only when it needs to match with the data in the scene corpus data.
Preferably, a long socket connection is established between the client 100 and the server 200, and communication is performed using the data protocol shown in fig. 3. The client 100 transmits decision supplementary information for the scene decision to the server 200 together with the voice data.
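Putting the three server-side modules together, a minimal wiring sketch might look as follows. The class and method names are illustrative; the patent names the modules (210, 220, 230) but not their programming interfaces.

```python
# Illustrative wiring of the server 200: recognizer 210 -> static decoder 220 -> output module 230.
# Class and method names are assumptions; only the module roles come from the text.
class DialogServer:
    def __init__(self, recognizer, static_decoder, output_module):
        self.recognizer = recognizer          # speech recognizer 210
        self.decoder = static_decoder         # static decoder 220 (holds the scene corpus data)
        self.output = output_module           # output module 230

    def handle(self, voice_bytes: bytes, supplementary: dict) -> str:
        scores = self.recognizer.score(voice_bytes)             # feature extraction + acoustic model
        raw_text = self.decoder.search(scores, supplementary)   # WFST search, scene corpus when needed
        return self.output.format(raw_text)                     # post-processing to the agreed format
```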
The present invention also provides a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the voice conversation implementing method described above.
The present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the above-mentioned voice conversation implementing method when executing the computer program.
As described above, according to the voice dialog system and the voice dialog implementation method of the present invention, by integrating voice recognition and semantic understanding into one service, the server can perform voice recognition and semantic understanding on the voice data and reply directly to the client, so that the client needs only a single communication with the server. Moreover, the voice recognition accuracy in specific scenes can be improved by adding the two semantic-understanding processes of scene decision and scene decoding network search. Furthermore, the client and the server communicate over a long socket connection whose state is maintained by the dialog state, the connection being kept until the dialog ends, which avoids the resource waste caused by frequently establishing new connections.
The above examples mainly illustrate the voice dialogue system and the voice dialogue implementing method of the present invention. Although only a few embodiments of the present invention have been described in detail, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (12)

1. A voice conversation realization method, for realizing a voice conversation between a client and a server, comprising the steps of:
a first transmission step of transmitting voice data from the client to the server;
a conversion step, in which the server performs voice recognition and semantic understanding on the voice data and generates text data; and
a second transmission step of transmitting the text data from the server to the client.
2. The voice dialog implementation method of claim 1,
wherein, in the first transmission step, communication between the client and the server is established by means of a long socket connection.
3. A method for speech dialog realization according to claim 1, characterized in that the conversion step comprises the following sub-steps:
extracting features from the voice data and inputting the extracted features into an acoustic model to obtain a score sequence;
searching in a static decoder based on the score sequence to obtain text data corresponding to the voice data, wherein corpus data is preset in the static decoder, and the corpus data comprises scene-based scene corpus data; and
post-processing the text data output by the decoder to obtain text data in a predetermined format.
4. The method according to claim 3, wherein in the searching in the static decoder based on the score sequence to obtain the text data corresponding to the speech data, the static decoder searches in the scene corpus data only when a match with data in the scene corpus data is required.
5. The voice dialog implementation method of claim 1,
wherein, in the first transmission step, decision supplementary information for the scene decision is further sent to the server together with the voice data.
6. A voice conversation realization system for realizing a voice conversation between a client and a server, comprising: a client and a server, wherein the server is connected with the client,
wherein the client is used for transmitting voice data to the server and receiving text data from the server,
the server is used for carrying out voice recognition and semantic understanding on the voice data, generating text data and transmitting the text data to the client.
7. The voice conversation realization system of claim 6,
wherein communication between the client and the server is established by means of a long socket connection.
8. The voice conversation realization system according to claim 6, wherein said server comprises:
a voice recognizer for extracting features from the voice data and inputting the extracted features into an acoustic model to obtain a score sequence;
a static decoder for searching based on the score sequence to obtain text data corresponding to the voice data, wherein corpus data is preset in the static decoder, and the corpus data comprises scene-based scene corpus data; and
an output module for post-processing the text data output by the decoder to obtain text data in a predetermined format.
9. The voice dialog implementation system of claim 8 wherein,
in the process in which the static decoder searches to obtain the text data corresponding to the voice data, the static decoder searches the scene corpus data only when it needs to match against data in the scene corpus data.
10. The voice dialog implementation system of claim 8 wherein,
the client sends decision supplementary information for the scene decision to the server together with the voice data.
11. A computer-readable medium, having stored thereon a computer program,
wherein the computer program, when executed by a processor, implements the voice dialog implementation method of any one of claims 1 to 5.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
CN201910108497.1A 2019-02-03 2019-02-03 Voice conversation system and voice conversation implementation method Pending CN111524508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910108497.1A CN111524508A (en) 2019-02-03 2019-02-03 Voice conversation system and voice conversation implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910108497.1A CN111524508A (en) 2019-02-03 2019-02-03 Voice conversation system and voice conversation implementation method

Publications (1)

Publication Number Publication Date
CN111524508A (en) 2020-08-11

Family

ID=71900456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910108497.1A Pending CN111524508A (en) 2019-02-03 2019-02-03 Voice conversation system and voice conversation implementation method

Country Status (1)

Country Link
CN (1) CN111524508A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130346078A1 (en) * 2012-06-26 2013-12-26 Google Inc. Mixed model speech recognition
CN103794211A (en) * 2012-11-02 2014-05-14 北京百度网讯科技有限公司 Voice recognition method and system
CN105551493A (en) * 2015-11-30 2016-05-04 北京光年无限科技有限公司 Method and device of data processing of children voice robot and children voice robot
CN107943834A (en) * 2017-10-25 2018-04-20 百度在线网络技术(北京)有限公司 Interactive implementation method, device, equipment and storage medium
CN108428446A (en) * 2018-03-06 2018-08-21 北京百度网讯科技有限公司 Audio recognition method and device
CN108899013A (en) * 2018-06-27 2018-11-27 广州视源电子科技股份有限公司 Voice search method, device and speech recognition system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470631A (en) * 2021-06-28 2021-10-01 北京小米移动软件有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN113593568A (en) * 2021-06-30 2021-11-02 北京新氧科技有限公司 Method, system, apparatus, device and storage medium for converting speech into text
CN113593568B (en) * 2021-06-30 2024-06-07 北京新氧科技有限公司 Method, system, device, equipment and storage medium for converting voice into text

Similar Documents

Publication Publication Date Title
KR101683944B1 (en) Speech translation system, control apparatus and control method
US11049493B2 (en) Spoken dialog device, spoken dialog method, and recording medium
EP0954856B1 (en) Context dependent phoneme networks for encoding speech information
EP1125279B1 (en) System and method for providing network coordinated conversational services
US7003463B1 (en) System and method for providing network coordinated conversational services
CN113327609B (en) Method and apparatus for speech recognition
US20060149551A1 (en) Mobile dictation correction user interface
JP2017107078A (en) Voice interactive method, voice interactive device, and voice interactive program
KR20170033722A (en) Apparatus and method for processing user's locution, and dialog management apparatus
JP2005530279A (en) System and method for accessing Internet content
JP5471106B2 (en) Speech translation system, dictionary server device, and program
KR101640024B1 (en) Portable interpretation apparatus and method based on uer's situation
CN101681365A (en) Method and apparatus for distributed voice searching
CN102439661A (en) Service oriented speech recognition for in-vehicle automated interaction
US8509396B2 (en) Automatic creation of complex conversational natural language call routing system for call centers
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
CN105206272A (en) Voice transmission control method and system
JP2014106523A (en) Voice input corresponding device and voice input corresponding program
JP2011232619A (en) Voice recognition device and voice recognition method
CN111094924A (en) Data processing apparatus and method for performing voice-based human-machine interaction
CN111524508A (en) Voice conversation system and voice conversation implementation method
JP3795350B2 (en) Voice dialogue apparatus, voice dialogue method, and voice dialogue processing program
JP4962416B2 (en) Speech recognition system
KR101326262B1 (en) Speech recognition device and method thereof
JP2004515859A (en) Decentralized speech recognition for Internet access

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200811