CN111524508A - Voice conversation system and voice conversation implementation method - Google Patents
- Publication number
- CN111524508A (application CN201910108497.1A)
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- server
- client
- text data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/161—Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
- H04L69/162—Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Abstract
The invention relates to a voice conversation implementation method and a voice conversation system. The method realizes a voice conversation between a client and a server and comprises the following steps: a first transmission step of transmitting voice data from the client to the server; a conversion step in which the server performs speech recognition and semantic understanding on the voice data and generates text data; and a second transmission step of transmitting the text data from the server to the client. According to the invention, the client needs only a single communication with the server for the voice data to be recognized and semantically understood, and speech recognition accuracy in specific scenes can be improved.
Description
Technical Field
The present invention relates to human-machine interaction technology, and in particular to a voice dialog system and a voice dialog implementation method.
Background
NLU (natural language understanding) and ASR (automatic speech recognition) are important components of a dialog system: ASR converts the user's speech input into text, and NLU semantically understands that text and recognizes the user's intention, so that the corresponding task can be performed and a spoken response given.
In the prior art, the NLU and ASR functions are independent of each other, each provided as a separate module. Fig. 5 is a block diagram of the architecture of a current voice dialog system.
As shown in fig. 5, the communication process of the current voice dialog system involves two communications. In the first, the voice input is sent from the client to the ASR system, which converts the voice data into text and returns it to the client; in the second, the client sends the obtained text to the NLU system, which performs semantic understanding, obtains a corresponding response, and returns it to the client.
The client therefore needs two communications to obtain a response, which complicates the communication flow.
Disclosure of Invention
In view of the above problems, the present invention is directed to a voice dialog system and a voice dialog implementation method capable of simplifying a communication flow.
The invention discloses a voice dialog implementation method for realizing a voice dialog between a client and a server, comprising the following steps:
a first transmission step of transmitting voice data from the client to the server;
a conversion step, in which the server performs voice recognition and semantic understanding on the voice data and generates text data; and
a second transmission step of transmitting the text data from the server to the client.
Optionally, in the first transmission step, communication between the client and the server is established over a long socket connection.
Optionally, the converting step comprises the sub-steps of:
extracting the characteristics of the voice data and inputting the extracted characteristics into an acoustic model to obtain a score sequence;
searching in a static decoder based on the score sequence to obtain text data corresponding to the voice data, wherein the static decoder is preset with corpus data, and the corpus data comprises scene corpus data based on a scene; and
post-processing the text data output by the decoder to obtain text data in a predetermined format.
Optionally, in the process of searching in the static decoder based on the score sequence to obtain the text data corresponding to the speech data, the static decoder searches in the scene corpus data only when matching with data in the scene corpus data is required.
Optionally, in the first transmission step, decision supplementary information for the scene decision is further sent to the server together with the voice data.
The present invention also provides a voice conversation realization system for realizing a voice conversation between a client and a server, the system comprising a client and a server connected to each other,
wherein the client is used for transmitting voice data to the server and receiving text data from the server,
the server is used for carrying out voice recognition and semantic understanding on the voice data, generating text data and transmitting the text data to the client.
Optionally, the communication between the client and the server is established over a long socket connection.
Optionally, the server includes:
the voice recognizer is used for extracting the characteristics of the voice data and inputting the extracted characteristics into an acoustic model to obtain a score sequence; and
the static decoder is used for searching the score sequence to obtain text data corresponding to the voice data, wherein the static decoder is preset with corpus data, and the corpus data comprises scene corpus data based on scenes; and
an output module for post-processing the text data output by the decoder to obtain text data in a predetermined format.
Optionally, in the process of searching by the static decoder to obtain the text data corresponding to the speech data, the static decoder searches the scene corpus data only when the static decoder needs to match with data in the scene corpus data.
Optionally, the client sends decision supplementary information for the scene decision to the server together with the voice data.
The computer-readable medium of the present invention stores a computer program which, when executed by a processor, implements the above voice dialog implementation method.
The computer device of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and is characterized in that the processor implements the above-mentioned voice conversation implementing method when executing the computer program.
As described above, according to the voice dialog system and voice dialog implementation method of the present invention, speech recognition and semantic understanding are integrated into one service, so the server can reply to the client directly after performing both on the voice data, requiring only a single communication from the client. Moreover, adding the two semantic-understanding processes of scene decision and scene decoding-network search improves speech recognition accuracy in specific scenes. Furthermore, the client and server communicate over a long socket connection whose state is maintained by the dialog state; the link is kept until the dialog ends, avoiding the resource waste of frequently creating new connections.
Other features and advantages of the methods and apparatus of the present invention will be more particularly apparent from or elucidated with reference to the drawings described herein, and the following detailed description of the embodiments used to illustrate certain principles of the invention.
Drawings
Fig. 1 is a flowchart showing a voice conversation realization method according to an embodiment of the present invention.
Fig. 2 is a flowchart showing a specific procedure of the conversion step S200.
Fig. 3 is a diagram showing the data protocol for communication between the client 100 and the server 200.
Fig. 4 is a block diagram showing an architecture of a voice conversation realization system according to an embodiment of the present invention.
Fig. 5 is a block diagram of the architecture of a current voice dialog system.
Detailed Description
The following description is of some of the several embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
Fig. 1 is a flowchart showing a voice conversation realization method according to an embodiment of the present invention.
As shown in fig. 1, the voice dialog implementation method according to an embodiment of the present invention is used for implementing a voice dialog between a client 100 and a server 200, and the method includes the following steps:
first transmission step S100: transmitting voice data from the client 100 to the server 200;
a conversion step S200: the server 200 performs voice recognition and semantic understanding on the voice data and generates text data; and
second transmission step S300: the text data is transmitted from the server 200 to the client 100.
According to the present application, both speech recognition and semantic understanding are accomplished in the conversion step S200. Thus, unlike the prior art, where the client must make two round trips to remote services, the client 100 obtains the text data from the server 200 by sending the voice data just once.
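The single round trip described above can be sketched as follows. `recognize` and `understand` are trivial stand-ins for the server's ASR and NLU components, not the patent's actual models; the point is that both run inside one server call.

```python
# Sketch of steps S100-S300: ASR and NLU are fused into one service, so a
# single request to the server returns the final text data.

calls_to_server = 0

def recognize(audio: bytes) -> str:
    """Stand-in ASR: audio -> raw text."""
    return "call chen yi"

def understand(text: str) -> dict:
    """Stand-in NLU: raw text -> intent structure."""
    return {"intent": "call", "contact": "chen yi"}

def server_convert(audio: bytes) -> dict:
    """Conversion step S200: recognition and understanding in one service."""
    global calls_to_server
    calls_to_server += 1          # one network round trip per utterance
    return understand(recognize(audio))

# First transmission (S100) and second transmission (S300) form a single
# request/response pair from the client's point of view.
result = server_convert(b"\x00\x01")
print(calls_to_server)  # 1
```

In the prior-art architecture of fig. 5 the same turn would increment a round-trip counter twice, once for the ASR call and once for the NLU call.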
The specific contents of the conversion step S200 will be explained here.
Fig. 2 is a flowchart showing a specific procedure of the conversion step S200.
As shown in fig. 2, the converting step S200 includes the following sub-steps: step S201, step S202, and step S203.
Next, these steps will be specifically described.
Step S201: extract the features of the voice data and input the extracted features into an acoustic model to obtain a score sequence of each state at each moment. Feature extraction, input to the acoustic model, and score-sequence generation can all use conventional processing steps; they are not the focus of the present invention, so details are omitted here.
Step S202: based on the resulting score sequence, a search is made, for example in a static decoder (WFST, weighted finite-state transducer), to obtain the corresponding result, referred to herein as the search result. The static decoder comprises a state probability model and a language model; the language model is generated by training on collected corpus data and a dictionary. The search finds the maximum-score path in the probability model that satisfies the language-model constraint, i.e. the optimal solution that best matches the score sequence.
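A toy version of this search can illustrate the idea: the decoder extends only token paths the language model allows and returns the maximum-score path. A real static decoder is a compiled WFST; the small bigram table below is a stand-in for the language-model constraint, and the scores are invented.

```python
# Minimal sketch of step S202: search the per-frame acoustic score sequence
# for the best path that the language model permits.

import math

# Toy bigram log-probabilities; pairs absent from the table are disallowed.
BIGRAM = {("<s>", "call"): 0.0, ("call", "chen"): -0.1, ("chen", "yi"): -0.1}

def best_path(score_seq):
    """score_seq: list of {token: acoustic log-score} dicts, one per frame."""
    beams = {("<s>",): 0.0}            # partial path -> accumulated score
    for frame in score_seq:
        next_beams = {}
        for path, acc in beams.items():
            for tok, s in frame.items():
                lm = BIGRAM.get((path[-1], tok))
                if lm is None:          # violates the LM constraint: prune
                    continue
                cand = path + (tok,)
                score = acc + s + lm
                if score > next_beams.get(cand, -math.inf):
                    next_beams[cand] = score
        beams = next_beams
    best = max(beams, key=beams.get)    # maximum-score surviving path
    return list(best[1:])               # drop the <s> start symbol

frames = [
    {"call": -0.2, "fall": -0.3},
    {"chen": -0.1, "shen": -0.2},
    {"yi": -0.1, "li": -0.4},
]
print(best_path(frames))  # ['call', 'chen', 'yi']
```

Even though "fall" scores close to "call" acoustically, it is pruned because the language model assigns it no continuation, which is exactly how the constraint steers the search.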
It should be noted that the corpus in the language model according to the present application includes scene corpus information, which may relate to various factors such as the user's address book, the user's particular speech habits, place names, etc.; it may cover any information helpful for understanding the semantics of a specific user.
Based on this, step S202 is further explained as follows: during the search in the static decoder based on the score sequence, not only is automatic speech recognition performed, but the corpus information — such as the scene corpus information — also yields more accurate results during recognition, giving the best-matching search result, i.e. the text that best matches the speech input. It should be understood that not every voice input needs to be matched against the scene corpus information; if a determinate result can be obtained without it, the scene corpus need not be searched. In the example below, however, the scene corpus is consulted because the semantics are otherwise uncertain.
Therefore, compared with the prior art shown in fig. 5, where ASR is independent of NLU, fusing NLU and ASR lets the semantic-understanding part of NLU be applied already in the ASR stage, giving a more accurate search result corresponding to the speech input, i.e. the original text data.
By way of example, the most basic speech recognition may be performed first, followed by a further search in the scene corpus information as described above to obtain the optimal solution; this is particularly beneficial when a speech input admits multiple understandings.
Step S203: the search results (raw text data) of step S202 are post-processed to obtain a text result in a predetermined format.
Here, the voice dialog implementation method of the present invention is explained with an example.
For example, suppose user A's address book contains a contact named Chen Yi written with one pair of characters, while user B's address book contains a homophonous contact whose name is written with different characters, and both address books serve as scene corpora. When user A and user B each say "call Chen Yi", the score sequence of each speech input is obtained first (step S201); then the static decoder performs the search, and because of the scene corpus the search result for user A is accurately given as a call to user A's contact, while the result for user B names user B's contact (step S202). Finally, the search results are post-processed to obtain text results in the predetermined format (step S203), which the server sends to the client.
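The address-book example above can be sketched as a lookup in a per-user scene corpus. The contact characters below (陈一 / 陈屹) are hypothetical stand-ins for the two homophonous names; the fallback branch mirrors the rule that the scene corpus is consulted only when it can resolve the input.

```python
# Sketch of homophone disambiguation via per-user scene corpora: the same
# pronunciation resolves to a different written form for each user.

SCENE_CORPUS = {
    "user_a": {"chen yi": "陈一"},   # user A's contact (hypothetical)
    "user_b": {"chen yi": "陈屹"},   # user B's homophonous contact (hypothetical)
}

def resolve_name(user: str, pinyin: str) -> str:
    """Pick the written form of a recognized name from the user's scene corpus."""
    corpus = SCENE_CORPUS.get(user, {})
    # Fall back to the raw transcription if the scene corpus has no match.
    return corpus.get(pinyin, pinyin)

print(resolve_name("user_a", "chen yi"))  # 陈一
print(resolve_name("user_b", "chen yi"))  # 陈屹
```

The same sounds thus yield the correct text for each user, which a scene-agnostic decoder could not guarantee.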
In particular, in the present application, the client 100 and the server 200 communicate by establishing a long socket connection. A socket is one endpoint of a bidirectional communication connection through which two programs on a network exchange data. A long socket connection means the client and server use only one socket object for the whole communication process, keeping the connection open for an extended time. The data protocol for communication between the client 100 and the server 200 is shown in fig. 3.
As shown in fig. 3, the communication data comprises a header portion, a voice data portion, and an end flag.
The header part includes the header length and the supplementary information needed for the scene decision in semantic understanding, such as vehicle ID, current location, Bluetooth connection status, current navigation status, etc. (not mentioned in the example of fig. 2, but this information can also serve as scene corpus information). For example, when the user wants to search for a nearby restaurant, the client 100 queries the server 200: following the data protocol of fig. 3, the client 100 transmits the current location in the header and then the audio data corresponding to "search for a nearby restaurant", and after recognizing the intent the server 200 can perform the search using the location carried in the header.
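A minimal framing sketch of this protocol follows, assuming a 4-byte big-endian header-length field and a JSON-encoded header; the patent names the parts (header length, supplementary information, voice data, end flag) but not their exact encoding, so the field layout here is an assumption.

```python
# Sketch of the fig. 3 data protocol: header length + header (decision
# supplementary information) + voice data + end flag. Encoding is assumed.

import json
import struct

END_FLAG = b"\xff\xff"  # hypothetical end-flag bytes

def pack_frame(supplementary: dict, voice: bytes) -> bytes:
    header = json.dumps(supplementary, sort_keys=True).encode("utf-8")
    # 4-byte big-endian header length, then header, voice data, end flag.
    return struct.pack(">I", len(header)) + header + voice + END_FLAG

def unpack_frame(frame: bytes):
    (hlen,) = struct.unpack(">I", frame[:4])
    header = json.loads(frame[4 : 4 + hlen].decode("utf-8"))
    assert frame.endswith(END_FLAG), "malformed frame: missing end flag"
    voice = frame[4 + hlen : -len(END_FLAG)]
    return header, voice

info = {"vehicle_id": "V123", "location": "39.9,116.4"}
frame = pack_frame(info, b"\x01\x02\x03")
print(unpack_frame(frame) == (info, b"\x01\x02\x03"))  # True
```

The length-prefixed header lets the server read the supplementary information before the audio arrives, matching the restaurant-search example above.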
Moreover, the socket connection state is maintained through the dialog state, and the link is kept until the dialog ends, avoiding the resource waste caused by frequently creating new connections.
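The long-connection behaviour can be sketched in-process with `socketpair`, which stands in for a real client-server TCP link: every turn of the dialog reuses the same socket objects, and the link is released only when the session ends.

```python
# Sketch of a long socket connection: one socket pair carries several
# dialog turns and is closed only when the session finishes.

import socket

client, server = socket.socketpair()

def send_turn(sock: socket.socket, payload: bytes) -> None:
    # Length-prefix each turn so the receiver knows where it ends.
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

def recv_turn(sock: socket.socket) -> bytes:
    n = int.from_bytes(sock.recv(4), "big")
    return sock.recv(n)

received = []
# Several dialog turns travel over the same connection - no reconnects.
for utterance in (b"turn-1-audio", b"turn-2-audio"):
    send_turn(client, utterance)
    received.append(recv_turn(server))

# The link is released only when the dialog session finishes.
client.close()
server.close()
print(received)  # [b'turn-1-audio', b'turn-2-audio']
```

Opening a fresh connection per utterance would instead pay the connection-setup cost on every turn, which is the waste the dialog-state-keyed connection avoids.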
The voice dialogue implementing method according to the present invention is explained above, and the voice dialogue implementing system according to the present invention is explained next.
Fig. 4 is a block diagram showing an architecture of a voice conversation realization system according to an embodiment of the present invention.
As shown in fig. 4, the voice conversation realization system according to an embodiment of the present invention is used to realize a voice conversation between a client 100 and a server 200.
Wherein the client 100 is used to transmit voice data to the server 200 and to receive text data from the server 200. The server 200 is configured to perform speech recognition and semantic understanding on the speech data, generate text data, and transmit the text data to the client 100.
The client 100 includes:
a sending module 110, configured to send voice data; and
a receiving module 120, configured to receive text data.
The server 200 includes:
the voice recognizer 210 is configured to perform feature extraction on the voice data and input the voice data into an acoustic model to obtain a score sequence of each state at each time;
a static decoder 220 that performs a search based on the score sequence to obtain text data corresponding to the speech data, wherein corpus data including scene corpus data based on a scene is preset in the static decoder; and
an output module 230 for post-processing the search result to obtain a text result in a predetermined format; the output module 230 is, for example, a communication component.
In the process of searching by the static decoder 220 to obtain the text data corresponding to the speech data, the static decoder 220 searches the scene corpus data only when it needs to match with the data in the scene corpus data.
Preferably, a long socket connection is established between the client 100 and the server 200, and communication uses the data protocol shown in fig. 3, with the client 100 transmitting decision supplementary information for the scene decision to the server 200 together with the voice data.
The present invention also provides a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the voice conversation implementing method described above.
The present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the above-mentioned voice conversation implementing method when executing the computer program.
The above examples mainly illustrate the voice dialogue system and the voice dialogue implementing method of the present invention. Although only a few embodiments of the present invention have been described in detail, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (12)
1. A voice conversation realization method for realizing a voice conversation between a client and a server, comprising the steps of:
a first transmission step of transmitting voice data from the client to the server;
a conversion step, in which the server performs voice recognition and semantic understanding on the voice data and generates text data; and
a second transmission step of transmitting the text data from the server to the client.
2. The voice conversation realization method of claim 1,
wherein, in the first transmission step, communication between the client and the server is established over a long socket connection.
3. The voice conversation realization method according to claim 1, wherein the conversion step comprises the following sub-steps:
extracting the characteristics of the voice data and inputting the extracted characteristics into an acoustic model to obtain a score sequence;
searching in a static decoder based on the score sequence to obtain text data corresponding to the voice data, wherein the static decoder is preset with corpus data, and the corpus data comprises scene corpus data based on a scene; and
post-processing the text data output by the decoder to obtain text data in a predetermined format.
4. The voice conversation realization method according to claim 3, wherein, in the search in the static decoder based on the score sequence to obtain the text data corresponding to the voice data, the static decoder searches the scene corpus data only when a match with data in the scene corpus data is required.
5. The voice conversation realization method of claim 1,
wherein, in the first transmission step, decision supplementary information for the scene decision is further sent to the server together with the voice data.
6. A voice conversation realization system for realizing a voice conversation between a client and a server, comprising a client and a server connected to each other,
wherein the client is used for transmitting voice data to the server and receiving text data from the server,
the server is used for carrying out voice recognition and semantic understanding on the voice data, generating text data and transmitting the text data to the client.
7. The voice conversation realization system of claim 6,
wherein communication between the client and the server is established over a long socket connection.
8. The voice conversation realization system according to claim 6, wherein said server comprises:
the voice recognizer is used for extracting the characteristics of the voice data and inputting the extracted characteristics into an acoustic model to obtain a score sequence;
the static decoder is used for searching the score sequence to obtain text data corresponding to the voice data, wherein the static decoder is preset with corpus data, and the corpus data comprises scene corpus data based on scenes; and
an output module for post-processing the text data output by the decoder to obtain text data in a predetermined format.
9. The voice conversation realization system of claim 8,
wherein, in the process in which the static decoder searches to obtain the text data corresponding to the voice data, the static decoder searches the scene corpus data only when a match with data in the scene corpus data is required.
10. The voice conversation realization system of claim 8,
wherein the client sends decision supplementary information for the scene decision to the server together with the voice data.
11. A computer-readable medium having stored thereon a computer program,
wherein the computer program, when executed by a processor, implements the voice conversation realization method of any of claims 1-5.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910108497.1A CN111524508A (en) | 2019-02-03 | 2019-02-03 | Voice conversation system and voice conversation implementation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910108497.1A CN111524508A (en) | 2019-02-03 | 2019-02-03 | Voice conversation system and voice conversation implementation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111524508A true CN111524508A (en) | 2020-08-11 |
Family
ID=71900456
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910108497.1A Pending CN111524508A (en) | 2019-02-03 | 2019-02-03 | Voice conversation system and voice conversation implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111524508A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130346078A1 (en) * | 2012-06-26 | 2013-12-26 | Google Inc. | Mixed model speech recognition |
CN103794211A (en) * | 2012-11-02 | 2014-05-14 | 北京百度网讯科技有限公司 | Voice recognition method and system |
CN105551493A (en) * | 2015-11-30 | 2016-05-04 | 北京光年无限科技有限公司 | Method and device of data processing of children voice robot and children voice robot |
CN107943834A (en) * | 2017-10-25 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Interactive implementation method, device, equipment and storage medium |
CN108428446A (en) * | 2018-03-06 | 2018-08-21 | 北京百度网讯科技有限公司 | Audio recognition method and device |
CN108899013A (en) * | 2018-06-27 | 2018-11-27 | 广州视源电子科技股份有限公司 | Voice search method, device and speech recognition system |
- 2019-02-03: Application CN201910108497.1A filed; published as CN111524508A; status: pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113470631A (en) * | 2021-06-28 | 2021-10-01 | 北京小米移动软件有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
CN113593568A (en) * | 2021-06-30 | 2021-11-02 | 北京新氧科技有限公司 | Method, system, apparatus, device and storage medium for converting speech into text |
CN113593568B (en) * | 2021-06-30 | 2024-06-07 | 北京新氧科技有限公司 | Method, system, device, equipment and storage medium for converting voice into text |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101683944B1 (en) | Speech translation system, control apparatus and control method | |
US11049493B2 (en) | Spoken dialog device, spoken dialog method, and recording medium | |
EP0954856B1 (en) | Context dependent phoneme networks for encoding speech information | |
EP1125279B1 (en) | System and method for providing network coordinated conversational services | |
US7003463B1 (en) | System and method for providing network coordinated conversational services | |
CN113327609B (en) | Method and apparatus for speech recognition | |
US20060149551A1 (en) | Mobile dictation correction user interface | |
JP2017107078A (en) | Voice interactive method, voice interactive device, and voice interactive program | |
KR20170033722A (en) | Apparatus and method for processing user's locution, and dialog management apparatus | |
JP2005530279A (en) | System and method for accessing Internet content | |
JP5471106B2 (en) | Speech translation system, dictionary server device, and program | |
KR101640024B1 (en) | Portable interpretation apparatus and method based on uer's situation | |
CN101681365A (en) | Method and apparatus for distributed voice searching | |
CN102439661A (en) | Service oriented speech recognition for in-vehicle automated interaction | |
US8509396B2 (en) | Automatic creation of complex conversational natural language call routing system for call centers | |
CN110910903B (en) | Speech emotion recognition method, device, equipment and computer readable storage medium | |
CN105206272A (en) | Voice transmission control method and system | |
JP2014106523A (en) | Voice input corresponding device and voice input corresponding program | |
JP2011232619A (en) | Voice recognition device and voice recognition method | |
CN111094924A (en) | Data processing apparatus and method for performing voice-based human-machine interaction | |
CN111524508A (en) | Voice conversation system and voice conversation implementation method | |
JP3795350B2 (en) | Voice dialogue apparatus, voice dialogue method, and voice dialogue processing program | |
JP4962416B2 (en) | Speech recognition system | |
KR101326262B1 (en) | Speech recognition device and method thereof | |
JP2004515859A (en) | Decentralized speech recognition for Internet access |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200811 |