CN110491383A - Voice interaction method, device, system, storage medium and processor - Google Patents
Voice interaction method, device, system, storage medium and processor
- Publication number
- CN110491383A (application CN201910910484.6A)
- Authority
- CN
- China
- Prior art keywords
- target
- voice
- result
- speech recognition
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/047—Architecture of speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention discloses a voice interaction method, device, system, storage medium and processor. The method comprises: obtaining an input voice stream and distributing it to each speech recognition engine for speech recognition; selecting a target speech recognition result from the obtained speech recognition results; distributing the target speech recognition result to each natural language processing engine and selecting a target semantic processing result from the obtained semantic processing results; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is selected from the obtained semantic processing results. This avoids the problem of handling the voice interaction process with a single ASR, NLP and TTS, which is quite limiting: if the ASR and/or NLP result is inaccurate, the voice interaction is affected.
Description
Technical field
The present invention relates to the field of human-computer interaction technology, and in particular to a voice interaction method, device, system, storage medium and processor.
Background technique
During voice interaction, a smart speaker collects the input voice data. After automatic speech recognition ASR (Automatic Speech Recognition), the recognized text is sent to natural language processing NLP (Natural Language Processing); after semantic understanding, speech is synthesized using text-to-speech TTS (Text To Speech) technology, returned to the device side and played back.

The existing voice interaction process handles the input voice stream with a single ASR, NLP and TTS, which is quite limiting: if the early ASR recognition is inaccurate, the NLP understanding is affected, and even if the ASR recognition is accurate, a poor NLP understanding will still affect the entire voice interaction process.
Summary of the invention
In view of this, the present invention provides a voice interaction method and device to solve the problem that the existing voice interaction process mostly uses a single ASR, NLP and TTS, which is quite limiting: if the early ASR recognition is inaccurate, the NLP understanding is affected, and if the NLP understanding is poor, the entire voice interaction process is likewise affected. The concrete scheme is as follows:
A voice interaction method, comprising:

obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;

selecting a target speech recognition result from the speech recognition results;

distributing the target speech recognition result to each target natural language processing engine to obtain each semantic processing result;

selecting a target semantic processing result from the semantic processing results;

replying to the input voice stream according to the target semantic processing result.
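The claimed flow can be sketched in a few lines. The engine callables, dictionary fields and score names below are hypothetical illustrations, not part of the claims:

```python
# Minimal sketch of the claimed multi-engine voice interaction flow.
# Each ASR engine is assumed to return {"rate": ..., "text": ...} and each
# NLP engine {"confidence": ..., "reply": ...}; these shapes are invented here.

def interact(voice_stream, asr_engines, nlp_engines, tts):
    # Step 1: distribute the input stream to every target ASR engine.
    asr_results = [engine(voice_stream) for engine in asr_engines]
    # Step 2: keep the recognition result with the highest recognition rate.
    text = max(asr_results, key=lambda r: r["rate"])["text"]
    # Step 3: distribute the chosen text to every target NLP engine.
    nlp_results = [engine(text) for engine in nlp_engines]
    # Step 4: keep the semantic result with the highest confidence.
    best = max(nlp_results, key=lambda r: r["confidence"])
    # Step 5: reply to the input stream via speech synthesis.
    return tts(best["reply"])
```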
In the above method, optionally, selecting a target speech recognition result from the speech recognition results comprises: obtaining the recognition rate of each speech recognition result; and taking the recognition result with the highest recognition rate as the target recognition result.
In the above method, optionally, selecting a target semantic processing result from the semantic processing results comprises: obtaining the confidence of each semantic processing result; and taking the semantic processing result with the highest confidence as the target semantic processing result.
In the above method, optionally, replying to the input voice stream according to the target semantic processing result comprises: obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream; determining a target speech synthesis engine according to the user group; and converting the target reply into an output voice stream through the target speech synthesis engine.
In the above method, optionally, determining the user group that produced the input voice stream comprises: obtaining the type of the target speech recognition engine that produced the target speech recognition result and/or a face recognition result; and determining the user group according to the type and/or the face recognition result.
A voice interaction device, comprising:

an acquisition and recognition module, configured to obtain an input voice stream, distribute the input voice stream to each target speech recognition engine for speech recognition, and obtain each speech recognition result;

a speech recognition result selection module, configured to select a target speech recognition result from the speech recognition results;

a processing module, configured to distribute the target speech recognition result to each target natural language processing engine to obtain each semantic processing result;

a processing result selection module, configured to select a target semantic processing result from the semantic processing results;

a reply module, configured to reply to the input voice stream according to the target semantic processing result.
In the above device, optionally, the reply module comprises: an acquisition and determination unit, configured to obtain a target reply matching the target semantic processing result and determine the user group that produced the input voice stream; a determination unit, configured to determine a target speech synthesis engine according to the user group; and a conversion unit, configured to convert the target reply into an output voice stream through the target speech synthesis engine.
A voice interaction system, comprising: a cloud server, a speech recognition module, a semantic processing module, a skill module, a speech synthesis module and a smart voice terminal, wherein

the cloud server obtains the input voice stream collected by the smart voice terminal and distributes the input voice stream to the speech recognition module for speech recognition to obtain a target speech recognition result;

the speech recognition module sends the target speech recognition result to the cloud server, and the cloud server sends the target speech recognition result to the semantic processing module to obtain a target semantic processing result;

the semantic processing module sends the target semantic processing result to the cloud server, and the cloud server sends the target semantic processing result to the skill module to obtain a target reply;

the skill module sends the target reply to the cloud server, and the cloud server sends the target reply to the speech synthesis module to obtain an output voice stream;

the speech synthesis module sends the output voice stream to the cloud server, and the cloud server sends the output voice stream to the smart voice terminal for playback.
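The relay role of the cloud server in the claimed system can be sketched as a simple orchestrator. The module objects and method names below are hypothetical, not taken from the patent:

```python
# Sketch of the cloud server relaying messages between the claimed modules.
# The module interfaces (recognize/parse/execute/synthesize/play) are invented
# for illustration; the patent only specifies the message flow between them.

class CloudServer:
    def __init__(self, asr, nlp, skill, tts, terminal):
        self.asr, self.nlp, self.skill = asr, nlp, skill
        self.tts, self.terminal = tts, terminal

    def handle(self, voice_stream):
        text = self.asr.recognize(voice_stream)   # target speech recognition result
        semantics = self.nlp.parse(text)          # target semantic processing result
        reply = self.skill.execute(semantics)     # target reply
        audio = self.tts.synthesize(reply)        # output voice stream
        self.terminal.play(audio)                 # playback on the terminal
        return audio
```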
A storage medium, comprising a stored program, wherein the program executes the above voice interaction method.

A processor, configured to run a program, wherein the program, when running, executes the above voice interaction method.
Compared with the prior art, the present invention has the following advantages:

The invention discloses a voice interaction method, device, system, storage medium and processor. The method comprises: obtaining an input voice stream, distributing it to each speech recognition engine for speech recognition, and selecting a target speech recognition result from the obtained speech recognition results; distributing the target speech recognition result to each natural language processing engine and selecting a target semantic processing result from the obtained semantic processing results; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is selected from the obtained semantic processing results, avoiding the problem that a voice interaction process using a single ASR, NLP and TTS is quite limiting and that an inaccurate ASR and/or NLP result affects the voice interaction.

Of course, any product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a voice interaction method disclosed in an embodiment of the present application;

Fig. 2 is another flowchart of a voice interaction method disclosed in an embodiment of the present application;

Fig. 3 is a structural block diagram of a voice interaction system disclosed in an embodiment of the present application;

Fig. 4 is a structural block diagram of a voice interaction device disclosed in an embodiment of the present application.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a voice interaction method and device applied during the voice interaction process. In the existing voice interaction process, the input voice stream is handled by a single ASR, NLP and TTS; if the ASR speech recognition result and/or the NLP natural language processing result deviates significantly from the corresponding actual result, irrelevant answers may be given, affecting the voice interaction process. To solve this problem, the present invention provides a voice interaction method whose execution flow is shown in Fig. 1, comprising the steps of:
S101, obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result.

In an embodiment of the present invention, the input voice stream is obtained from a smart voice device, which may be a smart speaker, a smart voice robot, a smartphone, etc. The smart voice device collects the voice uttered by the user and converts it into the input voice stream; the input voice stream is distributed to each target speech recognition engine for recognition, and each speech recognition result is obtained.
Taking the distribution process as an example: if the system contains 10 speech recognition engines, the number of target speech recognition engines may be less than or equal to 10. For example, all 10 speech recognition engines may be taken as target speech recognition engines, i.e., the number of speech recognition engines equals the number of target speech recognition engines, and the input voice stream is distributed to all 10 target speech recognition engines for speech recognition. However, this approach places high demands on the processor; when the processor configuration cannot meet the requirements, speech recognition slows down, affecting the voice interaction process and degrading the user experience. Therefore, to improve the speed of speech recognition, the type of the input voice stream may be obtained before distribution to the speech recognition engines, and the 10 speech recognition engines above may be screened according to that type to obtain no fewer than two target speech recognition engines; the number of target speech recognition engines is then less than or equal to 10. The type may be divided according to the actual scenario or a vertical subdivision of fields. For example, the classification may be by language, by professional domain, or by other scenarios. Classification by language may be subdivided into Chinese and foreign languages; Chinese may be further subdivided into Mandarin and dialects, with further subdivision of dialects where appropriate, and the foreign languages may be English, Japanese, Korean, etc. Classification by professional domain may distinguish, for example, the computer field, the communications field or the machinery field, with further subdivision according to the concrete situation; details are not repeated here. Of course, other classification schemes may also be included; the embodiments of the present invention place no limit on the concrete form of the type.
S102, selecting a target speech recognition result from the speech recognition results.

In an embodiment of the present invention, each target speech recognition engine outputs, along with the recognition result corresponding to the input voice stream, the recognition rate of that result. The recognition rate may vary with factors such as the signal-to-noise ratio and whether recognition is performed online or offline; therefore, after obtaining the factors affecting the recognition rate, such as the signal-to-noise ratio of the input voice stream and whether the target speech recognition engine works online, the recognition rate of the input voice stream under the corresponding target speech recognition engine is determined.
In practice, the direct indicator of the recognition rate is generally the word error rate WER (Word Error Rate), defined as follows: in order to make the recognized word sequence consistent with the standard word sequence, certain words need to be substituted, deleted or inserted; the total number of these substituted, deleted and inserted words, divided by the total number of words in the standard word sequence and expressed as a percentage, is the WER.

The formulas are:

WER = (S + D + I) / N × 100% (1)

Accuracy = 100% − WER (2)

where:

S is the number of substituted words;
D is the number of deleted words;
I is the number of inserted words;
N is the total number of words in the standard sequence;
WER is the word error rate;
Accuracy is the recognition rate.

WER can be broken down by gender, speaking rate, accent, digits/English/Chinese and so on, and examined separately. Because of inserted words, WER can theoretically exceed 100%; in practice, particularly with a large sample size, this should not happen, as such a system would be far too poor to be commercially usable.
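The WER defined by formula (1) is conventionally computed with edit-distance dynamic programming over word sequences. This is a generic sketch of that standard computation, not code from the patent:

```python
def wer(reference, hypothesis):
    """Word error rate: (S + D + I) / N * 100, via Levenshtein distance on words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum substitutions + deletions + insertions needed to turn
    # the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                              # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution or match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

A hypothesis with many extra words yields WER above 100%, matching the remark that insertions make WER theoretically unbounded.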
Further, the sentence error rate SER (Sentence Error Rate) could be used, i.e., the number of incorrectly recognized sentences divided by the total number of sentences. In practice, however, the sentence error rate is generally 2 to 3 times the word error rate, so it is not usually used to measure the recognition process.

In the embodiments of the present invention, the recognition rate is used as the reference: the recognition rate of each speech recognition result is first calculated, and the speech recognition result with the highest recognition rate is taken as the target speech recognition result.
S103, distributing the target speech recognition result to each target natural language processing engine to obtain each semantic processing result.

In an embodiment of the present invention, the target speech recognition result is distributed to each target natural language processing engine. Taking the distribution process as an example: if the system contains 10 natural language processing engines, the number of target natural language processing engines is less than or equal to 10. For example, all 10 natural language processing engines may be taken as target natural language processing engines, i.e., the number of target natural language processing engines equals the number of natural language processing engines. However, this approach places high demands on the processor; when the processor configuration cannot meet the requirements, processing slows down, affecting the voice interaction process and degrading the user experience. Therefore, to improve the speed of voice interaction, the class of the target recognition result may be determined before it is distributed to the target natural language processing engines. The class may be determined according to the actual scenario or a vertical subdivision of fields; for example, the classification may be by language, by professional domain, or by other scenarios. Classification by language may be subdivided into Chinese and foreign languages; Chinese may be further subdivided into Mandarin and dialects, with further subdivision of dialects where appropriate, and the foreign languages may be English, Japanese, Korean, etc. Classification by professional domain may distinguish, for example, the computer field, the communications field or the machinery field, with further subdivision according to the concrete situation; details are not repeated here. Of course, other classification schemes may also be included; the embodiments of the present invention place no limit on the concrete form of the classification. Preferably, there is a correspondence between the classifications of the target speech recognition engines and those of the target natural language processing engines. For example, if the target speech recognition result was obtained through a target speech recognition engine for a dialect, it can be distributed directly to the target natural language processing engine for that dialect.
S104, selecting a target semantic processing result from the semantic processing results.

In an embodiment of the present invention, each target natural language processing engine outputs, along with the target semantic processing result corresponding to the target speech recognition result, the confidence of that semantic processing result. Taking Baidu's NLP semantic computation framework as an example of a target natural language processing engine: it consists mainly of three parts, with the bottom layer relying on big data (web data and user behavior data) and high-performance computing clusters (GPU, CPU and FPGA), on which a target natural language processing engine based on DNNs and probabilistic graphical models is built. By feeding the target speech recognition result into the target natural language processing engine, a target semantic processing result can be obtained, where the target semantic processing result is a textual reply to the input voice stream. On the basis of the semantic processing result, semantic-level computations are then performed, including semantic matching, semantic retrieval, text classification, sequence generation and sequence labeling, so as to determine the confidence of the semantic processing result. Since different target natural language processing engines determine confidence in different ways, the confidences may not be directly comparable; they are therefore compared after normalization or other processing, and the semantic processing result with the highest confidence is taken as the target semantic processing result.
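The cross-engine comparison can be sketched as follows. The patent does not fix a normalization scheme ("normalization or other processing"), so per-engine min-max scaling over an assumed confidence range is only one illustrative choice, and the data shapes are invented:

```python
# Illustrative selection of the target semantic result across NLP engines
# whose raw confidences are not directly comparable. Min-max scaling per
# engine (over an assumed historical range) is a hypothetical choice.

def pick_semantic_result(results, ranges):
    # results: list of (engine_id, raw_confidence, parse)
    # ranges:  engine_id -> (lowest, highest) confidence seen for that engine
    def normalized(engine_id, confidence):
        lo, hi = ranges[engine_id]
        return (confidence - lo) / (hi - lo) if hi > lo else 0.0
    return max(results, key=lambda r: normalized(r[0], r[1]))
```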
S105, replying to the input voice stream according to the target semantic processing result.

In an embodiment of the present invention, text-to-speech TTS (Text-To-Speech) technology is used to convert the textual target semantic processing result into an output voice stream, which is read aloud by the smart voice device, analogous to a human mouth. For example, the voice heard from voice assistants such as Siri is generated by TTS.
The invention discloses a voice interaction method comprising: obtaining an input voice stream, distributing it to each speech recognition engine for speech recognition, and selecting a target speech recognition result from the obtained speech recognition results; distributing the target speech recognition result to each natural language processing engine and selecting a target semantic processing result from the obtained semantic processing results; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is selected from the obtained semantic processing results, avoiding the problem that a voice interaction process using a single ASR, NLP and TTS is quite limiting and that inaccurate ASR and/or NLP recognition affects the voice interaction.
In an embodiment of the present invention, the process of replying to the input voice stream according to the target semantic processing result is shown in Fig. 2, comprising the steps of:

S201, obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream.

In an embodiment of the present invention, keywords in the target semantic processing result are obtained, the skill unit corresponding to the target semantic processing result is determined according to the keywords, and the target reply produced by processing the target voice is received from that skill unit. The type of the target speech recognition engine that produced the target speech recognition result and/or a face recognition result is obtained, and the user group that produced the input voice stream is determined according to the type and/or the face recognition result. The user group may be men or women, old or young, family members, or voice senders who use a certain dialect or language, etc.
S202, determining a target speech synthesis engine according to the user group.

In an embodiment of the present invention, the selection of the speech synthesis engine may also be divided according to the actual scenario or a vertical subdivision of fields, and the target speech synthesis engine is determined according to the target group. For example, the target speech synthesis engines may be classified by language into Chinese and foreign languages; Chinese may be further subdivided into Mandarin and dialects, with further subdivision of dialects where appropriate, and the foreign languages may be English, Japanese, Korean, etc. The embodiments of the present invention place no limit on the concrete form of the classification. For example, if the user group consists of speakers of a dialect, the target speech recognition engine may be one corresponding to that dialect type, and the speech synthesis engine corresponding to the dialect type can then be selected directly as the target speech synthesis engine.
S203, converting the target reply into an output voice stream through the target speech synthesis engine.

In an embodiment of the present invention, the target reply is converted into an output voice stream through the target speech synthesis engine; different types of target speech synthesis engine reply in different ways. The target speech synthesis engine may also rely on face recognition technology to identify a user profile. For example, the smart voice terminal recognizes through face recognition that the received input voice stream is a mother's words, and learns through the history or a configured reply rule that the mother most wants to hear her son's voice; in this case, the target speech synthesis engine can send the target reply to the smart voice terminal in the son's voice. Of course, depending on the concrete situation, the target reply may also be sent to the smart voice terminal in English, in a dialect or in another manner.
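The user-group-to-voice mapping described above can be sketched as a lookup table. Every table entry is a hypothetical illustration (including the mother-hears-son's-voice case from the example), not an enumeration from the patent:

```python
# Hypothetical mapping from an inferred user group and dialect/language to a
# synthesis engine identifier; all entries are illustrative, not claimed.

VOICE_TABLE = {
    ("mother", "mandarin"): "son_voice_engine",
    ("child", "mandarin"): "storyteller_engine",
}

def choose_tts_engine(user_group, language, default="standard_engine"):
    # Fall back to a standard voice when no rule matches the user group.
    return VOICE_TABLE.get((user_group, language), default)
```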
Based on a kind of above-mentioned voice interactive method, a kind of voice interactive system is provided in the embodiment of the present invention, it is described
The structural block diagram of interactive system is as shown in Figure 3, comprising: Cloud Server 301, speech recognition module 302, semantic processes module 303,
Technical ability module 304, voice synthetic module 305 and intelligent sound terminal 306, wherein
The Cloud Server 301 is used to obtain the input voice flow that the intelligent sound terminal 306 acquires, by the input
Voice flow is distributed to the speech recognition module 302 and carries out speech recognition, obtains target voice recognition result;
In the embodiment of the present invention, the speech recognition module 302 includes multiple speech recognition engines. Preferably, to improve recognition efficiency, the multiple speech recognition engines can first be screened during speech recognition to obtain multiple target speech recognition engines; speech recognition is then performed by these target speech recognition engines, and among the resulting speech recognition results the one with the highest recognition rate is chosen as the target speech recognition result.
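The selection step above — fanning the input voice stream out to several target recognition engines and keeping the result with the highest recognition rate — can be sketched as follows; the stub engine class and its canned results are illustrative assumptions, not the patent's implementation:

```python
class StubASREngine:
    """Hypothetical stand-in for one speech recognition engine; a real
    engine would decode the audio instead of returning a canned result."""
    def __init__(self, name, text, score):
        self.name, self.text, self.score = name, text, score

    def recognize(self, audio):
        # Return a (transcript, recognition-rate) pair for the voice stream.
        return (self.text, self.score)

def pick_target_recognition(engines, audio):
    # Fan the input voice stream out to every target engine and keep the
    # result with the highest recognition rate.
    results = [engine.recognize(audio) for engine in engines]
    return max(results, key=lambda result: result[1])

engines = [
    StubASREngine("mandarin", "open the living-room air conditioner", 0.93),
    StubASREngine("cantonese", "often the living root air condition", 0.41),
]
text, score = pick_target_recognition(engines, audio=b"...")
```

The transcript kept here plays the role of the target speech recognition result in the flow above.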
The speech recognition module 302 sends the target speech recognition result to the cloud server 301, and the cloud server 301 passes the target speech recognition result to the semantic processing module 303 to obtain a target semantic processing result.
In the embodiment of the present invention, the semantic processing module 303 includes multiple natural language processing engines. Preferably, to improve processing efficiency, the multiple natural language processing engines can be screened during natural language processing to obtain multiple target natural language processing engines; the target speech recognition result is sent to these target natural language processing engines, and among the resulting semantic processing results the one with the highest confidence is chosen as the target semantic processing result.
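The natural-language step can be sketched the same way, this time dispatching to the target engines in parallel and keeping the most confident parse; the engines below are hypothetical stand-ins that return a parse plus a confidence score:

```python
from concurrent.futures import ThreadPoolExecutor

def pick_target_semantics(nlp_engines, recognition_text):
    # Distribute the target speech recognition result to every target NLP
    # engine in parallel, then keep the parse with the highest confidence.
    with ThreadPoolExecutor(max_workers=len(nlp_engines)) as pool:
        parses = list(pool.map(lambda engine: engine(recognition_text), nlp_engines))
    return max(parses, key=lambda p: p["confidence"])

# Hypothetical engines: each returns its parse plus a confidence score.
engine_a = lambda text: {"domain": "aircon", "intent": "open", "confidence": 0.88}
engine_b = lambda text: {"domain": "music", "intent": "play", "confidence": 0.12}

best = pick_target_semantics([engine_a, engine_b], "open the living-room air conditioner")
```

Parallel dispatch is one plausible reading of "distributed to each target natural language processing engine"; sequential dispatch would select the same result.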
The semantic processing module 303 sends the target semantic processing result to the cloud server 301, and the cloud server 301 sends the target semantic processing result to the skill module 304 to obtain a target reply.
In the embodiment of the present invention, the skill module 304 processes the target semantic processing result according to the specific situation: if a reply must be returned to the intelligent voice terminal 306, the returned result is the target reply; if the result is a control instruction, processing continues within the skill module 304. The embodiment of the present invention is illustrated for the case where the returned result is a target reply. For example, the user says "open the living-room air conditioner"; the target speech recognition result is exactly "open the living-room air conditioner", which after natural language understanding is translated into "the domain is air conditioning, the instruction is open, and the specific location is the living room". According to the domain, the cloud server 301 distributes the result to the skill corresponding to air conditioning within the skill module 304; following the instruction and the location, the air-conditioning skill opens the living-room air conditioner through a control command and, on success, returns a target reply such as "OK, the living-room air conditioner is now on."
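The domain-based dispatch in the example above can be sketched as a skill registry; the field names, the skill, and the reply wording are illustrative assumptions, not the patent's actual interface:

```python
def aircon_skill(result):
    # A real skill would also issue the control command to the device.
    if result["intent"] == "open":
        return f"OK, the {result['slot']} air conditioner is now on."
    return None

SKILLS = {"aircon": aircon_skill}  # one entry per domain

def dispatch(semantic_result):
    # Route the target semantic processing result to the skill registered
    # for its domain; the skill returns the target reply.
    return SKILLS[semantic_result["domain"]](semantic_result)

reply = dispatch({"domain": "aircon", "intent": "open", "slot": "living room"})
```

The returned string corresponds to the target reply that the skill module hands back to the cloud server.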
The skill module 304 sends the target reply to the cloud server 301, and the cloud server 301 sends the target reply to the speech synthesis module 305 to obtain an output voice stream.
The speech synthesis module 305 sends the output voice stream to the cloud server 301, and the cloud server 301 sends the output voice stream to the intelligent voice terminal 306 for playback.
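Tying the stages together, the cloud-side orchestration of Figure 3 can be compressed into one hedged sketch; the stub engines and skill below stand in for modules 302-305 and are assumptions for illustration:

```python
def voice_interaction(audio, asr_engines, nlp_engines, skills):
    # Cloud-side orchestration of Figure 3: fan the audio out to the ASR
    # engines, keep the best transcript, fan that out to the NLP engines,
    # keep the most confident parse, then let the matching skill reply.
    text, _ = max((asr(audio) for asr in asr_engines), key=lambda r: r[1])
    parse = max((nlp(text) for nlp in nlp_engines), key=lambda p: p["confidence"])
    return skills[parse["domain"]](parse)

# Hypothetical stub engines standing in for the system's modules:
asr_engines = [lambda a: ("open the living-room air conditioner", 0.9)]
nlp_engines = [lambda t: {"domain": "aircon", "confidence": 0.8}]
skills = {"aircon": lambda p: "OK, the living-room air conditioner is on."}
reply = voice_interaction(b"...", asr_engines, nlp_engines, skills)
```

In the system itself each hop passes through the cloud server 301 and the reply is then synthesized into the output voice stream; the sketch keeps only the selection logic.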
Based on the above voice interaction method, an embodiment of the present invention provides a voice interaction device whose structural block diagram is shown in Figure 4, comprising:
an acquisition and recognition module 401, a speech recognition result selection module 402, a processing module 403, a processing result selection module 404 and a reply module 405, wherein:
the acquisition and recognition module 401 is configured to obtain an input voice stream and distribute the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
the speech recognition result selection module 402 is configured to choose a target speech recognition result among the speech recognition results;
the processing module 403 is configured to distribute the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
the processing result selection module 404 is configured to choose a target semantic processing result among the semantic processing results;
the reply module 405 is configured to reply to the input voice stream according to the target semantic processing result.
The invention discloses a voice interaction device that: obtains an input voice stream and distributes it to each speech recognition engine for speech recognition, choosing a target speech recognition result among the obtained speech recognition results; distributes the target speech recognition result to each natural language processing engine, choosing a target semantic processing result among the obtained semantic processing results; and replies to the input voice stream according to the target semantic processing result. By screening a target speech recognition result out of multiple speech recognition results, distributing it to multiple natural language processing engines and choosing a target semantic processing result among the obtained semantic processing results, the device avoids the limitations of a voice interaction flow handled by a single ASR, NLP and TTS, where inaccurate ASR and/or NLP recognition would affect the voice interaction.
In the embodiment of the present invention, the reply module 405 includes:
an acquisition and determination unit 406, a determination unit 407 and a conversion unit 408, wherein:
the acquisition and determination unit 406 is configured to obtain a target reply matching the target semantic processing result and to determine the user group that produced the input voice stream;
the determination unit 407 is configured to determine a target speech synthesis engine according to the user group;
the conversion unit 408 is configured to convert the target reply into an output voice stream through the target speech synthesis engine.
The voice interaction device includes a processor and a memory. The above acquisition and recognition module, speech recognition result selection module, processing module, processing result selection module, reply module and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor contains a kernel, which fetches the corresponding program unit from the memory; one or more kernels can be set. The kernel screens a target speech recognition result out of the speech recognition results, distributes the target speech recognition result to multiple natural language processing engines and chooses a target semantic processing result among the semantic processing results, avoiding the limitations of a voice interaction flow handled by a single ASR, NLP and TTS, where inaccurate ASR and/or NLP recognition would affect the whole voice interaction flow.
The memory may include non-volatile memory in a computer-readable medium, random access memory (RAM) and/or other forms such as non-volatile memory, e.g. read-only memory (ROM) or flash RAM; the memory includes at least one memory chip.
An embodiment of the invention provides a storage medium on which a program is stored; when the program is executed by a processor, the voice interaction method is realized.
An embodiment of the invention provides a processor configured to run a program, wherein the voice interaction method is executed when the program runs.
An embodiment of the invention provides a device that includes a processor, a memory and a program stored on the memory and runnable on the processor; when executing the program, the processor performs the following steps:
obtaining an input voice stream and distributing the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
Optionally, in the above method, choosing a target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate as the target recognition result.
Optionally, in the above method, choosing a target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence as the target semantic processing result.
Optionally, in the above method, replying to the input voice stream according to the target semantic processing result comprises:
obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
Optionally, in the above method, determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that recognized the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
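The user-group determination from the winning engine's type and/or a face recognition result can be sketched as below; the label scheme (a `dialect:` prefix on engine types, the `"mother"` face label) is an illustrative assumption:

```python
def determine_user_group(engine_type, face_result=None):
    # Combine the type of the target speech recognition engine (e.g. a
    # dialect engine) with an optional face recognition result to label
    # the user group; the labels here are illustrative assumptions.
    if face_result == "mother":
        return "mother"
    if engine_type.startswith("dialect:"):
        return engine_type.split(":", 1)[1] + "_speaker"
    return "default"

group = determine_user_group("dialect:cantonese")
```

The resulting group label is what the method then maps to a target speech synthesis engine.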
The device herein may be a server, a PC, a PAD, a mobile phone, and so on.
The present invention also provides a computer program product adapted, when executed on a data processing device, to carry out a program with the following method steps:
obtaining an input voice stream and distributing the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
Optionally, in the above method, choosing a target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate as the target recognition result.
Optionally, in the above method, choosing a target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence as the target semantic processing result.
Optionally, in the above method, replying to the input voice stream according to the target semantic processing result comprises:
obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
Optionally, in the above method, determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that recognized the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to each other. The device-class embodiments are described relatively simply since they are basically similar to the method embodiments; for related details, see the corresponding parts of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
For convenience of description, the above device is described as various units divided by function. Of course, when implementing the present invention, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be realized by means of software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; this computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The voice interaction method, device, system, storage medium and processor provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the above embodiments are merely intended to help understand the method of the invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the invention. In conclusion, the content of this specification should not be construed as limiting the invention.
Claims (10)
1. A voice interaction method, characterized by comprising:
obtaining an input voice stream and distributing the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
2. The method according to claim 1, characterized in that choosing a target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate as the target recognition result.
3. The method according to claim 1, characterized in that choosing a target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence as the target semantic processing result.
4. The method according to claim 1, characterized in that replying to the input voice stream according to the target semantic processing result comprises:
obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
5. The method according to claim 4, characterized in that determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that recognized the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
6. A voice interaction device, characterized by comprising:
an acquisition and recognition module, configured to obtain an input voice stream and distribute the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
a speech recognition result selection module, configured to choose a target speech recognition result among the speech recognition results;
a processing module, configured to distribute the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
a processing result selection module, configured to choose a target semantic processing result among the semantic processing results;
a reply module, configured to reply to the input voice stream according to the target semantic processing result.
7. The device according to claim 6, characterized in that the reply module comprises:
an acquisition and determination unit, configured to obtain a target reply matching the target semantic processing result and determine the user group that produced the input voice stream;
a determination unit, configured to determine a target speech synthesis engine according to the user group;
a conversion unit, configured to convert the target reply into an output voice stream through the target speech synthesis engine.
8. A voice interaction system, characterized by comprising: a cloud server, a speech recognition module, a semantic processing module, a skill module, a speech synthesis module and an intelligent voice terminal, wherein:
the cloud server is configured to obtain the input voice stream collected by the intelligent voice terminal and distribute the input voice stream to the speech recognition module for speech recognition, obtaining a target speech recognition result;
the speech recognition module sends the target speech recognition result to the cloud server, and the cloud server passes the target speech recognition result to the semantic processing module, obtaining a target semantic processing result;
the semantic processing module sends the target semantic processing result to the cloud server, and the cloud server sends the target semantic processing result to the skill module, obtaining a target reply;
the skill module sends the target reply to the cloud server, and the cloud server sends the target reply to the speech synthesis module, obtaining an output voice stream;
the speech synthesis module sends the output voice stream to the cloud server, and the cloud server sends the output voice stream to the intelligent voice terminal for playback.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the voice interaction method according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the voice interaction method according to any one of claims 1 to 5 is executed when the program runs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910910484.6A CN110491383B (en) | 2019-09-25 | 2019-09-25 | Voice interaction method, device and system, storage medium and processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110491383A true CN110491383A (en) | 2019-11-22 |
CN110491383B CN110491383B (en) | 2022-02-18 |
Family
ID=68544152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910910484.6A Active CN110491383B (en) | 2019-09-25 | 2019-09-25 | Voice interaction method, device and system, storage medium and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110491383B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111798848A (en) * | 2020-06-30 | 2020-10-20 | 联想(北京)有限公司 | Voice synchronous output method and device and electronic equipment |
CN111862949A (en) * | 2020-07-30 | 2020-10-30 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN111883122A (en) * | 2020-07-22 | 2020-11-03 | 海尔优家智能科技(北京)有限公司 | Voice recognition method and device, storage medium and electronic equipment |
CN112003991A (en) * | 2020-09-02 | 2020-11-27 | 深圳壹账通智能科技有限公司 | Outbound method and related equipment |
CN112509565A (en) * | 2020-11-13 | 2021-03-16 | 中信银行股份有限公司 | Voice recognition method and device, electronic equipment and readable storage medium |
CN112614490A (en) * | 2020-12-09 | 2021-04-06 | 北京罗克维尔斯科技有限公司 | Method, device, medium, equipment, system and vehicle for generating voice instruction |
CN112820295A (en) * | 2020-12-29 | 2021-05-18 | 华人运通(上海)云计算科技有限公司 | Voice processing device and system, cloud server and vehicle |
CN112861542A (en) * | 2020-12-31 | 2021-05-28 | 思必驰科技股份有限公司 | Method and device for limiting scene voice interaction |
CN112992151A (en) * | 2021-03-15 | 2021-06-18 | 中国平安财产保险股份有限公司 | Speech recognition method, system, device and readable storage medium |
CN113077793A (en) * | 2021-03-24 | 2021-07-06 | 北京儒博科技有限公司 | Voice recognition method, device, equipment and storage medium |
WO2021135548A1 (en) * | 2020-06-05 | 2021-07-08 | 平安科技(深圳)有限公司 | Voice intent recognition method and device, computer equipment and storage medium |
CN113506565A (en) * | 2021-07-12 | 2021-10-15 | 北京捷通华声科技股份有限公司 | Speech recognition method, speech recognition device, computer-readable storage medium and processor |
CN114446279A (en) * | 2022-02-18 | 2022-05-06 | 青岛海尔科技有限公司 | Voice recognition method, voice recognition device, storage medium and electronic equipment |
CN114464179A (en) * | 2022-01-28 | 2022-05-10 | 达闼机器人股份有限公司 | Voice interaction method, system, device, equipment and storage medium |
WO2022262542A1 (en) * | 2021-06-15 | 2022-12-22 | 南京硅基智能科技有限公司 | Text output method and system, storage medium, and electronic device |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5991719A (en) * | 1998-04-27 | 1999-11-23 | Fujistu Limited | Semantic recognition system |
CN101354886A (en) * | 2007-07-27 | 2009-01-28 | 陈修志 | Apparatus for recognizing speech |
US20090258333A1 (en) * | 2008-03-17 | 2009-10-15 | Kai Yu | Spoken language learning systems |
CN101923854A (en) * | 2010-08-31 | 2010-12-22 | 中国科学院计算技术研究所 | Interactive speech recognition system and method |
CN105096953A (en) * | 2015-08-11 | 2015-11-25 | 东莞市凡豆信息科技有限公司 | Voice recognition method capable of realizing multi-language mixed use |
CN106373569A (en) * | 2016-09-06 | 2017-02-01 | 北京地平线机器人技术研发有限公司 | Voice interaction apparatus and method |
CN106648082A (en) * | 2016-12-09 | 2017-05-10 | 厦门快商通科技股份有限公司 | Intelligent service device capable of simulating human interactions and method |
CN107093425A (en) * | 2017-03-30 | 2017-08-25 | 安徽继远软件有限公司 | Speech guide system, audio recognition method and the voice interactive method of power system |
CN107170446A (en) * | 2017-05-19 | 2017-09-15 | 深圳市优必选科技有限公司 | Semantic processes server and the method for semantic processes |
US10049656B1 (en) * | 2013-09-20 | 2018-08-14 | Amazon Technologies, Inc. | Generation of predictive natural language processing models |
CN208284230U (en) * | 2018-04-20 | 2018-12-25 | 贵州小爱机器人科技有限公司 | A kind of speech recognition equipment, speech recognition system and smart machine |
CN109545197A (en) * | 2019-01-02 | 2019-03-29 | 珠海格力电器股份有限公司 | Voice instruction identification method and device and intelligent terminal |
US20190102378A1 (en) * | 2017-09-29 | 2019-04-04 | Apple Inc. | Rule-based natural language processing |
CN109727597A (en) * | 2019-01-08 | 2019-05-07 | 未来电视有限公司 | The interaction householder method and device of voice messaging |
CN109791767A (en) * | 2016-09-30 | 2019-05-21 | 罗伯特·博世有限公司 | System and method for speech recognition |
Non-Patent Citations (2)
Title |
---|
P VANAJAKSHI ET AL.: "A Detailed Survey on Large Vocabulary Continuous Speech Recognition Techniques", 《ICCCI 2017》 * |
LIU YUE ET AL.: "Application and Development of Speech Recognition Technology in the Vehicle-Mounted Field", 《控制与信息技术》 (CONTROL AND INFORMATION TECHNOLOGY) * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021135548A1 (en) * | 2020-06-05 | 2021-07-08 | 平安科技(深圳)有限公司 | Voice intent recognition method and device, computer equipment and storage medium |
CN111798848B (en) * | 2020-06-30 | 2024-05-31 | 联想(北京)有限公司 | Voice synchronous output method and device and electronic equipment |
CN111798848A (en) * | 2020-06-30 | 2020-10-20 | 联想(北京)有限公司 | Voice synchronous output method and device and electronic equipment |
CN111883122A (en) * | 2020-07-22 | 2020-11-03 | 海尔优家智能科技(北京)有限公司 | Voice recognition method and device, storage medium and electronic equipment |
CN111883122B (en) * | 2020-07-22 | 2023-10-27 | 海尔优家智能科技(北京)有限公司 | Speech recognition method and device, storage medium and electronic equipment |
CN111862949A (en) * | 2020-07-30 | 2020-10-30 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN111862949B (en) * | 2020-07-30 | 2024-04-02 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN112003991A (en) * | 2020-09-02 | 2020-11-27 | 深圳壹账通智能科技有限公司 | Outbound method and related equipment |
CN112509565A (en) * | 2020-11-13 | 2021-03-16 | 中信银行股份有限公司 | Voice recognition method and device, electronic equipment and readable storage medium |
CN112614490B (en) * | 2020-12-09 | 2024-04-16 | 北京罗克维尔斯科技有限公司 | Method, device, medium, equipment, system and vehicle for generating voice instruction |
CN112614490A (en) * | 2020-12-09 | 2021-04-06 | 北京罗克维尔斯科技有限公司 | Method, device, medium, equipment, system and vehicle for generating voice instruction |
CN112820295A (en) * | 2020-12-29 | 2021-05-18 | 华人运通(上海)云计算科技有限公司 | Voice processing device and system, cloud server and vehicle |
CN112820295B (en) * | 2020-12-29 | 2022-12-23 | 华人运通(上海)云计算科技有限公司 | Voice processing device and system, cloud server and vehicle |
CN112861542A (en) * | 2020-12-31 | 2021-05-28 | 思必驰科技股份有限公司 | Method and device for limiting scene voice interaction |
CN112861542B (en) * | 2020-12-31 | 2023-05-26 | 思必驰科技股份有限公司 | Method and device for voice interaction in limited scene |
CN112992151A (en) * | 2021-03-15 | 2021-06-18 | 中国平安财产保险股份有限公司 | Speech recognition method, system, device and readable storage medium |
CN112992151B (en) * | 2021-03-15 | 2023-11-07 | 中国平安财产保险股份有限公司 | Speech recognition method, system, device and readable storage medium |
CN113077793B (en) * | 2021-03-24 | 2023-06-13 | 北京如布科技有限公司 | Voice recognition method, device, equipment and storage medium |
CN113077793A (en) * | 2021-03-24 | 2021-07-06 | 北京儒博科技有限公司 | Voice recognition method, device, equipment and storage medium |
US11651139B2 (en) | 2021-06-15 | 2023-05-16 | Nanjing Silicon Intelligence Technology Co., Ltd. | Text output method and system, storage medium, and electronic device |
WO2022262542A1 (en) * | 2021-06-15 | 2022-12-22 | 南京硅基智能科技有限公司 | Text output method and system, storage medium, and electronic device |
CN113506565A (en) * | 2021-07-12 | 2021-10-15 | 北京捷通华声科技股份有限公司 | Speech recognition method, speech recognition device, computer-readable storage medium and processor |
CN113506565B (en) * | 2021-07-12 | 2024-06-04 | 北京捷通华声科技股份有限公司 | Speech recognition method, device, computer readable storage medium and processor |
WO2023143439A1 (en) * | 2022-01-28 | 2023-08-03 | 达闼机器人股份有限公司 | Speech interaction method, system and apparatus, and device and storage medium |
CN114464179A (en) * | 2022-01-28 | 2022-05-10 | 达闼机器人股份有限公司 | Voice interaction method, system, device, equipment and storage medium |
CN114464179B (en) * | 2022-01-28 | 2024-03-19 | 达闼机器人股份有限公司 | Voice interaction method, system, device, equipment and storage medium |
CN114446279A (en) * | 2022-02-18 | 2022-05-06 | 青岛海尔科技有限公司 | Voice recognition method, voice recognition device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110491383B (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110491383A (en) | A kind of voice interactive method, device, system, storage medium and processor | |
CN106776936B (en) | Intelligent interaction method and system | |
CN103345467B (en) | Speech translation system | |
JP2021018797A (en) | Conversation interaction method, apparatus, computer readable storage medium, and program | |
CN103456314B (en) | Emotion recognition method and device | |
CN110148416A (en) | Speech recognition method, device, equipment and storage medium | |
WO2019084810A1 (en) | Information processing method and terminal, and computer storage medium | |
CN110459222A (en) | Voice control method, voice control device and terminal device | |
WO2021114841A1 (en) | User report generating method and terminal device | |
CN107591155A (en) | Speech recognition method and device, terminal and computer-readable storage medium | |
US10108707B1 (en) | Data ingestion pipeline | |
CN108932945A (en) | Voice instruction processing method and device | |
CN108804609A (en) | Song recommendation method and device | |
CN108735201A (en) | Continuous speech recognition method, device, equipment and storage medium | |
CN113051362B (en) | Data query method, device and server | |
US20200265843A1 (en) | Speech broadcast method, device and terminal | |
CN110162780A (en) | User intent recognition method and device | |
CN105893351B (en) | Speech recognition method and device | |
CN109741735A (en) | Modeling method, and acoustic model acquisition method and device | |
US20220261545A1 (en) | Systems and methods for producing a semantic representation of a document | |
CN110297893A (en) | Natural language question answering method and apparatus, computer device and storage medium | |
CN108804525A (en) | Intelligent question answering method and device | |
CN108763202A (en) | Method, apparatus, device and readable storage medium for identifying sensitive text | |
CN109410934A (en) | Multi-speaker voice separation method, system and intelligent terminal based on voiceprint features | |
CN109739968A (en) | Data processing method and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||