CN110223694A - Speech processing method, system and device - Google Patents
Speech processing method, system and device
- Publication number
- CN110223694A CN110223694A CN201910563423.7A CN201910563423A CN110223694A CN 110223694 A CN110223694 A CN 110223694A CN 201910563423 A CN201910563423 A CN 201910563423A CN 110223694 A CN110223694 A CN 110223694A
- Authority
- CN
- China
- Prior art keywords
- speech recognition
- voice
- speech
- result
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Abstract
The embodiments of the present application disclose a speech processing method, system and device. One specific embodiment of the method includes: receiving user speech sent by a terminal device, and performing speech recognition on the user speech to obtain a speech recognition result; sending the speech recognition result to a semantic server, and receiving a reply text, returned by the semantic server, for the speech recognition result; sending the reply text to a speech synthesis server, and forwarding the reply speech received from the speech synthesis server to the terminal device. The embodiments of the present application remove the steps in which the terminal device analyzes the results returned by each server and generates the next request, which effectively saves processing time and thus shortens the reaction time of the terminal device when it interacts with a user.
Description
Technical field
The embodiments of the present application relate to the field of computer technology, in particular to the field of Internet technology, and more particularly to a speech processing method, system and device.
Background
In the related art, during a voice interaction between a user and a terminal device, the terminal device generally needs to interact with servers multiple times. Typically, the terminal device sends processing requests to a speech recognition server, a semantic recognition server and a speech synthesis server in turn, so as to interact with each of these servers.

Moreover, before sending each processing request to a server, the terminal device needs to analyze and process the previous response, which slows down its reaction speed during voice interaction with the user. In addition, the repeated communication between the terminal device and the servers also consumes a large amount of time.
Summary of the invention
The embodiments of the present application propose a speech processing method, system and device.
In a first aspect, an embodiment of the present application provides a speech processing method for a speech recognition server. The method includes: receiving user speech sent by a terminal device, and performing speech recognition on the user speech to obtain a speech recognition result; sending the speech recognition result to a semantic server, and receiving a reply text, returned by the semantic server, for the speech recognition result; sending the reply text to a speech synthesis server, and forwarding the reply speech received from the speech synthesis server to the terminal device.
In some embodiments, the speech recognition server, the semantic server and the speech synthesis server are located in the same local area network.
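As an informal sketch only (not the claimed implementation), the server-side relay described in the first aspect could look like the following; all class, function and message names here are hypothetical stand-ins for the three servers:

```python
class SemanticServer:
    """Stand-in for the semantic server: returns a reply text for a
    speech recognition result."""
    def reply(self, text: str) -> str:
        return "reply to: " + text

class SynthesisServer:
    """Stand-in for the speech synthesis server: returns reply speech
    (modeled here as bytes) for a reply text."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

def recognize(audio: bytes) -> str:
    """Stand-in for speech recognition (audio -> text)."""
    return audio.decode("utf-8")

def handle_user_speech(audio: bytes, semantic: SemanticServer,
                       synthesis: SynthesisServer) -> bytes:
    """The speech recognition server relays results between the other
    two servers itself, so the terminal device sends audio once and
    receives the reply speech back, with no intermediate round trips."""
    text = recognize(audio)                   # speech recognition result
    reply_text = semantic.reply(text)         # reply text for that result
    return synthesis.synthesize(reply_text)   # reply speech, forwarded to terminal
```

The point of the sketch is only the control flow: the terminal device never sees the intermediate results unless the server chooses to forward them.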
In some embodiments, the method further includes: in response to obtaining the speech recognition result, sending the speech recognition result to the terminal device; and, in response to receiving the reply text, sending the reply text to the terminal device.
In some embodiments, before the speech recognition result is sent to the semantic server, the method further includes: judging whether the speech recognition result is valid and related to the recognition result of the previous utterance, and generating a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session. Sending the speech recognition result to the semantic server includes: sending the speech recognition result to the semantic server, so that the semantic server judges whether the speech recognition result meets a preset session semantic type and generates a second judgment result. Before the speech recognition result is sent to the terminal device, the method further includes: receiving the second judgment result fed back by the semantic server, and determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In some embodiments, sending the speech recognition result to the terminal device includes: in response to determining that the user speech is meaningful speech, sending the speech recognition result to the terminal device.
In some embodiments, determining whether the user speech is meaningful speech based on the first judgment result and the second judgment result includes: in response to determining that at least one of the first judgment result and the second judgment result is affirmative, determining that the user speech is meaningful speech.
In some embodiments, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result meets a preset session semantic type. Determining whether the user speech is meaningful speech based on the two judgment results includes: determining the sum of the value of the first judgment result and the value of the second judgment result; and, in response to determining that the sum is greater than or equal to a preset threshold, determining that the user speech is meaningful speech.
In some embodiments, the value of the second judgment result is the maximum value among multiple candidate values determined by the semantic server using multiple preset session semantic type models.
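Assuming, purely for illustration, that both judgment values are scores on a common scale, the decision described in these embodiments reduces to a few lines (the function names and scales below are invented, not from the source):

```python
def second_judgment_value(candidate_values):
    """The semantic server scores the recognition result with multiple
    preset session semantic type models; the maximum candidate value
    becomes the value of the second judgment result."""
    return max(candidate_values)

def is_meaningful(first_value, candidate_values, threshold):
    """The user speech counts as meaningful speech when the sum of the
    two judgment values reaches the preset threshold."""
    return first_value + second_judgment_value(candidate_values) >= threshold
```

Taking the maximum means only the best-matching semantic type model needs to be confident for the utterance to pass.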
In a second aspect, an embodiment of the present application provides a speech processing apparatus for a speech recognition server. The apparatus includes: a speech recognition unit, configured to receive user speech sent by a terminal device and perform speech recognition on the user speech to obtain a speech recognition result; a text generation unit, configured to send the speech recognition result to a semantic server and receive at least one reply text, returned by the semantic server, for the speech recognition result; and a feedback unit, configured to send a reply text among the at least one reply text to a speech synthesis server, and to forward the reply speech received from the speech synthesis server to the terminal device, where the reply speech is generated based on the reply text sent to the speech synthesis server.
In some embodiments, the speech recognition server, the semantic server and the speech synthesis server are located in the same local area network.
In some embodiments, the apparatus further includes: a first sending unit, configured to send the speech recognition result to the terminal device in response to obtaining the speech recognition result; and a second sending unit, configured to send the reply text to the terminal device in response to receiving the reply text.
In some embodiments, the apparatus further includes: a judging unit, configured to judge, before the speech recognition result is sent to the semantic server, whether the speech recognition result is valid and related to the recognition result of the previous utterance, and to generate a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session. The text generation unit includes: a first sending module, configured to send the speech recognition result to the semantic server, so that the semantic server judges whether the speech recognition result meets a preset session semantic type and generates a second judgment result. The apparatus further includes: a receiving unit, configured to receive, before the speech recognition result is sent to the terminal device, the second judgment result fed back by the semantic server, and to determine, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In some embodiments, the first sending unit includes: a second sending module, configured to send the speech recognition result to the terminal device in response to determining that the user speech is meaningful speech.
In some embodiments, the receiving unit includes a determining module, configured to determine that the user speech is meaningful speech in response to determining that at least one of the first judgment result and the second judgment result is affirmative.
In some embodiments, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result meets a preset session semantic type. Determining whether the user speech is meaningful speech based on the two judgment results includes: determining the sum of the value of the first judgment result and the value of the second judgment result; and, in response to determining that the sum is greater than or equal to a preset threshold, determining that the user speech is meaningful speech.
In some embodiments, the value of the second judgment result is the maximum value among multiple candidate values determined by the semantic server using multiple preset session semantic type models.
In a third aspect, an embodiment of the present application provides a speech processing system, including a speech recognition server, a semantic server and a speech synthesis server. The speech recognition server is configured to: receive user speech sent by a terminal device; perform speech recognition on the user speech to obtain a speech recognition result; send the speech recognition result to the semantic server; send the reply text returned by the semantic server to the speech synthesis server; receive the reply speech, synthesized from the reply text, sent by the speech synthesis server; and send the reply speech to the terminal device.
In some embodiments, the speech recognition server, the semantic server and the speech synthesis server are located in the same local area network.
In some embodiments, the speech recognition server is further configured to send the speech recognition result to the terminal device in response to obtaining the speech recognition result, and to send the reply text to the terminal device in response to receiving the reply text.
In some embodiments, the semantic server is further configured to receive a text generation request, where the text generation request is sent to the semantic server by the terminal device in response to receiving neither the reply text nor the reply speech within a first preset time period. The text generation request includes the speech recognition result, and the first preset time period takes the moment the terminal device receives the speech recognition result as its time origin.
In some embodiments, the speech synthesis server is further configured to receive a speech synthesis request, where the speech synthesis request is sent to the speech synthesis server by the terminal device in response to receiving the reply text but not the reply speech within a second preset time period. The speech synthesis request includes the reply text, and the second preset time period takes the moment the terminal device receives the reply text as its time origin.
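A minimal sketch of the terminal-side fallback logic in these two embodiments, assuming the first period is measured from receipt of the speech recognition result and the second from receipt of the reply text (the parameter and request names are invented for illustration):

```python
from typing import Optional

def choose_fallback(elapsed_since_asr: float,
                    elapsed_since_reply_text: Optional[float],
                    have_reply_voice: bool,
                    t1: float,
                    t2: float) -> Optional[str]:
    """Decide which fallback request the terminal device should issue.

    elapsed_since_asr counts from the moment the terminal received the
    speech recognition result; elapsed_since_reply_text is None until
    the reply text arrives. t1 and t2 are the first and second preset
    time periods."""
    if have_reply_voice:
        return None  # normal path: the reply speech arrived in time
    if elapsed_since_reply_text is None:
        # Neither reply text nor reply speech yet: after t1, ask the
        # semantic server directly via a text generation request.
        return "text_generation_request" if elapsed_since_asr >= t1 else None
    # Reply text arrived but no reply speech: after t2, ask the speech
    # synthesis server directly via a speech synthesis request.
    return "speech_synthesis_request" if elapsed_since_reply_text >= t2 else None
```

The fallbacks let the terminal recover from a stalled relay without waiting indefinitely, at the cost of one extra request.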
In some embodiments, the speech recognition server is further configured to judge, before sending the speech recognition result to the semantic server, whether the speech recognition result is valid and related to the recognition result of the previous utterance, and to generate a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session. The speech recognition server is further configured to send the speech recognition result to the semantic server; the semantic server is further configured to judge whether the speech recognition result meets a preset session semantic type and to generate a second judgment result; and the speech recognition server is further configured to receive, before sending the speech recognition result to the terminal device, the second judgment result fed back by the semantic server, and to determine, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In some embodiments, the speech recognition server is further configured to send the speech recognition result to the terminal device in response to determining that the user speech is meaningful speech.
In some embodiments, the speech recognition server is further configured to determine that the user speech is meaningful speech in response to determining that at least one of the first judgment result and the second judgment result is affirmative.
In some embodiments, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result meets a preset session semantic type. The speech recognition server is further configured to determine the sum of the value of the first judgment result and the value of the second judgment result, and to determine that the user speech is meaningful speech in response to determining that the sum is greater than or equal to a preset threshold.
In some embodiments, the semantic server is further configured to determine multiple candidate values using multiple preset session semantic type models, and to take the maximum among the candidate values as the value of the second judgment result.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the speech processing method.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any embodiment of the speech processing method.
According to the speech processing scheme provided by the embodiments of the present application, user speech sent by a terminal device is first received, and speech recognition is performed on it to obtain a speech recognition result. The speech recognition result is then sent to a semantic server, and a reply text, returned by the semantic server, for the speech recognition result is received. Finally, the reply text is sent to a speech synthesis server, and the reply speech received from the speech synthesis server is forwarded to the terminal device. The embodiments of the present application remove the steps in which the terminal device analyzes the results returned by each server and generates the next request, which effectively saves processing time and thus shortens the reaction time of the terminal device when it interacts with a user.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent by reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application may be applied;
Fig. 2 is a flowchart of an embodiment of the speech processing method according to the present application;
Fig. 3 is a schematic structural diagram of an embodiment of the speech processing system according to the present application;
Fig. 4 is a schematic structural diagram of an embodiment of the speech processing apparatus according to the present application;
Fig. 5 is a schematic structural diagram of a computer system adapted to implement the electronic device of the embodiments of the present application.
Detailed description of embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.

It should be noted that, in the absence of conflict, the embodiments of the present application and the features of the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the speech processing method or speech processing apparatus of the present application may be applied.

As shown in Fig. 1, the system architecture 100 may include a terminal device 101, a network 102 and servers 103, 104, 105. The network 102 serves as a medium providing communication links between the terminal device 101 and the servers 103, 104, 105, and may include various connection types, such as wired links, wireless communication links or fiber-optic cables.
A user may use the terminal device 101 to interact with the servers 103, 104, 105 via the network 102, so as to receive or send messages and the like. Various communication client applications may be installed on the terminal device 101, such as speech processing applications, video applications, live-streaming applications, instant messaging tools, email clients and social platform software.
The terminal device 101 here may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop portable computers and desktop computers. When the terminal device 101 is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules for providing distributed services), or as a single piece of software or software module; no specific limitation is made here.
The servers 103, 104, 105 may be servers providing various services, and may include a speech recognition server, a semantic server and a speech synthesis server. In practice, the servers 103, 104, 105 may be located in the same local area network, for example as background servers providing support for the terminal device 101. A background server may analyze and process data such as the received user speech, and feed the processing result (such as the reply speech) back to the terminal device.
It should be noted that the speech processing method provided by the embodiments of the present application may be executed by the servers 103, 104, 105 or by the terminal device 101; correspondingly, the speech processing apparatus may be provided in the servers 103, 104, 105 or in the terminal device 101.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks and servers according to implementation needs.
With continued reference to Fig. 2, a flow 200 of an embodiment of the speech processing method according to the present application is shown. The speech processing method includes the following steps:

Step 201: receive user speech sent by a terminal device, and perform speech recognition on the user speech to obtain a speech recognition result.
In this embodiment, the execution body of the speech processing method (for example, a server shown in Fig. 1) may receive the user speech sent by the terminal device, and may perform speech recognition on the user speech to obtain a speech recognition result. Specifically, speech recognition is the process of converting speech into corresponding text; the speech recognition result here refers to the text obtained by this conversion.
Step 202: send the speech recognition result to a semantic server, and receive a reply text, returned by the semantic server, for the speech recognition result.
In this embodiment, the execution body may send the obtained speech recognition result to the semantic server and receive the reply text returned by the semantic server. The reply text here is a reply to the above speech recognition result. Specifically, during the interaction with the user, the semantic server may analyze and process the speech recognition result to obtain a reply text used to reply to the user. In general, there is only one reply text.
In some optional implementations of this embodiment, the above method further includes:

in response to obtaining the speech recognition result, sending the speech recognition result to the terminal device; and, in response to receiving the reply text, sending the reply text to the terminal device.

In these optional implementations, the execution body may promptly send the speech recognition result to the terminal device in response to obtaining it. In this way, the terminal device can display the speech recognition result to the user in time and avoid delayed text output.

Likewise, the execution body may promptly send the reply text to the terminal device in response to determining it. In this way, the terminal device can display the reply text while playing the reply speech to the user in time.
In some optional application scenarios of these implementations, before the speech recognition result is sent to the semantic server, the method further includes: judging whether the speech recognition result is valid and related to the recognition result of the previous utterance, and generating a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session. Sending the speech recognition result to the semantic server includes: sending the speech recognition result to the semantic server, so that the semantic server judges whether the speech recognition result meets a preset session semantic type and generates a second judgment result. Before the speech recognition result is sent to the terminal device, the method includes: receiving the second judgment result fed back by the semantic server, and determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In these optional application scenarios, the execution body may evaluate the speech recognition result and generate the first judgment result, and send the speech recognition result and the first judgment result to the semantic server, so that the semantic server judges whether the speech recognition result meets a preset session semantic type; the execution body then determines whether the user speech is meaningful speech. Specifically, the execution body needs to judge both whether the speech recognition result is valid and whether it is related to the recognition result of the previous utterance; only when the speech recognition result is judged valid and related to the recognition result of the previous utterance can the first judgment result be determined to be affirmative. The user speech is an utterance issued right after the previous utterance, within the same wake-up interaction session as that utterance.
A speech recognition result being valid may mean that the result has a clear meaning and a conversation can be carried on from it. For example, the speech recognition result "how is the weather today" is valid, while a blank or contentless result is invalid. Being related to the recognition result of the previous utterance means that the semantics of the two utterances, issued one after another, are associated and semantically and logically continuous. For example, if the recognition result of the previous utterance is "how is the weather today" and the recognition result of the user speech is "the weather tomorrow", the recognition results of the two utterances are related. As another example, if the recognition result of the previous utterance is "how is the weather today" and the recognition result of the user speech is "oh", the recognition results of the two utterances are unrelated.
A preset session semantic type is a pre-configured semantic type of a session, which may also be called a vertical category. For example, the preset session semantic types may include a date type, a food type, a navigation type and so on.

The semantic server may judge in various ways whether the speech recognition result meets a preset session semantic type. For example, it may take the keywords of the speech recognition result as target keywords and look up whether the preset keywords corresponding to each preset session semantic type contain the target keywords; if so, the second judgment result is that the speech recognition result meets a preset session semantic type.
In practice, the execution body may receive the second judgment result fed back by the semantic server and, based on the first judgment result and the second judgment result, finally determine whether the user speech is meaningful speech. Meaningful speech means that the speech recognition result of the utterance is valid and related to the recognition result of the previous utterance. Whether the speech recognition result is valid and related here needs to be judged comprehensively using both the first judgment result and the second judgment result.

Specifically, the execution body may determine in various ways, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech. For example, the execution body may determine that the user speech is meaningful speech if it determines that both the first judgment result and the second judgment result are affirmative.
Optionally, sending the speech recognition result to the semantic server may include sending both the speech recognition result and the first judgment result to the semantic server. Correspondingly, the semantic server may judge, based on the first judgment result, whether the speech recognition result meets a preset session semantic type and generate the second judgment result.

For example, a lookup table characterizing the correspondence between first judgment results, speech recognition results and second judgment results may be preset; the semantic server may query this table and find the second judgment result corresponding to the first judgment result and the speech recognition result.
Here, the semantic server may feed back to the execution body not only the second judgment result but also the first judgment result, so that the execution body can promptly determine, based on the two fed-back judgment results, whether the user speech is meaningful speech.

The execution body of these implementations can generate the first judgment result and the second judgment result to determine whether the user speech is meaningful, thereby achieving a better analysis of the user speech.
In some optional cases of these application scenarios, sending the speech recognition result to the terminal device may include: in response to determining that the user speech is meaningful speech, sending the speech recognition result to the terminal device.

In these cases, if the user speech is determined to be meaningful speech, the execution body may send the speech recognition result to the terminal device; if the user speech is not meaningful speech, the execution body may discard the speech recognition result. The execution body in these cases feeds the speech recognition result back to the terminal device only when the user speech is meaningful, so sentences corresponding to meaningless utterances spoken by the user need not be displayed, which reduces invalid processing and improves the intelligence of the device.
Optionally, above-mentioned to be based on the first judging result and the second judging result, determine whether user speech is significant language
Sound, may include: in response to determine at least one of the first judging result and the second judging result be it is yes, determine user speech
For significant voice.
These implementations can flexibly determine whether the user speech is meaningful by combining the judgment result of the speech recognition server with that of the semantic server, thereby avoiding the over-filtering or under-filtering that may occur when either server makes the determination alone. For example, suppose the speech recognition result is "tomorrow" and the previous utterance was "How is the weather today?". When deciding whether this recognition result is related to that of the previous utterance, the speech recognition server may misjudge and produce a negative first judgment result, while the semantic server can still determine that the recognition result matches the weather type among the preset session semantic types.
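The OR-combination described above can be sketched as follows, assuming the two judgment results have been reduced to booleans; the function name and example values are illustrative, not part of the patent.

```python
def is_meaningful(first_judgment: bool, second_judgment: bool) -> bool:
    """User speech counts as meaningful if either server judged it so.

    first_judgment:  the recognition server's check that the result is
                     valid and related to the previous utterance.
    second_judgment: the semantic server's check that the result matches
                     a preset session semantic type.
    """
    return first_judgment or second_judgment

# "Tomorrow" after "How is the weather today?": the recognition server may
# misjudge relatedness (False), but the semantic server matches the weather
# type (True), so the utterance is still treated as meaningful.
assert is_meaningful(False, True) is True
assert is_meaningful(False, False) is False
```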
Optionally, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result matches a preset session semantic type. Determining whether the user speech is meaningful based on the first judgment result and the second judgment result then includes: determining the sum of the value of the first judgment result and the value of the second judgment result; and, in response to determining that the sum is greater than or equal to a preset threshold, determining that the user speech is meaningful speech.
Specifically, the first judgment result and the second judgment result can be presented as numeric values; the larger a value, the higher the corresponding probability, and the larger the sum of the two values. For example, suppose the preset threshold is 15 and, for one utterance of a user, the value of the first judgment result is 5 (out of a full score of 10) while the value of the second judgment result is 10 (out of a full score of 10). The sum of the two values is 15, which equals the preset threshold, so this utterance of the user can be determined to be meaningful speech.
Optionally, a weighted sum of the value of the first judgment result and the value of the second judgment result is determined, and the user speech is determined to be meaningful speech in response to the weighted sum being greater than or equal to a preset weighted threshold.
That is, rather than only summing the judgment results to decide whether the user speech is meaningful, the execution body may also weight the first judgment result and the second judgment result using their respective preset weights, and compare the resulting weighted sum against the preset weighted threshold to determine whether the user speech is meaningful speech.
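The plain-sum and weighted-sum decision rules described above can be sketched as follows; the threshold, weights, and helper names are illustrative stand-ins, since the patent leaves the concrete values as presets.

```python
def is_meaningful_by_sum(score1: float, score2: float,
                         threshold: float) -> bool:
    # Sum of the two judgment-result values against the preset threshold.
    return score1 + score2 >= threshold

def is_meaningful_by_weighted_sum(score1: float, score2: float,
                                  w1: float, w2: float,
                                  weighted_threshold: float) -> bool:
    # Weighted variant: each judgment result has its own preset weight.
    return w1 * score1 + w2 * score2 >= weighted_threshold

# The worked example from the text: first value 5 (out of 10), second
# value 10 (out of 10), preset threshold 15 -> 5 + 10 == 15, meaningful.
assert is_meaningful_by_sum(5, 10, 15)
# Illustrative weights 2 and 1 with weighted threshold 20: 2*5 + 10 == 20.
assert is_meaningful_by_weighted_sum(5, 10, 2, 1, 20)
```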
In practice, when generating the second judgment result, the semantic server may use multiple preset session-semantic-type models to determine multiple candidate values, and select the largest of them as the value of the second judgment result. Each preset session-semantic-type model determines one candidate value for the speech recognition result.
Specifically, a preset session-semantic-type model here may be a vertical-domain model, a mapping table, or the like. For example, a vertical-domain model may be a date model, a navigation model, and so on, and such a vertical-domain model may be a neural network model. If the vertical-domain model is a neural network, the semantic server can feed the first judgment result and the speech recognition result into the vertical-domain model and obtain the second judgment result output by it.
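The max-over-candidates rule for the second judgment result can be sketched as follows; the two stand-in models are hypothetical, since the patent does not specify the internals of the vertical-domain models.

```python
from typing import Callable, Sequence

def second_judgment_value(recognition_result: str,
                          models: Sequence[Callable[[str], float]]) -> float:
    """Each preset session-semantic-type (vertical-domain) model yields one
    candidate value for the recognition result; the largest candidate
    becomes the value of the second judgment result."""
    return max(model(recognition_result) for model in models)

# Hypothetical stand-ins for a "date" model and a "weather" model.
date_model = lambda text: 0.2
weather_model = lambda text: 0.9
assert second_judgment_value("tomorrow", [date_model, weather_model]) == 0.9
```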
Step 203: the reply text is sent to the speech synthesis server, and the reply voice received from the speech synthesis server is forwarded to the terminal device.
In the present embodiment, the execution body can send the received reply text to the speech synthesis server so that the speech synthesis server performs speech synthesis and obtains the reply voice. The execution body can then receive the reply voice sent by the speech synthesis server and forward it to the terminal device. Concretely, the speech synthesis performed by the speech synthesis server may be text-to-speech (TTS) processing of the received reply text, producing a voice that can be played to the user.
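Step 203's relay can be sketched as below; `tts_server` and `terminal` are hypothetical stand-ins for the real network endpoints, and the method names are assumptions for illustration only.

```python
def relay_reply(reply_text: str, tts_server, terminal) -> bytes:
    """The execution body sends the reply text to the speech synthesis
    (TTS) server, receives the synthesized reply voice, and forwards it
    to the terminal device."""
    reply_voice = tts_server.synthesize(reply_text)  # text-to-speech step
    terminal.send(reply_voice)                       # forward to terminal
    return reply_voice
```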
In some optional implementations of the present embodiment, the speech recognition server, the semantic server, and the speech synthesis server are deployed in the same local area network.
In these optional implementations, placing the three servers in the same local area network can speed up communication between the speech recognition server and the semantic server, as well as between the speech recognition server and the speech synthesis server.
In the prior art, after obtaining information, the terminal device needs to generate requests and send them in turn to the speech recognition server, the semantic server, and the speech synthesis server; it must then wait for each server to feed information back before it can proceed, and the whole process consumes considerable time. By contrast, the present embodiment omits this process by passing information directly between the servers, effectively saving processing time and thereby shortening the terminal device's reaction time when interacting with the user.
As shown in Fig. 3, the present application also provides a speech processing system including a speech recognition server 310, a semantic server 320, and a speech synthesis server 330.
The speech recognition server 310 is configured to receive the user speech sent by the terminal device, perform speech recognition on the user speech to obtain a speech recognition result, send the speech recognition result to the semantic server 320, send the reply text returned by the semantic server 320 to the speech synthesis server 330, receive the reply voice for the reply text sent by the speech synthesis server 330, and send the reply voice to the terminal device.
In some optional implementations of the present embodiment, the speech recognition server 310, the semantic server 320, and the speech synthesis server 330 are deployed in the same local area network.
In some optional implementations of the present embodiment, the speech recognition server 310 is further configured to send the speech recognition result to the terminal device in response to obtaining it. In addition, the speech recognition server 310 is further configured to send the reply text to the terminal device in response to receiving it.
In some optional implementations of the present embodiment, the terminal device is further configured to, in response to receiving the reply voice but not receiving at least one of the speech recognition result and the reply text, display and play a preset reply sentence.
Specifically, if the terminal device has received the reply voice but not the speech recognition result and/or the reply text, it can display the text corresponding to the preset reply sentence and play its voice. For example, the preset reply sentence may be "The network is unstable, please try again later." In this way, these embodiments avoid displaying incomplete information and prevent the user from failing to obtain the reply accurately.
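This terminal-side fallback can be sketched as follows; the default sentence is the example from the text, while the function name and the use of `None` for a missing item are illustrative assumptions.

```python
DEFAULT_REPLY = "The network is unstable, please try again later."

def text_to_show(reply_voice, recognition_result, reply_text):
    """If the reply voice arrived but the recognition result or the reply
    text is missing, the terminal shows and plays the preset default
    reply instead of displaying incomplete information."""
    if reply_voice is not None and (recognition_result is None
                                    or reply_text is None):
        return DEFAULT_REPLY
    return reply_text

assert text_to_show(b"v", None, "hi") == DEFAULT_REPLY      # missing result
assert text_to_show(b"v", "rec", "hi") == "hi"              # all received
```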
In some optional implementations of the present embodiment, the semantic server is further configured to receive a text generation request, where the text generation request is sent to the semantic server by the terminal device in response to not receiving the reply text and the reply voice within a first preset time period; the text generation request includes the speech recognition result, and the first preset time period starts at the moment the terminal device receives the speech recognition result.
Specifically, if the terminal device has not received the reply text and the reply voice after receiving the speech recognition result, it can send a text generation request including the speech recognition result to the semantic server 320. The semantic server 320 can then receive the text generation request, process the speech recognition result, and generate the reply text. The request here is information asking the semantic server 320 to generate the reply text. Afterwards, the semantic server 320 can feed the reply text back to the terminal device, which can then send a speech synthesis request including the reply text to the speech synthesis server 330 and receive the reply voice fed back by the speech synthesis server 330.
In these implementations, the semantic server can receive a request sent by the terminal device when the reply text and the reply voice have not been received, ensuring that the voice interaction proceeds smoothly.
In some optional implementations of the present embodiment, the speech synthesis server is further configured to receive a speech synthesis request, where the speech synthesis request is sent to the speech synthesis server by the terminal device in response to receiving the reply text but not the reply voice within a second preset time period; the speech synthesis request includes the reply text, and the second preset time period starts at the moment the terminal device receives the speech recognition result or the reply text.
Specifically, if the terminal device has received the speech recognition result and the reply text but not the reply voice, it can send a speech synthesis request to the speech synthesis server 330, which can then process the reply text, generate the reply voice, and feed the reply voice back to the terminal device.
These implementations allow a request to be sent to the speech synthesis server 330 when the reply voice has not been received, ensuring that the voice interaction proceeds smoothly.
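The two timeout-driven retries described above can be sketched together as follows; the period lengths, the shared time origin, and the return labels are illustrative assumptions (the patent allows the second period to start from either the recognition result or the reply text).

```python
def pending_request(elapsed: float, reply_text, reply_voice,
                    first_period: float = 3.0,
                    second_period: float = 3.0):
    """elapsed: seconds since the terminal received the speech recognition
    result (taken here as the time origin of both preset periods).

    - Neither reply text nor reply voice within the first preset period:
      send a text generation request to the semantic server.
    - Reply text received but no reply voice within the second preset
      period: send a speech synthesis request to the TTS server.
    """
    if reply_text is None and reply_voice is None and elapsed >= first_period:
        return "text_generation_request"
    if reply_text is not None and reply_voice is None \
            and elapsed >= second_period:
        return "speech_synthesis_request"
    return None
```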
In some optional implementations of the present embodiment, before sending the speech recognition result to the semantic server, the speech recognition server is further configured to judge whether the speech recognition result is valid and related to the recognition result of the previous utterance, and to generate a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session; the speech recognition server is further configured to send the speech recognition result to the semantic server; the semantic server is further configured to judge whether the speech recognition result matches a preset session semantic type and to generate a second judgment result; and, before sending the speech recognition result to the terminal device, the speech recognition server is further configured to receive the second judgment result fed back by the semantic server and to determine, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In some optional implementations of the present embodiment, the speech recognition server is further configured to send the speech recognition result to the terminal device in response to determining that the user speech is meaningful speech.
In some optional implementations of the present embodiment, the speech recognition server is further configured to determine that the user speech is meaningful speech in response to determining that at least one of the first judgment result and the second judgment result is yes.
In some optional implementations of the present embodiment, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result matches a preset session semantic type; the speech recognition server is further configured to determine the sum of the value of the first judgment result and the value of the second judgment result and, in response to determining that the sum is greater than or equal to a preset threshold, to determine that the user speech is meaningful speech.
In some optional implementations of the present embodiment, the semantic server is further configured to determine multiple candidate values using multiple preset session-semantic-type models, and to take the largest of the multiple candidate values as the value of the second judgment result.
As with the method embodiment above, in the prior art the terminal device needs to generate requests after obtaining information, send them in turn to the speech recognition server, the semantic server, and the speech synthesis server, and then wait for each server to feed information back, a process that consumes considerable time. The system of the present embodiment omits this process by passing information directly between the servers, effectively saving processing time and thereby shortening the terminal device's reaction time when interacting with the user.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a speech processing apparatus; this apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied to various electronic devices.
As shown in Fig. 4, the speech processing apparatus 400 of the present embodiment includes a speech recognition unit 401, a text generation unit 402, and a feedback unit 403. The speech recognition unit 401 is configured to receive the user speech sent by the terminal device, perform speech recognition on the user speech, and obtain a speech recognition result. The text generation unit 402 is configured to send the speech recognition result to the semantic server and to receive at least one reply text returned by the semantic server for the speech recognition result. The feedback unit 403 is configured to send a reply text among the at least one reply text to the speech synthesis server, and to forward the received reply voice sent by the speech synthesis server to the terminal device, where the reply voice is generated based on the reply text sent to the speech synthesis server.
In some embodiments, the speech recognition unit 401 of the speech processing apparatus 400 can receive the user speech sent by the terminal device and perform speech recognition on it to obtain a speech recognition result. Specifically, speech recognition is the process of converting speech into corresponding text, and the speech recognition result here refers to the text obtained by that conversion.
In some embodiments, the text generation unit 402 can send the obtained speech recognition result to the semantic server and receive the reply text returned by the semantic server. The reply text here is the reply for the above speech recognition result. Specifically, the semantic server can analyze and process the speech recognition result to obtain, during interaction with the user, the reply text used to reply to the user.
In some embodiments, the feedback unit 403 can send the received reply text to the speech synthesis server so that the speech synthesis server performs speech synthesis and obtains the reply voice. The apparatus can then receive the reply voice sent by the speech synthesis server and forward it to the terminal device.
In some optional implementations of the present embodiment, the speech recognition server, the semantic server, and the speech synthesis server are deployed in the same local area network.
In some optional implementations of the present embodiment, the apparatus further includes: a first sending unit configured to send the speech recognition result to the terminal device in response to obtaining the speech recognition result; and a second sending unit configured to send the reply text to the terminal device in response to receiving the reply text.
In some optional implementations of the present embodiment, the apparatus further includes: a judging unit configured to, before the speech recognition result is sent to the semantic server, judge whether the speech recognition result is valid and related to the recognition result of the previous utterance, and generate a first judgment result, where the previous utterance and the user speech belong to the same wake-up interaction session. The text generation unit includes: a first sending module configured to send the speech recognition result to the semantic server, so that the semantic server judges whether the speech recognition result matches a preset session semantic type and generates a second judgment result. The apparatus further includes: a receiving unit configured to, before the speech recognition result is sent to the terminal device, receive the second judgment result fed back by the semantic server and determine, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
In some optional implementations of the present embodiment, the first sending unit includes: a second sending module configured to send the speech recognition result to the terminal device in response to determining that the user speech is meaningful speech.
In some optional implementations of the present embodiment, the receiving unit includes: a determining module configured to determine that the user speech is meaningful speech in response to determining that at least one of the first judgment result and the second judgment result is yes.
In some optional implementations of the present embodiment, the first judgment result and the second judgment result are expressed as numeric values: the value of the first judgment result characterizes the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizes the probability that the speech recognition result matches a preset session semantic type. Determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech includes: determining the sum of the value of the first judgment result and the value of the second judgment result; and, in response to determining that the sum is greater than or equal to a preset threshold, determining that the user speech is meaningful speech.
In some optional implementations of the present embodiment, the value of the second judgment result is the largest of multiple candidate values determined by the semantic server using multiple preset session-semantic-type models.
As shown in Fig. 5, the electronic device 500 may include a processing unit 501 (e.g., a central processing unit or a graphics processor), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data needed for the operation of the electronic device 500. The processing unit 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504, to which an input/output (I/O) interface 505 is also connected.
In general, the following devices can be connected to the I/O interface 505: input devices 506 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 507 such as a liquid crystal display (LCD), speaker, and vibrator; storage devices 508 such as magnetic tape and hard disk; and a communication device 509. The communication device 509 can allow the electronic device 500 to communicate, wired or wirelessly, with other devices to exchange data. Although Fig. 5 shows an electronic device 500 with various devices, it should be understood that implementing or providing all of the devices shown is not required; more or fewer devices may alternatively be implemented or provided. Each box shown in Fig. 5 can represent one device or, as needed, multiple devices.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing unit 501, the above-described functions defined in the methods of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium of the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable signal medium, in the embodiments of the present disclosure, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or part of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings; for example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application can be implemented in software or in hardware. The described units can also be provided in a processor; for example, a processor can be described as including a speech recognition unit, a text generation unit, and a feedback unit. The names of these units do not, in certain cases, limit the units themselves; for example, the speech recognition unit can also be described as "a unit that receives the user speech sent by the terminal device, performs speech recognition on the user speech, and obtains a speech recognition result".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive user speech sent by a terminal device, and perform speech recognition on the user speech to obtain a speech recognition result; send the speech recognition result to a semantic server, and receive a reply text returned by the semantic server for the speech recognition result; and send the reply text to a speech synthesis server, and forward the received reply voice sent by the speech synthesis server to the terminal device.
The above description is merely a preferred embodiment of the present application and an explanation of the applied technical principles. Those skilled in the art should appreciate that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, such as technical solutions formed by substituting the above features with technical features of similar functions disclosed in (but not limited to) the present application.
Claims (21)
1. A speech processing method for a speech recognition server, the method comprising:
receiving user speech sent by a terminal device, and performing speech recognition on the user speech to obtain a speech recognition result;
sending the speech recognition result to a semantic server, and receiving a reply text returned by the semantic server for the speech recognition result; and
sending the reply text to a speech synthesis server, and forwarding the received reply voice sent by the speech synthesis server to the terminal device.
2. The method according to claim 1, wherein the speech recognition server, the semantic server, and the speech synthesis server are deployed in the same local area network.
3. The method according to claim 1, wherein the method further comprises:
in response to obtaining the speech recognition result, sending the speech recognition result to the terminal device; and
in response to receiving the reply text, sending the reply text to the terminal device.
4. The method according to claim 3, wherein, before the sending the speech recognition result to the semantic server, the method further comprises:
judging whether the speech recognition result is valid and related to the recognition result of a previous utterance, and generating a first judgment result, wherein the previous utterance and the user speech belong to the same wake-up interaction session;
the sending the speech recognition result to the semantic server comprises:
sending the speech recognition result to the semantic server, so that the semantic server judges whether the speech recognition result matches a preset session semantic type and generates a second judgment result; and
before the sending the speech recognition result to the terminal device, the method further comprises:
receiving the second judgment result fed back by the semantic server, and determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech.
5. The method according to claim 4, wherein the sending the speech recognition result to the terminal device comprises:
in response to determining that the user speech is meaningful speech, sending the speech recognition result to the terminal device.
6. The method according to claim 4, wherein the determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech comprises:
in response to determining that at least one of the first judgment result and the second judgment result is yes, determining that the user speech is meaningful speech.
7. The method according to claim 4, wherein the first judgment result and the second judgment result are expressed as numeric values, the value of the first judgment result characterizing the probability that the speech recognition result is valid and related to the recognition result of the previous utterance, and the value of the second judgment result characterizing the probability that the speech recognition result matches the preset session semantic type; and
the determining, based on the first judgment result and the second judgment result, whether the user speech is meaningful speech comprises:
determining the sum of the value of the first judgment result and the value of the second judgment result; and, in response to determining that the sum is greater than or equal to a preset threshold, determining that the user speech is meaningful speech.
8. The method according to claim 7, wherein the value of the second judgment result is the largest of multiple candidate values determined by the semantic server using multiple preset session-semantic-type models.
9. A speech processing system, comprising a speech recognition server, a semantic server, and a speech synthesis server;
wherein the speech recognition server is configured to receive user speech sent by a terminal device, perform speech recognition on the user speech to obtain a speech recognition result, send the speech recognition result to the semantic server, send the reply text returned by the semantic server to the speech synthesis server, receive the reply voice for the reply text sent by the speech synthesis server, and send the reply voice to the terminal device.
10. The system according to claim 9, wherein the speech recognition server, the semantic server, and the speech synthesis server are deployed in the same local area network.
11. The system according to claim 9, wherein
the speech recognition server is further configured to send the speech recognition result to the terminal device in response to obtaining the speech recognition result; and
the speech recognition server is further configured to send the reply text to the terminal device in response to receiving the reply text.
12. The system according to any one of claims 9-11, wherein
the semantic server is further configured to receive a text generation request, wherein the text generation request is sent to the semantic server by the terminal device in response to receiving neither the reply text nor the reply voice within a first preset time period, the text generation request comprises the speech recognition result, and the first preset time period takes the time at which the terminal device receives the speech recognition result as its starting point.
13. The system according to any one of claims 9-11, wherein
the speech synthesis server is further configured to receive a speech synthesis request, wherein the speech synthesis request is sent to the speech synthesis server by the terminal device in response to receiving the reply text but not the reply voice within a second preset time period, the speech synthesis request comprises the reply text, and the second preset time period takes the time at which the terminal device receives the speech recognition result or receives the reply text as its starting point.
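The terminal-side fallbacks of claims 12-13 amount to two timers started when the speech recognition result arrives. A hedged sketch, with hypothetical period lengths and request names:

```python
def terminal_fallback(now, t_recognition, t_reply_text, t_reply_voice,
                      first_period=2.0, second_period=4.0):
    """Sketch of the claim 12-13 fallbacks. Timestamps are in seconds;
    None means 'not yet received'. Returns the request the terminal
    device should issue directly to the back-end server, if any."""
    # Claim 12: neither reply text nor reply voice arrived within the
    # first preset period (measured from receipt of the recognition
    # result) -> ask the semantic server directly for the reply text.
    if t_reply_text is None and t_reply_voice is None:
        if now - t_recognition >= first_period:
            return "text_generation_request"
    # Claim 13: the reply text arrived but the reply voice did not
    # within the second preset period -> ask the speech synthesis
    # server directly for the reply voice.
    elif t_reply_voice is None:
        if now - t_recognition >= second_period:
            return "speech_synthesis_request"
    return None
```

This lets the terminal device recover from a stalled relay at whichever stage it broke, instead of waiting indefinitely for the speech recognition server.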
14. The system according to claim 11, wherein
the speech recognition server is further configured to, before sending the speech recognition result to the semantic server, judge whether the speech recognition result is valid and related to the recognition result of a previous voice, and generate a first judging result, wherein the previous voice and the user speech belong to the same wake-up interaction session;
the speech recognition server is further configured to send the speech recognition result to the semantic server;
the semantic server is further configured to judge whether the speech recognition result meets a preset conversational semantic type, and generate a second judging result; and
the speech recognition server is further configured to, before sending the speech recognition result to the terminal device, receive the second judging result fed back by the semantic server, and determine, based on the first judging result and the second judging result, whether the user speech is meaningful voice.
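The two-stage gating of claim 14 can be sketched as a flow (illustrative only; the callables and the combination rule are hypothetical — claims 16-17 give two concrete combination rules):

```python
def gated_forward(recognition_result, first_judge, semantic_server,
                  terminal_send, combine):
    """Sketch of claim 14: a local first judgment before contacting the
    semantic server, a second judgment fed back by it, and a combined
    decision before forwarding the result to the terminal device."""
    first = first_judge(recognition_result)                   # local first judging result
    second, reply_text = semantic_server(recognition_result)  # second judging result fed back
    if combine(first, second):                                # user speech is meaningful voice
        terminal_send(recognition_result)                     # forward to the terminal device
        return reply_text
    return None  # meaningless speech: nothing is forwarded
```

With `combine = lambda a, b: a or b` this reproduces the "at least one is affirmative" rule of claim 16; a threshold on the sum of two numerical scores gives the rule of claim 17.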
15. The system according to claim 14, wherein the speech recognition server is further configured to, in response to determining that the user speech is meaningful voice, send the speech recognition result to the terminal device.
16. The system according to claim 14, wherein
the speech recognition server is further configured to, in response to determining that at least one of the first judging result and the second judging result is affirmative, determine that the user speech is meaningful voice.
17. The system according to claim 14, wherein the first judging result and the second judging result are expressed in numerical form, the numerical value of the first judging result is used to characterize a probability that the speech recognition result is valid and related to the recognition result of a previous voice, and the numerical value of the second judging result is used to characterize a probability that the speech recognition result meets a preset conversational semantic type; and
the speech recognition server is further configured to determine a sum of the numerical value of the first judging result and the numerical value of the second judging result, and, in response to determining that the sum is greater than or equal to a preset threshold, determine that the user speech is meaningful voice.
18. The system according to claim 17, wherein
the semantic server is further configured to determine a plurality of candidate values using a plurality of preset conversational semantic type models, and determine the maximum value among the plurality of candidate values as the numerical value of the second judging result.
19. A speech processing apparatus for a speech recognition server, the apparatus comprising:
a speech recognition unit, configured to receive user speech sent by a terminal device and perform speech recognition on the user speech to obtain a speech recognition result;
a text generation unit, configured to send the speech recognition result to a semantic server and receive at least one reply text returned by the semantic server for the speech recognition result; and
a feedback unit, configured to send a reply text of the at least one reply text to a speech synthesis server and forward the received reply voice sent by the speech synthesis server to the terminal device, wherein the reply voice is generated based on the reply text sent to the speech synthesis server.
20. An electronic device, comprising:
one or more processors; and
a storage apparatus for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
21. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111108547.XA CN113823282A (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
CN201910563423.7A CN110223694B (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910563423.7A CN110223694B (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111108547.XA Division CN113823282A (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223694A true CN110223694A (en) | 2019-09-10 |
CN110223694B CN110223694B (en) | 2021-10-15 |
Family
ID=67814866
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111108547.XA Pending CN113823282A (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
CN201910563423.7A Active CN110223694B (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111108547.XA Pending CN113823282A (en) | 2019-06-26 | 2019-06-26 | Voice processing method, system and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113823282A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111477224A (en) * | 2020-03-23 | 2020-07-31 | 一汽奔腾轿车有限公司 | Human-vehicle virtual interaction system |
WO2021135713A1 (en) * | 2019-12-30 | 2021-07-08 | 华为技术有限公司 | Text-to-voice processing method, terminal and server |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9269354B2 (en) * | 2013-03-11 | 2016-02-23 | Nuance Communications, Inc. | Semantic re-ranking of NLU results in conversational dialogue applications |
CN107316643A (en) * | 2017-07-04 | 2017-11-03 | 科大讯飞股份有限公司 | Voice interactive method and device |
CN107943834A (en) * | 2017-10-25 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Interactive implementation method, device, equipment and storage medium |
CN108877792A (en) * | 2018-05-30 | 2018-11-23 | 北京百度网讯科技有限公司 | For handling method, apparatus, electronic equipment and the computer readable storage medium of voice dialogue |
CN109545185A (en) * | 2018-11-12 | 2019-03-29 | 百度在线网络技术(北京)有限公司 | Interactive system evaluation method, evaluation system, server and computer-readable medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105529028B (en) * | 2015-12-09 | 2019-07-30 | 百度在线网络技术(北京)有限公司 | Speech analysis method and apparatus |
CN106373569B (en) * | 2016-09-06 | 2019-12-20 | 北京地平线机器人技术研发有限公司 | Voice interaction device and method |
EP3561643B1 (en) * | 2017-01-20 | 2023-07-19 | Huawei Technologies Co., Ltd. | Method and terminal for implementing voice control |
CN107146618A (en) * | 2017-06-16 | 2017-09-08 | 北京云知声信息技术有限公司 | Method of speech processing and device |
2019
- 2019-06-26 CN CN202111108547.XA patent/CN113823282A/en active Pending
- 2019-06-26 CN CN201910563423.7A patent/CN110223694B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN113823282A (en) | 2021-12-21 |
CN110223694B (en) | 2021-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20211014
Address after: 100176, 101, Floor 1, Building 1, Yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing
Patentee after: Apollo Zhilian (Beijing) Technology Co., Ltd.
Address before: 100085 Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing
Patentee before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co., Ltd.