CN107943834A - Interactive implementation method, device, equipment and storage medium - Google Patents

Interactive implementation method, device, equipment and storage medium

Info

Publication number
CN107943834A
CN107943834A
Authority
CN
China
Prior art keywords
server
voice
semantic understanding
speech recognition
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711008491.4A
Other languages
Chinese (zh)
Other versions
CN107943834B (en)
Inventor
常先堂
远超
陈怀亮
米雪
范中吉
唐海员
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711008491.4A
Publication of CN107943834A
Application granted
Publication of CN107943834B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols

Abstract

The invention discloses an interaction implementation method, apparatus, device, and storage medium. The method includes: a client obtains the voice data of a user and sends the voice data to a speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to a semantic understanding server for semantic understanding; the client then obtains the voice information generated by a speech synthesis server according to the reply content it has received, and plays the voice information to the user, the reply content being generated by the semantic understanding server according to the semantic understanding result. With the scheme of the present invention, the response speed of voice interaction can be improved.

Description

Interactive implementation method, device, equipment and storage medium
【Technical field】
The present invention relates to computer application technology, and in particular to an interaction implementation method, apparatus, device, and storage medium.
【Background technology】
In a human-machine interaction system, a person converses with a machine in natural language, that is, the machine engages in dialogue in human language. This mainly involves three processes: speech recognition (ASR, Automatic Speech Recognition), semantic understanding (NLU, Natural Language Understanding), and speech synthesis (TTS, Text To Speech).
The speech recognition process refers to the machine identifying what the user has said, while the semantic understanding process refers to the machine truly comprehending what the user meant. After the machine has understood the user's intention, it needs to provide suitable reply content, synthesize the reply content into voice information, and express it by voice broadcast; this last step is the speech synthesis process.
In a traditional human-machine interaction system, the three processes above are typically carried out serially, and each step involves one or more HyperText Transfer Protocol (HTTP) or HTTPS network interactions between the client and a server.
As shown in Figs. 1 to 3, Fig. 1 is a schematic diagram of the network communication mode between an existing client and a speech recognition server (ASR server), Fig. 2 is a schematic diagram of the network communication mode between an existing client and a semantic understanding server (NLU server), and Fig. 3 is a schematic diagram of the network communication mode between an existing client and a speech synthesis server (TTS server).
Because multiple network interactions are needed as described above, the human-machine dialogue process (from the moment the person stops speaking to the moment the machine starts broadcasting voice information) becomes slow; that is, the response speed of voice interaction is reduced.
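Purely as an illustration (the patent itself contains no code), the serial prior-art flow can be sketched as three separate client-to-server exchanges; the functions below are hypothetical stand-ins for real ASR, NLU, and TTS services that would be reached over HTTP or HTTPS:

```python
# Hypothetical stand-ins for the three servers; in practice each call
# would be a client<->server network round trip over HTTP/HTTPS.
def asr_server(voice_data: bytes) -> str:
    return "what is the weather"           # pretend recognition result

def nlu_server(text: str) -> str:
    return "It is sunny today."            # pretend reply content

def tts_server(reply: str) -> bytes:
    return reply.encode("utf-8")           # pretend synthesized audio

def client_dialogue_serial(voice_data: bytes):
    """Prior art: each stage is a separate client<->server exchange."""
    round_trips = []
    text = asr_server(voice_data)          # exchange 1: client <-> ASR server
    round_trips.append("client<->ASR")
    reply = nlu_server(text)               # exchange 2: client <-> NLU server
    round_trips.append("client<->NLU")
    audio = tts_server(reply)              # exchange 3: client <-> TTS server
    round_trips.append("client<->TTS")
    return audio, round_trips

audio, trips = client_dialogue_serial(b"...pcm audio...")
print(len(trips))  # 3 client-side exchanges before playback can begin
```

Each of the three calls stands for a network round trip, and those client-side round trips are the latency the invention targets.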
【Summary of the invention】
In view of this, the present invention provides an interaction implementation method, apparatus, device, and storage medium, which can improve the response speed of voice interaction.
The specific technical scheme is as follows:
A method for implementing interaction, including:
a client obtains the voice data of a user and sends the voice data to a speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to a semantic understanding server for semantic understanding;
the client obtains the voice information generated by a speech synthesis server according to the reply content it has received, and plays the voice information to the user, the reply content being generated by the semantic understanding server according to the semantic understanding result.
According to a preferred embodiment of the present invention, the client obtaining the voice information generated by the speech synthesis server according to the received reply content includes:
the client obtains the voice information that the speech synthesis server generates according to the reply content and sends to the client, the reply content being sent by the semantic understanding server to the client via the speech recognition server and then sent by the client to the speech synthesis server;
or,
the client obtains the voice information that the speech synthesis server generates according to the reply content and sends to the client via the speech recognition server, the reply content being sent by the semantic understanding server to the speech synthesis server via the speech recognition server.
A method for implementing interaction, including:
a speech recognition server obtains the voice data of a user from a client;
the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to a semantic understanding server for semantic understanding;
the speech recognition server obtains the reply content generated by the semantic understanding server according to the semantic understanding result, and sends the reply content, or the voice information obtained according to the reply content, to the client.
According to a preferred embodiment of the present invention, the speech recognition server performing speech recognition on the voice data and sending the speech recognition result to the semantic understanding server for semantic understanding includes:
before the final speech recognition result is obtained, each time a sending condition is met, the speech recognition server sends the partial speech recognition result obtained so far to the semantic understanding server, so that the semantic understanding server performs semantic understanding according to the received partial speech recognition result and obtains a semantic understanding result;
when the final speech recognition result is obtained, the speech recognition server sends the final speech recognition result to the semantic understanding server, so that the semantic understanding server determines whether any of the previously received partial speech recognition results is identical to the final speech recognition result; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
According to a preferred embodiment of the present invention, sending the reply content, or the voice information obtained according to the reply content, to the client includes:
the speech recognition server sends the reply content to the client, so that after the client sends the reply content to a speech synthesis server, the client obtains the voice information generated according to the reply content and returned by the speech synthesis server, and plays the voice information to the user;
or,
the speech recognition server sends the reply content to the speech synthesis server and obtains the voice information generated according to the reply content and returned by the speech synthesis server;
the speech recognition server sends the voice information to the client, so that the client plays the voice information to the user.
A method for implementing interaction, including:
a semantic understanding server obtains a speech recognition result from a speech recognition server and performs semantic understanding according to the speech recognition result, the speech recognition result being obtained by the speech recognition server performing speech recognition on the voice data of a user obtained from a client;
the semantic understanding server generates reply content according to the semantic understanding result and sends the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or the voice information obtained according to the reply content, to the client.
According to a preferred embodiment of the present invention, the semantic understanding server obtaining the speech recognition result from the speech recognition server and performing semantic understanding according to the speech recognition result includes:
the semantic understanding server performs semantic understanding on each partial speech recognition result it receives and obtains a semantic understanding result, each partial speech recognition result being the result obtained so far and sent by the speech recognition server, before the final speech recognition result is obtained, each time a sending condition is met;
the semantic understanding server obtains the final speech recognition result from the speech recognition server and determines whether any of the previously received partial speech recognition results is identical to the final speech recognition result; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
A method for implementing interaction, including:
a speech synthesis server obtains, from a speech recognition server, the reply content generated by a semantic understanding server according to a semantic understanding result, the semantic understanding result being obtained by the semantic understanding server performing semantic understanding on the speech recognition result obtained from the speech recognition server, and the speech recognition result being obtained by the speech recognition server performing speech recognition on the voice data of a user obtained from a client;
the speech synthesis server generates voice information according to the reply content and sends the voice information to the client via the speech recognition server, so that the client plays the voice information to the user.
An apparatus for implementing interaction, including a first processing unit and a second processing unit;
the first processing unit is configured to obtain the voice data of a user and send the voice data to a speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to a semantic understanding server for semantic understanding;
the second processing unit is configured to obtain the voice information generated by a speech synthesis server according to the reply content it has received, and to play the voice information to the user, the reply content being generated by the semantic understanding server according to the semantic understanding result.
According to a preferred embodiment of the present invention,
the second processing unit obtains the voice information that the speech synthesis server generates according to the reply content and sends to it, the reply content being sent by the semantic understanding server to the second processing unit via the speech recognition server and then sent by the second processing unit to the speech synthesis server;
or,
the second processing unit obtains the voice information that the speech synthesis server generates according to the reply content and sends via the speech recognition server, the reply content being sent by the semantic understanding server to the speech synthesis server via the speech recognition server.
An apparatus for implementing interaction, including a third processing unit, a fourth processing unit, and a fifth processing unit;
the third processing unit is configured to obtain the voice data of a user from a client;
the fourth processing unit is configured to perform speech recognition on the voice data and send the speech recognition result to a semantic understanding server for semantic understanding;
the fifth processing unit is configured to obtain the reply content generated by the semantic understanding server according to the semantic understanding result, and to send the reply content, or the voice information obtained according to the reply content, to the client.
According to a preferred embodiment of the present invention,
before the final speech recognition result is obtained, each time a sending condition is met, the fourth processing unit sends the partial speech recognition result obtained so far to the semantic understanding server, so that the semantic understanding server performs semantic understanding according to the received partial speech recognition result and obtains a semantic understanding result;
when the final speech recognition result is obtained, the fourth processing unit sends the final speech recognition result to the semantic understanding server, so that the semantic understanding server determines whether any of the previously received partial speech recognition results is identical to the final speech recognition result; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
According to a preferred embodiment of the present invention,
the fifth processing unit sends the reply content to the client, so that after the client sends the reply content to a speech synthesis server, the client obtains the voice information generated according to the reply content and returned by the speech synthesis server, and plays the voice information to the user;
or,
the fifth processing unit sends the reply content to the speech synthesis server and obtains the voice information generated according to the reply content and returned by the speech synthesis server;
the fifth processing unit sends the voice information to the client, so that the client plays the voice information to the user.
An apparatus for implementing interaction, including a sixth processing unit and a seventh processing unit;
the sixth processing unit is configured to obtain a speech recognition result from a speech recognition server and perform semantic understanding according to the speech recognition result, the speech recognition result being obtained by the speech recognition server performing speech recognition on the voice data of a user obtained from a client;
the seventh processing unit is configured to generate reply content according to the semantic understanding result and send the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or the voice information obtained according to the reply content, to the client.
According to a preferred embodiment of the present invention,
the sixth processing unit performs semantic understanding on each partial speech recognition result it receives and obtains a semantic understanding result, each partial speech recognition result being the result obtained so far and sent by the speech recognition server, before the final speech recognition result is obtained, each time a sending condition is met;
the sixth processing unit obtains the final speech recognition result from the speech recognition server and determines whether any of the previously received partial speech recognition results is identical to the final speech recognition result; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
An apparatus for implementing interaction, including an eighth processing unit and a ninth processing unit;
the eighth processing unit is configured to obtain, from a speech recognition server, the reply content generated by a semantic understanding server according to a semantic understanding result, the semantic understanding result being obtained by the semantic understanding server performing semantic understanding on the speech recognition result obtained from the speech recognition server, and the speech recognition result being obtained by the speech recognition server performing speech recognition on the voice data of a user obtained from a client;
the ninth processing unit is configured to generate voice information according to the reply content and send the voice information to the client via the speech recognition server, so that the client plays the voice information to the user.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, the program implementing the method described above when executed by a processor.
As can be seen from the above description, in the scheme of the present invention the speech recognition server performs speech recognition on the voice data of the user obtained from the client; after the speech recognition result is obtained, it does not need to be returned to the client but is sent to the semantic understanding server for semantic understanding. Correspondingly, the semantic understanding server can return the reply content generated according to the semantic understanding result directly to the speech recognition server, and the speech recognition server can send the reply content directly to the speech synthesis server and send the voice information obtained from the speech synthesis server to the client to be played. Compared with the prior art, the scheme of the present invention replaces the network communication between the client and the semantic understanding server, and between the client and the speech synthesis server, with network communication between the speech recognition server and the semantic understanding server, and between the speech recognition server and the speech synthesis server; since server-to-server network communication is faster than client-to-server network communication, the response speed of voice interaction during human-machine dialogue is improved.
【Brief description of the drawings】
Fig. 1 is a schematic diagram of the network communication mode between an existing client and a speech recognition server.
Fig. 2 is a schematic diagram of the network communication mode between an existing client and a semantic understanding server.
Fig. 3 is a schematic diagram of the network communication mode between an existing client and a speech synthesis server.
Fig. 4 is a schematic diagram of the network communication mode between the client, the speech recognition server, and the semantic understanding server according to the present invention.
Fig. 5 is a schematic diagram of the network communication mode between the client, the speech recognition server, the semantic understanding server, and the speech synthesis server according to the present invention.
Fig. 6 is a flowchart of a first embodiment of the interaction implementation method according to the present invention.
Fig. 7 is a flowchart of a second embodiment of the interaction implementation method according to the present invention.
Fig. 8 is a flowchart of a third embodiment of the interaction implementation method according to the present invention.
Fig. 9 is a flowchart of a fourth embodiment of the interaction implementation method according to the present invention.
Fig. 10 is a structural diagram of a first embodiment of the interaction implementation apparatus according to the present invention.
Fig. 11 is a structural diagram of a second embodiment of the interaction implementation apparatus according to the present invention.
Fig. 12 is a structural diagram of a third embodiment of the interaction implementation apparatus according to the present invention.
Fig. 13 is a structural diagram of a fourth embodiment of the interaction implementation apparatus according to the present invention.
Fig. 14 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
【Embodiment】
To address the problems in the prior art, the present invention proposes an interaction implementation scheme that can improve the response speed of voice interaction during human-machine dialogue.
In the prior art, the speech recognition and semantic understanding processes are handled separately: the client communicates over the network with the speech recognition server and with the semantic understanding server respectively. In the scheme of the present invention, the speech recognition server and the semantic understanding server can operate two-in-one at the server end, that is, Fig. 1 and Fig. 2 are merged into Fig. 4. Fig. 4 is a schematic diagram of the network communication mode between the client, the speech recognition server, and the semantic understanding server according to the present invention.
As shown in Fig. 4, after obtaining the voice data of the user, the client sends it to the speech recognition server. The speech recognition server performs speech recognition on the voice data and, after obtaining the speech recognition result, does not return it to the client but sends it to the semantic understanding server. The semantic understanding server performs semantic understanding on the speech recognition result to obtain a semantic understanding result, generates reply content according to the semantic understanding result, and sends the reply content to the speech recognition server, which then sends the reply content to the client.
Afterwards, the client can send the reply content to the speech synthesis server; the speech synthesis server generates the corresponding voice information and returns it to the client, and the client plays the voice information to the user.
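As an illustrative sketch only (all function names are invented here, not taken from the patent), the two-in-one flow reduces the client's network exchanges from three to two, because the ASR-to-NLU hop now happens between servers:

```python
def nlu_server(text: str) -> str:
    return f"reply to: {text}"             # pretend reply content

def asr_server(voice_data: bytes) -> str:
    text = voice_data.decode("utf-8")      # pretend speech recognition
    return nlu_server(text)                # server-to-server hop, not via client

def tts_server(reply: str) -> bytes:
    return reply.encode("utf-8")           # pretend synthesized audio

def client_turn(voice_data: bytes):
    exchanges = []
    reply = asr_server(voice_data)         # exchange 1: ASR (NLU reached server-side)
    exchanges.append("client<->ASR+NLU")
    audio = tts_server(reply)              # exchange 2: client still calls TTS itself
    exchanges.append("client<->TTS")
    return audio, exchanges

audio, hops = client_turn(b"play some music")
print(audio, len(hops))  # b'reply to: play some music' 2
```

The eliminated client-to-NLU exchange is the source of the roughly 100 to 120 ms saving reported for this mode.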
In the above processing mode, the network communication between the client and the semantic understanding server is replaced by network communication between the speech recognition server and the semantic understanding server, and since server-to-server network communication is faster than client-to-server network communication, the response speed of voice interaction during human-machine dialogue is improved. Actual test results show that this processing mode saves about 100 to 120 ms.
In addition, in practical applications, when performing speech recognition on the voice data, the speech recognition server performs speech recognition on the voice stream block by block in a streaming manner. For this reason, the scheme of the present invention proposes that a prefetching implementation can be used between the speech recognition server and the semantic understanding server.
That is, before the final speech recognition result is obtained, each time a sending condition is met, the speech recognition server sends the partial speech recognition result obtained so far (the partial result) to the semantic understanding server, and the semantic understanding server performs semantic understanding according to the received partial speech recognition result to obtain a semantic understanding result.
Afterwards, when the final speech recognition result is obtained, the speech recognition server sends it to the semantic understanding server. The semantic understanding server determines whether any of the previously received partial speech recognition results is identical to the final speech recognition result; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
In other words, before the final speech recognition result is obtained, the speech recognition server can send the partial speech recognition results it has obtained to the semantic understanding server for semantic understanding. When the final speech recognition result is obtained, the speech recognition server can notify the semantic understanding server, in some way, that this is the final speech recognition result. If the semantic understanding server then determines that some previously sent partial speech recognition result is identical to the final speech recognition result, the semantic understanding result corresponding to that partial result can be used directly as the final required semantic understanding result, with no need to spend additional time on semantic understanding again, thereby saving time. This process can be called "prediction prefetching": the semantic understanding result is requested in advance, before the user has finished speaking.
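A minimal sketch of this prediction-prefetch logic, under the assumption, made here only for illustration, that the match test is exact equality between the final recognition result and a previously received partial result:

```python
class NluServer:
    """Illustrative semantic understanding server with a prefetch cache."""
    def __init__(self):
        self.cache = {}            # partial text -> semantic understanding result
        self.understand_calls = 0  # counts the expensive understanding operations

    def _understand(self, text: str) -> str:
        self.understand_calls += 1
        return f"intent({text})"   # stand-in for real semantic understanding

    def on_partial(self, text: str) -> None:
        # Prefetch: understand each partial recognition result in advance.
        self.cache[text] = self._understand(text)

    def on_final(self, text: str) -> str:
        # If the final result equals an already-understood partial, reuse it.
        if text in self.cache:
            return self.cache[text]
        return self._understand(text)

nlu = NluServer()
for partial in ["set an", "set an alarm", "set an alarm for seven"]:
    nlu.on_partial(partial)              # sent whenever the sending condition is met
result = nlu.on_final("set an alarm for seven")
print(result, nlu.understand_calls)      # intent(set an alarm for seven) 3
```

When the final result hits the cache, no additional understanding work is done at the end of the utterance, which mirrors the time saving described above.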
For example, when collecting the voice data of the user, suppose the user stops speaking at moment A and collection stops at moment B, where moment B lags behind moment A. The partial speech recognition result obtained by performing speech recognition on the voice data collected up to moment A is then likely to be identical to the final speech recognition result obtained by performing speech recognition on the voice data collected up to moment B.
Actual test results show that the success rate of prefetching can reach 65% to 75%, which saves about 300 to 350 ms.
As mentioned above, after receiving the reply content from the speech recognition server, the client can send the reply content to the speech synthesis server; the speech synthesis server generates the corresponding voice information and returns it to the client, and the client plays the voice information to the user.
The scheme of the present invention further proposes that, after receiving the reply content, the speech recognition server may, instead of sending it to the client, send it directly to the speech synthesis server, obtain the voice information returned by the speech synthesis server, and send the voice information to the client.
In this way, the network communication between the client and the speech synthesis server is replaced by network communication between the speech recognition server and the speech synthesis server, and since server-to-server network communication is faster than client-to-server network communication, the response speed of voice interaction during human-machine dialogue is further improved.
By the above means, three-in-one operation of the speech recognition server, the semantic understanding server, and the speech synthesis server is realized. Fig. 5 is a schematic diagram of the network communication mode between the client, the speech recognition server, the semantic understanding server, and the speech synthesis server according to the present invention. As can be seen, apart from the communication between the client and the speech recognition server, which is client-to-server, all other communication is server-to-server.
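As a further illustrative sketch (function names invented here, not specified by the patent), in the three-in-one arrangement the speech recognition server acts as the single gateway, so the client makes only one network exchange per dialogue turn:

```python
def nlu_server(text: str) -> str:
    return f"reply to: {text}"          # pretend reply content

def tts_server(reply: str) -> bytes:
    return reply.encode("utf-8")        # pretend synthesized audio

def asr_gateway(voice_data: bytes) -> bytes:
    """ASR server as gateway: NLU and TTS are reached server-to-server."""
    text = voice_data.decode("utf-8")   # pretend speech recognition
    reply = nlu_server(text)            # server-to-server hop
    return tts_server(reply)            # server-to-server hop

def client_turn(voice_data: bytes) -> bytes:
    # The only client<->server exchange is with the ASR server.
    return asr_gateway(voice_data)

print(client_turn(b"hello"))  # b'reply to: hello'
```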
Actual test results show that direct network communication between the speech recognition server and the speech synthesis server saves a further 100 to 120 ms.
To make the technical scheme of the present invention clearer, the scheme is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Fig. 6 is the flow chart of interactive implementation method first embodiment of the present invention.As shown in fig. 6, including with Lower specific implementation.
In 601, the client obtains the user's voice data and sends the voice data to the speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to the semantic understanding server for semantic understanding.
That is, the client sends the user's voice data to the speech recognition server for speech recognition, and the speech recognition server sends the recognition result directly to the semantic understanding server, which then performs semantic understanding on the recognition result.
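The flow just described can be sketched with in-memory stubs standing in for the three network hops. All class and method names here are hypothetical, and the string-based "recognition" and intent lookup are toy stand-ins for the real speech recognition and semantic understanding, not the patent's actual implementation:

```python
# Minimal sketch of the Fig. 6 flow: client -> ASR server -> NLU server.
# The key point is that the ASR server forwards its result directly to the
# semantic understanding server (server-to-server), not back to the client.

class SemanticServer:
    def understand(self, text):
        # Toy "semantic understanding": map recognized text to an intent.
        intent = "weather" if "weather" in text else "chat"
        return {"intent": intent, "text": text}

class ASRServer:
    def __init__(self, nlu):
        self.nlu = nlu

    def handle_audio(self, audio):
        text = audio.decode("utf-8")      # stand-in for real speech recognition
        return self.nlu.understand(text)  # direct server-to-server forwarding

class Client:
    def __init__(self, asr):
        self.asr = asr

    def send_voice(self, audio):
        # The client only ever talks to the speech recognition server.
        return self.asr.handle_audio(audio)

nlu = SemanticServer()
client = Client(ASRServer(nlu))
result = client.send_voice(b"what is the weather")
print(result["intent"])  # weather
```

Because the client holds a single connection to the ASR server, the slower client-to-server hop occurs only once per utterance.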
In 602, the client obtains the voice information generated by the speech synthesis server according to the obtained reply content, and broadcasts the voice information to the user; the reply content is generated by the semantic understanding server according to the semantic understanding result.
After obtaining the semantic understanding result, the semantic understanding server can generate, according to the prior art, reply content for the user's voice data, and send the reply content to the speech recognition server.
The speech recognition server can then use either of two processing modes: sending the reply content to the client, or sending it to the speech synthesis server.
If the speech recognition server sends the reply content to the client, the client further sends the reply content to the speech synthesis server and obtains the voice information that the speech synthesis server generates according to the reply content and sends back to the client.
If the speech recognition server sends the reply content to the speech synthesis server, the speech synthesis server, after generating the voice information, returns it to the speech recognition server, and the speech recognition server sends the voice information to the client.
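The two delivery modes can be contrasted in a short sketch. The function and class names are illustrative only, and the byte-string "synthesis" is a placeholder for a real TTS call; both modes produce the same audio, they differ only in which party makes the synthesis request:

```python
# The two routing modes for reply content, with stub objects.

class TTSServer:
    def synthesize(self, reply):
        return ("AUDIO:" + reply).encode()  # stand-in for real speech synthesis

def route_via_client(reply, client_tts_call):
    # Mode 1: the ASR server hands the reply text to the client,
    # and the client calls the speech synthesis server itself.
    return client_tts_call(reply)

def route_via_tts(reply, tts):
    # Mode 2: the ASR server calls the speech synthesis server directly
    # and forwards the resulting audio to the client (server-to-server hop).
    return tts.synthesize(reply)

tts = TTSServer()
print(route_via_client("hello", tts.synthesize) == route_via_tts("hello", tts))  # True
```

Mode 2 is the faster path described in the specification, since the extra round trip runs between servers rather than through the client.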
Fig. 7 is a flowchart of a second embodiment of the interaction implementation method of the present invention. As shown in Fig. 7, the method comprises the following specific implementation.
In 701, the speech recognition server obtains the user's voice data from the client.
In 702, the speech recognition server performs speech recognition on the obtained voice data and sends the speech recognition result to the semantic understanding server for semantic understanding.
Before obtaining the final speech recognition result, each time a sending condition is met, the speech recognition server can send the partial speech recognition result obtained so far to the semantic understanding server, so that the semantic understanding server performs semantic understanding on the received partial result and obtains a semantic understanding result.
Which condition serves as the sending condition can be decided according to actual requirements. For example, as stated above, when performing speech recognition on the voice data, the speech recognition server recognizes the voice stream block by block in a streaming manner; each time recognition of one block is completed, it can send the partial recognition result obtained so far to the semantic understanding server. Here, the partial recognition result refers to all recognition results obtained so far; compared with the final speech recognition result it is usually incomplete, and is therefore called a partial speech recognition result.
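The block-by-block sending condition described above can be sketched as follows. The function names are hypothetical, and joining word blocks stands in for actual incremental decoding; the point is only that the accumulated partial result is pushed after every completed block:

```python
# Streaming recognition with a per-block sending condition: after each block
# is recognized, the partial result accumulated so far is sent to the
# semantic understanding server via the send_partial callback.

def stream_recognize(blocks, send_partial):
    recognized = []
    for block in blocks:
        recognized.append(block)            # stand-in for recognizing one block
        send_partial(" ".join(recognized))  # sending condition met: block done
    return " ".join(recognized)             # the final recognition result

sent = []
final = stream_recognize(["play", "some", "jazz"], sent.append)
print(sent)   # ['play', 'play some', 'play some jazz']
print(final)  # play some jazz
```

Note that the last partial result equals the final result, which is exactly the case the prefetch logic below exploits.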
The semantic understanding server can perform semantic understanding on each received partial speech recognition result and thereby obtain a semantic understanding result.
Afterwards, when the speech recognition server obtains the final speech recognition result, it sends the final result to the semantic understanding server and notifies it, in some agreed manner, that this is the final recognition result. Accordingly, the semantic understanding server can determine whether the final recognition result is contained among the previously received partial recognition results. If so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final recognition result to obtain the final required semantic understanding result.
For example:
Suppose the speech recognition server has sent two partial recognition results to the semantic understanding server, namely partial result a and partial result b. If the final recognition result sent to the semantic understanding server is identical to partial result b, the semantic understanding server can take the semantic understanding result corresponding to partial result b as the final required semantic understanding result. If the final recognition result differs from both partial result a and partial result b, the semantic understanding server needs to perform semantic understanding on the final recognition result to obtain the final required semantic understanding result.
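The reuse logic in this example can be sketched as a cache on the semantic understanding server. `PrefetchNLU` and its method names are hypothetical, and the dictionary returned by `_understand` merely stands in for a real semantic understanding result:

```python
# Prefetch logic on the semantic understanding server: cache the
# understanding result of every partial recognition result, and reuse it
# when the final recognition text matches a cached partial result.

class PrefetchNLU:
    def __init__(self):
        self.cache = {}

    def _understand(self, text):
        return {"intent": "demo", "text": text}  # stand-in for real NLU work

    def on_partial(self, text):
        # Semantic understanding is performed ahead of time ("prefetched").
        self.cache[text] = self._understand(text)

    def on_final(self, text):
        if text in self.cache:       # final result equals an earlier partial
            return self.cache[text]  # reuse: no extra NLU latency at the end
        return self._understand(text)  # otherwise, understand afresh

nlu = PrefetchNLU()
nlu.on_partial("turn on")             # partial result a
nlu.on_partial("turn on the light")   # partial result b
result = nlu.on_final("turn on the light")
print(result["text"])  # turn on the light
```

Since the final utterance often equals the last partial result, the NLU latency is usually hidden entirely inside the recognition phase.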
In 703, the speech recognition server obtains the reply content generated by the semantic understanding server according to the semantic understanding result, and sends the reply content, or the voice information obtained from the reply content, to the client.
The semantic understanding server can generate reply content from the final required semantic understanding result according to the prior art, and send the reply content to the speech recognition server.
The speech recognition server can send the reply content to the client, so that after the client forwards the reply content to the speech synthesis server, the client obtains the voice information that the speech synthesis server generates from the reply content and broadcasts it to the user. Alternatively, the speech recognition server can send the reply content to the speech synthesis server, obtain the returned voice information, and forward the voice information to the client.
Fig. 8 is a flowchart of a third embodiment of the interaction implementation method of the present invention. As shown in Fig. 8, the method comprises the following specific implementation.
In 801, the semantic understanding server obtains the speech recognition result from the speech recognition server and performs semantic understanding according to it; the speech recognition result is obtained by the speech recognition server performing speech recognition on the user's voice data obtained from the client.
That is, the speech recognition server obtains the user's voice data from the client, performs speech recognition on the voice data, and obtains the speech recognition result.
Before obtaining the final speech recognition result, each time the sending condition is met, the speech recognition server can send the partial recognition result obtained so far to the semantic understanding server; accordingly, the semantic understanding server can perform semantic understanding on each received partial recognition result and obtain a semantic understanding result.
Afterwards, when the speech recognition server obtains the final recognition result, it can send the final result to the semantic understanding server and notify it, in some agreed manner, that this is the final recognition result. Accordingly, the semantic understanding server can determine whether the final recognition result is contained among the previously received partial recognition results; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result, and if not, semantic understanding is performed on the final recognition result to obtain the final required semantic understanding result.
In 802, the semantic understanding server generates reply content according to the semantic understanding result and sends the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or the voice information obtained from the reply content, to the client.
The semantic understanding server can generate reply content from the final required semantic understanding result according to the prior art, and send the reply content to the speech recognition server.
Afterwards, the speech recognition server can send the reply content to the client, so that after the client forwards it to the speech synthesis server, the client obtains the voice information generated from the reply content and broadcasts it to the user. Alternatively, the speech recognition server can send the reply content to the speech synthesis server, obtain the returned voice information, and forward the voice information to the client.
Fig. 9 is a flowchart of a fourth embodiment of the interaction implementation method of the present invention. As shown in Fig. 9, the method comprises the following specific implementation.
In 901, the speech synthesis server obtains the reply content sent by the speech recognition server and generated by the semantic understanding server according to the semantic understanding result. The semantic understanding result is obtained by the semantic understanding server performing semantic understanding on the speech recognition result obtained from the speech recognition server, and the speech recognition result is obtained by the speech recognition server performing speech recognition on the user's voice data obtained from the client.
In 902, the speech synthesis server generates voice information according to the reply content, and the voice information is sent to the client via the speech recognition server, so that the client broadcasts the voice information to the user.
After obtaining the reply content from the semantic understanding server, the speech recognition server can send it directly to the speech synthesis server; the speech synthesis server generates the corresponding voice information and sends it to the speech recognition server, which in turn sends the voice information to the client.
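This relay path can be sketched in two functions; both names and the byte-string audio are hypothetical stand-ins, and the sketch only shows the routing, not real synthesis:

```python
# Fig. 9 path: reply content reaches the speech synthesis server via the
# ASR server, and the synthesized audio travels back along the same
# server-to-server path before the ASR server relays it to the client.

def tts_generate(reply):
    return ("AUDIO:" + reply).encode()  # stand-in for speech synthesis

def asr_relay_reply(reply):
    audio = tts_generate(reply)  # server-to-server call to the TTS server
    return audio                 # ASR server forwards the audio to the client

audio = asr_relay_reply("It is sunny today")
print(audio)  # b'AUDIO:It is sunny today'
```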
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should know, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, each embodiment is described with its own emphasis. For parts not detailed in a given embodiment, reference may be made to the related descriptions of the other embodiments.
In short, with the scheme described in each of the above method embodiments, the network communication between the speech recognition server and the semantic understanding server, and between the speech recognition server and the speech synthesis server, replaces the network communication between the client and the semantic understanding server, and between the client and the speech synthesis server. Since server-to-server communication is faster than client-to-server communication, the response speed of voice interaction during a human-machine dialogue is improved. Moreover, the prefetching mode saves the time needed for semantic understanding, further improving the response speed of voice interaction.
The above is the description of the method embodiments. The scheme of the present invention is further explained below through device embodiments.
Fig. 10 is a schematic structural diagram of a first embodiment of the interaction implementation device of the present invention. As shown in Fig. 10, the device comprises a first processing unit 1001 and a second processing unit 1002.
The first processing unit 1001 is configured to obtain the user's voice data and send the voice data to the speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to the semantic understanding server for semantic understanding.
The second processing unit 1002 is configured to obtain the voice information generated by the speech synthesis server according to the obtained reply content, and to broadcast the voice information to the user; the reply content is generated by the semantic understanding server according to the semantic understanding result.
The second processing unit 1002 can obtain the voice information that the speech synthesis server generates according to the reply content and sends back, where the reply content is sent by the semantic understanding server, via the speech recognition server, to the second processing unit 1002, which forwards it to the speech synthesis server.
Alternatively, the second processing unit 1002 can obtain the voice information that the speech synthesis server generates according to the reply content and sends via the speech recognition server, where the reply content is sent by the semantic understanding server, via the speech recognition server, to the speech synthesis server.
Fig. 11 is a schematic structural diagram of a second embodiment of the interaction implementation device of the present invention. As shown in Fig. 11, the device comprises a third processing unit 1101, a fourth processing unit 1102, and a fifth processing unit 1103.
The third processing unit 1101 is configured to obtain the user's voice data from the client.
The fourth processing unit 1102 is configured to perform speech recognition on the voice data and send the speech recognition result to the semantic understanding server for semantic understanding.
The fifth processing unit 1103 is configured to obtain the reply content generated by the semantic understanding server according to the semantic understanding result, and to send the reply content, or the voice information obtained from the reply content, to the client.
Before obtaining the final speech recognition result, each time the sending condition is met, the fourth processing unit 1102 can send the partial recognition result obtained so far to the semantic understanding server, so that the semantic understanding server performs semantic understanding on the received partial result and obtains a semantic understanding result.
When obtaining the final recognition result, the fourth processing unit 1102 can send it to the semantic understanding server, so that the semantic understanding server determines whether the final recognition result is contained among the previously received partial recognition results; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result, and if not, semantic understanding is performed on the final recognition result to obtain the final required semantic understanding result.
The fifth processing unit 1103 can send the reply content to the client, so that after the client forwards it to the speech synthesis server, the client obtains the voice information generated from the reply content and broadcasts it to the user.
Alternatively, the fifth processing unit 1103 can send the reply content to the speech synthesis server and obtain the returned voice information generated according to the reply content; afterwards, the fifth processing unit 1103 can send the voice information to the client, so that the client broadcasts it to the user.
Fig. 12 is a schematic structural diagram of a third embodiment of the interaction implementation device of the present invention. As shown in Fig. 12, the device comprises a sixth processing unit 1201 and a seventh processing unit 1202.
The sixth processing unit 1201 is configured to obtain the speech recognition result from the speech recognition server and perform semantic understanding according to it; the speech recognition result is obtained by the speech recognition server performing speech recognition on the user's voice data obtained from the client.
The seventh processing unit 1202 is configured to generate reply content according to the semantic understanding result and send the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or the voice information obtained from the reply content, to the client.
The sixth processing unit 1201 can perform semantic understanding on each received partial recognition result and obtain a semantic understanding result, where a partial recognition result is the recognition result obtained so far, sent by the speech recognition server each time the sending condition is met before the final recognition result is obtained.
The sixth processing unit 1201 can also obtain the final recognition result from the speech recognition server and determine whether it is contained among the previously received partial recognition results; if so, the semantic understanding result previously obtained for that partial result is taken as the final required semantic understanding result, and if not, semantic understanding is performed on the final recognition result to obtain the final required semantic understanding result.
Fig. 13 is a schematic structural diagram of a fourth embodiment of the interaction implementation device of the present invention. As shown in Fig. 13, the device comprises an eighth processing unit 1301 and a ninth processing unit 1302.
The eighth processing unit 1301 is configured to obtain the reply content sent by the speech recognition server and generated by the semantic understanding server according to the semantic understanding result; the semantic understanding result is obtained by the semantic understanding server performing semantic understanding on the speech recognition result obtained from the speech recognition server, and the speech recognition result is obtained by the speech recognition server performing speech recognition on the user's voice data obtained from the client.
The ninth processing unit 1302 is configured to generate voice information according to the reply content and send the voice information to the client via the speech recognition server, so that the client broadcasts the voice information to the user.
For the specific workflow of each of the above device embodiments, refer to the related descriptions in the foregoing method embodiments; it is not repeated here.
In short, with the scheme described in each of the above device embodiments, the network communication between the speech recognition server and the semantic understanding server, and between the speech recognition server and the speech synthesis server, replaces the network communication between the client and the semantic understanding server, and between the client and the speech synthesis server. Since server-to-server communication is faster than client-to-server communication, the response speed of voice interaction during a human-machine dialogue is improved. Moreover, the prefetching mode saves the time needed for semantic understanding, further improving the response speed of voice interaction.
Fig. 14 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 14 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 14, the computer system/server 12 takes the form of a general-purpose computing device. Its components can include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the various system components (including the memory 28 and the processors 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically comprises a variety of computer-system-readable media. These media can be any usable media accessible by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 can include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 14, commonly referred to as a "hard disk drive"). Although not shown in Fig. 14, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media), can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 can include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 can be stored in, for example, the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods in the embodiments described in the present invention.
The computer system/server 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, and the like), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card or a modem) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interfaces 22. Moreover, the computer system/server 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 14, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the drawings, other hardware and/or software modules can be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16 executes the programs stored in the memory 28, thereby performing various functions, applications, and data processing, for example implementing the methods in the embodiments illustrated in Fig. 6, 7, 8, or 9.
The present invention also discloses a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the methods in the embodiments illustrated in Fig. 6, 7, 8, or 9.
Any combination of one or more computer-readable media can be used. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the device embodiments described above are only schematic; the division of the units is only a logical functional division, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they can be located in one place, or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment's scheme.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.
The above integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The aforementioned storage media include: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (24)

  1. An interaction implementation method, characterized by comprising:
    the client obtaining the user's voice data and sending the voice data to the speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to the semantic understanding server for semantic understanding;
    the client obtaining the voice information generated by the speech synthesis server according to the obtained reply content, and broadcasting the voice information to the user, the reply content being generated by the semantic understanding server according to the semantic understanding result.
  2. The method according to claim 1, characterized in that
    the client obtaining the voice information generated by the speech synthesis server according to the obtained reply content comprises:
    the client obtaining the voice information that the speech synthesis server generates according to the reply content and sends to the client, the reply content being sent by the semantic understanding server, via the speech recognition server, to the client, and forwarded by the client to the speech synthesis server;
    or,
    the client obtaining the voice information that the speech synthesis server generates according to the reply content and sends to the client via the speech recognition server, the reply content being sent by the semantic understanding server, via the speech recognition server, to the speech synthesis server.
  3. An interaction implementation method, characterized by comprising:
    the speech recognition server obtaining the user's voice data from the client;
    the speech recognition server performing speech recognition on the voice data and sending the speech recognition result to the semantic understanding server for semantic understanding;
    the speech recognition server obtaining the reply content generated by the semantic understanding server according to the semantic understanding result, and sending the reply content, or the voice information obtained according to the reply content, to the client.
  4. The method according to claim 3, characterized in that
    the speech recognition server performing speech recognition on the voice data and sending the speech recognition result to the semantic understanding server for semantic understanding comprises:
    before the final speech recognition result is obtained, each time a sending condition is met, the speech recognition server sends the currently obtained partial speech recognition result to the semantic understanding server, so that the semantic understanding server performs semantic understanding according to the obtained partial speech recognition result to obtain a semantic understanding result;
    when the final speech recognition result is obtained, the speech recognition server sends the final speech recognition result to the semantic understanding server, so that the semantic understanding server determines whether the final speech recognition result is included among the previously obtained partial speech recognition results; if so, the semantic understanding result previously obtained for the final speech recognition result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
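A minimal sketch, not from the patent itself, of the incremental sending described in claim 4: partial recognition results are pushed whenever a sending condition is met (here, arbitrarily, every two newly recognized words), and the final result is pushed once at the end. The function name and the specific condition are assumptions for illustration:

```python
# Hypothetical sketch of claim 4's sending loop on the recognition server.
# Partial results stream to the semantic understanding server early, so
# understanding can proceed in parallel with recognition.

def stream_recognition(words, send_every=2):
    """Yield (is_final, text) pairs as recognition progresses."""
    partial = []
    for i, w in enumerate(words, 1):
        partial.append(w)
        # Example sending condition: every `send_every` new words, and only
        # while recognition is still in progress.
        if i % send_every == 0 and i < len(words):
            yield (False, " ".join(partial))   # partial result, sent early
    yield (True, " ".join(words))              # final result, sent once

sent = list(stream_recognition(["play", "some", "jazz", "music"]))
print(sent)
```

The payoff is latency: by the time the final result arrives, the understanding server has often already processed text identical to it, so the final answer can be returned without repeating the semantic understanding step.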
  5. The method according to claim 3, characterized in that
    sending the reply content, or the voice information obtained according to the reply content, to the client comprises:
    the speech recognition server sends the reply content to the client, so that the client, after sending the reply content to a speech synthesis server, acquires the voice information generated according to the reply content and returned by the speech synthesis server, and broadcasts the voice information to the user;
    or,
    the speech recognition server sends the reply content to the speech synthesis server, and acquires the voice information generated according to the reply content and returned by the speech synthesis server;
    the speech recognition server sends the voice information to the client, so that the client broadcasts the voice information to the user.
  6. A man-machine conversation implementation method, characterized in that it comprises:
    a semantic understanding server acquires a speech recognition result from a speech recognition server and performs semantic understanding according to the speech recognition result, the speech recognition result being obtained by the speech recognition server by performing speech recognition on voice data of a user acquired from a client;
    the semantic understanding server generates reply content according to the semantic understanding result and sends the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or voice information obtained according to the reply content, to the client.
  7. The method according to claim 6, characterized in that
    the semantic understanding server acquiring the speech recognition result from the speech recognition server and performing semantic understanding according to the speech recognition result comprises:
    the semantic understanding server performs semantic understanding on each obtained partial speech recognition result to obtain a semantic understanding result, the partial speech recognition results being the currently obtained results sent by the speech recognition server each time a sending condition is met, before the final speech recognition result is obtained;
    the semantic understanding server acquires the final speech recognition result from the speech recognition server and determines whether the final speech recognition result is included among the previously obtained partial speech recognition results; if so, the semantic understanding result previously obtained for the final speech recognition result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
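The matching step on the understanding side (claim 7) amounts to caching the understanding result of each partial text and doing a lookup when the final text arrives. The sketch below is illustrative only; `understand` is a trivial stand-in for real NLU, and all names are assumptions:

```python
# Hypothetical sketch of claim 7: cache semantic results for partial
# recognition texts; on the final text, reuse a cached result if one of
# the partials was identical, otherwise run understanding once more.

def understand(text):
    # Stand-in for real semantic understanding (intent classification etc.).
    return {"intent": "weather" if "weather" in text else "other",
            "text": text}

class SemanticUnderstandingServer:
    def __init__(self):
        self.cache = {}              # partial text -> semantic result

    def on_partial(self, text):
        # Called for each partial result pushed by the recognition server.
        self.cache[text] = understand(text)

    def on_final(self, text):
        # Cache hit: the final text matched an earlier partial, so the
        # already-computed result is the final required one. Cache miss:
        # understand the final text from scratch.
        return self.cache.get(text) or understand(text)

nlu = SemanticUnderstandingServer()
nlu.on_partial("what is the weather")
result = nlu.on_final("what is the weather")   # served from the cache
print(result["intent"])
```

Since the last partial result frequently equals the final result, the cache hit path removes one full understanding pass from the critical response path.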
  8. A man-machine conversation implementation method, characterized in that it comprises:
    a speech synthesis server acquires reply content sent by a speech recognition server and generated by a semantic understanding server according to a semantic understanding result; the semantic understanding result is obtained by the semantic understanding server by performing semantic understanding on a speech recognition result acquired from the speech recognition server, and the speech recognition result is obtained by the speech recognition server by performing speech recognition on voice data of a user acquired from a client;
    the speech synthesis server generates voice information according to the reply content and sends the voice information to the client via the speech recognition server, so that the client broadcasts the voice information to the user.
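Putting claims 3, 6 and 8 together, the speech recognition server acts as the hub of the whole round trip. The end-to-end sketch below is a hypothetical illustration with toy stand-ins for all three processing stages, not the patented implementation:

```python
# Hypothetical end-to-end sketch: the speech recognition server relays
# text to the NLU server, reply content to the TTS server, and voice
# information back to the client.

def recognize(audio):        # stand-in ASR: strip the fake audio wrapper
    return audio.replace("<pcm:", "").rstrip(">")

def understand(text):        # stand-in NLU: produce reply content
    return {"reply": f"Echo: {text}"}

def synthesize(reply_text):  # stand-in TTS: wrap text as fake audio
    return f"<audio:{reply_text}>"

def recognition_server_handle(audio_from_client):
    text = recognize(audio_from_client)      # speech recognition
    reply = understand(text)["reply"]        # via semantic understanding server
    voice = synthesize(reply)                # via speech synthesis server
    return voice                             # relayed back to the client

print(recognition_server_handle("<pcm:hello>"))  # -> "<audio:Echo: hello>"
```

Centralizing the relaying in one server keeps the client's protocol to a single connection, which is the architectural point the device claims below restate unit by unit.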
  9. A man-machine conversation implementation apparatus, characterized in that it comprises: a first processing unit and a second processing unit;
    the first processing unit is configured to acquire voice data of a user and send the voice data to a speech recognition server, so that the speech recognition server performs speech recognition on the voice data and sends the speech recognition result to a semantic understanding server for semantic understanding;
    the second processing unit is configured to acquire voice information generated by a speech synthesis server according to acquired reply content and broadcast the voice information to the user, the reply content being generated by the semantic understanding server according to the semantic understanding result.
  10. The apparatus according to claim 9, characterized in that
    the second processing unit acquires the voice information generated by the speech synthesis server according to the reply content and sent to it, the reply content being sent by the semantic understanding server to the second processing unit via the speech recognition server, and then sent by the second processing unit to the speech synthesis server;
    or,
    the second processing unit acquires the voice information generated by the speech synthesis server according to the reply content and sent via the speech recognition server, the reply content being sent by the semantic understanding server to the speech synthesis server via the speech recognition server.
  11. A man-machine conversation implementation apparatus, characterized in that it comprises: a third processing unit, a fourth processing unit and a fifth processing unit;
    the third processing unit is configured to acquire voice data of a user from a client;
    the fourth processing unit is configured to perform speech recognition on the voice data and send the speech recognition result to a semantic understanding server for semantic understanding;
    the fifth processing unit is configured to acquire reply content generated by the semantic understanding server according to the semantic understanding result, and send the reply content, or voice information obtained according to the reply content, to the client.
  12. The apparatus according to claim 11, characterized in that
    before the final speech recognition result is obtained, each time a sending condition is met, the fourth processing unit sends the currently obtained partial speech recognition result to the semantic understanding server, so that the semantic understanding server performs semantic understanding according to the obtained partial speech recognition result to obtain a semantic understanding result;
    when the final speech recognition result is obtained, the fourth processing unit sends the final speech recognition result to the semantic understanding server, so that the semantic understanding server determines whether the final speech recognition result is included among the previously obtained partial speech recognition results; if so, the semantic understanding result previously obtained for the final speech recognition result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
  13. The apparatus according to claim 11, characterized in that
    the fifth processing unit sends the reply content to the client, so that the client, after sending the reply content to a speech synthesis server, acquires the voice information generated according to the reply content and returned by the speech synthesis server, and broadcasts the voice information to the user;
    or,
    the fifth processing unit sends the reply content to the speech synthesis server, and acquires the voice information generated according to the reply content and returned by the speech synthesis server;
    the fifth processing unit sends the voice information to the client, so that the client broadcasts the voice information to the user.
  14. A man-machine conversation implementation apparatus, characterized in that it comprises: a sixth processing unit and a seventh processing unit;
    the sixth processing unit is configured to acquire a speech recognition result from a speech recognition server and perform semantic understanding according to the speech recognition result, the speech recognition result being obtained by the speech recognition server by performing speech recognition on voice data of a user acquired from a client;
    the seventh processing unit is configured to generate reply content according to the semantic understanding result and send the reply content to the speech recognition server, so that the speech recognition server sends the reply content, or voice information obtained according to the reply content, to the client.
  15. The apparatus according to claim 14, characterized in that
    the sixth processing unit performs semantic understanding on each obtained partial speech recognition result to obtain a semantic understanding result, the partial speech recognition results being the currently obtained results sent by the speech recognition server each time a sending condition is met, before the final speech recognition result is obtained;
    the sixth processing unit acquires the final speech recognition result from the speech recognition server and determines whether the final speech recognition result is included among the previously obtained partial speech recognition results; if so, the semantic understanding result previously obtained for the final speech recognition result is taken as the final required semantic understanding result; if not, semantic understanding is performed on the final speech recognition result to obtain the final required semantic understanding result.
  16. A man-machine conversation implementation apparatus, characterized in that it comprises: an eighth processing unit and a ninth processing unit;
    the eighth processing unit is configured to acquire reply content sent by a speech recognition server and generated by a semantic understanding server according to a semantic understanding result; the semantic understanding result is obtained by the semantic understanding server by performing semantic understanding on a speech recognition result acquired from the speech recognition server, and the speech recognition result is obtained by the speech recognition server by performing speech recognition on voice data of a user acquired from a client;
    the ninth processing unit is configured to generate voice information according to the reply content and send the voice information to the client via the speech recognition server, so that the client broadcasts the voice information to the user.
  17. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 2.
  18. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 2.
  19. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 3 to 5.
  20. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 3 to 5.
  21. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 6 to 7.
  22. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 6 to 7.
  23. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to claim 8.
  24. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to claim 8.
CN201711008491.4A 2017-10-25 2017-10-25 Method, device, equipment and storage medium for implementing man-machine conversation Active CN107943834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711008491.4A CN107943834B (en) 2017-10-25 2017-10-25 Method, device, equipment and storage medium for implementing man-machine conversation


Publications (2)

Publication Number Publication Date
CN107943834A true CN107943834A (en) 2018-04-20
CN107943834B CN107943834B (en) 2021-06-11

Family

ID=61936489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711008491.4A Active CN107943834B (en) 2017-10-25 2017-10-25 Method, device, equipment and storage medium for implementing man-machine conversation

Country Status (1)

Country Link
CN (1) CN107943834B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002073080A (en) * 2000-09-01 2002-03-12 Fujitsu Ten Ltd Voice interactive system
US20080010058A1 (en) * 2006-07-07 2008-01-10 Robert Bosch Corporation Method and apparatus for recognizing large list of proper names in spoken dialog systems
CN103000052A (en) * 2011-09-16 2013-03-27 上海先先信息科技有限公司 Man-machine interactive spoken dialogue system and realizing method thereof
CN104679472A (en) * 2015-02-13 2015-06-03 百度在线网络技术(北京)有限公司 Man-machine voice interactive method and device
CN107016070A (en) * 2017-03-22 2017-08-04 北京光年无限科技有限公司 A kind of interactive method and device for intelligent robot
CN107170446A (en) * 2017-05-19 2017-09-15 深圳市优必选科技有限公司 Semantic processes server and the method for semantic processes


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Suryannarayana Chandaka: "Support vector machines employing cross-correlation for emotional speech recognition", Measurement *
Pang Shuning: "Research on new human-computer interaction technologies for mobile intelligent terminals", Information and Communications Technology and Policy *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109637519A (en) * 2018-11-13 2019-04-16 百度在线网络技术(北京)有限公司 Interactive voice implementation method, device, computer equipment and storage medium
CN109637519B (en) * 2018-11-13 2020-01-21 百度在线网络技术(北京)有限公司 Voice interaction implementation method and device, computer equipment and storage medium
CN111524508A (en) * 2019-02-03 2020-08-11 上海蔚来汽车有限公司 Voice conversation system and voice conversation implementation method
CN110223694A (en) * 2019-06-26 2019-09-10 百度在线网络技术(北京)有限公司 Method of speech processing, system and device
CN110223694B (en) * 2019-06-26 2021-10-15 百度在线网络技术(北京)有限公司 Voice processing method, system and device
CN113823282A (en) * 2019-06-26 2021-12-21 百度在线网络技术(北京)有限公司 Voice processing method, system and device
CN111899732A (en) * 2020-06-17 2020-11-06 北京百度网讯科技有限公司 Voice input method and device and electronic equipment
CN111883120A (en) * 2020-07-15 2020-11-03 百度在线网络技术(北京)有限公司 Earphone electric quantity prompting method and device, electronic equipment and storage medium
CN112700769A (en) * 2020-12-26 2021-04-23 科大讯飞股份有限公司 Semantic understanding method, device, equipment and computer readable storage medium
CN114822540A (en) * 2022-06-29 2022-07-29 广州小鹏汽车科技有限公司 Vehicle voice interaction method, server and storage medium

Also Published As

Publication number Publication date
CN107943834B (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN107943834A (en) Interactive implementation method, device, equipment and storage medium
WO2020182153A1 (en) Method for performing speech recognition based on self-adaptive language, and related apparatus
KR102535338B1 (en) Speaker diarization using speaker embedding(s) and trained generative model
CN110223705A Voice conversion method, device, equipment and readable storage medium
CN109637519A (en) Interactive voice implementation method, device, computer equipment and storage medium
CN108428446A (en) Audio recognition method and device
CN109147810A Method, apparatus, equipment and computer storage medium for establishing a speech enhancement network
CN108269567A Method, apparatus, computing device and computer-readable storage medium for generating far-field voice data
CN109313666A (en) Computer proxy message robot
CN109446907A Video chat method, apparatus, equipment and computer storage medium
CN108491421A Method, apparatus, equipment and computer storage medium for generating questions and answers
WO2022170848A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
CN109545193A (en) Method and apparatus for generating model
CN110704618B (en) Method and device for determining standard problem corresponding to dialogue data
Pandey et al. Liptype: A silent speech recognizer augmented with an independent repair model
CN109599095A Voice data labeling method, device, equipment and computer storage medium
EP4148727A1 (en) Speech recognition and codec method and apparatus, electronic device and storage medium
CN109697978B (en) Method and apparatus for generating a model
CN108564944A (en) Intelligent control method, system, equipment and storage medium
CN109558605A (en) Method and apparatus for translating sentence
CN115050354B (en) Digital human driving method and device
CN109859747A (en) Voice interactive method, equipment and storage medium
CN113035180A (en) Voice input integrity judgment method and device, electronic equipment and storage medium
CN116612541A (en) Multi-mode emotion recognition method, device and storage medium
CN113948090B (en) Voice detection method, session recording product and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210510

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Applicant after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant