CN108920125A - Method and apparatus for determining a speech recognition result - Google Patents
- Publication number
- CN108920125A (application number CN201810290479.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- response message
- voice
- current response
- speech recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
Abstract
The object of the present invention is to provide a method and apparatus for determining a speech recognition result. Specifically, a natural language instruction input by a user is obtained, and the natural language instruction is parsed according to the current response message of the intelligent terminal that receives it, to obtain the corresponding speech recognition result. Compared with the prior art, the present invention improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a technique for determining a speech recognition result.
Background art
Put simply, speech recognition technology lets a machine convert a voice signal into the corresponding text through a process of recognition and understanding. It has already appeared in fields such as household appliances, automotive electronics and consumer electronics, and greatly facilitates people's interaction with devices.
Existing speech recognition technology usually considers only the voice signal itself during recognition, and then performs recognition and optimization in combination with a globally trained language model. However, when a user provides input by voice, the content to be expressed usually depends on factors such as what the user currently sees and the response messages the user has obtained during the voice interaction. For example, the user asks which banks are nearby and then says how to get to xxx; the "xxx" here is very likely related to the response message the user most recently obtained (such as the names of some nearby banks), or to the response messages corresponding to a series of consecutive commands issued before the natural language instruction, and the content of such factors changes dynamically. Clearly, existing speech recognition technology cannot use these dynamic factors, which vary with the application, the service provider, the service object, and time and place, to perform speech recognition. This reduces the efficiency and accuracy of speech recognition and also degrades the user's voice interaction experience.
Summary of the invention
It is an object of the present invention to provide a method and apparatus for determining a speech recognition result.
According to one embodiment of the present invention, a method for determining a speech recognition result is provided, wherein the method comprises the following steps:
A. obtaining a natural language instruction input by a user;
B. parsing the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.
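Steps A and B can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the recognizer yields several candidate transcriptions and that the current response message is available as a list of displayed entries; the function names and the use of `difflib` character similarity as a stand-in for a phonetic comparison are assumptions, not part of the described method.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a phonetic comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def obtain_instruction(recognizer_output: List[str]) -> List[str]:
    """Step A: obtain the user's instruction as a list of candidate transcriptions."""
    return [text.strip() for text in recognizer_output if text.strip()]


def parse_instruction(candidates: List[str], current_response: List[str]) -> str:
    """Step B: choose the candidate closest to any entry of the current response message."""
    def best_match(candidate: str) -> float:
        return max((similarity(candidate, entry) for entry in current_response), default=0.0)
    return max(candidates, key=best_match)


# Hypothetical candidates; the displayed entries reuse restaurant names from the description.
candidates = obtain_instruction(["cold million ", "gold million", "gold millions"])
result = parse_instruction(candidates, ["gold million", "Yoshinoya"])
```

In this sketch the candidate that exactly matches a displayed entry wins; a real system would presumably combine such a context score with the recognizer's own acoustic and language-model scores.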
According to another embodiment of the present invention, a determining device for determining a speech recognition result is also provided, wherein the device comprises:
an acquisition device for obtaining a natural language instruction input by a user;
a parsing device for parsing the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.
According to yet another embodiment of the present invention, a computing device is also provided, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for determining a speech recognition result according to an embodiment of the present invention as described above.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method for determining a speech recognition result according to an embodiment of the present invention as described above.
Compared with the prior art, one embodiment of the present invention parses the natural language instruction input by the user according to the current response message of the intelligent terminal that receives it, to obtain the corresponding speech recognition result. This improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of a determining device for determining a speech recognition result according to one aspect of the present invention;
Fig. 2 shows a schematic diagram of the current response message of the intelligent terminal that receives the natural language instruction input by the user, according to one embodiment of the present invention;
Fig. 3 shows a schematic diagram of the current response message of the intelligent terminal that receives the natural language instruction input by the user, according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of the current response message of the intelligent terminal that receives the natural language instruction input by the user, according to yet another embodiment of the present invention;
Fig. 5 shows a flowchart of a method for determining a speech recognition result according to another aspect of the present invention;
Fig. 6 shows a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a determining device 1 for determining a speech recognition result according to one aspect of the present invention, wherein determining device 1 comprises an acquisition device 11 and a parsing device 12. Specifically, acquisition device 11 obtains a natural language instruction input by a user; parsing device 12 parses the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.
Here, determining device 1 refers to a device that can parse a natural language instruction input by a user according to the current response message of the intelligent terminal that receives the instruction, to obtain the corresponding speech recognition result. In specific embodiments, determining device 1 may be implemented by an intelligent terminal; it may be implemented by a device formed by integrating a network device with an intelligent terminal over a network (that is, by the intelligent terminal and the network device working together); it may be contained in an intelligent terminal as a software module and/or hardware module; or it may be a hardware device connected to the intelligent terminal by wired or wireless means.

Here, the network device includes but is not limited to a network host, a single network server, a set of multiple network servers, or a set of computers based on cloud computing. The cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers. The intelligent terminal may be any electronic product that can carry out human-computer interaction with the user through one or more of a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device, for example a PC, mobile phone, smartphone, PDA, wearable device, palmtop computer (PPC), tablet computer, in-vehicle smart device, smart TV, or smart speaker. In practical applications, when determining device 1 is an intelligent terminal, a client (which may take the form of an app) may be installed on it that can recognize, parse, understand, process and respond to the user's natural language instructions and output the response results. Alternatively, the client may only be able to perform speech recognition on the natural language instruction input by the user, and may need a corresponding server to parse, understand, process and respond to the instruction and return the response result to the client for output. The network includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, or a wireless ad hoc network.

Those skilled in the art will understand that the determining device 1 described above is only an example; other network devices or intelligent terminals, whether existing now or appearing in the future, that are applicable to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference. Here, the network device and the intelligent terminal each include an electronic device that can automatically perform numerical calculation and information processing according to preset or stored instructions, and whose hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or an embedded device.
In one embodiment, if determining device 1 is the user's intelligent terminal, determining device 1 first obtains the natural language instruction input by the user through an application programming interface (API) that it provides itself, or through an API provided by a sound pickup device. Determining device 1 then parses the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.
In another embodiment, if determining device 1 is a device in which a network device and an intelligent terminal are integrated, that is, determining device 1 is implemented by the intelligent terminal and the network device working together, the intelligent terminal first obtains the natural language instruction input by the user through an API that it provides itself, or through an API provided by a sound pickup device. The intelligent terminal then sends the natural language instruction to the network device, and also sends the network device the current response message of the intelligent terminal that receives the natural language instruction. The network device then parses the natural language instruction according to the current response message, to obtain the corresponding speech recognition result.
In yet another embodiment, if determining device 1 is a device in which a network device and an intelligent terminal are integrated, that is, determining device 1 is implemented by the intelligent terminal and the network device working together, the intelligent terminal first obtains the natural language instruction input by the user through an API that it provides itself, or through an API provided by a sound pickup device. The intelligent terminal then sends the natural language instruction to the network device. The network device then parses the natural language instruction according to the instruction itself and the current response message, which the network device has obtained, of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.
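The embodiment in which the intelligent terminal sends both the instruction and its current response message to the network device can be sketched roughly as follows. The JSON payload layout, the field names and the exact-match rule are illustrative assumptions; the description does not specify a wire format or a matching algorithm.

```python
import json
from typing import List


def build_request(candidates: List[str], current_response: List[str]) -> str:
    """Terminal side: bundle the instruction candidates with what the terminal currently shows."""
    return json.dumps({"candidates": candidates, "current_response": current_response})


def handle_request(payload: str) -> str:
    """Network-device side: prefer a candidate that appears in the current response message."""
    request = json.loads(payload)
    shown = {entry.lower() for entry in request["current_response"]}
    for candidate in request["candidates"]:
        if candidate.lower() in shown:
            return candidate
    # No candidate matches the displayed content: fall back to the recognizer's first choice.
    return request["candidates"][0]


payload = build_request(
    ["emitting dish a little", "U ancient cooking vessel emits dish"],
    ["U ancient cooking vessel emits dish", "gold million"],
)
recognized = handle_request(payload)
```

In the other integrated embodiment, the terminal would send only the candidates and the network device would obtain the current response message by its own means before applying the same kind of check.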
Specifically, acquisition device 11 obtains the natural language instruction input by the user through an API provided by the intelligent terminal itself, or through an API provided by a third-party device such as a sound pickup device.
For example, suppose user A plans to order take-away and turns on an intelligent terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process and respond to the user's natural language instructions and output the response results. User A says "take-away", and the corresponding response message recommends take-away restaurants as shown in Fig. 2. If user A wants to order from the restaurant "U ancient cooking vessel emits dish" and therefore says "U ancient cooking vessel emits dish", acquisition device 11 first obtains the natural language instruction "U ancient cooking vessel emits dish" input by user A through the API provided by the smart TV itself.
Then, parsing device 12 first obtains the current response message of the intelligent terminal that receives the natural language instruction. It may do so, for example, through an API provided by the intelligent terminal; by sending a query request for the current response message to the server corresponding to the intelligent terminal (such as a speech recognition server) and receiving the current response message that the server returns based on that request; by receiving the current response message actively uploaded by the intelligent terminal; or through a speech recognition service provider. Parsing device 12 then parses the natural language instruction according to the current response message, to obtain the corresponding speech recognition result. Here, the current response message may be the content displayed in the current user interface of the intelligent terminal (that is, the content the user can see in that user interface). It may also be content that is not fully displayed in the user interface because of factors such as the limited screen size of the intelligent terminal or the front-end UI design, but that the user can bring into the user interface through at least one operation such as turning the page, viewing more details or a detailed introduction, or viewing the ancillary information associated with the current response message (for example, the current response message comprises multiple pages and the content in question is on a page other than the first). The response mode of the current response message includes at least any one of the following:
the current response message is a response to the user's previous interactive instruction;
the current response message is not a response to the user's previous interactive instruction.
It should be noted here that the current response message may be produced under any of these response modes; the following is given only by way of example and should not be construed as limiting the present invention.
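The alternative ways of obtaining the current response message (terminal API, server query, active upload by the terminal, service provider) could be arranged as interchangeable sources. The sketch below tries them in order and skips any that fail; that fallback ordering, the `ResponseSource` type and the error handling are assumptions added for illustration, since the description only lists the options as alternatives.

```python
from typing import Callable, List, Optional

# A source returns the displayed entries, or None when it has nothing to offer.
ResponseSource = Callable[[], Optional[List[str]]]


def get_current_response(sources: List[ResponseSource]) -> List[str]:
    """Return the first non-empty current response message offered by any source."""
    for source in sources:
        try:
            entries = source()
        except ConnectionError:
            # An unreachable source (for example a server query) is skipped.
            continue
        if entries:
            return entries
    return []
```

Each source would wrap one of the mechanisms named above; an empty result means no context is available and recognition would have to rely on the voice stream alone.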
For example, suppose the current response message is a response to the user's previous interactive instruction, where the previous interactive instruction may be one issued by the user through one or more of a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device. For the natural language instruction "U ancient cooking vessel emits dish" input by user A, parsing device 12 parses the instruction according to the current response message of the smart TV that receives it, namely the content displayed by the smart TV as shown in Fig. 2. That is, based on the voice stream of what the user said, "U ancient cooking vessel emits dish", combined with the current response message (the content shown in Fig. 2) produced in response to user A's previous interactive instruction (the natural language instruction "take-away"), it can accurately determine that the user means "U ancient cooking vessel emits dish" rather than "emitting dish a little" or some other similar-sounding utterance.
As another example, suppose the current response message is not content produced in response to the user's previous interactive instruction. For instance, the current response message may be the start-up home page shown in the current user interface of the intelligent terminal, or content displayed in response to another user's interactive instruction; here, the start-up home page may be a default page or the page that was showing at the last shutdown. Suppose user B, a family member of user A, plans to order take-away and turns on an intelligent terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process and respond to the user's natural language instructions and output the response results. User B says "take-away", and the corresponding response message recommends take-away restaurants as shown in Fig. 3. User B then becomes busy and asks user A to carry on with the order. Since user A wants to order from the restaurant "one spoon will", user A says "what dish will be one spoon will have". Parsing device 12 then parses the natural language instruction "what dish will be one spoon will have" according to the current response message of the smart TV that receives it, namely the content displayed by the smart TV as shown in Fig. 3. That is, based on the voice stream of what the user said, "what dish will be one spoon will have", combined with the content shown in Fig. 3, it can accurately determine that the user means "what dish will be one spoon will have" rather than "what dish one spoon of sauce has", "one spoon soy sauce what dish", or some other similar-sounding utterance.
Optionally, the current response message is the response content corresponding to each interactive instruction among multiple consecutive interactive instructions that the user executed before executing the natural language instruction.
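One way to picture a current response message that spans several consecutive interactive instructions is a small rolling history of response contents. The class below is a sketch under assumptions: the name `ResponseHistory`, the five-turn limit and the de-duplication rule are all hypothetical, and "beef bowl" is a made-up dish used only to populate the example.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class ResponseHistory:
    """Keeps the response content of the most recent interactive instructions."""

    def __init__(self, max_turns: int = 5) -> None:
        self._turns: Deque[Tuple[str, List[str]]] = deque(maxlen=max_turns)

    def record(self, instruction: str, response_entries: Iterable[str]) -> None:
        """Remember what was shown in response to one interactive instruction."""
        self._turns.append((instruction, list(response_entries)))

    def all_entries(self) -> List[str]:
        """Entries from every remembered turn, oldest first, without duplicates."""
        seen: List[str] = []
        for _, entries in self._turns:
            for entry in entries:
                if entry not in seen:
                    seen.append(entry)
        return seen


history = ResponseHistory()
history.record("take-away", ["one spoon will", "Yoshinoya"])
history.record("the 4th", ["beef bowl"])  # "beef bowl" is a made-up dish
context = history.all_entries()
```

With such a history, a restaurant name shown two turns earlier remains available as context even after the screen has moved on to another restaurant's dishes.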
For example, with the response message shown in Fig. 3, suppose user A says "the 4th" and, after seeing the dishes of the fourth restaurant, "Yoshinoya", in the current user interface of the smart TV, finds nothing appealing. Because user A remembers that the restaurant "one spoon will" appeared in the response message shown in Fig. 3, user A then says "what dish will be one spoon will have". Parsing device 12 then parses the natural language instruction "what dish will be one spoon will have" according to the current response message of the smart TV that receives it, namely the response content corresponding to each of the multiple consecutive interactive instructions that user A executed before that natural language instruction, such as the content displayed by the smart TV as shown in Fig. 3. That is, based on the voice stream of what the user said, "what dish will be one spoon will have", combined with the content shown in Fig. 3, it can accurately determine that the user means "what dish will be one spoon will have" rather than "what dish one spoon of sauce has", "one spoon soy sauce what dish", or some other similar-sounding utterance.
Optionally, if the current response message is a response to the user's previous interactive instruction, the current response message is either displayed in the user interface of the intelligent terminal, or comprises multiple pages and is on a page other than the first of those pages.
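The distinction between what is visible and what sits on other pages can be sketched with a small data structure. The `PagedResponse` class, the `pick` helper and the placeholder title "Film A" are hypothetical, introduced only to show that matching may draw on every page rather than on the visible one alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PagedResponse:
    """A current response message split across pages; only one page is on screen."""
    pages: List[List[str]]
    visible_page: int = 0

    def visible_entries(self) -> List[str]:
        return list(self.pages[self.visible_page])

    def all_entries(self) -> List[str]:
        return [entry for page in self.pages for entry in page]


def pick(candidates: List[str], response: PagedResponse) -> str:
    """Prefer a candidate found on any page, visible or not; else keep the first candidate."""
    known = {entry.lower() for entry in response.all_entries()}
    for candidate in candidates:
        if candidate.lower() in known:
            return candidate
    return candidates[0]


# "Film A" is a placeholder title for the first page.
response = PagedResponse(pages=[["Film A"], ["Old beast"]])
chosen = pick(["old thin film", "old beast"], response)
```

Here the second page is never displayed, yet its entry still decides between the two candidates.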
For example, suppose user A plans to order film tickets and turns on an intelligent terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process and respond to the user's natural language instructions and output the response results. User A says "film ticket", and because the corresponding response message contains a lot of content, it is displayed across several pages. What user A sees in the smart TV's user interface is the first page, as shown in Fig. 4(a); the other pages, as shown in Fig. 4(b), become visible in the user interface only after the user issues at least, for example, a page-turning instruction. If the user wants to order a ticket for the film 《Old beast》 and says "film for having old beast", parsing device 12 parses the natural language instruction "film for having old beast" according to the current response message of the smart TV that receives it, namely the content displayed by the smart TV as shown in Fig. 4(a)-(b), and can accurately determine that the user means "film for having old beast", even though the content shown in Fig. 4(b) is not displayed in the current user interface. If recognition were based purely on the voice stream of what the user said, the system could misrecognize it, for example as "having old thin film" or "film for having veteran".
Those skilled in the art will understand that the current response message described above is only an example; other current response messages, whether existing now or appearing in the future, that are applicable to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the response modes of the current response message described above are only examples; other response modes of the current response message, whether existing now or appearing in the future, that are applicable to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention parses the natural language instruction input by the user according to the current response message of the intelligent terminal that receives it, to obtain the corresponding speech recognition result. This improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.
Optionally, parsing device 12 may also parse the natural language instruction according to both the current response message of the intelligent terminal that receives the natural language instruction and the history corpus feedback library corresponding to the user, to obtain the corresponding speech recognition result.
Here, the history corpus feedback library covers everything the user has said in all dialogues during voice interaction with intelligent terminals (which may be the same intelligent terminal or several different ones); optionally, the corpus feedback library can be updated.
For example, again taking user A's natural language instruction "film for having old beast": suppose the current response message contains no information related to the voice stream "laoshou". Parsing device 12 can then query user A's history corpus feedback library with that voice stream and find information corresponding to it, such as "play South Korea 《Veteran》 this film". Parsing device 12 can then determine that user A means "film for having veteran" rather than "having old thin film", "film for having old beast", or some other similar-sounding utterance.
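The two-stage lookup in this example can be sketched as follows, assuming that the candidates have already been reduced to key terms (for instance "old beast" and "veteran" for the voice stream "laoshou") and that both the current response message and the history corpus feedback library are plain lists of strings. The function name and the substring test are illustrative assumptions.

```python
from typing import List


def resolve(candidates: List[str], current_response: List[str], history_corpus: List[str]) -> str:
    """Check the current response message first, then the history corpus feedback library."""
    for vocabulary in (current_response, history_corpus):
        for candidate in candidates:
            if any(candidate.lower() in entry.lower() for entry in vocabulary):
                return candidate
    # Neither source supports any candidate: keep the recognizer's first choice.
    return candidates[0]


term = resolve(
    ["old beast", "veteran"],
    ["Yoshinoya", "gold million"],           # nothing related on screen
    ["play South Korea Veteran this film"],  # something the user said earlier
)
```

Giving the current response message priority over the history is a design choice of this sketch; the description only states that both are used.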
Here, the present invention parses the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction input by the user and the history corpus feedback library corresponding to the user. This further improves the efficiency and accuracy of speech recognition and further improves the user's voice interaction experience.
If the current response message includes multiple response content items and each response content item corresponds to multiple sub-item contents, parsing device 12 parses the natural language instruction according to the multiple response content items and the multiple sub-item contents corresponding to each response content item, to obtain the corresponding speech recognition result.
For example, assume the current response message is the recommended take-away restaurants shown in Fig. 2 (the take-away restaurants on the page are the response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "an order of shredded pork and eggs with dried mushroom". If only the restaurant "gold million" has this dish, the resolver 12 can, according to the sub-item contents corresponding to the response content item "gold million" in the current response message shown in Fig. 2 (that is, the dishes offered by "gold million"), determine that what user A wants is "an order of shredded pork and eggs with dried mushroom", rather than "an order of clover meat" or other similar-sounding speech.
Here, the present invention parses the natural language instruction according to the multiple response content items included in the current response message and the multiple sub-item contents corresponding to each response content item. As a result, even though the sub-item contents corresponding to each response content item are not shown on the current user interface, the present invention can still accurately recognize the user's natural language instruction, which further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.
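One way to picture matching against sub-item contents that are not on screen is a nested lookup, sketched below. The function name `match_sub_items`, the nested dictionary layout, and the romanized phonetic keys are assumptions for illustration; the data is loosely modelled on the Fig. 2 example.

```python
def match_sub_items(phonetic_key, response_items):
    """Return (item, sub-item text) pairs whose sub-item matches the voice stream.

    response_items maps each response content item (e.g. a restaurant) to a dict
    of {phonetic key: sub-item text} (e.g. its dishes). Sub-items are searched
    even if they are not currently drawn on the user interface.
    """
    return [
        (item, sub_items[phonetic_key])
        for item, sub_items in response_items.items()
        if phonetic_key in sub_items
    ]


# Hypothetical data: only "gold million" offers a dish pronounced "muxurou".
fig2 = {
    "gold million": {"muxurou": "shredded pork and eggs with dried mushroom"},
    "another restaurant": {"otherdish": "some other dish"},
}
matches = match_sub_items("muxurou", fig2)
```

Returning a list rather than a single value keeps the case where several restaurants offer a similar-sounding dish visible to the caller.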
Optionally, the resolver 12 may also parse the natural language instruction according to both the current response message of the intelligent terminal that received the natural language instruction and the content information associated with the current response message, so as to obtain the corresponding speech recognition result. Specifically, the resolver 12 first obtains the content information associated with the current response message, for example by sending a query request for that content information to the server corresponding to the intelligent terminal and receiving the content information that the server returns based on the query request. It then parses the natural language instruction according to the current response message and its associated content information, so as to obtain the corresponding speech recognition result.
Here, the content information associated with the current response message refers to, for example, the attached information of the items associated with the current response message. For example, if the current response message is a dish, the associated content information includes the restaurant to which the dish belongs and other content related to that restaurant, such as the restaurant name, cuisine, dishes, business hours, reviews, and geographical location. If the current response message is a restaurant, the associated content information includes the dishes offered by the restaurant and other content related to the restaurant, such as the restaurant name, cuisine, dishes, business hours, reviews, and geographical location. If the current response message is a film, the associated content information includes the genre, period, and lead actors of the film.
Here, the content information associated with the current response message may be shown in the user interface of the intelligent terminal in response to the user performing an operation related to the UI content.
For example, for the response message shown in Fig. 3, assume user A says "the 4th is double-screen flip" or "the 4th good a few taste Two bors d's oeuveres meals". The resolver 12 parses user A's natural language instruction "the 4th is double-screen flip" or "the 4th good a few taste Two bors d's oeuveres meals" according to the response message shown in Fig. 3. Because the response message shown in Fig. 3 contains the restaurant "Yoshinoya" and its attached dish information, the resolver 12 can determine that what user A wants is "the 4th lucky taste Two bors d's oeuveres meal".
For another example, for the response message shown in Fig. 4, assume user A says "performed by Zhan Nifo Sean Connery". The resolver 12 parses user A's natural language instruction "Zhan Nifo Sean Connery" according to the response message shown in Fig. 4. Because the response message shown in Fig. 4 contains the film "advancing bravely" and its attached lead-actor information, the resolver 12 can determine that what user A wants is "performed by Janney Fo Kang Na Li".
Here, by parsing the natural language instruction according to both the current response message of the intelligent terminal that received the natural language instruction and the content information associated with the current response message, the present invention not only eliminates ambiguity in the natural language instruction and facilitates voice error correction, but also further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.
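The two stages described above (first obtain the associated content information, then parse against both sources) might look like the following sketch. The server query is replaced by an injected callable, the romanized phonetic keys are invented for the example, and fuzzy matching with the standard-library `difflib.get_close_matches` is a stand-in technique chosen here; the patent does not specify how similarity is computed.

```python
from difflib import get_close_matches


def parse_with_associated(phonetic_key, current_terms, fetch_associated):
    """Match the voice stream against the current response plus its associated content.

    current_terms: {phonetic key: text} for the current response message.
    fetch_associated: callable standing in for the server query; returns
    {phonetic key: text} for the associated content information.
    """
    vocabulary = dict(current_terms)
    vocabulary.update(fetch_associated())  # stage 1: obtain associated content
    # Stage 2: pick the closest known key, tolerating small recognition errors.
    best = get_close_matches(phonetic_key, list(vocabulary), n=1, cutoff=0.8)
    return vocabulary[best[0]] if best else None


def fake_server_query():
    # Hypothetical associated content: a lead actor of the film in the Fig. 4 example.
    return {"zhannifukangnali": "Janney Fo Kang Na Li"}


result = parse_with_associated(
    "zhannifokangnali",  # slightly misrecognized voice stream (hypothetical key)
    {"yongwangzhiqian": "advancing bravely"},
    fake_server_query,
)
```

The `cutoff` value of 0.8 is an arbitrary choice for the sketch; a real system would tune its similarity threshold, or use a different phonetic distance altogether.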
In one embodiment, the determining equipment 1 further includes a determining device (not shown) and a providing device (not shown). Specifically, the determining device determines, according to the speech recognition result, the voice response information corresponding to the natural language instruction; the providing device provides the voice response information to the user.
For example, for the natural language instruction "U ancient cooking vessel emits dish" input by user A, the resolver 12 parses the instruction and obtains the corresponding speech recognition result. The determining device then determines, according to the speech recognition result obtained by the resolver 12, that the corresponding voice response information is, for example, "the recommended dishes of U ancient cooking vessel emits dish". The providing device then provides this voice response information to user A, for example by playing the voice prompt "the following dishes of U ancient cooking vessel emits dish are recommended for you" while showing the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart TV.
In yet another embodiment, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, the determining equipment 1 further includes a screening device (not shown). Specifically, the screening device screens out preferred voice response information from the multiple pieces of voice response information, and the providing device provides the preferred voice response information to the user.
For example, assume the current response message is the recommended take-away restaurants shown in Fig. 2 (the take-away restaurants on the page are the response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "an order of shredded pork and eggs with dried mushroom". For this voice stream, suppose that besides the restaurant "gold million", which has the dish "shredded pork and eggs with dried mushroom", the restaurant "25 piece half" has the dish "clover meat". According to the sub-item contents corresponding to the response content item "gold million" in the current response message shown in Fig. 2 (that is, the dishes offered by "gold million") and the sub-item contents corresponding to the response content item "25 piece half" (that is, the dishes offered by "25 piece half"), the resolver 12 can determine that both "shredded pork and eggs with dried mushroom" and "clover meat" are speech recognition results for user A's voice stream "muxurou". Correspondingly, the determining device determines, according to the speech recognition result, that the corresponding voice response information is, for example, "the dish 'shredded pork and eggs with dried mushroom' of gold million and the dish 'clover meat' of 25 piece half". The screening device then screens out the preferred voice response information from the multiple pieces of voice response information, for example according to factors such as the user's preferences, natural attributes, and social attributes. Assuming user A prefers the dish "shredded pork and eggs with dried mushroom", the screening device can determine that the preferred voice response information is "the dish 'shredded pork and eggs with dried mushroom' of gold million". The providing device then provides the preferred voice response information to the user, for example by playing the voice prompt "the dish 'shredded pork and eggs with dried mushroom' of gold million is recommended for you" while showing the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart TV.
Optionally, the providing device may also provide the multiple pieces of voice response information directly to the user, so that the user can choose among them.
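The screening step, together with the option of handing all candidates back to the user, can be sketched as below. The `screen_responses` function and the single `preference` weight table are assumptions; the weight table is only a simplified stand-in for the preference, natural-attribute, and social-attribute factors mentioned in the text.

```python
def screen_responses(responses, preference):
    """Pick the preferred response(s) among several candidates.

    responses: list of (restaurant, dish) candidates.
    preference: {dish: weight}, a hypothetical stand-in for the user's
    preferences, natural attributes, and social attributes.
    Returns one candidate when a single best exists; otherwise returns all
    tied candidates so that the user can choose among them.
    """
    if not responses:
        return []
    scores = [preference.get(dish, 0.0) for _, dish in responses]
    best = max(scores)
    return [r for r, s in zip(responses, scores) if s == best]


candidates = [
    ("gold million", "shredded pork and eggs with dried mushroom"),
    ("25 piece half", "clover meat"),
]
preferred = screen_responses(
    candidates, {"shredded pork and eggs with dried mushroom": 1.0}
)
undecided = screen_responses(candidates, {})  # no known preference
```

When no preference separates the candidates, the function returns all of them, which corresponds to providing the multiple pieces of voice response information directly to the user.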
Fig. 5 shows a flow chart of a method for determining a speech recognition result according to another aspect of the present invention. The method includes step S1 and step S2.
Specifically, in step S1, the determining equipment 1 obtains the natural language instruction input by the user; in step S2, the determining equipment 1 parses the natural language instruction according to the current response message of the intelligent terminal that received the natural language instruction, so as to obtain the corresponding speech recognition result.
Here, the determining equipment 1 refers to equipment capable of parsing a natural language instruction input by a user according to the current response message of the intelligent terminal that received the instruction, so as to obtain the corresponding speech recognition result. In specific embodiments, the determining equipment 1 may be implemented by an intelligent terminal; by equipment formed by integrating a network device with an intelligent terminal over a network (that is, by the intelligent terminal and the network device working together); as a software module and/or hardware module contained in an intelligent terminal; or as a hardware device connected to an intelligent terminal in a wired or wireless manner. Here, the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. The cloud consists of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a super virtual computer consisting of a set of loosely coupled computers. The intelligent terminal may be any electronic product capable of human-computer interaction with the user through one or more of a keyboard, touch pad, touch screen, remote control, voice interaction, or handwriting equipment, such as a PC, mobile phone, smart phone, PDA, wearable device, palmtop computer (PPC), tablet computer, intelligent in-vehicle device, smart TV, or smart speaker. In practical applications, when the determining equipment 1 is an intelligent terminal, the terminal may carry or have installed a client (which may take the form of an APP) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response result. Alternatively, the client may only be able to perform speech recognition on the natural language instruction input by the user, and rely on a corresponding server to parse, understand, process, and respond to the instruction and return the response result to the client for output. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and a wireless ad hoc network (Ad Hoc network). Those skilled in the art will understand that the above determining equipment 1 is only an example; other existing or future network devices or intelligent terminals, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Here, the network device and the intelligent terminal each include an electronic device capable of automatically performing numerical computation and information processing according to preset or stored instructions, whose hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and an embedded device.
In one embodiment, if the determining equipment 1 is the user's intelligent terminal, the determining equipment 1 first obtains the natural language instruction input by the user through an application programming interface (API) that it provides itself, or through an API provided by a sound pickup device. The determining equipment 1 then parses the natural language instruction according to the current response message of the intelligent terminal that received the natural language instruction, so as to obtain the corresponding speech recognition result.
In another embodiment, if the determining equipment 1 is equipment formed by integrating a network device and an intelligent terminal, that is, the determining equipment 1 is implemented by the intelligent terminal and the network device working together, the intelligent terminal first obtains the natural language instruction input by the user through an API that it provides itself, or through an API provided by a sound pickup device. The intelligent terminal then sends the natural language instruction to the network device, and also sends the network device the current response message of the intelligent terminal that received the natural language instruction. The network device then parses the natural language instruction according to the current response message, so as to obtain the corresponding speech recognition result.
In yet another embodiment, if the determining equipment 1 is equipment formed by integrating a network device and an intelligent terminal, that is, the determining equipment 1 is implemented by the intelligent terminal and the network device working together, the intelligent terminal first obtains the natural language instruction input by the user through an API that it provides itself, or through an API provided by a sound pickup device. The intelligent terminal then sends the natural language instruction to the network device. The network device then parses the natural language instruction according to the instruction itself and the current response message, which the network device has obtained, of the intelligent terminal that received the natural language instruction, so as to obtain the corresponding speech recognition result.
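The difference between the last two embodiments is only whether the intelligent terminal uploads the current response message or the network device obtains it on its own. A minimal sketch of that split follows; the JSON payload, its field names, and both function names are hypothetical and are not taken from the patent.

```python
import json


def build_request(instruction_ref, current_response=None):
    """Terminal side: package the instruction for the network device.

    current_response is included when the terminal uploads it itself and omitted
    when the network device is expected to obtain it on its own.
    All field names are hypothetical.
    """
    payload = {"instruction": instruction_ref}
    if current_response is not None:
        payload["current_response"] = current_response
    return json.dumps(payload)


def handle_request(raw, fetch_current_response):
    """Network side: make sure a current response message is available before parsing."""
    request = json.loads(raw)
    context = request.get("current_response")
    if context is None:
        context = fetch_current_response()  # e.g. query the terminal's server
    return request["instruction"], context


uploaded = handle_request(build_request("utt-1", ["item a"]), lambda: ["fetched"])
fetched = handle_request(build_request("utt-1"), lambda: ["fetched"])
```

Either way the parsing step downstream receives the same pair of inputs, so the choice of embodiment does not change how the instruction is parsed.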
Specifically, in step S1, the determining equipment 1 obtains the natural language instruction input by the user through an application programming interface (API) provided by the intelligent terminal itself, or through an API provided by a third-party device such as a sound pickup device.
For example, assume user A plans to order take-away and turns on an intelligent terminal (such as a smart TV). The smart TV carries or has installed a client (such as a voice assistant APP) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response result. User A says "take-away", and the corresponding response message is the recommended take-away restaurants shown in Fig. 2. If user A wants to order take-away from the restaurant "U ancient cooking vessel emits dish", user A then says "U ancient cooking vessel emits dish". In step S1, the determining equipment 1 obtains the natural language instruction "U ancient cooking vessel emits dish" input by user A through the API provided by the smart TV itself.
Then, in step S2, the determining equipment 1 first obtains the current response message of the intelligent terminal that received the natural language instruction. It may do so through, for example, an application programming interface (API) provided by the intelligent terminal; by sending a query request for the current response message to a server corresponding to the intelligent terminal (such as a speech recognition server) and receiving the current response message that the server returns based on the query request; by receiving the current response message actively uploaded by the intelligent terminal; or from a speech recognition service provider. It then parses the natural language instruction according to the current response message, so as to obtain the corresponding speech recognition result. Here, the current response message may be the content shown on the current user interface of the intelligent terminal (that is, the content visible to the user in the user interface of the intelligent terminal). It may also be content that is not fully shown in the user interface because of factors such as the limited display screen size of the intelligent terminal or the front-end UI design, but that the user can bring onto the user interface through operations such as turning pages, viewing more details or a detailed introduction, or viewing the attached information of the items associated with the current response message (for example, the current response message then includes multiple pages and the content is on a non-home page among them). The response mode of the current response message includes at least any one of the following:
The current response message is a response based on the user's previous interactive instruction;
The current response message is not a response based on the user's previous interactive instruction.
It should be noted here that the above response message may be produced under any response mode; the following is given only by way of example and is not to be construed as limiting the present invention.
For example, suppose the current response message is a response based on the user's previous interactive instruction. Here, the previous interactive instruction may be an interactive instruction issued by the user through one or more of a keyboard, touch pad, touch screen, remote control, voice interaction, or handwriting equipment. For the natural language instruction "U ancient cooking vessel emits dish" input by user A, in step S2 the determining equipment 1 parses the instruction according to the current response message of the smart TV that received it, that is, the content shown by the smart TV as in Fig. 2. In other words, based on the voice stream of what the user said, "U ancient cooking vessel emits dish", combined with the current response message produced in response to user A's previous interactive instruction (the natural language instruction "take-away"), namely the content shown in Fig. 2, it can accurately determine that what the user wants is "U ancient cooking vessel emits dish" rather than "emitting dish a little" or other similar-sounding speech.
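One common way to combine a voice stream with on-screen context is to rescore the recognizer's candidate hypotheses, boosting those supported by the current response message. The patent does not state that this is the mechanism used; the sketch below is an assumption, and the `rescore` function, the scores, and the `bonus` parameter are hypothetical.

```python
def rescore(hypotheses, current_terms, bonus=1.0):
    """Pick the recognition hypothesis best supported by the current response message.

    hypotheses: list of (text, recognizer score) pairs, higher score is better.
    current_terms: strings present in the current response message.
    bonus: hypothetical boost for hypotheses that mention a term from the response.
    """
    def score(hypothesis):
        text, base = hypothesis
        supported = any(term in text for term in current_terms)
        return base + (bonus if supported else 0.0)

    return max(hypotheses, key=score)[0]


# Hypothetical candidate list: without context the wrong candidate scores higher.
n_best = [
    ("emitting dish a little", 0.6),
    ("U ancient cooking vessel emits dish", 0.5),
]
chosen = rescore(n_best, {"U ancient cooking vessel emits dish", "gold million"})
```

Without any context terms the function simply returns the recognizer's top hypothesis, so the context can only change the outcome when it actually supports one of the candidates.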
For another example, suppose the current response message is not content produced in response to the user's previous interactive instruction: for instance, the current response message is the boot home page shown on the current user interface of the intelligent terminal, or it is content shown in response to an interactive instruction from another user. Here, the boot home page may be a default page or the page that was shown at shutdown. Assume user B, a family member of user A, plans to order take-away and turns on an intelligent terminal (such as a smart TV). The smart TV carries or has installed a client (such as a voice assistant APP) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response result. User B says "take-away", and the corresponding response message is the recommended take-away restaurants shown in Fig. 3. Suppose user B then becomes busy and asks user A to continue ordering. Since user A wants to order take-away from the restaurant "one spoon will", user A says "what dishes does one spoon will have". In step S2, the determining equipment 1 parses the natural language instruction "what dishes does one spoon will have" according to the current response message of the smart TV that received it, that is, the content shown by the smart TV as in Fig. 3. In other words, based on the voice stream of what the user said, combined with the content shown in Fig. 3, it can accurately determine that what the user wants is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "what dishes does one spoon soy sauce have", or other similar-sounding speech.
Optionally, the current response message is the response content corresponding to each of multiple consecutive interactive instructions that the user executed before executing the natural language instruction.
For example, for the response message shown in Fig. 3, assume user A says "the 4th". After seeing the dishes of the 4th restaurant, "Yoshinoya", on the current user interface of the smart TV, user A finds nothing appealing. Because user A remembers that the response message shown in Fig. 3 included the restaurant "one spoon will", user A then says "what dishes does one spoon will have". In step S2, the determining equipment 1 parses the natural language instruction "what dishes does one spoon will have" according to the current response message of the smart TV that received it, that is, the response content corresponding to each of the consecutive interactive instructions user A executed before this instruction, such as the content shown by the smart TV as in Fig. 3. In other words, based on the voice stream of what the user said, combined with the content shown in Fig. 3, it can accurately determine that what the user wants is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "what dishes does one spoon soy sauce have", or other similar-sounding speech.
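Keeping the response content of several consecutive interactive instructions available for parsing can be pictured as a small rolling window, sketched below. The `ResponseHistory` class, the window size, and the sample terms are assumptions made for illustration.

```python
from collections import deque


class ResponseHistory:
    """Keeps the response content of the last few interactive instructions."""

    def __init__(self, max_turns=5):  # window size is a hypothetical choice
        self.turns = deque(maxlen=max_turns)

    def record(self, response_terms):
        self.turns.append(set(response_terms))

    def current_terms(self):
        # Union of the terms from every remembered response, newest included.
        terms = set()
        for turn in self.turns:
            terms |= turn
        return terms


recent = ResponseHistory()
recent.record({"Yoshinoya", "one spoon will"})  # response to "take-away" (Fig. 3)
recent.record({"dishes of Yoshinoya"})          # response to "the 4th"
```

After the second instruction the screen shows only the dishes of "Yoshinoya", yet `recent.current_terms()` still contains "one spoon will", so a later instruction naming that restaurant can be matched.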
Optionally, if the current response message is a response based on the user's previous interactive instruction, the current response message is shown in the user interface of the intelligent terminal, or the current response message includes multiple pages and is on a non-home page among those pages.
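The distinction between what is on screen and what the whole paged response contains can be sketched as follows. The `PagedResponse` class, its methods, and the romanized phonetic keys are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PagedResponse:
    """A current response message split into pages; only one page is on screen."""
    pages: list
    visible_page: int = 0

    def visible_items(self):
        return dict(self.pages[self.visible_page])

    def all_items(self):
        # Merge every page, including pages the user has not turned to yet.
        merged = {}
        for page in self.pages:
            merged.update(page)
        return merged


# Hypothetical pages, each mapping a phonetic key to a film title.
response = PagedResponse(pages=[{"yongwangzhiqian": "advancing bravely"},
                                {"laoshou": "old beast"}])
on_screen = response.visible_items().get("laoshou")
anywhere = response.all_items().get("laoshou")
```

Matching against `all_items()` rather than `visible_items()` is what allows an instruction naming an item on a non-home page to be recognized.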
For example, assume user A plans to order film tickets and turns on an intelligent terminal (such as a smart TV). The smart TV carries or has installed a client (such as a voice assistant APP) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response result. User A says "film ticket". Because the corresponding response message has a lot of content, it is shown in pages: what user A sees in the user interface of the smart TV is the first page, as shown in Fig. 4(a), while the other pages become visible in the user interface only after the user issues, for example, a page-turning instruction, as shown in Fig. 4(b). If the user wants to order tickets for the film 《Old beast》, the user says "films with old beast". In step S2, the determining equipment 1 parses the natural language instruction "films with old beast" according to the current response message of the smart TV that received it, that is, the content of the smart TV as shown in Fig. 4(a)-(b), and can accurately determine that what the user wants is "films with old beast", even though the content shown in Fig. 4(b) is not shown on the current user interface. If the parsing were based purely on the voice stream of what the user said, the system could misrecognize it, for example as "films with old thin" or "films with veteran".
Those skilled in the art will understand that the above current response message is only an example; other existing or future current response messages, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the above response modes of the current response message are only examples; other existing or future response modes of the current response message, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, by parsing the natural language instruction according to the current response message of the intelligent terminal that received the user's natural language instruction so as to obtain the corresponding speech recognition result, the present invention improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.
Optionally, in step S2, the determining equipment 1 may also parse the natural language instruction according to both the current response message of the intelligent terminal that received the natural language instruction and the history corpus feedback library corresponding to the user, so as to obtain the corresponding speech recognition result.
Here, the history corpus feedback library covers everything the user has said in all dialogues during voice interaction with an intelligent terminal (which may be the same intelligent terminal or several different intelligent terminals). Optionally, the corpus feedback library is updatable.
For example, again taking user A's natural language instruction "films with old beast": assume the current response message contains no information related to the voice stream "laoshou". In step S2, the determining equipment 1 may then query user A's history corpus feedback library with that voice stream. If it finds information corresponding to the voice stream, such as "play the South Korean film 《Veteran》", then in step S2 the determining equipment 1 can determine that what user A wants is "films with veteran", rather than "films with old thin", "films with old beast", or other similar-sounding speech.
Here, by parsing the natural language instruction according to both the current response message of the intelligent terminal that received the user's natural language instruction and the history corpus feedback library corresponding to the user, the present invention further improves the efficiency and accuracy of speech recognition and further improves the user's voice interaction experience.
If the current response message includes multiple response content items and each response content item corresponds to multiple sub-item contents, then in step S2 the determining equipment 1 parses the natural language instruction according to the multiple response content items and the multiple sub-item contents corresponding to each response content item, so as to obtain the corresponding speech recognition result.
For example, assume the current response message is the recommended take-away restaurants shown in Fig. 2 (the take-away restaurants on the page are the response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "an order of shredded pork and eggs with dried mushroom". If only the restaurant "gold million" has this dish, then in step S2 the determining equipment 1 can, according to the sub-item contents corresponding to the response content item "gold million" in the current response message shown in Fig. 2 (that is, the dishes offered by "gold million"), determine that what user A wants is "an order of shredded pork and eggs with dried mushroom", rather than "an order of clover meat" or other similar-sounding speech.
Here, the present invention parses the natural language instruction according to the multiple response content items included in the current response message and the multiple sub-item contents corresponding to each response content item. As a result, even though the sub-item contents corresponding to each response content item are not shown on the current user interface, the present invention can still accurately recognize the user's natural language instruction, which further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.
Optionally, in step S2, the determining equipment 1 may also parse the natural language instruction according to both the current response message of the intelligent terminal that received the natural language instruction and the content information associated with the current response message, so as to obtain the corresponding speech recognition result. Specifically, in step S2, the determining equipment 1 first obtains the content information associated with the current response message, for example by sending a query request for that content information to the server corresponding to the intelligent terminal and receiving the content information that the server returns based on the query request. It then parses the natural language instruction according to the current response message and its associated content information, so as to obtain the corresponding speech recognition result.
Here, the content information associated with the current response message refers to, for example, the attached information of the items associated with the current response message. For example, if the current response message is a dish, the associated content information includes the restaurant to which the dish belongs and other content related to that restaurant, such as the restaurant name, cuisine, dishes, business hours, reviews, and geographical location. If the current response message is a restaurant, the associated content information includes the dishes offered by the restaurant and other content related to the restaurant, such as the restaurant name, cuisine, dishes, business hours, reviews, and geographical location. If the current response message is a film, the associated content information includes the genre, period, and lead actors of the film.
Here, the content information associated with the current response message may be displayed in the user interface of the intelligent terminal in response to the user performing a related operation on the UI content.
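
The relationship between a response message and its associated content information can be pictured with a small data model. The following is only an illustrative sketch: the class `ResponseItem`, the function `candidate_phrases`, and the sample values are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseItem:
    """One response content item, e.g. a restaurant, a dish, or a film (hypothetical model)."""
    name: str
    kind: str  # e.g. "restaurant", "dish", "film"
    # Associated content information, keyed by attribute name (illustrative).
    associated: dict[str, list[str]] = field(default_factory=dict)


def candidate_phrases(items: list[ResponseItem]) -> list[str]:
    """Collect every phrase a user might refer to, given the current response message."""
    phrases: list[str] = []
    for item in items:
        phrases.append(item.name)
        for values in item.associated.values():
            phrases.extend(values)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(phrases))


# Illustrative data only; the dish names are placeholders.
current_response = [
    ResponseItem("Yoshinoya", "restaurant", {"dishes": ["dish A", "dish B"]}),
]
print(candidate_phrases(current_response))
```

A parser could then restrict or bias recognition toward the phrases returned by `candidate_phrases`, which is one plausible way to use the associated content information described above.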
For example, for the response message shown in Fig. 3, suppose user A's speech is initially heard as "the 4th is double-screen flip" or "the 4th good a few taste double-combo meal". In step S2, determining device 1 parses this natural voice command according to the response message shown in Fig. 3. Because that response message contains the restaurant "Yoshinoya" and its associated dish information, the device can determine that what user A meant is "the 4th, the lucky taste double-combo meal".
As another example, for the response message shown in Fig. 4, suppose user A says something heard as "the one Zhan Nifo Sean Connery starred in". In step S2, determining device 1 parses the natural voice command "Zhan Nifo Sean Connery" according to the response message shown in Fig. 4. Because that response message contains the film "Advancing Bravely" and its associated leading-actor information, the device can determine that what user A meant is "the one Jennifer Connelly starred in".
Here, by parsing the natural voice command according to the current response message of the intelligent terminal that received the command and the content information associated with that message, the present invention not only resolves ambiguity in the natural voice command and facilitates correction of recognition errors, but also further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.
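
One simple way to picture this kind of context-based error correction is to compare the raw recognizer output against the phrases available in the current response message and pick the closest one. The sketch below uses string similarity from Python's standard `difflib` module as a stand-in; the patent does not specify a matching technique, so the function `correct_with_context`, the threshold value, and the sample phrases are all assumptions.

```python
from difflib import SequenceMatcher


def correct_with_context(recognized: str, candidates: list[str],
                         threshold: float = 0.5) -> str:
    """Return the candidate closest to the raw recognizer output.

    If no candidate reaches the similarity threshold, the raw text is
    returned unchanged.
    """
    best, best_score = recognized, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, recognized.lower(), cand.lower()).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else recognized


# Illustrative phrases standing in for the dishes of the displayed restaurant.
menu = ["lucky taste double-combo meal", "beef bowl"]
print(correct_with_context("lucky taste double combo meal", menu))
```

A real system would more likely compare pronunciations than spellings, since the errors in the examples above are homophone errors; character-level similarity is used here only because it needs no external resources.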
In one embodiment, the method performed by determining device 1 further includes step S3 (not shown) and step S4 (not shown). Specifically, in step S3, determining device 1 determines, according to the speech recognition result, the voice response information corresponding to the natural voice command; in step S4, determining device 1 provides the voice response information to the user.
For example, for the natural voice command "U Ding Maocai" input by user A, in step S2 determining device 1 parses the command and obtains the corresponding speech recognition result. Then, in step S3, determining device 1 determines, according to the speech recognition result obtained in step S2, that the corresponding voice response information is, for example, "recommended dishes of U Ding Maocai". Then, in step S4, determining device 1 provides the voice response information to user A, for example by playing the voice prompt "Recommending the following dishes from U Ding Maocai for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the smart TV's interface.
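
Steps S3 and S4 can be sketched as two small functions: one that turns a recognition result into a response, and one that delivers it. This is a minimal illustration under stated assumptions: `VoiceResponse`, `determine_voice_response`, `provide`, the fallback wording, and the sample menu are hypothetical, and `print` stands in for voice playback and on-screen display.

```python
from dataclasses import dataclass


@dataclass
class VoiceResponse:
    speech: str          # text to be played back by voice
    display: list[str]   # items to show on the terminal's interface


def determine_voice_response(recognized_name: str,
                             menus: dict[str, list[str]]) -> VoiceResponse:
    """Step S3 (sketch): build a response for a recognized restaurant name."""
    dishes = menus.get(recognized_name, [])
    if not dishes:
        # Fallback wording is an assumption, not taken from the source.
        return VoiceResponse(f"Nothing was found for {recognized_name}.", [])
    return VoiceResponse(
        f"Recommending the following dishes from {recognized_name} for you.",
        dishes,
    )


def provide(response: VoiceResponse) -> None:
    """Step S4 (sketch): stand-in for voice playback plus on-screen display."""
    print("[voice]", response.speech)
    for line in response.display:
        print("[screen]", line)


# Placeholder restaurant and dish names.
menus = {"Example Restaurant": ["dish A", "dish B"]}
provide(determine_voice_response("Example Restaurant", menus))
```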
In a further embodiment, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, the method performed by determining device 1 further includes step S5 (not shown). Specifically, in step S5, determining device 1 screens out preferred voice response information from the multiple pieces of voice response information; in step S4, determining device 1 then provides the preferred voice response information to the user.
For example, suppose the current response message is the recommended take-away restaurant page shown in Fig. 2 (the take-away restaurants on the page are the multiple response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents), and user A then says "I'll have a moo shu pork". For this voice stream, suppose that in addition to the restaurant "Gold Million", which has the dish "moo shu pork", the restaurant "25 Piece Half" has the identically pronounced dish "clover meat". In step S2, according to the sub-item contents corresponding to the response content item "Gold Million" in the current response message shown in Fig. 2 (that is, the dishes "Gold Million" offers) and the sub-item contents corresponding to the response content item "25 Piece Half" (that is, the dishes "25 Piece Half" offers), determining device 1 can determine that both "moo shu pork" and "clover meat" are speech recognition results for user A's voice stream "muxurou". Correspondingly, in step S3, determining device 1 determines according to these speech recognition results that the corresponding voice response information is, for example, "the dish 'moo shu pork' from Gold Million and the dish 'clover meat' from 25 Piece Half". Then, in step S5, determining device 1 screens out preferred voice response information from the multiple pieces of voice response information, for example according to factors such as the user's preferences, natural attributes, and social attributes. Assuming user A prefers the dish "moo shu pork", determining device 1 determines in step S5 that the preferred voice response information is "the dish 'moo shu pork' from Gold Million". Then, in step S4, determining device 1 provides the preferred voice response information to the user, for example by playing the voice prompt "Recommending the dish 'moo shu pork' from Gold Million for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the smart TV's interface.
Optionally, in step S4, determining device 1 may instead provide the multiple pieces of voice response information directly to the user, so that the user can choose among them.
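
The screening in step S5, together with the option of returning every candidate when no preference separates them, can be sketched as follows. The scoring scheme is an assumption: the source only says that screening may use the user's preferences, natural attributes, and social attributes, so `screen_preferred`, the preference scores, and the sample entries are hypothetical.

```python
def screen_preferred(responses: list[dict], preferences: dict[str, float]) -> list[dict]:
    """Step S5 (sketch): keep only the candidate response(s) with the top preference score.

    `preferences` maps a dish name to a score; unknown dishes score 0.0.
    When all candidates tie, all of them are returned, which corresponds to
    offering every candidate to the user to choose from.
    """
    if not responses:
        return []
    scored = [(preferences.get(r["dish"], 0.0), r) for r in responses]
    top = max(score for score, _ in scored)
    return [r for score, r in scored if score == top]


# Placeholder candidates produced for one ambiguous voice stream.
candidates = [
    {"restaurant": "Restaurant A", "dish": "dish A"},
    {"restaurant": "Restaurant B", "dish": "dish B"},
]
print(screen_preferred(candidates, {"dish A": 0.9}))
```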
Fig. 6 shows a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention. The computer system/server 2 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 6, computer system/server 2 takes the form of a general-purpose computing device. The components of computer system/server 2 may include, but are not limited to: one or more processors or processing units 21, a system memory 22, and a bus 23 that connects the different system components (including system memory 22 and processing unit 21).
Bus 23 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer system/server 2 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by computer system/server 2, including volatile and non-volatile media and removable and non-removable media.
System memory 22 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 221 and/or cache memory 222. Computer system/server 2 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 223 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 23 through one or more data media interfaces. System memory 22 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 224 having a set of (at least one) program modules 225 may be stored, for example, in system memory 22. Such program modules 225 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 225 generally perform the functions and/or methods of the embodiments described in the present invention.
Computer system/server 2 may also communicate with one or more external devices 25 (such as a keyboard, a pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer system/server 2, and/or with any device (such as a network card or modem) that enables computer system/server 2 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 26. In addition, computer system/server 2 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown in Fig. 6, network adapter 20 communicates with the other modules of computer system/server 2 through bus 23. It should be understood that, although not shown in Fig. 6, other hardware and/or software modules may be used in conjunction with computer system/server 2, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Processing unit 21 runs the programs stored in system memory 22 to execute various functional applications and data processing, for example to implement the following method for determining a speech recognition result, the method comprising the following steps:
a. obtaining a natural voice command input by a user;
b. parsing the natural voice command according to the current response message of the intelligent terminal that received the natural voice command, to obtain a corresponding speech recognition result.
It should be noted that the present invention may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuitry that cooperates with a processor to perform the individual steps or functions.
In addition, part of the present invention may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to a person skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Claims (20)
1. A method for determining a speech recognition result, wherein the method comprises the following steps:
a. obtaining a natural voice command input by a user;
b. parsing the natural voice command according to the current response message of the intelligent terminal that received the natural voice command, to obtain a corresponding speech recognition result.
2. The method according to claim 1, wherein the response mode of the current response message comprises at least any one of the following:
the current response message is a response based on a previous interactive instruction of the user;
the current response message is not a response based on a previous interactive instruction of the user.
3. The method according to claim 2, wherein, if the current response message is a response based on the previous interactive instruction of the user, the current response message is displayed in the user interface of the intelligent terminal, or the current response message comprises multiple pages and is a page other than the first page among the multiple pages.
4. The method according to claim 1, wherein the current response message is the response content corresponding to each of multiple consecutive interactive instructions performed by the user before issuing the natural voice command.
5. The method according to any one of claims 1 to 4, wherein, if the current response message comprises multiple response content items and each response content item corresponds to multiple sub-item contents, step b comprises:
parsing the natural voice command according to the multiple response content items and the multiple sub-item contents corresponding to each response content item, to obtain a corresponding speech recognition result.
6. The method according to any one of claims 1 to 4, wherein step b comprises:
parsing the natural voice command according to the current response message of the intelligent terminal that received the natural voice command and a historical corpus feedback library corresponding to the user, to obtain a corresponding speech recognition result.
7. The method according to any one of claims 1 to 4, wherein step b comprises:
parsing the natural voice command according to the current response message of the intelligent terminal that received the natural voice command and the content information associated with the current response message, to obtain a corresponding speech recognition result.
8. The method according to any one of claims 1 to 7, wherein the method further comprises the steps of:
determining, according to the speech recognition result, voice response information corresponding to the natural voice command;
n. providing the voice response information to the user.
9. The method according to claim 8, wherein claim 8 refers to claim 5, and wherein, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, the method further comprises the step of:
screening out preferred voice response information from the multiple pieces of voice response information;
wherein step n comprises:
providing the preferred voice response information to the user.
10. A determining device for determining a speech recognition result, wherein the determining device comprises:
an acquisition means for obtaining a natural voice command input by a user;
a parsing means for parsing the natural voice command according to the current response message of the intelligent terminal that received the natural voice command, to obtain a corresponding speech recognition result.
11. The determining device according to claim 10, wherein the response mode of the current response message comprises at least any one of the following:
the current response message is a response based on a previous interactive instruction of the user;
the current response message is not a response based on a previous interactive instruction of the user.
12. The determining device according to claim 11, wherein, if the current response message is a response based on the previous interactive instruction of the user, the current response message is displayed in the user interface of the intelligent terminal, or the current response message comprises multiple pages and is a page other than the first page among the multiple pages.
13. The determining device according to claim 10, wherein the current response message is the response content corresponding to each of multiple consecutive interactive instructions performed by the user before issuing the natural voice command.
14. The determining device according to any one of claims 10 to 13, wherein, if the current response message comprises multiple response content items and each response content item corresponds to multiple sub-item contents, the parsing means is configured to:
parse the natural voice command according to the multiple response content items and the multiple sub-item contents corresponding to each response content item, to obtain a corresponding speech recognition result.
15. The determining device according to any one of claims 10 to 13, wherein the parsing means is configured to:
parse the natural voice command according to the current response message of the intelligent terminal that received the natural voice command and a historical corpus feedback library corresponding to the user, to obtain a corresponding speech recognition result.
16. The determining device according to any one of claims 10 to 13, wherein the parsing means is configured to:
parse the natural voice command according to the current response message of the intelligent terminal that received the natural voice command and the content information associated with the current response message, to obtain a corresponding speech recognition result.
17. The determining device according to any one of claims 10 to 16, wherein the determining device further comprises:
a determination means for determining, according to the speech recognition result, voice response information corresponding to the natural voice command;
a providing means for providing the voice response information to the user.
18. The determining device according to claim 17, wherein claim 17 refers to claim 14, and wherein, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, the determining device further comprises:
a screening means for screening out preferred voice response information from the multiple pieces of voice response information;
wherein the providing means is configured to:
provide the preferred voice response information to the user.
19. A computing device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810290479.5A CN108920125B (en) | 2018-04-03 | 2018-04-03 | Method and apparatus for determining a speech recognition result |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108920125A (en) | 2018-11-30 |
CN108920125B (en) | 2019-10-18 |
Family
ID=64402933
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201810290479.5A CN108920125B (en) | Active | 2018-04-03 | 2018-04-03 | Method and apparatus for determining a speech recognition result |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108920125B (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220826. Patentee after: Hangzhou suddenly Cognitive Technology Co.,Ltd., Room 35201, 5th Floor, Zone 2, Building 3, No. 2, Zhuantang Science and Technology Economic Zone, Xihu District, Hangzhou City, Zhejiang Province, 310024. Patentee before: BEIJING XIAOMO ROBOT TECHNOLOGY CO.,LTD., 100080 Room 401, gate 2, east area, block a, 768 Industrial Park, No.5, Xueyuan Road, Haidian District, Beijing. |