CN108920125B

CN108920125B - Method and apparatus for determining a speech recognition result

Info

Publication number: CN108920125B
Application number: CN201810290479.5A
Authority: CN
Inventors: 李国华; 戴帅湘
Original assignee: Beijing Moran Cognitive Technology Co Ltd
Current assignee: Hangzhou Suddenly Cognitive Technology Co ltd
Priority date: 2018-04-03
Filing date: 2018-04-03
Publication date: 2019-10-18
Anticipated expiration: 2038-04-03
Also published as: CN108920125A

Abstract

The object of the present invention is to provide a method and apparatus for determining a speech recognition result. Specifically, a natural language instruction input by a user is obtained; the natural language instruction is then parsed according to the current response information of the smart terminal that receives the instruction, so as to obtain a corresponding speech recognition result. Compared with the prior art, the present invention improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.

Description

Method and apparatus for determining a speech recognition result

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a technique for determining a speech recognition result.

Background art

Put simply, speech recognition is a technology that lets a machine convert a voice signal into the corresponding text through a process of recognition and understanding. It has already appeared in fields such as household appliances, automotive electronics, and consumer electronics, and greatly facilitates people's interaction with devices.

Existing speech recognition technology usually considers only the voice signal itself during recognition, and then performs recognition and optimization with a globally trained language model. However, when a user provides input by voice, what the user wants to express is usually related to factors such as the content the user currently sees and the response information the user has obtained during the voice interaction. For example, a user asks which banks are nearby and then says "how do I get to xxx"; the "xxx" here is very likely related to the response information the user obtained most recently (such as the names of some nearby banks), or to the response information corresponding to a series of consecutive instructions issued before this natural language instruction, and the content of these factors changes dynamically. Clearly, existing speech recognition technology cannot use such dynamic factors, which vary with the application, the service provider, the service object, and time and place, to perform speech recognition. This reduces the efficiency and accuracy of speech recognition and also degrades the user's voice interaction experience.

Summary of the invention

It is an object of the present invention to provide a method and apparatus for determining a speech recognition result.

According to one embodiment of the present invention, a method for determining a speech recognition result is provided, wherein the method comprises the following steps:

A. obtaining a natural language instruction input by a user;

B. parsing the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction, so as to obtain a corresponding speech recognition result.
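Steps A and B can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an upstream recognizer supplies a scored list of candidate transcripts and that the current response information is available as a list of displayed strings; the function name `determine_result`, the `context_bonus` parameter, and the sample transcripts are hypothetical and do not come from the patent.

```python
from typing import Iterable, List, Tuple


def determine_result(candidates: List[Tuple[str, float]],
                     current_response: Iterable[str],
                     context_bonus: float = 0.5) -> str:
    """Step B: pick the candidate transcript best supported by the context."""
    context = [item.lower() for item in current_response]

    def score(candidate: Tuple[str, float]) -> float:
        text, recognizer_score = candidate
        # Reward a candidate that mentions something the terminal has shown.
        mentioned = any(item in text.lower() for item in context)
        return recognizer_score + (context_bonus if mentioned else 0.0)

    return max(candidates, key=score)[0]


# Step A (hypothetical data): candidate transcripts for one utterance.
n_best = [("a film with veteran", 0.40), ("a film with old beast", 0.35)]
result = determine_result(n_best, ["old beast"])
```

With the context supplied, the lower-scored candidate that matches the displayed content is selected; without any context the function falls back to the recognizer's own ranking.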

According to another embodiment of the present invention, a determining device for determining a speech recognition result is also provided, wherein the device comprises:

an acquisition device for obtaining a natural language instruction input by a user;

a parsing device for parsing the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction, so as to obtain a corresponding speech recognition result.

According to yet another embodiment of the present invention, a computing device is also provided, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for determining a speech recognition result according to the aforementioned embodiment of the present invention.

According to a further embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method for determining a speech recognition result according to the aforementioned embodiment of the present invention.

Compared with the prior art, one embodiment of the present invention parses the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction input by the user, so as to obtain the corresponding speech recognition result. This improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Fig. 1 shows a schematic diagram of a determining device for determining a speech recognition result according to one aspect of the present invention;

Fig. 2 shows a schematic diagram of the current response information of a smart terminal that receives a natural language instruction input by a user, according to one embodiment of the present invention;

Fig. 3 shows a schematic diagram of the current response information of a smart terminal that receives a natural language instruction input by a user, according to another embodiment of the present invention;

Fig. 4 shows a schematic diagram of the current response information of a smart terminal that receives a natural language instruction input by a user, according to a further embodiment of the present invention;

Fig. 5 shows a flowchart of a method for determining a speech recognition result according to another aspect of the present invention;

Fig. 6 shows a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows a schematic diagram of a determining device 1 for determining a speech recognition result according to one aspect of the present invention, wherein the determining device 1 includes an acquisition device 11 and a parsing device 12. Specifically, the acquisition device 11 obtains a natural language instruction input by a user; the parsing device 12 parses the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction, so as to obtain a corresponding speech recognition result.

Here, the determining device 1 refers to a device that can parse the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction input by the user, so as to obtain the corresponding speech recognition result. In specific embodiments, the determining device 1 may be implemented by a smart terminal; by a device formed by integrating a network device and a smart terminal over a network (that is, implemented by the smart terminal and the network device in cooperation); as a software module and/or hardware module contained in a smart terminal; or as a hardware device connected to a smart terminal in a wired or wireless manner. Here, the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.

Here, the smart terminal may be any electronic product that can interact with the user through one or more means such as a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device, for example a PC, mobile phone, smartphone, PDA, wearable device, Pocket PC (PPC), tablet computer, smart in-vehicle unit, smart TV, or smart speaker. In practical applications, when the determining device 1 is a smart terminal, a client (which may take the form of an app) may be installed on it that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. Alternatively, the client may only be able to perform speech recognition on the natural language instruction input by the user, and rely on a corresponding server to parse, understand, process, and respond to the instruction and return the response result to the client for output. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like.

Those skilled in the art will understand that the determining device 1 described above is merely an example; other existing or future network devices or smart terminals, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Here, both the network device and the smart terminal include an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions, whose hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

In one embodiment, if the determining device 1 is the user's smart terminal, the determining device 1 first obtains the natural language instruction input by the user through an application programming interface (API) that it provides itself, or through an API provided by a sound pickup device. The determining device 1 then parses the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction, so as to obtain the corresponding speech recognition result.

In another embodiment, if the determining device 1 is a device in which a network device and a smart terminal are integrated, that is, the determining device 1 is implemented by the smart terminal and the network device in cooperation, the smart terminal first obtains the natural language instruction input by the user through an application programming interface (API) that it provides itself, or through an API provided by a sound pickup device. The smart terminal then sends the natural language instruction to the network device, and also sends the network device the current response information of the smart terminal that receives the natural language instruction. The network device then parses the natural language instruction according to the current response information, so as to obtain the corresponding speech recognition result.

In a further embodiment, if the determining device 1 is a device in which a network device and a smart terminal are integrated, that is, the determining device 1 is implemented by the smart terminal and the network device in cooperation, the smart terminal first obtains the natural language instruction input by the user through an application programming interface (API) that it provides itself, or through an API provided by a sound pickup device. The smart terminal then sends the natural language instruction to the network device. The network device then parses the natural language instruction according to the instruction itself and the current response information, which the network device has obtained, of the smart terminal that receives the natural language instruction, so as to obtain the corresponding speech recognition result.
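The two cooperative embodiments differ only in who supplies the current response information: the terminal sends it along with the instruction, or the network device obtains it on its own. A minimal Python sketch of that difference follows. The JSON message layout, the field names, and the `lookup` callback are assumptions made for illustration; the patent does not define a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


@dataclass
class RecognitionRequest:
    terminal_id: str                              # hypothetical terminal identifier
    audio_ref: str                                # hypothetical handle to the captured audio
    current_response: Optional[List[str]] = None  # None: the terminal did not send it


def build_request(terminal_id: str, audio_ref: str,
                  current_response: Optional[List[str]] = None) -> str:
    """Terminal side: serialize the instruction, optionally with its context."""
    request = RecognitionRequest(terminal_id, audio_ref, current_response)
    return json.dumps(asdict(request))


def resolve_context(request_json: str,
                    lookup: Callable[[str], List[str]]) -> List[str]:
    """Network side: use the context that was sent, else obtain it separately."""
    request = json.loads(request_json)
    if request["current_response"] is not None:
        return request["current_response"]
    return lookup(request["terminal_id"])
```

Keeping the context field optional lets one network-side entry point serve both embodiments.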

Specifically, the acquisition device 11 obtains the natural language instruction input by the user through an application programming interface (API) provided by the smart terminal itself, or through an API provided by a third-party device such as a sound pickup device.

For example, suppose user A plans to order takeout and turns on a smart terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User A says "takeout", and the corresponding response information is the recommended takeout restaurants shown in Fig. 2. If user A wants to order takeout from the restaurant "U ancient cooking vessel emits dish", user A then says "U ancient cooking vessel emits dish", and the acquisition device 11 first obtains the natural language instruction "U ancient cooking vessel emits dish" input by user A through the application programming interface (API) provided by the smart TV itself.

Then, the parsing device 12 first obtains the current response information of the smart terminal that receives the natural language instruction. It may do so through, for example, an application programming interface (API) provided by the smart terminal; or by sending a query request for the current response information to a server corresponding to the smart terminal (such as a speech recognition server) and receiving the current response information that the server returns in response to the query request; or by receiving the current response information actively uploaded by the smart terminal; or through a speech recognition service provider. The parsing device 12 then parses the natural language instruction according to the current response information, so as to obtain the corresponding speech recognition result. Here, the current response information may be the content displayed in the current user interface of the smart terminal (that is, the content visible to the user in that user interface). It may also be content that is not fully displayed in the user interface because of factors such as the limited display size of the smart terminal or the front-end UI design, but that the user can bring into the user interface through operations such as turning pages, viewing more details or a detailed introduction, or viewing the auxiliary information associated with the current response information (for example, the current response information comprises multiple pages and the content is on a page other than the first). The response mode of the current response information includes at least any one of the following:

the current response information is a response to the user's previous interactive instruction;

the current response information is not a response to the user's previous interactive instruction.

It should be noted that the current response information may be produced under any response mode; the following is merely illustrative and should not be construed as limiting the present invention.
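The alternative ways of obtaining the current response information (terminal API, server query, terminal upload, service provider) can be sketched in Python as a list of interchangeable sources tried in turn. The patent presents these as alternatives without prescribing an order, so the ordered fallback here is an assumption, as are the function names and the sample data.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A source is a label plus a zero-argument callable returning displayed items.
Source = Tuple[str, Callable[[], List[str]]]


def acquire_current_response(sources: Sequence[Source]) -> Tuple[Optional[str], List[str]]:
    """Return the first non-empty response information and the source that gave it."""
    for name, fetch in sources:
        try:
            info = fetch()
        except OSError:
            # An unreachable source (ConnectionError is an OSError) is skipped.
            continue
        if info:
            return name, info
    return None, []


def terminal_api() -> List[str]:
    # Hypothetical stand-in for the terminal's own API being unavailable.
    raise ConnectionError("terminal API unavailable")


def server_query() -> List[str]:
    # Hypothetical stand-in for the server answering the query request.
    return ["Yoshinoya"]


source_name, info = acquire_current_response(
    [("terminal_api", terminal_api), ("server_query", server_query)]
)
```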

For example, suppose the current response information is a response to the user's previous interactive instruction. Here, the previous interactive instruction may be an interactive instruction issued by the user in one or more ways, such as a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device. For the natural language instruction "U ancient cooking vessel emits dish" input by user A, the parsing device 12 parses the instruction according to the current response information of the smart TV that receives it, that is, the content displayed by the smart TV as shown in Fig. 2. In other words, based on the voice stream of the user's utterance "U ancient cooking vessel emits dish", combined with the current response information (the content shown in Fig. 2) produced in response to user A's previous interactive instruction (the natural language instruction "takeout"), it can accurately determine that what user A means is "U ancient cooking vessel emits dish" rather than "emitting dish a little" or another similar-sounding phrase.

As another example, suppose the current response information is not content produced in response to the user's previous interactive instruction. For instance, the current response information may be the startup home page displayed in the current user interface of the smart terminal, or content displayed in response to another user's interactive instruction. Here, the startup home page may be a default page or the page that was showing when the terminal was last shut down. Suppose user B, a member of user A's household, plans to order takeout and turns on a smart terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User B says "takeout", and the corresponding response information is the recommended takeout restaurants shown in Fig. 3. User B is then called away and asks user A to continue the order. Since user A wants to order takeout from the restaurant "one spoon will", user A says "what dishes does one spoon will have". The parsing device 12 then parses this natural language instruction according to the current response information of the smart TV that receives it, that is, the content displayed by the smart TV as shown in Fig. 3. In other words, based on the voice stream of the user's utterance, combined with the content shown in Fig. 3, it can accurately determine that what user A means is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "one spoon of soy sauce, what dish", or another similar-sounding phrase.

Optionally, the current response information is the response content corresponding to each interactive instruction in a sequence of consecutive interactive instructions that the user performed before issuing the natural language instruction.

For example, for the response information shown in Fig. 3, suppose user A says "the 4th one" and, after seeing the dishes of the fourth restaurant, "Yoshinoya", in the current user interface of the smart TV, finds nothing appealing. Because user A remembers that the restaurant "one spoon will" was in the response information shown in Fig. 3, user A then says "what dishes does one spoon will have". The parsing device 12 parses this natural language instruction according to the current response information of the smart TV that receives it, that is, the response content corresponding to each interactive instruction in the sequence of consecutive interactive instructions that user A performed before issuing the instruction, such as the content displayed by the smart TV as shown in Fig. 3. In other words, based on the voice stream of the user's utterance, combined with the content shown in Fig. 3, it can accurately determine that what user A means is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "one spoon of soy sauce, what dish", or another similar-sounding phrase.

Optionally, if the current response information is a response to the user's previous interactive instruction, the current response information is displayed in the user interface of the smart terminal, or the current response information comprises multiple pages and the content is on a page other than the first.

For example, suppose user A plans to book a film ticket and turns on a smart terminal (such as a smart TV) on which a client (such as a voice assistant app) is installed that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User A says "film ticket". Because the corresponding response information contains a lot of content, it is displayed in pages: what user A sees in the user interface of the smart TV is the first page, as shown in Fig. 4(a), while the other pages become visible in the user interface only after the user issues, for example, a page-turning instruction, as shown in Fig. 4(b). If the user wants to book a ticket for the film "old beast", the user says "the film for having old beast". The parsing device 12 then parses this natural language instruction according to the current response information of the smart TV that receives it, that is, the content shown in Fig. 4(a)-(b), and can accurately determine that what user A means is "the film for having old beast", even though the content shown in Fig. 4(b) is not displayed in the current user interface. If recognition were based purely on the voice stream of the user's utterance, the system could misrecognize it, for example as "the film for having old thin" or "the film for having veteran".
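The paging case can be sketched in Python as follows: the match is made against every page of the response information, not only the page currently on screen. The data layout (a list of pages, each a list of titles), the function name `find_mention`, and the placement of "advancing bravely" on the first page are assumptions for illustration.

```python
from typing import Dict, List, Optional, Union


def find_mention(text: str, pages: List[List[str]],
                 visible_page: int = 0) -> Optional[Dict[str, Union[str, int, bool]]]:
    """Locate the first response item mentioned in the text, on any page."""
    lowered = text.lower()
    for index, page in enumerate(pages):
        for item in page:
            if item.lower() in lowered:
                return {"item": item, "page": index,
                        "on_screen": index == visible_page}
    return None


# Hypothetical two-page response: page 0 is on screen, page 1 is not.
pages = [["advancing bravely"], ["old beast"]]
hit = find_mention("a film with old beast", pages)
```

Here `hit` reports that "old beast" was found on the second page, which is not on screen, so a candidate transcript containing it is still supported by the current response information.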

Those skilled in the art will understand that the current response information described above is merely an example; other existing or future forms of current response information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

Those skilled in the art will understand that the response modes of the current response information described above are merely examples; other existing or future response modes of the current response information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

Here, the present invention parses the natural language instruction according to the current response information of the smart terminal that receives the natural language instruction input by the user, so as to obtain the corresponding speech recognition result. This improves the efficiency and accuracy of speech recognition and also improves the user's voice interaction experience.

Optionally, the parsing device 12 may also parse the natural language instruction according to both the current response information of the smart terminal that receives the natural language instruction and the historical corpus feedback library corresponding to the user, so as to obtain the corresponding speech recognition result.

Here, the historical corpus feedback library covers everything the user has said in all dialogues during voice interaction with smart terminals (which may be the same smart terminal or several different smart terminals). Optionally, the corpus feedback library is updatable.

For example, again for user A's natural language instruction "the film for having old beast", suppose the current response information contains nothing related to the voice stream "laoshou". The parsing device 12 can then query user A's historical corpus feedback library with this voice stream. If it finds information corresponding to the voice stream, such as "play the South Korean film 'veteran'", the parsing device 12 can determine that what user A means is "the film for having veteran" rather than "the film for having old thin", "the film for having old beast", or another similar-sounding phrase.
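The fallback to the historical corpus feedback library can be sketched in Python as a two-level lookup keyed by pronunciation. The key "laoshou" is the voice stream named in the example; the dictionary layout, the function name `resolve_term`, and the rule that the current response information takes priority over the history are assumptions for illustration.

```python
from typing import Dict, Optional


def resolve_term(pronunciation: str,
                 context_terms: Dict[str, str],
                 history_terms: Dict[str, str]) -> Optional[str]:
    """Prefer a term from the current response; otherwise consult the history."""
    if pronunciation in context_terms:
        return context_terms[pronunciation]
    return history_terms.get(pronunciation)


# Hypothetical index built from what the user said in earlier dialogues.
history = {"laoshou": "veteran"}

# Nothing in the current response matches, so the history decides.
from_history = resolve_term("laoshou", {}, history)

# When the current response does contain a match, it wins.
from_context = resolve_term("laoshou", {"laoshou": "old beast"}, history)
```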

Here, the present invention parses the natural language instruction according to both the current response information of the smart terminal that receives the natural language instruction input by the user and the historical corpus feedback library corresponding to the user. This further improves the efficiency and accuracy of speech recognition and further improves the user's voice interaction experience.

If the current response information includes multiple response content items and each response content item corresponds to multiple pieces of subitem content, the parsing device 12 parses the natural language instruction according to the multiple response content items and the subitem content corresponding to each response content item, so as to obtain the corresponding speech recognition result.

For example, suppose the current response information is the recommended takeout restaurants shown in Fig. 2 (the takeout restaurants on the page are the multiple response content items included in the current response information, and the dishes of each restaurant constitute the corresponding subitem content), and user A says "one order of shredded pork and eggs with dried mushroom". If only the restaurant "Jin Baiwan" offers this dish, the parsing device 12 can determine, according to the subitem content corresponding to the response content item "Jin Baiwan" in the current response information shown in Fig. 2 (that is, the dishes offered by "Jin Baiwan"), that what user A means is "one order of shredded pork and eggs with dried mushroom" rather than "one order of clover meat" or another similar-sounding phrase.
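The use of subitem content can be sketched in Python as a lookup from response content items (restaurants) to their subitems (dishes): candidate transcripts are kept only if some displayed restaurant actually offers the dish. The mapping layout, the function name `match_subitems`, and the "beef bowl" entry are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple


def match_subitems(candidates: List[str],
                   response_items: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (item, subitem) pairs whose subitem equals a candidate transcript."""
    wanted = {candidate.lower() for candidate in candidates}
    return [(item, sub)
            for item, subs in response_items.items()
            for sub in subs
            if sub.lower() in wanted]


# Hypothetical menus behind the restaurants in the response information.
menus = {
    "Jin Baiwan": ["shredded pork and eggs with dried mushroom"],
    "Yoshinoya": ["beef bowl"],
}
# Two similar-sounding candidate dish names for the same voice stream.
candidates = ["clover meat", "shredded pork and eggs with dried mushroom"]
matches = match_subitems(candidates, menus)
```

Only the candidate that some displayed restaurant offers survives, even though the menus themselves are not on screen.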

Here, the present invention parses the natural language instruction according to the multiple response content items included in the current response information and the subitem content corresponding to each response content item. As a result, even though the subitem content corresponding to each response content item is not displayed in the current user interface, the present invention can still accurately recognize the user's natural language instruction, which further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.

Optionally, the parsing device 12 may also parse the natural language instruction according to both the current response information of the smart terminal that receives the natural language instruction and the content information associated with the current response information, so as to obtain the corresponding speech recognition result. Specifically, the parsing device 12 first obtains the content information associated with the current response information, for example by sending a query request for that content information to a server corresponding to the smart terminal and receiving the associated content information that the server returns in response to the query request. It then parses the natural language instruction according to the current response information and the associated content information, so as to obtain the corresponding speech recognition result.

Here, the content information associated with the current response information refers to the auxiliary information associated with the current response information, and the like. For example, if the current response information is a dish, the associated content information includes the restaurant that the dish belongs to and other content related to that restaurant, such as its name, cuisine, dishes, business hours, reviews, and location. If the current response information is a restaurant, the associated content information includes the dishes that the restaurant offers and other content related to the restaurant, such as its name, cuisine, dishes, business hours, reviews, and location. If the current response information is a film, the associated content information includes the film's genre, era, lead actors, and so on.

Here, the content information associated with the current response information may be displayed in the user interface of the smart terminal in response to the user performing an operation related to the UI content.

For example, for the response information shown in Fig. 3, suppose user A's utterance is recognized as "the 4th is double-screen flip" or "the 4th good a few taste Two bors d's oeuveres meals". The parsing device 12 parses this natural language instruction according to the response information shown in Fig. 3. Because that response information contains the restaurant "Yoshinoya" and its associated dish information, the parsing device 12 can determine that what user A means is "the 4th lucky taste Two bors d's oeuveres meal".

As another example, for the response information shown in Fig. 4, suppose user A's utterance is recognized as "starring Zhan Nifo Sean Connery". The parsing device 12 parses this natural language instruction according to the response information shown in Fig. 4. Because that response information contains the film "advancing bravely" and its associated lead-actor information, the parsing device 12 can determine that what user A means is "starring Janney Fo Kang Na Li".
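The error-correction effect of associated content information can be sketched in Python with the standard library's `difflib.get_close_matches`, which returns the entries most similar to a given string. This is a character-level stand-in chosen for illustration, not the patent's matching method, and the names below are illustrative English placeholders rather than data from the figures.

```python
import difflib
from typing import Iterable, Optional


def correct_with_context(recognized: str, associated: Iterable[str],
                         cutoff: float = 0.6) -> Optional[str]:
    """Snap a recognized phrase to the closest associated entry, if any is close."""
    by_lower = {entry.lower(): entry for entry in associated}
    matches = difflib.get_close_matches(recognized.lower(), list(by_lower),
                                        n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


# Hypothetical lead-actor information attached to a displayed film.
lead_actors = ["Jennifer Connelly", "Josh Brolin"]
corrected = correct_with_context("Jenifer Conelly", lead_actors)
```

The `cutoff` threshold is a design choice: a higher value avoids snapping unrelated phrases to the context, while a lower value tolerates heavier misrecognition.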

Here, the present invention parses the natural language instruction according to both the current response information of the smart terminal that receives the natural language instruction and the content information associated with the current response information. This not only eliminates ambiguity in the natural language instruction and helps correct speech errors, but also further improves the efficiency and accuracy of speech recognition and improves the user's voice interaction experience.

In one embodiment, the determining device 1 further includes a determination device (not shown) and a providing device (not shown). Specifically, the determination device determines the voice response information corresponding to the natural language instruction according to the speech recognition result; the providing device provides the voice response information to the user.

For example, for the natural language instruction "U ancient cooking vessel emits dish" input by user A, parsing apparatus 12 parses the instruction and obtains the corresponding speech recognition result. The determining apparatus then determines, according to the speech recognition result obtained by parsing apparatus 12, that the corresponding voice response information is, for example, "the recommended dishes of U ancient cooking vessel emits dish". The providing apparatus then provides this voice response information to user A, for example by playing the voice prompt "recommending the following dishes of U ancient cooking vessel emits dish for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart television.

In a further embodiment, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, determination device 1 further includes a screening apparatus (not shown). Specifically, the screening apparatus selects preferred voice response information from the multiple pieces of voice response information, and the providing apparatus provides the preferred voice response information to the user.

For example, assume that the current response message is the recommended take-away restaurants shown in Fig. 2 (the multiple take-away restaurants on the page are the multiple response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "take a shredded pork and eggs with dried mushroom". For this voice stream, suppose that besides the restaurant "Jin Baiwan", which has the dish "shredded pork and eggs with dried mushroom", the restaurant "25 piece half" also has a dish "clover meat". According to the sub-item contents corresponding to the response content item "Jin Baiwan" (i.e. the dishes of "Jin Baiwan") and the sub-item contents corresponding to the response content item "25 piece half" (i.e. the dishes of "25 piece half") in the current response message shown in Fig. 2, parsing apparatus 12 can determine that both "shredded pork and eggs with dried mushroom" and "clover meat" are speech recognition results of user A's voice stream "muxurou". Correspondingly, the determining apparatus determines, according to the speech recognition result, that the corresponding voice response information is, for example, "the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan and the dish 'clover meat' of 25 piece half". The screening apparatus then selects the preferred voice response information from the multiple pieces of voice response information, for example according to factors such as the user's preferences, natural attributes, and social attributes. Assuming that user A prefers the dish "shredded pork and eggs with dried mushroom", the screening apparatus can determine that the preferred voice response information is "the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan". The providing apparatus then provides the preferred voice response information to the user, for example by playing the voice prompt "recommending the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart television.

Optionally, the providing apparatus may also provide the multiple pieces of voice response information to the user directly, so that the user can choose among them.

Fig. 5 shows a flow chart of a method for determining a speech recognition result according to another aspect of the present invention.

The method comprises step S1 and step S2.

Specifically, in step S1, determination device 1 obtains the natural language instruction input by the user; in step S2, determination device 1 parses the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction, to obtain the corresponding speech recognition result.

Specifically, in step S1, determination device 1 obtains the natural language instruction input by the user through an application programming interface (API) provided by the intelligent terminal itself, or through an API provided by a third-party device such as a sound pickup device.

For example, assume that user A plans to order take-away and turns on an intelligent terminal (such as a smart television). The smart television carries, or has installed on it, a client (such as a voice assistant app) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User A says "take-away", and the corresponding response message is the recommended take-away restaurants shown in Fig. 2. If user A wants to order from the restaurant "U ancient cooking vessel emits dish", user A then says "U ancient cooking vessel emits dish". In step S1, determination device 1 obtains the natural language instruction "U ancient cooking vessel emits dish" input by user A through the application programming interface (API) provided by the smart television itself.

Then, in step S2, determination device 1 first obtains the current response message of the intelligent terminal that receives the natural language instruction. It may do so through, for example, an application programming interface (API) provided by the intelligent terminal; or by sending an inquiry request for the current response message to a server corresponding to the intelligent terminal (such as a speech recognition server) and receiving the current response message that the server returns based on the inquiry request; or by receiving the current response message actively uploaded by the intelligent terminal; or through a speech recognition service provider. Determination device 1 then parses the natural language instruction according to the current response message to obtain the corresponding speech recognition result. Here, the current response message may be the content shown in the current user interface of the intelligent terminal (i.e. the content visible to the user in the user interface of the intelligent terminal). It may also be content that is not fully displayed in the user interface because of factors such as the limited display screen size of the intelligent terminal or the front-end UI design, but that the user can bring into the user interface through operations such as turning pages, viewing more details or a detailed introduction, or viewing the attached information of items associated with the current response message (in this case, for example, the current response message includes multiple pages and the content is a non-first page among them). The response mode of the current response message includes at least any one of the following:

It should be noted here that the above response message may be produced under any response mode; the following is given only by way of illustration and should not be construed as limiting the present invention.

For example, suppose the current response message is a response to the user's previous interactive instruction. Here, the previous interactive instruction may be an interactive instruction issued by the user in one or more ways, such as a keyboard, touch pad, touch screen, remote control, voice interaction, or handwriting device. Then, in step S2, for the natural language instruction "U ancient cooking vessel emits dish" input by user A, determination device 1 parses the instruction according to the current response message of the smart television that receives it, i.e. the content shown in Fig. 2 as displayed by the smart television. That is, based on the voice stream of the user saying "U ancient cooking vessel emits dish", combined with the current response message (i.e. the content shown in Fig. 2) produced in response to user A's previous interactive instruction (the natural language instruction "take-away"), determination device 1 can accurately determine that what the user means is "U ancient cooking vessel emits dish" rather than "emitting dish a little" or other similar-sounding speech.
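The idea of preferring the hypothesis that agrees with the displayed content can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `pick_hypothesis`, the use of string similarity from Python's `difflib`, and the assumption that the recognizer yields several text hypotheses are all hypothetical. The restaurant names are the ones the text gives for Fig. 2.

```python
from difflib import SequenceMatcher


def pick_hypothesis(hypotheses, context_items):
    """Pick the recognition hypothesis that best matches any displayed item."""

    def best_similarity(hypothesis):
        # Highest similarity between this hypothesis and any context item.
        return max(
            (SequenceMatcher(None, hypothesis, item).ratio() for item in context_items),
            default=0.0,
        )

    return max(hypotheses, key=best_similarity)


# Items of the current response message (restaurants named for Fig. 2 in the text).
fig2_items = ["U ancient cooking vessel emits dish", "Jin Baiwan", "25 piece half"]
# Hypothetical candidate transcriptions of the same voice stream.
hypotheses = ["emitting dish a little", "U ancient cooking vessel emits dish"]
chosen = pick_hypothesis(hypotheses, fig2_items)
```

A real system would more plausibly compare phonetic representations or bias the language model; plain string similarity is used here only to keep the sketch self-contained.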

For another example, the current response message may not be content produced in response to the user's previous interactive instruction. For instance, the current response message may be the boot home page shown in the current user interface of the intelligent terminal, or it may be content shown in response to an interactive instruction from another user. Here, the boot home page may be a default page or the page that was displayed when the terminal was last shut down. Assume that user B, a family member of user A, plans to order take-away and turns on an intelligent terminal (such as a smart television). The smart television carries, or has installed on it, a client (such as a voice assistant app) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User B says "take-away", and the corresponding response message is the recommended take-away restaurants shown in Fig. 3. Suppose user B then becomes busy and asks user A to continue ordering. Because user A wants to order from the restaurant "one spoon will", user A says "what dishes does one spoon will have". Then, in step S2, determination device 1 parses the natural language instruction "what dishes does one spoon will have" according to the current response message of the smart television that receives it, i.e. the content shown in Fig. 3 as displayed by the smart television. That is, based on the voice stream of the user saying "what dishes does one spoon will have", combined with the content shown in Fig. 3, determination device 1 can accurately determine that what the user means is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "one spoon of soy sauce, what dishes", or other similar-sounding speech.

For example, for the response message shown in Fig. 3, assume that user A says "the 4th". After seeing the dishes of the 4th restaurant, "Yoshinoya", on the current user interface of the smart television, user A finds them unappealing. Because user A remembers that the response message shown in Fig. 3 also contained the restaurant "one spoon will", user A then says "what dishes does one spoon will have". Then, in step S2, determination device 1 parses the natural language instruction "what dishes does one spoon will have" according to the current response message of the smart television that receives it, i.e. the response contents corresponding to each interactive instruction among the multiple consecutive interactive instructions that user A executed before this natural language instruction, such as the content shown in Fig. 3 as displayed by the smart television. That is, based on the voice stream of the user saying "what dishes does one spoon will have", combined with the content shown in Fig. 3, determination device 1 can accurately determine that what the user means is "what dishes does one spoon will have" rather than "what dishes does one spoon of sauce have", "one spoon of soy sauce, what dishes", or other similar-sounding speech.
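One way to keep the response contents of several consecutive interactions available as context is a small bounded history, sketched below. The class name `ResponseHistory`, the `max_turns` limit, and the placeholder dish names are assumptions made for illustration; the text does not say how many earlier interactions are retained or how they are stored.

```python
from collections import deque


class ResponseHistory:
    """Keeps the response contents of the most recent interactive instructions."""

    def __init__(self, max_turns=3):
        # max_turns is a hypothetical limit; the oldest turn is dropped first.
        self._turns = deque(maxlen=max_turns)

    def record(self, response_items):
        self._turns.append(list(response_items))

    def context(self):
        """Return all remembered items, oldest first, without duplicates."""
        seen = []
        for turn in self._turns:
            for item in turn:
                if item not in seen:
                    seen.append(item)
        return seen


history = ResponseHistory(max_turns=3)
# Response to "take-away": the restaurant list of Fig. 3 (two names from the text).
history.record(["Yoshinoya", "one spoon will"])
# Response to "the 4th": dishes of Yoshinoya (placeholder names).
history.record(["Yoshinoya dish A", "Yoshinoya dish B"])
context = history.context()
```

With this context, "one spoon will" remains a candidate even though the screen currently shows only the dishes of "Yoshinoya".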

For example, assume that user A plans to order film tickets and turns on an intelligent terminal (such as a smart television). The smart television carries, or has installed on it, a client (such as a voice assistant app) that can recognize, parse, understand, process, and respond to the user's natural language instructions and output the response results. User A says "film ticket". Because the corresponding response message contains a large amount of content, it is displayed in pages: what user A sees in the user interface of the smart television is the first page, as shown in Fig. 4(a), while the other pages become visible in the user interface only after an operation such as a page-turning instruction, as shown in Fig. 4(b). If the user wants to order a ticket for the film "old beast" and says "is there the film old beast", then in step S2 determination device 1 parses this natural language instruction according to the current response message of the smart television that receives it, i.e. the content shown in Figs. 4(a)-(b), and can accurately determine that what the user means is "is there the film old beast", even though the content shown in Fig. 4(b) is not displayed on the current user interface. If recognition relied purely on the voice stream of the user saying "is there the film old beast", the system could misrecognize it, for example as "is there the film old thin" or "is there the film veteran".
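The paging case amounts to building the candidate set from every page of the response message rather than only the visible one. The sketch below assumes that each item carries a phonetic key alongside its title; the key `"laoshou"` comes from the text, while the function names and the other key are placeholders.

```python
def items_across_pages(pages):
    """Flatten all pages of the response message, visible or not."""
    return [item for page in pages for item in page]


def resolve_by_sound(phonetic, pages):
    """Return titles whose (assumed) phonetic key equals the voice stream's."""
    return [title for title, sound in items_across_pages(pages) if sound == phonetic]


pages = [
    [("advancing bravely", "placeholder-sound")],  # first page, as in Fig. 4(a)
    [("old beast", "laoshou")],                    # later page, as in Fig. 4(b)
]

all_pages_result = resolve_by_sound("laoshou", pages)
visible_only_result = resolve_by_sound("laoshou", pages[:1])
```

Restricting the lookup to the visible page finds nothing, which is the situation in which a recognizer would fall back on a generic and possibly wrong transcription.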

Optionally, in step S2, determination device 1 may also parse the natural language instruction according to both the current response message of the intelligent terminal that receives the natural language instruction and a history corpus feedback library corresponding to the user, to obtain the corresponding speech recognition result.

For example, again for user A's natural language instruction "is there the film old beast", assume that the current response message contains no information related to the voice stream "laoshou". Then, in step S2, determination device 1 can look up the voice stream in user A's history corpus feedback library and find information corresponding to the voice stream, such as "play the Korean film 'veteran'". Determination device 1 can then determine that what user A means is "is there the film veteran" rather than "is there the film old thin", "is there the film old beast", or other similar-sounding speech.
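The fallback order described here, current response message first and the user's history corpus feedback library second, can be sketched as a two-stage lookup. Modelling both sources as dictionaries keyed by a phonetic string is an assumption for illustration, as is the function name `resolve`.

```python
def resolve(phonetic, current_response, history_feedback):
    """Prefer the current response message; fall back to the user's history."""
    if phonetic in current_response:
        return current_response[phonetic], "current response"
    if phonetic in history_feedback:
        return history_feedback[phonetic], "history feedback"
    return None, "unresolved"


# The current response message has nothing for "laoshou" in this scenario.
current_response = {}
# Hypothetical entry derived from an earlier request to play the film "veteran".
history_feedback = {"laoshou": "veteran"}

from_history = resolve("laoshou", current_response, history_feedback)
from_screen = resolve("laoshou", {"laoshou": "old beast"}, history_feedback)
```

When the current response message does contain a match, as in the paged film list example, that match takes precedence over the history.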

If the current response message includes multiple response content items and each response content item corresponds to multiple sub-item contents, then in step S2 determination device 1 parses the natural language instruction according to the multiple response content items and the multiple sub-item contents corresponding to each response content item, to obtain the corresponding speech recognition result.

For example, assume that the current response message is the recommended take-away restaurants shown in Fig. 2 (the multiple take-away restaurants on the page are the multiple response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "take a shredded pork and eggs with dried mushroom". If only the restaurant "Jin Baiwan" has this dish, then in step S2 determination device 1 can determine, according to the sub-item contents corresponding to the response content item "Jin Baiwan" in the current response message shown in Fig. 2 (i.e. the dishes of "Jin Baiwan"), that what user A means is "take a shredded pork and eggs with dried mushroom" rather than "take a clover meat" or other similar-sounding speech.
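The two-level structure of response content items and sub-item contents maps naturally onto a nested lookup. In the sketch below, the dictionary layout, the phonetic keys attached to each dish, and the placeholder dish of "25 piece half" are assumptions; only the restaurant names, the dish name, and the voice stream "muxurou" come from the text.

```python
def find_dishes(phonetic, response_items):
    """Return (restaurant, dish) pairs whose phonetic key matches the voice stream."""
    matches = []
    for restaurant, dishes in response_items.items():
        for dish, sound in dishes:
            if sound == phonetic:
                matches.append((restaurant, dish))
    return matches


# Response content items (restaurants) with their sub-item contents (dishes).
fig2_response = {
    "Jin Baiwan": [("shredded pork and eggs with dried mushroom", "muxurou")],
    "25 piece half": [("placeholder dish", "placeholder-sound")],
}

matches = find_dishes("muxurou", fig2_response)
```

If a second restaurant also offered a dish with the same pronunciation, the function would return both pairs, which is the multi-result case handled by the screening step.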

Optionally, in step S2, determination device 1 may also parse the natural language instruction according to the current response message of the intelligent terminal that receives the natural language instruction and the content information associated with the current response message, to obtain the corresponding speech recognition result. Specifically, in step S2, determination device 1 first obtains the content information associated with the current response message, for example by sending an inquiry request for that content information to the server corresponding to the intelligent terminal and receiving the content information that the server returns based on the inquiry request. It then parses the natural language instruction according to the current response message and the content information associated with it, to obtain the corresponding speech recognition result.

For example, for the response message shown in Fig. 3, assume that user A says something that is recognized as "the 4th is double-screen flip" or "the 4th good a few taste Two bors d's oeuveres meals". Then, in step S2, determination device 1 parses this natural language instruction of user A according to the response message shown in Fig. 3. Because that response message contains the restaurant "Yoshinoya" and its attached dish information, determination device 1 can determine that what user A means is "the 4th lucky taste Two bors d's oeuveres meal".

For another example, for the response message shown in Fig. 4, assume that user A says something that is recognized as "what has Zhan Nifo Sean Connery acted in". Then, in step S2, determination device 1 parses this natural language instruction of user A according to the response message shown in Fig. 4. Because that response message contains the film "advancing bravely" and its attached leading-actor information, determination device 1 can determine that what user A means is "what has Jennifer Connelly acted in".
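Using the attached information of each item as additional vocabulary can be sketched with fuzzy matching from Python's standard `difflib`. Everything below is an assumption for illustration: the misspelling "Jenifer Conelly" stands in for the homophone error in the original Chinese, the actor name is an English stand-in for the leading-actor information of Fig. 4, and the function names and cutoff are hypothetical.

```python
from difflib import get_close_matches


def build_vocabulary(response_items):
    """Map every title and attached term to the item it belongs to."""
    vocabulary = {}
    for title, attached_terms in response_items.items():
        vocabulary[title] = title
        for term in attached_terms:
            vocabulary[term] = title
    return vocabulary


def correct_term(heard, response_items, cutoff=0.6):
    """Return (corrected term, owning item), or None if nothing is close enough."""
    vocabulary = build_vocabulary(response_items)
    close = get_close_matches(heard, list(vocabulary), n=1, cutoff=cutoff)
    if not close:
        return None
    return close[0], vocabulary[close[0]]


# Item of the response message with its attached leading-actor information.
fig4_response = {"advancing bravely": ["Jennifer Connelly"]}

corrected = correct_term("Jenifer Conelly", fig4_response)
```

Because the attached terms point back to their owning item, the corrected name also identifies which film the query is about.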

In one embodiment, the method further includes step S3 (not shown) and step S4 (not shown). Specifically, in step S3, determination device 1 determines, according to the speech recognition result, the voice response information corresponding to the natural language instruction; in step S4, determination device 1 provides the voice response information to the user.

For example, for the natural language instruction "U ancient cooking vessel emits dish" input by user A, determination device 1 parses the instruction in step S2 and obtains the corresponding speech recognition result. Then, in step S3, determination device 1 determines, according to the speech recognition result obtained in step S2, that the corresponding voice response information is, for example, "the recommended dishes of U ancient cooking vessel emits dish". Then, in step S4, determination device 1 provides this voice response information to user A, for example by playing the voice prompt "recommending the following dishes of U ancient cooking vessel emits dish for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart television.

In a further embodiment, if there are multiple pieces of voice response information and the voice response information belongs to the sub-item contents, the method further includes step S5 (not shown). Specifically, in step S5, determination device 1 selects preferred voice response information from the multiple pieces of voice response information; in step S4, determination device 1 then provides the preferred voice response information to the user.

For example, assume that the current response message is the recommended take-away restaurants shown in Fig. 2 (the multiple take-away restaurants on the page are the multiple response content items included in the current response message, and the dishes of each restaurant constitute the corresponding sub-item contents). User A then says "take a shredded pork and eggs with dried mushroom". For this voice stream, suppose that besides the restaurant "Jin Baiwan", which has the dish "shredded pork and eggs with dried mushroom", the restaurant "25 piece half" also has a dish "clover meat". In step S2, according to the sub-item contents corresponding to the response content item "Jin Baiwan" (i.e. the dishes of "Jin Baiwan") and the sub-item contents corresponding to the response content item "25 piece half" (i.e. the dishes of "25 piece half") in the current response message shown in Fig. 2, determination device 1 can determine that both "shredded pork and eggs with dried mushroom" and "clover meat" are speech recognition results of user A's voice stream "muxurou". Correspondingly, in step S3, determination device 1 determines, according to the speech recognition result, that the corresponding voice response information is, for example, "the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan and the dish 'clover meat' of 25 piece half". Then, in step S5, determination device 1 selects the preferred voice response information from the multiple pieces of voice response information, for example according to factors such as the user's preferences, natural attributes, and social attributes. Assuming that user A prefers the dish "shredded pork and eggs with dried mushroom", determination device 1 can determine in step S5 that the preferred voice response information is "the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan". Then, in step S4, determination device 1 provides the preferred voice response information to the user, for example by playing the voice prompt "recommending the dish 'shredded pork and eggs with dried mushroom' of Jin Baiwan for you" while displaying the related dish information (such as name, price, reviews, and dish pictures) on the interface of the smart television.

Optionally, in step S4, determination device 1 may also provide the multiple pieces of voice response information to the user directly, so that the user can choose among them.
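The screening of step S5, together with the option of handing all candidates to the user, can be sketched as a filter with a fallback. This is a simplification under stated assumptions: only a set of preferred dishes is considered, whereas the text also mentions natural and social attributes, and the function name and tuple layout are hypothetical.

```python
def screen_responses(candidates, preferred_dishes):
    """Keep candidates matching the user's preferences; otherwise return them all."""
    preferred = [c for c in candidates if c[1] in preferred_dishes]
    # With no preferred match, every candidate is returned so the user can choose.
    return preferred if preferred else candidates


# Candidate voice responses as (restaurant, dish) pairs for the voice stream "muxurou".
candidates = [
    ("Jin Baiwan", "shredded pork and eggs with dried mushroom"),
    ("25 piece half", "clover meat"),
]

screened = screen_responses(candidates, {"shredded pork and eggs with dried mushroom"})
unscreened = screen_responses(candidates, set())
```

Returning the full list when no preference applies corresponds to the optional behaviour in which the user selects among the multiple pieces of voice response information.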

Fig. 6 shows a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention. The computer system/server 2 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 6, computer system/server 2 takes the form of a general-purpose computing device. The components of computer system/server 2 may include, but are not limited to: one or more processors or processing units 21, a system memory 22, and a bus 23 connecting different system components (including system memory 22 and processing unit 21).

Bus 23 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer system/server 2 typically comprises a variety of computer system readable media. These media can be any available media that can be accessed by computer system/server 2, including volatile and non-volatile media and removable and non-removable media.

System memory 22 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 221 and/or cache memory 222. Computer system/server 2 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 223 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical medium) can also be provided. In these cases, each drive can be connected to bus 23 through one or more data media interfaces. System memory 22 may include at least one program product having a set of (for example, at least one) program modules that are configured to carry out the functions of the various embodiments of the present invention.

A program/utility 224 having a set of (at least one) program modules 225 may be stored in, for example, system memory 22. Such program modules 225 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 225 generally carry out the functions and/or methods of the embodiments described in the present invention.

Computer system/server 2 may also communicate with one or more external devices 25 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer system/server 2, and/or with any device (such as a network card, a modem, etc.) that enables computer system/server 2 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 26. Computer system/server 2 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown in Fig. 6, network adapter 20 communicates with the other modules of computer system/server 2 through bus 23. It should be understood that, although not shown in Fig. 6, other hardware and/or software modules may be used in conjunction with computer system/server 2, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processing unit 21 executes various functional applications and data processing by running programs stored in system memory 22, for example implementing the following method for determining a speech recognition result, wherein the method includes the following steps:

a. obtaining the natural language instruction input by the user;

It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to realize the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to execute each step or function.

In addition, part of the present invention can be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted through broadcast or a data stream in other signal-bearing media, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.

It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, the embodiments are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be included in the present invention. Any reference signs in the claims should not be construed as limiting the claims concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim can also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims

1. A method for determining a speech recognition result, wherein the method comprises the following steps:

obtaining a natural language instruction, parsing the natural language instruction, and determining a current response message by sending an inquiry request for the current response message to a server and receiving the current response message returned by the server based on the inquiry request, wherein the current response message includes multiple response content items and each response content item corresponds to multiple sub-item contents;

obtaining a voice stream input by a user, wherein the voice stream includes a voice stream for which speech recognition yields a speech recognition result that sounds similar to the voice stream but is wrong, and parsing the voice stream according to the current response message to obtain the speech recognition result corresponding to the voice stream;

wherein the natural language instruction is an interactive instruction preceding the voice stream, and the speech recognition result is the speech that the user intends to say rather than speech that merely sounds similar to what the user intends to say;

The current response message includes at least one of the following:

response contents corresponding to each interactive instruction among multiple consecutive interactive instructions executed before the natural language instruction is executed;

a response message corresponding to the previous interactive instruction of the user before the natural language instruction is obtained;

a response message corresponding to the previous interactive instruction of another user before the natural language instruction is obtained.

2. according to the method described in claim 1, wherein, the current response message is shown in user interface, alternatively, described work as Preceding response message includes multiple pagings and the response message is the non-homepage in multiple pagings.

3. The method according to any one of claims 1 to 2, wherein the method further comprises the steps of:

determining, according to the speech recognition result, voice response information corresponding to the voice stream;

providing the voice response information to the user.

4. The method according to claim 3, wherein, if there are a plurality of pieces of voice response information and the voice response information belongs to the sub-item contents, the method further comprises the step of:

screening out preferred voice response information from the plurality of pieces of voice response information;

wherein providing the voice response information to the user comprises:

providing the preferred voice response information to the user.
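
The screening step of claim 4 might look like the following sketch. The claim does not state the screening criterion, so the example assumes, purely for illustration, that each candidate response carries a distance to the user and that the nearest one is preferred. The `VoiceResponse` type, its `distance_km` field, and the sample branch names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceResponse:
    text: str
    distance_km: float  # hypothetical ranking signal, not specified by the claim


def screen_preferred(responses: list[VoiceResponse]) -> Optional[VoiceResponse]:
    """Return the preferred response among several candidates, or None if there are none."""
    if not responses:
        return None
    # Assumed criterion: the candidate closest to the user is preferred.
    return min(responses, key=lambda r: r.distance_km)


# Several sub-item contents match the recognized speech; only one is provided to the user.
candidates = [
    VoiceResponse("Haidian Branch", 3.2),
    VoiceResponse("Zhongguancun Branch", 1.1),
]
preferred = screen_preferred(candidates)
```

Any other ordering, such as user history or rating, could replace the distance key without changing the shape of the step.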

5. A method for determining a speech recognition result, wherein the method comprises the following steps:

obtaining a voice stream, wherein the voice stream is one for which speech recognition yields a speech recognition result that is phonetically similar to the voice stream but incorrect, and parsing the voice stream according to the current response information and content information associated with the current response information, to obtain the speech recognition result corresponding to the voice stream;

the current response information comprises at least one of the following:

6. The method according to claim 5, wherein the current response information is displayed in a user interface, or the current response information comprises a plurality of pages and the response information is a page other than the first page among the plurality of pages.

7. The method according to any one of claims 5 to 6, wherein the method further comprises the steps of:

providing the voice response information to the user.

8. The method according to claim 7, wherein, if there are a plurality of pieces of voice response information and the voice response information belongs to the sub-item contents, the method further comprises the steps of:

providing the preferred voice response information to the user.

9. A determination device for determining a speech recognition result, wherein the determination device comprises:

an obtaining device, configured to obtain a natural language instruction and a voice stream input by a user, wherein the natural language instruction is an interactive instruction preceding the voice stream, and the voice stream is one for which speech recognition yields a speech recognition result that is phonetically similar to the voice stream but incorrect;

a parsing device, configured to parse the natural language instruction and send a query request for current response information to a server corresponding to the determination device;

a determining device, configured to receive the current response information that the server returns based on the query request and thereby determine the current response information, wherein the current response information comprises a plurality of response content items and each response content item corresponds to a plurality of sub-item contents;

the current response information comprises at least one of the following:

response information corresponding to an interactive instruction of other users preceding the obtaining of the natural language instruction;

wherein the parsing device is further configured to parse the voice stream according to the current response information to obtain the speech recognition result corresponding to the voice stream, the speech recognition result being the speech the user intended rather than speech that is merely similar to what the user intended.

10. A determination device for determining a speech recognition result, wherein the determination device comprises:

a parsing device, configured to parse the natural language instruction and send a query request for current response information to a server;

the current response information comprises at least one of the following:

wherein the parsing device is further configured to parse the voice stream according to the current response information and content information associated with the current response information, to obtain the speech recognition result corresponding to the voice stream, the speech recognition result being the speech the user intended rather than speech that is merely similar to what the user intended.

11. The determination device according to claim 9 or 10, wherein the current response information is displayed in a user interface, or the current response information comprises a plurality of pages and is a page other than the first page among the plurality of pages.

12. The determination device according to claim 9 or 10, wherein

the determining device is further configured to determine, according to the speech recognition result, voice response information corresponding to the voice stream;

and the determination device further comprises a providing device, configured to provide the voice response information to the user.

13. The determination device according to claim 12, wherein, if there are a plurality of pieces of voice response information and the voice response information belongs to the sub-item contents, the determination device further comprises:

a screening device, configured to screen out preferred voice response information from the plurality of pieces of voice response information;

wherein the providing device is configured to:

provide the preferred voice response information to the user.

14. A computing device, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 8.

15. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 8.