CN105869640A - Method and device for recognizing voice control instruction for entity in current page

Info

Publication number: CN105869640A
Application number: CN201510031182.3A
Authority: CN
Inventors: 雷欣
Original assignee: Shanghai Ink Hundred Meaning Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2015-01-21
Filing date: 2015-01-21
Publication date: 2016-08-17
Anticipated expiration: 2035-01-21
Also published as: CN105869640B

Abstract

The invention provides a method and a device for recognizing a voice control instruction for an entity in a current page. The method comprises: extracting an entity from the current page; constructing a candidate instruction set based on the extracted entity and a corresponding construction template; and recognizing, based on matching between the user's speech and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the user's speech. The flexibility of voice instruction recognition is thereby enhanced.

Description

Method and device for recognizing a voice control instruction for an entity in a current page

Technical field

The present invention relates to speech recognition technology, and in particular to a method and device for recognizing a voice control instruction for an entity in a current page.

Background technology

In the prior art, voice instruction recognition can generally only determine whether a user's speech is a voice instruction by checking whether it matches a voice instruction in a fixed voice instruction set. For example, suppose the fixed voice instruction set contains the instruction "I want to buy a train ticket to Beijing". Only when the content of the user's speech is identical to this voice instruction is the user considered to have issued it, and only then is the associated operation performed. If the user instead says "I want to buy a train ticket, going to Beijing", reversing the order of the clauses, the user is not considered to have issued a voice instruction and the associated operation is not performed. Voice instruction recognition is therefore inflexible.

Summary of the invention

One of the technical problems solved by the present invention is improving the flexibility of voice instruction recognition.

According to an embodiment of one aspect of the present invention, a method for recognizing a voice control instruction for an entity in a current page is provided, comprising: extracting an entity from the current page; constructing a candidate instruction set based on the extracted entity and a corresponding construction template; and recognizing, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user.

Optionally, the step of extracting an entity from the current page comprises: segmenting the text in the current page into words; determining the part of speech of each segmented word; inputting each character of a segmented word having a specific part of speech into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity, the classifier having been trained in advance on a set of entity-character and non-entity-character samples; and determining, according to the classifier's results for each character of the segmented word having the specific part of speech, whether that word is an entity.

Optionally, the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of the current user, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

Optionally, the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of all users, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

Optionally, the step of constructing the candidate instruction set based on the extracted entity and the corresponding construction template comprises: obtaining synonyms of the extracted entity; and applying the extracted entity and each obtained synonym to the construction template corresponding to the extracted entity, obtaining the respective candidate instructions, and placing them in the candidate instruction set.

Optionally, the step of recognizing the voice control instruction for the entity in the current page from the speech uttered by the user, based on matching between that speech and the candidate instructions in the candidate instruction set, comprises: in response to the speech uttered by the user matching a candidate instruction in the candidate instruction set, identifying the extracted entity corresponding to that candidate instruction, thereby recognizing, from the speech uttered by the user, the voice control instruction for that extracted entity in the current page.

According to an embodiment of one aspect of the present invention, a device for recognizing a voice control instruction for an entity in a current page is provided, comprising: an extraction unit configured to extract an entity from the current page; a construction unit configured to construct a candidate instruction set based on the extracted entity and a corresponding construction template; and a recognition unit configured to recognize, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user.

Optionally, the extraction unit is configured to: segment the text in the current page into words; determine the part of speech of each segmented word; input each character of a segmented word having a specific part of speech into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity, the classifier having been trained in advance on a set of entity-character and non-entity-character samples; and determine, according to the classifier's results for each character of the segmented word having the specific part of speech, whether that word is an entity.

Optionally, the construction unit is configured to: obtain synonyms of the extracted entity; and apply the extracted entity and each obtained synonym to the construction template corresponding to the extracted entity, obtaining the respective candidate instructions and placing them in the candidate instruction set.

Optionally, the recognition unit is configured to, in response to the speech uttered by the user matching a candidate instruction in the candidate instruction set, identify the extracted entity corresponding to that candidate instruction, thereby recognizing, from the speech uttered by the user, the voice control instruction for that extracted entity in the current page.

Because the candidate instruction set of the embodiments of the present invention is not fixed, but is constructed in real time for each current page from the entities present on that page combined with the corresponding construction templates, the user can issue instructions very flexibly. The embodiments can recognize the various flexible instructions a user issues, avoiding the rigid prior-art pattern in which the user can only issue voice instructions from a fixed voice instruction set.

Those of ordinary skill in the art will appreciate that, although the following detailed description refers to illustrated embodiments and the accompanying drawings, the present invention is not limited to these embodiments. Rather, the scope of the present invention is broad and is intended to be limited only by the appended claims.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 is a flowchart of a method for recognizing a voice control instruction for an entity in a current page according to an embodiment of the present invention;

Fig. 2 is a detailed flowchart of the process of extracting an entity from the current page in the method according to an embodiment of the present invention;

Fig. 3 is a detailed flowchart of the process of constructing the candidate instruction set based on the extracted entity and the corresponding construction template in the method according to an embodiment of the present invention;

Fig. 4 is a block diagram of a device for recognizing a voice control instruction for an entity in a current page according to an embodiment of the present invention.

In the drawings, identical or similar reference numerals denote identical or similar parts.

Detailed description of the invention

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a method 1 for recognizing a voice control instruction for an entity in a current page according to an embodiment of the present invention. The method can be used in in-vehicle devices, mobile terminals, fixed devices (such as desktop computers), and the like. The current page is the page currently displayed on such a device. It may be a page displayed without any action by the current user (the user operating the in-vehicle device, mobile terminal, fixed device, etc.), or a page displayed in response to an action of the current user. An entity is a piece of text or a serial number displayed on the page that represents an object the user may want to act on. For example, food items such as "Spicy Temptation", "Spicy Hot Pot", and "Specialty Grilled Fish" are displayed on the desktop. Each of these food items is regarded as an entity, and the serial numbers displayed next to them (such as 1, 2, 3) are also regarded as entities, because the voice instruction the user issues next (such as "I want to eat Spicy Temptation" or "I choose 3") is likely to be directed at them.

An application scenario in which the current page is displayed without any action by the current user is, for example, an application on an in-vehicle device that opens by default on the device's desktop when the device is switched on. The desktop displays "Navigation", "Food", "Shopping", and so on. After the current user says "I want to go shopping", method 1 of an embodiment of the present invention recognizes the speech as a voice control instruction for the entity "Shopping" in the current page and performs a further action, for example displaying shopping malls near the current user. Such an application may of course also run on a mobile terminal or fixed device, so that options appear on the display desktop by default when the terminal or device is switched on.

An application scenario in which the current page is displayed in response to an action of the current user is, for example, an in-vehicle application that the current user first activates and then addresses with a request such as "Please show me nearby restaurants". The display screen of the in-vehicle device then shows "Spicy Temptation", "Boiled Fish Township", "Quanjude", and so on. The current user says "I want to go to Quanjude", and method 1 of an embodiment of the present invention recognizes this as a voice control instruction for the entity "Quanjude" in the current page and performs a further action, such as placing a call to Quanjude or displaying the route to Quanjude. Such an application may also run on a mobile terminal or fixed device: after the current user's preceding operations have caused options to appear on the display screen, method 1 can be used to recognize whether the speech the current user utters next is a voice control instruction for an entity in the current page and, if so, which entity it is for.

In step 110, an entity is extracted from the current page.

In one case, analysis of the composition of the current page shows that the page mainly consists of several boxes, each containing a single word (whether the text in a box is one word, or a phrase or sentence made up of several words, can be determined with existing word segmentation techniques). The word in each box is then regarded as an entity.

In another case, analysis of the composition of the current page shows that the page mainly consists of several boxes each containing a phrase or sentence, or that the page is an article, or that it is a structurally complex page containing various text and boxes. Entities then need to be extracted by a method such as that of Fig. 2.

In sub-step 1101, the text in the current page is segmented into words.

Usually, all text recognized in the current page is segmented. For example, when the current page mainly consists of several boxes each containing a phrase or sentence, each phrase or sentence is segmented. When the current page is an article, the article is segmented. Segmentation can be carried out with existing word segmentation methods.

In sub-step 1102, the part of speech of each segmented word is determined.

Mature techniques for semantic analysis already exist, and prior-art part-of-speech determination methods can be used to determine the part of speech of each segmented word. In general, only content words such as nouns, verbs, and adjectives, together with serial-number words, are likely to be entities. Function words are unlikely to be entities.

In sub-step 1103, each character of a segmented word having a specific part of speech is input into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity. The classifier has been trained in advance on a set of entity-character and non-entity-character samples.

Words having a specific part of speech are, for example, content words and serial-number words. In some cases they may be restricted to nouns and serial-number words.

Mature machine learning techniques already exist. A model, i.e. the classifier, can be trained on a set consisting of a large number of entity-word and non-entity-word samples. Specifically, each character of these entity-word and non-entity-word samples is input to the classifier together with labels stating whether the character comes from an entity word or a non-entity word and, for an entity word, whether it forms the beginning, middle, or end of that word. The classifier learns by itself what regularities distinguish characters from entity words, characters from non-entity words, and characters at the beginning, middle, and end of entity words. When a new character is then input, the classifier can determine whether it is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity.

In sub-step 1104, whether the segmented word having the specific part of speech is an entity is determined according to the classifier's results for each character of that word.

For example, for "Boiled Fish Township", the classifier considers its four characters (glossed here as "boiling", "rising", "fish", and "township"). It determines that "boiling" frequently serves as the beginning of an entity, "rising" frequently serves as the middle or end of an entity, "fish" frequently serves as the middle of an entity, and "township" frequently serves as the end of an entity. "Boiled Fish Township" is therefore judged to be an entity.
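Sub-steps 1103 and 1104 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the counting-based "classifier" merely stands in for whatever trained model an implementation would use, English glosses stand in for individual characters, and the decision rule (first character may begin an entity, last may end one, inner characters may sit in the middle) is one plausible reading of sub-step 1104, not a rule stated in the description. Single-character words are left out of scope.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def train(entity_samples: List[List[str]],
          non_entity_samples: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each character occurs at the beginning (B), middle (M)
    or end (E) of an entity word, or inside a non-entity word (O)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for chars in entity_samples:
        for i, ch in enumerate(chars):
            if i == 0:
                tag = "B"
            elif i == len(chars) - 1:
                tag = "E"
            else:
                tag = "M"
            counts[ch][tag] += 1
    for chars in non_entity_samples:
        for ch in chars:
            counts[ch]["O"] += 1
    return counts


def classify(counts: Dict[str, Counter], ch: str) -> Set[str]:
    """Return the entity positions a character tends to occupy.
    An empty set means the character is not treated as an entity character."""
    seen = counts.get(ch)
    if not seen:
        return set()
    return {tag for tag in "BME" if seen[tag] > seen["O"]}


def is_entity(counts: Dict[str, Counter], chars: List[str]) -> bool:
    """Combine the per-character results into a decision for the whole word."""
    if len(chars) < 2:
        return False  # single-character words are outside this sketch
    return ("B" in classify(counts, chars[0])
            and "E" in classify(counts, chars[-1])
            and all("M" in classify(counts, ch) for ch in chars[1:-1]))


# Hypothetical training samples; glosses stand in for characters.
model = train(
    entity_samples=[["boiling", "rising", "fish", "township"],
                    ["spicy", "temptation"]],
    non_entity_samples=[["I", "want", "to", "go"]],
)
print(is_entity(model, ["boiling", "rising", "fish", "township"]))
print(is_entity(model, ["want", "township"]))
```

A production system would more likely use a sequence-labelling model than raw counts; the sketch only shows how per-character position judgments can be combined into a word-level decision.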

In step 120, a candidate instruction set is constructed based on the extracted entity and the corresponding construction template.

A candidate instruction is an instruction that the user may issue for an entity in the current page. A construction template is a language pattern that the user may use when issuing an instruction for an entity in the current page. For example, for "2. Boiled Fish Township" on the current page, the user may issue the instructions "I want to go to Boiled Fish Township", "go to Boiled Fish Township", "select Boiled Fish Township", "Boiled Fish Township", "2", "select 2", and so on. These are candidate instructions, and "I want to go to xx", "go to xx", "select xx", "xx", "No.", "select No.", and so on are construction templates.
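The relationship between construction templates and candidate instructions can be sketched in Python as simple slot filling. The slot markers "xx" and "No." are taken from the example above; the function name, the split into name templates and serial-number templates, and the use of plain string replacement are assumptions made for illustration.

```python
from typing import List, Optional

# Templates from the example above: "xx" is the entity-name slot,
# "No." is the serial-number slot.
NAME_TEMPLATES = ["I want to go to xx", "go to xx", "select xx", "xx"]
NUMBER_TEMPLATES = ["No.", "select No."]


def build_candidates(name: str, number: Optional[str] = None) -> List[str]:
    """Fill each template slot with the entity name or its serial number."""
    candidates = [t.replace("xx", name) for t in NAME_TEMPLATES]
    if number is not None:
        candidates += [t.replace("No.", number) for t in NUMBER_TEMPLATES]
    return candidates


for instruction in build_candidates("Boiled Fish Township", "2"):
    print(instruction)
```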

One way of forming construction templates is for a person to define various construction templates for various entities in advance and store them in a database.

Another way of forming construction templates is as follows: an entity is extracted from each voice control command in the set of historical voice control commands of all users, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

The language pattern around an entity is the linguistic form of the independent clause, containing a predicate, in which the entity appears. For example, the entity "Tiananmen" is extracted from "I want to go to Tiananmen, how much at most?". The independent clause with a predicate that contains it is "I want to go to Tiananmen", and its language pattern is "I want to go to xx".

For example, for the users of a given application, an entity can be extracted from every voice control command that any user of the application has issued while using it, and the language pattern around the extracted entity can be extracted as a construction template corresponding to the extracted entity.

The benefit of forming construction templates in this way is that the templates are collected from users' actual practice rather than devised in advance by a person without any basis. This makes the construction templates more objective and thereby improves the precision of recognizing voice control instructions for entities in the current page.
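Extracting the language pattern around an entity from a historical command can be sketched as follows. This is a simplified illustration under stated assumptions: the clause containing the entity is found by splitting on punctuation, which only approximates "the independent clause with a predicate", and the entity is assumed to have been extracted already. The function name and the "xx" slot default are hypothetical.

```python
import re
from typing import Optional


def extract_template(command: str, entity: str, slot: str = "xx") -> Optional[str]:
    """Return the clause containing the entity, with the entity replaced by a slot."""
    # Split on common Western and Chinese clause punctuation (an approximation).
    for clause in re.split(r"[,.;?!，。；？！]", command):
        clause = clause.strip()
        if entity in clause:
            return clause.replace(entity, slot)
    return None  # the entity does not occur in this command


print(extract_template("I want to go to Tiananmen, how much at most?", "Tiananmen"))
```

Running the same function over every command in a history set and collecting the distinct results would yield the template set for an entity.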

A further way of forming construction templates is as follows: an entity is extracted from each voice control command in the set of historical voice control commands of the current user, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

For example, for the current user of a given application, an entity can be extracted from every voice control command that the current user has issued while using the application, and the language pattern around the extracted entity can be extracted as a construction template corresponding to the extracted entity.

The benefit of forming construction templates in this way is that, because the templates are extracted from the current user's historical voice control commands, they reflect the characteristics of the current user's own language. For example, when the current user has seen "Boiled Fish Township" on a page in the past, the user has often said "want to go eat at Boiled Fish Township" rather than "I want to go to Boiled Fish Township". For this user, "want to go eat at xx" is then likely to be the more common construction template. Forming construction templates in this way therefore adapts to the user's individual habits and improves the precision of recognizing voice control instructions for entities in the current page.

One way of constructing the candidate instruction set based on the extracted entity and the corresponding construction template is to apply the extracted entity directly to its corresponding construction templates and place the resulting candidate instructions in the candidate instruction set.

For example, the extracted entity is "Boiled Fish Township" and its corresponding construction templates are "I want to go to xx", "go to xx", "xx", and "select xx". Applying the extracted entity to these templates yields the candidate instructions "I want to go to Boiled Fish Township", "go to Boiled Fish Township", "Boiled Fish Township", and "select Boiled Fish Township", which are placed in the candidate instruction set. In one implementation they can be stored in the candidate instruction set in correspondence with "Boiled Fish Township".

As shown in Fig. 3, in another embodiment, step 120 of constructing the candidate instruction set based on the extracted entity and the corresponding construction template comprises sub-step 1201 and sub-step 1202.

In sub-step 1201, synonyms of the extracted entity are obtained.

A synonym database is built in advance. For example, experts find synonyms one by one for the entities extracted, as described above, from each voice control command in the set of historical voice control commands of all users or of the current user, and place them in the synonym database. Alternatively, experts classify all the words in a dictionary and group words of similar meaning into synonym sets; all the synonym sets together constitute the synonym database. The synonym database can also be built in other ways.

Once the synonym database has been built, synonyms of an extracted entity can be obtained by looking the entity up in the synonym database.

In sub-step 1202, the extracted entity and each obtained synonym are applied to the construction template corresponding to the extracted entity, and the resulting candidate instructions are placed in the candidate instruction set.

For example, the extracted entity is "Peking University", the synonym obtained is "Beijing University", and the corresponding construction templates are "navigate to xx", "go to xx", "head to xx", "I want to go to xx", and "call xx". The candidate instructions finally obtained are then:

- Navigate to Peking University

- Go to Peking University

- I want to go to Peking University

- Call Peking University

- Navigate to Beijing University

- Go to Beijing University

- I want to go to Beijing University

- Call Beijing University
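Sub-steps 1201 and 1202 can be sketched in Python as a synonym lookup followed by template filling. The dictionary-based synonym database, the function name, and the choice to store each candidate in correspondence with the original entity are assumptions made for illustration (the last follows the storage option mentioned earlier in the description). The sketch uses only the four templates that appear in the candidate list above.

```python
from typing import Dict, List

# Hypothetical synonym database: entity -> list of synonyms.
SYNONYM_DB: Dict[str, List[str]] = {
    "Peking University": ["Beijing University"],
}

TEMPLATES = ["navigate to xx", "go to xx", "I want to go to xx", "call xx"]


def build_candidate_set(entity: str,
                        templates: List[str],
                        synonym_db: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every candidate instruction to the extracted entity it was built from."""
    candidates: Dict[str, str] = {}
    # Sub-step 1201: the entity itself plus its synonyms.
    surface_forms = [entity] + synonym_db.get(entity, [])
    # Sub-step 1202: apply every surface form to every template.
    for form in surface_forms:
        for template in templates:
            candidates[template.replace("xx", form)] = entity
    return candidates


candidate_set = build_candidate_set("Peking University", TEMPLATES, SYNONYM_DB)
for instruction, entity in candidate_set.items():
    print(f"{instruction!r} -> {entity!r}")
```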

In step 130, the voice control instruction for an entity in the current page is recognized from the speech uttered by the user, based on matching between that speech and the candidate instructions in the candidate instruction set.

For example, pauses in the current user's speech are identified, and the speech between two pauses is treated as the speech of one short clause. The speech of this clause is converted into text using a speech recognition method known in the art, and the text is compared one by one with the candidate instructions in the candidate instruction set constructed in step 120. When the recognized text of the clause is identical to, or contains, a candidate instruction in that set, the speech uttered by the user is considered to match that candidate instruction, and the matched candidate instruction is the voice control instruction for an entity in the current page.

It can then be further determined which entity of the current page the recognized voice control instruction is for. As described above, when the candidate instructions obtained from an extracted entity are placed in the candidate instruction set, they can be stored in correspondence with that entity. Therefore, in response to the speech uttered by the user matching a candidate instruction in the set, the extracted entity corresponding to that candidate instruction can be identified, which determines which entity of the current page the recognized voice control instruction is for.
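The matching of step 130 and the entity lookup that follows can be sketched in Python. The sketch assumes that pause detection and speech-to-text have already produced one text string per clause, and that the candidate set is stored as a mapping from candidate instruction to entity. Preferring the longest matching candidate when several are contained in the same clause is an added assumption, not something the description specifies.

```python
from typing import Dict, List, Optional, Tuple


def recognize(clauses: List[str],
              candidates: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (matched candidate instruction, entity) for the first clause
    that equals or contains a candidate instruction, or None if none does."""
    for text in clauses:
        text = text.strip()
        # Substring containment also covers the "identical" case.
        hits = [c for c in candidates if c in text]
        if hits:
            best = max(hits, key=len)  # assumption: the longest match wins
            return best, candidates[best]
    return None


# Hypothetical candidate set for a page listing two restaurants.
page_candidates = {
    "I want to go to Quanjude": "Quanjude",
    "Quanjude": "Quanjude",
    "I want to go to Spicy Temptation": "Spicy Temptation",
    "Spicy Temptation": "Spicy Temptation",
}
print(recognize(["well", "I want to go to Quanjude please"], page_candidates))
```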

After the voice control instruction for an entity in the current page has been recognized from the user's speech, the instruction can be executed. For example, the execution program code corresponding to each candidate instruction in the candidate instruction set is stored in another database. Once a candidate instruction in the set has been found (i.e. the voice control instruction has been recognized), executing the corresponding program code in that other database executes the voice control instruction.
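The execution step can be sketched in Python with a lookup table standing in for the "other database" of execution program code. The table contents, the two action functions, and the list used to record what was performed are all hypothetical and exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List

performed: List[str] = []  # records actions so the effect is observable


def show_route(destination: str) -> None:
    performed.append(f"route to {destination}")


def place_call(destination: str) -> None:
    performed.append(f"call {destination}")


# Hypothetical stand-in for the database mapping each candidate
# instruction to its execution program code.
EXECUTION_TABLE: Dict[str, Callable[[], None]] = {
    "I want to go to Quanjude": lambda: show_route("Quanjude"),
    "call Quanjude": lambda: place_call("Quanjude"),
}


def execute(candidate: str) -> bool:
    """Run the code registered for a recognized candidate instruction.
    Returns False when nothing is registered for it."""
    action = EXECUTION_TABLE.get(candidate)
    if action is None:
        return False
    action()
    return True


execute("I want to go to Quanjude")
print(performed)
```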

As shown in Fig. 4, a device 2 for recognizing a voice control instruction for an entity in a current page according to an embodiment of the present invention comprises: an extraction unit 210 configured to extract an entity from the current page; a construction unit 220 configured to construct a candidate instruction set based on the extracted entity and a corresponding construction template; and a recognition unit 230 configured to recognize, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user. Each of these units can be implemented in software, in hardware (an FPGA, an integrated circuit, etc.), or in a combination of software and hardware.

Optionally, the extraction unit 210 is configured to: segment the text in the current page into words; determine the part of speech of each segmented word; input each character of a segmented word having a specific part of speech into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity, the classifier having been trained in advance on a set of entity-character and non-entity-character samples; and determine, according to the classifier's results for each character of the segmented word having the specific part of speech, whether that word is an entity.

Optionally, the recognition unit 230 is configured to, in response to the speech uttered by the user matching a candidate instruction in the candidate instruction set, identify the extracted entity corresponding to that candidate instruction, thereby recognizing, from the speech uttered by the user, the voice control instruction for that extracted entity in the current page.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above and that it can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive, the scope of the invention being defined by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims

1. A method (1) for recognizing a voice control instruction for an entity in a current page, comprising:

extracting an entity from the current page (110);

constructing a candidate instruction set based on the extracted entity and a corresponding construction template (120);

recognizing, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user (130).

2. The method according to claim 1, wherein the step of extracting an entity from the current page (110) comprises:

segmenting the text in the current page into words (1101);

determining the part of speech of each segmented word (1102);

inputting each character of a segmented word having a specific part of speech into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity (1103), the classifier having been trained in advance on a set of entity-character and non-entity-character samples;

determining, according to the classifier's results for each character of the segmented word having the specific part of speech, whether that word is an entity (1104).

3. The method according to claim 1, wherein the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of the current user, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

4. The method according to claim 1, wherein the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of all users, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

5. The method according to claim 1, wherein the step of constructing the candidate instruction set based on the extracted entity and the corresponding construction template (120) comprises:

obtaining synonyms of the extracted entity (1201);

applying the extracted entity and each obtained synonym to the construction template corresponding to the extracted entity, obtaining the respective candidate instructions, and placing them in the candidate instruction set (1202).

6. The method according to claim 1, wherein the step of recognizing, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user (130) comprises:

in response to the speech uttered by the user matching a candidate instruction in the candidate instruction set, identifying the extracted entity corresponding to that candidate instruction, thereby recognizing, from the speech uttered by the user, the voice control instruction for that extracted entity in the current page.

7. A device (2) for recognizing a voice control instruction for an entity in a current page, comprising:

an extraction unit (210) configured to extract an entity from the current page;

a construction unit (220) configured to construct a candidate instruction set based on the extracted entity and a corresponding construction template;

a recognition unit (230) configured to recognize, based on matching between the speech uttered by the user and the candidate instructions in the candidate instruction set, the voice control instruction for the entity in the current page from the speech uttered by the user.

8. The device according to claim 7, wherein the extraction unit (210) is configured to:

segment the text in the current page into words;

determine the part of speech of each segmented word;

input each character of a segmented word having a specific part of speech into a classifier, to determine whether the character is a character constituting an entity and whether it constitutes the beginning, middle, or end of an entity, the classifier having been trained in advance on a set of entity-character and non-entity-character samples;

determine, according to the classifier's results for each character of the segmented word having the specific part of speech, whether that word is an entity.

9. The device according to claim 7, wherein the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of the current user, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.

10. The device according to claim 7, wherein the construction template is formed in advance as follows: an entity is extracted from each voice control command in the set of historical voice control commands of all users, and the language pattern around the extracted entity is extracted as a construction template corresponding to the extracted entity.