CN105869640B - Method and device for recognizing voice control instruction aiming at entity in current page - Google Patents

Method and device for recognizing voice control instruction aiming at entity in current page

Info

Publication number
CN105869640B
Authority
CN
China
Prior art keywords
entity
extracted
candidate
word
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510031182.3A
Other languages
Chinese (zh)
Other versions
CN105869640A (en)
Inventor
雷欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen China Investment Co Ltd
Mobvoi Innovation Technology Co Ltd
Original Assignee
Shanghai Mobvoi Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mobvoi Information Technology Co., Ltd.
Priority to CN201510031182.3A
Publication of CN105869640A
Application granted
Publication of CN105869640B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method and a device for recognizing a voice control instruction for an entity in a current page. The method comprises: extracting entities from the current page; constructing a candidate instruction set based on the extracted entities and corresponding construction templates; and identifying a voice control instruction for an entity in the current page from speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set. The invention improves the flexibility of voice instruction recognition.

Description

Method and device for recognizing voice control instruction aiming at entity in current page
Technical Field
The present invention relates to voice recognition technology, and in particular to a method and an apparatus for recognizing a voice control instruction for an entity in a current page.
Background
In the prior art, voice instruction recognition can determine whether a user's speech is a voice instruction only by matching it against a fixed set of voice instructions. For example, if the fixed set contains the instruction "I want to buy a train ticket to Beijing", the user is considered to have issued that instruction only if the speech content is identical to it, and only then is the related operation executed. If the user expresses the same request with the sentence order changed, for example "To Beijing, I want to buy a train ticket", the speech is not recognized as a voice instruction, the related operation is not executed, and the flexibility of voice instruction recognition is therefore poor.
Disclosure of Invention
One of the technical problems solved by the invention is improving the flexibility of voice instruction recognition.
According to an embodiment of one aspect of the present invention, there is provided a method of recognizing a voice control instruction for an entity in a current page, comprising: extracting entities from the current page; constructing a candidate instruction set based on the extracted entities and corresponding construction templates; and identifying a voice control instruction for an entity in the current page from speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set.
Optionally, the step of extracting entities from the current page comprises: segmenting the text in the current page into words; determining the part of speech of each segmented word; inputting each segmented word having a specific part of speech into a classifier to determine whether the word is a word constituting an entity and whether it constitutes the beginning, middle, or end of the entity, the classifier having been trained in advance on a set of word samples from entities and non-entities; and determining whether the words with the specific parts of speech form an entity according to the classifier's decision on each of them.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the current user's set of historical voice control instructions, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the set of historical voice control instructions of all users, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Optionally, the step of constructing a candidate instruction set based on the extracted entities and corresponding construction templates comprises: obtaining synonyms of each extracted entity; and applying the extracted entity and the obtained synonyms to the entity's corresponding construction templates to obtain candidate instructions, which are placed into the candidate instruction set.
Optionally, the step of identifying the voice control instruction for an entity in the current page from the speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set, comprises: in response to the user's speech matching one candidate instruction in the candidate instruction set, identifying the extracted entity corresponding to that candidate instruction, thereby identifying from the user's speech the voice control instruction for that entity in the current page.
According to an embodiment of another aspect of the present invention, there is provided an apparatus for recognizing a voice control instruction for an entity in a current page, comprising: an extraction unit configured to extract entities from the current page; a construction unit configured to construct a candidate instruction set based on the extracted entities and corresponding construction templates; and a recognition unit configured to recognize a voice control instruction for an entity in the current page from speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set.
Optionally, the extraction unit is configured to: segment the text in the current page into words; determine the part of speech of each segmented word; input each segmented word having a specific part of speech into a classifier to determine whether the word is a word constituting an entity and whether it constitutes the beginning, middle, or end of the entity, the classifier having been trained in advance on a set of word samples from entities and non-entities; and determine whether the words with the specific parts of speech form an entity according to the classifier's decision on each of them.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the current user's set of historical voice control instructions, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the set of historical voice control instructions of all users, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Optionally, the construction unit is configured to: obtain synonyms of each extracted entity; and apply the extracted entity and the obtained synonyms to the entity's corresponding construction templates to obtain candidate instructions, which are placed into the candidate instruction set.
Optionally, the recognition unit is configured to, in response to the user's speech matching one candidate instruction in the candidate instruction set, identify the extracted entity corresponding to that candidate instruction, thereby recognizing from the user's speech the voice control instruction for that entity in the current page.
The candidate instruction set of embodiments of the invention is not fixed; it is constructed in real time from the entities present on the current page and their corresponding construction templates, and therefore changes with the page, allowing the user to issue instructions flexibly.
It will be appreciated by those of ordinary skill in the art that although the following detailed description will proceed with reference being made to illustrative embodiments, the present invention is not intended to be limited to these embodiments. Rather, the scope of the invention is broad and is intended to be defined only by the claims appended hereto.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flow diagram of a method of identifying voice control instructions for entities in a current page in accordance with one embodiment of the present invention;
FIG. 2 is a flowchart illustrating the process of extracting entities from a current page in a method according to an embodiment of the present invention;
FIG. 3 is a detailed flow diagram of a process for constructing a set of candidate instructions based on extracted entities and corresponding construction templates in a method according to one embodiment of the invention;
FIG. 4 is a block diagram of an apparatus for identifying voice control commands for entities in a current page according to one embodiment of the present invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
FIG. 1 is a flow diagram of a method 1 of identifying voice control instructions for entities in a current page according to one embodiment of the invention. The method can be used on vehicle-mounted devices, mobile terminals, fixed devices (such as desktop computers), and the like. The current page is the page currently displayed on such a device. It may be a page that is displayed without any action by the current user (the user operating the vehicle-mounted device, mobile terminal, fixed device, or the like), or a page displayed in response to the current user's action. An entity is a word or serial number displayed on the page that represents a possible target of a user action. For example, if the page displays dishes such as "spicy attraction", "spicy pot", and "special grilled fish", each dish name is considered an entity, and the serial numbers displayed beside these items (such as 1, 2, 3) are also considered entities, because the voice instructions the user is likely to issue next (such as "I want to eat spicy attraction" or "I select 3") are likely to be directed at them.
One application scenario in which the current page is not displayed in response to the current user's action: for example, an application on a vehicle-mounted device is opened by default on the device's desktop when the device is turned on, and navigation, food, shopping, and the like are displayed on the desktop. After the current user utters the speech "I want to go shopping", the method 1 of one embodiment of the present invention recognizes it as the voice control instruction for the entity "shopping" in the current page and performs further actions, such as displaying nearby malls for the current user. Such an application may equally be on a mobile terminal or a fixed device that displays some options on its desktop by default when turned on.
An application scenario in which the current page is displayed in response to the current user's action: for example, in a vehicle-mounted application, the current user first activates the application and then says, for example, "please show me restaurants near me", whereupon "spicy enticement", "boiling fish village", "full focus", and the like are displayed on the screen of the vehicle-mounted device. When the current user says "I want to go to full focus", the method 1 of identifying a voice control instruction for an entity in the current page of an embodiment of the present invention identifies this as a voice control instruction for the entity "full focus" in the current page and performs further actions, such as calling the restaurant or displaying a specific route to it. Such an application may equally be on a mobile terminal or a fixed device: after the current user's preceding operations have brought some options onto the screen, the method 1 can be used to recognize whether the speech the current user utters next is a voice control instruction for an entity in the current page, and for which entity.
In step 110, entities are extracted from the current page.
In one case, analysis of the composition of the current page shows that it mainly comprises several frames, each containing a single word (whether the text in a frame is a single word or a phrase or sentence composed of several words can be determined with existing word segmentation techniques), and the word in each frame can be taken as an entity.
In another case, the analysis shows that the several frames each contain a phrase or sentence, or that the current page is an article, or a page with a complex structure mixing various text and various frames; entities then need to be extracted, for example by the method of FIG. 2.
In sub-step 1101, the text in the current page is segmented into words.
Typically all the text identified on the current page is segmented. For example, if the current page mainly comprises several frames each containing a phrase or sentence, each phrase or sentence is segmented; if the current page is an article, the article is segmented. Any existing word segmentation method can be used.
In sub-step 1102, the part of speech of each segmented word is determined.
Mature techniques exist for this kind of semantic analysis, and any existing part-of-speech tagging method can be used. Generally, only content words such as nouns, verbs, and adjectives, together with ordinal words, can become entities; a function word is unlikely to become one.
In sub-step 1103, each of the separated words having a particular part of speech is input to a classifier trained in advance using a set of word samples of entities and non-entities to determine whether the word is a word constituting an entity and whether the word constitutes the beginning, middle, or end of the entity.
Words of a specific part of speech are, for example, content words and ordinal words. In some cases, the specific parts of speech may be restricted to nouns and ordinal words.
Machine learning offers mature techniques here. A model, i.e. a classifier, can be trained on a set containing a large number of samples of entity words and non-entity words. Specifically, each sample word is input to the classifier together with a label indicating whether it comes from an entity or a non-entity and whether it constitutes the beginning, middle, or end of an entity; from these labels the classifier learns the respective regularities of words drawn from entities, words drawn from non-entities, and words forming the beginning, middle, and end of entities. When a new word is subsequently input, the classifier can then judge whether it is a word constituting an entity and whether it lies at the beginning, middle, or end of that entity.
In sub-step 1104, whether the words with the specific parts of speech form an entity is determined according to the classifier's decision on each of them.
For example, for "boiling fish village", the classifier determines that its first word is often the beginning of an entity, its middle words often the middle of an entity, and its last word often the end of an entity, and "boiling fish village" is therefore judged to be an entity.
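Putting sub-steps 1101 to 1104 together, the extraction loop can be pictured with a minimal sketch. This is an illustration only, not the patent's implementation: segment, pos_tag, and classifier stand for an existing word segmenter, an existing part-of-speech tagger, and the pre-trained classifier described above.

```python
# Illustrative sketch of sub-steps 1101-1104 (hypothetical, simplified).
CONTENT_POS = {"noun", "verb", "adjective", "ordinal"}  # "specific parts of speech"

def extract_entities(page_text, segment, pos_tag, classifier):
    words = segment(page_text)                    # sub-step 1101: word segmentation
    entities, current = [], []
    for word in words:
        if pos_tag(word) not in CONTENT_POS:      # sub-step 1102: keep content words
            current = []
            continue
        label = classifier.predict(word)          # sub-step 1103: "B"/"M"/"E"/"O"
        if label == "B":                          # word begins an entity
            current = [word]
        elif label == "M" and current:            # word continues an entity
            current.append(word)
        elif label == "E" and current:            # word ends an entity
            current.append(word)
            # Joining without spaces suits the Chinese setting of the examples.
            entities.append("".join(current))     # sub-step 1104: stitch B-M-E runs
            current = []
        else:
            current = []
    return entities
```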
In step 120, a set of candidate instructions is constructed based on the extracted entities and corresponding construction templates.
Candidate instructions are instructions that the user may issue for the entities of the current page. A construction template is a language pattern the user may use when issuing an instruction for an entity of the current page. For example, for "2. boiling fish village" on the current page, the user may issue the instructions "I want to go to boiling fish village", "select boiling fish village", "boiling fish village", "2", "select 2", and so on; these are candidate instructions, while "I want to go to xx", "select xx", "xx", "No. xx", and "select No. xx" are construction templates.
One way of forming construction templates is for a person to define various construction templates for various entities in advance and store them in a database.
Another way of forming construction templates is: an entity is extracted from each voice control instruction in the set of historical voice control instructions of all users, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
For example, from "I want to go to Tiananmen, how much does the subway cost?", the entity "Tiananmen" is extracted; the independent clause containing it is "I want to go to Tiananmen", and the language pattern is "I want to go to xx".
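As an illustration of this pattern extraction (a hypothetical sketch, assuming clauses are split on punctuation; derive_template and its details are not the patent's implementation):

```python
import re

# Hypothetical sketch: given a historical voice control instruction and the
# entity extracted from it, keep the independent clause containing the entity
# and replace the entity itself with the placeholder "xx".
def derive_template(command, entity):
    # Split the instruction into clauses on sentence punctuation
    # (both ASCII and full-width Chinese punctuation).
    clauses = re.split(r"[,.?!;，。？！；]", command)
    for clause in clauses:
        if entity in clause:
            return clause.strip().replace(entity, "xx")
    return None

print(derive_template("I want to go to Tiananmen, how much does the subway cost?",
                      "Tiananmen"))
# -> "I want to go to xx"
```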
For example, for the users of an application, entities can be extracted from every voice control instruction those users have historically issued while using the application, and the language patterns around the extracted entities can be extracted as the corresponding construction templates.
The advantage of forming construction templates this way is that they are collected from users' actual usage rather than conceived by people in advance, which improves the objectivity of the templates and thus the accuracy of recognizing voice control instructions for entities in the current page.
Another way of forming construction templates is: an entity is extracted from each voice control instruction in the current user's set of historical voice control instructions, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
For example, for the current user of an application, entities can be extracted from each voice control instruction that user has historically issued while using the application, and the language patterns around the extracted entities can be extracted as the corresponding construction templates.
The advantage of forming construction templates this way is that they are extracted from the current user's own historical voice control instructions and therefore reflect that user's language habits. For example, if, when pages have historically shown "boiling fish village", the current user has tended to say "want to eat boiling fish village" rather than "I want to go to boiling fish village", then "want to eat xx" may be the more common template for this user. Forming templates this way thus adapts to the user's individual habits and improves the accuracy of recognizing voice control instructions for entities in the current page.
One way to construct the candidate instruction set based on the extracted entities and corresponding construction templates is to apply each extracted entity directly to its corresponding construction templates, and put the resulting candidate instructions into the candidate instruction set.
For example, if the extracted entity is "boiling fish village" and the corresponding construction templates are "I want to go to xx", "xx", and "select xx", applying the entity to these templates yields the candidate instructions "I want to go to boiling fish village", "boiling fish village", and "select boiling fish village", which are put into the candidate instruction set. In one approach, they are stored in the set in correspondence with "boiling fish village".
In another embodiment, as shown in FIG. 3, step 120 of constructing the candidate instruction set based on the extracted entities and corresponding construction templates includes sub-steps 1201 and 1202.
In sub-step 1201, a synonym of the extracted entity is obtained based on the extracted entity.
A synonym database is constructed in advance. For example, an expert finds synonyms for the entities extracted from each voice control instruction in the historical instruction sets of all users or of the current user and places them in the synonym database; or an expert classifies all the words in a dictionary, groups words of similar meaning into synonym sets, and lets all the synonym sets form the database. The synonym database may also be constructed in other ways.
Once the synonym database is constructed, the synonyms of an extracted entity can be obtained by looking the entity up in the database.
In sub-step 1202, the extracted entity and the obtained synonyms are each applied to the entity's corresponding construction templates to obtain candidate instructions, which are put into the candidate instruction set.
For example, if the extracted entity is "Beijing University", the obtained synonym is "Beida", and the corresponding construction templates are "navigate to xx", "go to xx", "I want to go to xx", and "call xx", then the resulting candidate instructions are:
navigate to Beijing University
go to Beijing University
I want to go to Beijing University
call Beijing University
navigate to Beida
go to Beida
I want to go to Beida
call Beida
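A minimal sketch of this construction, covering both the direct application of the entity and the synonym expansion of sub-steps 1201 and 1202. The synonym table and template store below are illustrative stand-ins for the pre-built synonym database and construction templates described above, not the patent's implementation:

```python
# Illustrative sketch of sub-steps 1201-1202 (hypothetical data and names).
SYNONYMS = {"Beijing University": ["Beida"]}          # stand-in synonym database
TEMPLATES = {"Beijing University": [
    "navigate to xx", "go to xx", "I want to go to xx", "call xx",
]}                                                    # stand-in template store

def build_candidate_set(entities):
    # Map each candidate instruction back to the entity it was built from,
    # so the targeted entity can be identified after a match (see step 130).
    candidates = {}
    for entity in entities:
        surface_forms = [entity] + SYNONYMS.get(entity, [])   # sub-step 1201
        for template in TEMPLATES.get(entity, []):
            for form in surface_forms:                        # sub-step 1202
                candidates[template.replace("xx", form)] = entity
    return candidates

candidates = build_candidate_set(["Beijing University"])
# yields the eight candidate instructions listed above, each mapped back to
# the entity "Beijing University".
```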
In step 130, a voice control instruction for an entity in the current page is identified from the speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set.
For example, pauses in the current user's speech are detected, and the speech between two pauses is taken as one clause. The clause is converted to text using a speech recognition method known in the art and compared one by one with the candidate instructions in the set constructed in step 120. When the recognized text is found to be identical to one candidate instruction in that set, or to contain one, a match between the user's speech and that candidate instruction is considered found, and the matched candidate instruction is the voice control instruction for an entity in the current page.
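A sketch of this matching (illustrative only; clause_text stands for the recognized text of one clause, and candidates is a candidate-to-entity mapping as in the sketch above):

```python
# Hypothetical candidate-to-entity mapping, as built in the previous sketch.
candidates = {
    "I want to go to Beida": "Beijing University",
    "navigate to Beijing University": "Beijing University",
}

def match_instruction(clause_text, candidates):
    # A match is an exact hit, or a candidate instruction fully
    # contained in the recognized clause.
    for instruction, entity in candidates.items():
        if clause_text == instruction or instruction in clause_text:
            return instruction, entity
    return None

print(match_instruction("well, I want to go to Beida", candidates))
# -> ('I want to go to Beida', 'Beijing University')
```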
It can then be further determined at which entity of the current page the recognized voice control instruction is directed. As described above, when the candidate instructions derived from an extracted entity are put into the candidate instruction set, they may be stored in correspondence with that entity; in response to the user's speech matching one candidate instruction, the extracted entity corresponding to that instruction can therefore be identified, which determines the entity of the current page at which the recognized voice control instruction is directed.
After the voice control instruction for an entity in the current page has been recognized from the user's speech, it can be executed. For example, the executable program code corresponding to each candidate instruction in the candidate instruction set is stored in another database; when a candidate instruction is matched (i.e. the voice control instruction is recognized), the instruction is executed by running the corresponding program code from that database.
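For example, the lookup-and-execute step might be sketched as follows (the handler and its mapping are hypothetical stand-ins for the program code stored in the other database):

```python
def navigate_to(entity):
    # Hypothetical handler standing in for the stored program code.
    print(f"starting navigation to {entity}")

# Stand-in for the "another database" mapping candidate instructions to code.
HANDLERS = {
    "I want to go to Beida": navigate_to,
    "navigate to Beijing University": navigate_to,
}

instruction, entity = "I want to go to Beida", "Beijing University"  # from matching
HANDLERS[instruction](entity)   # executes the corresponding program code
```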
As shown in FIG. 4, the apparatus 2 for recognizing a voice control instruction for an entity in a current page according to one embodiment of the present invention comprises: an extraction unit 210 configured to extract entities from the current page; a construction unit 220 configured to construct a candidate instruction set based on the extracted entities and corresponding construction templates; and a recognition unit 230 configured to recognize a voice control instruction for an entity in the current page from speech spoken by the user, based on matching that speech against the candidate instructions in the candidate instruction set. These units can be implemented in software, in hardware (FPGA, integrated circuit, etc.), or in a combination of the two.
Optionally, the extraction unit 210 is configured to: segment the text in the current page into words; determine the part of speech of each segmented word; input each segmented word having a specific part of speech into a classifier to determine whether the word is a word constituting an entity and whether it constitutes the beginning, middle, or end of the entity, the classifier having been trained in advance on a set of word samples from entities and non-entities; and determine whether the words with the specific parts of speech form an entity according to the classifier's decision on each of them.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the current user's set of historical voice control instructions, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Alternatively, the construction templates are formed in advance as follows: an entity is extracted from each voice control instruction in the set of historical voice control instructions of all users, and the language pattern around the extracted entity is extracted as the construction template corresponding to that entity.
Optionally, the construction unit 220 is configured to: obtain synonyms of each extracted entity; and apply the extracted entity and the obtained synonyms to the entity's corresponding construction templates to obtain candidate instructions, which are placed into the candidate instruction set.
Optionally, the recognition unit 230 is configured to, in response to the user's speech matching one candidate instruction in the candidate instruction set, identify the extracted entity corresponding to that candidate instruction, thereby recognizing from the user's speech the voice control instruction for that entity in the current page.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (8)

1. A method (1) of recognizing speech control instructions for entities in a current page, comprising:
extracting an entity (110) from a current page, wherein the entity comprises a real word or a serial number displayed by the page;
constructing a set of candidate instructions (120) based on the extracted entities and corresponding construction templates;
identifying voice control instructions (130) for entities in the current page from the voice spoken by the user based on a match of the voice spoken by the user with candidate instructions in the set of candidate instructions;
wherein the construction template is formed in advance as follows: extracting an entity from each voice control command in a historical voice control command set of a current user, and extracting language modes around the extracted entity to serve as a construction template corresponding to the extracted entity;
alternatively, the construction template is formed beforehand as follows: an entity is extracted from each voice control command in a set of historical voice control commands of all users, and language patterns around the extracted entity are extracted to be used as a construction template corresponding to the extracted entity.
2. The method of claim 1, wherein the step of extracting the entity (110) from the current page comprises:
segmenting words in a current page (1101);
judging the part of speech of the separated words (1102);
inputting each word of the separated words with specific part of speech into a classifier to determine whether the word is a word constituting an entity and whether the word constitutes the beginning, middle or end of the entity (1103), the classifier being trained in advance using a set of word samples of entities and non-entities;
and judging whether the word with the specific part of speech is an entity or not according to the judgment result of the classifier on each word in the separated words with the specific part of speech (1104).
3. The method of claim 1, wherein the step of constructing a set of candidate instructions (120) based on the extracted entities and corresponding construction templates comprises:
acquiring synonyms of the extracted entities based on the extracted entities (1201);
and respectively applying the extracted entity and the obtained synonym to the corresponding construction template of the extracted entity to respectively obtain corresponding candidate instructions, and putting the corresponding candidate instructions into a candidate instruction set (1202).
4. The method of claim 1, wherein the step of identifying speech control instructions (130) for entities in the current page from the user's spoken speech based on a match of the user's spoken speech with candidate instructions in the set of candidate instructions comprises:
and in response to the voice spoken by the user being matched with one candidate instruction in the candidate instruction set, identifying the extracted entity corresponding to the candidate instruction, so as to identify the voice control instruction aiming at the extracted entity in the current page from the voice spoken by the user.
5. An apparatus (2) for recognizing speech control instructions for entities in a current page, comprising:
the extraction unit (210) is configured to extract an entity from the current page, wherein the entity comprises a real word or a serial number displayed by the page;
a construction unit (220) configured to construct a set of candidate instructions based on the extracted entities and corresponding construction templates;
a recognition unit (230) configured to recognize a voice control instruction for an entity in the current page from the speech spoken by the user, based on matching the speech spoken by the user against the candidate instructions in the candidate instruction set;
Wherein the construction template is formed in advance as follows: extracting an entity from each voice control command in a historical voice control command set of a current user, and extracting language modes around the extracted entity to serve as a construction template corresponding to the extracted entity;
alternatively, the construction template is formed beforehand as follows: an entity is extracted from each voice control command in a set of historical voice control commands of all users, and language patterns around the extracted entity are extracted to be used as a construction template corresponding to the extracted entity.
6. The apparatus according to claim 5, wherein the extraction unit (210) is configured to:
segmenting words in the current page;
judging the part of speech of the divided words;
inputting each word of the separated words with specific part of speech into a classifier to judge whether the word is a word forming an entity and whether the word forms the beginning, the middle or the end of the entity, wherein the classifier is trained in advance by a set of word samples of the entity and non-entity;
and judging whether the word with the specific part of speech is an entity or not according to the judgment result of the classifier on each word in the separated words with the specific part of speech.
7. The apparatus according to claim 5, wherein the construction unit (220) is configured to:
acquiring synonyms of the extracted entities based on the extracted entities;
and respectively applying the extracted entity and the obtained synonym to the corresponding construction template of the extracted entity to respectively obtain corresponding candidate instructions, and putting the corresponding candidate instructions into a candidate instruction set.
8. Apparatus according to claim 5, wherein the recognition unit (230) is configured to recognize the extracted entity corresponding to a candidate instruction in response to the user uttering speech matching the candidate instruction of the set of candidate instructions, thereby to recognize from the user uttered speech the speech control instruction for the extracted entity in the current page.
CN201510031182.3A 2015-01-21 2015-01-21 Method and device for recognizing voice control instruction aiming at entity in current page Active CN105869640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510031182.3A CN105869640B (en) 2015-01-21 2015-01-21 Method and device for recognizing voice control instruction aiming at entity in current page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510031182.3A CN105869640B (en) 2015-01-21 2015-01-21 Method and device for recognizing voice control instruction aiming at entity in current page

Publications (2)

Publication Number Publication Date
CN105869640A CN105869640A (en) 2016-08-17
CN105869640B (en) 2019-12-31

Family

ID=56623123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510031182.3A Active CN105869640B (en) 2015-01-21 2015-01-21 Method and device for recognizing voice control instruction aiming at entity in current page

Country Status (1)

Country Link
CN (1) CN105869640B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074565A (en) * 2016-11-11 2018-05-25 上海诺悦智能科技有限公司 Phonetic order redirects the method and system performed with detailed instructions
CN109215644B (en) * 2017-07-07 2021-10-15 佛山市顺德区美的电热电器制造有限公司 Control method and device
CN107678309B (en) * 2017-09-01 2021-07-06 科大讯飞股份有限公司 Control sentence pattern generation and application control method and device and storage medium
CN107919129A (en) * 2017-11-15 2018-04-17 百度在线网络技术(北京)有限公司 Method and apparatus for controlling the page
CN108470566B (en) * 2018-03-08 2020-09-15 腾讯科技(深圳)有限公司 Application operation method and device
CN110176227B (en) * 2018-03-26 2023-07-14 腾讯科技(深圳)有限公司 Voice recognition method and related device
CN111742539B (en) 2018-08-07 2022-05-06 华为技术有限公司 Voice control command generation method and terminal
CN111383631B (en) * 2018-12-11 2024-01-23 阿里巴巴集团控股有限公司 Voice interaction method, device and system
CN110400576B (en) * 2019-07-29 2021-10-15 北京声智科技有限公司 Voice request processing method and device
CN110782897B (en) * 2019-11-18 2021-11-23 成都启英泰伦科技有限公司 Voice terminal communication method and system based on natural semantic coding
CN112331207A (en) * 2020-09-30 2021-02-05 音数汇元(上海)智能科技有限公司 Service content monitoring method and device, electronic equipment and storage medium
CN112509573A (en) * 2020-11-19 2021-03-16 北京蓦然认知科技有限公司 Voice recognition method and device
CN112668337B (en) * 2020-12-23 2022-08-19 广州橙行智动汽车科技有限公司 Voice instruction classification method and device
TWI805008B (en) * 2021-10-04 2023-06-11 中華電信股份有限公司 Customized intent evaluation system, method and computer-readable medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003296333A (en) * 2002-04-04 2003-10-17 Canon Inc Image display system, its control method and program for realizing the control method
KR101056511B1 (en) * 2008-05-28 2011-08-11 (주)파워보이스 Speech Segment Detection and Continuous Speech Recognition System in Noisy Environment Using Real-Time Call Command Recognition
CN101645064B (en) * 2008-12-16 2011-04-06 中国科学院声学研究所 Superficial natural spoken language understanding system and method thereof
CN101901235B (en) * 2009-05-27 2013-03-27 国际商业机器公司 Method and system for document processing
CN103455507B (en) * 2012-05-31 2017-03-29 国际商业机器公司 Search engine recommends method and device
CN103020098A (en) * 2012-07-11 2013-04-03 腾讯科技(深圳)有限公司 Navigation service searching method with speech recognition function
CN102833610B (en) * 2012-09-24 2015-05-13 北京多看科技有限公司 Program selection method, apparatus and digital television terminal
CN103219005B (en) * 2013-04-28 2016-01-20 北京云知声信息技术有限公司 A kind of audio recognition method and device
CN103678281B (en) * 2013-12-31 2016-10-19 北京百度网讯科技有限公司 The method and apparatus that text is carried out automatic marking

Also Published As

Publication number Publication date
CN105869640A (en) 2016-08-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211117

Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Patentee after: Mobvoi Innovation Technology Co., Ltd.

Patentee after: Volkswagen (China) Investment Co., Ltd

Address before: Room 307, Building 489 Songtao Road, Zhangjiang High-tech Park, Pudong New Area, Shanghai, 201203

Patentee before: SHANGHAI MOBVOI INFORMATION TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right