CN111640432A - Voice control method and device, electronic equipment and storage medium

Voice control method and device, electronic equipment and storage medium

Info

Publication number
CN111640432A
CN111640432A
Authority
CN
China
Prior art keywords
target
entity
sentence
statement
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010463288.1A
Other languages
Chinese (zh)
Other versions
CN111640432B (en)
Inventor
高丛
苏少炜
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202010463288.1A
Publication of CN111640432A
Application granted
Publication of CN111640432B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides a voice control method and apparatus, an electronic device, and a storage medium, and belongs to the field of internet technologies. The method comprises the following steps: acquiring a first sentence corresponding to an input first voice signal, wherein the first voice signal is used for controlling execution of a target task; acquiring target inference information from the first sentence, wherein the target inference information is a phrase describing an entity in an indirect manner; acquiring, according to the target inference information, a target entity corresponding to the target inference information; replacing the target inference information in the first sentence with the target entity to obtain a second sentence; and executing the target task based on the second sentence. With this method, the implicit target entity is obtained even when the input sentence gives the target entity only in an indirect manner, which can improve the accuracy of voice control.

Description

Voice control method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a voice control method and apparatus, an electronic device, and a storage medium.
Background
Task-based dialog has a wide range of application scenarios; for example, it can be applied to weather queries, flight booking, and the like. The voice signal input to a task-based dialog generally describes an instruction, and the electronic device is required to recognize the intention of the voice signal and then execute the task corresponding to the instruction based on that intention.
The intention includes a slot value of a slot. When recognizing the intention of the voice signal, the electronic device needs to recognize the entities in the voice signal and then identify the slot values of the slots from those entities. When an entity is not given in the voice signal in a direct manner, the electronic device cannot identify the implicit entity, so it cannot correctly acquire the intention of the voice signal or correctly execute the corresponding task, and the accuracy of voice control is low.
Disclosure of Invention
The embodiments of the disclosure provide a voice control method and apparatus, an electronic device, and a storage medium, which can improve the accuracy of voice control. The technical solutions are as follows:
In a first aspect, a voice control method is provided, the method comprising:
acquiring a first sentence corresponding to an input first voice signal, wherein the first voice signal is used for controlling execution of a target task;
acquiring target inference information from the first sentence, wherein the target inference information is a phrase describing an entity in an indirect manner;
acquiring, according to the target inference information, a target entity corresponding to the target inference information;
replacing the target inference information in the first sentence with the target entity to obtain a second sentence;
executing the target task based on the second sentence.
In a possible implementation manner, the acquiring target inference information from the first sentence includes:
acquiring an entity designation in the first sentence, and acquiring a syntax tree of the first sentence, wherein the syntax tree comprises a plurality of nodes and a grammatical relationship between the nodes, and each node corresponds to a word in the first sentence;
acquiring the target inference information according to the entity designation and the syntax tree.
In another possible implementation manner, the acquiring the target inference information according to the entity designation and the syntax tree includes:
taking the node corresponding to the entity designation as a first base node of the syntax tree;
selecting a first target node from among the adjacent nodes of the first base node in the syntax tree, wherein a first target word corresponding to the first target node and the entity designation satisfy a target grammatical relationship, and the first target word and the entity designation are adjacent in the first sentence;
splicing the first target word with the entity designation to obtain first inference information of the syntax tree;
acquiring the target inference information according to the first inference information.
In another possible implementation manner, the acquiring the target inference information according to the first inference information includes:
taking the node corresponding to the first inference information as a second base node of the syntax tree;
in response to a second target node existing among the adjacent nodes of the second base node in the syntax tree, splicing a second target word corresponding to the second target node with the first inference information to obtain second inference information of the syntax tree, and acquiring the target inference information according to the second inference information, wherein the second target word and the first inference information satisfy the target grammatical relationship, and the second target word and the first inference information are adjacent in the first sentence;
in response to no second target node existing among the adjacent nodes of the second base node in the syntax tree, taking the first inference information as the target inference information.
In another possible implementation manner, the acquiring an entity designation in the first sentence includes:
performing word segmentation processing on the first sentence to obtain segmented words in the first sentence;
taking a segmented word whose part of speech is a target part of speech as the entity designation, or taking a segmented word whose type is a first target type as the entity designation, or taking a segmented word that matches an entity in a predefined entity library as the entity designation.
In another possible implementation manner, the acquiring, according to the target inference information, a target entity corresponding to the target inference information includes:
acquiring a first entity in the target inference information and a first attribute relationship of the first entity;
constructing a query statement according to the first entity and the first attribute relationship;
querying, through the query statement, a first attribute value corresponding to the query statement;
taking the first attribute value as the target entity.
In another possible implementation manner, the acquiring, according to the target inference information, a target entity corresponding to the target inference information includes:
selecting, from a relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information;
acquiring a second attribute value corresponding to the target relational statement;
taking the second attribute value as the target entity.
In another possible implementation manner, the selecting, from a relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information includes:
acquiring a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library;
selecting the target relational statement from the relational statement library according to the first feature vector and the second feature vector corresponding to each relational statement.
In another possible implementation manner, the executing the target task based on the second sentence includes:
acquiring an intention of the second sentence, the intention comprising a slot value of a slot;
in response to the slot value being the target entity and the type of the target entity being a second target type corresponding to the slot, or in response to the slot value being the target entity and the target entity being an entity in a predefined entity library, executing the target task according to the intention.
In a second aspect, a voice control apparatus is provided, the apparatus comprising:
a first sentence acquisition module configured to acquire a first sentence corresponding to an input first voice signal, the first voice signal being used for controlling execution of a target task;
a target inference information acquisition module configured to acquire target inference information from the first sentence, the target inference information being a phrase describing an entity in an indirect manner;
a target entity acquisition module configured to acquire, according to the target inference information, a target entity corresponding to the target inference information;
a second sentence acquisition module configured to replace the target inference information in the first sentence with the target entity to obtain a second sentence;
a task execution module configured to execute the target task based on the second sentence.
In a possible implementation manner, the target inference information acquisition module is further configured to acquire an entity designation in the first sentence, and acquire a syntax tree of the first sentence, where the syntax tree comprises a plurality of nodes and a grammatical relationship between the nodes, and each node corresponds to a word in the first sentence; and acquire the target inference information according to the entity designation and the syntax tree.
In another possible implementation manner, the target inference information acquisition module is further configured to take the node corresponding to the entity designation as a first base node of the syntax tree; select a first target node from among the adjacent nodes of the first base node in the syntax tree, where a first target word corresponding to the first target node and the entity designation satisfy a target grammatical relationship, and the first target word and the entity designation are adjacent in the first sentence; splice the first target word with the entity designation to obtain first inference information of the syntax tree; and acquire the target inference information according to the first inference information.
In another possible implementation manner, the target inference information acquisition module is further configured to take the node corresponding to the first inference information as a second base node of the syntax tree; in response to a second target node existing among the adjacent nodes of the second base node in the syntax tree, splice a second target word corresponding to the second target node with the first inference information to obtain second inference information of the syntax tree, and acquire the target inference information according to the second inference information, where the second target word and the first inference information satisfy the target grammatical relationship and are adjacent in the first sentence; and in response to no second target node existing among the adjacent nodes of the second base node in the syntax tree, take the first inference information as the target inference information.
In another possible implementation manner, the target inference information acquisition module is further configured to perform word segmentation processing on the first sentence to obtain segmented words in the first sentence; and take a segmented word whose part of speech is a target part of speech as the entity designation, or take a segmented word whose type is a first target type as the entity designation, or take a segmented word that matches an entity in a predefined entity library as the entity designation.
In another possible implementation manner, the target entity acquisition module is further configured to acquire a first entity in the target inference information and a first attribute relationship of the first entity; construct a query statement according to the first entity and the first attribute relationship; query, through the query statement, a first attribute value corresponding to the query statement; and take the first attribute value as the target entity.
In another possible implementation manner, the target entity acquisition module is further configured to select, from a relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information; acquire a second attribute value corresponding to the target relational statement; and take the second attribute value as the target entity.
In another possible implementation manner, the target entity acquisition module is further configured to acquire a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library; and select the target relational statement from the relational statement library according to the first feature vector and the second feature vector corresponding to each relational statement.
In another possible implementation manner, the task execution module is further configured to acquire an intention of the second sentence, where the intention comprises a slot value of a slot; and in response to the slot value being the target entity and the type of the target entity being a second target type corresponding to the slot, or in response to the slot value being the target entity and the target entity being an entity in a predefined entity library, execute the target task according to the intention.
In a third aspect, an electronic device is provided, where the electronic device includes a processor and a memory, where the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed in the voice control method in any one of the above possible implementations.
In a fourth aspect, a computer-readable storage medium is provided, where at least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the electronic device in the voice control method in any one of the above possible implementation manners.
The technical solutions provided by the embodiments of the disclosure have the following beneficial effects:
In the embodiments of the disclosure, a first sentence corresponding to an input first voice signal is acquired, where the first voice signal is used to control execution of a target task. Target inference information, a phrase describing an entity in an indirect manner, is acquired from the first sentence; that is, the phrase that indirectly describes the target entity is acquired first, and the target entity corresponding to the target inference information is then acquired according to that phrase. In this way, the implicit target entity is obtained even when the input sentence does not give it directly. The target inference information in the first sentence is then replaced with the target entity to obtain a second sentence, and the target task is executed based on the second sentence, which improves the accuracy of voice control. The method can be applied to a task-based dialog system, allowing the system to understand, through knowledge inference, sentences that describe the target entity indirectly. In addition, because the target inference information is first extracted from the first sentence and the corresponding target entity is acquired from it, the knowledge inference technique is applied to the target inference information rather than to the whole sentence, which reduces the computation and the inference difficulty caused by irrelevant semantics. Moreover, obtaining the exact entity by performing knowledge inference on the target inference information is equivalent to a disambiguation step for subsequent intention recognition and slot extraction, which improves the accuracy of semantic understanding.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by embodiments of the present disclosure;
FIG. 2 is a flow chart of a voice control method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of a voice control method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a syntax tree provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a syntax tree provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a syntax tree provided by an embodiment of the present disclosure;
FIG. 7 is a flow chart of a voice control method provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram of a voice control apparatus provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
First, the technical terms used in this specification are explained:
Entity designation (Mention): a word in a natural language sentence that may correspond to an entity in the knowledge base. For example, in the sentence "I want to listen to a song of B", the entity designation "B" corresponds to the entity <B> in the knowledge base.
Named Entity Recognition (NER): a task in the field of natural language processing: given a sentence, identify words such as person names and place names in it. The types (person name, place name, and so on) can be customized.
Dependency parsing (Syntactic Parsing): analyzing the syntactic structure of a sentence to obtain the relationships between the words in it. Possible relationships include: the subject-predicate relationship, the verb-object relationship, the centering (attributive) relationship, the additional (adjunct) relationship, and so on.
FIG. 1 is a schematic diagram of an implementation environment provided by embodiments of the present disclosure. Referring to FIG. 1, the implementation environment includes an electronic device 101 and a server 102, which are connected through a wireless or wired network. A target application served by the server 102 may be installed on the electronic device 101, and the user of the electronic device 101 can use the target application for functions such as data transmission and message interaction.
The electronic device 101 may be a computer, a mobile phone, a tablet computer, a smart speaker, a smart home, a smart toy, or other electronic device. The target application may be any target application installed on the electronic device 101; moreover, the target application may be a target application in the operating system of the electronic device 101, and may also be a target application provided by a third party. For example, the target application may be a shopping application, a query application, a social application, or a music application, among others. The server 102 may be a background server corresponding to the target application. Accordingly, the server 102 may be a shopping server, a query server, a social application server, or a music server, among others.
The electronic device 101 may perform voice interaction with the user through the target application; that is, the electronic device 101 may receive a first voice signal from the user through the target application and perform the corresponding task according to the first voice signal. For example, the electronic device 101 receives a first voice signal "query the weather in Moscow tomorrow" through the query application. The electronic device 101 determines that the intention of the first voice signal is a weather query, obtains the entities "Moscow", "tomorrow", and "weather" corresponding to the first voice signal, identifies the slot values of the intention's slots from those entities, and fills the slot values into the slots to obtain the intention's slot information. In this example, the slot information includes: city: Moscow; date: tomorrow. The electronic device 101 then executes the task corresponding to the first voice signal according to the slot information.
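As an illustration only, the filled intention above can be represented as a small data object; a minimal Python sketch (the Intent structure and field names are assumptions, not part of the patent):

```python
# A minimal sketch of intent slot information; the names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)  # slot name -> slot value

# Slot values recognized from "query the weather in Moscow tomorrow"
# are filled into the intention's slots.
intent = Intent(name="query_weather", slots={"city": "Moscow", "date": "tomorrow"})
print(intent.slots)  # {'city': 'Moscow', 'date': 'tomorrow'}
```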
In many cases, an entity in the first voice signal is not given directly; instead, the target entity is described indirectly by a phrase. For example, the first voice signal "query the weather in the Russian capital tomorrow" contains the implicit entity "Moscow". If the implicit entity is not recognized, the electronic device 101 cannot acquire accurate slot information for the intention, and therefore cannot correctly perform the task corresponding to the first voice signal.
The voice control method provided in this disclosure can accurately identify a target entity implied in the first voice signal. The electronic device 101 acquires target inference information, a phrase describing an entity in an indirect manner, from the first sentence corresponding to the first voice signal, and then acquires the target entity according to the target inference information, so the implied entity is not missed during entity recognition. Continuing the example above, the electronic device 101 acquires the target inference information "Russian capital" from the first sentence and then acquires the corresponding target entity "Moscow", thereby obtaining the implied target entity. The electronic device 101 then replaces the target inference information in the first sentence with the target entity to obtain a second sentence and executes the target task based on the second sentence, which improves the accuracy of voice control.
In the above scheme, the electronic device 101 may also transmit the input first voice signal to the server 102, which acquires the second sentence and executes the target task based on the second sentence.
FIG. 2 is a flowchart of a voice control method according to an embodiment of the present disclosure. Referring to FIG. 2, the embodiment includes:
step 201: and acquiring a first sentence corresponding to the input first voice signal, wherein the first voice signal is used for controlling the execution of the target task.
Step 202: target inference information is obtained from the first sentence, the target inference information being a phrase describing the entity in an indirect manner.
Step 203: and acquiring a target entity corresponding to the target reasoning information according to the target reasoning information.
Step 204: and replacing the target inference information in the first statement with a target entity to obtain a second statement.
Step 205: based on the second statement, the target task is executed.
In one possible implementation, acquiring target inference information from the first sentence includes:
acquiring an entity designation in the first sentence, and acquiring a syntax tree of the first sentence, wherein the syntax tree comprises a plurality of nodes and a grammatical relationship between the nodes, and each node corresponds to a word in the first sentence;
acquiring the target inference information according to the entity designation and the syntax tree.
In another possible implementation manner, acquiring the target inference information according to the entity designation and the syntax tree includes:
taking the node corresponding to the entity designation as a first base node of the syntax tree;
selecting a first target node from among the adjacent nodes of the first base node in the syntax tree, wherein a first target word corresponding to the first target node and the entity designation satisfy a target grammatical relationship, and the first target word and the entity designation are adjacent in the first sentence;
splicing the first target word with the entity designation to obtain first inference information of the syntax tree;
acquiring the target inference information according to the first inference information.
In another possible implementation manner, acquiring the target inference information according to the first inference information includes:
taking the node corresponding to the first inference information as a second base node of the syntax tree;
in response to a second target node existing among the adjacent nodes of the second base node in the syntax tree, splicing a second target word corresponding to the second target node with the first inference information to obtain second inference information of the syntax tree, and acquiring the target inference information according to the second inference information, wherein the second target word and the first inference information satisfy the target grammatical relationship and are adjacent in the first sentence;
in response to no second target node existing among the adjacent nodes of the second base node in the syntax tree, taking the first inference information as the target inference information.
In another possible implementation manner, acquiring the entity designation in the first sentence includes:
performing word segmentation processing on the first sentence to obtain segmented words in the first sentence;
taking a segmented word whose part of speech is a target part of speech as the entity designation, or taking a segmented word whose type is a first target type as the entity designation, or taking a segmented word that matches an entity in a predefined entity library as the entity designation.
In another possible implementation manner, acquiring, according to the target inference information, a target entity corresponding to the target inference information includes:
acquiring a first entity in the target inference information and a first attribute relationship of the first entity;
constructing a query statement according to the first entity and the first attribute relationship;
querying, through the query statement, a first attribute value corresponding to the query statement;
taking the first attribute value as the target entity.
In another possible implementation manner, acquiring, according to the target inference information, a target entity corresponding to the target inference information includes:
selecting, from a relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information;
acquiring a second attribute value corresponding to the target relational statement;
taking the second attribute value as the target entity.
In another possible implementation manner, selecting, from the relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information includes:
acquiring a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library;
selecting the target relational statement from the relational statement library according to the first feature vector and the second feature vector corresponding to each relational statement.
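A minimal sketch of this selection step, assuming a sentence encoder is available (the toy embed function below is only a stand-in, not the patent's model): the relational statement with the highest cosine similarity between feature vectors is chosen.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in sentence encoder; a real system would use a trained model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_relational_statement(inference_info: str, library: list) -> str:
    """Return the library statement whose feature vector is most similar
    to the feature vector of the target inference information."""
    first_vector = embed(inference_info)
    return max(library, key=lambda stmt: cosine(first_vector, embed(stmt)))
```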
In another possible implementation, executing the target task based on the second sentence includes:
acquiring an intention of the second sentence, the intention comprising a slot value of the slot;
in response to the slot value being the target entity and the type of the target entity being a second target type corresponding to the slot, or in response to the slot value being the target entity and the target entity being an entity in a predefined entity library, executing the target task according to the intention.
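The slot check just described can be summarized in a few lines; a hedged sketch (the parameter names and types are assumptions):

```python
def should_execute(slot_value, target_entity, entity_type, slot_type, entity_library) -> bool:
    """Execute the task only if the slot value is the target entity and either
    its type matches the slot's expected type or it is in the predefined library."""
    if slot_value != target_entity:
        return False
    return entity_type == slot_type or target_entity in entity_library
```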
In the embodiments of the disclosure, a first sentence corresponding to an input first voice signal is acquired, where the first voice signal is used to control execution of a target task. Target inference information, a phrase describing an entity in an indirect manner, is acquired from the first sentence; that is, the phrase that indirectly describes the target entity is acquired first, and the target entity corresponding to the target inference information is then acquired according to that phrase. In this way, the implicit target entity is obtained even when the input sentence does not give it directly. The target inference information in the first sentence is then replaced with the target entity to obtain a second sentence, and the target task is executed based on the second sentence, which improves the accuracy of voice control. The method can be applied to a task-based dialog system, allowing the system to understand, through knowledge inference, sentences that describe the target entity indirectly. In addition, because the target inference information is first extracted from the first sentence and the corresponding target entity is acquired from it, the knowledge inference technique is applied to the target inference information rather than to the whole sentence, which reduces the computation and the inference difficulty caused by irrelevant semantics. Moreover, obtaining the exact entity by performing knowledge inference on the target inference information is equivalent to a disambiguation step for subsequent intention recognition and slot extraction, which improves the accuracy of semantic understanding.
FIG. 3 is a flowchart of a voice control method according to an embodiment of the present disclosure. This embodiment is described taking as an example the case where the electronic device executes the task corresponding to the voice. Referring to FIG. 3, the embodiment includes:
step 301: the electronic equipment acquires a first sentence corresponding to the input first voice signal, and the first voice signal is used for controlling the execution of the target task.
The target task refers to a specific operation that the user wants the electronic device to complete, for example, the task is an alarm setting operation, a weather query operation, a music playing operation, or other operations, which is not limited by this disclosure.
The implementation steps of the step comprise: the electronic device receives the input first voice signal, and performs Automatic Speech Recognition (ASR) on the first voice signal to obtain a first sentence corresponding to the first voice signal. For example, the first voice signal is "i want to listen to the song sung by a senior citizen" and the electronic device recognizes that the first sentence is the text information "i want to listen to the song by a senior citizen". If the first voice signal is "inquiring exchange rate of Chinese currency to money in the United states", the electronic device recognizes that the first sentence is the text message "inquiring exchange rate of Chinese currency to money in the United states".
In the embodiment of the disclosure, the electronic device performs voice recognition on the first voice signal to obtain the first sentence corresponding to the first voice signal, so as to facilitate subsequent processing.
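For illustration, a minimal ASR sketch using the third-party SpeechRecognition package (the package, file name, and recognition service are assumptions; the patent does not name a specific ASR engine):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("first_voice_signal.wav") as source:  # hypothetical recording
    audio = recognizer.record(source)

# Transcribe Mandarin speech into the first sentence.
try:
    first_sentence = recognizer.recognize_google(audio, language="zh-CN")
except sr.UnknownValueError:
    first_sentence = ""  # the recognizer could not understand the audio
print(first_sentence)
```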
Step 302: the electronic device obtains an entity designation in the first sentence.
An entity designation is a word in a natural language sentence that may correspond to an entity in the knowledge base; for example, the entity designation "B" in the sentence "I want to listen to a song of B" corresponds to the entity <B> in the knowledge base. As another example, in the sentence "play C's song 'Unique'", the entity designation "C" corresponds to the entity <C> in the knowledge base, and the entity designation "Unique" corresponds to the entity <Unique> in the knowledge base.
In a possible implementation manner, the electronic device obtains the entity designation in the first sentence through the following steps (1) and (2):
(1) The electronic device performs word segmentation processing on the first sentence to obtain the segmented words in the first sentence.
The electronic device can perform word segmentation on the first sentence with any word segmentation tool; for example, it may use a word segmentation model. Correspondingly, this step can be: the electronic device inputs the first sentence into the word segmentation model and obtains the segmented words output by the model.
The word segmentation model may be Sogou word segmentation, a general Chinese word segmentation toolkit, the Simple Chinese Word Segmentation system (SCWS), Tencent Wenzhi, Pangu word segmentation, or another word segmentation model, which is not limited in this disclosure.
For example, segmenting the first sentence "I want to listen to the song sung by A's husband" may yield the segmented words: "I", "want", "listen", "A", "of", "husband", "sing", "of", "song". As another example, segmenting the first sentence "query the exchange rate of Chinese currency against United States money" may yield: "query", "China", "currency", "against", "the United States", "of", "money", "exchange rate".
In the embodiment of the disclosure, the electronic device performs word segmentation on the first sentence through the word segmentation model with high efficiency, so the entity designation is also obtained efficiently.
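A minimal segmentation sketch using the open-source jieba tokenizer as a stand-in (jieba itself is an assumption; it is not one of the tools named above):

```python
import jieba

# "I want to listen to the song sung by A's husband"
first_sentence = "我想听A的老公唱的歌"
segmented_words = jieba.lcut(first_sentence)
print(segmented_words)  # e.g. ['我', '想', '听', 'A', '的', '老公', '唱', '的', '歌']
```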
(2) The electronic device takes the target segmented words among the segmented words as entity designations.
In one possible implementation, the target segmented word may be a segmented word whose part of speech is the target part of speech. Correspondingly, this step can be: the electronic device takes the segmented words whose part of speech is the target part of speech as entity designations. The target part of speech can be preset in the electronic device and can be any part of speech; for example, the target part of speech is a noun, though it may of course be another part of speech, which is not limited by this disclosure.
For example, taking the target part of speech as a noun, the entity designations that the electronic device obtains from the segmented words "I", "want", "listen", "A", "of", "husband", "sing", "of", "song" of the first sentence "I want to listen to the song sung by A's husband" include: "A", "husband", "song".
In the embodiment of the disclosure, taking the segmented words whose part of speech is the target part of speech as entity designations is simple and easy to implement.
In another possible implementation, the target segmented word may be a segmented word whose type is the first target type. Correspondingly, this step can be: the electronic device takes the segmented words whose type is the first target type as entity designations.
In this step, the electronic device may obtain the segmented words of the first target type through Named Entity Recognition (NER). The first target type may be preset in the electronic device and may be any type; for example, a person name, a place name, an organization name, or a proper noun. Of course, the first target type may also be another type, such as a time, which is not limited by this disclosure.
For example, taking the first target type as a person name, the entity designation that the electronic device obtains from the segmented words "I", "want", "listen", "A", "of", "husband", "sing", "of", "song" of the first sentence "I want to listen to the song sung by A's husband" is: "A".
In the embodiment of the disclosure, taking the segmented words whose type is the first target type as entity designations is simple and easy to implement.
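The part-of-speech filter and the type filter can share one tagger pass; a sketch using jieba's POS tags as a stand-in (the tag set is jieba's and is an assumption: tags starting with 'n' mark nouns, 'nr' person names, 'ns' place names):

```python
import jieba.posseg as pseg

TARGET_POS_PREFIX = "n"            # target part of speech: nouns
FIRST_TARGET_TYPES = {"nr", "ns"}  # first target type: person / place names

def designations_by_pos(sentence: str) -> list:
    """Take segmented words whose part of speech is the target part of speech."""
    return [p.word for p in pseg.lcut(sentence) if p.flag.startswith(TARGET_POS_PREFIX)]

def designations_by_type(sentence: str) -> list:
    """Take segmented words whose type is the first target type."""
    return [p.word for p in pseg.lcut(sentence) if p.flag in FIRST_TARGET_TYPES]
```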
In another possible implementation, the target segmented word may be a segmented word that matches an entity in a predefined entity library. Correspondingly, this step can be: the electronic device takes the segmented words that match entities in the predefined entity library as entity designations.
This is implemented as follows: for each segmented word, the electronic device determines the similarity between the segmented word and each entity in the predefined entity library, and determines from those similarities whether the library contains an entity whose similarity to the segmented word is greater than a preset threshold; in response to such an entity existing, the segmented word is determined to be an entity designation; in response to no such entity existing, the segmented word is determined not to be an entity designation.
Further, the electronic device may use vector similarity as the similarity between two words. Correspondingly, determining the similarity between each entity in the predefined entity library and the segmented word includes: for each entity, determining a first vector corresponding to the entity and a second vector corresponding to the segmented word, determining the vector similarity between the first vector and the second vector, and taking that vector similarity as the similarity between the entity and the segmented word.
In the embodiment of the disclosure, since an entity designation is a word in the natural language sentence that may correspond to an entity in the knowledge base, taking the segmented words that match entities in the predefined entity library as entity designations ensures the accuracy of the obtained entity designations.
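A sketch of the vector-similarity match against the predefined entity library (the 0.8 threshold and the source of the vectors are assumptions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_entity_designation(word_vector: np.ndarray,
                          entity_vectors: list,
                          threshold: float = 0.8) -> bool:
    """A segmented word is an entity designation if some entity in the
    predefined library exceeds the similarity threshold."""
    return any(cosine(word_vector, ev) > threshold for ev in entity_vectors)
```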
Step 303: the electronic device acquires a syntax tree of the first sentence, where the syntax tree comprises a plurality of nodes and the grammatical relationships between them, and each node corresponds to a word in the first sentence.
The syntax tree describes the relationships between the words contained in the first sentence. The electronic device parses the first sentence into a syntax tree in which each node corresponds to a word of the first sentence, and the position of a word among the nodes is determined by the relative relationships between the words. Through its tree structure, the syntax tree describes the dependency relationships between the words of the first sentence, that is, their syntactic collocation, which is associated with the semantics.
In one possible implementation, the electronic device obtains the syntax tree of the first sentence as follows: the electronic device inputs the first sentence into a syntax tree model and obtains the syntax tree output by the model. The syntax tree model may be TreeLSTM (one syntax tree model) or TBCNN (another syntax tree model); of course, it may also be another model, which is not limited by this disclosure. Obtaining the syntax tree through a syntax tree model is simple and easy to implement, and the accuracy of the syntax tree can be ensured.
For example, if the first sentence is "I want to listen to the song sung by A's husband", the electronic device generates the syntax tree corresponding to the first sentence as shown in FIG. 4.
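For illustration, a dependency tree can also be obtained with an off-the-shelf parser; a sketch using spaCy's Chinese pipeline (spaCy and the zh_core_web_sm model are assumptions, not the TreeLSTM/TBCNN models named above):

```python
# Requires: pip install spacy && python -m spacy download zh_core_web_sm
import spacy

nlp = spacy.load("zh_core_web_sm")
doc = nlp("我想听A的老公唱的歌")  # "I want to listen to the song sung by A's husband"
for token in doc:
    # Each token is a node; token.head and token.dep_ give the tree edge.
    print(token.text, token.dep_, "<-", token.head.text)
```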
Step 304: the electronic device acquires the target inference information based on the entity designation and the syntax tree.
The target inference information is contained in the first sentence and is a phrase that describes an entity in an indirect manner; it corresponds to an implicit entity. To acquire the implicit entity, the information to be inferred, that is, the target inference information, must first be acquired from the first sentence. For example, if the first sentence is "query the weather in the Russian capital tomorrow", the target inference information is "Russian capital"; if the first sentence is "I want to listen to the song sung by A's husband", the target inference information is "A's husband".
In one possible implementation, the electronic device acquires the target inference information according to the entity designation and the syntax tree through the following steps (1) to (3):
(1) The electronic device takes the node corresponding to the entity designation as the first base node of the syntax tree.
(2) The electronic device selects a first target node from among the adjacent nodes of the first base node in the syntax tree, and splices the first target word corresponding to the first target node with the entity designation to obtain the first inference information of the syntax tree.
The first target word and the entity designation satisfy a target grammatical relationship and are adjacent in the first sentence; the target grammatical relationship may include one of the centering relationship, the adverbial relationship, and the additional relationship.
(3) The electronic device acquires the target inference information according to the first inference information, through the following steps (A) and (B):
(A) The electronic device takes the node corresponding to the first inference information as the second base node of the syntax tree and searches the adjacent nodes of the second base node for a second target node. In response to a second target node existing among the adjacent nodes of the second base node in the syntax tree, the electronic device splices the second target word corresponding to the second target node with the first inference information to obtain the second inference information of the syntax tree, and acquires the target inference information according to the second inference information.
The second target word and the first inference information satisfy the target grammatical relationship and are adjacent in the first sentence.
(B) In response to no second target node existing among the adjacent nodes of the second base node in the syntax tree, the electronic device takes the first inference information as the target inference information.
It should be noted that acquiring the target inference information according to the second inference information is similar to acquiring it according to the first inference information, and is not repeated here.
For example, referring to FIG. 4, the first sentence is "I want to listen to the song sung by A's husband". The entity designation the electronic device obtains from the first sentence is "A", and the node corresponding to "A" is taken as the first base node of the syntax tree; its adjacent nodes include node 1 and node 2, where the word of node 1 is "of" and the word of node 2 is "husband". The relationship between "of" and "A" is an additional relationship, and "of" and "A" are adjacent in the first sentence, so node 1 is the first target node; the relationship between "husband" and "A" is a centering relationship, but "husband" and "A" are not adjacent in the first sentence, so node 2 is not a first target node. After the electronic device splices the word of the first target node with the entity designation, the obtained first inference information is "A's".
Then the electronic device takes node 0 and node 1, corresponding to the first inference information "A's", as the second base node of the syntax tree; the adjacent node is node 2. The relationship between the word "husband" of node 2 and "A's" is an additional relationship, and they are adjacent in the first sentence, so node 2 is a second target node, and the electronic device splices "husband" with the first inference information "A's" to obtain the second inference information "A's husband". The electronic device then takes node 0, node 1, and node 2, corresponding to the second inference information "A's husband", as the third base node of the syntax tree; the adjacent node is node 3. The relationship between the word "sung" of node 3 and "A's husband" is a subject-predicate relationship, so node 3 is not a third target node. In response to no third target node existing among the adjacent nodes of the third base node in the syntax tree, the electronic device takes the second inference information "A's husband" as the target inference information.
As another example, referring to FIG. 5, the first sentence is "query the exchange rate of Chinese currency against United States money". The entity designations the electronic device obtains from the first sentence are "China" and "the United States", and the nodes corresponding to them are taken as the first base nodes of the syntax tree. Their adjacent nodes include node 2, node 3, and node 4, where the word of node 2 is "of", the word of node 3 is "money", and the word of node 4 is "currency". The relationship between "of" and "the United States" is an additional relationship, and "of" and "the United States" are adjacent in the first sentence, so node 2 is a first target node; the relationship between "money" and "the United States" is a centering relationship, but "money" and "the United States" are not adjacent in the first sentence, so node 3 is not a first target node. The relationship between "currency" and "China" is a centering relationship, and "currency" and "China" are adjacent in the first sentence, so node 4 is also a first target node. After the electronic device splices the words of the first target nodes with the entity designations, the obtained first inference information includes "the United States'" and "Chinese currency".
Then the electronic device takes node 0, node 1, node 2, and node 4, corresponding to the first inference information "the United States'" and "Chinese currency", as the second base nodes of the syntax tree; the adjacent nodes include node 3 and node 5. The relationship between the word "money" of node 3 and "the United States'" is a centering relationship, and they are adjacent in the first sentence, so node 3 is a second target node; the relationship between the word "exchange rate" of node 5 and "Chinese currency" is a centering relationship, but they are not adjacent in the first sentence, so node 5 is not a second target node. Since no second target node is found among the adjacent nodes of node 0 and node 4, the electronic device takes the first inference information "Chinese currency" as one piece of target inference information. The electronic device then splices "money" with "the United States'" to obtain the second inference information "the United States' money", and takes node 1, node 2, and node 3, corresponding to "the United States' money", as the third base nodes of the syntax tree, whose adjacent nodes include node 6 and node 7. The word of node 6 does not satisfy the target grammatical relationship with "the United States' money", so node 6 is not a third target node; the word of node 7 satisfies an additional relationship with "the United States' money" and is adjacent to it in the first sentence, so node 7 is a third target node, and the electronic device splices its word with "the United States' money" to obtain the third inference information. Taking the node corresponding to the third inference information as the fourth base node, the only adjacent node is node 6, which is not a fourth target node; since no fourth target node exists among the adjacent nodes of the fourth base node in the syntax tree, the electronic device takes "the United States' money" as target inference information. The target inference information for "query the exchange rate of Chinese currency against United States money" thus includes "Chinese currency" and "the United States' money".
For another example, referring to fig. 6, the first sentence is "i want to listen to a song of a old man", the entity obtained by the electronic device from the first sentence is referred to as "a", the node corresponding to "a" is used as the first base node of the syntax tree, the adjacent node is node 1, the term of node 1 is "old man", the relationship between "old man" and "a" is a middle relationship, and "old man" and "a" are adjacent in the first sentence, node 1 is the first target node, and the electronic device splices the term of the first target node and the term of the first base node to obtain the first inference term of "a old man".
Then the electronic device takes a node 0 and a node 1 corresponding to the first inference information "a-old-man" as second basic nodes of the syntax tree, the adjacent nodes comprise a node 2 and a node 3, the relationship between the word "of the node 2 and" a-old-man "is an additional relationship, the first statement is adjacent, the node 2 is a second target node, the relationship between the word" song "of the node 3 and" a-old-man "is a neutral relationship, but the" song "and" a-old-man "are not adjacent in the first statement, the node 3 is not a second target node, and the electronic device splices the" song "and the first inference information" a-old-man "to obtain second inference information" a-old-man ". Then the electronic equipment takes the node 0, the node 1 and the node 2 corresponding to the second inference information ' A old man ' as a third basic node of the syntax tree, the adjacent node is the node 3, the relation between the words ' song ' of the node 3 and ' A old man ' is a fixed relation, the ' song ' and the ' A old man ' are adjacent in the first semantic signal, the node 3 is a third target node, and the electronic equipment splices the ' song ' and the second inference information ' A old man ' to obtain the third inference information ' A old man ' song '. Then, the electronic device takes the node corresponding to the third inference information as a fourth basic node, if the adjacent node is only the node 4, and the node 4 is not the fifth target node, the electronic device responds that the fourth target node does not exist in the adjacent nodes of the fourth basic node in the syntax tree, and takes the third inference information "a song of old and well-known" as the target inference information.
In the embodiment of the disclosure, the target inference information is a phrase that describes the target entity in an indirect manner, and the syntax tree describes the dependency relations between the words in a sentence, that is, the syntactic collocations that carry its semantics. By acquiring the entity designation in the first sentence, acquiring the syntax tree of the first sentence, and combining the two, the electronic device acquires the target inference information, thereby ensuring the accuracy of the target inference information.
Step 305: the electronic device acquires, according to the target inference information, a target entity corresponding to the target inference information.
In one possible implementation manner, this step includes: the electronic device acquires a first entity in the target inference information and a first attribute relationship of the first entity; the electronic device constructs a query statement according to the first entity and the first attribute relationship; queries, through the query statement, a first attribute value corresponding to the query statement; and the electronic device takes the first attribute value as the target entity.
The electronic device may acquire the first entity and the first attribute relationship of the first entity in the target inference information by querying a preset dictionary, where the dictionary stores entities and the attribute relationships of entities. For example, the entities stored in the dictionary include "A", "B", and "C", and the stored attribute relationships include "husband", "wife", "work-song", "work-movie", and the like. The above entities and attribute relationships are merely exemplary and are not intended to limit the present disclosure.
Taking the target inference information "song of A's husband" as an example, the step of acquiring the first entity and the first attribute relationship of the first entity includes: the electronic device finds in the dictionary the first entity "A" corresponding to the segment "A", the first attribute relationship "husband" corresponding to the segment "husband", and the first attribute relationship "work-song" corresponding to the segment "song", and determines that the first entity and the first attribute relationships included in "song of A's husband" are <A>, <husband>, and <work-song>.
The electronic device may query the knowledge graph for the first attribute value corresponding to the query statement.
In combination with the above example, the electronic device constructs the first entity <A> and the first attribute relationship <husband> into the query statement "<husband> of <A>" and obtains from the knowledge graph the corresponding attribute value <B>; the electronic device then constructs <B> and <work-song> into the query statement "<work-song> of <B>", obtains from the knowledge graph the corresponding first attribute value <rice fragrance>, and takes "rice fragrance" as the target entity.
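The chained lookup above can be pictured with a toy knowledge graph; the dictionary-based graph, the resolve() helper, and the "<relation> of <entity>" query format below are illustrative assumptions, not a fixed interface of the embodiment.

```python
# Minimal sketch of chaining knowledge-graph lookups, hop by hop.
knowledge_graph = {
    ("A", "husband"): ["B"],
    ("B", "work-song"): ["rice fragrance"],
}

def resolve(entity, relations):
    """Build one query statement per hop and feed each attribute value
    into the next lookup."""
    current = [entity]
    for relation in relations:
        next_values = []
        for value in current:
            # the query statement here is f"{relation} of {value}"
            next_values.extend(knowledge_graph.get((value, relation), []))
        current = next_values        # attribute values become the next entities
    return current                   # the final first attribute value(s)

print(resolve("A", ["husband", "work-song"]))   # ['rice fragrance']
```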
In the embodiment of the disclosure, the electronic device acquires the first entity in the target inference information and the first attribute relationship of the first entity, constructs the query statement according to the first entity and the first attribute relationship, acquires the first attribute value corresponding to the query statement through the query statement, and uses the first attribute value as the target entity.
In another possible implementation manner, the step of acquiring, by the electronic device, the target entity corresponding to the target inference information according to the target inference information includes: the electronic device selects, from a relational statement library according to the target inference information, a target relational statement with the highest similarity to the target inference information, acquires a second attribute value corresponding to the target relational statement, and takes the second attribute value as the target entity.
The relational statement library is configured to store a plurality of relational statements, where each relational statement includes an entity and an attribute relationship of the entity; the structure of a relational statement may be "<attribute relationship> of <entity>", for example, "wife of C".
The electronic device may create the relational statement library in advance, and the creating step may include: the electronic device acquires all entities and attribute relationships in the knowledge graph, splices them into artificial sentences following the sentence pattern "<attribute relationship> of <entity>" to obtain relational statements, and sets the attribute value corresponding to each relational statement. For example, the entity "A" and the attribute relationship "husband" in the knowledge graph are spliced into the relational statement "husband of A", whose corresponding attribute value is set to "B". For another example, the entity "A", the attribute relationship "husband", and the attribute relationship "work-song" are spliced into the relational statement "song of the husband of A", whose corresponding attribute value is set to "rice fragrance".
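A sketch of this pre-building step, under the same toy knowledge graph as before, follows; the two-hop chaining and the naming are assumptions made for illustration.

```python
# Sketch of pre-building the relational statement library from a knowledge graph.
knowledge_graph = {
    ("A", "husband"): ["B"],
    ("B", "work-song"): ["rice fragrance", "qilixiang"],
}

def build_statement_library(graph):
    """Splice every entity/attribute-relation pair into a relational
    statement and record its attribute values, then chain one more hop
    to cover statements like 'song of the husband of A'."""
    library = {}
    for (entity, relation), values in graph.items():
        statement = f"{relation} of {entity}"        # e.g. "husband of A"
        library[statement] = values
        for (head, relation2), values2 in graph.items():
            if head in values:                        # second hop
                library[f"{relation2} of {statement}"] = values2
    return library

library = build_statement_library(knowledge_graph)
print(library["work-song of husband of A"])   # ['rice fragrance', 'qilixiang']
```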
Further, the implementation of selecting, from the relational statement library according to the target inference information, the target relational statement with the highest similarity to the target inference information may include: the electronic device acquires a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library, and selects the target relational statement from the relational statement library according to the first feature vector and the second feature vector corresponding to each relational statement.
Optionally, the electronic device may input the target inference information and each relational statement in the relational statement library into the feature vector model, and generate a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement through the feature vector model.
One implementation of selecting the target relational statement from the relational statement library according to the first feature vector and the second feature vector corresponding to each relational statement is as follows: the electronic device calculates the vector similarity between the first feature vector and each second feature vector, and takes the relational statement corresponding to the second feature vector with the highest vector similarity as the target relational statement. Vector similarity reflects the similarity between text sentences well, so acquiring the target relational statement through vector similarity ensures its accuracy.
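The selection step might look as follows, with simple bag-of-words vectors standing in for the feature vector model, whose architecture the embodiment leaves open; the library contents repeat the running example.

```python
# Sketch of target-statement selection by cosine similarity of word-count vectors.
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(count * v[word] for word, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def pick_target_statement(inference_info, library):
    """Return the relational statement whose second feature vector is
    most similar to the first feature vector of the inference info."""
    first_vector = vectorize(inference_info)
    return max(library, key=lambda s: cosine(first_vector, vectorize(s)))

library = {"husband of A": ["B"],
           "song of the husband of A": ["rice fragrance"]}
print(pick_target_statement("song of A's husband", library))
# song of the husband of A
```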
For example, the target inference information is "song of A's husband"; the target relational statement with the highest similarity to it acquired by the electronic device is "song of the husband of A", and the electronic device takes the second attribute value "rice fragrance" corresponding to the target relational statement as the target entity.
It should be noted that the present disclosure describes the process of acquiring the target entity by taking the knowledge graph as the knowledge source as an example; of course, the knowledge graph may be replaced by a relational database, an online encyclopedia, or the like as the knowledge source, which is not limited by the present disclosure.
It should be noted that an attribute relationship of an entity may correspond to multiple attribute values, and a relational statement may likewise correspond to multiple second attribute values; for example, the second attribute values corresponding to the relational statement "song of the husband of A" may include "rice fragrance", "qilixiang", and the like. Accordingly, multiple target entities may be obtained through the above steps; for example, the target entities corresponding to the target inference information "song of A's husband" may include "rice fragrance", "qilixiang", and the like, which is not limited in this disclosure.
In the embodiment of the disclosure, the electronic device selects, from the relational statement library according to the target inference information, the target relational statement with the highest similarity to the target inference information, acquires the second attribute value corresponding to the target relational statement, and takes the second attribute value as the target entity.
It should be noted that the method first acquires the target inference information and then performs knowledge inference on it to obtain the implicit target entity. By introducing knowledge inference, the task-based dialog system can understand sentences in which the target entity is expressed in an indirect form. Moreover, the target inference information is constructed through entity designation recognition and syntactic dependency analysis, and knowledge inference is applied to the target inference information rather than to the whole sentence, which reduces both the amount of computation and the inference difficulty caused by irrelevant semantics.
In addition, performing knowledge inference on the target inference information yields the implied, precise entity, which is equivalent to a disambiguation operation for the subsequent intention recognition and slot extraction; this improves the accuracy of intention recognition and slot extraction, and thus the accuracy of voice control.
Step 306: the electronic device replaces the target inference information in the first sentence with the target entity to obtain a second sentence.
For example, the first sentence is "I want to listen to a song of A's husband", the target inference information is "a song of A's husband", the target entity corresponding to the target inference information is "rice fragrance", and the second sentence is "I want to listen to rice fragrance".
In the embodiment of the disclosure, the target inference information is a phrase describing the target entity in an indirect manner, that is, the target inference information substantially corresponds to the target entity. The electronic device therefore replaces the target inference information in the first sentence with the target entity to obtain the second sentence, subsequently acquires the intention of the second sentence, and executes the task corresponding to the second sentence according to that intention, which improves the accuracy of semantic understanding.
It should be noted that there may be multiple second sentences. For example, if the target inference information is "a song of A's husband" and the corresponding target entities include "rice fragrance" and "qilixiang", the second sentences include "I want to listen to rice fragrance" and "I want to listen to qilixiang".
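Step 306 reduces to a string substitution per target entity, as the following sketch shows; the sentence strings repeat the running example, and the code itself is only illustrative.

```python
# Substituting the target entity for the target inference information;
# several target entities simply yield several second sentences.
first_sentence = "I want to listen to a song of A's husband"
inference_info = "a song of A's husband"
target_entities = ["rice fragrance", "qilixiang"]

second_sentences = [first_sentence.replace(inference_info, entity)
                    for entity in target_entities]
print(second_sentences)
# ['I want to listen to rice fragrance', 'I want to listen to qilixiang']
```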
Step 307: the electronic device executes the target task based on the second statement.
In one possible implementation, the electronic device acquires the intention of the second sentence, the intention including the slot value of a slot, and the electronic device can directly execute the target task according to the intention.
The electronic device may preset a plurality of syntax rules, each syntax rule corresponding to an intention, and acquire the intention of the second sentence according to the preset syntax rules. Accordingly, the step of acquiring the intention of the second sentence includes: the electronic device determines the syntax rule satisfied by the second sentence, takes the intention corresponding to that syntax rule as the intention of the second sentence, and acquires the slot value of the intention's slot from the second sentence according to the syntax rule.
For example, a syntax rule may be "I want to listen to {song name}", whose corresponding intention is playing music; a syntax rule may also be "query weather of {city} {date}", whose corresponding intention is querying the weather. The format of the syntax rules is merely an example; syntax rules may take other formats, which is not limited in this disclosure.
Taking the second sentence "I want to listen to rice fragrance" as an example, determining the syntax rule satisfied by the second sentence and taking the corresponding intention as the intention of the second sentence may be implemented as follows: the electronic device parses the syntax of the second sentence, determines that it satisfies the rule "I want to listen to {song name}", takes the corresponding intention, playing music, as the intention of the second sentence, and then takes the entity "rice fragrance" corresponding to the slot "song name" as the slot value of that slot.
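One plausible encoding of such syntax rules is a regular expression per rule whose named groups are the slots; the sketch below makes that assumption and reuses the two example rules.

```python
# Sketch of rule-based intention matching with regex-encoded syntax rules.
import re

SYNTAX_RULES = [
    (re.compile(r"^i want to listen to (?P<song_name>.+)$"), "play music"),
    (re.compile(r"^query weather of (?P<city>\S+) (?P<date>\S+)$"), "query weather"),
]

def match_intent(second_sentence):
    """Return the intention of the first rule the sentence satisfies,
    together with the slot values captured by the rule."""
    for pattern, intent in SYNTAX_RULES:
        match = pattern.match(second_sentence.lower())
        if match:
            return intent, match.groupdict()
    return None, {}

print(match_intent("I want to listen to rice fragrance"))
# ('play music', {'song_name': 'rice fragrance'})
```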
In the embodiment of the disclosure, the electronic device obtains the intention of the second sentence according to the preset syntax rule, and the method is simple and easy to implement.
In another possible implementation manner, the electronic device may acquire the intention of the second sentence through an intention recognition model. Accordingly, the step of acquiring the intention of the second sentence may be: the electronic device inputs the second sentence into the intention recognition model to obtain the intention of the second sentence output by the model, and acquires the slot value of the intention's slot from the second sentence through a sequence labeling model.
In the embodiment of the disclosure, the electronic device acquires the intention of the second sentence through the intention recognition model, and the efficiency of acquiring the intention can be greatly improved.
In another possible implementation manner, the electronic device may acquire the intention of the second sentence through an intention keyword in the second sentence. Accordingly, the step of acquiring the intention of the second sentence may be: the electronic device acquires the intention keyword in the second sentence and the entities corresponding to the second sentence, takes the intention corresponding to the intention keyword as the intention of the second sentence, and identifies the slot values of the intention's slots from the entities corresponding to the second sentence. The intention keywords may be preset in the electronic device for determining the intention of a voice signal.
For example, the intention keywords preset in the electronic device include "weather", "play", and "remind", where "weather" corresponds to the intention of querying the weather, "play" corresponds to the intention of playing music, and "remind" corresponds to the intention of setting an alarm. Taking the second sentence "query weather of Moscow tomorrow" as an example, the electronic device acquires the intention keyword "weather" in the sentence and the entities "Moscow" and "tomorrow" corresponding to the sentence, takes the intention of querying the weather corresponding to "weather" as the intention of the second sentence, and identifies from the entities the slot value "Moscow" of the slot "city" and the slot value "tomorrow" of the slot "date".
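A sketch of this keyword-based variant follows; the keyword table and the entity-typing stub are assumptions for illustration.

```python
# Sketch of keyword-based intention detection with entity-driven slot filling.
INTENT_KEYWORDS = {"weather": "query weather",
                   "play": "play music",
                   "remind": "set alarm"}

def entity_type(entity):
    # stand-in for a real entity-typing component
    return {"Moscow": "city", "tomorrow": "date"}.get(entity)

def keyword_intent(second_sentence, entities):
    intent = next((INTENT_KEYWORDS[word]
                   for word in second_sentence.lower().split()
                   if word in INTENT_KEYWORDS), None)
    slots = {entity_type(e): e for e in entities if entity_type(e)}
    return intent, slots

print(keyword_intent("query weather of Moscow tomorrow", ["Moscow", "tomorrow"]))
# ('query weather', {'city': 'Moscow', 'date': 'tomorrow'})
```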
In the embodiment of the disclosure, the electronic device obtains the intention of the second sentence through the intention keyword in the second sentence, and the method is simple and easy to implement.
The electronic device performs slot verification after acquiring the intention of the second sentence, executes the target task according to the intention in response to the slot verification passing, and does not execute the target task corresponding to the intention in response to the slot verification failing.
The slot verification performed by the electronic device may include: the electronic device determines that the slot verification passes in response to the slot value being the target entity and the type of the target entity being the second target type corresponding to the slot; or determines that the slot verification passes in response to the slot value being the target entity and the target entity being an entity in a predefined entity library.
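The two verification conditions can be expressed directly, as in the sketch below; the slot type table and the predefined entity library are illustrative.

```python
# Sketch of the two slot-verification conditions described above.
SLOT_TARGET_TYPES = {"song name": "song"}          # slot -> second target type
ENTITY_LIBRARY = {"rice fragrance", "qilixiang"}   # predefined entity library

def verify_slot(slot, slot_value, target_entity, target_entity_type):
    if slot_value != target_entity:
        return False
    # condition 1: the target entity's type matches the slot's target type
    if target_entity_type == SLOT_TARGET_TYPES.get(slot):
        return True
    # condition 2: the target entity is in the predefined entity library
    return target_entity in ENTITY_LIBRARY

print(verify_slot("song name", "rice fragrance", "rice fragrance", "song"))
# True
```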
In the embodiment of the disclosure, the electronic device may perform slot verification first after acquiring the intention of the second sentence and execute the target task according to the intention only in response to the slot verification passing, thereby improving the accuracy of voice control.
The electronic device may execute the target task according to the intention as follows: the electronic device inputs the slot value of the intention's slot as a parameter into a target Skill Server module of an interaction model stored in the electronic device, and executes the target task through the target Skill Server module, where the target Skill Server module is the Skill Server module corresponding to the intention and is used to execute the task corresponding to the intention.
For example, the first sentence is "I want to listen to a song of A's husband", the second sentence obtained by the above method is "I want to listen to rice fragrance", the intention of the second sentence is playing music, and the slot information is song name: rice fragrance. The execution step includes: the electronic device inputs "rice fragrance" as a parameter into the first Skill Server module, and the operation of playing the song "rice fragrance" is executed through the first Skill Server module, where the first Skill Server module is the Skill Server module corresponding to the intention of playing music.
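Dispatching to the Skill Server module might be sketched as follows; the registry and the execute() interface are assumptions, since the patent does not specify the module's programming interface.

```python
# Sketch of dispatching the intention to its Skill Server module.
class PlayMusicSkill:
    def execute(self, song_name):
        return f"playing the song {song_name}"

SKILL_SERVERS = {"play music": PlayMusicSkill()}   # intention -> module

def run_target_task(intent, slots):
    """Pass the slot values as parameters into the Skill Server module
    corresponding to the intention and execute the target task."""
    return SKILL_SERVERS[intent].execute(**slots)

print(run_target_task("play music", {"song_name": "rice fragrance"}))
# playing the song rice fragrance
```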
In the embodiment of the disclosure, a first sentence corresponding to an input first voice signal is acquired, where the first voice signal is used to control the execution of a target task, and target inference information, a phrase that describes an entity in an indirect manner, is acquired from the first sentence. That is, a phrase describing the target entity indirectly is obtained first, and the target entity corresponding to the target inference information is then acquired according to it, so that the implicit target entity is obtained even when the input sentence does not give the target entity directly. The target inference information in the first sentence is then replaced with the target entity to obtain a second sentence, and the target task is executed based on the second sentence, which improves the accuracy of voice control. The method can be applied to a task-based dialog system, which can then understand, through knowledge inference, sentences that describe the target entity in an indirect manner. In addition, the method first acquires the target inference information from the first sentence and then acquires the corresponding target entity according to it; that is, knowledge inference is applied to the target inference information rather than to the whole sentence, which reduces both the amount of computation and the inference difficulty caused by irrelevant semantics. Finally, performing knowledge inference on the target inference information yields a precise entity, which is equivalent to a disambiguation operation for the subsequent intention recognition and slot extraction, thereby improving the accuracy of semantic understanding.
Fig. 7 is a flowchart of a voice control method according to an embodiment of the present disclosure. In the embodiment of the present disclosure, an example in which the electronic device triggers the server to execute a task corresponding to a voice is described. Referring to fig. 7, the embodiment includes:
step 701: the electronic equipment receives an input first voice signal, and the first voice signal is used for controlling the execution of a target task.
Step 702: the electronic device sends the first voice signal to a server.
The server may be a background server of a target application installed on the electronic device, where the target application has a function of voice interaction.
Step 703: the server receives the first voice signal and obtains a first sentence corresponding to the first voice signal.
The implementation manner of this step is the same as that of the electronic device in step 301, and is not described herein again.
It should be noted that the electronic device may also obtain the first sentence corresponding to the first voice signal itself and directly send the first sentence to the server, in which case the server receives the first sentence.
Step 704: the server obtains entity names in the first statement and obtains a syntax tree of the first statement, wherein the syntax tree comprises a plurality of nodes and a syntax relation among the nodes, and each node corresponds to one word in the first statement.
The implementation manner of this step is the same as that of the electronic device in steps 302 and 303, and is not described herein again.
Step 705: the server obtains target inference information based on the entity designations and the syntax tree.
The implementation manner of this step is the same as that of the electronic device in step 304, and is not described herein again.
Step 706: and the server acquires a target entity corresponding to the target reasoning information according to the target reasoning information.
The implementation manner of this step is the same as that of the electronic device in step 305, and is not described herein again.
Step 707: and the server replaces the target inference information in the first statement with the target entity to obtain a second statement.
The implementation manner of this step is the same as that of the electronic device in step 306, and is not described herein again.
Step 708: and the server executes the target task based on the second statement to obtain a task result.
The implementation manner of this step is the same as that of the electronic device in step 307, and is not described herein again. For example, if the first voice signal is "I want to listen to a song of A's husband", the task result is the audio file of "rice fragrance".
Step 709: the server returns the task result to the electronic device.
For example, the server returns the audio file of "rice fragrance" to the electronic device.
Step 710: and the electronic equipment receives the task result and outputs the task result.
For example, the electronic device receives the audio file of "rice fragrance" and plays it.
It should be noted that steps 708 to 710 may be replaced by the following: the server acquires the intention of the second sentence, the intention including the slot value of a slot; in response to the slot value being the target entity and the type of the target entity being the second target type corresponding to the slot, or in response to the slot value being the target entity and the target entity being an entity in a predefined entity library, the server sends the intention of the second sentence to the electronic device; after receiving the intention of the second sentence, the electronic device executes the target task according to the intention.
In the embodiment of the disclosure, a first sentence corresponding to an input first voice signal is acquired, where the first voice signal is used to control the execution of a target task, and target inference information, a phrase that describes an entity in an indirect manner, is acquired from the first sentence. That is, a phrase describing the target entity indirectly is obtained first, and the target entity corresponding to the target inference information is then acquired according to it, so that the implicit target entity is obtained even when the input sentence does not give the target entity directly. The target inference information in the first sentence is then replaced with the target entity to obtain a second sentence, and the target task is executed based on the second sentence, which improves the accuracy of voice control. The method can be applied to a task-based dialog system, which can then understand, through knowledge inference, sentences that describe the target entity in an indirect manner. In addition, the method first acquires the target inference information from the first sentence and then acquires the corresponding target entity according to it; that is, knowledge inference is applied to the target inference information rather than to the whole sentence, which reduces both the amount of computation and the inference difficulty caused by irrelevant semantics. Finally, performing knowledge inference on the target inference information yields a precise entity, which is equivalent to a disambiguation operation for the subsequent intention recognition and slot extraction, thereby improving the accuracy of semantic understanding.
Fig. 8 is a block diagram of a voice control apparatus according to an embodiment of the present disclosure. Referring to fig. 8, the embodiment includes:
the first sentence acquisition module 801 is configured to acquire a first sentence corresponding to an input first voice signal, where the first voice signal is used to control execution of a target task.
A target inference information obtaining module 802 configured to obtain target inference information from the first sentence, the target inference information being a phrase describing the entity in an indirect manner.
A target entity obtaining module 803 configured to obtain a target entity corresponding to the target inference information according to the target inference information.
And the second statement acquisition module 804 is configured to replace the target inference information in the first statement with a target entity to obtain a second statement.
A task execution module 805 configured to execute the target task based on the second statement.
In a possible implementation manner, the target inference information obtaining module 802 is further configured to obtain an entity name in the first sentence, and obtain a syntax tree of the first sentence, where the syntax tree includes a plurality of nodes and a syntax relationship between each node, and each node corresponds to a word in the first sentence; and acquiring target reasoning information according to the entity designation and the syntax tree.
In another possible implementation, the target inference information obtaining module 802 is further configured to take the node corresponding to the entity designation as a first base node of the syntax tree; select a first target node from among the adjacent nodes of the first base node in the syntax tree, where the first target word corresponding to the first target node and the entity designation satisfy a target grammatical relationship, and the first target word and the entity designation are adjacent in the first sentence; splice the first target word with the entity designation to obtain first inference information of the syntax tree; and acquire the target inference information according to the first inference information.
In another possible implementation manner, the target inference information obtaining module 802 is further configured to take the nodes corresponding to the first inference information as second base nodes of the syntax tree; in response to a second target node existing among the adjacent nodes of the second base nodes in the syntax tree, splice the second target word corresponding to the second target node with the first inference information to obtain second inference information of the syntax tree and acquire the target inference information according to the second inference information, where the second target word and the first inference information satisfy the target grammatical relationship and are adjacent in the first sentence; and in response to no second target node existing among the adjacent nodes of the second base nodes in the syntax tree, take the first inference information as the target inference information.
In another possible implementation manner, the target inference information obtaining module 802 is further configured to perform word segmentation processing on the first sentence to obtain a word segmentation in the first sentence; and taking the participles with the part of speech as the target part of speech in the participles as entity names, or taking the participles with the type as the first target type in the participles as the entity names, or taking the participles matched with the entities in a predefined entity library in the participles as the entity names.
In another possible implementation manner, the target entity obtaining module 803 is further configured to obtain the first entity and the first attribute relationship of the first entity in the target inference information; constructing a query statement according to the first entity and the first attribute relation; inquiring a first attribute value corresponding to the inquiry statement through the inquiry statement; and taking the first attribute value as a target entity.
In another possible implementation manner, the target entity obtaining module 803 is further configured to select, according to the target inference information, a target relational statement from the relational statement library, where the similarity between the target relational statement and the target inference information is the highest; acquiring a second attribute value corresponding to the target relational statement; and taking the second attribute value as a target entity.
In another possible implementation manner, the target entity obtaining module 803 is further configured to obtain a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library; and selecting a target relational statement from the relational statement library according to the first characteristic vector and the second characteristic vector corresponding to each relational statement.
In another possible implementation, the task execution module 805 is further configured to acquire the intent of the second sentence, the intent including the slot value of a slot; and execute the target task according to the intent in response to the slot value being the target entity and the type of the target entity being the second target type corresponding to the slot, or in response to the slot value being the target entity and the target entity being an entity in a predefined entity library.
In the embodiment of the disclosure, a first sentence corresponding to an input first voice signal is acquired, where the first voice signal is used to control the execution of a target task, and target inference information, a phrase that describes an entity in an indirect manner, is acquired from the first sentence. That is, a phrase describing the target entity indirectly is obtained first, and the target entity corresponding to the target inference information is then acquired according to it, so that the implicit target entity is obtained even when the input sentence does not give the target entity directly. The target inference information in the first sentence is then replaced with the target entity to obtain a second sentence, and the target task is executed based on the second sentence, which improves the accuracy of voice control. The method can be applied to a task-based dialog system, which can then understand, through knowledge inference, sentences that describe the target entity in an indirect manner. In addition, the method first acquires the target inference information from the first sentence and then acquires the corresponding target entity according to it; that is, knowledge inference is applied to the target inference information rather than to the whole sentence, which reduces both the amount of computation and the inference difficulty caused by irrelevant semantics. Finally, performing knowledge inference on the target inference information yields a precise entity, which is equivalent to a disambiguation operation for the subsequent intention recognition and slot extraction, thereby improving the accuracy of semantic understanding.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: in the voice control apparatus provided in the foregoing embodiment, when performing voice control, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice control apparatus and the voice control method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 9 shows a block diagram of an electronic device 900 according to an exemplary embodiment of the disclosure. The electronic device 900 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group audio Layer III, motion Picture Experts compression standard audio Layer 3), an MP4 player (Moving Picture Experts Group audio Layer IV, motion Picture Experts compression standard audio Layer 4), a notebook computer, or a desktop computer. Electronic device 900 may also be referred to by other names as user equipment, portable electronic device, laptop electronic device, desktop electronic device, and so on.
In general, the electronic device 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement the speech control methods provided by the method embodiments herein.
In some embodiments, the electronic device 900 may further optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a touch display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 904 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over the surface of the display screen 905. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 905 may be one, providing the front panel of the electronic device 900; in other embodiments, the number of the display panels 905 may be at least two, and the at least two display panels are respectively disposed on different surfaces of the electronic device 900 or are in a folding design; in still other embodiments, the display 905 may be a flexible display disposed on a curved surface or on a folded surface of the electronic device 900. Even more, the display screen 905 may be arranged in a non-rectangular irregular figure, i.e. a shaped screen. The Display panel 905 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, a front camera is disposed on a front panel of an electronic apparatus, and a rear camera is disposed on a rear surface of the electronic apparatus. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations of the electronic device 900. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the electronic device 900 to implement navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 909 is used to supply power to various components in the electronic device 900. The power source 909 may be alternating current, direct current, disposable or rechargeable. When power source 909 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 900 also includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the touch display 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 912 may detect a body direction and a rotation angle of the electronic device 900, and the gyro sensor 912 and the acceleration sensor 911 cooperate to acquire a 3D motion of the user on the electronic device 900. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side bezel of the electronic device 900 and/or underneath the touch display screen 905. When the pressure sensor 913 is disposed on the side frame of the electronic device 900, the user's holding signal of the electronic device 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the touch display 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the electronic device 900. When a physical button or vendor Logo is provided on the electronic device 900, the fingerprint sensor 914 may be integrated with the physical button or vendor Logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the touch display 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is turned down. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
The proximity sensor 916, also known as a distance sensor, is typically disposed on the front panel of the electronic device 900. The proximity sensor 916 is used to capture the distance between the user and the front of the electronic device 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the electronic device 900 gradually decreases, the processor 901 controls the touch display 905 to switch from the bright-screen state to the screen-off state; when the proximity sensor 916 detects that the distance between the user and the front face of the electronic device 900 gradually increases, the processor 901 controls the touch display 905 to switch from the screen-off state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of the electronic device 900, and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
Fig. 10 is a schematic structural diagram of a server provided in an embodiment of the present disclosure. The server 1000 may vary greatly in configuration or performance and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction that is loaded and executed by the processor 1001 to implement the voice control method. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions, is also provided; the instructions are executable by a processor in an electronic device to perform the voice control method in the embodiments described above. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is intended to be exemplary only and not to limit the present disclosure, and any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present disclosure is to be considered as the same as the present disclosure.

Claims (12)

1. A method for voice control, the method comprising:
acquiring a first statement corresponding to an input first voice signal, wherein the first voice signal is used for controlling the execution of a target task;
acquiring target inference information from the first sentence, wherein the target inference information is a phrase describing an entity in an indirect manner;
acquiring a target entity corresponding to the target reasoning information according to the target reasoning information;
replacing the target inference information in the first statement with the target entity to obtain a second statement;
executing the target task based on the second statement.
2. The method of claim 1, wherein the obtaining target inference information from the first sentence comprises:
acquiring entity names in the first sentence, and acquiring a syntax tree of the first sentence, wherein the syntax tree comprises a plurality of nodes and a syntax relation between each node, and each node corresponds to a word in the first sentence;
and acquiring the target inference information according to the entity designation and the syntax tree.
3. The method of claim 2, wherein said obtaining the target inference information based on the entity designations and the syntax tree comprises:
taking the node corresponding to the entity designation as a first base node of the syntax tree;
selecting a first target node from among adjacent nodes of the first base node in the syntax tree, wherein a first target word corresponding to the first target node and the entity designation satisfy a target grammatical relationship, and the first target word and the entity designation are adjacent in the first sentence;
splicing the first target words and the entity names to obtain first reasoning information of the syntax tree;
and acquiring the target inference information according to the first inference information.
4. The method of claim 3, wherein obtaining the target inference information according to the first inference information comprises:
taking the node corresponding to the first inference information as a second basic node of the syntax tree;
responding to a second target node existing in the adjacent node of the second basic node in the syntax tree, splicing a second target word corresponding to the second target node with the first reasoning information to obtain second reasoning information of the syntax tree, and acquiring the target reasoning information according to the second reasoning information, wherein the second target word and the first reasoning information meet the target grammatical relation, and the second target word and the first reasoning information are adjacent in the first sentence;
in response to the second target node not being present in the neighboring nodes of the second base node in the syntax tree, taking the first inference information as the target inference information.
5. The method of claim 2, wherein the obtaining the entity designation in the first sentence comprises:
performing word segmentation processing on the first sentence to obtain a word segmentation in the first sentence;
and taking the participles with the part of speech as the target part of speech in the participles as the entity names, or taking the participles with the type as the first target type in the participles as the entity names, or taking the participles matched with the entities in a predefined entity library in the participles as the entity names.
6. The method according to claim 1, wherein the obtaining a target entity corresponding to the target inference information according to the target inference information comprises:
acquiring a first entity in the target inference information and a first attribute relationship of the first entity;
constructing a query statement according to the first entity and the first attribute relation;
inquiring a first attribute value corresponding to the inquiry statement through the inquiry statement;
and taking the first attribute value as the target entity.
7. The method according to claim 1, wherein the obtaining a target entity corresponding to the target inference information according to the target inference information comprises:
selecting a target relational statement with the highest similarity to the target reasoning information from a relational statement library according to the target reasoning information;
acquiring a second attribute value corresponding to the target relational statement;
and taking the second attribute value as the target entity.
8. The method according to claim 7, wherein said selecting a target relational statement from a relational statement library with a highest similarity to the target inference information according to the target inference information comprises:
acquiring a first feature vector corresponding to the target inference information and a second feature vector corresponding to each relational statement in the relational statement library;
and selecting the target relational statement from the relational statement library according to the first characteristic vector and the second characteristic vector corresponding to each relational statement.
9. The method of claim 1, wherein the performing the target task based on the second statement comprises:
obtaining an intention of the second statement, the intention comprising a slot value of a slot;
in response to the slot location value being the target entity and the type of the target entity being a second target type corresponding to the slot, or in response to the slot location value being the target entity and the target entity being an entity in a predefined entity library, performing the target task according to the intent.
10. A voice control apparatus, characterized in that the apparatus comprises:
a first sentence acquisition module configured to acquire a first sentence corresponding to an input first voice signal, the first voice signal being used for controlling execution of a target task;
a target inference information acquisition module configured to acquire target inference information from the first sentence, the target inference information being a phrase describing an entity in an indirect manner;
a target entity acquisition module configured to acquire a target entity corresponding to the target inference information according to the target inference information;
a second sentence acquisition module configured to replace the target inference information in the first sentence with the target entity to obtain a second sentence;
and a task execution module configured to execute the target task based on the second sentence.
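Structurally, the apparatus is the method pipeline split into modules. The sketch below shows that decomposition as a plain pipeline class, with each callable standing in for the corresponding module; the names and interfaces are illustrative, not the patent's.

    class VoiceControlApparatus:
        def __init__(self, recognize, extract, resolve, execute):
            self.recognize = recognize   # first sentence acquisition (speech -> text)
            self.extract = extract       # target inference information acquisition
            self.resolve = resolve       # target entity acquisition
            self.execute = execute       # task execution

        def handle(self, first_voice_signal):
            first_sentence = self.recognize(first_voice_signal)
            inference_info = self.extract(first_sentence)
            if inference_info:
                target_entity = self.resolve(inference_info)
                second_sentence = first_sentence.replace(inference_info, target_entity)
            else:
                second_sentence = first_sentence
            self.execute(second_sentence)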
11. An electronic device, comprising a processor and a memory, wherein at least one instruction is stored in the memory, and wherein the instruction is loaded and executed by the processor to implement the operations performed by the voice control method according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to perform operations performed by the voice control method of any one of claims 1 to 9.
CN202010463288.1A 2020-05-27 2020-05-27 Voice control method, voice control device, electronic equipment and storage medium Active CN111640432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010463288.1A CN111640432B (en) 2020-05-27 2020-05-27 Voice control method, voice control device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111640432A (en) 2020-09-08
CN111640432B CN111640432B (en) 2023-09-15

Family

ID=72332385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010463288.1A Active CN111640432B (en) 2020-05-27 2020-05-27 Voice control method, voice control device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111640432B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170185673A1 (en) * 2015-12-25 2017-06-29 Le Holdings (Beijing) Co., Ltd. Method and Electronic Device for QUERY RECOMMENDATION
CN107590130A (en) * 2017-09-30 2018-01-16 北京三快在线科技有限公司 Scene determines method and device, storage medium and electronic equipment
CN110704479A (en) * 2019-09-12 2020-01-17 新华三大数据技术有限公司 Task processing method and device, electronic equipment and storage medium
CN110659366A (en) * 2019-09-24 2020-01-07 Oppo广东移动通信有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN110909126A (en) * 2019-11-01 2020-03-24 深圳前海微众银行股份有限公司 Information query method and device
CN110990526A (en) * 2019-11-21 2020-04-10 腾讯科技(深圳)有限公司 Query statement display method and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581955A (en) * 2020-11-30 2021-03-30 广州橙行智动汽车科技有限公司 Voice control method, server, voice control system and readable storage medium
CN112581955B (en) * 2020-11-30 2024-03-08 广州橙行智动汽车科技有限公司 Voice control method, server, voice control system, and readable storage medium

Also Published As

Publication number Publication date
CN111640432B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN110556127B (en) Method, device, equipment and medium for detecting voice recognition result
CN111933112B (en) Awakening voice determination method, device, equipment and medium
WO2019223394A1 (en) Method and apparatus for generating lyrics, method and apparatus for displaying lyrics, electronic device, and storage medium
WO2022057435A1 (en) Search-based question answering method, and storage medium
CN108922531B (en) Slot position identification method and device, electronic equipment and storage medium
CN111524501A (en) Voice playing method and device, computer equipment and computer readable storage medium
CN111324699A (en) Semantic matching method and device, electronic equipment and storage medium
CN111428079B (en) Text content processing method, device, computer equipment and storage medium
CN112052354A (en) Video recommendation method, video display method and device and computer equipment
WO2019223393A1 (en) Method and apparatus for generating lyrics, method and apparatus for displaying lyrics, electronic device, and storage medium
CN112289302B (en) Audio data synthesis method and device, computer equipment and readable storage medium
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN113220590A (en) Automatic testing method, device, equipment and medium for voice interaction application
CN110837557B (en) Abstract generation method, device, equipment and medium
CN111640432B (en) Voice control method, voice control device, electronic equipment and storage medium
CN112764600A (en) Resource processing method, device, storage medium and computer equipment
CN113593521B (en) Speech synthesis method, device, equipment and readable storage medium
CN111125424B (en) Method, device, equipment and storage medium for extracting core lyrics of song
CN114328815A (en) Text mapping model processing method and device, computer equipment and storage medium
CN109635153B (en) Migration path generation method, device and storage medium
CN112487162A (en) Method, device and equipment for determining text semantic information and storage medium
CN111737423B (en) Domain identification method and device, electronic equipment and storage medium
CN111524533B (en) Voice operation method, device, electronic equipment and storage medium
CN111259161B (en) Ontology establishing method and device and storage medium
CN108446276B (en) Method and device for determining keywords of song list

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant