WO2018153273A1

WO2018153273A1 - Semantic parsing method and apparatus, and storage medium

Info

Publication number: WO2018153273A1
Application number: PCT/CN2018/075795
Authority: WO
Inventors: 冯晓冰; 廖玲; 王飞; 徐浩
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-02-23
Filing date: 2018-02-08
Publication date: 2018-08-30
Also published as: CN106874259B; CN106874259A

Abstract

A semantic parsing method based on a state machine. The method comprises: determining a function of a speech product (S601); determining a step set of the speech product in semantic parsing according to the function of the speech product, the step set comprising at least two steps (S602); determining a corresponding node of a state machine for each step in the step set (S603); forming a node set according to the determined nodes (S604); and forming the state machine of the speech product by using the node set (S605). Also disclosed are a semantic parsing apparatus based on a state machine, and a storage medium.

Description

Semantic analysis method, device and storage medium

This application claims priority to Chinese patent application No. 201710099405.9, filed with the Chinese Patent Office on February 23, 2017 and entitled "a semantic analysis method and device based on state machine", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to voice analysis technology, and in particular, to a state machine based semantic analysis method, apparatus, and storage medium.

Background of the invention

A voice assistant is an intelligent terminal application that helps users solve problems, mainly everyday ones, through intelligent dialogue and instant question answering. The voice assistant is a voice-controlled application (App). The user's speech is collected by the sound collection hardware on the terminal, recognized by speech recognition technology, and then semantically interpreted. The assistant then responds quickly in the foreground; it can also chat with the user by voice through the microphone, or help the user operate the smart terminal according to the user's commands. As can be seen from the above, the voice assistant is an application that can replace all or part of the user's queries and operations on a terminal such as a mobile phone through voice interaction. Such voice applications can greatly improve the convenience of operating a mobile phone in different business scenarios.

Summary of the invention

The embodiment of the present application provides a semantic analysis method, device, and storage medium based on a state machine to solve at least one problem existing in the prior art, and can enhance the scalability of the voice platform.

The technical solution of the embodiment of the present application is implemented as follows:

An embodiment of the present application provides a state machine-based semantic parsing method, where the method is applied to a server, including:

determining a function of a voice product;

determining, according to the function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two steps, and the steps are used to complete at least the following operations: preprocessing a voice instruction input by a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

determining a corresponding node of a state machine for each step in the step set;

forming a node set according to the determined nodes; and

forming the node set into a state machine of the voice product, so that the server parses the voice instruction input by the user according to the state machine and provides the user with the function corresponding to the voice instruction according to the parsing result.

An embodiment of the present application further provides a state machine-based semantic parsing method, including:

obtaining a statement to be parsed of a voice product;

inputting the statement to be parsed into a first node of a preset state machine, where each node of the state machine corresponds to a step in a step set of semantic parsing; the step set is determined according to the function provided by the voice product and includes at least two steps; and the steps are used to complete at least the following operations: preprocessing a voice instruction of a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

obtaining an output result from a last node of the state machine; and

outputting the output result.

An embodiment of the present application provides a state machine-based semantic parsing apparatus applied to a server. The apparatus includes a processor and a memory connected to the processor; the memory stores machine-readable instruction units executable by the processor; and the machine-readable instruction units include a first determining unit, a second determining unit, a third determining unit, a first forming unit, and a second forming unit, wherein:

The first determining unit is configured to determine a function of a voice product;

The second determining unit is configured to determine, according to the function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two steps, and the steps are used to complete at least the following operations: preprocessing a voice instruction input by a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

The third determining unit is configured to determine a corresponding node of a state machine for each step in the step set;

The first forming unit is configured to form a node set according to the determined nodes;

The second forming unit is configured to form the node set into a state machine of the voice product, so that the server parses the voice instruction input by the user according to the state machine and provides the user with the function corresponding to the voice instruction according to the parsing result.

An embodiment of the present application provides a state machine-based semantic parsing apparatus applied to a server. The apparatus includes a processor and a memory connected to the processor; the memory stores machine-readable instruction units executable by the processor; and the machine-readable instruction units include a third obtaining unit, an input unit, a fourth obtaining unit, and an output unit, wherein:

The third obtaining unit is configured to obtain a statement to be parsed of a voice product;

The input unit is configured to input the statement to be parsed into a first node of a preset state machine, where each node of the state machine corresponds to a step in a step set of semantic parsing; the step set is determined according to the function provided by the voice product and includes at least two steps; and the steps are used to complete at least the following operations: preprocessing a voice instruction of a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

The fourth obtaining unit is configured to obtain an output result from a last node of the state machine;

The output unit is configured to output the output result.

A non-transitory computer-readable storage medium stores one or more programs, the one or more programs including instructions that, when executed by a computing device, cause the computing device to perform the method of the first aspect or the second aspect described above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a schematic structural diagram of a system applicable to implementation of a semantic parsing method according to an embodiment of the present application;

FIG. 1B is a schematic flowchart of a semantic analysis method based on a state machine according to an embodiment of the present application;

FIG. 2 is a state diagram of a finite state machine of an elevator door in an embodiment of the present application;

FIG. 3 is a state diagram of a state machine configuration in the embodiment;

FIG. 4 is a schematic flowchart of semantic analysis of an embodiment of the present application;

FIG. 5 is a schematic flowchart of semantic analysis of an embodiment of the present application;

FIG. 6 is a schematic flowchart of an implementation process of a semantic analysis method based on a state machine according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a semantic parsing apparatus based on a state machine according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a semantic parsing apparatus based on a state machine according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a network architecture according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Implementation

The technical solutions of the present application are further elaborated below in conjunction with the accompanying drawings and specific embodiments. In one semantic analysis solution, assume that Company A has a browser service and a video service, both of which require semantic analysis because a voice assistant is embedded in each to help users who do not like to type text or who are unable to write. The user can thus search by voice for a movie of interest on the web page of the company's video service, and for a keyword of interest on the web page of the browser service. Since both the video service and the browser service need a semantic parser, the company integrates the two services on one voice platform. However, because the data size and fields of the video service differ greatly from those of the browser service, a separate semantic parser is built for each service in the voice platform. When Company A wants to launch a music service (such as QQ Music), it also needs to build a semantic parser for the music service, so that users can search for music of interest in instant messaging (QQ). It can be seen that although the voice platform puts the various services together, it does not integrate them in any practical sense.

In addition, semantic parsing in the background service can use many specific parsing methods, such as traditional regular-expression templates and deep learning. At the same time, different products require different scenes and corresponding services when they are productized. For example, a speaker only needs to parse a limited set of scenes such as music, weather, and reminders, whereas for the voice assistant of the micro-desktop, calling and texting are essential scenes. Different voice products also have different requirements for front-end adaptation and back-end fallback. For example, for a browser voice assistant, jumping to a search page is a reasonable choice when no parsed semantics can be provided, but that logic is not suitable for a watch voice assistant. With so many variables in the parsing process, if all the logic is hard-coded, the code has to be rewritten whenever a new parsing method or a new product is accessed, which is very inflexible.

In order to use resources more rationally, the following embodiments of the present application apply a finite state machine to semantic parsing: every possible step in the semantic parsing process is abstracted into a node of the state machine. Developers can then conveniently add or delete a step, or customize each step when a product is accessed, to generate a semantic parsing model suited to the service. Researchers can thus flexibly update the parsing methods, and the parsing process can be flexibly customized when a voice product is accessed. It can be seen that the technical solution provided by the embodiments of the present application improves the voice platform: resources are used more rationally, and constructing a semantic parser for a newly accessed service is no longer difficult.

For a better understanding of the embodiments of the present application, the embodiments of the present application provide an explanation of the following nouns:

Voice Assistant: Software that provides users with corresponding services based on their voice input.

Voice platform: The voice platform in this embodiment is an improvement on the existing voice platform and can provide semantic parsing services for multiple products.

Scene: The domain to which a sentence belongs; for example, "I want to listen to music" belongs to the music scene, and a request for a joke belongs to the joke scene.

Semantic parsing: Parsing a sentence into a scene, an intent, and parameters that the computer can recognize. For example, for "I want to listen to the ice rain", the scene is the music scene, the intent is to listen, and the parameter is "ice rain".

Micro Desktop: A desktop product from the Intelligent Platforms Division.

Finite state machine: A finite-state machine (FSM, or simply state machine) is a mathematical model that represents a finite number of states and the behaviors, such as transitions and actions, between these states.

Named entity (NER): For example, "ice rain".
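As a rough illustration of what semantic parsing produces, the scene, intent, and parameters can be held in a small record. The class name `ParseResult` and the field and slot names below are assumptions made for this sketch; they do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    """Hypothetical container for the output of semantic parsing."""
    scene: str                                   # domain of the sentence, e.g. "music"
    intent: str                                  # what the user wants to do, e.g. "listen"
    params: dict = field(default_factory=dict)   # slots such as the song title


# "I want to listen to the ice rain", as in the definition above.
result = ParseResult(scene="music", intent="listen", params={"song": "ice rain"})
```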

Before introducing the various embodiments of the present application, the related knowledge of the state machine is introduced. An FSM is composed of a finite number of states and the transitions between them, and can be in only one of those states at any time. When an input event is received, the state machine produces an output that may be accompanied by a state transition. The finite state machine includes the following components:

State: The basic component of a behavioral model, reflecting the stage and activity of an object in the system (e.g., the preprocessing state);

Transition: The process by which an object moves from one state to another (e.g., from the preprocessing state to the semantic algorithm parsing state);

Transition condition: The event and condition that cause the state of the object to change (e.g., the condition that parsing is needed);

Action: The action taken by the object before the state transition (e.g., the preprocessing action).

FIG. 1A is a schematic diagram showing the structure of a system to which the semantic analysis method of the example of the present application is applied. The system includes at least a terminal device 101, a first server 102, a second server 103, and a network 104.

The terminal device 101 is a terminal device with data calculation and processing functions, including but not limited to a smartphone, a handheld computer, a tablet computer, a PC, etc. (with a communication module installed). Operating systems are installed on these terminal devices 101, including but not limited to: the Android operating system, the Symbian operating system, the Windows mobile operating system, and the Apple iPhone OS operating system.

An application client (for example, a voice assistant App) is installed on the terminal device 101. Through the network 104, the application client exchanges information with the corresponding application server software (for example, voice assistant application server software) installed on the first server 102 (for example, a voice assistant server), so as to implement the intelligent dialogue and instant question-and-answer interaction of the application client.

Application server software for constructing a state machine is installed on the second server 103 (for example, a voice platform server). The constructed state machine is sent to the first server 102 through the network 104, so that the first server 102 can parse and reply to the user voice sent by the client according to the state machine.

The network 104 can be a wired network or a wireless network.

According to the solution provided by the embodiments of the present application, when a new voice service needs to be added, the voice platform server can customize different parsing steps as required, so as to construct a new semantic parser for the service. The voice platform server can therefore expand to new services rapidly, and it solves the problem that a separate server has to be created for each voice service because the data size and fields involved in different service scenarios differ greatly. For the information service provider, the voice platform provided by the embodiments of the present application thus integrates the voice services of the various businesses in a practical sense.

The flow of the state machine-based semantic parsing method provided by the embodiments of the present application is shown in FIG. 1B. For example, the user says "I want to listen to the ice rain" in a voice product (e.g., a speaker or a music client), and the workflow of the background (such as the server corresponding to the music client) includes: detecting the statement "I want to listen to the ice rain" spoken by the user, assigning a state machine according to the source of the statement, inputting the statement into the assigned state machine, obtaining the output result of the state machine, and returning the output result to the user. As can be seen, the background work is controlled entirely by the different state machines.
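The background workflow just described (assign a state machine by statement source, feed it the statement, return its output) can be sketched as a simple registry lookup. This is only an illustration: the source keys and the two stand-in functions are hypothetical, and a real state machine would run several nodes rather than format a string.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for built state machines: each maps a statement to a reply.
StateMachine = Callable[[str], str]


def music_machine(statement: str) -> str:
    return f"[music] handled: {statement}"


def browser_machine(statement: str) -> str:
    return f"[browser] handled: {statement}"


# Registry keyed by the source (product) the statement came from.
MACHINES: Dict[str, StateMachine] = {
    "music_client": music_machine,
    "browser_assistant": browser_machine,
}


def handle(source: str, statement: str) -> str:
    """Assign a state machine by statement source, run it, and return its output."""
    machine = MACHINES[source]   # assign the state machine according to the source
    return machine(statement)    # input the statement and obtain the output result


reply = handle("music_client", "I want to listen to the ice rain")
```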

FIG. 2 shows a state diagram of a finite state machine for an elevator door. As shown in FIG. 2, the state machine includes two states: state 1 is open and state 2 is closed. For state 1, the action on entering the state is opening the door; for state 2, the action on entering the state is closing the door. The transition condition between state 1 and state 2 is a door-opening or door-closing event.
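A minimal sketch of the elevator-door machine is given below to make the four components (state, transition, transition condition, action) concrete. The event names `open_door` and `close_door`, and the choice to ignore events with no matching transition, are assumptions of this sketch.

```python
class ElevatorDoor:
    """Two-state machine: 'open' and 'closed', with an entry action per state."""

    # (current state, event) -> next state
    TRANSITIONS = {
        ("closed", "open_door"): "open",
        ("open", "close_door"): "closed",
    }
    # Action performed on entering each state.
    ENTRY_ACTIONS = {"open": "opening the door", "closed": "closing the door"}

    def __init__(self, state: str = "closed") -> None:
        self.state = state
        self.log = []   # record of entry actions, kept for inspection

    def fire(self, event: str) -> str:
        """Apply an event; events with no matching transition leave the state unchanged."""
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is not None:
            self.state = nxt
            self.log.append(self.ENTRY_ACTIONS[nxt])
        return self.state


door = ElevatorDoor()
door.fire("open_door")    # closed -> open
door.fire("open_door")    # already open: no transition
door.fire("close_door")   # open -> closed
```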

The following describes in detail how to apply the FSM model to the semantic parsing process. The semantic parsing state machine implementation process of the embodiment of the present application is as follows:

First, a unified interface is designed for the states, transitions, conditions, and actions of the state machine.

For example, the interface uses a uniform format that all components can mutually recognize.

Second, every step in semantic parsing inherits from the unified interface and is encapsulated as a node of the state machine.

Finally, all the nodes are connected to form a state diagram, and the state machine containing all the semantic parsing steps is then run.
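The three stages above can be sketched as follows: an abstract base class plays the role of the unified interface, each step subclasses it, and a small driver connects the nodes through a transition table. All class names, the `run` signature, and the condition labels are assumptions of this sketch, and the toy prefix rule merely stands in for a real semantic parsing method.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class Node(ABC):
    """Hypothetical unified interface: every parsing step is a node with one method."""

    name: str

    @abstractmethod
    def run(self, context: dict) -> str:
        """Perform the step's action on the shared context and return a condition label."""


class Preprocess(Node):
    name = "preprocess"

    def run(self, context: dict) -> str:
        context["text"] = context["text"].strip().lower()
        return "needs_parsing"


class Parse(Node):
    name = "parse"

    def run(self, context: dict) -> str:
        # Toy rule standing in for a real semantic parsing method.
        prefix = "i want to listen to "
        if context["text"].startswith(prefix):
            context["song"] = context["text"][len(prefix):]
            return "parsed"
        return "failed"


def run_machine(nodes: Dict[str, Node],
                transitions: Dict[Tuple[str, str], str],
                start: str, context: dict) -> dict:
    """Run nodes from `start`, following (node, condition) -> next node until none matches."""
    current: Optional[str] = start
    while current is not None:
        condition = nodes[current].run(context)
        current = transitions.get((current, condition))
    return context


nodes = {n.name: n for n in (Preprocess(), Parse())}
transitions = {("preprocess", "needs_parsing"): "parse"}
out = run_machine(nodes, transitions, "preprocess", {"text": "  I want to listen to Ice Rain "})
```

Because the driver only knows the interface, adding or removing a step means adding or removing a node and its table entries, without touching the driver.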

In general, the semantic parsing process includes the following steps:

Step S1, a preprocessing process;

Generally, the voice input by the user is converted by the voice server, through speech recognition, into text to be processed (that is, the statement to be parsed). Preprocessing determines whether the statement to be parsed needs further parsing: if it does, the process proceeds to step S2, in which the statement is parsed by a semantic parsing method; if it does not, the process proceeds to step S3, in which the vertical service is called.

Speech recognition technology converts the speech signal into computer-readable text symbols, which addresses the problem of the machine understanding what a person says.

For example, a music client (e.g., a music App) obtains the user's voice information through a local sound pickup device (e.g., a built-in microphone) of the mobile terminal device and sends the voice information to the voice server. The voice server preprocesses and recognizes the voice information and obtains the text to be processed, "I want to listen to the ice rain". If the voice server needs to further semantically parse this text to understand the user's intention, the preprocessing state transitions to the semantic parsing algorithm state, that is, step S2, in which the statement to be parsed is parsed by a semantic parsing method. If the voice server can already determine the user's intention in the preprocessing state, no further parsing is needed and the process proceeds directly to step S3 to invoke the vertical service.

Step S2, parsing the statement to be parsed by a semantic parsing method;

The semantic parsing method includes a deep learning method, a multi-scene analysis template, an NER + vocabulary template, and a regular-expression template.

If the parsing is successful, the process proceeds to step S3, that is, the vertical service is called; if the parsing is unsuccessful, the process proceeds to step S4, that is, the general answer (Frequently Asked Questions, FAQ) is called.

Successful parsing means that the voice server can parse out the user's intention (for example, from the text "I want to listen to the ice rain" the voice server determines that the user wants to listen to the song "ice rain"); the process then proceeds to step S3, that is, the vertical service is called.

Unsuccessful parsing means that the voice server cannot parse out the user's intention (for example, the voice server cannot determine from the text "I want to listen to the ice rain" that the user wants to listen to the song "ice rain"); the process then proceeds to step S4, that is, the general answer (FAQ) is called.

Step S3, calling a vertical service;

The vertical service may include: a music scene service, a map scene service, a video scene service, and an a la carte scene service.

Here, if the vertical service called is not the correct one, the process returns to step S2 and the text to be processed is parsed again; if the call to the vertical service fails, step S3 is repeated to re-invoke the vertical service. If the call is successful, the process ends (enters the end state).

Step S4, calling a general answer (FAQ);

Here, for example, when the music server finds no result when searching for a song, it returns a general answer to the music client, for example causing the music client to say "No song found". If the statement to be parsed is not recognized, a general answer may also be returned to the user, for example the spoken reply "unrecognizable". After the general answer is returned to the user, the process ends (that is, enters the end state).

Step S5, performing a local search on the parsing statement;

Here, some voice products also require a search service. For such products, the process proceeds from step S4 to step S5 unconditionally; after the local search is performed, the process ends (enters the end state).

In step S6, the process ends.

Taking the above six steps as an example, each step corresponds to one state in FIG. 3: steps S1 to S6 correspond to states 31 to 36, respectively, and the association relationships between steps S1 to S6 correspond to the state transition conditions between states 31 to 36. For example, the relationship between step S1 and step S2 is: determine whether the statement to be parsed needs further parsing, and if it does, proceed to step S2; correspondingly, the state transition condition between state 31 and state 32 is "parsing needed". The relationship between step S1 and step S3 is: determine whether the statement to be parsed needs further parsing, and if it does not, proceed to step S3; correspondingly, the state transition condition between state 31 and state 33 is "parsing successful".
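The relationships among steps S1 to S6 can be written down as a transition table, which is one plausible way to encode the state diagram of FIG. 3. The condition labels are hypothetical names for the conditions described in the text. In particular, the text describes the move from S4 to S5 as unconditional for products that need search, so the two S4 entries here only distinguish products with and without a search service.

```python
# Transition table for the six steps; keys are (current step, condition).
# Condition labels are hypothetical names for the conditions described in the text.
TRANSITIONS = {
    ("S1", "needs_parsing"): "S2",      # preprocessing -> semantic parsing
    ("S1", "no_parsing_needed"): "S3",  # preprocessing -> vertical service
    ("S2", "parse_ok"): "S3",           # parsing succeeded -> vertical service
    ("S2", "parse_failed"): "S4",       # parsing failed -> general answer (FAQ)
    ("S3", "wrong_service"): "S2",      # wrong vertical service -> parse again
    ("S3", "call_failed"): "S3",        # call failed -> retry the vertical service
    ("S3", "call_ok"): "S6",            # call succeeded -> end
    ("S4", "needs_search"): "S5",       # products with search: FAQ -> local search
    ("S4", "answered"): "S6",           # general answer returned -> end
    ("S5", "done"): "S6",               # local search finished -> end
}


def walk(conditions, start="S1"):
    """Follow the table for a sequence of conditions and return the visited steps."""
    path = [start]
    for condition in conditions:
        path.append(TRANSITIONS[(path[-1], condition)])
    return path


happy_path = walk(["needs_parsing", "parse_ok", "call_ok"])
fallback_path = walk(["needs_parsing", "parse_failed", "needs_search", "done"])
```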

In other embodiments of the present application, FIG. 4 is a schematic flowchart of semantic analysis of an embodiment of the present application. As shown in FIG. 4, the semantic parsing process may further include the following steps:

Step S401, preprocessing;

Here, see step S1 in the above embodiment.

Step S402, calling a semantic analysis method for semantic analysis;

Here, the semantic parsing method includes a deep learning method, a multi-scene analysis template, an NER + vocabulary template, and a regular-expression template.

Step S403, semantic disambiguation;

Step S404, adapting logic;

When the semantics parsed in step S402 are ambiguous or have multiple meanings, step S403 is required to perform semantic disambiguation. After disambiguation, the adaptation logic of step S404 is applied to determine the true intention of the user, so that the scene can be selected in step S405.

Step S405, searching for a vertical scene;

Here, the vertical scene includes a phone-call scene, an SMS scene, a music scene, a joke scene, an eating scene, an à la carte scene, a purchase scene, a cooking scene, and the like.

Step S406, the fallback operation, where the fallback operation generally includes a FAQ, an encyclopedia search, a jump to a search page, and an open-domain search.

Regarding step S403: many words have multiple meanings, and a word takes on one particular meaning only in a specific context. Considered apart from its context, a word is generally ambiguous. The task of disambiguation is to determine which sense a polysemous word takes in a particular context; the specific sense can be determined by considering the context in which the word is used.

A relatively simple method is to look up the definition of a word in a dictionary to determine its sense. For most words, however, senses and usage cannot simply be read off the dictionary definitions. Some of the senses listed in a dictionary are clearly distinguishable, but most are not clear-cut and blend into one another. More difficult still, a dictionary can list only a limited number of senses for each word, and the sense a word takes in an actual context may not be among them. Moreover, a word may have different parts of speech. Determining the part of speech of a word is a tagging task and is not covered here, but it should be noted that determining the part of speech of a word can effectively reduce lexical ambiguity. Three methods of disambiguation are:

1. Supervised disambiguation: disambiguation based on an annotated training set.

2. Dictionary-based disambiguation: disambiguation built on dictionary resources.

3. Unsupervised disambiguation: unlabeled text is used for training.
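To illustrate the dictionary-based approach only, the sketch below picks the sense whose dictionary gloss shares the most words with the surrounding context, in the spirit of the simplified Lesk algorithm. The application does not name a specific algorithm, so this choice is an assumption, and the tiny dictionary and its glosses are invented for the example.

```python
def disambiguate(word, context_words, senses):
    """Pick the sense whose dictionary gloss shares the most words with the context."""
    context = {w.lower() for w in context_words if w.lower() != word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy dictionary with made-up glosses, for illustration only.
SENSES = {
    "apple": {
        "fruit": "a round fruit you can eat that grows on a tree",
        "company": "a company that makes a phone and a computer",
    }
}

sense_a = disambiguate("apple", "i want to eat an apple".split(), SENSES)
sense_b = disambiguate("apple", "open the apple phone store".split(), SENSES)
```

As the preceding paragraph notes, such a method can only return senses the dictionary lists, and a tie in overlap falls back to whichever sense is listed first.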

A product does not need all the steps in FIG. 4; semantic parsing can be completed with only some of them. Taking the browser voice assistant as an example, its flow steps are a subset of those in FIG. 4. As shown in FIG. 5, the browser voice parsing process includes:

Step S501, preprocessing;

Step S502, calling a semantic analysis method for semantic analysis;

Here, the semantic parsing method includes a deep learning method, a multi-scene analysis template, and an NER + vocabulary template.

Step S503, semantic disambiguation;

Step S504, adapting logic;

Step S505, searching for a vertical scene;

Here, the vertical scene includes a phone-call scene and a short-message scene.

Step S506, the fallback operation, where the fallback operation generally includes an encyclopedia search and a jump to a search page.
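The idea that each product uses a customized subset of the FIG. 4 flow can be sketched as a per-product configuration from which a pipeline is built. The step identifiers, option names, and the speaker configuration below are hypothetical; `fallback` denotes the last-resort ("bottom") operation of steps S406 and S506.

```python
# Full step list of FIG. 4, in order; the string identifiers are hypothetical.
ALL_STEPS = ["preprocess", "parse", "disambiguate", "adapt", "vertical_scene", "fallback"]

# Per-product customization: the options each step offers to the browser voice assistant.
BROWSER_ASSISTANT = {
    "preprocess": [],
    "parse": ["deep_learning", "multi_scene_template", "ner_vocabulary_template"],
    "disambiguate": [],
    "adapt": [],
    "vertical_scene": ["phone_call", "sms"],
    "fallback": ["encyclopedia_search", "jump_to_search_page"],
}

# An assumed configuration for a speaker, which only needs a few scenes.
SPEAKER = {
    "preprocess": [],
    "parse": ["regex_template"],
    "vertical_scene": ["music", "weather", "reminder"],
}


def build_pipeline(product_config):
    """Keep only the steps the product configures, preserving the FIG. 4 order."""
    unknown = set(product_config) - set(ALL_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [(step, product_config[step]) for step in ALL_STEPS if step in product_config]


browser_pipeline = build_pipeline(BROWSER_ASSISTANT)
speaker_pipeline = build_pipeline(SPEAKER)
```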

Based on the foregoing embodiments, an embodiment of the present application provides a state machine based semantic parsing method, which is applied to a first computing device, and the functions implemented by the method may be implemented by a processor calling program code in the first computing device. Of course, the program code can be stored in a computer storage medium. As can be seen, the first computing device includes at least a processor and a storage medium.

FIG. 6 is a schematic diagram of an implementation process of a semantic parsing method based on a state machine according to an embodiment of the present application. As shown in FIG. 6, the method may be applied to a voice server, where the method includes:

Step S601, determining a function of the voice product;

Here, for a speaker, the function of the voice product is to search for and play songs according to the user's voice instruction; for an air conditioner, the function is to determine working parameters such as temperature, humidity, and duration according to the user's voice instruction and to operate according to those parameters; for a browser voice assistant, the function is to search according to the user's voice instruction and return the result; for a voice chat assistant, the function is to hold a dialogue according to the user's voice.

Step S602, determining, according to the function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two steps;

Step S603, determining a node of the corresponding state machine for each step in the step set;

Step S604, forming a node set according to the determined node;

Step S605, the node set is formed into a state machine of the voice product.

In implementation, the functions or steps in the embodiments of the present application may be represented by a configuration file. For example, <machine></machine> defines a state machine; the content under <state></state> is the state name and the action corresponding to the state, where the action is implemented by a class of the unified interface; and <transmition></transmition> defines a transition, in the format: transition = current state | condition | next state.
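A configuration of this kind could be loaded as sketched below with Python's standard `xml.etree.ElementTree` module. Only the three tag names and the "current state | condition | next state" format come from the text; the nesting, the `name` and `action` attributes, and the action class names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: the tag names follow the text; the nesting,
# the attribute names, and the action class names are assumptions.
CONFIG = """
<machine name="music_assistant">
  <state name="preprocess" action="PreprocessAction"/>
  <state name="parse" action="ParseAction"/>
  <state name="vertical" action="VerticalServiceAction"/>
  <transmition>preprocess|needs_parsing|parse</transmition>
  <transmition>parse|parse_ok|vertical</transmition>
</machine>
"""


def load_machine(xml_text):
    """Return (states, transitions) parsed from the configuration text."""
    root = ET.fromstring(xml_text.strip())
    states = {s.get("name"): s.get("action") for s in root.findall("state")}
    transitions = {}
    for t in root.findall("transmition"):
        # Each entry has the form: current state | condition | next state
        current, condition, nxt = (part.strip() for part in t.text.split("|"))
        transitions[(current, condition)] = nxt
    return states, transitions


states, transitions = load_machine(CONFIG)
```

Keeping the flow in a file like this is what lets a new product be accessed by editing configuration rather than code.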

It should be noted that after the first computing device forms the state machine, the state machine may be run on the first computing device; or the state machine is output to the second computing device, and then the second computing device runs the state machine. Based on this, whether the first computing device or the second computing device runs the state machine, the method further includes:

Step S606, acquiring a to-be-analyzed statement of the voice product;

Step S607, input the statement to be parsed into a first node of a preset state machine;

Step S608, obtaining an output result from a last node of the state machine;

Step S609, outputting the output result.

With the technical solution provided by the embodiments of the present application, accessing a new voice product does not require rewriting code; only the parsing process needs to be customized according to product requirements, which is simple, flexible, and efficient, and gives a good user experience. For example, when the user says the name of person A (for example, Li Xiaopeng) in the browser voice assistant, the action the user sees is a jump to the search page, with the browser searching for keyword A; in the micro-desktop, the encyclopedia information of person A is returned directly.

Several ways of implementing step S605, "forming the node set into a state machine of the voice product", are provided below:

Manner 1: First, step S603, "determining a node of the corresponding state machine for each step in the step set", includes: in the step set, determining, according to the connection relationship between each step and the other steps, the transition condition between the node corresponding to that step and the nodes corresponding to the other steps. Correspondingly, step S605 includes: forming the nodes into a state machine of the voice product according to the transition conditions.

Manner 2: Forming the node set into a state machine of the voice product includes: determining, according to the connection relationship between every two steps in the step set, the connection relationship between the nodes corresponding to every two steps; and forming the state machine of the voice product according to the connection relationships between the nodes in the node set.

Here, "every two steps" refers to all possible pairs of steps in the step set. Assuming that the step set includes steps a, b, c, and d, the pairs are: step a and step b, step a and step c, step a and step d, step b and step c, step b and step d, and step c and step d.

Here, the connection relationship (association relationship) between two steps is, for example, the relationship between step S1 and step S2 described above. The relationship between step S1 and step S2 is: it is determined whether the statement to be parsed needs further parsing, and if it does, step S2 is entered; accordingly, the state transition condition between state 31 and state 32 is that parsing is needed. The relationship between step S1 and step S3 is: it is determined whether the statement to be parsed needs further parsing, and if it does not, step S3 is entered; accordingly, the state transition condition between state 31 and state 33 is that parsing succeeded.
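The pairwise procedure of Manner 2 can be sketched as follows. The example enumerates every pair of steps and copies each step-level connection onto the corresponding nodes. The step names S1 to S3 and states 31 to 33 follow the example above; the dictionary layout, the condition strings, and the function names are assumptions for illustration.

```python
from itertools import combinations

# Step -> node correspondence, following the S1/S2/S3 example in the text.
STEP_TO_NODE = {"S1": "state31", "S2": "state32", "S3": "state33"}

# Connection relationships between steps: (from step, to step) -> condition.
# The condition strings are illustrative labels.
STEP_LINKS = {
    ("S1", "S2"): "needs_parsing",
    ("S1", "S3"): "parse_ok",
}


def step_pairs(step_set):
    """All unordered pairs of steps in the step set."""
    return list(combinations(step_set, 2))


def node_links(step_set):
    """Map step-level connections onto the nodes of the state machine."""
    links = {}
    for a, b in step_pairs(step_set):
        # A connection may run in either direction, so check both orders.
        for src, dst in ((a, b), (b, a)):
            condition = STEP_LINKS.get((src, dst))
            if condition is not None:
                links[(STEP_TO_NODE[src], condition)] = STEP_TO_NODE[dst]
    return links
```

For a step set of four steps, step_pairs returns the six pairs listed above; for the set S1, S2, S3 with the two assumed connections, node_links yields the transitions from state 31 to state 32 and from state 31 to state 33.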

Manner 3: Forming the node set into a state machine of the voice product includes: acquiring the identifier of the node corresponding to each step; and forming the state machine of the voice product from a preset state diagram according to the identifiers of the nodes corresponding to the steps.

In Manner 3, the process of forming the preset state diagram includes:

Step SA1, determining a complete set of steps in semantic parsing, the complete set of steps comprising at least two steps, the step set being a subset of the complete set of steps;

Here, the complete set of steps may include the same number of steps as the step set, or more steps than the step set; that is, "subset" also covers the case in which the complete set of steps includes the same number of steps as the step set.

Step SA2, encapsulating each step of the step set into a node of the state machine;

Step SA3, determining, according to the connection relationship between each two steps in the step set, the connection relationship (or transition condition) between the nodes corresponding to each two steps;

Step SA4, forming a state diagram according to the connection relationships (or transition conditions) between the nodes.

Here, in step SA2, determining a node of the corresponding state machine for each step in the step set includes: acquiring association information between the steps and the nodes; and determining, according to the association information, the node of the corresponding state machine for each step in the step set.

Here, the association information represents the correspondence between steps and nodes. In implementation, it may take the form of a correspondence list: the list is queried with the identifier of a step to obtain the corresponding node.
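Manner 3 can be illustrated with a short sketch: a preset state diagram is built once over the complete set of steps, and the state machine of a particular product is then cut out of it using the identifiers of the nodes that correspond to the product's step set. All step names, node identifiers, conditions, and the function select_machine are hypothetical; the correspondence list is modeled as a plain dictionary.

```python
# Correspondence list (association information): step identifier -> node identifier.
# Every name below is an illustrative assumption.
STEP_TO_NODE = {
    "preprocess": "n_pre",
    "parse": "n_parse",
    "clarify": "n_clarify",
    "call": "n_call",
}

# Preset state diagram over the complete set of steps:
# (source node, condition) -> destination node.
FULL_DIAGRAM = {
    ("n_pre", "needs_parsing"): "n_parse",
    ("n_pre", "parse_ok"): "n_call",
    ("n_parse", "parse_ok"): "n_call",
    ("n_parse", "ambiguous"): "n_clarify",
    ("n_clarify", "parse_ok"): "n_call",
}


def select_machine(step_set):
    """Keep only the transitions whose two end nodes belong to the step set."""
    node_ids = {STEP_TO_NODE[step] for step in step_set}
    return {
        (src, condition): dst
        for (src, condition), dst in FULL_DIAGRAM.items()
        if src in node_ids and dst in node_ids
    }
```

A product whose step set omits the assumed "clarify" step receives only the three transitions among the remaining nodes, while a product that uses the complete set of steps receives the whole diagram.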

In other embodiments of the present application, in order to ensure that the steps correspond correctly to the nodes (the states of the state machine), the embodiment of the present application further includes checking whether a step and its node match; that is, the method in this embodiment further includes:

Step SB1, acquiring a first connection relationship, where the first connection relationship is the connection relationship between one step in the step set and any other step in the step set;

Step SB2, acquiring a second connection relationship, where the second connection relationship is the connection relationship (or transition condition), in the state machine, between the node corresponding to the one step in the step set and the node corresponding to the other step;

Step SB3, if the first connection relationship matches the second connection relationship, determining the node corresponding to the one step as one node in the node set;

Here, it is judged whether the first connection relationship matches the second connection relationship, and a judgment result is obtained; if the judgment result indicates that the first connection relationship matches the second connection relationship, the node is determined as one node in the node set;

Step SB4, if the first connection relationship does not match the second connection relationship, determining a node again for the one step.
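The matching check of steps SB1 to SB4 might look like the following sketch. It simplifies a connection relationship to the set of conditions leaving a step or a node, which is an assumption made for brevity; the candidate list, the data, and the function pick_node are likewise hypothetical.

```python
# Outgoing conditions of each step (first connection relationship) and of
# each node (second connection relationship). Illustrative data only.
STEP_LINKS = {
    "S1": {"needs_parsing", "parse_ok"},
    "S2": {"parse_ok"},
    "S3": set(),
}
NODE_LINKS = {
    "state31": {"needs_parsing", "parse_ok"},
    "state32": {"parse_ok"},
    "state33": set(),
}


def pick_node(step, candidates):
    """Return the first candidate node whose connections match the step's."""
    first = STEP_LINKS[step]               # SB1: connections of the step
    for node in candidates:
        second = NODE_LINKS[node]          # SB2: connections of the node
        if first == second:                # SB3: match, accept the node
            return node
        # SB4: no match, so a node is determined again (next candidate)
    raise LookupError(f"no matching node for step {step!r}")
```

Given the candidates state32 and state31 for step S1, the first candidate is rejected because its connections differ, and state31 is accepted on the second attempt.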

An embodiment of the present application provides a state machine based semantic parsing method and apparatus, in which: a function of a voice product is determined; a step set of the voice product in semantic parsing is determined according to the function of the voice product, the step set including at least two steps; a node of the corresponding state machine is determined for each step in the step set; a node set is formed according to the determined nodes; and the node set is formed into a state machine of the voice product. In this way, the scalability of the voice platform is enhanced.

Based on the foregoing embodiments, the embodiment of the present application provides a state machine based semantic parsing apparatus. Each unit included in the apparatus, and each module included in each unit, can be implemented by a processor in the first computing device. In implementation, the functions implemented by the processor may of course also be implemented by specific logic circuits; in specific embodiments, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA).

In implementation, the first computing device may be any of various electronic devices with information processing capability, for example, a smartphone, a notebook computer, a desktop computer, or a server cluster.

FIG. 7 is a schematic structural diagram of a state machine based semantic parsing apparatus according to an embodiment of the present application. As shown in FIG. 7, the apparatus 700 includes a processor and a memory connected to the processor; the memory stores machine readable instruction units executable by the processor; the machine readable instruction units include: a first determining unit 701, a second determining unit 702, a third determining unit 703, a first forming unit 704, and a second forming unit 705, wherein:

The first determining unit 701 is configured to determine a function of the voice product;

The second determining unit 702 is configured to determine, according to a function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two or more steps;

The third determining unit 703 is configured to determine a node of the corresponding state machine for each step in the step set;

The first forming unit 704 is configured to form a node set according to the determined nodes;

The second forming unit 705 is configured to form the node set into a state machine of the voice product.

Two ways of implementing the second forming unit 705 are provided below:

Manner 1: The second forming unit 705 includes a first determining module 7051 and a first forming module 7052, wherein: the first determining module 7051 is configured to determine, according to the connection relationship between every two steps in the step set, the connection relationship (or transition condition) between the nodes corresponding to every two steps; the first forming module 7052 is configured to form the state machine of the voice product according to the connection relationships (or transition conditions) between the nodes in the node set.

Manner 2: The second forming unit 705 includes an obtaining module 7053 and a second forming module 7054, wherein: the obtaining module 7053 is configured to acquire the identifier of the node corresponding to each step; and the second forming module 7054 is configured to form the state machine of the voice product from a preset state diagram according to the identifiers of the nodes corresponding to the steps.

In other embodiments of the present application, in Manner 2, the apparatus 700 further includes a third forming unit 706 configured to form the preset state diagram, where the third forming unit 706 includes a second determining module 7061, an encapsulating module 7062, a third determining module 7063, and a third forming module 7064, wherein:

The second determining module 7061 is configured to determine a complete set of steps in semantic parsing, where the complete set of steps includes at least two or more steps, and the step set is a subset of the complete set of steps;

The encapsulating module 7062 is configured to encapsulate each step of the step set into a node of a state machine;

The third determining module 7063 is configured to determine, according to the connection relationship between every two steps in the step set, the connection relationship (or transition condition) between the nodes corresponding to every two steps;

The third forming module 7064 is configured to form a state diagram according to a connection relationship between the nodes.

In other embodiments of the present application, the encapsulating module 7062 in Manner 2 further includes an obtaining submodule and a determining submodule, where:

The obtaining submodule is configured to acquire association information between the steps and the nodes;

The determining submodule is configured to determine, according to the association information, the node of the corresponding state machine for each step in the step set.

In other embodiments of the present application, the apparatus further includes a first obtaining unit, a second obtaining unit, a matching unit, and a non-matching unit, where:

The first obtaining unit is configured to acquire a first connection relationship, where the first connection relationship is the connection relationship between one step in the step set and any other step in the step set;

The second obtaining unit is configured to acquire a second connection relationship, where the second connection relationship is the connection relationship (or transition condition), in the state machine, between the node corresponding to the one step in the step set and the node corresponding to the other step;

The matching unit is configured to, if the first connection relationship matches the second connection relationship, determine the node corresponding to the one step as one node in the node set;

The non-matching unit is configured to determine a node again for the one step if the first connection relationship does not match the second connection relationship.

Here, the apparatus further includes a judging unit, configured to judge whether the first connection relationship matches the second connection relationship and obtain a judgment result; if the judgment result indicates that the first connection relationship matches the second connection relationship, the node is determined as one node in the node set; if the judgment result indicates that the first connection relationship does not match the second connection relationship, a node is determined again for the one step.

In other embodiments of the present application, the apparatus may further include: a statement obtaining unit 707, which acquires a statement to be parsed of the voice product; a statement input unit 708, which inputs the statement to be parsed into the first node of the state machine; a result obtaining unit 709, which obtains an output result from the last node of the state machine; and a result output unit 710, which outputs the output result.

It should be noted here that the description of the above device embodiment is similar to the description of the above method embodiment, and has similar advantageous effects as the method embodiment, and therefore will not be described again. For the details of the technical solutions that are not disclosed in the embodiments of the present application, please refer to the description of the method embodiments of the present application, and the details are not described herein.

Based on the foregoing embodiments, the embodiment of the present application provides a state machine based semantic parsing apparatus. Each unit included in the apparatus, and each module included in each unit, can be implemented by a processor in the second computing device. In implementation, the functions implemented by the processor may of course also be implemented by specific logic circuits; in specific embodiments, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA).

In implementation, the second computing device may be any of various electronic devices with information processing capability, for example, a smartphone, a notebook computer, a desktop computer, or a server cluster.

FIG. 8 is a schematic structural diagram of a state machine based semantic parsing apparatus according to an embodiment of the present application. As shown in FIG. 8, the apparatus 800 includes a processor and a memory connected to the processor; the memory stores machine readable instruction units executable by the processor; the machine readable instruction units include: a third obtaining unit 801, an input unit 802, a fourth obtaining unit 803, and an output unit 804, wherein:

The third obtaining unit 801 is configured to obtain a statement to be parsed of the voice product;

The input unit 802 is configured to input the to-be-resolved statement into a first node of a preset state machine;

The fourth obtaining unit 803 is configured to obtain an output result from a last node of the state machine;

The output unit 804 is configured to output the output result.

In other embodiments of the present application, the apparatus includes a first determining unit, a second determining unit, a third determining unit, a first forming unit, and a second forming unit, wherein:

The second determining unit is configured to determine, according to a function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two or more steps;

The second forming unit is configured to form the node set into a state machine of the voice product.

In other embodiments of the present application, the state machine formed by the first computing device may run on the first computing device, or may run as a functional module on the second computing device. The second computing device may be a server of the voice product or a terminal of the voice product. In other words, the state machine formed by the first computing device may be output to the server of the voice product or to the terminal of the voice product. Based on this understanding, an embodiment of the present application further provides a state machine based semantic parsing system, which has multiple implementation modes, wherein:

First mode: As shown in FIG. 9A, the system 900 of the first mode includes a first computing device 901, a second computing device 902, and a terminal 903, where:

The first computing device 901 is configured to form a state machine (such as the foregoing method or the embodiment shown in Figure 8), and then output the formed state machine to the second computing device 902;

The terminal 903 is installed with a client of the voice product (for example, a mobile phone voice assistant or a browser voice assistant). The user opens the client on the terminal and then speaks a sentence, and the client sends the received voice to the second computing device 902;

The second computing device 902 is a server of the terminal 903. The second computing device 902 is configured to run the state machine output by the first computing device 901. The second computing device 902 is further configured to receive the voice output by the client of the terminal 903, perform speech recognition preprocessing on the voice to obtain a statement to be parsed, input the statement to be parsed into the state machine running on the second computing device 902, obtain the output result of the state machine, and return the output result to the client of the terminal 903. Finally, the client of the terminal 903 outputs the output result to the user.

The second mode: as shown in FIG. 9B, the second mode system 900 includes a first computing device 901 and a second computing device 902, where:

The second computing device 902 serves as a terminal and is installed with a client of the voice product (for example, a mobile phone voice assistant such as Apple's Siri, or a browser voice assistant). The user opens the client on the terminal and then speaks a sentence, and the client sends the received voice to the state machine running on the second computing device 902. After the state machine runs, the client obtains the output result of the state machine, and finally the client outputs the output result to the user. In implementation, the state machine can be independent of the client, in which case the client includes a detecting means for detecting what the user says, and the detecting means sends the voice to the state machine running on the second computing device 902.

It should be noted that, in the embodiments of the present application, if the foregoing state machine based semantic parsing method is implemented in the form of a software function module and sold or used as a stand-alone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may in essence be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present application are not limited to any particular combination of hardware and software.

Correspondingly, the embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions which, when executed by a processor, are used to perform the state machine based semantic parsing method in the embodiments of the present application.

Correspondingly, the embodiment of the present application further provides a computing device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the state machine based semantic parsing method in the embodiments of the present application.

It should be noted here that the description of the above computing device embodiment is similar to the description of the above method, and has the same beneficial effects as the method embodiment. For technical details not disclosed in the computing device embodiment of the present application, those skilled in the art should refer to the description of the method embodiments of the present application.

In implementation, the first computing device, the second computing device, and the terminal may each be implemented by an electronic device. FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 10, the computing device 1000 may include: at least one processor 1001, at least one communication bus 1002, a user interface 1003, at least one external communication interface 1004, and at least one memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 can include a display screen and a keyboard. The external communication interface 1004 can include standard wired and wireless interfaces.

It is to be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that particular features, structures, or characteristics related to the embodiment are included in at least one embodiment of the present application. Thus, appearances of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the foregoing processes does not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The serial numbers of the embodiments of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.

It is to be understood that the terms "comprises", "comprising", or any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements that are inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be other division manners, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units; they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

It will be understood by those skilled in the art that all or part of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer readable storage medium, and when executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes various media that can store program code, such as a removable storage device, a read only memory (ROM), a magnetic disk, or an optical disk.

Alternatively, if the above integrated unit of the present application is implemented in the form of a software function module and sold or used as a stand-alone product, it may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may in essence be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The foregoing is only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of the claims.

Claims

A state machine based semantic parsing method, the method being applied to a server, the method comprising:

Determining a function of a voice product;

Determining, according to the function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two steps, and the at least two steps are used to complete at least the following operations: preprocessing a voice instruction input by a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

Determining a node of the corresponding state machine for each step in the step set;

Forming a node set according to the determined nodes;

Forming the node set into a state machine of the voice product, so that the server parses the voice instruction input by the user according to the state machine, and provides the user with a function corresponding to the voice instruction according to the parsing result.

The method of claim 1, wherein forming the node set into a state machine of the voice product comprises:

Determining, according to the connection relationship between every two steps in the step set, the connection relationship between the nodes corresponding to every two steps;

Forming the state machine of the voice product according to the connection relationships between the nodes in the node set.

The method of claim 1, wherein forming the node set into a state machine of the voice product comprises:

Obtaining the identifier of the node corresponding to each step;

Forming the state machine of the voice product from a preset state diagram according to the identifiers of the nodes corresponding to the steps.

The method according to claim 3, wherein forming the preset state diagram comprises:

Determining a complete set of steps in semantic parsing, the complete set of steps comprising at least two steps, the step set being a subset of the complete set of steps;

Encapsulating each step of the step set into a node of a state machine;

Determining a connection relationship between nodes corresponding to each two steps according to a connection relationship between each two steps in the step set;

Forming a state diagram according to the connection relationships between the nodes.

The method of claim 4, wherein determining a node of the corresponding state machine for each step in the step set comprises:

Obtaining association information between the steps and the nodes;

Determining, according to the association information, the node of the corresponding state machine for each step in the step set.

The method according to any one of claims 1 to 5, further comprising:

Obtaining a first connection relationship, where the first connection relationship is the connection relationship between one step in the step set and any other step in the step set;

Obtaining a second connection relationship, where the second connection relationship is the connection relationship, in the state machine, between the node corresponding to the one step in the step set and the node corresponding to the other step;

If the first connection relationship matches the second connection relationship, determining the node corresponding to the one step as one node in the node set;

If the first connection relationship does not match the second connection relationship, determining a node again for the one step.

The method of claim 1, further comprising:

Obtaining a statement to be parsed of the voice product;

Inputting the statement to be parsed into the first node of the state machine;

Obtaining an output result from a last node of the state machine;

Outputting the output result.

A state machine based semantic parsing method, applied to a server, the semantic parsing method comprising:

Obtaining a statement to be parsed of a voice product;

Inputting the statement to be parsed into a first node of a preset state machine, wherein each node of the state machine corresponds to a step in a step set of semantic parsing; the step set is determined according to the functions provided by the voice product and includes at least two steps; the at least two steps are used to perform at least the following operations: preprocessing a voice instruction of a user, parsing the voice instruction, and invoking a corresponding function according to the parsing result;

Obtaining an output result from a last node of the state machine;

Outputting the output result.

A state machine based semantic parsing apparatus, the apparatus being applied to a server, the apparatus comprising:

a processor and a memory coupled to the processor; the memory storing machine readable instruction units executable by the processor; the machine readable instruction units comprising:

a first determining unit, a second determining unit, a third determining unit, a first forming unit, and a second forming unit, wherein:

The first determining unit is configured to determine a function of the voice product;

The second determining unit is configured to determine, according to the function of the voice product, a step set of the voice product in semantic parsing, where the step set includes at least two steps, and the at least two steps are used to perform at least the following operations: pre-processing a voice command input by the user, parsing the voice command, and invoking a corresponding function according to the result of the parsing;

The third determining unit is configured to determine, for each step in the step set, a corresponding node of the state machine;

The first forming unit is configured to form a node set according to the determined nodes;

The second forming unit is configured to form the state machine of the voice product by using the node set, so that the server parses a voice command input by the user according to the state machine and, according to the parsing result, provides the user with the function corresponding to the voice command.
The apparatus of claim 9, the second forming unit comprising:

a first determining module, configured to determine, according to the connection relationship between every two steps in the step set, the connection relationship between the nodes corresponding to the two steps;

a first forming module, configured to form the state machine of the voice product according to the connection relationships between the nodes in the node set.
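A minimal sketch of this derivation, assuming the connection relationships between steps are given as ordered pairs and that a step-to-node mapping is already known. The function name `build_state_machine` and the adjacency-list representation are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_state_machine(
    step_edges: Iterable[Tuple[str, str]],
    step_to_node: Dict[str, str],
) -> Dict[str, List[str]]:
    """Map each step-to-step connection onto the corresponding nodes.

    Returns an adjacency list: node identifier -> successor node identifiers.
    """
    # Start with every node in the node set, so the last node also appears.
    machine: Dict[str, List[str]] = {node: [] for node in step_to_node.values()}
    for src_step, dst_step in step_edges:
        # A connection between two steps becomes a connection between their nodes.
        machine[step_to_node[src_step]].append(step_to_node[dst_step])
    return machine
```

For example, with the connections preprocess to parse and parse to invoke, and nodes N1, N2, N3 assigned to those steps, the result links N1 to N2 and N2 to N3.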
The apparatus of claim 9, the second forming unit comprising:

an obtaining module, configured to obtain an identifier of the node corresponding to each step;

a second forming module, configured to form the state machine of the voice product from a preset state map according to the identifier of the node corresponding to each step.
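One possible reading of this step, sketched under stated assumptions: the preset state map is an adjacency list over the nodes of the complete set of steps, and the state machine of a product is the part of that map restricted to the product's node identifiers. The map contents, the node identifiers, and the restriction rule are all hypothetical; the application does not specify them.

```python
from typing import Dict, Iterable, List

# Hypothetical preset state map covering the complete set of steps.
PRESET_STATE_MAP: Dict[str, List[str]] = {
    "preprocess": ["parse"],
    "parse": ["rewrite", "invoke"],
    "rewrite": ["invoke"],
    "invoke": [],
}


def select_state_machine(
    node_ids: Iterable[str],
    state_map: Dict[str, List[str]] = PRESET_STATE_MAP,
) -> Dict[str, List[str]]:
    """Keep only the identified nodes and the connections between them."""
    wanted = set(node_ids)
    unknown = wanted - set(state_map)
    if unknown:
        raise KeyError(f"identifiers not in the preset state map: {sorted(unknown)}")
    return {
        node: [nxt for nxt in state_map[node] if nxt in wanted]
        for node in state_map
        if node in wanted
    }
```

With this layout, two products with different functions can share one preset state map and differ only in the identifiers they pass in.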
The apparatus according to claim 11, further comprising a third forming unit configured to form the preset state map;

The third forming unit includes a second determining module, an encapsulating module, a third determining module, and a third forming module, wherein:

The second determining module determines a complete set of steps in semantic parsing, where the complete set of steps includes at least two steps, and the step set is a subset of the complete set of steps;

The encapsulating module encapsulates each step of the step set as a node of the state machine;

The third determining module determines, according to the connection relationship between every two steps in the step set, the connection relationship between the nodes corresponding to the two steps;

The third forming module forms the state map according to the connection relationships between the nodes.
The device of claim 12, the encapsulating module comprising:

an obtaining sub-module, configured to obtain association information between steps and nodes;

a determining sub-module, configured to determine, according to the association information, the corresponding node of the state machine for each step in the step set.
The apparatus according to any one of claims 10-13, further comprising: a first obtaining unit, a second obtaining unit, a matching unit and a mismatching unit, wherein:

The first obtaining unit obtains a first connection relationship, where the first connection relationship is a connection relationship between one step in the step set and any other step in the step set;

The second obtaining unit obtains a second connection relationship, where the second connection relationship is a connection relationship between the node corresponding to the one step in the step set and the node corresponding to any other step in the state machine;

The matching unit determines, if the first connection relationship matches the second connection relationship, the node corresponding to the one step as one node in the node set;

The mismatching unit re-determines the node for the one step if the first connection relationship does not match the second connection relationship.
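The matching check can be sketched as follows. The comparison rule (equal successor sets) and the way a node is re-determined (trying a list of candidate nodes in order) are assumptions made for illustration; the application does not state how either is carried out. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def successors(edges: Iterable[Edge], item: str) -> Set[str]:
    """Return everything that `item` connects to."""
    return {dst for src, dst in edges if src == item}


def confirm_node(
    step: str,
    step_edges: List[Edge],
    node_edges: List[Edge],
    step_to_node: Dict[str, str],
    candidates: Iterable[str] = (),
) -> str:
    """Accept the node for `step` if its connections match, else try candidates."""
    # First connection relationship, expressed in node identifiers.
    expected = {step_to_node[s] for s in successors(step_edges, step)}
    for node in [step_to_node[step], *candidates]:
        # Second connection relationship: what this node connects to.
        if successors(node_edges, node) == expected:
            return node
    raise LookupError(f"no node matches the connections of step {step!r}")
```

In this sketch a mismatch does not stop the construction; it only triggers another lookup, and an error is raised once no candidate remains.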
The device of claim 10, the device further comprising:

a statement obtaining unit, which obtains a statement to be parsed of the voice product;

a statement input unit, which inputs the statement to be parsed into a first node of the state machine;

a result obtaining unit, which obtains an output result from a last node of the state machine;

a result output unit, which outputs the output result.
A state machine based semantic parsing device, applied to a server, the semantic parsing device comprising:

a processor and a memory coupled to the processor; the memory storing machine readable instruction units executable by the processor; the machine readable instruction units comprising:

a third obtaining unit, an input unit, a fourth obtaining unit, and an output unit, wherein:

The third obtaining unit is configured to obtain a statement to be parsed of the voice product;

The input unit is configured to input the statement to be parsed into a first node of a preset state machine, wherein each node of the state machine corresponds to one step in a step set of semantic parsing; the step set is determined according to the function provided by the voice product and includes at least two steps; and the at least two steps are used to perform at least the following operations: pre-processing a voice command of the user, parsing the voice command, and calling the corresponding function according to the result of the parsing;

The fourth obtaining unit is configured to obtain an output result from a last node of the state machine;

The output unit is configured to output the output result.
A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by a computing device, cause the computing device to perform the method of any one of claims 1-8.