WO2021254411A1

WO2021254411A1 - Intent recognition method and electronic device

Info

Publication number: WO2021254411A1
Application number: PCT/CN2021/100475
Authority: WO
Inventors: 潘龙飞
Original assignee: 华为技术有限公司
Priority date: 2020-06-17
Filing date: 2021-06-17
Publication date: 2021-12-23
Also published as: CN113806473A

Abstract

An intent recognition method and an electronic device. In the described method, an electronic device converts a voice input into a first text, then uses a preset FST intent slot model to perform rule matching on a second text determined according to the first text, obtains a third text, and then obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information. By implementing the technical solution provided by the present application, the speed and accuracy of matching are improved when using an NLU rule for intent recognition.

Description

Intention recognition method and electronic equipment

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 17, 2020 with the application number 202010555603.3 and the application title "Intent Recognition Method and Electronic Equipment", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence technology, in particular to an intention recognition method and electronic equipment.

Background technique

Natural language processing (NLP) is a sub-field of artificial intelligence (AI). Natural language understanding (NLU) is a sub-field of NLP and also its most difficult subject. Intent recognition and slot filling are the two most critical NLU tasks, but factors such as language diversity, ambiguity, robustness, knowledge dependence, and context make it very difficult for NLU to complete these two tasks well.

At present, there are grammatical analysis methods based on parsing algorithms for context-free grammars (CFG), for example the Cocke-Younger-Kasami (CYK) algorithm: the general form of the CFG is first converted to Chomsky normal form (CNF), and the CYK algorithm then performs bottom-up grammatical analysis with the converted CNF grammar. The CYK algorithm uses dynamic programming: starting from the input words, it works step by step back to the start symbol according to the grammar, and so completes the grammatical analysis. When the analysis is complete, the full analysis path yields a parse tree, from which the user can extract grammatical features and then obtain the desired grammatical information, such as parts of speech, entities, and sentence components.
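The bottom-up dynamic programming described above can be illustrated with a minimal CYK recognizer. This is only a sketch under stated assumptions: the grammar is already in CNF, and the toy grammar, the rule dictionaries, and the function name `cyk_accepts` are hypothetical and do not come from this application.

```python
from itertools import product


def cyk_accepts(words, unary_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start` under a CNF grammar.

    unary_rules:  maps a terminal word to the set of non-terminals A with A -> word
    binary_rules: maps a pair (B, C) to the set of non-terminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that derive words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(unary_rules.get(word, ()))
    for span in range(2, n + 1):            # length of the span being filled
        for i in range(n - span + 1):       # start index of the span
            for split in range(1, span):    # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    table[i][span - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical toy grammar: S -> NP VP, VP -> V NP, NP -> "she" | "fish", V -> "eats"
UNARY = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

print(cyk_accepts(["she", "eats", "fish"], UNARY, BINARY))
```

The table is filled from single words up to the full sentence, which is the "starting from the input words" direction the text describes.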

Using a CFG parsing algorithm to obtain the intent and slots for intent recognition is feasible when the number of grammatical rules is small. With a large number of grammatical rules, however, parsing becomes very slow, and the parsing service may even become unavailable.

Summary of the invention

This application provides a method and electronic device for intent recognition to improve the speed and accuracy of matching when using NLU rules for intent recognition.

In a first aspect, the present application provides an intent recognition method. The method includes: in response to a user's voice input, an electronic device converts the voice input into a first text; the electronic device uses a preset FST intent slot model to perform rule matching on a second text to obtain a third text; the second text is determined according to the first text; the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, the preset intent labeling information is used to label the intent information of the second text, and the preset slot labeling information is used to label the slot information in the second text; the electronic device obtains the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.

In the above embodiment, the electronic device uses the preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast owing to the characteristics of FST rule matching. Matching yields the third text, which carries the preset intent labeling information and preset slot labeling information, so the intent information and slot information can be easily extracted from it once matching is complete. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.

With reference to some embodiments of the first aspect, in some embodiments, before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device performs format preprocessing on the first text to obtain the second text; the number of format characters in the second text is less than or equal to the number of format characters in the first text.

In the foregoing embodiment, the electronic device may first perform format preprocessing on the first text to obtain the second text. In this way, the complexity of the preset FST intent slot model for matching the second text with FST rules can be simplified, and the matching speed can be further improved.
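As a rough illustration of format preprocessing, the sketch below drops punctuation and collapses whitespace. Which characters count as "format characters" is not specified here, so treating punctuation and redundant whitespace as format characters is an assumption, and the function name `preprocess_format` is hypothetical.

```python
import unicodedata


def preprocess_format(first_text: str) -> str:
    """Drop punctuation and collapse whitespace (an assumed notion of format characters)."""
    kept = []
    for ch in first_text:
        # Unicode general categories starting with "P" are punctuation, e.g. ',' or '?'
        if unicodedata.category(ch).startswith("P"):
            continue
        kept.append(ch)
    # split() discards leading, trailing, and repeated whitespace
    return " ".join("".join(kept).split())


second_text = preprocess_format("  How is the weather,  in Shanghai today? ")
print(second_text)
```

With fewer surface variations in the second text, the rules compiled into the model need fewer alternative paths to cover the same utterance.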

With reference to some embodiments of the first aspect, in some embodiments, after the step in which the electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or preset slot labeling information, the method further includes: the electronic device outputs the intent information and/or slot information in a structured manner.

In the foregoing embodiment, the electronic device can output the intent information and/or slot information in a structured manner, so that other modules in the electronic device can use the intent information and/or slot information.
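A minimal sketch of extracting intent and slot information from a labeled third text and emitting it as structured output might look as follows. The labeling syntax (`[intent=...]` and `[Slot]...[/Slot]`), the function name `extract`, and the JSON layout are all hypothetical; the actual preset labeling information is not specified at this point in the text.

```python
import json
import re

# Hypothetical labeling syntax, assumed only for this sketch
INTENT_TAG = re.compile(r"\[intent=(\w+)\]")
SLOT_TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]")


def extract(third_text: str) -> dict:
    """Pull the intent label and slot values out of a labeled text."""
    intent_match = INTENT_TAG.search(third_text)
    slots = {name: value.strip() for name, value in SLOT_TAG.findall(third_text)}
    return {
        "intent": intent_match.group(1) if intent_match else None,
        "slots": slots,
    }


third_text = (
    "[intent=Ask_Weather] how is the weather in "
    "[Location]Shanghai[/Location] [Date]today[/Date]"
)
print(json.dumps(extract(third_text)))
```

A dictionary serialized to JSON is one convenient structured form that other modules could consume; any equivalent structure would serve the same purpose.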

With reference to some embodiments of the first aspect, in some embodiments, the preset FST intent slot model is a preset WFST intent slot model, the preset WFST intent slot model is a preset WFST, and in the preset WFST intent slot model each state transition has a weight.

In the foregoing embodiment, the preset FST intent slot model is the preset WFST intent slot model, so that the weight of each match can be obtained during parallel matching, which facilitates the screening of matching results.

With reference to some embodiments of the first aspect, in some embodiments, the electronic device using a preset FST intent slot model to perform rule matching on the second text to obtain the third text specifically includes: the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text, containing the preset intent labeling information and/or the preset slot labeling information, that is output after matching succeeds along the matching path; when there is one WFST result, the electronic device determines that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text.

In the foregoing embodiment, the electronic device determines the matching text of the second text in the WFST result with the highest credibility after parallel rule matching as the third text. While improving the efficiency of parallel matching, it also ensures the accuracy of intent recognition.

With reference to some embodiments of the first aspect, in some embodiments, when there are multiple WFST results, the electronic device determining that the matching text of the second text in the WFST result with the highest credibility is the third text specifically includes: when there are multiple WFST results, the electronic device calculates a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; the electronic device determines that the matching text of the second text in the WFST result with the highest credibility score is the third text.

In the foregoing embodiment, the electronic device determines the matching text of the second text in the WFST result with the highest credibility score as the third text, which improves the accuracy of credibility evaluation.

With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.

With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
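The summed cumulative weight can be sketched as below. The concrete weights (1.0 for a literal transition, 5.0 for a wildcard transition), the `Transition` type, and the two example paths are assumptions chosen only to satisfy the stated constraint that wildcard transitions weigh more; the credibility score derived from the cumulative weight is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    label: str
    weight: float
    wildcard: bool = False


def cumulative_weight(path):
    """Cumulative weight of a matching path: the sum of its transition weights."""
    return sum(t.weight for t in path)


# Hypothetical paths: one matches every word literally, one uses a wildcard
exact_path = [
    Transition("weather", 1.0),
    Transition("in", 1.0),
    Transition("Shanghai", 1.0),
]
wildcard_path = [
    Transition("weather", 1.0),
    Transition("*", 5.0, wildcard=True),
    Transition("Shanghai", 1.0),
]

print(cumulative_weight(exact_path), cumulative_weight(wildcard_path))
```

Under these assumed weights, a path that relies on a wildcard ends up with a larger cumulative weight than a path that matches every symbol literally.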

With reference to some embodiments of the first aspect, in some embodiments, the electronic device calculating credibility scores for the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results specifically includes: the electronic device uses credibility score calculation formula 1 to calculate the credibility scores of the multiple WFST results;

Reliability score calculation formula 1:

Here, w represents the cumulative weight of the matching path in the WFST result being scored, and w_max represents the largest of the cumulative weights of the matching paths in the multiple WFST results.

With reference to some embodiments of the first aspect, in some embodiments, before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device loads the preset WFST intent slot model.

In a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes one or more processors and a memory; the memory is coupled with the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors call the computer instructions to cause the electronic device to execute: in response to a user's voice input, converting the voice input into a first text; using a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text, the preset FST intent slot model is a preset FST, the third text includes preset intent labeling information and/or preset slot labeling information, the preset intent labeling information is used to label the intent information of the second text, and the preset slot labeling information is used to label the slot information in the second text; and obtaining the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are also used to call the computer instructions to make the electronic device execute: performing format preprocessing on the first text to obtain the second text; the number of format characters in the second text is less than or equal to the number of format characters in the first text.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are also used to call the computer instructions to make the electronic device execute: outputting the intent information and/or slot information in a structured manner.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text, containing preset intent labeling information and/or preset slot labeling information, that is output after matching succeeds along the matching path; when there is one WFST result, determining that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: when there are multiple WFST results, calculating a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; and determining that the matching text of the second text in the WFST result with the highest credibility score is the third text.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using credibility score calculation formula 1 to calculate the credibility scores of the multiple WFST results;

Reliability score calculation formula 1:

Here, w represents the cumulative weight of the matching path in the WFST result being scored, and w_max represents the largest of the cumulative weights of the matching paths in the multiple WFST results.

With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are also used to call the computer instructions to cause the electronic device to execute: load a preset WFST intent slot model.

In a third aspect, embodiments of the present application provide a chip system that is applied to an electronic device. The chip system includes one or more processors for invoking computer instructions to make the electronic device execute the method described in the first aspect and in any possible implementation of the first aspect.

It is understandable that the chip system may include one processor 110 in the electronic device 100 as shown in FIG. 5, or may include multiple processors 110 in the electronic device 100 as shown in FIG. 5, which is not limited here.

In a fourth aspect, the embodiments of the present application provide a computer program product containing instructions. When the computer program product is run on an electronic device, the electronic device executes the method described in the first aspect and in any possible implementation of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions which, when run on an electronic device, cause the electronic device to execute the method described in the first aspect and in any possible implementation of the first aspect.

Understandably, the electronic device provided in the second aspect, the chip system provided in the third aspect, the computer program product provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to implement the methods provided in the embodiments of the present application. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.

Description of the drawings

Figure 1 is a schematic diagram of the relationship between intent and slot;

Figure 2 is an exemplary schematic diagram of an FSA;

Figure 3 is an exemplary schematic diagram of an FST;

Figure 4 is an exemplary schematic diagram of a WFST;

FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;

FIG. 6 is a block diagram of the software structure of an electronic device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of the structure of the intention recognition module in an embodiment of the present application;

FIG. 8 is a schematic diagram of a usage scenario of the intention recognition method in an embodiment of the present application;

FIG. 9 is an exemplary schematic diagram of an FST intention slot model in an embodiment of the present application;

FIG. 10 is a schematic flowchart of an intention recognition method in an embodiment of the present application;

FIG. 11 is a schematic diagram of another flow chart of the intention recognition method in an embodiment of the present application;

FIG. 12 is an exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application;

FIG. 13 is another exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application;

FIG. 14 is a schematic diagram of another flow chart of an intention recognition method in an embodiment of the present application;

FIG. 15 is an exemplary schematic diagram of parallel rule matching of multiple preset WFST intent slot models for the second text in an embodiment of the present application;

FIG. 16 is an exemplary schematic diagram of a situation in which different settings are used for weights and cumulative weights in the preset WFST intention slot model in an embodiment of the present application.

Detailed description

The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. As used in the specification and appended claims of this application, the singular expressions "a", "an", "said", "above", "the" and "this" are intended to also include plural expressions, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in this application refers to and includes any or all possible combinations of one or more of the listed items.

Hereinafter, the terms "first" and "second" are used only for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.

Since the embodiments of the present application involve the application of intention recognition technology, in order to facilitate understanding, the following first introduces related terms and concepts involved in the embodiments of the present application.

(1) Intent and slot:

1.1. Definition of intent and slot:

Intent refers to the identification of the actual or potential needs of the user by the electronic device. Fundamentally speaking, intent is a classifier that divides user needs into certain types.

Intentions and slots together constitute a "user action", and electronic devices cannot directly understand natural language. Therefore, the role of intention recognition is to map natural language into a structured semantic representation that machines can understand.

Intention recognition, also known as SUC (Spoken Utterance Classification), as the name suggests, is to classify the natural language conversation input by the user, and the classified category corresponds to the user's intent. For example, "what's the weather today", the intention is "ask the weather". Naturally, intent recognition can be regarded as a typical classification problem. Exemplarily, the classification and definition of intent can refer to the ISO-24617-2 standard, which has 56 detailed definitions. The definition of intention has a lot to do with the positioning of the system itself and the knowledge base it possesses, that is, the definition of intention has a very strong domain relevance. It can be understood that in the embodiments of the present application, the classification and definition of intentions are not limited to the ISO-24617-2 standard.

The slot is the parameter of the intent. An intent may correspond to several slots. For example, when asking for a bus route, you need to provide necessary parameters such as departure place, destination, and time. The above parameters are the slots corresponding to the intention of "asking for bus route".

For example, the main goal of the semantic slot filling task is to extract the pre-defined semantic slot values in the semantic frame from the input sentence on the premise that the semantic frame of a specific domain or specific intention is known. The semantic slot filling task can be transformed into a sequence labeling task, that is, using the classic IOB notation method to mark a word as the beginning, continuation (inside), or non-semantic slot (outside) of a certain semantic slot.
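The IOB notation mentioned above can be sketched as a small tagging helper. The example sentence, the slot spans, and the function name `iob_tags` are hypothetical and chosen only to show a multi-token slot; the `B-`/`I-`/`O` prefixes follow the common IOB convention.

```python
def iob_tags(tokens, slot_spans):
    """Assign IOB tags to tokens.

    slot_spans: list of (start_index, end_index_exclusive, slot_name) tuples.
    """
    tags = ["O"] * len(tokens)          # every token starts outside any slot
    for start, end, name in slot_spans:
        tags[start] = f"B-{name}"       # first token of the slot
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"       # remaining tokens inside the slot
    return tags


tokens = ["how", "is", "the", "weather", "in", "New", "York", "today"]
spans = [(5, 7, "Location"), (7, 8, "Date")]

for token, tag in zip(tokens, iob_tags(tokens, spans)):
    print(f"{token}\t{tag}")
```

Each token receives exactly one tag, which is what turns slot filling into a sequence labeling task.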

To make a system work properly, the intents and slots must be designed first. Intents and slots tell the system which specific task to perform and give the types of parameters needed to perform the task.

Taking a specific "inquiry about the weather" requirement as an example, introduce the design of intent and slot in the task-oriented dialogue system:

User input example: "How is the weather in Shanghai today";

User intention definition: Ask the weather, Ask_Weather;

Slot definition: Slot 1: Time, Date; Slot 2: Location, Location.

Fig. 1 is a schematic diagram of a relationship between an intention and a slot in an embodiment of the application. As shown in Figure 1 (a), in this example, two necessary slots are defined for the "Ask the weather" task, which are "time" and "location". For a single task, the above definition can solve the task requirement. However, in a real business environment, a system often needs to be able to handle several tasks at the same time. For example, the weather station should be able to answer the question of “inquiring about the weather” as well as the question of “inquiring about the temperature”.

For the complex situation in which the same system handles multiple tasks, an optimized strategy is to define higher-level domains; for example, the "asking for the weather" intent and the "asking for temperature" intent both belong to the "weather" domain. In this case, a domain can be simply understood as a collection of intents. The advantage of defining domains and performing domain recognition first is that it constrains the scope of domain knowledge and reduces the search space for subsequent intent recognition and slot filling. In addition, a deeper understanding of each domain and the use of specific knowledge and characteristics related to tasks and domains can often significantly improve the effect of natural language understanding (NLU). Based on this, the example in Figure 1 (a) is improved by adding the "weather" domain:

User input example:

1. "How is the weather in Shanghai today";

2. "What is the current temperature in Shanghai";

Domain definition: weather, Weather;

User intent definition:

1. Ask the weather, Ask_Weather;

2. Ask the temperature, Ask_Temperature;

Slot definition:

Slot 1: Time, Date;

Slot 2: Location, Location.

The intent and slot corresponding to the improved "Ask the weather" requirement are shown in Figure 1 (b).

1.2. Intent identification and slot filling:

After the intent and slot are defined, the user intent and the corresponding slot value of the corresponding slot can be identified from the user input.

The goal of intent recognition is to identify user intent from the input. A single task can be simply modeled as a binary classification problem: for example, for the "asking for the weather" intent, intent recognition can be modeled as a binary classification between "asking for the weather" and "not asking for the weather". When the system needs to handle multiple tasks, it must be able to distinguish each intent, and the binary classification problem becomes a multi-class classification problem.

The task of slot filling is to extract information from the data and fill it into pre-defined slots. For example, with the intent and corresponding slots defined in Figure 1, for the user input "How is the weather in Shanghai today", the system should be able to extract "today" and "Shanghai" and fill them into the "Time" and "Location" slots respectively.

(2) Finite state transducer (FST):

FSTs are widely used in speech recognition and in natural language search and processing. For example, natural language processing often involves operations that modify text according to rules, such as: if c is immediately followed by x in the string, change c to b. An FST applies mathematical operations to such rules to merge several of them into a single large rule that is applied in one pass, which effectively improves the efficiency of a rule-based system.

To facilitate the understanding of FST, the following first introduces the finite state acceptor (FSA):

For a given input sequence, an FSA returns one of two results: "accept" or "reject".

Figure 2 is an exemplary schematic diagram of an FSA, whose nodes and arcs correspond to states and state transitions respectively. In the FSA shown in Figure 2, state 0 is the initial state and state 5 is the end state. The transition from state 0 to state 1 accepts character a, state 1 to state 1 accepts character b, state 1 to state 2 accepts character c, state 2 to state 5 accepts character d, state 0 to state 3 accepts character b, state 3 to state 4 accepts character c, state 4 to state 4 accepts character d, and state 4 to state 5 accepts character e. The regular expression corresponding to the FSA shown in Figure 2 is ab*cd|bcd*e, where * means that the preceding character can be repeated any number of times (including zero).

For example, the FSA shown in Figure 2 can accept the symbol sequence "a, b, c, d" through the path 0, 1, 1, 2, 5, in which case the FSA returns "accept". As another example, if the sequence "a, b, d" is input to the FSA shown in Figure 2, no path in the FSA matches the sequence, so the FSA returns "reject".
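The FSA of Figure 2 can be written down directly as a transition table. The sketch below encodes the transitions listed in the text; the names `FSA_TRANSITIONS` and `fsa_accepts` are illustrative and not part of this application.

```python
# (current state, input character) -> next state, as described for Figure 2
FSA_TRANSITIONS = {
    (0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "d"): 5,
    (0, "b"): 3, (3, "c"): 4, (4, "d"): 4, (4, "e"): 5,
}


def fsa_accepts(symbols, transitions=FSA_TRANSITIONS, start=0, finals=frozenset({5})):
    """Return True if the symbol sequence drives the FSA from start to an end state."""
    state = start
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return False            # no arc for this symbol: reject
        state = transitions[key]
    return state in finals


print(fsa_accepts("abcd"))  # path 0, 1, 1, 2, 5
print(fsa_accepts("abd"))   # no matching path
```

Because each (state, character) pair has at most one successor, matching takes one table lookup per input character.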

An FST is an extension of an FSA in which each state transition also has an output label, so that each transition carries an input-output label pair. Figure 3 is an exemplary schematic diagram of an FST. In the FST shown in Figure 3, state 0 is the initial state and state 5 is the end state. The input-output label pairs are: a:z from state 0 to state 1; b:y from state 1 to state 1; c:x from state 1 to state 2; d:w from state 2 to state 5; b:y from state 0 to state 3; c:x from state 3 to state 4; d:w from state 4 to state 4; and e:v from state 4 to state 5.

Through such label pairs, an FST can describe the conversion of a set of rules, or the conversion of one set of symbol sequences into another set of conforming sequences. As shown in Figure 3, for the input symbol sequence "a, b, c, d", following the path 0, 1, 1, 2, 5 converts a to z, b to y, c to x, and d to w, which yields another symbol sequence "z, y, x, w".
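The FST of Figure 3 can be sketched by attaching an output label to each arc of the transition table. The arcs below are the input-output label pairs listed in the text; the names `FST_ARCS` and `fst_transduce`, and the choice to return `None` on rejection, are assumptions made for this sketch.

```python
# (current state, input label) -> (next state, output label), as described for Figure 3
FST_ARCS = {
    (0, "a"): (1, "z"), (1, "b"): (1, "y"), (1, "c"): (2, "x"), (2, "d"): (5, "w"),
    (0, "b"): (3, "y"), (3, "c"): (4, "x"), (4, "d"): (4, "w"), (4, "e"): (5, "v"),
}


def fst_transduce(symbols, arcs=FST_ARCS, start=0, finals=frozenset({5})):
    """Return the output sequence for an accepted input, or None if it is rejected."""
    state = start
    output = []
    for symbol in symbols:
        arc = arcs.get((state, symbol))
        if arc is None:
            return None             # no arc for this input label
        state, out_label = arc
        output.append(out_label)
    return "".join(output) if state in finals else None


print(fst_transduce("abcd"))  # follows path 0, 1, 1, 2, 5
```

The only difference from an acceptor is that each step also emits a label, so matching and rewriting happen in the same pass.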

FST is an efficient data structure, and its basic theory is based on the graph theory in the data structure. FST is divided into non-deterministic (Non-Deterministic) FST and deterministic (Deterministic) FST.

The deterministic FST is a 7-tuple: (Q, Σ, Γ, δ, ω, q₀, F), where:

(Definition 1) Q is a finite set of states (states);

(Definition 2) Σ is a limited set of input characters (alphabet);

(Definition 3) Γ is a limited set of output characters (output alphabet);

(Definition 4) δ: Q×Σ→P(Q) is the state transition function;

(Definition 5) ω: Q×Σ→Γ is the output function;

(Definition 6) q₀ ∈ Q is the initial state;

(Definition 7) F ⊆ Q is the set of accepting states.

A non-deterministic FST is also a 7-tuple, but a non-deterministic FST may have multiple choices when performing a state transition, so Definition 4 and Definition 5 differ from those of the deterministic FST. In a non-deterministic FST, δ and ω are defined as follows:

(Definition 4) δ: Q×Σ∪{ε}→P(Q) is the state transition function;

(Definition 5) ω: Q×Σ∪{ε}×Q→Γ* is the output function.

As defined above, an FST has multiple states and multiple state transitions, starting from the initial state, passing through multiple accepting states, and finally reaching the ending state to complete a matching operation.

The typical features of FST are flexible use, high matching efficiency, and low memory overhead.

For example: There are three rules:

1. When c is immediately followed by x, change c to b: cx→bx;

2. When a is preceded by rs, change a to b: rsa→rsb;

3. When b is preceded by rs and followed by xy, change b to a: rsbxy→rsaxy.

When the input string is rsaxyrscxy, according to the above three rules, it will be transformed as follows:

1. rsaxyrscxy→rsaxyrsbxy;

2. rsaxyrsbxy→rsbxyrsbxy;

3. rsbxyrsbxy→rsaxyrsaxy.

It can be seen that applying the rules one at a time in this way is relatively inefficient: the change made in the second step is reverted in the third step, so the second step is wasted work. The FST provides a way to eliminate this inefficiency. After these rules are combined into one FST, that FST achieves the effect of all three rules with a single traversal and produces no wasted conversions. When the rules are applied one by one, the time spent depends on the number and length of the rules and on the number of input characters; with an FST, the time spent depends only on the number of characters in the input string.
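The rule-by-rule processing in this example can be reproduced with plain string replacement, which makes the wasted second step visible. This sketch illustrates only the multi-pass baseline, not the combined FST; using `str.replace` as a stand-in for each rewrite rule, and the names `RULES` and `apply_rules_sequentially`, are assumptions.

```python
# Each rule is (pattern, replacement), applied over the whole string in order
RULES = [
    ("cx", "bx"),        # rule 1: c followed by x becomes b
    ("rsa", "rsb"),      # rule 2: a preceded by rs becomes b
    ("rsbxy", "rsaxy"),  # rule 3: b between rs and xy becomes a
]


def apply_rules_sequentially(text, rules=RULES):
    """Apply every rule in its own pass and record the text after each pass."""
    steps = []
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
        steps.append(text)
    return steps


for step in apply_rules_sequentially("rsaxyrscxy"):
    print(step)
```

Each rule costs a full pass over the string, so the work grows with the number of rules, whereas a single combined FST would read the input only once.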

(3) Weighted finite state transducer (WFST):

WFST is a type of FST. Each state transition has a weight, each initial state has an initial weight, and each terminal state has an end weight. The weight is generally the probability or loss of transition or initial/termination state. The weight will be accumulated along each path and accumulated on different paths.

The way the cumulative weight is calculated can be specified by the specific WFST. For example, the cumulative weight can be obtained by multiplying all the weights on the traversed path, by adding all the weights on the traversed path, or by some other calculation on the weights on the traversed path; the method is not limited here.

Figure 4 is an exemplary schematic diagram of a WFST. Here, each state transition is labeled in the form "input-label:output-label/weight", and the initial state and the final state also have corresponding weights. In the WFST shown in Figure 4, the cumulative weight is calculated by multiplying all the weights on the path. If the character sequence "a, b, c, d" is input, it is converted along the path "0, 1, 1, 2, 5" into the character sequence "z, y, x, w" with a cumulative weight of 0.5*1.2*0.7*3*2*0.1=0.252.
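As a hedged illustration of how the cumulative weight may be calculated, the following Python sketch accumulates the six weights quoted for the Figure 4 example either by multiplication or by addition. The function name accumulate and the mode parameter are hypothetical; only the weight values are taken from the text.

```python
import math


def accumulate(weights, mode="product"):
    """Combine the weights met along one path.

    mode="product" multiplies the weights (as in the Figure 4 example);
    mode="sum" adds them. Other schemes are possible and not covered here.
    """
    if mode == "product":
        return math.prod(weights)
    if mode == "sum":
        return sum(weights)
    raise ValueError(f"unsupported mode: {mode}")


# Initial weight, four transition weights, termination weight.
path_weights = [0.5, 1.2, 0.7, 3, 2, 0.1]

print(round(accumulate(path_weights), 3))
print(round(accumulate(path_weights, mode="sum"), 3))
```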

WFST can be defined by an 8-tuple (∑, Λ, Q, I, F, E, λ, ρ):

(Definition 1) ∑ is a finite set of input labels;

(Definition 2) Λ is a finite set of output labels;

(Definition 3) Q is a finite set of states;

(Definition 4) I⊆Q is a set of initial states;

(Definition 5) F⊆Q is a set of termination states;

(Definition 6) E⊆Q×(∑∪{ε})×(Λ∪{ε})×K×Q is a finite set of transitions; among them, "ε" is a meta-symbol label, which represents an input or output with no symbol; K is a set of weight elements;

(Definition 7) λ: I→K is the initial weight function;

(Definition 8) ρ: F→K is the termination weight function.

Exemplarily, through the above definition, the WFST shown in Figure 4 can be defined as follows:

(Definition 1) ∑={a,b,c,d,e};

(Definition 2) Λ={v,x,y,w,z};

(Definition 3) Q={0,1,2,3,4,5};

(Definition 4) I={0};

(Definition 5) F={5};

(Definition 6) E is the set of transitions shown in Figure 4, where each transition in E consists of (source state, input label, output label, weight, target state);

(Definition 7) λ(0)=0.5;

(Definition 8) ρ(5) = 0.1.
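The 8-tuple can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class WFST, the function transduce, and the field names are hypothetical, and because the full transition set E of Figure 4 is not listed in the text, the example contains only the four transitions implied by the path "0, 1, 1, 2, 5", with the weights assumed to follow the order of the product 0.5*1.2*0.7*3*2*0.1.

```python
from dataclasses import dataclass


@dataclass
class WFST:
    sigma: frozenset     # input labels (Definition 1)
    lam: frozenset       # output labels (Definition 2)
    states: frozenset    # Q (Definition 3)
    initial: frozenset   # I (Definition 4)
    final: frozenset     # F (Definition 5)
    transitions: tuple   # E: (source, input, output, weight, target)
    init_weight: dict    # lambda: I -> K (Definition 7)
    final_weight: dict   # rho: F -> K (Definition 8)


def transduce(fst, inputs):
    """Follow the first matching transition per input symbol and multiply
    the weights. Returns (outputs, weight), or None if matching fails."""
    (state,) = fst.initial  # assumption: exactly one initial state
    weight = fst.init_weight[state]
    outputs = []
    for symbol in inputs:
        for src, inp, out, w, dst in fst.transitions:
            if src == state and inp == symbol:
                state, weight = dst, weight * w
                outputs.append(out)
                break
        else:
            return None  # no transition accepts this symbol
    if state not in fst.final:
        return None
    return outputs, weight * fst.final_weight[state]


# Assumed subset of Figure 4: only the transitions on the described path.
fig4_path = WFST(
    sigma=frozenset("abcde"), lam=frozenset("vxywz"),
    states=frozenset(range(6)), initial=frozenset({0}), final=frozenset({5}),
    transitions=((0, "a", "z", 1.2, 1), (1, "b", "y", 0.7, 1),
                 (1, "c", "x", 3, 2), (2, "d", "w", 2, 5)),
    init_weight={0: 0.5}, final_weight={5: 0.1},
)

result = transduce(fig4_path, "abcd")
print(result)
```

A real WFST may be non-deterministic, in which case several paths would have to be explored; the first-match loop above is a simplification.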

The following first introduces an exemplary electronic device 100 provided in an embodiment of the present application.

FIG. 5 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.

Hereinafter, the embodiment will be described in detail by taking the electronic device 100 as an example. It should be understood that the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have different component configurations. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.

The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.

It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Among them, the different processing units may be independent devices or may be integrated in one or more processors.

The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.

A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory can store instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.

The I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may couple the touch sensor 180K, charger, flash, camera 193, etc., respectively through different I2C bus interfaces. For example, the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.

The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.

The MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices. The MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.

The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. The GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.

The SIM interface can be used to communicate with the SIM card interface 195 to realize the function of transmitting data to the SIM card or reading data in the SIM card.

The USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. The interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.

The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.

The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna can be used in combination with a tuning switch.

The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G and the like. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.

The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low-frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).

The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.

The electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.

The ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye. ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or videos. The object generates an optical image through the lens and is projected to the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.

Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.

Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.

NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information, and it can also continuously self-learn. Through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, files such as music and videos are saved in the external memory card.

The internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. Among them, the storage program area can store an operating system, at least one application required by a function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, etc.) and so on. The storage data area can store data created during the use of the electronic device 100 (such as face information template data, fingerprint information template, etc.) and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.

The electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.

The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.

The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the user can hear the voice by bringing the receiver 170B close to the ear.

The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.

The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors and so on. The capacitive pressure sensor may include at least two parallel plates with conductive materials. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
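The pressure-dependent dispatch in the short message example can be sketched as follows. This is a minimal illustration: the function name sms_icon_action, the returned action names, and the threshold value are hypothetical, since the text does not specify a concrete first pressure threshold or its units.

```python
# Hypothetical threshold in arbitrary units; the text gives no value.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_action(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if intensity < threshold:
        return "view_short_message"    # intensity below the first pressure threshold
    return "create_short_message"      # intensity greater than or equal to it


print(sms_icon_action(0.2))
print(sms_icon_action(0.8))
```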

The gyroscope sensor 180B may be used to determine the movement posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake. The gyroscope sensor 180B can also be used for navigation and somatosensory game scenes.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D. Then, according to the detected opening and closing state of the leather case or of the flip cover, features such as automatic unlocking upon opening are set.

The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device 100, and is used in applications such as switching between landscape and portrait screens, pedometers and so on.

The distance sensor 180F is used to measure distance. The electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light to the outside through the light emitting diode. The electronic device 100 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180L is used to sense the brightness of the ambient light. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.

The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-based photographing, fingerprint-based call answering, and so on.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature. In some other embodiments, when the temperature is lower than another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
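The temperature processing strategy can be sketched as a simple threshold check. All threshold values, the function name thermal_actions, and the action names below are hypothetical, because the text names no concrete temperatures. The sketch also combines the three behaviours in one function for illustration, whereas the text describes them as belonging to different embodiments.

```python
def thermal_actions(temp_c, high_c=45.0, heat_below_c=0.0, boost_below_c=-10.0):
    """Return the protective actions for a reported temperature.

    The default thresholds (degrees Celsius) are assumed values chosen
    only to make the example concrete.
    """
    actions = []
    if temp_c > high_c:
        # Reduce power consumption and implement thermal protection.
        actions.append("reduce_processor_performance")
    if temp_c < heat_below_c:
        # Avoid abnormal shutdown caused by low temperature.
        actions.append("heat_battery")
    if temp_c < boost_below_c:
        actions.append("boost_battery_output_voltage")
    return actions


for reading in (50.0, 25.0, -5.0, -15.0):
    print(reading, thermal_actions(reading))
```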

The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a "touch screen". The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. The visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display screen 194.

The button 190 includes a power-on button, a volume button, and so on. The button 190 may be a mechanical button. It can also be a touch button. The electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. For example, touch operations that act on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects. Different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.

The SIM card interface 195 is used to connect to the SIM card. The SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards. The SIM card interface 195 can also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.

In the embodiment of the present application, the electronic device 100 may receive the user's voice input and environmental information through the microphone 170C and the sensor module 180. After the user's voice input is converted into digital audio information by the audio module 170, the processor 110 may perform voice recognition to convert it into text information. The processor 110 then executes the intention recognition method in the embodiment of the present application to identify the user's intention and slots, and expresses them with structured semantics.

FIG. 6 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.

The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the runtime and system libraries, and the kernel layer.

The application layer can include a series of application packages.

As shown in Figure 6, the application packages may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.

In the embodiment of the present application, the application layer may also include an intention recognition module. The intention recognition module is used to execute the intention recognition method in the embodiment of the present application.

The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.

As shown in Figure 6, the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.

The content provider is used to store and retrieve data and make these data accessible to applications. The data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.

The view system includes visual controls, such as controls that display text, controls that display pictures, and so on. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.

The phone manager is used to provide the communication function of the electronic device 100, for example, management of the call status (including connected, hung up, etc.).

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.

The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify that a download is complete, to give message reminders, and so on. A notification can also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. Examples include prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, and flashing the indicator light.

Runtime includes core libraries and virtual machines. Runtime is responsible for system scheduling and management.

The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of the system.

The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), two-dimensional graphics engine (for example: SGL), etc.

The surface manager is used to manage the display subsystem, and provides a combination of two-dimensional (2-dimensional, 2D) and three-dimensional (3-dimensional, 3D) layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, a sensor driver, and a virtual card driver.

In the following, the workflow of the software and hardware of the electronic device 100 is exemplified in conjunction with a photo-capture scenario.

When the touch sensor 180K receives a touch operation, the corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including touch coordinates, the time stamp of the touch operation, etc.). The original input events are stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Take as an example a touch operation that is a tap, where the control corresponding to the tap is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and the camera 193 captures still images or videos.

Specifically, FIG. 7 is a schematic diagram of the architecture of the intention recognition module in the embodiment of this application. The intention recognition module is an NLU engine 700.

The NLU engine 700 is used to perform semantic analysis on user input, and to output analysis results such as intents and slots for use by other modules.

The NLU engine 700 includes a text preprocessing unit 701, a rule engine 702, a machine learning engine 703, an entity recognition unit 704, an intent classification unit 705, and a slot filling unit 706.

The text preprocessing unit 701 is used to preprocess the text input by the user, mainly including removing format symbols in the text that are not needed for subsequent semantic analysis, such as punctuation and spaces. It is understandable that the text input by the user is generally obtained after voice recognition of the user's voice information.

The rule engine 702 is configured to perform FST-based rule matching on the preprocessed user input text according to preset rules, so as to cover high-frequency sentence patterns and obtain formatted text with intent-slot format tags. The intention recognition method in the embodiment of the present application is mainly used for the construction of the rule engine 702.

The machine learning engine 703 is used to process the preprocessed user input text through machine learning to obtain formatted text with intent-slot format tags.

The entity recognition unit 704 is used to extract entity information from the formatted text with intent-slot format tags output by the rule engine 702 or the machine learning engine 703.

The intent classification unit 705 is used to extract intent information from the formatted text with intent-slot format tags output by the rule engine 702 or the machine learning engine 703.

The slot filling unit 706 is used to extract slot information from the formatted text with intent-slot format tags output by the rule engine 702 or the machine learning engine 703.

FIG. 8 is a schematic diagram of a usage scenario of the intention recognition method in an embodiment of this application. After the user says "Call Dad" to the electronic device 100, the electronic device 100 performs voice recognition on the input and converts it into text. It then uses the NLU engine 700 to perform intent recognition on the text, and outputs the intent "call" and the slot "Dad" to the dialing application. The dialing application dials the phone number of "Dad" according to the intent and slot.

For the process in which the NLU engine 700 in the electronic device 100 performs intent recognition on the text input by the user, the FST-based rule matching system used in the intent recognition method of the embodiment of the present application was compared with the existing rule matching system based on the CFG parsing algorithm. The actual engineering test results are shown in Table 1 below:

Hardware environment: CPU: Intel(R) Xeon(R) E5-2690 v2 @ 3.00 GHz, memory: 32 GB;

Index item	CFG rule matching system	FST rule matching system
Rule configuration	250,000 templates	250,000 rules
Construction method	Manually written rules	Manually written rules
Matching delay	Restrictions: single node, concurrency of 100 tps; end-to-end: 150 ms	Restrictions: single node, concurrency of 100 tps; end-to-end: 12 ms

Table 1

It can be clearly seen from the test results in Table 1 that, with all other variables identical, the FST-based rule matching system used in the intention recognition method of the embodiment of the present application reduces the end-to-end rule matching delay from 150 ms to 12 ms (a 12.5-fold reduction) compared with the existing rule matching system based on the CFG parsing algorithm, so the matching speed is greatly improved.

The following specifically describes the intention recognition method in the embodiment of the present application in conjunction with the hardware and software mechanisms of the above exemplary electronic device 100:

FIG. 9 is an exemplary schematic diagram of the FST intent slot model in the embodiment of this application.

In the FST intent slot model shown in FIG. 9, the intent information is inserted after the initial state of the FST, and the slot labeling information is inserted before and after every slot, so that the intent information and the slot labeling information are output when the FST is matched. The FST intent slot model includes states and transitions between states, specifically:

(1) An FST state is represented by a circle containing a state number;

(2) A transition between FST states is represented by a directed curve connecting them;

(3) The transition function between FST states is the <key>:<value> mapping on the directed edge. When the FST accepts <key>, it outputs <value>. Here, <eps> means accepting an empty character (the current input string is not consumed);

(4) State 0 and state 14 are the initial state and the end state of the FST respectively, and the other states are intermediate states.

The FST intent slot model shown in Figure 9 is manually pre-written, and the specific process can be as follows:

(1) Write FST rules. For example, the FST rule of the FST intent slot model shown in Figure 9 can be written as:

person="Dad"|"Mom"|"Xiao Ming";

call_1="Call"("":"<")person("":":contact>");

call_2="Give"("":"<")person("":":contact>")"Call";

export CALL=("":"<CALL>")(call_1|call_2);

(2) Use the rule compiler to compile the FST rules into the FST intent slot model as shown in FIG. 9.
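As an illustration of what such a compilation step produces, the following is a minimal sketch that turns the four rules above into a transition table. It is not the rule compiler of this application: the `Fst` class, its method names, and `compile_call` are hypothetical, the arcs are built one character at a time from the rule strings exactly as written above, and no determinization or minimization is performed, so the state numbers will not match FIG. 9.

```python
from collections import defaultdict

EPS = ""  # stands in for <eps>: the arc consumes no input character


class Fst:
    """Hypothetical container: integer states, state 0 is the initial state."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input, output, next_state)]
        self.finals = set()
        self.num_states = 1

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def insert(self, src, output):
        """Arc "":"output": emit a label without consuming input."""
        dst = self.new_state()
        self.arcs[src].append((EPS, output, dst))
        return dst

    def literal(self, src, text):
        """Chain of identity arcs c:c, one per character of text."""
        for ch in text:
            dst = self.new_state()
            self.arcs[src].append((ch, ch, dst))
            src = dst
        return src

    def union(self, src, alternatives):
        """alt_1 | alt_2 | ...: branches rejoin through <eps> arcs."""
        join = self.new_state()
        for text in alternatives:
            end = self.literal(src, text)
            self.arcs[end].append((EPS, "", join))
        return join


def compile_call(persons):
    fst = Fst()
    start = fst.insert(0, "<CALL>")                 # ("":"<CALL>")
    for prefix, suffix in (("Call", ""), ("Give", "Call")):  # call_1 | call_2
        s = fst.literal(start, prefix)
        s = fst.insert(s, "<")                      # ("":"<")
        s = fst.union(s, persons)                   # person
        s = fst.insert(s, ":contact>")              # ("":":contact>")
        s = fst.literal(s, suffix)
        fst.finals.add(s)
    return fst


call_fst = compile_call(["Dad", "Mom", "Xiao Ming"])
print(call_fst.num_states, "states,", len(call_fst.finals), "final states")
```

The sketch keeps one final state per rule alternative; a production compiler would typically merge equivalent states, which is how a single end state such as state 14 in FIG. 9 can arise.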

The following describes the process of performing NLU intent recognition and slot filling on text input by the user based on the FST intent slot model in the intent recognition method of the embodiment of the present application in conjunction with the FST intent slot model shown in FIG. 9:

FIG. 10 is a schematic flowchart of an intention recognition method in an embodiment of this application.

S1001. In response to a user's voice input, the electronic device converts the voice input into a first text;

Exemplarily, as shown in FIG. 8, when the user speaks to the electronic device 100 by voice: "Call Dad", the electronic device 100 may convert the voice input into text: "Call Dad".

It is understandable that the electronic device 100 may execute S1001 only after receiving a certain trigger. For example, the electronic device 100 may perform step S1001 only after detecting that the user has turned on the voice assistant function, or after detecting that the user double-clicks on the screen; alternatively, step S1001 may be executable at any time after the electronic device 100 is turned on. This is not limited here.

S1002, the electronic device performs format preprocessing on the first text to obtain the second text;

The main purpose of the electronic device performing format preprocessing on the first text is to remove format characters that are not used in the subsequent FST intent slot model matching in the first text. For example, remove spaces, punctuation marks, etc. in the first text.

Exemplarily, if the first text is "Call Xiaoming.", the second text obtained after the format preprocessing in step S1002 is "Call Xiaoming", with the spaces and the period removed.

It can be understood that, in some embodiments, step S1002 may be skipped and step S1003 performed directly on the first text obtained in step S1001; in this case, the first text is the second text. This is possible because rules for handling the format of the first text can also be added to the FST intent slot model, although doing so increases the complexity of the model. Which scheme to use can be selected according to the actual situation and is not limited here.
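The format preprocessing of step S1002 can be sketched as follows. This is only an illustration under the assumption that "format characters" means Unicode whitespace and punctuation; the function name `preprocess` is hypothetical, and the actual character set removed by the electronic device is not specified here.

```python
import unicodedata


def preprocess(first_text: str) -> str:
    """Drop whitespace and punctuation characters, keep everything else."""
    return "".join(
        ch
        for ch in first_text
        # Unicode general categories starting with "P" are punctuation.
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


second_text = preprocess("打电话给 小明。")
print(second_text)
```

For a language written without spaces between words, removing all whitespace is harmless; for English input the same function would also merge adjacent words, so a real implementation would likely make this step language-dependent.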

S1003. The electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text;

The preset FST intent slot model is a preset FST. During state transitions, it adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information is used to label the intent information of the input text, and the preset slot labeling information is used to label the slot information in the input text.

It is understandable that different preset FST intent slot models can label differently: some use only the preset intent labeling information to mark the location of the intent; some add the intent information directly to the input text and use the preset intent labeling information to mark its location; some use the preset intent labeling information to mark the location of the intent and the preset slot labeling information to mark the location of the slot information; and some add the intent information directly to the input text, use the preset intent labeling information to mark its location, and use the preset slot labeling information to mark the location of the slot information. There is no limitation here.

The third text is a text containing preset intent labeling information and/or preset slot labeling information after the rule matching is performed on the second text.

Exemplarily, take the second text "Call Xiaoming" and rule matching with the FST intent slot model shown in Figure 9 (abbreviated below as FST) to describe the matching process. The FST consumes the Chinese input 打电话给小明 one character at a time, so each step below names a single character together with its literal gloss:

A) The FST accepts an empty character, outputs the intent information <CALL>, and transfers from initial state 0 to state 1;

B) The FST accepts the character 打 ("beat"), outputs the same character, and transfers from state 1 to state 2;

C) The FST accepts the character 电 ("electricity"), outputs the same character, and transfers from state 2 to state 3;

D) The FST accepts the character 话 ("word"), outputs the same character, and transfers from state 3 to state 4;

E) The FST accepts the character 给 ("give"), outputs the same character, and transfers from state 4 to state 5;

F) The FST accepts an empty character, outputs the prefix "<" of the slot labeling information, and transfers from state 5 to state 6;

G) The FST accepts the character 小 ("small"), outputs the same character, and transfers from state 6 to state 7;

H) The FST accepts the character 明 ("Ming"), outputs the same character, and transfers from state 7 to state 10;

I) The FST accepts an empty character, outputs the suffix ":contact>" of the slot labeling information, and transfers from state 10 to end state 14, thereby completing the matching process;

J) The FST matching result is output: "<CALL>打电话给<小明:contact>", that is, "<CALL>Call <小明:contact>".

The output "<CALL>Call <小明:contact>" is the third text obtained after rule matching.

In the same way, according to the FST intent slot model shown in Figure 9, if the input second text uses the other word order covered by rule call_2, 给小明打电话 (also "Call Xiaoming"), the output third text is "<CALL>给<小明:contact>打电话". The specific matching process is not repeated here.
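The matching process described above can be sketched as a small transducer walk. The sketch assumes the input is the Chinese string 打电话给小明, whose characters correspond to the glosses in steps A) to I). The state numbers 0 to 7, 10, and 14 follow those steps; the states numbered 20 and above for the second word order, and the names `ARCS`, `FINAL`, and `transduce`, are hypothetical and not taken from FIG. 9.

```python
EPS = ""  # <eps>: the arc is taken without consuming an input character

# state -> [(input, output, next_state)]
ARCS = {
    0: [(EPS, "<CALL>", 1)],
    1: [("打", "打", 2), ("给", "给", 20)],
    2: [("电", "电", 3)],
    3: [("话", "话", 4)],
    4: [("给", "给", 5)],
    5: [(EPS, "<", 6)],
    6: [("小", "小", 7)],
    7: [("明", "明", 10)],
    10: [(EPS, ":contact>", 14)],
    # Second word order (rule call_2); state numbers here are hypothetical.
    20: [(EPS, "<", 21)],
    21: [("小", "小", 22)],
    22: [("明", "明", 23)],
    23: [(EPS, ":contact>", 24)],
    24: [("打", "打", 25)],
    25: [("电", "电", 26)],
    26: [("话", "话", 14)],
}
FINAL = {14}


def transduce(second_text):
    """Return the third text, or None if no path accepts the whole input."""
    stack = [(0, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in FINAL and pos == len(second_text):
            return out
        for inp, emitted, nxt in ARCS.get(state, []):
            if inp == EPS:
                stack.append((nxt, pos, out + emitted))
            elif pos < len(second_text) and second_text[pos] == inp:
                stack.append((nxt, pos + 1, out + emitted))
    return None


print(transduce("打电话给小明"))
print(transduce("给小明打电话"))
```

Each input character is examined a bounded number of times on a path, which is consistent with the low matching delay attributed to FST-based matching; the depth-first search shown here is only one simple way to follow the arcs.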

It can be seen from the above example that the third text contains preset intent labeling information, such as <>, and preset slot labeling information, such as <:contact>. The preset intent labeling information <> indicates that the intent of the input second text is CALL, and the preset slot labeling information <:contact> indicates that the slot in the input second text is Xiaoming.

It is understandable that Fig. 9 is an exemplary FST intent slot model. In actual applications, other preset symbols can also be used as preset intent labeling information and preset slot labeling information according to actual needs, which is not limited here.

S1004. The electronic device obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information;

After the electronic device obtains the third text containing the preset intent labeling information and the preset slot labeling information, it can extract the intent information and the slot information from the third text according to the positions of the preset intent labeling information and the preset slot labeling information.

Exemplarily, for the third text "<CALL>Call <小明:contact>", the electronic device can extract the intent information CALL according to the preset intent labeling information <>, and the slot information Xiaoming according to the preset slot labeling information <:contact>.
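The extraction in step S1004 can be illustrated with regular expressions. This is a sketch under the assumption that intent labels look like <NAME> with upper-case names and slot labels look like <value:name>, as in the example above; the pattern names and the function `extract` are hypothetical.

```python
import re

# Assumed label shapes, based on the example third text only.
INTENT_RE = re.compile(r"<([A-Z_]+)>")
SLOT_RE = re.compile(r"<([^<>:]+):([A-Za-z_]+)>")


def extract(third_text):
    """Return (intent, slots) parsed from the labeled third text."""
    intent_match = INTENT_RE.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = [
        {"slotName": name, "slotValue": value}
        for value, name in SLOT_RE.findall(third_text)
    ]
    return intent, slots


intent, slots = extract("<CALL>Call <小明:contact>")
print(intent, slots)
```

Because the labels are inserted by the model itself, the extraction step needs no further knowledge of the rules; it depends only on the label syntax chosen when the rules were written.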

S1005. The electronic device outputs the intent information and/or slot information in a structured manner.

After obtaining the intent information and the slot information, the electronic device can structure the output of the intent information and the slot information for use by other modules in the electronic device. The structured output may also contain other information, such as the first text, the second text, etc., which are not limited here.

Exemplarily, for the first text "Call Xiaoming.", after steps S1001 to S1004 extract the intent information CALL and the slot information Xiaoming, the intent information, the slot information, and the second text together form a structured output for use by other modules in the electronic device 100. For example, the following JSON can be output: {"text":"Call Xiaoming","intent":"CALL","slots":[{"slotName":"contact","slotValue":"Xiaoming"}]}.

In the embodiment of this application, the electronic device uses a preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast. In addition, matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be easily extracted. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.

In practical applications, since the electronic device 100 needs to recognize many different types of intents, a strategy of one FST intent slot model per intent is generally adopted when constructing the FST intent slot models, which can effectively reduce rule conflicts between different intents. When performing rule matching, a multi-intent parallel matching method can be used to minimize the matching delay. Therefore, the electronic device 100 can store a large number of preset FST intent slot models.
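A structured output of this shape can be produced with a few lines of code. The sketch below only mirrors the field names of the JSON example above; the function name `structured_output` and the choice to pass slots as (name, value) pairs are assumptions.

```python
import json


def structured_output(second_text, intent, slots):
    """Serialize the recognition result; slots is a list of (name, value)."""
    payload = {
        "text": second_text,
        "intent": intent,
        "slots": [{"slotName": n, "slotValue": v} for n, v in slots],
    }
    # ensure_ascii=False keeps non-ASCII slot values readable in the output.
    return json.dumps(payload, ensure_ascii=False)


print(structured_output("Call Xiaoming", "CALL", [("contact", "Xiaoming")]))
```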

However, when multiple preset FST intent slot models are matched to a user input at the same time, multiple outputs may appear. At this time, in order to facilitate the electronic device to determine the credibility of the output intent information and the slot information, the preset FST intent slot model may be a preset WFST intent slot model. Since each state transition in WFST will have a weight, the cumulative weight will be obtained after the final matching is completed.

Therefore, after matching multiple preset WFST intent slot models on a user input, multiple output results and a score corresponding to each output result can be obtained. At this time, the electronic device can extract intent information and slot information from the output result with the highest score.

FIG. 11 is a schematic diagram of another process of the intention recognition method according to an embodiment of this application.

1. Offline rules compilation.

(1) The corpus producer or researcher writes the WFST rules of each intent through the front-end rule editor, organizing them as one WFST per intent.

For example, write the following WFST rules:

WFST Rule 1:

word=wildcard(1)<1>;

rule_1="Open" "<" "WeChat" ":app>";

rule_2="Open" "<" word+ ":app>";

In WFST rule 1, word is a wildcard character, which can match any character, and its weight is 1. The weight for matching other characters is the default weight 0.

The intent of the WFST Rule 1 is: to open the application WeChat.

WFST Rule 2:

rule_1="Open SMS";

There are no wildcard characters in WFST rule 2, and the weight for matching all characters is the default weight of 0.

The intent of WFST rule 2 is: to open a short message.

(2) The grammar compiler compiles the rule file to obtain the WFST intent slot model file;

The WFST intent slot model file is the WFST intent slot model.

For example, Figure 12 shows WFST intent slot model 1, compiled according to WFST rule 1.

WFST intent slot model 1 includes 10 states, of which state 0 is the initial state and states 7 and 9 are end states. WFST intent slot model 1 has two paths: 0, 1, 2, 3, 4, 5, 6, 7 and 0, 1, 2, 3, 4, 8, 9:

The path from state 0 to state 4 is the same:

From state 0 to state 1, an empty character <eps> is accepted, and the preset intent labeling information together with the intent information, <OPEN_APP>, is output, with weight 0;

From state 1 to state 2, the character "beat" (打, the first character of 打开, "open") is accepted and output, with weight 0;

From state 2 to state 3, the character "on" (开) is accepted and output, with weight 0;

From state 3 to state 4, an empty character <eps> is accepted, and the first half of the preset slot labeling information, <, is output, with weight 0;

Starting from state 4, the two paths are different:

One of the paths is 4, 5, 6, 7:

From state 4 to state 5, the character "micro" (微, the first character of 微信, "WeChat") is accepted and output, with weight 0;

From state 5 to state 6, the character "letter" (信) is accepted and output, with weight 0;

From state 6 to state 7, an empty character <eps> is accepted, and the second half of the preset slot labeling information, :app>, is output, with weight 0;

The other path is 4, 8, 9:

From state 4 to state 8, an arbitrary character <any> is accepted and the same character is output, with weight 1;

From state 8 to state 8 (a self-loop), an arbitrary character <any> is accepted and the same character is output, with weight 1;

From state 8 to state 9, an empty character <eps> is accepted, and the second half of the preset slot labeling information, :app>, is output, with weight 0.

The calculation method of the cumulative weight of the WFST intention slot model 1 is the addition of the weights in the path.
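The two paths of WFST intent slot model 1 and the additive cumulative weight can be sketched as follows. The transition table follows the state numbers and weights listed above and assumes the accepted characters are those of 打开微信; the names `MODEL_1`, `FINALS_1`, and `match_all`, and the use of `None` to mark the wildcard <any>, are hypothetical choices for this illustration.

```python
EPS, ANY = "", None  # <eps> consumes nothing; ANY marks the wildcard <any>

# state -> [(input, output, weight, next_state)]
MODEL_1 = {
    0: [(EPS, "<OPEN_APP>", 0, 1)],
    1: [("打", "打", 0, 2)],
    2: [("开", "开", 0, 3)],
    3: [(EPS, "<", 0, 4)],
    4: [("微", "微", 0, 5), (ANY, ANY, 1, 8)],
    5: [("信", "信", 0, 6)],
    6: [(EPS, ":app>", 0, 7)],
    8: [(ANY, ANY, 1, 8), (EPS, ":app>", 0, 9)],
}
FINALS_1 = {7, 9}


def match_all(arcs, finals, text):
    """Return every (output text, cumulative weight) for accepting paths."""
    results = []
    stack = [(0, 0, "", 0)]  # (state, input position, output, weight so far)
    while stack:
        state, pos, out, cost = stack.pop()
        if state in finals and pos == len(text):
            results.append((out, cost))
        for inp, emitted, weight, nxt in arcs.get(state, []):
            if inp == EPS:
                stack.append((nxt, pos, out + emitted, cost + weight))
            elif pos < len(text) and (inp is ANY or inp == text[pos]):
                # A wildcard arc echoes whatever character it consumed.
                echoed = text[pos] if inp is ANY else emitted
                stack.append((nxt, pos + 1, out + echoed, cost + weight))
    return results


print(match_all(MODEL_1, FINALS_1, "打开短信"))
print(sorted(match_all(MODEL_1, FINALS_1, "打开微信")))
```

An input covered by the literal path also matches the wildcard path, so a single model can return several results with different cumulative weights; the weight is what later allows the more specific match to be preferred.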

Figure 13 shows WFST intent slot model 2, compiled according to WFST rule 2.

The WFST intent slot model 2 includes 6 states. Among them, state 0 represents the initial state, and state 5 represents the end state. The WFST intent slot model 2 has only one path: 0, 1, 2, 3, 4, 5.

From state 0 to state 1, an empty character <eps> is accepted, and the preset intent labeling information together with the intent information, <CHECK_MESSAGE>, is output, with weight 0;

From state 1 to state 2, the character "beat" (打) is accepted and output, with weight 0;

From state 2 to state 3, the character "on" (开) is accepted and output, with weight 0;

From state 3 to state 4, the character "short" (短) is accepted and output, with weight 0;

From state 4 to state 5, the character "letter" (信) is accepted and output, with weight 0.

The cumulative weight of the WFST intention slot model 2 is calculated by adding the weights in the path.

(3) Output the WFST intent slot model file for persistent storage.

2. Online rule matching.

(1) First load all the compiled WFST intent slot model files and initialize the NLU engine;

(2) Use the WFST intent slot model to perform parallel rule matching on user input to obtain output results. Each output result includes intent identification information, slot identification information and weight;

(3) Output, in structured form, the one or more WFST matching results with the highest score.

The following describes (2) and (3) of the online rule matching in detail in conjunction with the WFST intention slot model shown in FIG. 12 and FIG. 13.

Please refer to FIG. 14, which is a schematic diagram of another process of the intention recognition method in an embodiment of this application.

S1401, the electronic device loads the preset WFST intent slot model;

The electronic device can load the preset WFST intent slot model by reading the WFST intent slot model file stored in the electronic device.

It is understandable that when the preset WFST intent slot model is loaded, all of it can be loaded; or a part of it can be loaded according to the current scene of the terminal, which is not limited here.

S1402, in response to the user's voice input, the electronic device converts the voice input into a first text;

It is similar to step S1001 and will not be repeated here.

S1403. The electronic device performs format preprocessing on the first text to obtain the second text;

It is similar to step S1002 and will not be repeated here.

It can be understood that steps S1402 and S1403 can be performed simultaneously with step S1401, can be performed before step S1401, or can be performed after step S1401, which is not limited here.

S1404. The electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain a WFST result;

A preset WFST intent slot model is a preset WFST. During state transitions, it adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information is used to label the intent information of the input text, and the preset slot labeling information is used to label the slot information in the input text.

It is understandable that different preset WFST intent slot models can label differently: some use only the preset intent labeling information to mark the location of the intent; some add the intent information directly to the input text and use the preset intent labeling information to mark its location; some use the preset intent labeling information to mark the location of the intent and the preset slot labeling information to mark the location of the slot information; and some add the intent information directly to the input text, use the preset intent labeling information to mark its location, and use the preset slot labeling information to mark the location of the slot information. There is no limitation here.

In the preset WFST intent slot model, each state transition has a weight. The WFST result includes the matching text of the second text and the cumulative weight of the matching path. The matching text of the second text is the output text containing the preset intent label information and/or the preset slot label information after successful matching through the matching path.

FIG. 15 is an exemplary schematic diagram of performing parallel rule matching of multiple preset WFST intent slot models on the second text in an embodiment of this application. The electronic device adopts a strategy of one WFST intent slot model per intent, and stores multiple preset WFST intent slot models, such as CALL.fst, MESSAGE.fst, NAVIGATE.fst, and so on. After the electronic device loads these preset WFST intent slot models, it can perform parallel rule matching on the second text obtained by preprocessing. Because some paths in some preset WFST intent slot models contain wildcards, the second text may match only one path in one preset WFST intent slot model, or it may successfully match multiple paths in multiple preset WFST intent slot models. Therefore, one WFST result may be obtained, or multiple WFST results may be obtained.
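Parallel matching over several per-intent models can be sketched with a thread pool. In this illustration each model is replaced by a hypothetical stand-in function rather than a compiled WFST, and OPEN_APP.fst is an assumed file name chosen to match the <OPEN_APP> example; only CALL.fst and MESSAGE.fst are named in the text above.

```python
from concurrent.futures import ThreadPoolExecutor


def make_prefix_model(intent, prefix, slot, wildcard_cost=1):
    """Stand-in for a model of the form: prefix + wildcard slot value."""
    def match(text):
        if not text.startswith(prefix) or len(text) == len(prefix):
            return None
        value = text[len(prefix):]
        # Each wildcard-matched character adds wildcard_cost to the weight.
        return (f"<{intent}>{prefix}<{value}:{slot}>", wildcard_cost * len(value))
    return match


def make_exact_model(intent, sentence):
    """Stand-in for a model with one literal path and no wildcard."""
    def match(text):
        return (f"<{intent}>{sentence}", 0) if text == sentence else None
    return match


MODELS = {
    "OPEN_APP.fst": make_prefix_model("OPEN_APP", "打开", "app"),
    "MESSAGE.fst": make_exact_model("CHECK_MESSAGE", "打开短信"),
    "CALL.fst": make_prefix_model("CALL", "打电话给", "contact"),
}


def match_parallel(second_text):
    """Run every model on the same text; keep only successful matches."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        outcomes = list(pool.map(lambda m: m(second_text), MODELS.values()))
    return [result for result in outcomes if result is not None]


print(match_parallel("打开短信"))
```

In CPython, threads mainly illustrate the structure of the approach, since CPU-bound matching does not run truly in parallel there; an implementation aiming at the reported delays would more plausibly use native threads or processes.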

When only one WFST result is obtained, it can be directly determined that the WFST result is the most reliable WFST result, and step S1406 is executed;

When multiple WFST results are obtained, step S1405 needs to be executed, and the WFST result with the highest credibility among them is determined according to the cumulative weight.

It is understandable that both the weights in the preset WFST intent slot model and the calculation method of the cumulative weight can be customized. Depending on these settings, the meaning of the weight and the cumulative weight can differ, and accordingly the method of determining the WFST result with the highest credibility can also differ, which is not limited here.

FIG. 16 is an exemplary schematic diagram of different settings for the weight and the cumulative weight in the preset WFST intent slot model in this embodiment of the present application. The following is an exemplary description of these different settings:

Case 1: The weight of a state transition that accepts wildcards is greater than that of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.

In this case, the weight represents the cost of a state transition, and the cumulative weight represents the cost of a matching path. The greater the cumulative weight, the greater the cost of the WFST matching the current path and, correspondingly, the lower the credibility. Therefore, the weight of a state transition that accepts a wildcard is set greater than the weight of a state transition that does not. The weight can also be set according to the degree and range of the wildcard: a wildcard that matches more loosely and over a larger range is given a greater weight than one that matches more strictly and over a smaller range.

The WFST intent slot model 1 shown in FIG. 11 and the WFST intent slot model 2 shown in FIG. 12 are WFST intent slot models set according to Case 1. Taking the second text "Open SMS" as an example, when WFST intent slot model 1 and WFST intent slot model 2 are used to perform parallel rule matching on the second text:

WFST intent slot model 1 follows the matching path 0, 1, 2, 3, 6, 10, 10, 11 to get WFST result 1: <OPEN_APP> open <message:app>, Weight:2.

WFST intent slot model 2 follows the matching path 0, 1, 2, 3, 4, 5 to get WFST result 2: <CHECK_MESSAGE> open the message, Weight:0.

On the matching path of WFST intent slot model 1, the state transitions from state 9 to state 10 and from state 10 to state 10 are wildcard matches, so each has a weight of 1, which is higher than the default weight 0 of the state transitions that do not use wildcards. Since the cumulative weight in WFST intent slot model 1 is calculated as the sum of the weights of the state transitions on the matching path, the cumulative weight is 2.

However, there is no wildcard state transition on the matching path of the WFST intent slot model 2, so the weight of each state transition is the default weight 0. The final cumulative weight is also 0.
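The Case 1 calculation can be sketched in a few lines. The weights 1 and 0 come from the example above; the `Transition` representation and the two paths are hypothetical stand-ins, since the actual transitions are defined only in FIG. 11 and FIG. 12.

```python
from typing import List, Tuple

WILDCARD_WEIGHT = 1  # cost of a transition that accepts a wildcard (from the example)
DEFAULT_WEIGHT = 0   # default cost of a transition that does not

# One transition is (label, accepts_wildcard). Labels are illustrative only.
Transition = Tuple[str, bool]


def cumulative_weight(path: List[Transition]) -> int:
    """Case 1: the cumulative weight is the sum of the transition weights."""
    return sum(WILDCARD_WEIGHT if wildcard else DEFAULT_WEIGHT for _, wildcard in path)


# Hypothetical paths: the first crosses two wildcard transitions, the second none.
path_model_1 = [("open", False), ("*", True), ("*", True), ("end", False)]
path_model_2 = [("open", False), ("SMS", False)]

weight_1 = cumulative_weight(path_model_1)  # 2
weight_2 = cumulative_weight(path_model_2)  # 0
```

Because the weight is a cost here, the path without wildcards ends up with the smaller cumulative weight and therefore the higher credibility.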

Case 2: The weight of the state transition that accepts the wildcard is greater than the weight of the state transition that does not accept the wildcard, and the greater the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.

In this case, the weight of a state transition indicates the cost of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.

The cumulative weight calculation method can therefore be set so that the greater the weights of the state transitions on the matching path, the smaller the cumulative weight of the matching path. For example, the cumulative weight can be calculated as the negative of the sum of the weights of the state transitions on the matching path, which is not limited here.
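As a small sketch of the negated-sum option mentioned for Case 2, assuming per-transition costs of 1 for wildcard transitions and 0 otherwise (the function name and the cost lists are hypothetical):

```python
from typing import Sequence


def cumulative_credibility(transition_costs: Sequence[float]) -> float:
    """Case 2: negate the summed costs so that a larger value is more credible."""
    return -sum(transition_costs)


# Per-transition costs for two hypothetical matching paths.
with_wildcards = cumulative_credibility([0, 1, 1, 0])  # -2
without_wildcards = cumulative_credibility([0, 0])     # 0

# Under Case 2 the larger cumulative weight wins.
more_credible = max(with_wildcards, without_wildcards)
```

Taking the maximum here selects the same path that taking the minimum would select under Case 1, so the two settings differ only in the direction of the comparison.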

Case 3: The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.

In this case, the weight of a state transition indicates the credibility of the state transition, and the cumulative weight of the matching path indicates the cost of the matching path. The larger the cumulative weight, the higher the cost and the lower the credibility.

Case 4: The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.

In this case, the weight of a state transition indicates the credibility of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.

It is understandable that which of these cases is used to set the preset WFST intent slot model can be determined according to actual needs, which is not limited here.

S1405. When there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text;

In some cases, the electronic device can directly compare the cumulative weights in the WFST results. For example, when the cumulative weight represents credibility, the WFST result with the greatest cumulative weight has the highest credibility; when the cumulative weight represents cost, the WFST result with the smallest cumulative weight has the highest credibility.

However, in actual applications, each preset WFST intent slot model may have a different number of state transitions. To make the final credibility comparison fair, the cumulative weight can first be normalized to obtain a credibility score, and the WFST result with the highest credibility can then be determined according to the credibility score. Specifically:
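The direct comparison can be sketched as a choice between minimum and maximum, depending on what the cumulative weight means. The `WeightMeaning` enum and the tuple layout of a result are assumptions made for this illustration.

```python
from enum import Enum
from typing import List, Tuple


class WeightMeaning(Enum):
    COST = "cost"                # smaller cumulative weight is more credible
    CREDIBILITY = "credibility"  # larger cumulative weight is more credible


# A result is (matching_text, cumulative_weight); this layout is hypothetical.
Result = Tuple[str, float]


def most_credible(results: List[Result], meaning: WeightMeaning) -> Result:
    """Pick the most credible result by comparing cumulative weights directly."""
    if not results:
        raise ValueError("at least one WFST result is required")
    pick = min if meaning is WeightMeaning.COST else max
    return pick(results, key=lambda result: result[1])


results = [("<OPEN_APP> open <message:app>", 2), ("<CHECK_MESSAGE> Open SMS", 0)]
best = most_credible(results, WeightMeaning.COST)
```

With the weights interpreted as costs, `best` is the CHECK_MESSAGE result; interpreting the same numbers as credibility would reverse the choice.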

1. When there are multiple WFST results, the electronic device performs credibility score calculation on the multiple WFST results;

Specifically, if the cumulative weight represents the cost of the matching path, one formula for calculating the credibility score can be:

Credibility score calculation formula 1:

$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$

where w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the largest of the cumulative weights of the matching paths in the multiple WFST results. The normalized credibility score lies in the range [0, 1]; the higher the score, the higher the credibility.

Exemplarily, take the credibility scores of WFST result 1 (<OPEN_APP> open <message:app>, Weight:2) and WFST result 2 (<CHECK_MESSAGE> open the SMS, Weight:0) obtained in the above example. Since the maximum cumulative weight among WFST result 1 and WFST result 2 is 2, w_max is 2, and then:

The credibility score of WFST result 1 is score(w₁, w_max) = 1 - 2/2 = 0;

The credibility score of WFST result 2 is score(w₂, w_max) = 1 - 0/2 = 1.
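The normalization in the worked example, score = 1 - w / w_max, can be reproduced with a short sketch. The function name is hypothetical, the weights are assumed to be non-negative costs, and the handling of the case where every cumulative weight is 0 is an assumption added here to avoid a division by zero; the text does not address that case.

```python
from typing import List, Sequence


def credibility_scores(cumulative_costs: Sequence[float]) -> List[float]:
    """Normalize non-negative path costs into scores in [0, 1]."""
    if not cumulative_costs:
        return []
    w_max = max(cumulative_costs)
    if w_max == 0:
        # Assumption: when every path has zero cost, all are equally credible.
        return [1.0 for _ in cumulative_costs]
    return [1 - w / w_max for w in cumulative_costs]


# WFST result 1 has cumulative weight 2, WFST result 2 has cumulative weight 0.
scores = credibility_scores([2, 0])
```

For the two results of the example this gives scores of 0 and 1, matching the values calculated above.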

It is understandable that credibility score calculation formula 1 is one optional formula for the case where the cumulative weight represents the cost of the matching path. In practical applications, other formulas can also be used to determine the credibility score of the WFST results, provided that a higher score indicates higher credibility.

It is understandable that if the cumulative weight represents the credibility of the matching path, other formulas can be used to calculate the normalized credibility score such that a higher score indicates higher credibility; details are not repeated here.

2. Determine the matching text of the second text in the WFST result with the highest credibility score as the third text;

Exemplarily, among WFST result 1 and WFST result 2, WFST result 2 has the highest credibility score. Therefore, the electronic device can determine that the matching text "<CHECK_MESSAGE> Open SMS" of the second text in WFST result 2 is the third text.
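Selecting the third text from already-scored results is then a lookup of the highest score. A minimal sketch, assuming the matching texts and their credibility scores are held in two parallel lists (a hypothetical layout); ties are resolved in favor of the earlier result, which is also an assumption.

```python
from typing import Sequence


def select_third_text(matching_texts: Sequence[str], scores: Sequence[float]) -> str:
    """Return the matching text whose credibility score is highest."""
    if not matching_texts or len(matching_texts) != len(scores):
        raise ValueError("texts and scores must be non-empty and of equal length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return matching_texts[best_index]


third_text = select_third_text(
    ["<OPEN_APP> open <message:app>", "<CHECK_MESSAGE> Open SMS"],
    [0.0, 1.0],
)
```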

S1406: The electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information;

It is similar to step S1004, and will not be repeated here.

Exemplarily, suppose the third text is: <CHECK_MESSAGE> Open the short message. According to the preset intent labeling information <>, the electronic device can extract the intent information from the third text: CHECK_MESSAGE. It is understandable that if the third text contains slot labeling information, the electronic device can also extract the slot information from the third text, which is not limited here.
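The extraction step can be sketched with two regular expressions. The intent label format `<INTENT_NAME>` follows the example above. The slot label format is an assumption inferred from `<message:app>` in WFST result 1, read here as `<value:slot_name>`; the actual convention is defined by the preset slot labeling information.

```python
import re
from typing import Dict, Optional, Tuple

# Intent label: an upper-case identifier in angle brackets, e.g. <CHECK_MESSAGE>.
INTENT_LABEL = re.compile(r"<([A-Z][A-Z0-9_]*)>")
# Slot label (assumed layout): <value:slot_name>, e.g. <message:app>.
SLOT_LABEL = re.compile(r"<([^<>:]+):([^<>:]+)>")


def extract(third_text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, {slot_name: value}) found in the labeled third text."""
    intent_match = INTENT_LABEL.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = {name: value for value, name in SLOT_LABEL.findall(third_text)}
    return intent, slots


intent_only = extract("<CHECK_MESSAGE> Open the short message")
intent_and_slot = extract("<OPEN_APP> open <message:app>")
```

The first call yields the intent CHECK_MESSAGE with no slots; the second yields the intent OPEN_APP with the slot `app` set to `message`.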

S1407. The electronic device outputs the intent information and/or slot information in a structured manner.

It is similar to step S1005 and will not be repeated here.

Exemplarily, after extracting the intention information CHECK_MESSAGE from the third text in the WFST result 2, the electronic device can structure the intention information:

JSON (no slots):

The structured output can be used by other modules in the electronic device.
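As an illustration only, a structured output for the no-slot case could be serialized as JSON like this. The field names `intent` and `slots` are hypothetical and are not taken from the embodiment.

```python
import json
from typing import Dict, Optional


def to_structured_output(intent: str, slots: Optional[Dict[str, str]] = None) -> str:
    """Serialize intent and slot information as a JSON string."""
    payload = {"intent": intent, "slots": slots or {}}
    return json.dumps(payload, ensure_ascii=False)


structured = to_structured_output("CHECK_MESSAGE")
```

Another module can then parse the string with `json.loads` and dispatch on the `intent` field.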

In the embodiment of the present application, multiple preset WFST intent slot models can be stored in the electronic device. When performing intent recognition, the electronic device can use the multiple WFST intent slot models to match the user input in parallel and, from the multiple WFST results obtained, determine the WFST result with the highest credibility, from which the intent information and slot information are extracted and output. This improves both the speed and the accuracy of intent recognition; in addition, parallel matching with the WFST intent slot models greatly improves the matching speed and reduces the computing load of the electronic device.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present application.

As used in the above embodiments, depending on the context, the term "when" can be interpreted as meaning "if..." or "after" or "in response to determining..." or "in response to detecting...". Similarly, depending on the context, the phrase "when determining..." or "if (a stated condition or event) is detected" can be interpreted as meaning "if determined..." or "in response to determining..." or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)".

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive).

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be completed by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The aforementioned storage media include ROM, random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Claims

An intention recognition method, which is characterized in that it includes:

In response to the user's voice input, the electronic device converts the voice input into the first text;

The electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain a third text; the second text is determined according to the first text; the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, the preset intent labeling information is used to label the intent information of the second text, and the preset slot labeling information is used to label the slot information in the second text;

The electronic device obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information.
The method according to claim 1, wherein before the step in which the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further comprises:

The electronic device performs format preprocessing on the first text to obtain the second text; the format characters in the second text are less than or equal to the format characters in the first text.
The method according to claim 1 or 2, wherein after the step in which the electronic device obtains the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information, the method further includes:

The electronic device outputs the intention information and/or slot information in a structured manner.
The method according to any one of claims 1 to 3, wherein the preset FST intent slot model is a preset WFST intent slot model, a preset WFST intent slot model is a preset WFST, and in the preset WFST intent slot model, each state transition has a weight.
The method according to claim 4, wherein the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, which specifically includes:

The electronic device uses a plurality of preset WFST intent slot models to perform parallel rule matching on the second text to obtain a WFST result; wherein the WFST result includes the matching text of the second text and the cumulative weight of the matching path, The matching text of the second text is a text that contains preset intent labeling information and/or preset slot labeling information that is output after successful matching through the matching path;

When the WFST result is one, the electronic device determines that the matching text of the second text in the WFST result is the third text;

When there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text.
The method according to claim 5, wherein when there are multiple WFST results, the electronic device determining that the matching text of the second text in the WFST result with the highest credibility is the third text specifically includes:

When there are multiple WFST results, the electronic device performs credibility score calculation on the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results;

The electronic device determines that the matching text of the second text in the WFST result with the highest credibility score is the third text.
The method according to claim 6, wherein in the preset WFST intent slot model, the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
The method according to claim 7, wherein in the preset WFST intent slot model, the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
The method according to claim 7 or 8, wherein the electronic device performs credibility score calculation on the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results, which specifically includes:

The electronic device uses the credibility score calculation formula 1 to calculate the credibility scores of multiple WFST results;

Credibility score calculation formula 1:

$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$

Wherein, w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the largest of the cumulative weights of the matching paths in the multiple WFST results.
The method according to any one of claims 4 to 9, wherein before the step in which the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes:

The electronic device loads a preset WFST intent slot model.
An electronic device, characterized in that, the electronic device includes: one or more processors and memories;

The memory is coupled with the one or more processors, and the memory is used to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to execute:

In response to the user's voice input, converting the voice input into the first text;

Use the preset FST intent slot model to perform rule matching on the second text to obtain the third text; the second text is determined according to the first text; the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, the preset intent labeling information is used to label the intent information of the second text, and the preset slot labeling information is used to label the slot information in the second text;

According to the preset intent labeling information and/or the preset slot labeling information, the intent information and/or the slot information are obtained from the third text.
The electronic device according to claim 11, wherein the one or more processors are further configured to call the computer instructions to cause the electronic device to execute:

Performing format preprocessing on the first text to obtain the second text; format characters in the second text are less than or equal to format characters in the first text.
The electronic device according to claim 11 or 12, wherein the one or more processors are further configured to call the computer instructions to cause the electronic device to execute:

The intention information and/or slot information are output in a structured manner.
The electronic device according to any one of claims 11 to 13, wherein the preset FST intent slot model is a preset WFST intent slot model, a preset WFST intent slot model is a preset WFST, and in the preset WFST intent slot model, each state transition has a weight.
The electronic device according to claim 14, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute:

Use multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain a WFST result; wherein the WFST result includes the matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is a text that contains preset intent labeling information and/or preset slot labeling information and that is output after successful matching through the matching path;

When the WFST result is one, it is determined that the matching text of the second text in the WFST result is the third text;

When there are multiple WFST results, it is determined that the matching text of the second text in the WFST result with the highest credibility is the third text.
The electronic device according to claim 15, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute:

When there are multiple WFST results, perform credibility score calculation on the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results;

It is determined that the matching text of the second text in the WFST result with the highest credibility score is the third text.
The electronic device according to claim 16, wherein in the preset WFST intent slot model, the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
The electronic device of claim 17, wherein in the preset WFST intent slot model, the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
The electronic device according to claim 17 or 18, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute:

Use the credibility score calculation formula 1 to calculate the credibility score of multiple WFST results;

Credibility score calculation formula 1:

$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$

Wherein, w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the largest of the cumulative weights of the matching paths in the multiple WFST results.
The electronic device according to any one of claims 14 to 19, wherein the one or more processors are further configured to call the computer instructions to cause the electronic device to execute:

Load the preset WFST intent slot model.
A chip system, wherein the chip system is applied to an electronic device, the chip system includes one or more processors, and the processor is used to call computer instructions to cause the electronic device to execute the method according to any one of claims 1-10.
A computer program product containing instructions, characterized in that, when the computer program product runs on an electronic device, the electronic device is caused to execute the method according to any one of claims 1-10.
A computer-readable storage medium, comprising instructions, characterized in that, when the instructions run on an electronic device, the electronic device is caused to execute the method according to any one of claims 1-10.