CN113806473A - Intention recognition method and electronic equipment - Google Patents

Intention recognition method and electronic equipment

Info

Publication number
CN113806473A
Authority
CN
China
Prior art keywords
text
wfst
electronic device
intention
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010555603.3A
Other languages
Chinese (zh)
Inventor
潘龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010555603.3A
Priority to PCT/CN2021/100475 (WO2021254411A1)
Publication of CN113806473A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

An intention recognition method and an electronic device. In the method, after the electronic device converts a voice input into a first text, it uses a preset FST intention slot model to perform rule matching on a second text determined from the first text, obtaining a third text. It then obtains intention information and/or slot position information from the third text according to preset intention marking information and/or preset slot position marking information. This technical solution improves matching speed and accuracy when NLU rules are used for intention recognition.

Description

Intention recognition method and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an intention recognition method and an electronic device.
Background
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI). Natural Language Understanding (NLU) is a subfield of NLP and one of its most difficult problems. Intent recognition and slot filling are the two most critical NLU tasks, but they are hard to perform well because of factors such as language diversity, ambiguity, robustness, knowledge dependence, and context.
At present, there are syntax analysis methods based on a Context Free Grammar (CFG) parsing algorithm, for example the Cocke-Younger-Kasami (CYK) algorithm: a CFG in general form is first converted to Chomsky normal form (CNF), and the resulting CNF grammar is then used to parse the input bottom-up with the CYK algorithm. The CYK algorithm uses dynamic programming. It starts from the input words and reduces them step by step, according to the grammar, until the start symbol is reached, which completes the syntactic analysis. When the analysis is complete, the whole analysis path yields a parse tree, and a user can extract syntactic features from that tree to obtain the desired syntactic information, such as parts of speech, entities, and sentence components.
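The bottom-up, dynamic-programming character of CYK described above can be illustrated with a minimal recognizer. This is only a sketch: the toy CNF grammar, the rule tables `BINARY_RULES` and `LEXICAL_RULES`, and the function name `cyk_recognize` are hypothetical and do not come from the application.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Binary rules have the form A -> B C; lexical rules have the form A -> word.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
LEXICAL_RULES = {
    "she": {"NP"},
    "eats": {"V"},
    "a": {"Det"},
    "fish": {"N"},
}


def cyk_recognize(words, start="S"):
    """Return True if `words` can be derived from `start` under the toy grammar."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : i + j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):          # length of the span being built
        for i in range(n - span + 1):     # start index of the span
            for split in range(1, span):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    table[i][span - 1] |= BINARY_RULES.get((b, c), set())
    return start in table[0][n - 1]


print(cyk_recognize("she eats a fish".split()))
print(cyk_recognize("eats she".split()))
```

The three nested loops over span length, start index, and split point give the cubic dependence on input length, and every split point consults the rule table, which is one way to see why a large rule set slows this style of parsing.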
Parsing with a CFG parsing algorithm to obtain the intent and the slots is feasible when the number of grammar rules is small. When the number of grammar rules is large, however, parsing becomes much slower, and the grammar parsing service may even become unavailable.
Disclosure of Invention
The application provides an intention identification method and electronic equipment, which improve the matching speed and accuracy when an NLU rule is used for intention identification.
In a first aspect, the present application provides an intent recognition method, comprising: in response to a voice input by a user, the electronic device converts the voice input to a first text; the electronic equipment uses a preset FST intention slot position model to carry out rule matching on the second text to obtain a third text; the second text is determined according to the first text; the preset FST intention slot model is a preset FST; the third text comprises preset intention marking information and/or preset slot position marking information, the preset intention marking information is used for marking out intention information of the second text, and the preset slot position marking information is used for marking out slot position information in the second text; and the electronic equipment acquires intention information and/or slot position information from the third text according to preset intention marking information and/or preset slot position marking information.
In the above embodiment, the electronic device performs intent recognition on the user input by using the preset FST intent slot model, and since the preset FST intent slot model is an FST, it performs rule matching very quickly based on the characteristic of FST rule matching. And after matching, a third text with preset intention marking information and preset slot position marking information can be obtained, intention information and slot position information can be conveniently extracted from the third text after matching is completed, and matching speed and accuracy during intention identification by using an NLU rule are greatly improved.
With reference to some embodiments of the first aspect, in some embodiments, before the step of performing rule matching on the second text by using a preset FST intention slot model by the electronic device to obtain a third text, the method further includes: the electronic equipment performs format preprocessing on the first text to obtain a second text; the format characters in the second text are less than or equal to the format characters in the first text.
In the above embodiment, the electronic device may perform format preprocessing on the first text to obtain the second text. Therefore, the complexity of the preset FST intention slot model for carrying out FST rule matching on the second text can be simplified, and the matching speed is further improved.
With reference to some embodiments of the first aspect, in some embodiments, after the step of obtaining, by the electronic device, the intention information and/or the slot position information from the third text according to the preset intention tagging information and/or the preset slot position tagging information, the method further includes: the electronic device performs structured output on the intention information and/or the slot position information.
In the above embodiments, the electronic device may perform structured output of the intent information and/or slot information, which facilitates use of the intent information and/or slot information by other modules in the electronic device.
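As a rough illustration of the last two steps (extracting intention and slot information from the annotated third text, then emitting it in structured form), consider the sketch below. The application does not specify the annotation syntax at this point, so the tag format (`<Location>...</Location>` for slots, `<intent=...>` for the intent), the patterns, and the function name `extract` are all assumptions made for the example.

```python
import json
import re

# Hypothetical markup: slots are wrapped as <Name>value</Name>,
# and the intent is marked as <intent=Name>.
SLOT_PATTERN = re.compile(r"<(?P<name>\w+)>(?P<value>.*?)</(?P=name)>")
INTENT_PATTERN = re.compile(r"<intent=(?P<intent>\w+)>")


def extract(third_text):
    """Collect the intent and slot values marked in an annotated text."""
    intent_match = INTENT_PATTERN.search(third_text)
    slots = {m.group("name"): m.group("value")
             for m in SLOT_PATTERN.finditer(third_text)}
    return {
        "intent": intent_match.group("intent") if intent_match else None,
        "slots": slots,
    }


third_text = ("what is the weather like in <Location>Shanghai</Location> "
              "<Date>today</Date> <intent=Ask_Weather>")
result = extract(third_text)
print(json.dumps(result, indent=2))  # structured output for other modules
```

Serializing to JSON is just one possible structured form; any representation that other modules can consume would serve the same purpose.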
With reference to some embodiments of the first aspect, in some embodiments, the preset FST intention slot model is a preset WFST intention slot model, the preset WFST intention slot model is a preset WFST, and each state transition in the preset WFST intention slot model has a weight.
In the above embodiment, the preset FST intention slot model is the preset WFST intention slot model, so that when performing parallel matching, a weight of each match is obtained, which facilitates screening of matching results.
With reference to some embodiments of the first aspect, in some embodiments, the electronic device performs rule matching on the second text by using a preset FST intention slot model to obtain a third text, which specifically includes: the electronic equipment uses a plurality of preset WFST intention slot models to carry out parallel rule matching on the second text to obtain a WFST result; the WFST result comprises a matching text of a second text and an accumulated weight of a matching path, wherein the matching text of the second text is a text which is output after being successfully matched through the matching path and contains preset intention marking information and/or preset slot position marking information; when the WFST result is one, the electronic equipment determines that the matching text of the second text in the WFST result is the third text; when the WFST result is multiple, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text.
In the above embodiment, the electronic device determines the matching text of the second text in the WFST result with the highest credibility after the parallel rule is matched as the third text. The accuracy of intention identification is guaranteed while parallel matching efficiency is improved.
With reference to some embodiments of the first aspect, in some embodiments, when the WFST result is multiple, the determining, by the electronic device, that a matching text of the second text in the WFST result with the highest confidence level is the third text specifically includes: when the WFST results are multiple, the electronic equipment carries out credibility score calculation on the multiple WFST results according to the cumulative weight of the matched paths in the multiple WFST results; the electronic device determines the matching text of the second text in the WFST result with the highest confidence score as the third text.
In the embodiment, the electronic device determines the matching text of the second text in the WFST result with the highest credibility score as the third text, so that the accuracy of credibility evaluation is improved.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intention slot model, the weight of accepting a state transition of a wildcard is greater than the weight of not accepting a state transition of a wildcard, and the greater the weight of each state transition on a matching path is, the greater the cumulative weight of the matching path is.
In some embodiments, in combination with some embodiments of the first aspect, in the preset WFST intention slot model, the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
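The two weighting properties above (wildcard transitions weigh more than non-wildcard transitions, and the cumulative weight is the sum over the path) can be sketched as follows. The concrete values `LITERAL_WEIGHT` and `WILDCARD_WEIGHT` are hypothetical; the text only requires that the wildcard weight be the larger of the two.

```python
# Hypothetical weights: the only constraint taken from the text is
# WILDCARD_WEIGHT > LITERAL_WEIGHT.
LITERAL_WEIGHT = 0.0
WILDCARD_WEIGHT = 1.0


def cumulative_weight(path):
    """Sum the transition weights along a matching path.

    `path` is a list of booleans, one per state transition,
    True when that transition accepts a wildcard.
    """
    return sum(WILDCARD_WEIGHT if is_wildcard else LITERAL_WEIGHT
               for is_wildcard in path)


specific_path = [False, False, True]  # one wildcard transition
loose_path = [False, True, True]      # two wildcard transitions
print(cumulative_weight(specific_path), cumulative_weight(loose_path))
```

Under these assumed values, a path that relies on more wildcard transitions accumulates a larger weight, so the cumulative weight distinguishes loose matches from specific ones.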
With reference to some embodiments of the first aspect, in some embodiments, the electronic device performs, according to the cumulative weight of the matching path in the multiple WFST results, calculation of the confidence score for the multiple WFST results, specifically including: the electronic equipment calculates the credibility scores of a plurality of WFST results by using a credibility score calculation formula 1;
confidence score calculation equation 1:
Figure BDA0002544192610000031
where w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the maximum cumulative weight among the cumulative weights of the matching paths in the plurality of WFST results.
With reference to some embodiments of the first aspect, in some embodiments, before the step of performing rule matching on the second text by using a preset FST intention slot model by the electronic device to obtain a third text, the method further includes: the electronic device loads a preset WFST intent slot model.
In a second aspect, an embodiment of the present application provides an electronic device, including: one or more processors and memory; the memory coupled with the one or more processors, the memory to store computer program code, the computer program code including computer instructions, the one or more processors to invoke the computer instructions to cause the electronic device to perform: in response to a voice input by a user, converting the voice input into a first text; performing rule matching on the second text by using a preset FST intention slot model to obtain a third text; the second text is determined according to the first text; the preset FST intention slot position model is a preset FST; the third text comprises preset intention marking information and/or preset slot position marking information, the preset intention marking information is used for marking out intention information of the second text, and the preset slot position marking information is used for marking out slot position information in the second text; and acquiring intention information and/or slot position information from the third text according to the preset intention marking information and/or the preset slot position marking information.
In the above embodiment, the electronic device performs intent recognition on the user input by using the preset FST intent slot model, and since the preset FST intent slot model is an FST, it performs rule matching very quickly based on the characteristic of FST rule matching. And after matching, a third text with preset intention marking information and preset slot position marking information can be obtained, intention information and slot position information can be conveniently extracted from the third text after matching is completed, and matching speed and accuracy during intention identification by using an NLU rule are greatly improved.
In some embodiments, in combination with some embodiments of the first aspect, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: carrying out format preprocessing on the first text to obtain a second text; the format characters in the second text are less than or equal to the format characters in the first text.
In some embodiments, in combination with some embodiments of the first aspect, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: and outputting the intention information and/or the slot position information in a structured mode.
With reference to some embodiments of the first aspect, in some embodiments, the preset FST intention slot model is a preset WFST intention slot model, the preset WFST intention slot model is a preset WFST, and each state transition in the preset WFST intention slot model has a weight.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: performing parallel rule matching on the second text by using a plurality of preset WFST intention slot models to obtain a WFST result; the WFST result comprises a matching text of the second text and the cumulative weight of a matching path, wherein the matching text of the second text is a text which is output after being successfully matched through the matching path and contains preset intention marking information and/or preset slot position marking information; when there is one WFST result, determining that the matching text of the second text in the WFST result is the third text; when there are multiple WFST results, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: when the WFST results are multiple, carrying out credibility score calculation on the multiple WFST results according to the cumulative weight of the matched path in the multiple WFST results; and determining the matching text of the second text in the WFST result with the highest credibility score as the third text.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intention slot model, the weight of accepting a state transition of a wildcard is greater than the weight of not accepting a state transition of a wildcard, and the greater the weight of each state transition on a matching path is, the greater the cumulative weight of the matching path is.
In some embodiments, in combination with some embodiments of the first aspect, in the preset WFST intention slot model, the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: calculating the confidence scores of the multiple WFST results by using a confidence score calculation formula 1;
confidence score calculation equation 1:
Figure BDA0002544192610000041
where w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the maximum cumulative weight among the cumulative weights of the matching paths in the plurality of WFST results.
In some embodiments, in combination with some embodiments of the first aspect, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: the preset WFST intent slot model is loaded.
In a third aspect, an embodiment of the present application provides a chip system, where the chip system is applied to an electronic device, and the chip system includes one or more processors, and the processor is configured to invoke a computer instruction to cause the electronic device to perform a method as described in the first aspect and any possible implementation manner of the first aspect.
It is understood that the system-on-chip may include one processor 110 in the electronic device 100 shown in fig. 5, and may also include a plurality of processors 110 in the electronic device 100 shown in fig. 5, which is not limited herein.
In a fourth aspect, embodiments of the present application provide a computer program product including instructions, which, when run on an electronic device, cause the electronic device to perform the method described in the first aspect and any possible implementation manner of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on an electronic device, cause the electronic device to perform the method described in the first aspect and any possible implementation manner of the first aspect.
It is understood that the electronic device provided by the second aspect, the chip system provided by the third aspect, the computer program product provided by the fourth aspect, and the computer storage medium provided by the fifth aspect are all used to execute the method provided by the embodiments of the present application. Therefore, the beneficial effects achieved by the method can refer to the beneficial effects in the corresponding method, and are not described herein again.
Drawings
FIG. 1 is a schematic diagram of intent and slot relationships;
FIG. 2 is an exemplary schematic diagram of an FSA;
FIG. 3 is an exemplary schematic diagram of an FST;
FIG. 4 is an exemplary diagram of a WFST;
fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 6 is a block diagram of a software architecture of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an architecture of an intent recognition module in an embodiment of the present application;
FIG. 8 is a schematic diagram of a usage scenario of an intention identification method in an embodiment of the present application;
FIG. 9 is an exemplary diagram of an FST intent slot model in an embodiment of the present application;
FIG. 10 is a flow chart illustrating an intent recognition method in an embodiment of the present application;
FIG. 11 is another flow chart illustrating an intent recognition method in an embodiment of the present application;
FIG. 12 is an exemplary diagram of a WFST intent slot model in an embodiment of the present application;
FIG. 13 is another exemplary diagram of a WFST intended slot model in an embodiment of the present application;
FIG. 14 is another flow chart illustrating an intent recognition method in an embodiment of the present application;
FIG. 15 is an exemplary diagram of multiple predefined WFST intention slot model parallel rule matching for a second text in the embodiment of the present application;
fig. 16 is an exemplary diagram illustrating a case where different settings are applied to the weight and the cumulative weight in the preset WFST intended slot model in the embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature, and in the description of embodiments of the application, unless stated otherwise, "plurality" means two or more.
Since the embodiments of the present application relate to the application of the intention recognition technology, for the convenience of understanding, related terms and concepts related to the embodiments of the present application will be described below.
(1) Intent and slot position:
1.1, definition of intent and slot:
An intent is what the electronic device identifies as the user's actual or potential need. Fundamentally, intent recognition is a classifier that assigns the user's need to one of a set of categories.
The intent and the slots together constitute a "user action". An electronic device cannot directly understand natural language, so the function of intent recognition is to map natural language into a machine-understandable structured semantic representation.
Intent recognition is also called spoken utterance classification (SUC). As the name implies, it classifies the natural language utterance input by the user, and the resulting category corresponds to the user's intent. For example, for "how is the weather today", the intent is "ask for weather". Intent recognition can therefore be regarded as a typical classification problem. For example, the classification and definition of intents may refer to the ISO-24617-2 standard, which contains 56 detailed definitions. The definition of an intent depends heavily on the positioning of the system itself and on the knowledge base it has; that is, intent definitions are strongly domain-dependent. It is to be understood that, in the embodiments of the present application, the classification and definition of intents are not limited to the ISO-24617-2 standard.
A slot is a parameter carried by an intent. An intent may correspond to several slots. For example, when inquiring about a bus route, necessary parameters such as the departure point, destination, and time need to be given. These parameters are the slots corresponding to the intent "inquire about a bus route".
For example, the main goal of the semantic slot filling task is to extract the values of the predefined semantic slots in a semantic frame from an input sentence, given that the semantic frame is known for a specific domain or a specific intent. The semantic slot filling task can be converted into a sequence labeling task, that is, the classical IOB labeling method is used to label each word as the beginning (begin) of a semantic slot, the continuation (inside) of a semantic slot, or not part of any semantic slot (outside).
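The IOB labeling scheme mentioned above can be sketched in a few lines. The token list, the slot spans, and the function name `iob_tags` are hypothetical and chosen only so that a multi-token slot shows both the B- and I- tags.

```python
def iob_tags(tokens, slot_spans):
    """Label each token as B-<slot>, I-<slot>, or O.

    `slot_spans` maps a slot name to (start, end) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for name, (start, end) in slot_spans.items():
        tags[start] = f"B-{name}"          # first token of the slot
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"          # remaining tokens of the slot
    return tags


# Hypothetical tokenized input and slot spans.
tokens = ["weather", "in", "Shanghai", "tomorrow", "morning"]
spans = {"Location": (2, 3), "Date": (3, 5)}
tags = iob_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeling model would predict these tags from the tokens; here the spans are given by hand purely to show the target format.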
To make a system work properly, the intent and slot are first designed. The intent and slot position allow the system to know which particular task should be performed and to give the type of parameters needed to perform the task.
Taking a specific requirement of 'inquiring weather' as an example, the design of intents and slots in a task-oriented dialog system is introduced:
An example of user input: "What is the weather like in Shanghai today";
User intent definition: ask for weather, Ask_Weather;
Slot definitions: a first slot: time, Date; a second slot: location, Location.
Fig. 1 is a schematic diagram of an intention and a slot position relationship in an embodiment of the present application. As shown in fig. 1 (a), in this example, two necessary slots are defined for the "ask for weather" task, which are "time" and "location", respectively. For a single task, the above definition can solve the task requirement. However, in a real business environment, a system is often required to be able to handle several tasks simultaneously, for example, a weather station should be able to answer the question of "asking the weather" as well as the question of "asking the temperature".
For the complex situation that the same system handles multiple tasks, one optimized strategy is to define a higher-level domain, such as to attribute the "inquire weather" intention and the "inquire temperature" intention to the "weather" domain. In this case, the domain can be simply understood as a set of intentions. The advantages of defining the domain and performing domain identification first are that the domain knowledge range can be constrained, and the search space for subsequent intent identification and slot filling is reduced. In addition, for each domain, with specific knowledge and characteristics related to tasks and domains, the effect of Natural Language Understanding (NLU) can be improved remarkably. Accordingly, the example of fig. 1 (a) is modified to add to the "weather" field:
Examples of user input:
"What is the weather like in Shanghai today";
"What is the temperature in Shanghai now";
Domain definition: weather, Weather;
User intent definitions:
1. ask for weather, Ask_Weather;
2. ask for temperature, Ask_Temperature;
Slot definitions:
a first slot: time, Date;
a second slot: location, Location.
The modified "ask for weather" requirement corresponds to the intention and slot position as shown in fig. 1 (b).
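The domain, intent, and slot definitions above can be written down as a small data structure. This is only one possible representation; the class names `Intent` and `Domain` and the method `add_intent` are hypothetical, while the identifiers `Weather`, `Ask_Weather`, `Ask_Temperature`, `Date`, and `Location` come from the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    name: str
    slots: tuple  # names of the slots this intent requires


@dataclass
class Domain:
    name: str
    intents: dict = field(default_factory=dict)

    def add_intent(self, intent):
        """Register an intent under this domain, keyed by its name."""
        self.intents[intent.name] = intent


weather = Domain("Weather")
weather.add_intent(Intent("Ask_Weather", ("Date", "Location")))
weather.add_intent(Intent("Ask_Temperature", ("Date", "Location")))

for intent in weather.intents.values():
    print(weather.name, intent.name, intent.slots)
```

Grouping intents under a domain in this way mirrors the idea that domain identification first narrows the set of candidate intents and slots.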
1.2, intent recognition and slot filling:
After the intents and slots are defined, the user intent and the slot values of the corresponding slots can be identified from the user input.
The goal of intent recognition is to recognize the user intent from the input. A single task can be modeled simply as a binary classification problem; for example, the "ask for weather" intent can be modeled at recognition time as "ask for weather" versus "not ask for weather". When a system is required to handle several tasks, it must be able to discriminate between the various intents, in which case the binary classification problem becomes a multi-class classification problem.
The task of slot filling is to extract information from the input and fill it into the predefined slots. For example, in fig. 1 the intents and corresponding slots have been defined; for the user input "what is the weather like in Shanghai today", the system should extract "today" and "Shanghai" and fill them into the "time" and "location" slots, respectively.
(2) Finite State Transducer (FST):
FSTs are currently widely used in speech recognition and in natural language search and processing. For example, natural language processing often involves operations that modify text content according to rules. One such rule is: if c is immediately followed by x in the string, c is changed to b. An FST applies mathematical operations to such rules so that several rules are integrated into one large rule, which effectively improves the efficiency of a rule-based system.
To facilitate understanding of the FST, a finite state acceptor (FSA) is first introduced below:
For a given input sequence, the FSA returns one of two results: "accept" or "not accept".
Fig. 2 is an exemplary diagram of an FSA, in which nodes and arcs correspond to states and state transitions, respectively. In the FSA shown in fig. 2, state 0 represents the initial state and state 5 represents the termination state. State 0 to state 1 may accept character a, state 1 to state 1 may accept character b, state 1 to state 2 may accept character c, state 2 to state 5 may accept character d, state 0 to state 3 may accept character b, state 3 to state 4 may accept character c, state 4 to state 4 may accept character d, and state 4 to state 5 may accept character e. The regular expression corresponding to the FSA shown in fig. 2 is: ab*cd|bcd*e, where * indicates that the previous character can be repeated any number of times.
For example, the FSA shown in fig. 2 may accept the symbol sequence "a, b, c, d" over the path 0, 1, 1, 2, 5, in which case the FSA returns "accept". For another example, if the sequence "a, b, d" is input to the FSA shown in fig. 2, the FSA returns "not accept", because no path in the FSA shown in fig. 2 yields that sequence.
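The FSA of fig. 2 can be simulated directly from the transitions listed above. The table below encodes exactly those eight transitions; the names `FSA_TRANSITIONS` and `fsa_accepts` are hypothetical.

```python
# (current state, input character) -> next state, as described for fig. 2.
FSA_TRANSITIONS = {
    (0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "d"): 5,
    (0, "b"): 3, (3, "c"): 4, (4, "d"): 4, (4, "e"): 5,
}
INITIAL_STATE = 0
FINAL_STATES = {5}


def fsa_accepts(symbols):
    """Return True if the symbol sequence leads from state 0 to state 5."""
    state = INITIAL_STATE
    for symbol in symbols:
        state = FSA_TRANSITIONS.get((state, symbol))
        if state is None:  # no transition for this symbol: not accepted
            return False
    return state in FINAL_STATES


print(fsa_accepts("abcd"))  # path 0, 1, 1, 2, 5
print(fsa_accepts("abd"))   # no transition on d from state 1
```

Because each (state, character) pair has at most one successor, matching takes a single pass over the input.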
FST is an extension of FSA, which has an output tag, called an input-output tag pair, at each state transition. Fig. 2 is an exemplary diagram of an FST. In the FST shown in fig. 3, state 0 represents the initial state and state 5 represents the termination state. The input-output tag pair of state 0 to state 1 is a: z; the input-output tag pair of state 1 to state 1 is b: y; the input-output tag pair of state 1 to state 2 is c: x; the input-output tag pairs of state 2 to state 5 are d: w; the input-output tag pair of state 0 to state 3 is b: y; the input-output tag pair of state 3 to state 4 is c: x; the input-output tag pair of state 4 to state 4 is d: w; the input-output tag pair of state 4 to state 5 is e: v.
With such tag pairs, the FST can describe a set of rules that converts one set of symbol sequences into another set of symbol sequences. As shown in fig. 3, when the input symbol sequence "a, b, c, d" passes along path 0, 1, 1, 2, 5, another symbol sequence "z, y, x, w" is obtained, because a is converted to z, b is converted to y, c is converted to x, and d is converted to w.
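The conversion performed by the FST of fig. 3 can be sketched in the same table-driven way. The arcs below are transcribed from the tag pairs listed for fig. 3; the names FST_ARCS and fst_transduce are hypothetical and chosen only for this illustration.

```python
# Arcs of the FST of fig. 3: (state, input tag) -> (next state, output tag).
FST_ARCS = {
    (0, "a"): (1, "z"), (1, "b"): (1, "y"), (1, "c"): (2, "x"), (2, "d"): (5, "w"),
    (0, "b"): (3, "y"), (3, "c"): (4, "x"), (4, "d"): (4, "w"), (4, "e"): (5, "v"),
}


def fst_transduce(symbols, initial=0, finals=frozenset({5})):
    """Return the output sequence for an accepted input, or None otherwise."""
    state = initial
    output = []
    for symbol in symbols:
        arc = FST_ARCS.get((state, symbol))
        if arc is None:
            return None  # the input does not match any path
        state, out_tag = arc
        output.append(out_tag)  # emit the output tag of the traversed arc
    return "".join(output) if state in finals else None


print(fst_transduce("abcd"))  # path 0, 1, 1, 2, 5
```

The only difference from an acceptor is that each traversed arc also emits its output tag, so one pass over the input both checks the sequence and produces the converted sequence.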
FST is an efficient data structure whose underlying theory is based on graph theory. FSTs are classified into non-deterministic FSTs and deterministic FSTs.
A deterministic FST is a 7-tuple (Q, Σ, Γ, δ, ω, q₀, F), wherein:
(definition 1) Q is a finite set of states;
(definition 2) Σ is a finite set of input characters (input alphabet);
(definition 3) Γ is a finite set of output characters (output alphabet);
(definition 4) δ: Q × Σ → P(Q) is the state transfer function;
(definition 5) ω: Q × Σ → Γ is the output function;
(definition 6) q₀ ∈ Q is the initial state;
(definition 7) F ⊆ Q is the set of accepting states.
The non-deterministic FST is also a 7-tuple, but it may have multiple choices when making a state transition, so definition 4 and definition 5 differ from those of the deterministic FST; δ and ω are defined as follows:
(definition 4) δ: Q × (Σ ∪ {ε}) → P(Q) is the state transfer function;
(definition 5) ω: Q × (Σ ∪ {ε}) × Q → Γ* is the output function.
As defined above, an FST has multiple states and multiple state transitions; a matching operation starts from the initial state, passes through a number of intermediate states, and completes on reaching a termination state.
The typical characteristics of FST are flexible use, high matching efficiency and low memory overhead.
For example: there are three rules:
1. when c is immediately followed by x, change c to b: cx → bx;
2. when a is preceded by rs, change a to b: rsa → rsb;
3. when b is preceded by rs and followed by xy, b is changed to a: rsbxy → rsaxy.
When the input string is rsaxyrscxy, it is transformed according to the 3 rules as follows:
1. rsaxyrscxy → rsaxyrsbxy;
2. rsaxyrsbxy → rsbxyrsbxy;
3. rsbxyrsbxy → rsaxyrsaxy.
It can be seen that applying the rules one at a time in this manner is inefficient: the change made in the second step is reversed in the third step, so the second step performs work that has no effect on the final result. The FST provides a way to eliminate such inefficient processing. After the rules are combined into one FST, that FST achieves the effect of all three rules with only one traversal, so no redundant conversion is performed. The time spent applying the rules one by one depends on the number of rules, their length, and the number of input characters, whereas the time spent using the FST depends only on the number of characters in the input string.
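The rule-by-rule cascade can be reproduced with a short sketch. It uses Python's str.replace as a stand-in for applying each rewrite rule over the whole string, which is an assumption made for illustration; it shows the multi-pass baseline only and is not the combined single-pass FST. The names RULES and apply_rules_in_sequence are hypothetical.

```python
# The three rewrite rules, each written as (pattern, replacement).
RULES = [
    ("cx", "bx"),        # rule 1: c followed by x becomes b
    ("rsa", "rsb"),      # rule 2: a preceded by rs becomes b
    ("rsbxy", "rsaxy"),  # rule 3: b between rs and xy becomes a
]


def apply_rules_in_sequence(text, rules=RULES):
    """Apply each rule in turn and return every intermediate string."""
    steps = [text]
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)  # one full pass per rule
        steps.append(text)
    return steps


for step in apply_rules_in_sequence("rsaxyrscxy"):
    print(step)
```

Each rule costs one full scan of the string, so the work grows with the number of rules, and the first "a" is rewritten to "b" by rule 2 only to be restored by rule 3. A single transducer compiled from all three rules would avoid both the extra scans and the wasted rewrite.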
(3) Weighted Finite State Transducer (WFST):
WFST is a type of FST. There is a weight at each state transition, an initial weight at each initial state, and a termination weight at each termination state. The weight is typically the probability or loss of a transition or initial/terminal state. The weights are accumulated along each path and over different paths.
The calculation method of the cumulative weight may be specified by a specific WFST, and for example, the cumulative weight may be a multiplication of all the weights on the passing path, a summation of all the weights on the passing path, or another calculation method of the weights on the passing path, and is not limited herein.
Fig. 4 is an exemplary diagram of a WFST. Here, each state transition is labeled in the form "input tag:output tag/weight", and the initial state and the termination state have corresponding weights. In the WFST shown in fig. 4, the cumulative weight is calculated by multiplying all the weights on the path traversed. If the character sequence "a, b, c, d" is input, it is converted into the character sequence "z, y, x, w" along path 0, 1, 1, 2, 5, with a cumulative weight of 0.5 × 1.2 × 0.7 × 3 × 2 × 0.1 = 0.252.
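The weight accumulation can be checked with a short sketch. The initial weight 0.5, the termination weight 0.1 and the four arc weights 1.2, 0.7, 3 and 2 are the factors of the product given for path 0, 1, 1, 2, 5; the function name cumulative_weight and its combine parameter are hypothetical. The combining operation is left pluggable because the accumulation method may be multiplication, summation, or another method.

```python
import operator
from functools import reduce


def cumulative_weight(initial_weight, arc_weights, final_weight, combine=operator.mul):
    """Combine the initial weight, every arc weight and the final weight."""
    return reduce(combine, [*arc_weights, final_weight], initial_weight)


# Arc weights along path 0, 1, 1, 2, 5 of fig. 4, taken from the product in the text.
PATH_ARC_WEIGHTS = [1.2, 0.7, 3, 2]

product = cumulative_weight(0.5, PATH_ARC_WEIGHTS, 0.1)
total = cumulative_weight(0.5, PATH_ARC_WEIGHTS, 0.1, combine=operator.add)

print(round(product, 3))  # multiplicative accumulation, as in fig. 4
print(round(total, 3))    # additive accumulation, shown only for comparison
```

Passing the operation as a parameter keeps the path traversal independent of how a particular WFST chooses to accumulate weights.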
WFST can be defined by an 8-tuple (∑, Λ, Q, I, F, E, λ, ρ):
(definition 1) Σ is a finite set of input tags;
(definition 2) Λ is a finite set of output tags;
(definition 3) Q is a finite set of states;
(definition 4) I ⊆ Q is the set of initial states;
(definition 5) F ⊆ Q is the set of termination states;
(definition 6) E ⊆ Q × (Σ ∪ {ε}) × (Λ ∪ {ε}) × K × Q is a finite set of transitions, wherein ε is a meta-symbol tag representing empty input or output, and K is the set of weight elements;
(definition 7) λ: I → K is a weight initial function;
(definition 8) ρ: F → K is a weight termination function.
Illustratively, with the above definitions, the WFST shown in fig. 4 may be defined as follows:
(definition 1) Σ = {a, b, c, d, e};
(definition 2) Λ = {v, x, y, w, z};
(definition 3) Q = {0, 1, 2, 3, 4, 5};
(definition 4) I = {0};
(definition 5) F = {5};
(definition 6)
Figure BDA0002544192610000094
Wherein each transition in E consists of (source state, input label, output label, weight, target state);
(definition 7) λ(0) = 0.5;
(definition 8) ρ(5) = 0.1.
An exemplary electronic device 100 provided by embodiments of the present application is first described below.
Fig. 5 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
The following describes an embodiment specifically by taking the electronic device 100 as an example. It should be understood that electronic device 100 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than illustrated, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to control instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K via an I2C interface, such that the processor 110 and the touch sensor 180K communicate via an I2C bus interface to implement the touch functionality of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the capture functionality of the electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The SIM interface may be used to communicate with the SIM card interface 195, implementing functions to transfer data to or read data from the SIM card.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. It may also be used to connect an earphone and play audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices and the like.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150 and antenna 2 is coupled to wireless communication module 160 so that electronic device 100 can communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), General Packet Radio Service (GPRS), code division multiple access (CDMA), Wideband Code Division Multiple Access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, with N being a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. Applications such as intelligent recognition of the electronic device 100 can be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, and the like), and the like. The data storage area may store data created during the use of the electronic device 100 (such as face information template data, fingerprint information templates, etc.), and the like. Further, the internal memory 121 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert an electrical audio signal into a sound signal. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.
The receiver 170B, also called an "earpiece", is used to convert an electrical audio signal into a sound signal. When the electronic device 100 answers a call or plays voice information, the user can listen to the voice by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal to the microphone 170C by speaking with the mouth close to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, perform directional recording, and so on.
The headphone interface 170D is used to connect a wired headphone. The headphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that are applied to the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
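The mapping from touch intensity to operation instruction for the short message application icon can be sketched as a single comparison. The threshold value, its normalized unit, and the names FIRST_PRESSURE_THRESHOLD and sms_icon_instruction are hypothetical; the application specifies only that a first pressure threshold exists.

```python
# Hypothetical value in normalized units; no concrete threshold is specified.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Choose the instruction for a touch on the short message application icon."""
    if touch_intensity < threshold:
        return "view_short_message"    # intensity below the first pressure threshold
    return "create_short_message"      # intensity greater than or equal to the threshold


print(sms_icon_instruction(0.2))
print(sms_icon_instruction(0.8))
```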
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the electronic device 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a hall sensor. The electronic device 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip by means of the magnetic sensor 180D. Features such as automatic unlocking upon opening the flip may then be set according to the detected open or closed state of the holster or of the flip.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to recognize the posture of the electronic device, and is applied in landscape/portrait switching, pedometers and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light to the outside through the light emitting diode. The electronic device 100 detects infrared reflected light from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there are no objects near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. The electronic device 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint characteristics to implement fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is lower than a further threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch-control screen". The touch sensor 180K is used to detect a touch operation applied on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may instead be disposed on a surface of the electronic device 100 at a position different from that of the display screen 194.
The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for both incoming-call vibration prompts and touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, received messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
The indicator 192 may be an indicator light, which may be used to indicate the charging status and changes in battery level, as well as messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by inserting it into or removing it from the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time, and the types of these cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards and with an external memory card. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication.
In this embodiment, the electronic device 100 may receive the user's voice input and environmental information through the microphone 170C and the sensor module 180. After the audio module 170 converts the user's voice input into digital audio information, the processor 110 may recognize it and convert it into text information. The intention identification method in the embodiments of the present application is then executed to identify the user's intention and slots and to represent them in a structured semantic form.
Fig. 6 is a block diagram of a software configuration of the electronic device 100 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the runtime (Runtime) and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in Fig. 6, the application packages may include applications such as camera, gallery, calendar, phone, map, navigation, WLAN, Bluetooth, music, video, and short message.
In the embodiment of the application, the application layer may further include an intention identification module. The intention identification module is used for executing the intention identification method in the embodiment of the application.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 6, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and answered, browsing history and bookmarks, phone books, and the like.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to construct an application. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying a picture.
The phone manager is used to provide the communication functions of the electronic device 100, for example, management of call status (including connected, hung up, and the like).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify the user of a completed download, to provide message alerts, and the like. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog window. Examples include prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, and flashing an indicator light.
The Runtime (Runtime) includes a core library and a virtual machine. Runtime is responsible for scheduling and management of the system.
The core library comprises two parts: one part consists of the functions that the Java language needs to call, and the other part is the core library of the system.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (e.g., OpenGL ES), and a two-dimensional graphics engine (e.g., SGL).
The surface manager is used to manage the display subsystem and provide a fusion of two-dimensional (2-dimensional, 2D) and three-dimensional (3-dimensional, 3D) layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing 3D graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, a sensor driver and a virtual card driver.
The following describes an exemplary workflow of the software and hardware of the electronic device 100 with reference to a photo-capturing scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap and a corresponding control that is the camera application icon, the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and a still image or a video is captured through the camera 193.
Fig. 7 is a schematic diagram of the framework of the intention identification module in the embodiments of the present application. The intention identification module is the NLU engine 700.
The NLU engine 700 is used to perform semantic analysis on the user input and to output analysis results, such as the intention and slots, for use by other modules.
The NLU engine 700 includes a text preprocessing unit 701, a rule engine 702, a machine learning engine 703, an entity identification unit 704, an intention classification unit 705, and a slot filling unit 706.
The text preprocessing unit 701 is configured to preprocess the text input by the user, mainly by removing format symbols that are not needed for subsequent semantic analysis, such as punctuation marks and spaces. It can be understood that the text input by the user is typically obtained by speech recognition of the user's voice information.
The rule engine 702 is configured to perform FST-based rule matching on the preprocessed user input text according to preset rules, covering high-frequency sentence patterns, to obtain a formatted text marked in the intention slot format. The intention identification method in the embodiments of the present application is mainly used to construct the rule engine 702.
The machine learning engine 703 is configured to process the preprocessed user input text through machine learning to obtain a formatted text marked in the intention slot format.
The entity identification unit 704 is configured to extract entity information from the formatted text, marked in the intention slot format, that is output by the rule engine 702 or the machine learning engine 703.
The intention classification unit 705 is configured to extract intention information from the formatted text, marked in the intention slot format, that is output by the rule engine 702 or the machine learning engine 703.
The slot filling unit 706 is configured to extract slot information from the formatted text, marked in the intention slot format, that is output by the rule engine 702 or the machine learning engine 703.
Fig. 8 is a schematic view of a usage scenario of the intention identification method in the embodiments of the present application. After the user says "call dad" to the electronic device 100, the electronic device 100 performs speech recognition on the input and converts it into text. The NLU engine 700 performs intention identification on that text and outputs the intention "call" and the slot "dad" to the dialing application. The dialing application then dials the phone number of "dad" according to the intention and the slot.
Table 1 below compares, in actual engineering tests, the FST-based rule matching system used in the intention identification method of the embodiments of the present application with a conventional rule matching system based on the CFG parsing algorithm, for the process in which the NLU engine 700 in the electronic device 100 performs intention identification on text input by the user:
Hardware environment: CPU: Intel(R) Xeon(R) E5-2690 v2 @ 3.00 GHz; memory: 32 GB.
Figure BDA0002544192610000171
TABLE 1
As is apparent from the test results shown in table 1, when the rule matching system based on FST in the intent recognition method according to the embodiment of the present invention is used, the delay of rule matching is significantly reduced and the matching speed is greatly increased compared to the conventional rule matching system based on CFG analysis algorithm under the condition that other variables are controlled to be completely the same.
The intention identification method in the embodiment of the present application is specifically described below with reference to the software and hardware mechanisms of the above exemplary electronic device 100:
Fig. 9 is an exemplary diagram of an FST intention slot model in the embodiments of the present application.
In the FST intention slot model shown in Fig. 9, the intention information is inserted after the initial state of the FST, and slot marking information is inserted before and after every slot, so that the intention information and the slot marking information are output during FST matching. The FST intention slot model includes states and transitions between states. Specifically:
(1) an FST state is represented by a circle with a state number;
(2) a transition between FST states is represented by a directed curve connecting the two states;
(3) the transition function between FST states is the <key>:<value> mapping on a directed edge: when the FST accepts <key>, it outputs <value>. Here <eps> indicates that an empty input is accepted (the current input string is not consumed);
(4) state 0 and state 14 are the initial state and the termination state of the FST, respectively, and the other states are intermediate states.
The FST intention slot model shown in Fig. 9 is written manually in advance. The specific process may be as follows:
(1) Write the FST rules. For example, the FST rules of the FST intention slot model shown in Fig. 9 may be written as:
person = "dad" | "mom" | "Xiaoming";
call_1 = "call to" ("":"<") person ("":":contact>");
call_2 = "calls to" ("":"<") person ("":":contact>");
export CALL = ("":"<CALL>") (call_1 | call_2);
(2) Compile the FST rules into the FST intention slot model shown in Fig. 9 using a rule compiler.
With reference to the FST intention slot model shown in Fig. 9, the following describes how, in the intention identification method of the embodiments of the present application, NLU intention identification and slot filling are performed on text input by a user based on the FST intention slot model:
Fig. 10 is a schematic flowchart of the intention identification method in the embodiments of the present application.
S1001: In response to a voice input from the user, the electronic device converts the voice input into a first text.
For example, as shown in Fig. 8, when the user says "call dad" to the electronic device 100, the electronic device 100 may convert the voice input into the text "call dad".
It can be understood that the electronic device 100 may perform step S1001 after receiving a certain trigger. For example, the electronic device 100 may perform step S1001 after detecting that the user has turned on the voice assistant function, after detecting that the user has double-tapped the screen, or after being powered on, which is not limited herein.
S1002: The electronic device performs format preprocessing on the first text to obtain a second text.
The main purpose of the format preprocessing is to remove from the first text format characters that are of no use in the subsequent matching against the FST intention slot model, for example, spaces and punctuation marks.
For example, if the first text is "call to Xiaoming.", the second text obtained after the format preprocessing in step S1002 is "call to Xiaoming". The period, together with any spaces, is removed.
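The format preprocessing of step S1002 can be sketched as follows. This is a minimal illustration and not the implementation of the embodiments: the function name preprocess and the keep_word_spaces option are assumptions introduced here. The option exists only because the example is rendered in English, where dropping every space would merge words; with the default setting the sketch follows the text literally and drops all spaces.

```python
import unicodedata


def preprocess(first_text: str, keep_word_spaces: bool = False) -> str:
    """Remove punctuation and whitespace that rule matching does not need.

    keep_word_spaces is a hypothetical option for space-delimited languages:
    when True, runs of whitespace collapse to one space instead of vanishing.
    """
    kept = []
    for ch in first_text:
        major = unicodedata.category(ch)[0]
        if major == "P":  # any Unicode punctuation (period, comma, ...)
            continue
        if major in ("Z", "C") or ch.isspace():  # separators, control chars
            if keep_word_spaces:
                kept.append(" ")
            continue
        kept.append(ch)
    text = "".join(kept)
    # Collapse and trim whitespace only when word spaces are being kept.
    return " ".join(text.split()) if keep_word_spaces else text


second_text = preprocess("call to Xiaoming.", keep_word_spaces=True)
print(second_text)
```

With keep_word_spaces=True the sketch reproduces the second text of the example above; with the default it yields the same characters without any spaces.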
It can be understood that, in some embodiments, step S1002 may be skipped, and step S1003 may be performed directly on the first text obtained in step S1001; in this case, the first text serves as the second text. This is possible because the rules for format preprocessing of the first text can instead be added to the FST intention slot model, although doing so increases the complexity of the FST intention slot model. The specific scheme may be selected according to the actual situation, which is not limited herein.
S1003: The electronic device performs rule matching on the second text using a preset FST intention slot model to obtain a third text.
The preset FST intention slot model is a preset FST. During state transitions, the preset FST intention slot model adds preset intention marking information and/or preset slot marking information to the input text, where the preset intention marking information is used to mark the intention information of the input text, and the preset slot marking information is used to mark the slot information in the input text.
It can be understood that the preset FST intention slot models may differ in what they mark: in some, only the position of the intention is marked, using the preset intention marking information; in some, the intention information is added directly to the input text and its position is marked using the preset intention marking information; in some, the position of the intention is marked using the preset intention marking information and the position of the slot information is marked using the preset slot marking information; and in some, the intention information is added directly to the input text, its position is marked using the preset intention marking information, and the position of the slot information is marked using the preset slot marking information. This is not limited herein.
The third text is the text that contains the preset intention marking information and/or the preset slot marking information after rule matching has been performed on the second text.
For example, take the second text "call to Xiaoming" and the FST intention slot model shown in Fig. 9 (hereinafter simply "the FST"). In the original-language example, "call to" consists of four characters and "Xiaoming" of two, and the FST consumes one character per transition. The matching process is as follows:
A) the FST accepts the empty input, outputs the intention information <CALL>, and transitions from initial state 0 to state 1;
B) the FST accepts the first character of "call to", outputs that character, and transitions from state 1 to state 2;
C) the FST accepts the second character of "call to", outputs that character, and transitions from state 2 to state 3;
D) the FST accepts the third character of "call to", outputs that character, and transitions from state 3 to state 4;
E) the FST accepts the fourth and last character of "call to", outputs that character, and transitions from state 4 to state 5;
F) the FST accepts the empty input, outputs the slot marking prefix "<", and transitions from state 5 to state 6;
G) the FST accepts the first character of "Xiaoming", outputs that character, and transitions from state 6 to state 7;
H) the FST accepts the second character of "Xiaoming", outputs that character, and transitions from state 7 to state 10;
I) the FST accepts the empty input, outputs the slot marking suffix ":contact>", and transitions from state 10 to termination state 14, which completes the matching process;
J) the FST matching result is output: "<CALL>call to <Xiaoming:contact>".
The output "<CALL>call to <Xiaoming:contact>" is the third text obtained after rule matching.
Similarly, according to the FST intention slot model shown in Fig. 9, if the input second text is "calls to Xiaoming", the output third text is "<CALL>calls to <Xiaoming:contact>". The detailed matching process is not described again here.
As can be seen from the above example, the third text includes preset intention marking information, such as < >, and preset slot marking information, such as <:contact>. The preset slot marking information <:contact> both identifies the slot in the input second text and marks the position of that slot in the input second text.
It can be understood that Fig. 9 shows only an exemplary FST intention slot model. In practical applications, other preset symbols may be used as the preset intention marking information and the preset slot marking information according to actual requirements, which is not limited herein.
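The rule matching of step S1003 can be illustrated with a small transducer. The sketch below is written under stated assumptions: the class Fst, the helpers build_call_fst and match, and the state numbering are hypothetical and do not reproduce the 15 states of Fig. 9; the trigger phrases include a trailing space only because the example is rendered in English. It shows the mechanism described above: empty-input arcs emit the intention marker and the slot markers, and every other arc copies one input character to the output.

```python
from typing import Dict, List, Optional, Set, Tuple

EPS = ""  # empty input label: the arc is taken without consuming a character
Arc = Tuple[str, str, int]  # (input label, output string, target state)


class Fst:
    """A tiny finite-state transducer; state 0 is the initial state."""

    def __init__(self) -> None:
        self.arcs: Dict[int, List[Arc]] = {0: []}
        self.finals: Set[int] = set()

    def new_state(self) -> int:
        state = len(self.arcs)
        self.arcs[state] = []
        return state

    def add_arc(self, src: int, inp: str, out: str, dst: int) -> None:
        self.arcs[src].append((inp, out, dst))

    def add_copy_chain(self, src: int, text: str, dst: int) -> None:
        """Accept `text` one character at a time, echoing each character."""
        current = src
        for i, ch in enumerate(text):
            target = dst if i == len(text) - 1 else self.new_state()
            self.add_arc(current, ch, ch, target)
            current = target


def build_call_fst(phrases: List[str], persons: List[str]) -> Fst:
    fst = Fst()
    after_intent = fst.new_state()
    fst.add_arc(0, EPS, "<CALL>", after_intent)       # intention marker first
    before_slot = fst.new_state()
    for phrase in phrases:
        fst.add_copy_chain(after_intent, phrase, before_slot)
    in_slot = fst.new_state()
    fst.add_arc(before_slot, EPS, "<", in_slot)       # slot marking prefix
    after_slot = fst.new_state()
    for person in persons:
        fst.add_copy_chain(in_slot, person, after_slot)
    final = fst.new_state()
    fst.add_arc(after_slot, EPS, ":contact>", final)  # slot marking suffix
    fst.finals.add(final)
    return fst


def match(fst: Fst, text: str) -> Optional[str]:
    """Depth-first search for a path that consumes all of `text`."""
    stack = [(0, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in fst.finals and pos == len(text):
            return out
        for inp, emitted, dst in fst.arcs[state]:
            if inp == EPS:
                stack.append((dst, pos, out + emitted))
            elif pos < len(text) and text[pos] == inp:
                stack.append((dst, pos + 1, out + emitted))
    return None


call_fst = build_call_fst(["call to ", "calls to "], ["dad", "mom", "Xiaoming"])
print(match(call_fst, "call to Xiaoming"))  # <CALL>call to <Xiaoming:contact>
```

A text that no path accepts, such as "open WeChat", makes match return None, which corresponds to the rule not matching.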
S1004: The electronic device obtains intention information and/or slot information from the third text according to the preset intention marking information and/or the preset slot marking information.
After the electronic device obtains the third text containing the preset intention marking information and the preset slot marking information, it can extract the intention information and the slot information from the third text according to the positions of the preset intention marking information and the preset slot marking information.
For example, for the third text "<CALL>call to <Xiaoming:contact>", the electronic device can extract the intention information CALL according to the preset intention marking information < >, and can extract the slot information Xiaoming according to the preset slot marking information <:contact>.
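The extraction in step S1004 can be sketched with two regular expressions. This is an illustration only: the function name extract and the exact patterns are assumptions, chosen to fit the marker formats <INTENT> and <value:name> used in the example above.

```python
import re
from typing import List, Optional, Tuple

# The intention marker is assumed to sit at the very start of the third text.
INTENT_RE = re.compile(r"^<([A-Z_]+)>")
# A slot is assumed to look like <value:name>, with no nested markers.
SLOT_RE = re.compile(r"<([^<>:]+):([^<>:]+)>")


def extract(third_text: str) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (intention, [(slot name, slot value), ...])."""
    intent_match = INTENT_RE.match(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = [(name, value) for value, name in SLOT_RE.findall(third_text)]
    return intent, slots


print(extract("<CALL>call to <Xiaoming:contact>"))
# ('CALL', [('contact', 'Xiaoming')])
```

Because the markers carry both the slot name and the slot boundaries, no second pass over the original input is needed.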
S1005: The electronic device outputs the intention information and/or the slot information in structured form.
After obtaining the intention information and the slot information, the electronic device may output them in structured form for use by other modules in the electronic device. The structured output may also include other information, such as the first text or the second text, which is not limited herein.
For example, for the first text "call to Xiaoming.", after steps S1001 to S1004 the intention information is CALL and the slot information is Xiaoming. The intention information, the slot information, and the second text may then be combined into a structured output for use by other modules in the electronic device 100. For example, the following JSON may be output: {"text": "call to Xiaoming", "intent": "CALL", "slots": [{"slot_name": "contact", "slot_value": "Xiaoming"}]}.
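The structured output of step S1005 could be assembled as in the following sketch. The function name to_structured_output is hypothetical, and the key names "text", "intent", "slots", "slot_name" and "slot_value" are assumptions about the JSON layout rather than a format fixed by the embodiments.

```python
import json
from typing import List, Tuple


def to_structured_output(second_text: str, intent: str,
                         slots: List[Tuple[str, str]]) -> str:
    """Serialize the intention and (slot name, slot value) pairs as JSON."""
    payload = {
        "text": second_text,
        "intent": intent,
        "slots": [{"slot_name": name, "slot_value": value}
                  for name, value in slots],
    }
    # ensure_ascii=False keeps non-ASCII input text readable in the output.
    return json.dumps(payload, ensure_ascii=False)


print(to_structured_output("call to Xiaoming", "CALL", [("contact", "Xiaoming")]))
```

A consuming module, such as the dialing application of Fig. 8, would then only need to parse the JSON and read the intention and slot fields.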
In the embodiments of the present application, the electronic device performs intention identification on the user input using the preset FST intention slot model. Because the preset FST intention slot model is an FST, rule matching is very fast owing to the characteristics of FST rule matching. Moreover, matching yields a third text carrying the preset intention marking information and the preset slot marking information, from which the intention information and the slot information can be conveniently extracted. This greatly improves the matching speed and accuracy of intention identification using NLU rules.
In practical applications, because the electronic device 100 needs to identify a great many different kinds of intentions, the FST intention slot models are generally constructed with a strategy of one FST intention slot model per intention, which effectively reduces rule conflicts between different intentions. During rule matching, multiple intentions can be matched in parallel to minimize the matching latency. Therefore, a very large number of preset FST intention slot models may be stored in the electronic device 100.
However, when multiple preset FST intent slot models are matched for one user input at the same time, multiple outputs may occur. At this time, in order to facilitate the electronic device to determine the reliability of the outputted intention information and slot information, the preset FST intention slot model may be a preset WFST intention slot model. Since each state transition in WFST will have a weight, the final match will get the cumulative weight.
Therefore, after a plurality of preset WFST intention slot models are matched for one user input, a plurality of output results and scores corresponding to each output result can be obtained. At this time, the electronic device may extract the intention information and the slot position information from the output result with the highest score.
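Selecting the output result with the highest score can be sketched as below. The text here does not state how a score is derived from the cumulative weight, so the sketch assumes that a lower cumulative weight means a higher score (score = negative weight); that mapping, the WfstResult type, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WfstResult:
    model: str                # which intention model produced the result
    output: str               # marked text produced by that model
    cumulative_weight: float  # sum of the weights along the matched path


def score(result: WfstResult) -> float:
    # ASSUMPTION: weights act as penalties, so a lower weight scores higher.
    return -result.cumulative_weight


def pick_best(results: List[WfstResult]) -> Optional[WfstResult]:
    """Return the highest-scoring result, or None if nothing matched."""
    return max(results, key=score) if results else None


candidates = [
    WfstResult("OPEN_APP", "<OPEN_APP>open <SMS:app>", 3),
    WfstResult("CHECK_MESSAGE", "<CHECK_MESSAGE>open SMS", 0),
]
print(pick_best(candidates).model)  # CHECK_MESSAGE
```

Under this assumption, a result matched only through exact characters is preferred over one that needed wildcard characters.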
Fig. 11 is another schematic flow chart of the intention identifying method according to the embodiment of the present application.
First, offline rule compilation.
(1) Corpus producers or research and development personnel write the WFST rules of the various intentions through a front-end rule editor, using an organization of one WFST per intention.
For example, write the following WFST rules:
WFST rule 1:
word = wildcard(1)<1>;
rule_1 = "open" ("":"<") "WeChat" ("":":app>");
rule_2 = "open" ("":"<") word+ ("":":app>");
In WFST rule 1, word is a wildcard that matches any single character and has a weight of 1. All other character matches have the default weight of 0.
The intention of WFST rule 1 is to open an application, for example WeChat.
WFST rule 2:
rule_1 = "open SMS";
There is no wildcard in WFST rule 2, and all character matches have the default weight of 0.
The intention of WFST rule 2 is to open short messages.
(2) The rule files are compiled by a grammar compiler to obtain WFST intention slot model files.
A WFST intention slot model file is the stored form of a WFST intention slot model.
For example, as shown in FIG. 12, WFST intent slot model 1 is compiled according to WFST rule 1.
The WFST intention slot model 1 includes 10 states, where state 0 is the initial state and states 7 and 9 are termination states. The WFST intention slot model 1 has two paths: 0, 1, 2, 3, 4, 5, 6, 7 and 0, 1, 2, 3, 4, 8, 9. In the original-language example, "open" and "WeChat" are each written with two characters, and each transition consumes one character.
The two paths are the same from state 0 to state 4:
from state 0 to state 1, the empty input <eps> is accepted, and the preset intention marking information together with the intention information, <OPEN_APP>, is output, with a weight of 0;
from state 1 to state 2, the first character of "open" is accepted and output, with a weight of 0;
from state 2 to state 3, the second character of "open" is accepted and output, with a weight of 0;
from state 3 to state 4, the empty input <eps> is accepted, and the first half of the preset slot marking information, "<", is output, with a weight of 0.
The two paths differ from state 4 onward.
One path is 4, 5, 6, 7:
from state 4 to state 5, the first character of "WeChat" is accepted and output, with a weight of 0;
from state 5 to state 6, the second character of "WeChat" is accepted and output, with a weight of 0;
from state 6 to state 7, the empty input <eps> is accepted, and the second half of the preset slot marking information, ":app>", is output, with a weight of 0.
The other path is 4, 8, 9:
from state 4 to state 8, an arbitrary character <any> is accepted and output, with a weight of 1;
from state 8 to state 8 (a self-loop), an arbitrary character <any> is accepted and output, with a weight of 1;
from state 8 to state 9, the empty input <eps> is accepted, and the second half of the preset slot marking information, ":app>", is output, with a weight of 0.
The cumulative weight of the WFST intention slot model 1 is calculated by adding up the weights along the matched path.
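The state table just described can be written down directly and used to compute cumulative weights. The sketch below is an illustration under stated assumptions: the characters 打开 ("open"), 微信 ("WeChat") and 短信 ("SMS") are an assumption about the original-language example, inferred from the per-character description above, and the names ARCS, FINALS and match_all are hypothetical. The state numbers follow the description of Fig. 12.

```python
from typing import List, Tuple

EPS = ""       # arc taken without consuming input
ANY = "<any>"  # wildcard arc: accepts and copies any single character

# (source, input, output, weight, target), as described for Fig. 12.
ARCS = [
    (0, EPS, "<OPEN_APP>", 0, 1),
    (1, "打", "打", 0, 2),
    (2, "开", "开", 0, 3),
    (3, EPS, "<", 0, 4),
    (4, "微", "微", 0, 5),   # exact path 4, 5, 6, 7
    (5, "信", "信", 0, 6),
    (6, EPS, ":app>", 0, 7),
    (4, ANY, ANY, 1, 8),     # wildcard path 4, 8, 9
    (8, ANY, ANY, 1, 8),     # self-loop: one or more wildcard characters
    (8, EPS, ":app>", 0, 9),
]
FINALS = {7, 9}


def match_all(text: str) -> List[Tuple[str, int]]:
    """Return every (output, cumulative weight), lowest weight first."""
    results = []
    stack = [(0, 0, "", 0)]  # (state, input position, output, weight so far)
    while stack:
        state, pos, out, weight = stack.pop()
        if state in FINALS and pos == len(text):
            results.append((out, weight))
        for src, inp, emitted, w, dst in ARCS:
            if src != state:
                continue
            if inp == EPS:
                stack.append((dst, pos, out + emitted, weight + w))
            elif pos < len(text) and inp in (ANY, text[pos]):
                # A wildcard arc echoes whatever character it consumed.
                echoed = text[pos] if inp == ANY else emitted
                stack.append((dst, pos + 1, out + echoed, weight + w))
    return sorted(results, key=lambda r: r[1])


print(match_all("打开微信"))  # exact path (weight 0) and wildcard path (weight 2)
print(match_all("打开短信"))  # wildcard path only (weight 2)
```

For the first input both paths accept, with cumulative weights 0 and 2; for the second only the wildcard path accepts, and its two wildcard characters add up to a cumulative weight of 2.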
FIG. 13 shows a WFST intent slot model 2 compiled according to WFST rule 2.
The WFST intention slot model 2 includes 6 states, where state 0 is the initial state and state 5 is the termination state. The WFST intention slot model 2 has only one path: 0, 1, 2, 3, 4, 5. In the original-language example, "open" and "SMS" are each written with two characters.
From state 0 to state 1, the empty input <eps> is accepted, and the preset intention marking information together with the intention information, <CHECK_MESSAGE>, is output, with a weight of 0;
from state 1 to state 2, the first character of "open" is accepted and output, with a weight of 0;
from state 2 to state 3, the second character of "open" is accepted and output, with a weight of 0;
from state 3 to state 4, the first character of "SMS" is accepted and output, with a weight of 0;
from state 4 to state 5, the second character of "SMS" is accepted and output, with a weight of 0.
The cumulative weight of the WFST intention slot model 2 is calculated by adding up the weights along the matched path.
(3) And outputting the WFST intention slot position model file for persistent storage.
Second, online rule matching.
(1) First, all compiled WFST intention slot model files are loaded and the NLU engine is initialized;
(2) parallel rule matching is performed on the user input using the WFST intention slot models to obtain output results, where each output result includes intention marking information, slot marking information, and a weight;
(4) the one or more WFST matching results with the highest scores are output in structured form.
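The parallel matching in (2) can be sketched with a thread pool. This is only an illustration of the dispatch pattern: the two matcher functions are simplified stand-ins for compiled WFST intention slot models (they are not WFSTs), and all names are hypothetical. In CPython, threads mainly help when the matchers release the global interpreter lock; a process pool or a native matcher would be needed for CPU-bound parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

MatchOutput = Tuple[str, int]  # (marked text, cumulative weight)
Matcher = Callable[[str], Optional[MatchOutput]]


def open_app_matcher(text: str) -> Optional[MatchOutput]:
    """Stand-in for WFST model 1: exact app name costs 0, wildcard 1 per char."""
    prefix = "open "
    if text.startswith(prefix) and len(text) > len(prefix):
        app = text[len(prefix):]
        weight = 0 if app == "WeChat" else len(app)
        return f"<OPEN_APP>open <{app}:app>", weight
    return None


def check_message_matcher(text: str) -> Optional[MatchOutput]:
    """Stand-in for WFST model 2: a single fixed sentence with weight 0."""
    return ("<CHECK_MESSAGE>open SMS", 0) if text == "open SMS" else None


def match_in_parallel(text: str, matchers: List[Matcher]) -> List[MatchOutput]:
    """Run every matcher on the same text and keep the ones that matched."""
    with ThreadPoolExecutor(max_workers=len(matchers) or 1) as pool:
        outputs = list(pool.map(lambda matcher: matcher(text), matchers))
    return [output for output in outputs if output is not None]


print(match_in_parallel("open SMS", [open_app_matcher, check_message_matcher]))
```

For "open SMS" both stand-ins match, which is the multiple-output situation that the weights are meant to resolve.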
The following describes (2) and (4) of the online rule matching above in detail, with reference to the WFST intention slot models shown in Fig. 12 and Fig. 13.
Please refer to fig. 14, which is another flowchart illustrating the intent recognition method according to the embodiment of the present application.
S1401, the electronic equipment loads a preset WFST intention slot model;
the electronic device may load a preset WFST intent slot model by reading a WFST intent slot model file stored in the electronic device.
It is understood that when the preset WFST intended slot model is loaded, the loading may be performed in its entirety; or a part of the terminal may be loaded according to a current scene of the terminal, which is not limited herein.
S1402, in response to a voice input of the user, the electronic device converts the voice input into a first text;
this is similar to step S1001 and is not described again here.
S1403, the electronic device performs format preprocessing on the first text to obtain a second text;
this is similar to step S1002 and is not described again here.
It is understood that steps S1402 and S1403 may be executed simultaneously with step S1401, may be executed before step S1401, and may be executed after step S1401, which is not limited herein.
S1404, the electronic device uses a plurality of preset WFST intention slot models to perform parallel rule matching on the second text to obtain a WFST result;
a preset WFST intention slot model is a preset WFST. During state transitions, the preset WFST intention slot model adds preset intention labeling information and/or preset slot labeling information to the input text, where the preset intention labeling information is used to mark the intention of the input text, and the preset slot labeling information is used to mark the slot information in the input text.
It can be understood that the preset WFST intention slot models may differ in what they add: in some models, only the position of an intention is marked by the preset intention labeling information; in some models, intention information is directly added to the input text, and the position of the intention information is marked by the preset intention labeling information; in some models, the position of an intention is marked by the preset intention labeling information, and the position of slot information is marked by the preset slot labeling information; in some models, intention information is directly added to the input text, the position of the intention information is marked by the preset intention labeling information, and the position of the slot information is marked by the preset slot labeling information. This is not limited herein.
Each state transition in a preset WFST intention slot model has a preset weight. The WFST result includes the matching text of the second text and the cumulative weight of the matching path. The matching text of the second text is the text that is output after a successful match along the matching path and that contains the preset intention labeling information and/or the preset slot labeling information.
Fig. 15 is an exemplary diagram of performing parallel rule matching on a second text with a plurality of preset WFST intention slot models in the embodiment of the present application. The electronic device adopts a strategy of one WFST intention slot model per intention, and stores a plurality of preset WFST intention slot models, such as CALL. After the electronic device loads the preset WFST intention slot models, parallel rule matching can be performed on the preprocessed second text. Because some paths of some preset WFST intention slot models contain wildcards, the second text may be successfully matched with only one path of one preset WFST intention slot model, or with multiple paths of multiple preset WFST intention slot models. Thus, either one WFST result or multiple WFST results may be obtained.
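The parallel matching step can be sketched as below. This is an assumption-laden illustration: the three model functions are simple stand-ins for compiled WFSTs, their names are hypothetical, the weight 2 of the wildcard model is hard-coded rather than accumulated over transitions, and a thread pool is only one possible way to run the models in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def check_message_model(text):
    # Stand-in for a model without wildcards: exact phrase, cumulative weight 0.
    if text == "open short message":
        return [("<CHECK_MESSAGE>open short message", 0)]
    return []


def open_app_model(text):
    # Stand-in for a model with a wildcard slot: "open" plus any app name.
    # Assumption: the wildcard transitions add up to a cumulative weight of 2.
    prefix = "open "
    if text.startswith(prefix) and len(text) > len(prefix):
        return [(f"<OPEN_APP>open<{text[len(prefix):]}:APP>", 2)]
    return []


def call_model(text):
    # Stand-in for a model that does not match this kind of input.
    return []


def match_in_parallel(text, models):
    """Run every model on the same text and pool all successful results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        per_model = list(pool.map(lambda model: model(text), models))
    # Flatten: zero, one, or several WFST results may come back.
    return [result for results in per_model for result in results]
```

With the input "open short message", two of the three stand-in models match, so two results are returned; the list of models must not be empty because the pool size is derived from it.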
If only one WFST result is obtained, the WFST result can be directly determined to be the WFST result with the highest confidence level, and step S1406 is executed;
when a plurality of WFST results are obtained, step S1405 needs to be executed to determine the WFST result with the highest confidence level according to the accumulated weight.
It is to be understood that, since the weights in the predefined WFST intention slot model can be customized, and the calculation manner of the accumulated weight can also be customized, the concepts of the weights and the accumulated weight representation can be different according to different settings, the calculation manner of the accumulated weight can be different, and accordingly, the manner of determining the WFST result with the highest confidence level can also be different, and is not limited herein.
Fig. 16 is an exemplary diagram illustrating a case where different settings are applied to the weights and the cumulative weights in the preset WFST intended slot model in the embodiment of the present application. These several different settings are described by way of example below:
Case 1: the weight of the state transition accepting the wildcard is larger than that of the state transition not accepting the wildcard, and the larger the weight of each state transition on the matching path is, the larger the cumulative weight of the matching path is.
In this case, the weights represent the costs of state transitions, and the cumulative weights represent the costs of matching paths. The greater the cumulative weight, the greater the cost of WFST matching the current path, and correspondingly, the lower the confidence. Therefore, at the time of the state transition, the weight of the state transition which accepts the wildcard is larger than the weight of the state transition which does not accept the wildcard. And the weight can be set according to the degree and the range of wildcard: the weight of the wildcard with high wildcard degree and large wildcard range is set to be larger than the weight of the wildcard with low wildcard degree and small wildcard range.
The WFST intention slot model 1 shown in fig. 11 and the WFST intention slot model 2 shown in fig. 12 are WFST intention slot models set in accordance with case 1. Taking the second text "open short message" as an example, when parallel rule matching is performed on the second text with WFST intention slot model 1 and WFST intention slot model 2:
WFST intention slot model 1 obtains WFST result 1 according to the matching path 0, 1, 2, 3, 6, 10, 10, 11: <OPEN_APP>open<short message:APP>, Weight: 2.
WFST intention slot model 2 obtains WFST result 2 according to the matching path 0, 1, 2, 3, 4, 5: <CHECK_MESSAGE>open short message, Weight: 0.
On the matching path of WFST intention slot model 1, the state transitions from state 9 to state 10 and from state 10 to state 10 are wildcard matches, so their weight of 1 is higher than the default weight of 0 of the other state transitions, which do not use wildcards. Because WFST intention slot model 1 calculates the cumulative weight as the sum of the weights of the state transitions on the matching path, the cumulative weight is 2.
By contrast, none of the state transitions on the matching path of WFST intention slot model 2 uses a wildcard, so the weight of each state transition is the default weight of 0, and the final cumulative weight is also 0.
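The arithmetic of case 1 can be reproduced with a few lines. In this sketch each path is reduced to a list of flags saying whether a transition is a wildcard match; the constant names, the function name, and the exact path lengths are illustrative assumptions, while the weights 1 and 0 and the two wildcard transitions come from the example.

```python
WILDCARD_WEIGHT = 1  # weight of a state transition that accepts a wildcard
DEFAULT_WEIGHT = 0   # weight of a state transition that accepts a literal symbol


def cumulative_weight(uses_wildcard):
    """Case 1: the cumulative weight is the sum of the transition weights."""
    return sum(WILDCARD_WEIGHT if flag else DEFAULT_WEIGHT for flag in uses_wildcard)


# Model 1: two transitions on the matching path are wildcard matches.
path_model_1 = [False, False, False, False, True, True, False]
# Model 2: no transition on the matching path uses a wildcard.
path_model_2 = [False, False, False, False, False]
```

Under these settings the wildcard path costs 2 and the literal path costs 0, so the more specific rule is the cheaper one.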
Case 2: the weight of the state transition accepting the wildcards is larger than that of the state transition not accepting the wildcards, and the larger the weight of each state transition on the matching path is, the smaller the cumulative weight of the matching path is.
In some cases, the weights of the state transitions may be set to represent the costs of the state transitions, and the cumulative weights of the matching paths represent the confidence levels of the matching paths. The larger the cumulative weight, the higher the confidence.
In this case, the cumulative weight may be calculated such that the cumulative weight of the matching path decreases as the weights of the state transitions on the matching path increase. For example, the cumulative weight may be calculated as the negative of the sum of the weights of the state transitions on the matching path, which is not limited herein.
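As a worked illustration of case 2, and assuming the negated sum is the chosen calculation (the symbols $W$, $w_i$ and $n$ are introduced here only for illustration), the cumulative weight of a matching path whose $n$ state transitions have weights $w_1, \dots, w_n$ would be:

```latex
W = -\sum_{i=1}^{n} w_i
```

With the transition weights of the earlier example, a path containing two wildcard transitions of weight 1 gives $W = -2$, while a path without wildcards gives $W = 0$, so the wildcard-free path has the larger cumulative weight and therefore the higher confidence.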
Case 3: the weight of the state transition accepting the wildcard is less than the weight of the state transition not accepting the wildcard, and the smaller the weight of each state transition on the matching path is, the larger the cumulative weight of the matching path is.
In some cases, the weight of the state transition may be set to represent the confidence level of the state transition, and the cumulative weight of the matching paths represents the cost of the matching paths. The larger the cumulative weight, the higher the cost and the lower the confidence.
Case 4: the weight of the state transition accepting the wildcard is smaller than that of the state transition not accepting the wildcard, and the smaller the weight of each state transition on the matching path is, the smaller the cumulative weight of the matching path is.
In some cases, the weights of the state transitions may be set to indicate the confidence level of the state transitions, and the cumulative weights of the matching paths indicate the confidence level of the matching paths. The larger the cumulative weight, the higher the confidence.
It can be understood that which of the above cases is used to set the preset WFST intention slot model may be determined according to actual requirements, which is not limited herein.
S1405, when there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text;
in some cases, the electronic device may directly compare the cumulative weights in the WFST results. For example, when the cumulative weight represents the credibility, the WFST result with the larger cumulative weight is determined to have the higher credibility. When the cumulative weight represents the cost, the WFST result with the smaller cumulative weight is determined to have the higher credibility.
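A direct comparison of cumulative weights might be sketched as follows. The function name `pick_best`, the `weight_means` parameter, and the representation of a WFST result as a `(matching_text, cumulative_weight)` pair are assumptions of this sketch.

```python
def pick_best(results, weight_means="cost"):
    """Pick one (matching_text, cumulative_weight) pair by raw cumulative weight."""
    if not results:
        raise ValueError("no WFST result to choose from")
    if weight_means == "cost":
        # Smaller cost means higher credibility.
        return min(results, key=lambda result: result[1])
    if weight_means == "credibility":
        # Larger cumulative weight means higher credibility.
        return max(results, key=lambda result: result[1])
    raise ValueError(f"unknown meaning of cumulative weight: {weight_means!r}")
```

On ties, `min` and `max` return the first such result in the list; the text does not say how ties should be broken, so this behaviour is incidental.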
However, in practical applications, the preset WFST intention slot models may have different numbers of state transitions. To make the final credibility comparison fair, the cumulative weight can be normalized to obtain a credibility score, and the WFST result with the highest credibility is then determined according to the credibility score, specifically:
1. When there are multiple WFST results, the electronic device performs credibility score calculation on the multiple WFST results;
specifically, if the cumulative weight represents the cost of the matching path, one credibility score calculation formula may be:
Credibility score calculation formula 1:
$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
where $w$ represents the cumulative weight of the matching path in the WFST result to be scored, and $w_{\max}$ represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results. After this normalized credibility score calculation, the score lies in the range [0, 1], and a higher score indicates a higher credibility.
Illustratively, take the credibility score calculation for the two results obtained in the above example, WFST result 1: <OPEN_APP>open<short message:APP>, Weight: 2, and WFST result 2: <CHECK_MESSAGE>open short message, Weight: 0. Because the largest cumulative weight in WFST result 1 and WFST result 2 is 2, $w_{\max} = 2$, and then:
credibility score of WFST result 1: $\mathrm{score}(w_1, w_{\max}) = 1 - 2/2 = 0$;
credibility score of WFST result 2: $\mathrm{score}(w_2, w_{\max}) = 1 - 0/2 = 1$.
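The worked values above (1 - 2/2 = 0 and 1 - 0/2 = 1) correspond to a score of the form 1 - w / w_max, which can be sketched as below. The function name is hypothetical, the weights are assumed to be non-negative costs, and the handling of the case where every cumulative weight is 0 is an assumption of this sketch, since the text does not cover it.

```python
def credibility_scores(cumulative_weights):
    """Normalize cost-like cumulative weights into scores in [0, 1]."""
    w_max = max(cumulative_weights)
    if w_max == 0:
        # Assumption: when every path has zero cost, all results score 1.0.
        return [1.0 for _ in cumulative_weights]
    return [1 - w / w_max for w in cumulative_weights]


scores = credibility_scores([2, 0])  # WFST result 1, WFST result 2
best_index = scores.index(max(scores))
```

Here `scores` reproduces the two values of the example and `best_index` points at WFST result 2.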
It is to be understood that the reliability score calculation formula 1 is an alternative reliability score calculation formula when the cumulative weight represents the cost of the matching path. In practical applications, other calculation formulas may be used to determine the confidence score of the WFST result, and it is only necessary that the higher the score, the higher the confidence.
It can be understood that, if the accumulated weight represents the reliability of the matching path, other formulas may be adopted to perform normalization reliability score calculation, so that a higher score represents a higher reliability, and details are not described herein.
2. The matching text of the second text in the WFST result with the highest credibility score is determined to be the third text;
illustratively, of WFST result 1 and WFST result 2, WFST result 2 has the highest credibility score, so the electronic device may determine that the matching text "<CHECK_MESSAGE>open short message" of the second text in WFST result 2 is the third text.
S1406, the electronic device acquires the intention information and/or the slot information from the third text according to the preset intention labeling information and/or the preset slot labeling information;
this is similar to step S1004 and is not described again here.
Illustratively, if the third text is "<CHECK_MESSAGE>open short message", the electronic device can extract the intention information CHECK_MESSAGE from the third text according to the preset intention labeling information "<>". It can be understood that, if slot labeling information is included in the third text, the electronic device may further extract the slot information from the third text, which is not limited herein.
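The extraction step can be sketched with regular expressions. This sketch assumes that an intention label is an upper-case name inside angle brackets and that a slot is written as `<value:TYPE>`, which is inferred from the example results; the function name and the keys of the returned dictionary are hypothetical.

```python
import re

# Assumed markup: <INTENT_NAME> for intentions, <slot value:SLOT_TYPE> for slots.
INTENT_RE = re.compile(r"<([A-Z_]+)>")
SLOT_RE = re.compile(r"<([^<>:]+):([A-Z_]+)>")


def extract(third_text):
    """Pull the intention label and any slot values out of the marked-up text."""
    intents = INTENT_RE.findall(third_text)
    slots = [{"value": value, "type": slot_type}
             for value, slot_type in SLOT_RE.findall(third_text)]
    return {"intent": intents[0] if intents else None, "slots": slots}
```

For "<CHECK_MESSAGE>open short message" only an intention is found, while a text carrying a `<value:TYPE>` marker also yields one slot entry.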
S1407, the electronic device outputs the intention information and/or the slot position information in a structured manner.
Similar to step S1005, no further description is provided here.
For example, after extracting the intention information CHECK _ MESSAGE from the third text in WFST result 2, the electronic device may perform structured output of the intention information:
JSON (slot-free):
[Figure: JSON structured output of the intention information CHECK_MESSAGE, without slot information]
The structured output may be used by other modules in the electronic device.
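The exact JSON layout of the original figure is not recoverable here, so the following is only a guess at what a structured output could look like. The field names `intent` and `slots` and the function name are hypothetical.

```python
import json


def to_structured_output(intent, slots=None):
    """Serialize intention and optional slot information as a JSON string."""
    payload = {"intent": intent}  # hypothetical field name
    if slots:
        payload["slots"] = slots  # omitted entirely in the slot-free case
    return json.dumps(payload, ensure_ascii=False)
```

For the slot-free example, `to_structured_output("CHECK_MESSAGE")` produces a JSON object with a single field that other modules can parse with any JSON library.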
In the embodiment of the application, the electronic device may store a plurality of preset WFST intention slot models and, when recognizing an intention, use them to perform parallel matching on the user input. The electronic device then determines the WFST result with the highest credibility from the obtained WFST results, and extracts and outputs the intention information and the slot information. This improves both the intention recognition rate and the intention recognition accuracy. In addition, because the WFST intention slot models are matched in parallel, the matching rate is greatly improved and the operation load of the electronic device is reduced.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
As used in the above embodiments, the term "when …" may be interpreted to mean "if …" or "after …" or "in response to a determination of …" or "in response to a detection of …", depending on the context. Similarly, depending on the context, the phrase "at the time of determination …" or "if (a stated condition or event) is detected" may be interpreted to mean "if the determination …" or "in response to the determination …" or "upon detection (a stated condition or event)" or "in response to detection (a stated condition or event)".
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.

Claims (23)

1. An intent recognition method, comprising:
in response to a voice input by a user, the electronic device converts the voice input into a first text;
the electronic equipment carries out rule matching on the second text by using a preset FST intention slot model to obtain a third text; the second text is determined according to the first text; the preset FST intention slot position model is a preset FST; the third text comprises preset intention marking information and/or preset slot position marking information, the preset intention marking information is used for marking out intention information of the second text, and the preset slot position marking information is used for marking out slot position information in the second text;
and the electronic equipment acquires intention information and/or slot position information from the third text according to preset intention marking information and/or preset slot position marking information.
2. The method according to claim 1, wherein before the step of the electronic device performing rule matching on the second text by using a preset FST intention slot model to obtain a third text, the method further comprises:
the electronic equipment performs format preprocessing on the first text to obtain a second text; the format characters in the second text are less than or equal to the format characters in the first text.
3. The method according to claim 1 or 2, wherein after the step of obtaining the intention information and/or the slot position information from the third text according to the preset intention labeling information and/or the preset slot position labeling information, the method further comprises:
and the electronic equipment performs structured output on the intention information and/or the slot position information.
4. The method of any of claims 1-3, wherein the preset FST intention slot model is a preset WFST intention slot model, a preset WFST intention slot model is a preset WFST, and each state transition in the preset WFST intention slot model has a weight.
5. The method according to claim 4, wherein the electronic device performs rule matching on the second text by using a preset FST intention slot model to obtain a third text, and specifically comprises:
the electronic equipment uses a plurality of preset WFST intention slot models to perform parallel rule matching on the second text to obtain a WFST result; the WFST result comprises a matching text of a second text and an accumulated weight of a matching path, wherein the matching text of the second text is a text which is output after being successfully matched through the matching path and contains preset intention marking information and/or preset slot position marking information;
when the WFST result is one, the electronic device determining that a matching text of a second text in the WFST result is the third text;
when the WFST result is multiple, the electronic device determines that matching text of the second text in the WFST result with the highest confidence level is the third text.
6. The method according to claim 5, wherein when the WFST result is multiple, the electronic device determines that the matching text of the second text in the WFST result with the highest confidence level is the third text, and specifically comprises:
when the WFST results are multiple, the electronic equipment carries out credibility score calculation on the multiple WFST results according to the cumulative weight of the matched paths in the multiple WFST results;
the electronic device determines a matching text of the second text in the WFST result with the highest confidence score as the third text.
7. The method of claim 6 wherein the predetermined WFST intention slot model has a greater weight for accepting state transitions of wildcards than for not accepting state transitions of wildcards, and wherein the greater the weight of each state transition on a matching path, the greater the cumulative weight of the matching path.
8. The method of claim 7 wherein the cumulative weight of matching paths in the predefined WFST intent slot model is equal to the sum of the weights of each state transition on a matching path.
9. The method according to claim 7 or 8, wherein the electronic device performs confidence score calculation on the WFST results according to the cumulative weight of the matching path in the WFST results, and specifically comprises:
the electronic device calculates the credibility scores of the plurality of WFST results by using credibility score calculation formula 1;
credibility score calculation formula 1:
$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
where $w$ represents the cumulative weight of the matching path in the WFST result to be scored, and $w_{\max}$ represents the largest cumulative weight among the cumulative weights of the matching paths in the plurality of WFST results.
10. The method according to any one of claims 4 to 9, wherein the step of the electronic device performing rule matching on the second text by using a preset FST intention slot model to obtain a third text is preceded by the method further comprising:
the electronic device loads a preset WFST intent slot model.
11. An electronic device, characterized in that the electronic device comprises: one or more processors and memory;
the memory coupled with the one or more processors, the memory to store computer program code, the computer program code including computer instructions, the one or more processors to invoke the computer instructions to cause the electronic device to perform:
in response to a voice input by a user, converting the voice input into a first text;
performing rule matching on the second text by using a preset FST intention slot model to obtain a third text; the second text is determined according to the first text; the preset FST intention slot position model is a preset FST; the third text comprises preset intention marking information and/or preset slot position marking information, the preset intention marking information is used for marking out intention information of the second text, and the preset slot position marking information is used for marking out slot position information in the second text;
and acquiring intention information and/or slot position information from the third text according to preset intention marking information and/or preset slot position marking information.
12. The electronic device of claim 11, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
carrying out format preprocessing on the first text to obtain a second text; the format characters in the second text are less than or equal to the format characters in the first text.
13. The electronic device of claim 11 or 12, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
and outputting the intention information and/or the slot position information in a structuralized mode.
14. The electronic device of any of claims 11-13, wherein the preset FST intention slot model is a preset WFST intention slot model, a preset WFST intention slot model is a preset WFST, and each state transition in the preset WFST intention slot model has a weight.
15. The electronic device of claim 14, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
performing parallel rule matching on the second text by using a plurality of preset WFST intention slot models to obtain a WFST result; the WFST result comprises a matching text of a second text and an accumulated weight of a matching path, wherein the matching text of the second text is a text which is output after being successfully matched through the matching path and contains preset intention marking information and/or preset slot position marking information;
when the WFST result is one, determining that the matching text of the second text in the WFST result is the third text;
when the WFST result is multiple, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.
16. The electronic device of claim 15, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
when the WFST results are multiple, carrying out credibility score calculation on the multiple WFST results according to the cumulative weight of the matched path in the multiple WFST results;
and determining the matching text of the second text in the WFST result with the highest credibility score as the third text.
17. The electronic device of claim 16, wherein in the predefined WFST intention slot model, a state transition accepting wildcards has a greater weight than a state transition not accepting wildcards, and wherein the greater the weight of each state transition on a matching path, the greater the cumulative weight of the matching path.
18. The electronic device of claim 17, wherein in the preset WFST intent slot model, the cumulative weight of matching paths equals the sum of the weights of each state transition on a matching path.
19. The electronic device of claim 17 or 18, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
calculating the credibility scores of the plurality of WFST results by using credibility score calculation formula 1;
credibility score calculation formula 1:
$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
where $w$ represents the cumulative weight of the matching path in the WFST result to be scored, and $w_{\max}$ represents the largest cumulative weight among the cumulative weights of the matching paths in the plurality of WFST results.
20. The electronic device of any of claims 14-19, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
the preset WFST intent slot model is loaded.
21. A chip system for application to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform the method of any of claims 1-10.
22. A computer program product comprising instructions for causing an electronic device to perform the method according to any one of claims 1-10 when the computer program product is run on the electronic device.
23. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-10.
CN202010555603.3A 2020-06-17 2020-06-17 Intention recognition method and electronic equipment Pending CN113806473A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010555603.3A CN113806473A (en) 2020-06-17 2020-06-17 Intention recognition method and electronic equipment
PCT/CN2021/100475 WO2021254411A1 (en) 2020-06-17 2021-06-17 Intent recognigion method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010555603.3A CN113806473A (en) 2020-06-17 2020-06-17 Intention recognition method and electronic equipment

Publications (1)

Publication Number Publication Date
CN113806473A (en) 2021-12-17

Family

ID=78892632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010555603.3A Pending CN113806473A (en) 2020-06-17 2020-06-17 Intention recognition method and electronic equipment

Country Status (2)

Country Link
CN (1) CN113806473A (en)
WO (1) WO2021254411A1 (en)

Also Published As

Publication number Publication date
WO2021254411A1 (en) 2021-12-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination