WO2021254411A1 - Intent recognition method and electronic device - Google Patents
- Publication number
- WO2021254411A1 (PCT/CN2021/100475)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- wfst
- intent
- preset
- electronic device
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- This application relates to the field of artificial intelligence technology, in particular to an intention recognition method and electronic equipment.
- Natural language processing is a sub-field of artificial intelligence (AI).
- Natural language understanding (NLU) is a sub-field of natural language processing (NLP) and is also its most difficult subject. Intent recognition and slot filling are the two most critical NLU tasks, but factors such as language diversity, ambiguity, robustness, knowledge dependence and context make it very difficult for NLU to complete these two tasks well.
- CFG: context-free grammar
- CYK algorithm: Cocke-Younger-Kasami algorithm
- CNF: Chomsky normal form
- the entire analysis path can generate a grammatical analysis tree, and the user can extract grammatical features based on this grammatical tree, and then obtain the desired grammatical information, such as parts of speech, entities, sentence components and other information.
- This application provides a method and electronic device for intent recognition to improve the speed and accuracy of matching when using NLU rules for intent recognition.
- the present application provides an intent recognition method.
- the method includes: in response to a user's voice input, an electronic device converts the voice input into a first text; the electronic device uses a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text and the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, where the preset intent labeling information is used to label the intent information of the second text and the preset slot labeling information is used to label the slot information in the second text; and the electronic device obtains the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
- the electronic device uses the preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast, which is a characteristic of FST rule matching. After matching, the third text carrying the preset intent labeling information and preset slot labeling information is obtained, and the intent information and slot information can easily be extracted from it. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
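The extraction step described above can be sketched as follows. The text does not specify the concrete labeling format, so the `<intent:...>` and `<slot:...>` markers, the function name `extract`, and the sample sentence are all assumptions made only for illustration.

```python
import re

# Hypothetical labeling format (an assumption, not taken from the source):
#   <intent:NAME>                    marks the intent of the text
#   <slot:NAME>value</slot:NAME>     marks one slot and its value
INTENT_TAG = re.compile(r"<intent:(\w+)>")
SLOT_TAG = re.compile(r"<slot:(\w+)>(.*?)</slot:\1>")


def extract(third_text):
    """Return (intent, slots) parsed from a labeled 'third text'."""
    intent_match = INTENT_TAG.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    # findall yields (slot_name, slot_value) pairs for every labeled slot.
    slots = {name: value for name, value in SLOT_TAG.findall(third_text)}
    return intent, slots


third_text = (
    "<intent:ask_weather> what is the weather in "
    "<slot:location>Shanghai</slot:location> "
    "<slot:time>today</slot:time>"
)
intent, slots = extract(third_text)
print(intent, slots)
```

Because the labels are already embedded in the matched text, extraction is a single linear scan, which is consistent with the claim that the information can be obtained easily once matching is complete.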
- before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device performs format preprocessing on the first text to obtain the second text; the number of format characters in the second text is less than or equal to the number of format characters in the first text.
- the electronic device may first perform format preprocessing on the first text to obtain the second text.
- this reduces the complexity of FST rule matching on the second text by the preset FST intent slot model, and the matching speed can be further improved.
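A minimal sketch of such format preprocessing is shown below. The text does not define which characters count as "format characters"; this example assumes they are punctuation marks and redundant whitespace, and the function name `preprocess` is hypothetical.

```python
import re


def preprocess(first_text):
    """Derive a 'second text' with no more format characters than the input.

    Assumption: format characters are punctuation and redundant whitespace.
    """
    # Replace every character that is neither a word character nor whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", first_text)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()


first_text = "  What is   the weather,  today?! "
second_text = preprocess(first_text)
print(repr(second_text))
```

Normalizing the text first means the matching rules do not each need to account for stray punctuation or spacing variants.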
- after the step in which the electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or preset slot labeling information, the method further includes: the electronic device outputs the intent information and/or slot information in a structured manner.
- the electronic device can output the intent information and/or slot information in a structured manner, so that other modules in the electronic device can use the intent information and/or slot information.
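One possible structured form is a JSON object, sketched below. JSON, the field names `intent` and `slots`, and the function name `to_structured` are assumptions; the text only requires that the output be structured so that other modules can consume it.

```python
import json


def to_structured(intent, slots):
    """Serialize intent and slot information as a JSON string (assumed format)."""
    return json.dumps({"intent": intent, "slots": slots}, sort_keys=True)


result = to_structured("ask_weather", {"time": "today", "location": "Shanghai"})
print(result)
```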
- the preset FST intent slot model is a preset WFST intent slot model; a preset WFST intent slot model is a preset WFST, that is, a weighted FST in which each state transition has a weight.
- the preset FST intent slot model is the preset WFST intent slot model, so that the weight of each match can be obtained during parallel matching, which facilitates the screening of matching results.
- the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, which specifically includes: the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes the matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text, containing the preset intent labeling information and/or the preset slot labeling information, that is output after a successful match along the matching path; when there is one WFST result, the electronic device determines that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text.
- the electronic device determines the matching text of the second text in the WFST result with the highest credibility after parallel rule matching as the third text. While improving the efficiency of parallel matching, it also ensures the accuracy of intent recognition.
- the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text, specifically: when there are multiple WFST results, the electronic device calculates a credibility score for each of the multiple WFST results according to the cumulative weight of the matching path in each result; the electronic device determines that the matching text of the second text in the WFST result with the highest credibility score is the third text.
- the electronic device determines the matching text of the second text in the WFST result with the highest credibility score as the third text, which improves the accuracy of credibility evaluation.
- the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
- the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
- the electronic device calculates credibility scores for the multiple WFST results according to the cumulative weight of the matching path in each of the multiple WFST results, which specifically includes: the electronic device uses credibility score calculation formula 1 to calculate the credibility score of each of the multiple WFST results;
- w represents the cumulative weight of the matching path in the WFST results that need to be scored
- w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
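The selection among several WFST results can be sketched as below. Formula 1 itself is not reproduced in this text, so the scoring function is passed in by the caller. The `placeholder_score` used in the example is NOT the source's formula: it is a stand-in that simply prefers a smaller cumulative weight, on the assumption that fewer wildcard transitions indicate a more specific match. All function names are hypothetical.

```python
def pick_third_text(results, score):
    """Choose the matching text of the most credible WFST result.

    results: list of (matching_text, cumulative_weight) pairs.
    score:   callable(w, w_max) -> credibility score (higher is better).
    """
    if not results:
        return None
    if len(results) == 1:
        # A single result is used directly, without scoring.
        return results[0][0]
    w_max = max(weight for _, weight in results)
    best = max(results, key=lambda result: score(result[1], w_max))
    return best[0]


def placeholder_score(w, w_max):
    # Illustrative stand-in only; the actual formula 1 is not given here.
    return 1.0 - w / (w_max + 1.0)


results = [("text matched with wildcards", 2.0), ("text matched exactly", 0.5)]
third_text = pick_third_text(results, placeholder_score)
print(third_text)
```

Passing the scoring function as a parameter keeps the selection logic independent of whichever concrete formula is used.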
- before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device loads the preset WFST intent slot model.
- an embodiment of the present application provides an electronic device that includes one or more processors and a memory; the memory is coupled with the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors call the computer instructions to cause the electronic device to execute: in response to a user's voice input, converting the voice input into a first text; using a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text and the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, where the preset intent labeling information is used to label the intent information of the second text and the preset slot labeling information is used to label the slot information in the second text; and obtaining the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
- the electronic device uses the preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast, which is a characteristic of FST rule matching. After matching, the third text carrying the preset intent labeling information and preset slot labeling information is obtained, and the intent information and slot information can easily be extracted from it. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
- the one or more processors are also used to call the computer instructions to make the electronic device execute: performing format preprocessing on the first text to obtain the second text; the number of format characters in the second text is less than or equal to the number of format characters in the first text.
- the one or more processors are also used to call the computer instructions to make the electronic device execute: outputting the intent information and/or slot information in a structured manner.
- the preset FST intent slot model is a preset WFST intent slot model; a preset WFST intent slot model is a preset WFST, that is, a weighted FST in which each state transition has a weight.
- the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes the matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text, containing the preset intent labeling information and/or preset slot labeling information, that is output after a successful match along the matching path; when there is one WFST result, determining that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.
- the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: when there are multiple WFST results, calculating a credibility score for each of the multiple WFST results according to the cumulative weight of the matching path in each result; and determining that the matching text of the second text in the WFST result with the highest credibility score is the third text.
- the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
- the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
- the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using credibility score calculation formula 1 to calculate the credibility scores of the multiple WFST results;
- w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
- the one or more processors are also used to call the computer instructions to cause the electronic device to execute: load a preset WFST intent slot model.
- embodiments of the present application provide a chip system that is applied to an electronic device.
- the chip system includes one or more processors for invoking computer instructions to make the electronic device execute the method described in the first aspect and any possible implementation of the first aspect.
- the chip system may include one processor 110 in the electronic device 100 as shown in FIG. 5, or may include multiple processors 110 in the electronic device 100 as shown in FIG. 5, which is not limited here.
- the embodiments of the present application provide a computer program product containing instructions.
- when the computer program product is run on an electronic device, the electronic device executes the method described in the first aspect and any possible implementation of the first aspect.
- an embodiment of the present application provides a computer-readable storage medium including instructions which, when run on an electronic device, cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
- the electronic device provided in the second aspect, the chip system provided in the third aspect, the computer program product provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to implement the methods provided in the embodiments of the present application. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
- Figure 1 is a schematic diagram of the relationship between intent and slot
- FIG. 2 is an exemplary schematic diagram of an FSA
- FIG. 3 is an exemplary schematic diagram of an FST
- FIG. 4 is an exemplary schematic diagram of a WFST
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 6 is a block diagram of the software structure of an electronic device according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of the structure of the intention recognition module in an embodiment of the present application.
- FIG. 8 is a schematic diagram of a usage scenario of the intention recognition method in an embodiment of the present application.
- FIG. 9 is an exemplary schematic diagram of an FST intention slot model in an embodiment of the present application.
- FIG. 10 is a schematic flowchart of an intention recognition method in an embodiment of the present application.
- FIG. 11 is a schematic diagram of another flow chart of the intention recognition method in an embodiment of the present application.
- FIG. 12 is an exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application.
- FIG. 13 is another exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application.
- FIG. 14 is a schematic diagram of another flow chart of an intention recognition method in an embodiment of the present application.
- FIG. 15 is an exemplary schematic diagram of parallel rule matching of multiple preset WFST intent slot models for the second text in an embodiment of the present application
- FIG. 16 is an exemplary schematic diagram of a situation in which different settings are used for weights and cumulative weights in the preset WFST intention slot model in an embodiment of the present application.
- the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
- Intent refers to the identification of the actual or potential needs of the user by the electronic device. Fundamentally speaking, intent is a classifier that divides user needs into certain types.
- Intention recognition, also known as SUC (Spoken Utterance Classification), is, as the name suggests, the classification of the natural language utterance input by the user, where the resulting category corresponds to the user's intent. For example, for "what's the weather today", the intent is "ask the weather".
- intent recognition can be regarded as a typical classification problem.
- the classification and definition of intent can refer to the ISO-24617-2 standard, which has 56 detailed definitions. The definition of intention has a lot to do with the positioning of the system itself and the knowledge base it possesses, that is, the definition of intention has a very strong domain relevance. It can be understood that in the embodiments of the present application, the classification and definition of intentions are not limited to the ISO-24617-2 standard.
- the slot is the parameter of the intent.
- An intent may correspond to several slots. For example, when asking for a bus route, you need to provide necessary parameters such as departure place, destination, and time. The above parameters are the slots corresponding to the intention of "asking for bus route".
- the main goal of the semantic slot filling task is to extract the pre-defined semantic slot values in the semantic frame from the input sentence on the premise that the semantic frame of a specific domain or specific intention is known.
- the semantic slot filling task can be transformed into a sequence labeling task, that is, using the classic IOB notation method to mark a word as the beginning, continuation (inside), or non-semantic slot (outside) of a certain semantic slot.
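The IOB notation mentioned above can be illustrated with a short sketch. The function name `iob_tags`, the span representation, and the example sentence are assumptions chosen for illustration.

```python
def iob_tags(tokens, spans):
    """Label each token as B-<slot>, I-<slot>, or O.

    spans: list of (start, end_exclusive, slot_name) over token indices.
    """
    tags = ["O"] * len(tokens)  # default: outside any semantic slot
    for start, end, name in spans:
        tags[start] = "B-" + name  # first token of the slot
        for i in range(start + 1, end):
            tags[i] = "I-" + name  # continuation of the same slot
    return tags


tokens = ["weather", "in", "New", "York", "today"]
tags = iob_tags(tokens, [(2, 4, "location"), (4, 5, "time")])
for token, tag in zip(tokens, tags):
    print(token, tag)
```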
- Intent and slot position can let the system know which specific task to perform, and give the type of parameters needed to perform the task.
- Slot definition: slot 1 is Time, of type Date; slot 2 is Location, of type Location.
- Fig. 1 is a schematic diagram of a relationship between an intention and a slot in an embodiment of the application.
- two necessary slots are defined for the "Ask the weather” task, which are "time” and "location".
- the above definition can solve the task requirement.
- a system often needs to be able to handle several tasks at the same time.
- the weather station should be able to answer the question of “inquiring about the weather” as well as the question of “inquiring about the temperature”.
- an optimized strategy is to define higher-level domains; for example, the "asking for the weather" intent and the "asking for temperature" intent both belong to the "weather" domain.
- the domain can be simply understood as a collection of intents.
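The hierarchy of domain, intent, and slot described above can be sketched as a nested mapping. The identifiers below are hypothetical, and the slot list for the temperature intent is an assumption, since the text only defines the slots of the weather intent.

```python
# A domain is a collection of intents; each intent lists its required slots.
DOMAINS = {
    "weather": {
        "ask_weather": ["time", "location"],
        # Assumed to need the same slots as ask_weather (not stated in the text).
        "ask_temperature": ["time", "location"],
    },
}


def missing_slots(domain, intent, filled):
    """Return the required slots of an intent that have no value yet."""
    required = DOMAINS[domain][intent]
    return [slot for slot in required if slot not in filled]


todo = missing_slots("weather", "ask_weather", {"time": "today"})
print(todo)
```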
- NLU: natural language understanding
- the user intent and the corresponding slot value of the corresponding slot can be identified from the user input.
- the goal of intent recognition is to identify user intent from the input.
- a single task can be modeled simply as a binary classification problem; for example, during intent recognition the "asking for the weather" intent can be modeled as a binary classification between "asking for the weather" and "not asking for the weather".
- When the system needs to handle multiple tasks, it must be able to distinguish each intent. In this case, the binary classification problem becomes a multi-class classification problem.
- the task of slot filling is to extract information from the data and fill it into a pre-defined slot.
- the intent and the corresponding slot have been defined in Figure 1.
- the system should be able to extract "Today" and "Shanghai" and fill them into the "Time" and "Location" slots respectively.
- FST is currently widely used in speech recognition and natural language search and processing. For example, in natural language processing, some operations that modify text content according to rules are often encountered. For example, a rule is: if c is immediately followed by x in the string, then c is changed to b. FST is based on mathematical operations on these rules, integrating several rules into a one-way large-scale rule to effectively improve the efficiency of the rule-based system.
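The example rule above (change c to b whenever c is immediately followed by x) can be written out directly. This sketch uses a regular-expression lookahead rather than an FST, purely to make the rule concrete; the function name `apply_rule` is hypothetical.

```python
import re


def apply_rule(text):
    """Rewrite every 'c' that is immediately followed by 'x' as 'b'."""
    # (?=x) is a lookahead: it checks for 'x' without consuming it.
    return re.sub(r"c(?=x)", "b", text)


print(apply_rule("acxcd"))
```

Only the first c in `acxcd` is followed by x, so only that occurrence is rewritten.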
- FSA: finite state acceptor
- for a given input sequence, an FSA returns one of two states: "receiving" or "not receiving".
- FIG. 2 is an exemplary schematic diagram of an FSA, whose nodes and arcs correspond to states and state transitions respectively.
- state 0 represents the initial state and state 5 represents the end state.
- state 0 to state 1 can accept the character a; state 1 to state 1 can accept the character b; state 1 to state 2 can accept the character c; state 2 to state 5 can accept the character d.
- state 0 to state 3 can accept the character b; state 3 to state 4 can accept the character c; state 4 to state 4 can accept the character d; state 4 to state 5 can accept the character e.
- the regular expression corresponding to the FSA described in Figure 2 is: ab*cd|bcd*e
- the FSA shown in FIG. 2 can accept the symbol sequence "a, b, c, d" through the path 0, 1, 1, 2, 5, in which case the FSA returns the "receiving" state. As another example, if the sequence "a, b, d" is input to the FSA shown in FIG. 2, no path in the FSA can produce that sequence, so the FSA returns the "not receiving" state.
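The FSA of FIG. 2 can be sketched as a transition table built from the state transitions listed above. The table layout and the function name `accepts` are illustrative choices, not part of the source.

```python
# (current state, input character) -> next state, as listed for FIG. 2.
TRANSITIONS = {
    (0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "d"): 5,
    (0, "b"): 3, (3, "c"): 4, (4, "d"): 4, (4, "e"): 5,
}
INITIAL_STATE = 0
END_STATES = {5}


def accepts(symbols):
    """Return True if the FSA reaches an end state after reading all symbols."""
    state = INITIAL_STATE
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False  # no arc for this symbol: the sequence is not received
    return state in END_STATES


print(accepts(["a", "b", "c", "d"]))
print(accepts(["a", "b", "d"]))
```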
- an FST is an extension of an FSA in which each state transition also has an output label, so that each transition carries an input-output label pair.
- FIG. 3 is an exemplary schematic diagram of an FST.
- state 0 represents the initial state
- state 5 represents the end state.
- the input and output label pairs from state 0 to state 1 are a: z; the input and output label pairs from state 1 to state 1 are b: y; the input and output label pairs from state 1 to state 2 are c: x; from state 2 to state 5
- FST can describe the conversion of a set of rules or the conversion of a set of symbol sequences to another set of conforming sequences.
- the input symbol sequence "a, b, c, d" follows the path 0, 1, 1, 2, 5 and is converted into the output sequence "z, y, x, w", because a is converted to z, b is converted to y, c is converted to x, and d is converted to w.
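The conversion along the path 0, 1, 1, 2, 5 can be sketched as below. Only the arcs on that path are included, because the label pairs of the remaining arcs are not given in this text; the d:w arc is taken from the statement that d is converted to w. The names `ARCS` and `transduce` are hypothetical.

```python
# (current state, input label) -> (output label, next state).
# Only the arcs on the path 0, 1, 1, 2, 5 are modeled here.
ARCS = {
    (0, "a"): ("z", 1),
    (1, "b"): ("y", 1),
    (1, "c"): ("x", 2),
    (2, "d"): ("w", 5),
}
INITIAL_STATE = 0
END_STATE = 5


def transduce(symbols):
    """Return the output sequence, or None if the input is not accepted."""
    state = INITIAL_STATE
    output = []
    for symbol in symbols:
        arc = ARCS.get((state, symbol))
        if arc is None:
            return None
        output_label, state = arc
        output.append(output_label)
    return output if state == END_STATE else None


print(transduce(["a", "b", "c", "d"]))
```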
- FST is an efficient data structure, and its basic theory is based on the graph theory in the data structure. FST is divided into non-deterministic (Non-Deterministic) FST and deterministic (Deterministic) FST.
- the deterministic FST is a 7-tuple: (Q, Σ, Γ, δ, σ, q0, F), where:
- Non-deterministic FST is also a 7-tuple, but non-deterministic FST may have multiple choices when performing state transitions, so Definition 4 and Definition 5 are different from deterministic FST.
- an FST has multiple states and multiple state transitions, starting from the initial state, passing through multiple accepting states, and finally reaching the ending state to complete a matching operation.
- the typical features of FST are flexible use, high matching efficiency, and low memory overhead.
- WFST is a type of FST.
- Each state transition has a weight
- each initial state has an initial weight
- each terminal state has an end weight.
- the weight is generally the probability or loss of transition or initial/termination state. The weight will be accumulated along each path and accumulated on different paths.
- the calculation method of the cumulative weight can be specified by the specific WFST.
- the cumulative weight can be the product of all the weights on the path traversed, the sum of all the weights on the path traversed, or the result of another calculation over the weights on the path traversed, which is not limited here.
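The two accumulation methods named above, sum and product, can be sketched as follows. The function name `accumulate`, the `mode` parameter, and the example weights are assumptions for illustration.

```python
import math


def accumulate(weights, mode="sum"):
    """Combine the weights along one path into a cumulative weight."""
    if mode == "sum":
        return sum(weights)  # natural when weights are costs or losses
    if mode == "product":
        return math.prod(weights)  # natural when weights are probabilities
    raise ValueError("unsupported mode: " + mode)


path_weights = [0.5, 0.5, 2.0]
print(accumulate(path_weights, "sum"))
print(accumulate(path_weights, "product"))
```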
- FIG. 4 is an exemplary schematic diagram of a WFST.
- each state transition label is transferred in the form of "input-label: output-label/weight”, and the initial state and the final state also have corresponding weights.
- WFST can be defined by an 8-tuple (Σ, Δ, Q, I, F, E, λ, ρ):
- Q is a set of finite states
- the WFST shown in Figure 4 can be defined as follows:
- each transition in E consists of (source state, input label, output label, weight, target state);
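The transition layout (source state, input label, output label, weight, target state) can be sketched as a small data structure together with a deterministic run. The concrete states, labels, and weights below are invented for illustration and are not those of FIG. 4; weights are accumulated by addition, which is only one of the permitted methods.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    source: int
    input_label: str
    output_label: str
    weight: float
    target: int


def run(transitions, initial, finals, symbols):
    """Follow one deterministic path and return (output, cumulative weight).

    initial: (state, initial weight); finals: {state: end weight}.
    """
    state, total = initial
    output = []
    for symbol in symbols:
        arc = next(
            (t for t in transitions
             if t.source == state and t.input_label == symbol),
            None,
        )
        if arc is None:
            return None  # no matching transition
        output.append(arc.output_label)
        total += arc.weight
        state = arc.target
    if state not in finals:
        return None
    return output, total + finals[state]


# Hypothetical transition set E, initial state and terminal state.
E = [Transition(0, "a", "x", 0.5, 1), Transition(1, "b", "y", 0.25, 2)]
result = run(E, initial=(0, 0.0), finals={2: 0.25}, symbols=["a", "b"])
print(result)
```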
- FIG. 5 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
- the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have different component configurations.
- the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
- the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
- the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
- the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- AP: application processor
- GPU: graphics processing unit
- ISP: image signal processor
- DSP: digital signal processor
- NPU: neural-network processing unit
- the different processing units may be independent devices or integrated in one or more processors.
- the controller may be the nerve center and command center of the electronic device 100.
- the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 to store instructions and data.
- the memory in the processor 110 is a cache memory.
- the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.
- the processor 110 may include one or more interfaces.
- the interface can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
- I2C: inter-integrated circuit
- I2S: inter-integrated circuit sound
- PCM: pulse code modulation
- UART: universal asynchronous receiver/transmitter
- MIPI: mobile industry processor interface
- GPIO: general-purpose input/output
- SIM: subscriber identity module
- USB: universal serial bus
- the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may include multiple sets of I2C buses.
- the processor 110 may couple the touch sensor 180K, charger, flash, camera 193, etc., respectively through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
- the I2S interface can be used for audio communication.
- the processor 110 may include multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
- the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
- the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc.
- the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
- the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
- the GPIO interface can be configured through software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
- the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the SIM interface can be used to communicate with the SIM card interface 195 to realize the function of transmitting data to the SIM card or reading data in the SIM card.
- the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
- the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones.
- the interface can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
- the electronic device 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.
- the charging management module 140 is used to receive charging input from the charger.
- the charger can be a wireless charger or a wired charger.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
- the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
- the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
- each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
- antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
- the antenna can be used in combination with a tuning switch.
- the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
- the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
- the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves that are radiated via the antenna 1.
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
- at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
- the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
- the application processor outputs a sound signal through an audio device (including but not limited to the speaker 170A and the receiver 170B), or displays an image or video through the display screen 194.
- the modem processor may be an independent device.
- the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves to radiate through the antenna 2.
- the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite-based augmentation systems (SBAS).
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
- the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, and the like.
- the display screen 194 includes a display panel.
- the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
- the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
- the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
- the ISP is used to process the data fed back from the camera 193. For example, when a picture is taken, the shutter opens and light passes through the lens to the photosensitive element of the camera, which converts the light signal into an electrical signal and transfers it to the ISP; the ISP processes the electrical signal and converts it into an image visible to the naked eye.
- ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193.
- the camera 193 is used to capture still images or videos.
- the lens forms an optical image of the object and projects it onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
- the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
- Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
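- as an illustration of the Fourier-transform step, the following Python sketch computes the energy of individual frequency points (DFT bins) of a sample block and picks the strongest one. The direct single-bin DFT, the 64-sample block and the test tone in bin 5 are assumptions chosen for the example; the embodiment does not specify how the transform is implemented.

```python
import cmath
import math


def bin_energy(samples, k):
    """Return |X[k]|^2, the energy of DFT bin k of a real sample block."""
    n = len(samples)
    acc = 0j
    for i, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * k * i / n)
    return abs(acc) ** 2


# Hypothetical input: a cosine that falls exactly on bin 5 of a 64-point block.
n = 64
block = [math.cos(2 * math.pi * 5 * i / n) for i in range(n)]

# Evaluate the candidate frequency points and select the one with most energy.
energies = [bin_energy(block, k) for k in range(n // 2)]
strongest = max(range(n // 2), key=lambda k: energies[k])
```

- a production DSP would normally use a fast Fourier transform rather than this direct summation; the direct form is used here only because it makes the per-frequency-point energy explicit.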
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
- the NPU is a neural-network (NN) computing processor.
- through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, for example image recognition, face recognition, voice recognition, and text understanding.
- the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, files such as music and videos are saved on the external memory card.
- the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system, at least one application required by a function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, etc.) and so on.
- the storage data area can store data created during the use of the electronic device 100 (such as face information template data, fingerprint information template, etc.) and the like.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
- the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
- the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
- the audio module 170 can also be used to encode and decode audio signals.
- the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
- the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the electronic device 100 can play music or a hands-free call through the speaker 170A.
- the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
- when the electronic device 100 answers a call or plays a voice message, the user can listen to the voice by bringing the receiver 170B close to the ear.
- the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
- the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
- the pressure sensor 180A may be provided on the display screen 194.
- the capacitive pressure sensor may include at least two parallel plates with conductive materials.
- the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
- the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations that act on the same touch position but have different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
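- the threshold rule in the short message example can be sketched in Python as follows. The normalized threshold value `0.5` and the instruction names are hypothetical; only the comparison against a first pressure threshold comes from the description above.

```python
# Hypothetical normalized value for the "first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_sms_icon_touch(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if intensity < threshold:
        # Light press: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new message.
    return "create_short_message"
```

- for example, `instruction_for_sms_icon_touch(0.2)` selects the view instruction, while an intensity equal to the threshold selects the create instruction, matching the "greater than or equal to" wording.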
- the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
- the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
- the gyro sensor 180B can be used for image stabilization.
- the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
- the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
- the air pressure sensor 180C is used to measure air pressure.
- the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
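- the description does not state how the altitude is derived from the air pressure value. One common choice, assumed here purely for illustration, is the international barometric formula with a standard sea-level pressure of 1013.25 hPa:

```python
SEA_LEVEL_PRESSURE_HPA = 1013.25  # standard atmosphere; an assumption


def altitude_from_pressure(pressure_hpa, sea_level_hpa=SEA_LEVEL_PRESSURE_HPA):
    """Estimate altitude in metres with the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

- the result is only an estimate, because the actual sea-level pressure varies with the weather; this is consistent with the description, which uses the value to assist positioning and navigation rather than as the sole source of position.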
- the magnetic sensor 180D includes a Hall sensor.
- the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case.
- the electronic device 100 can also detect the opening and closing of a flip cover according to the magnetic sensor 180D. Then, according to the detected opening and closing state of the leather case or of the flip cover, features such as automatic unlocking upon opening are set.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching, pedometers and so on.
- the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
- the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
- the light emitting diode may be an infrared light emitting diode.
- the electronic device 100 emits infrared light to the outside through the light emitting diode.
- the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
- the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
- the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
- the ambient light sensor 180L is used to sense the brightness of the ambient light.
- the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
- the temperature sensor 180J is used to detect temperature.
- the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
- the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
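- the temperature processing strategy described above can be sketched as a small Python function. All threshold values and action names are hypothetical, and treating the voltage boost as triggered by a third, lower threshold is an assumption made for the example.

```python
def thermal_actions(temp_c, high_c=45.0, low_c=0.0, very_low_c=-10.0):
    """Return the protective actions for a reported temperature.

    high_c, low_c and very_low_c are illustrative thresholds only.
    """
    actions = []
    if temp_c > high_c:
        # Overheating: throttle the processor near the sensor.
        actions.append("reduce_processor_performance")
    if temp_c < low_c:
        # Cold: heat the battery to avoid abnormal shutdown.
        actions.append("heat_battery")
    if temp_c < very_low_c:
        # Very cold (assumed separate threshold): boost battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions
```

- at a normal temperature the function returns an empty list, so the caller takes no protective action.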
- the touch sensor 180K is also called a "touch panel".
- the touch sensor 180K may be provided on the display screen 194; together, the touch sensor 180K and the display screen 194 form what is called a "touch screen".
- the touch sensor 180K is used to detect touch operations acting on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- the visual output related to the touch operation can be provided through the display screen 194.
- the touch sensor 180K may also be disposed on the surface of the electronic device 100, at a position different from that of the display screen 194.
- the button 190 includes a power-on button, a volume button, and so on.
- the button 190 may be a mechanical button. It can also be a touch button.
- the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
- touch operations that act on different applications can correspond to different vibration feedback effects.
- touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191.
- different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 may be an indicator light, which may be used to indicate the charging status and battery level changes, or to indicate messages, missed calls, notifications, and so on.
- the SIM card interface 195 is used to connect to the SIM card.
- the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
- the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
- the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
- multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 can also be compatible with external memory cards.
- the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
- the electronic device 100 may receive the user's voice input and environmental information through the microphone 170C and the sensor module 180. After the audio module 170 converts the user's voice input into digital audio information, the processor 110 may perform voice recognition to convert it into text information, and then execute the intention recognition method in the embodiment of the present application to identify the user's intent and slots and express them as structured semantics.
- FIG. 6 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
- the system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the runtime and system libraries, and the kernel layer.
- the application layer can include a series of application packages.
- the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
- the application layer may also include an intention recognition module.
- the intention recognition module is used to execute the intention recognition method in the embodiment of the present application.
- the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
- the window manager is used to manage window programs.
- the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- the content provider is used to store and retrieve data and make these data accessible to applications.
- the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
- the view system can be used to build applications.
- the display interface can be composed of one or more views.
- a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
- the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
- the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
- the notification manager is used, for example, to notify the user that a download is complete or to present message reminders.
- a notification may also appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialogue interface.
- for example, text information is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
- Runtime includes core libraries and virtual machines. Runtime is responsible for system scheduling and management.
- the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of the system.
- the application layer and the application framework layer run in a virtual machine.
- the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library can include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example: OpenGL ES), a two-dimensional graphics engine (for example: SGL), etc.
- the surface manager is used to manage the display subsystem, and provides a combination of two-dimensional (2D) and three-dimensional (3D) layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
- the 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display driver, camera driver, audio driver, sensor driver, and virtual card driver.
- when a touch operation is received, the corresponding hardware interrupt is sent to the kernel layer.
- the kernel layer processes the touch operation into an original input event (including the touch coordinates, the time stamp of the touch operation, etc.).
- the original input events are stored in the kernel layer.
- the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a click whose corresponding control is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer.
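- the flow from hardware interrupt to identified control can be sketched in Python as follows. The class names, the rectangular hit test and the icon coordinates are hypothetical and only illustrate the division of labor between the kernel layer and the application framework layer.

```python
from dataclasses import dataclass, field


@dataclass
class RawInputEvent:
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass
class KernelLayer:
    events: list = field(default_factory=list)

    def on_touch_interrupt(self, x, y, timestamp_ms):
        # Process the interrupt into an original input event and store it.
        self.events.append(RawInputEvent(x, y, timestamp_ms))

    def next_event(self):
        return self.events.pop(0) if self.events else None


def identify_control(kernel, controls):
    """Framework-layer step: fetch an event and find the control it hits."""
    event = kernel.next_event()
    if event is None:
        return None
    for control in controls:
        if control.contains(event.x, event.y):
            return control.name
    return None


kernel = KernelLayer()
controls = [Control("camera_app_icon", 0, 0, 100, 100),
            Control("gallery_app_icon", 100, 0, 200, 100)]
kernel.on_touch_interrupt(40, 60, timestamp_ms=1000)
target = identify_control(kernel, controls)
```

- once the control is identified, the corresponding application would be started through the framework interface; that step is omitted from the sketch.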
- the camera 193 captures still images or videos.
- FIG. 7 is a schematic diagram of the architecture of the intention recognition module in the embodiment of this application.
- the intent recognition module is an NLU engine 700.
- the NLU engine 700 is used to perform semantic analysis on user input, and output analysis results such as the intent and slots for use by other modules.
- the NLU engine 700 includes a text preprocessing unit 701, a rule engine 702, a machine learning engine 703, an entity recognition unit 704, an intent classification unit 705, and a slot filling unit 706.
- the text preprocessing unit 701 is used to preprocess the text input by the user, mainly including removing format symbols in the text that are not needed for subsequent semantic analysis, such as punctuation and spaces. It is understandable that the text input by the user is generally obtained after voice recognition of the user's voice information.
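- a minimal Python sketch of such preprocessing is shown below. It drops whitespace and any character whose Unicode category is punctuation; the function name `preprocess` and the use of Unicode categories are assumptions, since the embodiment does not specify how format symbols are detected.

```python
import unicodedata


def preprocess(text):
    """Remove spaces and punctuation that later semantic analysis does not need."""
    kept = []
    for ch in text:
        if ch.isspace():
            continue  # drop spaces, tabs and line breaks
        if unicodedata.category(ch).startswith("P"):
            continue  # drop punctuation (Unicode categories Pc, Pd, Po, ...)
        kept.append(ch)
    return "".join(kept)
```

- removing spaces is lossless for text written without word separators, such as Chinese; for languages that separate words with spaces, an implementation would more plausibly normalize the spaces than delete them.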
- the rule engine 702 is configured to perform FST-based rule matching on the preprocessed user text according to preset rules, so as to cover high-frequency sentence patterns and obtain formatted text carrying intent and slot format tags.
- the intention recognition method in the embodiment of the present application is mainly used for the construction of the rule engine 702.
- the machine learning engine 703 is used to process the preprocessed user text through machine learning to obtain formatted text carrying intent and slot format tags.
- the entity recognition unit 704 is used to extract entity information from the formatted text carrying intent and slot format tags output by the rule engine 702 or the machine learning engine 703.
- the intent classification unit 705 is used to extract the intent information from the formatted text carrying intent and slot format tags output by the rule engine 702 or the machine learning engine 703.
- the slot filling unit 706 is used to extract slot information from the formatted text carrying intent and slot format tags output by the rule engine 702 or the machine learning engine 703.
- FIG. 8 is a schematic diagram of a usage scenario of the intent recognition method in an embodiment of this application.
- after the user inputs "Call Dad" into the electronic device 100 by voice, the electronic device 100 performs voice recognition on the user input and converts it into text, then uses the NLU engine 700 to perform intent recognition on that text and outputs the intent "call" and the slot "dad" to the dialing application.
- the dialing application dials the phone number of "Dad" according to the intent and slot.
- CPU Intel(R)Xeon(R)E5-2690v2@3.00GHz, memory: 32GB;
- end-to-end 150ms 100tps, end-to-end: 12ms
- FIG. 9 is an exemplary schematic diagram of the FST intent slot model in an embodiment of this application.
- the intent information is inserted after the initial state of the FST, and the slot label information is inserted before and after all slots, so as to realize the output of the intent information and the slot identification information when the FST is matched.
- the FST intent slot model includes states and transitions between states, specifically:
- each FST state is represented by a circle containing a state number
- State 0 and State 14 are the initial state and end state of the FST respectively, and the other states are intermediate states.
- the FST rule of the FST intent slot model shown in FIG. 9 can be written as:
- FIG. 10 is a schematic flowchart of an intent recognition method in an embodiment of this application.
- S1001: in response to a user's voice input, the electronic device converts the voice input into a first text;
- the electronic device 100 may convert the voice input into text: “Call Dad”.
- the electronic device 100 may execute S1001 only after receiving a certain trigger.
- the electronic device 100 may perform step S1001 only after detecting that the user has turned on the voice assistant function; or the electronic device 100 may perform step S1001 after detecting that the user double-taps the screen; or the electronic device 100 may be able to perform step S1001 as soon as it is powered on, which is not limited here.
- S1002: the electronic device performs format preprocessing on the first text to obtain the second text;
- the main purpose of the electronic device performing format preprocessing on the first text is to remove format characters that are not used in the subsequent FST intent slot model matching in the first text. For example, remove spaces, punctuation marks, etc. in the first text.
- the first text is: "Call Huaweing.”
- the second text obtained is: "Call Huaweing", with the format characters (spaces and the period) removed.
- step S1002 may also be skipped, with step S1003 performed directly on the first text obtained in step S1001; in that case, the first text is the second text. This is possible because rules for handling the format of the first text can also be added to the FST intent slot model, although doing so increases the complexity of the FST intent slot model. Which scheme to use can be selected according to the actual situation and is not limited here.
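The format preprocessing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `format_preprocess` and the exact set of characters treated as format characters are assumptions made for the example.

```python
import string

# Assumption: "format characters" are whitespace, ASCII punctuation,
# and a few common full-width punctuation marks.
FORMAT_CHARS = set(string.whitespace) | set(string.punctuation) | set(",。!?、;:")


def format_preprocess(first_text: str) -> str:
    """Return the second text: the first text with format characters removed."""
    return "".join(ch for ch in first_text if ch not in FORMAT_CHARS)
```

Removing spaces suits text that is matched character by character; for space-delimited languages a real system would more likely normalize whitespace than delete it.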
- S1003: the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text;
- the preset FST intent slot model is a preset FST; during state transitions, the preset FST intent slot model adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information is used to label the intent information of the input text, and the preset slot labeling information is used to label the slot information in the input text.
- the third text is a text containing preset intent labeling information and/or preset slot labeling information after the rule matching is performed on the second text.
- the FST accepts an empty character, outputs the intent information "<CALL>", and transfers from initial state 0 to state 1;
- the FST accepts the character "electricity", outputs the character "electricity", and transfers from state 2 to state 3;
- the FST accepts the character "word", outputs the character "word", and transfers from state 3 to state 4;
- the FST accepts an empty character, outputs the slot label information prefix "<", and transfers from state 5 to state 6;
- the FST accepts the character "small", outputs the character "small", and transfers from state 6 to state 7;
- the FST accepts the character "Ming", outputs the character "Ming", and transfers from state 7 to state 10;
- the FST accepts an empty character, outputs the suffix ":contact>" of the slot label information, and transfers from state 10 to termination state 14, thereby completing the matching process;
- the output "<CALL> to call <Xiaoming:contact>" is the third text obtained after rule matching.
- the third text will contain preset intent labeling information, such as <>, and preset slot labeling information, such as <:contact>.
- the preset intent labeling information <> indicates that the intent of the input second text is CALL
- the preset slot labeling information <:contact> indicates that the slot in the input second text is Xiaoming.
- Fig. 9 is an exemplary FST intent slot model.
- other preset symbols can also be used as preset intent labeling information and preset slot labeling information according to actual needs, which is not limited here.
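The matching process described above can be illustrated with a toy transducer. This is a sketch under stated assumptions: the state numbers, the word-level tokens, and the names `TOY_FST` and `transduce` are hypothetical and do not reproduce the model of FIG. 9. Arcs with an empty input symbol emit the intent and slot markers without consuming input.

```python
from typing import Dict, List, Optional, Set, Tuple

EPS = ""  # empty input symbol: the arc consumes nothing

# Each arc is (input symbol, output symbol, next state).
Arc = Tuple[str, str, int]

# Hypothetical toy model for a "call <contact>" rule.
TOY_FST: Dict[int, List[Arc]] = {
    0: [(EPS, "<CALL>", 1)],      # emit the intent marker
    1: [("call", "call", 2)],
    2: [(EPS, "<", 3)],           # emit the slot prefix
    3: [("Xiao", "Xiao", 4)],
    4: [("ming", "ming", 5)],
    5: [(EPS, ":contact>", 6)],   # emit the slot suffix
}
FINAL_STATES: Set[int] = {6}


def transduce(tokens: List[str], fst: Dict[int, List[Arc]], finals: Set[int],
              state: int = 0, pos: int = 0) -> Optional[str]:
    """Depth-first search for an accepting path; returns its output or None.

    Cycles made only of empty-input arcs are not handled in this sketch.
    """
    if pos == len(tokens) and state in finals:
        return ""
    for inp, out, nxt in fst.get(state, []):
        if inp == EPS:
            rest = transduce(tokens, fst, finals, nxt, pos)
        elif pos < len(tokens) and tokens[pos] == inp:
            rest = transduce(tokens, fst, finals, nxt, pos + 1)
        else:
            continue
        if rest is not None:
            return out + rest
    return None
```

Calling `transduce(["call", "Xiao", "ming"], TOY_FST, FINAL_STATES)` yields the labeled text `<CALL>call<Xiaoming:contact>`, while an input that the rule does not cover yields `None`.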
- S1004: the electronic device obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information;
- after the electronic device obtains the third text containing the preset intent labeling information and the preset slot labeling information, it can extract the intent information and the slot information from the third text according to the positions of the preset intent labeling information and the preset slot labeling information.
- according to the preset intent labeling information <>, the electronic device can extract the intent information as CALL; according to the preset slot labeling information <:contact>, the slot information can be extracted as Xiaoming.
- S1005: the electronic device outputs the intent information and/or slot information in a structured manner.
- the electronic device can structure the output of the intent information and the slot information for use by other modules in the electronic device.
- the structured output may also contain other information, such as the first text, the second text, etc., which are not limited here.
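Extracting the intent and slots from a labeled third text can be sketched with regular expressions. The marker syntax assumed here (`<INTENT>` for the intent and `<value:name>` for a slot) follows the example in the text; the patterns and the function name `extract` are illustrative assumptions.

```python
import re

# Assumption: an intent marker is an upper-case identifier in angle brackets,
# and a slot marker has the form <value:slot_name>.
INTENT_RE = re.compile(r"<([A-Z_]+)>")
SLOT_RE = re.compile(r"<([^<>:]+):([A-Za-z_]+)>")


def extract(third_text: str) -> dict:
    """Return the intent and a slot-name to slot-value mapping."""
    intent_match = INTENT_RE.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = {name: value for value, name in SLOT_RE.findall(third_text)}
    return {"intent": intent, "slots": slots}
```

Because the markers sit at known positions in the third text, extraction is a single linear scan and needs no further analysis of the sentence.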
- the electronic device uses a preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast. After matching, the third text carrying the preset intent labeling information and preset slot labeling information is obtained, from which the intent information and slot information can be easily extracted. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
- when the electronic device 100 needs to recognize many different types of intents, a strategy of one FST intent slot model per intent is generally adopted when constructing the FST intent slot models, which can effectively reduce rule conflicts between different intents. When performing rule matching, a multi-intent parallel matching method can be used to minimize the matching delay. Therefore, the electronic device 100 can store a large number of preset FST intent slot models.
- the preset FST intent slot model may be a preset WFST intent slot model. Since each state transition in WFST will have a weight, the cumulative weight will be obtained after the final matching is completed.
- the electronic device can extract intent information and slot information from the output result with the highest score.
- FIG. 11 is a schematic diagram of another process of the intent recognition method according to an embodiment of this application.
- the corpus producer or researcher compiles the WFST rules of each intent through the front-end rule editor, organized as one WFST per intent.
- rule_2: "Open" "<" word+ ":app>"
- word is a wildcard character, which can match any character, and its weight is 1.
- the weight for matching other characters is the default weight 0.
- the intent of the WFST Rule 1 is: to open the application WeChat.
- the intent of WFST rule 2 is: to open a short message.
- the WFST intent slot model file is the WFST intent slot model.
- FIG. 12 is the WFST intent slot model 1 compiled according to WFST rule 1.
- the WFST intention slot model 1 includes 10 states. Among them, state 0 represents the initial state, and states 7 and 9 represent the end state.
- the WFST intent slot model 1 has two paths: 0, 1, 2, 3, 4, 5, 6, 7 and 0, 1, 2, 3, 4, 8, 9:
- One of the paths is 4, 5, 6, 7:
- From state 4 to state 8 is to receive an arbitrary character <any>, output the arbitrary character <any>, and its weight is 1;
- From state 8 to state 8 is to receive an arbitrary character <any>, output the arbitrary character <any>, and its weight is 1;
- the calculation method of the cumulative weight of the WFST intention slot model 1 is the addition of the weights in the path.
- the WFST intent slot model 2 includes 6 states. Among them, state 0 represents the initial state, and state 5 represents the end state.
- the WFST intent slot model 2 has only one path: 0, 1, 2, 3, 4, 5.
- the cumulative weight of the WFST intention slot model 2 is calculated by adding the weights in the path.
- Each output result includes intent identification information, slot identification information and weight;
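The accumulation of weights along a matching path can be sketched with a small weighted transducer. This is an illustration under assumptions: the state numbering and the names `OPEN_APP_WFST` and `match_all` are hypothetical and do not reproduce FIG. 12. As in the text, wildcard arcs carry weight 1, other arcs carry the default weight 0, and the cumulative weight is the sum of the weights on the path.

```python
from typing import Dict, List, Set, Tuple

EPS = ""        # empty input symbol
ANY = "<any>"   # wildcard input symbol: matches any single token

# Each arc is (input, output, weight, next state).
Arc = Tuple[str, str, int, int]

# Hypothetical toy model for an "open <app>" rule with a wildcard slot.
OPEN_APP_WFST: Dict[int, List[Arc]] = {
    0: [(EPS, "<OPEN_APP>", 0, 1)],
    1: [("open", "open", 0, 2)],
    2: [(EPS, "<", 0, 3)],
    3: [(ANY, ANY, 1, 4)],
    4: [(ANY, ANY, 1, 4), (EPS, ":app>", 0, 5)],
}
OPEN_APP_FINALS: Set[int] = {5}


def match_all(tokens: List[str], arcs: Dict[int, List[Arc]], finals: Set[int],
              state: int = 0, pos: int = 0) -> List[Tuple[List[str], int]]:
    """Return (output tokens, cumulative weight) for every accepting path."""
    results: List[Tuple[List[str], int]] = []
    if pos == len(tokens) and state in finals:
        results.append(([], 0))
    for inp, out, weight, nxt in arcs.get(state, []):
        if inp == EPS:
            emitted, next_pos = out, pos
        elif pos < len(tokens) and inp in (ANY, tokens[pos]):
            # A wildcard arc copies the token it consumed to the output.
            emitted = tokens[pos] if inp == ANY else out
            next_pos = pos + 1
        else:
            continue
        for tail, tail_weight in match_all(tokens, arcs, finals, nxt, next_pos):
            results.append(([emitted] + tail, weight + tail_weight))
    return results
```

For the tokens `["open", "short", "message"]` the only accepting path consumes two tokens through wildcard arcs, so its cumulative weight is 2.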
- FIG. 14 is a schematic diagram of another process of the intention recognition method in an embodiment of this application.
- S1401: the electronic device loads the preset WFST intent slot model;
- the electronic device can load the preset WFST intent slot model by reading the WFST intent slot model file stored in the electronic device.
- S1402: in response to the user's voice input, the electronic device converts the voice input into a first text;
- this step is similar to step S1001 and is not repeated here.
- S1403: the electronic device performs format preprocessing on the first text to obtain the second text;
- this step is similar to step S1002 and is not repeated here.
- steps S1402 and S1403 can be performed simultaneously with step S1401, can be performed before step S1401, or can be performed after step S1401, which is not limited here.
- S1404: the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain a WFST result;
- a preset WFST intent slot model is a preset WFST.
- the preset WFST intent slot model will add preset intent labeling information and/or preset slot labeling information to the input text during the state transition process.
- the preset intent labeling information is used to label the intent information of the input text.
- the preset slot labeling information is used to label the slot information in the input text.
- each state transition has a weight.
- the WFST result includes the matching text of the second text and the cumulative weight of the matching path.
- the matching text of the second text is the output text containing the preset intent label information and/or the preset slot label information after successful matching through the matching path.
- FIG. 15 is an exemplary schematic diagram of performing parallel rule matching of multiple preset WFST intent slot models on the second text in an embodiment of this application.
- the electronic device adopts a strategy of one WFST intent slot model per intent, and stores multiple preset WFST intent slot models, such as CALL.fst, MESSAGE.fst, NAVIGATE.fst, and so on. After the electronic device loads these preset WFST intent slot models, it can perform parallel rule matching on the second text obtained by preprocessing. Because some paths in some preset WFST intent slot models contain wildcards, the second text may match only one path in one preset WFST intent slot model, or it may successfully match multiple paths in multiple preset WFST intent slot models. Therefore, one WFST result may be obtained, or multiple WFST results may be obtained.
- if only one WFST result is obtained, step S1406 is executed;
- if multiple WFST results are obtained, step S1405 needs to be executed, and the WFST result with the highest credibility among them is determined according to the cumulative weight.
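Parallel matching against one model per intent can be sketched with a thread pool. The three matcher functions below are hypothetical stand-ins for compiled WFST intent slot models (they use regular expressions instead of real transducers), and the per-character wildcard weight is an assumption chosen to mirror the cost convention described in the text.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

WfstResult = Tuple[str, int]  # (matching text, cumulative weight)


def open_app_model(text: str) -> List[WfstResult]:
    """Stand-in for a wildcard rule: weight 1 per character matched by the wildcard."""
    m = re.fullmatch(r"open(.+)", text)
    return [(f"<OPEN_APP>open<{m.group(1)}:app>", len(m.group(1)))] if m else []


def check_message_model(text: str) -> List[WfstResult]:
    """Stand-in for a literal rule: no wildcard, so the cumulative weight is 0."""
    return [("<CHECK_MESSAGE>openSMS", 0)] if text == "openSMS" else []


def call_model(text: str) -> List[WfstResult]:
    m = re.fullmatch(r"call(.+)", text)
    return [(f"<CALL>call<{m.group(1)}:contact>", len(m.group(1)))] if m else []


MODELS: Dict[str, Callable[[str], List[WfstResult]]] = {
    "OPEN_APP": open_app_model,
    "CHECK_MESSAGE": check_message_model,
    "CALL": call_model,
}


def match_in_parallel(second_text: str, models=MODELS) -> List[WfstResult]:
    """Run every model on the same text concurrently and collect all results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, second_text) for name, fn in models.items()}
        return [res for name in models for res in futures[name].result()]
```

With the input `"openSMS"`, both the wildcard model and the literal model succeed, which is the situation in which the result with the highest credibility has to be selected.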
- weights in the preset WFST intent slot model can be customized, and the calculation method of cumulative weights can also be customized. Therefore, the concept of weight and cumulative weight can be different according to the setting.
- the calculation method of the weight can be different, and accordingly, the method of determining the WFST result with the highest credibility can also be different, which is not limited here.
- this is an exemplary schematic diagram of a case where different settings are used for the weight and the cumulative weight in the preset WFST intent slot model in this embodiment of the present application.
- the following is an exemplary description of these different settings:
- Case 1: The weight of a state transition that accepts wildcards is greater than that of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
- the weight represents the cost of state transition
- the cumulative weight represents the cost of matching paths.
- the greater the cumulative weight the greater the cost of WFST matching the current path, and correspondingly, the lower the credibility. Therefore, when the state transitions, the weight of the state transition that accepts the wildcard is greater than the weight of the state transition that does not accept the wildcard.
- the WFST intent slot model 1 shown in FIG. 12 and the WFST intent slot model 2 shown in FIG. 13 are WFST intent slot models set according to Case 1. Taking the second text "Open SMS" as an example, when WFST intent slot model 1 and WFST intent slot model 2 are used to perform parallel rule matching on the second text:
- WFST intent slot model 1 will follow the matching path 0, 1, 2, 3, 6, 10, 10, 11 to get WFST result 1: <OPEN_APP> open <message:app>, Weight:2.
- WFST intent slot model 2 will follow the matching path 0, 1, 2, 3, 4, 5 to get WFST result 2: <CHECK_MESSAGE> open the message, Weight:0.
- Case 2: The weight of the state transition that accepts the wildcard is greater than the weight of the state transition that does not accept the wildcard, and the greater the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
- the weight of the state transition can be set to indicate the cost of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
- the cumulative weight calculation method can be set so that the greater the weight of the state transition on the matching path, the smaller the cumulative weight of the matching path.
- for example, the cumulative weight can be calculated as the negative of the sum of the state-transition weights on the matching path, which is not limited here.
- Case 3: The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
- the weight of the state transition can be set to indicate the credibility of the state transition, and the cumulative weight of the matching path indicates the cost of the matching path. The larger the cumulative weight, the higher the cost and the lower the credibility.
- Case 4: The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
- the weight of the state transition can be set to indicate the credibility of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
- S1405: the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text;
- electronic devices can directly use the cumulative weights in the WFST results for comparison. For example, when the cumulative weight represents credibility, it is determined that the higher the cumulative weight, the higher the credibility of the WFST result. When the cumulative weight represents the cost, it is determined that the smaller the cumulative weight, the higher the credibility of the WFST result.
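The direct comparison of cumulative weights can be sketched as follows. The function names and the `weight_is_cost` flag are assumptions made for illustration: when the cumulative weight represents a cost (as in Case 1) the smallest value wins, and when it represents credibility (as in Case 2, for example the negative of the summed costs) the largest value wins.

```python
from typing import List, Sequence, Tuple

WfstResult = Tuple[str, float]  # (matching text, cumulative weight)


def cumulative_cost(weights: Sequence[float]) -> float:
    """Case 1 style: the cumulative weight is the sum of transition costs."""
    return sum(weights)


def cumulative_credibility(weights: Sequence[float]) -> float:
    """Case 2 style: the cumulative weight is the negative of the summed costs."""
    return -sum(weights)


def pick_best(results: List[WfstResult], weight_is_cost: bool) -> WfstResult:
    """Select the WFST result with the highest credibility."""
    chooser = min if weight_is_cost else max
    return chooser(results, key=lambda result: result[1])
```

Both conventions select the same result for the same paths; only the direction of the comparison changes.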
- because each preset WFST intent slot model may have a different number of state transitions, the cumulative weights can first be normalized to obtain credibility scores so that the final credibility comparison is fair, and the WFST result with the highest credibility can then be determined according to the credibility scores. Specifically:
- the electronic device performs credibility score calculation on the multiple WFST results
- the cumulative weight represents the cost of matching paths.
- a formula for calculating the credibility score can be:
- w represents the cumulative weight of the matching path in the WFST result that needs to be scored
- w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
- the score range after the normalized credibility score calculation is [0, 1]; the higher the score, the higher the credibility.
- take the calculation of the credibility scores for WFST result 1: <OPEN_APP> open <message:app>, Weight:2, and WFST result 2: <CHECK_MESSAGE> open the SMS, Weight:0, obtained in the above example
- w_max is 2
- the credibility score calculation formula 1 is an optional credibility score calculation formula when the cumulative weight represents the cost of the matching path. In practical applications, other calculation formulas can also be used to determine the credibility score of the WFST results, provided that a higher score indicates higher credibility.
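One normalization that satisfies the stated properties (cumulative weight as a cost, scores in [0, 1], higher score meaning higher credibility) is sketched below. The formula `1 - w / w_max` is an assumption chosen for illustration; it is not confirmed to be formula 1 of this application, and the function name is hypothetical.

```python
from typing import List, Sequence


def credibility_scores(cumulative_costs: Sequence[float]) -> List[float]:
    """Map non-negative cumulative costs to scores in [0, 1].

    Assumed formula: score = 1 - w / w_max, where w_max is the largest
    cumulative cost among the results being compared.
    """
    w_max = max(cumulative_costs)
    if w_max == 0:
        # Every path has zero cost, so all results are equally credible.
        return [1.0 for _ in cumulative_costs]
    return [1.0 - w / w_max for w in cumulative_costs]
```

Under this assumed formula, cumulative costs of 2 and 0 with w_max equal to 2 map to scores of 0.0 and 1.0, so the zero-cost result is selected.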
- the electronic device can determine that the matching text "<CHECK_MESSAGE> Open SMS" of the second text in WFST result 2 is the third text.
- S1406: the electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information;
- this step is similar to step S1004 and is not repeated here.
- according to the preset intent labeling information <>, the electronic device can extract the intent information from the third text: CHECK_MESSAGE. It is understandable that if there is slot labeling information in the third text, the electronic device can also extract the slot information from the third text, which is not limited here.
- S1407: the electronic device outputs the intent information and/or slot information in a structured manner.
- this step is similar to step S1005 and is not repeated here.
- the electronic device can structure the intention information:
- the structured output can be used by other modules in the electronic device.
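A structured output that other modules can consume might look like the following sketch. The field names and the choice of JSON are assumptions for illustration; the application does not specify a serialization format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class NluResult:
    """Hypothetical container for the recognized intent and slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)
    first_text: str = ""   # optional extra information
    second_text: str = ""


def to_structured_output(result: NluResult) -> str:
    """Serialize the result as JSON for use by other modules."""
    return json.dumps(asdict(result), ensure_ascii=False, sort_keys=True)
```

For example, `to_structured_output(NluResult(intent="CHECK_MESSAGE"))` produces a JSON object whose `intent` field is `CHECK_MESSAGE` and whose `slots` field is empty.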
- multiple preset WFST intent slot models can be stored in the electronic device.
- the electronic device can use multiple WFST intent slot models to match user input in parallel. From the multiple WFST results obtained, the WFST result with the highest credibility is determined to extract and output the intention information and slot information. While improving the rate of intent recognition, it also improves the accuracy of intent recognition, and the WFST intent slot model is used for parallel matching, which greatly improves the matching rate and reduces the computing load of electronic devices.
- the term “when” can be interpreted as meaning “if" or “after” or “in response to determining" or “in response to detecting".
- the phrase “when determining" or “if detected (statement or event)” can be interpreted as meaning “if determined" or “in response to determining" or “when detected (Condition or event stated)” or “in response to detection of (condition or event stated)”.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive).
- the process can be completed by a computer program instructing relevant hardware.
- the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
- the aforementioned storage media include: ROM, random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
Intent recognition method and electronic device. In the described method, an electronic device converts a voice input into a first text, then uses a preset FST intent slot model to perform rule matching on a second text determined according to the first text to obtain a third text, and then obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information. By implementing the technical solution provided by the present invention, the speed and accuracy of matching are improved when an NLU rule is used for intent recognition.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010555603.3A CN113806473A (zh) | 2020-06-17 | 2020-06-17 | Intent recognition method and electronic device |
CN202010555603.3 | 2020-06-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021254411A1 (fr) | 2021-12-23 |
Family
ID=78892632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/100475 WO2021254411A1 (fr) | 2020-06-17 | 2021-06-17 | Intent recognition method and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113806473A (fr) |
WO (1) | WO2021254411A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115563951A (zh) * | 2022-10-14 | 2023-01-03 | 美的集团(上海)有限公司 | Text sequence labeling method and apparatus, storage medium and electronic device |
CN117034957A (zh) * | 2023-06-30 | 2023-11-10 | 海信集团控股股份有限公司 | Semantic understanding method and device |
US11934794B1 (en) * | 2022-09-30 | 2024-03-19 | Knowbl Inc. | Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system |
CN118467570A (zh) * | 2024-07-10 | 2024-08-09 | 浪潮通用软件有限公司 | Data post-processing method, system and device for business data query |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
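CN116416993A (zh) * | 2021-12-30 | 2023-07-11 | 华为技术有限公司 | Speech recognition method and apparatus |
CN115453897A (zh) * | 2022-08-18 | 2022-12-09 | 青岛海尔科技有限公司 | Method and apparatus for determining an intent instruction, storage medium and electronic apparatus |
CN117973394A (zh) * | 2022-10-25 | 2024-05-03 | 华为技术有限公司 | Natural language processing method and apparatus, electronic device and storage medium |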
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8700404B1 (en) * | 2005-08-27 | 2014-04-15 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification |
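CN109543190A (zh) * | 2018-11-29 | 2019-03-29 | 北京羽扇智信息科技有限公司 | Intent recognition method, apparatus, device and storage medium |
CN110019745A (zh) * | 2017-10-26 | 2019-07-16 | 株式会社日立制作所 | Dialogue system with self-learning natural language understanding |
CN111078844A (zh) * | 2018-10-18 | 2020-04-28 | 上海交通大学 | Task-oriented dialogue system and method for software crowdsourcing |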
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829757B (zh) * | 2018-05-28 | 2022-01-28 | 广州麦优网络科技有限公司 | Intelligent service method for a chatbot, server and storage medium |
CN110209791B (zh) * | 2019-06-12 | 2021-03-26 | 百融云创科技股份有限公司 | Multi-turn dialogue intelligent voice interaction system and apparatus |
-
2020
- 2020-06-17 CN CN202010555603.3A patent/CN113806473A/zh active Pending
-
2021
- 2021-06-17 WO PCT/CN2021/100475 patent/WO2021254411A1/fr active Application Filing
Non-Patent Citations (1)
Title |
---|
WU YUKAI: "RESEARCH AND IMPLEMENTATION ON SEMANTIC PROCESSING SYSTEM BASED ON RULE MATCHING", CHINESE SELECTED DOCTORAL DISSERTATIONS AND MASTER'S THESES FULL-TEXT DATABASES (MASTER), INFORMATION SCIENCE AND TECHNOLOGY, 15 February 2020 (2020-02-15), XP055881926 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11934794B1 (en) * | 2022-09-30 | 2024-03-19 | Knowbl Inc. | Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system |
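CN115563951A (zh) * | 2022-10-14 | 2023-01-03 | 美的集团(上海)有限公司 | Text sequence labeling method and apparatus, storage medium and electronic device |
CN117034957A (zh) * | 2023-06-30 | 2023-11-10 | 海信集团控股股份有限公司 | Semantic understanding method and device |
CN117034957B (zh) * | 2023-06-30 | 2024-05-31 | 海信集团控股股份有限公司 | Semantic understanding method and device integrating a large model |
CN118467570A (zh) * | 2024-07-10 | 2024-08-09 | 浪潮通用软件有限公司 | Data post-processing method, system and device for business data query |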
Also Published As
Publication number | Publication date |
---|---|
CN113806473A (zh) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110111787B (zh) | Semantic parsing method and server | |
WO2021254411A1 (fr) | Intent recognition method and electronic device | |
CN110910872B (zh) | Voice interaction method and apparatus | |
CN112567457B (zh) | Voice detection method, prediction model training method, apparatus, device and medium | |
CN112154431B (zh) | Human-computer interaction method and electronic device | |
CN110798506B (zh) | Method, apparatus and device for executing a command | |
WO2021258797A1 (fr) | Image information input method, electronic device, and computer-readable storage medium | |
CN111970401B (zh) | Call content processing method, electronic device and storage medium | |
CN114691839A (zh) | Intent slot recognition method | |
CN116052648A (zh) | Training method, usage method and training system for a speech recognition model | |
WO2021238371A1 (fr) | Method and apparatus for generating a virtual character | |
WO2022127130A1 (fr) | Method for adding an operation sequence, electronic device, and system | |
WO2021031862A1 (fr) | Data processing method and related apparatus | |
CN111768765A (zh) | Language model generation method and electronic device | |
CN113380240B (zh) | Voice interaction method and electronic device | |
WO2022007757A1 (fr) | Cross-device voiceprint registration method, electronic device and storage medium | |
CN114817521B (zh) | Search method and electronic device | |
WO2023236908A1 (fr) | Image description method, electronic device and computer-readable storage medium | |
WO2024140891A1 (fr) | Compilation method, electronic device and system | |
WO2021238338A1 (fr) | Speech synthesis method and device | |
CN118568380A (zh) | Human-computer interaction method, electronic device and system | |
CN118672531A (zh) | Method and apparatus for cross-container display |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21826479 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21826479 Country of ref document: EP Kind code of ref document: A1 |