WO2021254411A1 - Intent recognition method and electronic device - Google Patents

Intent recognition method and electronic device

Info

Publication number
WO2021254411A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
wfst
intent
preset
electronic device
Prior art date
Application number
PCT/CN2021/100475
Other languages
French (fr)
Chinese (zh)
Inventor
潘龙飞
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021254411A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an intent recognition method and an electronic device.
  • Natural language processing is a sub-field of artificial intelligence (AI).
  • Natural language understanding (NLU) is a sub-field of natural language processing (NLP), and it is also the most difficult part of NLP. Intent recognition and slot filling are the two most critical tasks of NLU, but due to factors such as language diversity, ambiguity, robustness, knowledge dependence and context, it is very difficult for NLU to complete these two tasks well.
  • CFG: context-free grammar
  • CYK algorithm: Cocke-Younger-Kasami algorithm
  • CNF: Chomsky normal form
  • the entire analysis path can generate a grammatical analysis tree, and the user can extract grammatical features based on this grammatical tree, and then obtain the desired grammatical information, such as parts of speech, entities, sentence components and other information.
  • This application provides a method and electronic device for intent recognition to improve the speed and accuracy of matching when using NLU rules for intent recognition.
  • the present application provides an intent recognition method.
  • the method includes: in response to a user's voice input, an electronic device converts the voice input into a first text; the electronic device uses a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text and the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, where the preset intent labeling information labels the intent information of the second text and the preset slot labeling information labels the slot information in the second text; the electronic device obtains the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
  • the electronic device uses the preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast owing to the characteristics of FST rule matching. Matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be easily extracted. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
  • before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device performs format preprocessing on the first text to obtain the second text, where the number of format characters in the second text is less than or equal to the number of format characters in the first text.
  • the electronic device may first perform format preprocessing on the first text to obtain the second text.
  • this reduces the complexity of FST rule matching performed on the second text by the preset FST intent slot model, and further improves the matching speed.
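The format preprocessing step can be illustrated with a minimal sketch. The text does not define "format characters", so this example assumes they are whitespace and ASCII punctuation; the helper names `preprocess` and `count_format_chars` are hypothetical and not taken from the application.

```python
import re
import string

# Assumption: "format characters" are whitespace and ASCII punctuation.
FORMAT_CHARS = set(string.whitespace) | set(string.punctuation)


def count_format_chars(text: str) -> int:
    """Count the characters of `text` that are treated as format characters."""
    return sum(1 for ch in text if ch in FORMAT_CHARS)


def preprocess(first_text: str) -> str:
    """Derive a 'second text' with no more format characters than the input."""
    # Drop punctuation, then collapse whitespace runs into single spaces.
    no_punct = "".join(ch for ch in first_text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", no_punct).strip()


first_text = "  What's the weather,   today?  "
second_text = preprocess(first_text)
```

Under these assumptions the second text never contains more format characters than the first, which is the property the method requires.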
  • after the step in which the electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information, the method further includes: the electronic device outputs the intent information and/or slot information in a structured manner.
  • the electronic device can output the intent information and/or slot information in a structured manner, so that other modules in the electronic device can use the intent information and/or slot information.
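As a rough illustration of structured output, the sketch below serializes an intent and its slots to JSON so that another module could consume them. The `IntentResult` type, its field names, and the choice of JSON are assumptions made for this example; the application does not specify an output format. The values reuse the weather example that appears in the description.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class IntentResult:
    """Hypothetical container for one recognized intent and its slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


result = IntentResult(
    intent="ask the weather",
    slots={"time": "Today", "location": "Shanghai"},
)

# Serialize to JSON so that other modules can parse the result.
structured = json.dumps(asdict(result), ensure_ascii=False)
```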
  • the preset FST intent slot model is a preset WFST intent slot model
  • a preset WFST intent slot model is a preset WFST
  • each state transition has a weight.
  • the preset FST intent slot model is the preset WFST intent slot model, so that the weight of each match can be obtained during parallel matching, which facilitates the screening of matching results.
  • the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, which specifically includes: the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text that is output after a successful match along the matching path and contains the preset intent labeling information and/or the preset slot labeling information; when there is one WFST result, the electronic device determines that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text.
  • the electronic device determines the matching text of the second text in the WFST result with the highest credibility after parallel rule matching as the third text. While improving the efficiency of parallel matching, it also ensures the accuracy of intent recognition.
  • when there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text, which specifically includes: when there are multiple WFST results, the electronic device calculates a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; the electronic device determines that the matching text of the second text in the WFST result with the highest credibility score is the third text.
  • the electronic device determines the matching text of the second text in the WFST result with the highest credibility score as the third text, which improves the accuracy of credibility evaluation.
  • the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
  • the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
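The two statements above (wildcard transitions weigh more, and the cumulative weight is the sum of the transition weights) can be sketched as follows. The numeric weights and the example paths are hypothetical; the text only fixes their relative order.

```python
from typing import List, Tuple

# Hypothetical values: the text only states that wildcard transitions
# weigh more than non-wildcard transitions.
LITERAL_WEIGHT = 1.0
WILDCARD_WEIGHT = 5.0


def cumulative_weight(path: List[Tuple[str, float]]) -> float:
    """Sum the weights of all state transitions on a matching path."""
    return sum(weight for _, weight in path)


# Each entry is (accepted symbol, transition weight); "*" marks a wildcard.
exact_path = [
    ("ask", LITERAL_WEIGHT),
    ("weather", LITERAL_WEIGHT),
    ("today", LITERAL_WEIGHT),
]
wildcard_path = [
    ("ask", LITERAL_WEIGHT),
    ("*", WILDCARD_WEIGHT),
    ("today", LITERAL_WEIGHT),
]
```

With these values, a path that relies on a wildcard ends up with a larger cumulative weight than a path of the same length that matches every symbol literally.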
  • the electronic device calculates credibility scores for the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results, which specifically includes: the electronic device uses credibility score calculation Formula 1 to calculate the credibility scores of the multiple WFST results;
  • w represents the cumulative weight of the matching path in the WFST result to be scored
  • w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
  • before the step in which the electronic device uses a preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device loads the preset WFST intent slot model.
  • an embodiment of the present application provides an electronic device. The electronic device includes one or more processors and a memory; the memory is coupled with the one or more processors and is used to store computer program code; the computer program code includes computer instructions, and the one or more processors call the computer instructions to cause the electronic device to execute: in response to a user's voice input, converting the voice input into a first text; using a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text and the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information, where the preset intent labeling information labels the intent information of the second text and the preset slot labeling information labels the slot information in the second text; and obtaining the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
  • the electronic device uses the preset FST intent slot model to recognize user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast owing to the characteristics of FST rule matching. Matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be easily extracted. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
  • the one or more processors are also used to call the computer instructions to make the electronic device execute: performing format preprocessing on the first text to obtain the second text, where the number of format characters in the second text is less than or equal to the number of format characters in the first text.
  • the one or more processors are also used to call the computer instructions to make the electronic device execute: outputting the intent information and/or slot information in a structured manner.
  • the preset FST intent slot model is a preset WFST intent slot model
  • a preset WFST intent slot model is a preset WFST
  • each state transition has a weight.
  • the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matching text of the second text and the cumulative weight of the matching path, and the matching text of the second text is the text that is output after a successful match along the matching path and contains the preset intent labeling information and/or the preset slot labeling information; when there is one WFST result, determining that the matching text of the second text in that WFST result is the third text; when there are multiple WFST results, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.
  • the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: when there are multiple WFST results, calculating a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; and determining that the matching text of the second text in the WFST result with the highest credibility score is the third text.
  • the weight of a state transition that accepts wildcards is greater than the weight of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
  • the cumulative weight of the matching path is equal to the sum of the weights of each state transition on the matching path.
  • the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: using credibility score calculation Formula 1 to calculate the credibility scores of the multiple WFST results;
  • w represents the cumulative weight of the matching path in the WFST result to be scored, and w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
  • the one or more processors are also used to call the computer instructions to cause the electronic device to execute: load a preset WFST intent slot model.
  • embodiments of the present application provide a chip system that is applied to an electronic device.
  • the chip system includes one or more processors for invoking computer instructions to make the electronic device execute the method described in the first aspect and any possible implementation of the first aspect.
  • the chip system may include one processor 110 in the electronic device 100 shown in FIG. 5, or may include multiple processors 110 in the electronic device 100 shown in FIG. 5, which is not limited here.
  • the embodiments of the present application provide a computer program product containing instructions.
  • when the computer program product is run on an electronic device, the electronic device executes the method described in the first aspect and any possible implementation of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium including instructions which, when run on an electronic device, cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
  • the electronic device provided in the second aspect, the chip system provided in the third aspect, the computer program product provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to implement the methods provided in the embodiments of the present application. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • Figure 1 is a schematic diagram of the relationship between intent and slot
  • FIG. 2 is an exemplary schematic diagram of an FSA
  • FIG. 3 is an exemplary schematic diagram of an FST
  • FIG. 4 is an exemplary schematic diagram of a WFST
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of the software structure of an electronic device according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of the intention recognition module in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a usage scenario of the intention recognition method in an embodiment of the present application.
  • FIG. 9 is an exemplary schematic diagram of an FST intention slot model in an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of an intention recognition method in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another flow chart of the intention recognition method in an embodiment of the present application.
  • FIG. 12 is an exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application.
  • FIG. 13 is another exemplary schematic diagram of a WFST intention slot model in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of another flow chart of an intention recognition method in an embodiment of the present application.
  • FIG. 15 is an exemplary schematic diagram of parallel rule matching of multiple preset WFST intent slot models for the second text in an embodiment of the present application
  • FIG. 16 is an exemplary schematic diagram of a situation in which different settings are used for weights and cumulative weights in the preset WFST intention slot model in an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
  • Intent refers to the identification of the actual or potential needs of the user by the electronic device. Fundamentally speaking, intent is a classifier that divides user needs into certain types.
  • Intent recognition, also known as SUC (Spoken Utterance Classification), classifies the natural language utterance input by the user, and the resulting category corresponds to the user's intent. For example, for "what's the weather today", the intent is "ask the weather".
  • intent recognition can be regarded as a typical classification problem.
  • the classification and definition of intent can refer to the ISO-24617-2 standard, which has 56 detailed definitions. The definition of intention has a lot to do with the positioning of the system itself and the knowledge base it possesses, that is, the definition of intention has a very strong domain relevance. It can be understood that in the embodiments of the present application, the classification and definition of intentions are not limited to the ISO-24617-2 standard.
  • the slot is the parameter of the intent.
  • An intent may correspond to several slots. For example, when asking for a bus route, you need to provide necessary parameters such as departure place, destination, and time. The above parameters are the slots corresponding to the intention of "asking for bus route".
  • the main goal of the semantic slot filling task is to extract the pre-defined semantic slot values in the semantic frame from the input sentence on the premise that the semantic frame of a specific domain or specific intention is known.
  • the semantic slot filling task can be transformed into a sequence labeling task, that is, using the classic IOB notation method to mark a word as the beginning, continuation (inside), or non-semantic slot (outside) of a certain semantic slot.
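The IOB labeling scheme described above can be sketched as a small decoder that turns per-token tags into slot values. The tokens, tag names (`B-location`, `I-location`, `B-time`) and the helper `extract_slots` are hypothetical and only illustrate the notation; they are not taken from the application.

```python
from typing import Dict, List


def extract_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Collect slot values from IOB tags (B- begins, I- continues, O is outside)."""
    slots: Dict[str, str] = {}
    current_name = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the slot that is currently being built, if any.
        if current_name is not None:
            slots[current_name] = " ".join(current_tokens)

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_name, current_tokens = None, []
    flush()
    return slots


tokens = ["weather", "in", "New", "York", "today"]
tags = ["O", "O", "B-location", "I-location", "B-time"]
slots = extract_slots(tokens, tags)
```

Here "New" begins the location slot, "York" continues it, and "today" begins a separate time slot, while the first two tokens lie outside any slot.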
  • Together, the intent and the slots let the system know which specific task to perform and what types of parameters are needed to perform it.
  • Slot definitions: Slot 1: time, Date; Slot 2: location, Location.
  • Fig. 1 is a schematic diagram of a relationship between an intention and a slot in an embodiment of the application.
  • two necessary slots are defined for the "ask the weather" task, namely "time" and "location".
  • the above definition can solve the task requirement.
  • a system often needs to be able to handle several tasks at the same time.
  • the weather station should be able to answer the question of “inquiring about the weather” as well as the question of “inquiring about the temperature”.
  • an optimized strategy is to define higher-level domains; for example, the "ask the weather" intent and the "ask the temperature" intent both belong to the "weather" domain.
  • the domain can be simply understood as a collection of intents.
  • NLU: natural language understanding
  • the user intent and the corresponding slot value of the corresponding slot can be identified from the user input.
  • the goal of intent recognition is to identify user intent from the input.
  • a single task can be modeled simply as a binary classification problem. For example, for the "ask the weather" intent, intent recognition can be modeled as a binary classification problem of "asking about the weather" versus "not asking about the weather". When the system needs to handle multiple tasks, it must be able to distinguish each intent, and the binary classification problem becomes a multi-class classification problem.
  • the task of slot filling is to extract information from the data and fill it into a pre-defined slot.
  • the intent and the corresponding slot have been defined in Figure 1.
  • the system should be able to extract "Today" and "Shanghai" and fill them into the "Time" and "Location" slots respectively.
  • FST is currently widely used in speech recognition and in natural language search and processing. For example, natural language processing often involves operations that modify text according to rules, such as the rule: if c is immediately followed by x in the string, then c is changed to b. FST performs mathematical operations on such rules and integrates several rules into a single large rule, which effectively improves the efficiency of a rule-based system.
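The example rewrite rule (a "c" immediately followed by "x" becomes "b") can be tried out with a regular-expression lookahead. This is only a stand-in for illustration: it uses Python's `re` module rather than an FST, and the function name `apply_rule` is hypothetical.

```python
import re


def apply_rule(text: str) -> str:
    """Change every "c" that is immediately followed by "x" into "b"."""
    # (?=x) is a lookahead: it requires an "x" next without consuming it.
    return re.sub(r"c(?=x)", "b", text)


rewritten = apply_rule("acxcd")
```

Only the first "c" in "acxcd" is followed by "x", so it alone is rewritten; the second "c" is left unchanged.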
  • FSA: finite state acceptor
  • For a given input sequence, an FSA returns one of two results: "accepted" or "not accepted".
  • FIG. 2 is an exemplary schematic diagram of an FSA, in which nodes and arcs correspond to states and state transitions, respectively.
  • state 0 represents the initial state
  • state 5 represents the end state.
  • state 0 to state 1 can accept the character a
  • state 1 to state 1 can accept the character b
  • state 1 to state 2 can accept the character c
  • state 2 to state 5 can accept the character d
  • state 0 to state 3 can accept the character b
  • state 3 to state 4 can accept the character c
  • state 4 to state 4 can accept the character d
  • state 4 to state 5 can accept the character e.
  • the regular expression corresponding to the FSA described in Figure 2 is: ab*cd|bcd*e
  • the FSA shown in FIG. 2 can accept the symbol sequence "a, b, c, d" through the path 0, 1, 1, 2, 5, in which case the FSA returns "accepted". As another example, if the sequence "a, b, d" is input to the FSA shown in FIG. 2, no path in the FSA produces that sequence, so the FSA returns "not accepted".
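The FSA of FIG. 2 can be sketched as a transition table built from the state transitions listed above. The dictionary layout and the function name `accepts` are choices made for this example, not part of the application.

```python
from typing import Dict, Iterable, Tuple

# (current state, input character) -> next state, as listed for FIG. 2.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
    (2, "d"): 5,
    (0, "b"): 3,
    (3, "c"): 4,
    (4, "d"): 4,
    (4, "e"): 5,
}
INITIAL_STATE = 0
FINAL_STATES = {5}


def accepts(symbols: Iterable[str]) -> bool:
    """Return True if the symbol sequence leads from state 0 to state 5."""
    state = INITIAL_STATE
    for symbol in symbols:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no arc for this symbol: the sequence is rejected
        state = TRANSITIONS[key]
    return state in FINAL_STATES


accepted = accepts(["a", "b", "c", "d"])   # follows path 0, 1, 1, 2, 5
rejected = accepts(["a", "b", "d"])        # state 1 has no arc for "d"
```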
  • FST is an extension of FSA: each state transition also has an output label, so each transition carries an input-output label pair.
  • FIG. 3 is an exemplary schematic diagram of an FST.
  • state 0 represents the initial state
  • state 5 represents the end state.
  • the input-output label pair from state 0 to state 1 is a: z; from state 1 to state 1, b: y; from state 1 to state 2, c: x; and from state 2 to state 5, d: w.
  • FST can describe the conversion of a set of rules, or the conversion of one set of symbol sequences into another set of symbol sequences.
  • the input symbol sequence "a, b, c, d" passes through the path 0, 1, 1, 2, 5 and is converted into the output sequence "z, y, x, w", because a is converted to z, b is converted to y, c is converted to x, and d is converted to w.
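The conversion just described can be sketched by attaching an output label to each transition. Only the transitions on the path 0, 1, 1, 2, 5 are encoded, since those are the ones whose label pairs the text gives; the function name `transduce` and the table layout are assumptions for this example.

```python
from typing import Dict, List, Optional, Tuple

# (current state, input label) -> (next state, output label)
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, str]] = {
    (0, "a"): (1, "z"),
    (1, "b"): (1, "y"),
    (1, "c"): (2, "x"),
    (2, "d"): (5, "w"),
}
INITIAL_STATE = 0
FINAL_STATES = {5}


def transduce(symbols: List[str]) -> Optional[List[str]]:
    """Return the output sequence, or None if the input is not accepted."""
    state = INITIAL_STATE
    output: List[str] = []
    for symbol in symbols:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return None
        state, out_label = TRANSITIONS[key]
        output.append(out_label)
    return output if state in FINAL_STATES else None


converted = transduce(["a", "b", "c", "d"])
```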
  • FST is an efficient data structure whose underlying theory comes from graph theory. FSTs are divided into non-deterministic FSTs and deterministic FSTs.
  • the deterministic FST is a 7-tuple: (Q, Σ, Γ, δ, σ, q_0, F), where:
  • Non-deterministic FST is also a 7-tuple, but non-deterministic FST may have multiple choices when performing state transitions, so Definition 4 and Definition 5 are different from deterministic FST.
  • an FST has multiple states and multiple state transitions, starting from the initial state, passing through multiple accepting states, and finally reaching the ending state to complete a matching operation.
  • The typical features of FST are flexible use, high matching efficiency, and low memory overhead.
  • WFST is a type of FST.
  • Each state transition has a weight
  • each initial state has an initial weight
  • each terminal state has an end weight.
  • the weight is generally the probability or loss of transition or initial/termination state. The weight will be accumulated along each path and accumulated on different paths.
  • the calculation method of the cumulative weight can be specified by the specific WFST.
  • the cumulative weight can be obtained by multiplying all the weights on the path traversed, by adding all the weights on the path traversed, or by some other calculation over the weights on the path traversed. The method is not limited here.
  • FIG. 4 is an exemplary schematic diagram of a WFST.
  • each state transition is labeled in the form "input-label: output-label/weight", and the initial state and the final state also have corresponding weights.
  • WFST can be defined by an 8-tuple (Σ, Δ, Q, I, F, E, λ, ρ):
  • Q is a set of finite states
  • the WFST shown in Figure 4 can be defined as follows:
  • each transition in E consists of (source state, input label, output label, weight, target state);
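The transition format and the weight accumulation options described above can be sketched as follows. The weight values are hypothetical, since the values in FIG. 4 are not reproduced in the text, and `Transition` and `path_weight` are names invented for this example.

```python
import math
from typing import List, NamedTuple


class Transition(NamedTuple):
    """One element of E: (source, input label, output label, weight, target)."""
    source: int
    input_label: str
    output_label: str
    weight: float
    target: int


def path_weight(path: List[Transition], initial_weight: float,
                final_weight: float, mode: str = "sum") -> float:
    """Accumulate the initial, transition and final weights along one path."""
    # A path is only valid if each transition starts where the previous ended.
    for prev, nxt in zip(path, path[1:]):
        if prev.target != nxt.source:
            raise ValueError("transitions do not form a connected path")
    weights = [initial_weight] + [t.weight for t in path] + [final_weight]
    if mode == "sum":
        return sum(weights)
    if mode == "product":
        return math.prod(weights)
    raise ValueError("mode must be 'sum' or 'product'")


# Hypothetical weights chosen for illustration only.
path = [Transition(0, "a", "z", 0.5, 1), Transition(1, "b", "y", 0.25, 2)]
summed = path_weight(path, initial_weight=0.5, final_weight=2.0, mode="sum")
multiplied = path_weight(path, initial_weight=0.5, final_weight=2.0, mode="product")
```

Whether weights are added or multiplied depends on how the specific WFST interprets them, for example as losses or as probabilities.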
  • FIG. 5 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have different component configurations.
  • the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and a universal serial bus (USB) interface.
  • the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may couple the touch sensor 180K, charger, flash, camera 193, etc., respectively through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
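  • The three PCM steps named above (sample, quantize, encode) can be illustrated with a minimal sketch. The function name `pcm_encode`, the 8 kHz sample rate, the 8-bit depth, and the 1 kHz test tone are illustrative assumptions and are not taken from the embodiment.

```python
import math

def pcm_encode(signal, sample_rate_hz, num_samples, bits=8):
    """Sample a continuous signal, quantize each sample, and encode it
    as an unsigned integer code word (uniform quantization)."""
    levels = 2 ** bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                      # sampling instant
        x = max(-1.0, min(1.0, signal(t)))          # clip to [-1, 1]
        code = round((x + 1.0) / 2.0 * (levels - 1))  # quantize to 0..levels-1
        codes.append(code)                          # integer code word
    return codes

def tone(t):
    """Hypothetical analog input: a 1 kHz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 1000 * t)

codes = pcm_encode(tone, sample_rate_hz=8000, num_samples=8, bits=8)
```

  • Each entry in `codes` is one encoded sample. A real audio path would typically use a higher bit depth and hardware sampling rather than a Python loop.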
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the SIM interface can be used to communicate with the SIM card interface 195 to realize the function of transmitting data to the SIM card or reading data in the SIM card.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic wave radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves to radiate through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter opens and light passes through the lens to the photosensitive element of the camera, which converts the light signal into an electrical signal and passes it to the ISP, where it is processed into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the lens forms an optical image of the object and projects it onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, at least one application required by a function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, etc.) and so on.
  • the storage data area can store data created during the use of the electronic device 100 (such as face information template data, fingerprint information template, etc.) and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can play music or a hands-free call through the speaker 170A.
  • the receiver 170B, also called a "handset", is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or plays a voice message, the user can hear the voice by bringing the receiver 170B close to the ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
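  • The short message example above amounts to a threshold comparison. A minimal sketch follows; the threshold value, the normalized intensity scale, and the names `instruction_for_touch`, `short_message`, and `open_application` are hypothetical and chosen only for illustration.

```python
# Hypothetical first pressure threshold on a normalized 0..1 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5

def instruction_for_touch(icon, intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on an icon to an instruction according to touch intensity."""
    if icon != "short_message":
        # Other icons are outside the example in the text (assumed behavior).
        return "open_application"
    if intensity < threshold:
        return "view_short_message"      # intensity below the first threshold
    return "create_short_message"        # intensity at or above the threshold

light_press = instruction_for_touch("short_message", 0.2)
firm_press = instruction_for_touch("short_message", 0.5)
```

  • The same touch position therefore yields two different instructions, and an intensity exactly equal to the threshold falls into the "create" branch, matching the "greater than or equal to" wording above.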
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
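  • The text does not say how altitude is derived from air pressure. One common approximation is the international barometric formula, sketched below as an assumption rather than as the method of the embodiment; the function name and the default sea-level pressure of 1013.25 hPa are illustrative.

```python
def altitude_from_pressure(pressure_hpa, sea_level_hpa=1013.25):
    """Estimate altitude in meters from air pressure in hPa using the
    international barometric formula (standard-atmosphere approximation)."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))

altitude_at_sea_level = altitude_from_pressure(1013.25)
altitude_at_900_hpa = altitude_from_pressure(900.0)
```

  • Lower measured pressure gives a higher estimated altitude. Accuracy depends on how well the assumed sea-level pressure matches current weather conditions.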
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D. Then, according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover, features such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching, pedometers and so on.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
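  • The temperature processing strategy above can be sketched as a set of threshold checks. All threshold values and action names below are hypothetical, and the condition for boosting the battery output voltage is assumed to be a further low-temperature threshold, since the text does not give one.

```python
def thermal_actions(temp_c, high_c=45.0, low_c=0.0, very_low_c=-10.0):
    """Return the actions a temperature processing strategy might take.

    high_c, low_c, and very_low_c are illustrative thresholds only.
    """
    actions = []
    if temp_c > high_c:
        # Thermal protection: throttle the processor near the sensor.
        actions.append("reduce_processor_performance")
    if temp_c < low_c:
        # Avoid abnormal shutdown due to low temperature.
        actions.append("heat_battery")
    if temp_c < very_low_c:
        # Assumed condition: boost output voltage at even lower temperature.
        actions.append("boost_battery_output_voltage")
    return actions

hot = thermal_actions(50.0)
normal = thermal_actions(25.0)
cold = thermal_actions(-5.0)
very_cold = thermal_actions(-20.0)
```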
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be provided on the display screen 194. Together, the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch-controlled screen".
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations that act on different applications can correspond to different vibration feedback effects.
  • For touch operations that act on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • Different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
  • the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 may receive the user's voice input and environmental information through the microphone 170C and the sensor module 180. After the user's voice input is converted into digital audio information by the audio module 170, the processor 110 may perform voice recognition to convert it into text information. The intention recognition method in the embodiment of the present application is then executed to identify the user's intention and slot, and to express them with structured semantics.
  • FIG. 6 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the system is divided into four layers, from top to bottom, the application layer, the application framework layer, the runtime and system libraries, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application layer may also include an intention recognition module.
  • the intention recognition module is used to execute the intention recognition method in the embodiment of the present application.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
  • for example, the notification manager is used to notify the user that a download is complete, to give message reminders, and so on.
  • a notification can also appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window.
  • for example, text information is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
  • Runtime includes core libraries and virtual machines. Runtime is responsible for system scheduling and management.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of the system.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), two-dimensional graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem, and provides a combination of two-dimensional (2-dimensional, 2D) and three-dimensional (3-dimensional, 3D) layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, sensor driver, and virtual card driver.
  • when a touch operation is received, the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the touch operation into the original input event (including touch coordinates, time stamp of the touch operation, etc.).
  • the original input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation, and the control corresponding to the click operation is the control of the camera application icon as an example, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • the camera 193 captures still images or videos.
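  • The touch-to-camera flow above can be sketched as three steps: the kernel layer wraps the interrupt into an original input event, the application framework layer identifies the control under the touch, and the camera application is started. All class names, function names, and coordinates below are hypothetical and only mirror the described flow.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RawInputEvent:
    """Original input event produced by the kernel layer."""
    x: int
    y: int
    timestamp_ms: int

@dataclass
class Control:
    """A rectangular on-screen control, such as an application icon."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom

def kernel_layer_on_interrupt(x: int, y: int, timestamp_ms: int) -> RawInputEvent:
    # Kernel layer: turn the hardware interrupt into an original input event.
    return RawInputEvent(x, y, timestamp_ms)

def framework_identify_control(event: RawInputEvent,
                               controls: List[Control]) -> Optional[Control]:
    # Application framework layer: find the control under the touch point.
    for control in controls:
        if control.contains(event.x, event.y):
            return control
    return None

def dispatch(event: RawInputEvent, controls: List[Control]) -> List[str]:
    control = framework_identify_control(event, controls)
    if control is not None and control.name == "camera_app_icon":
        # Start the camera application, then the camera driver, then capture.
        return ["start_camera_application", "start_camera_driver", "capture_image"]
    return []

controls = [Control("camera_app_icon", 0, 0, 100, 100)]
steps_on_icon = dispatch(kernel_layer_on_interrupt(40, 60, 1000), controls)
steps_off_icon = dispatch(kernel_layer_on_interrupt(400, 600, 1001), controls)
```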
  • FIG. 7 is a schematic diagram of the architecture of the intention recognition module in the embodiment of this application.
  • the intent recognition module is an NLU engine 700.
  • the NLU engine 700 is used to perform semantic analysis on user input, and output analysis results such as intent and slot position for use by other modules.
  • the NLU engine 700 includes a text preprocessing unit 701, a rule engine 702, a machine learning engine 703, an entity recognition unit 704, an intent classification unit 705, and a slot filling unit 706.
  • the text preprocessing unit 701 is used to preprocess the text input by the user, mainly including removing format symbols in the text that are not needed for subsequent semantic analysis, such as punctuation and spaces. It is understandable that the text input by the user is generally obtained after voice recognition of the user's voice information.
  • the rule engine 702 is configured to perform FST-based rule matching on the preprocessed user input text according to preset rules, covering high-frequency sentence patterns, and to obtain formatted text carrying intent and slot format tags.
  • the intention recognition method in the embodiment of the present application is mainly used for the construction of the rule engine 702.
  • the machine learning engine 703 is used to process the preprocessed user input text through machine learning to obtain formatted text carrying intent and slot format tags.
  • the entity recognition unit 704 is used to extract entity information from the formatted text carrying intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703;
  • the intent classification unit 705 is used to extract intent information from the formatted text carrying intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703;
  • the slot filling unit 706 is used to extract slot information from the formatted text carrying intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703.
  • FIG. 8 is a schematic diagram of a usage scenario of the intent recognition method in an embodiment of this application.
  • After the user inputs "Call Dad" into the electronic device 100 by voice, the electronic device 100 performs speech recognition on the input and converts it into text, then uses the NLU engine 700 to perform intent recognition on the text, and outputs the intent "call" and the slot "dad" to the dialing application.
  • the dialing application dials the phone number of "Dad” according to the intention and slot.
  • CPU Intel(R)Xeon(R)E5-2690v2@3.00GHz, memory: 32GB;
  • end-to-end 150ms 100tps, end-to-end: 12ms
  • FIG. 9 is an exemplary schematic diagram of the FST intent slot model in an embodiment of this application.
  • the intent information is inserted after the initial state of the FST, and the slot label information is inserted before and after all slots, so as to realize the output of the intent information and the slot identification information when the FST is matched.
  • the FST intent slot model includes states and transitions between states, specifically:
  • the FST status is represented by a circle plus a status number
  • State 0 and State 14 are the initial state and end state of FST respectively, and the other states are intermediate states.
  • FST rule of the FST intent slot model shown in Figure 9 can be written as:
  • FIG. 10 is a schematic flowchart of an intent recognition method in an embodiment of this application.
  • In response to a user's voice input, the electronic device converts the voice input into a first text;
  • the electronic device 100 may convert the voice input into text: “Call Dad”.
  • the electronic device 100 may execute S1001 only after receiving a certain trigger.
  • the electronic device 100 may perform step S1001 only after detecting that the user has turned on the voice assistant function; or the electronic device 100 may perform step S1001 after detecting that the user double-taps the screen; or the electronic device 100 may be able to perform step S1001 at any time after it is powered on, which is not limited here.
  • the electronic device performs format preprocessing on the first text to obtain the second text;
  • the main purpose of the electronic device performing format preprocessing on the first text is to remove format characters that are not used in the subsequent FST intent slot model matching in the first text. For example, remove spaces, punctuation marks, etc. in the first text.
  • the first text is: "Call Huaweing.”
  • the second text obtained is: "Call Huaweing". The spaces and the period have been removed.
  • step S1002 may be skipped, and step S1003 may be performed directly on the first text obtained in step S1001. In that case, the first text is the second text. This is possible because rules for handling the format of the first text can also be added to the FST intent slot model, although doing so increases the complexity of the model. Which scheme to use can be selected according to the actual situation and is not limited here.
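The format preprocessing described in step S1002 can be sketched as follows. This is a minimal illustration, not the implementation from this application: the function name `preprocess` is hypothetical, and the choice to treat every Unicode punctuation and whitespace character as a format character is an assumption.

```python
import unicodedata


def preprocess(first_text: str) -> str:
    """Return the second text: the first text without format characters.

    Assumption: "format characters" means whitespace plus any character
    whose Unicode category starts with "P" (ASCII and CJK punctuation).
    """
    kept = []
    for ch in first_text:
        if ch.isspace():
            continue  # drop spaces, tabs, line breaks
        if unicodedata.category(ch).startswith("P"):
            continue  # drop punctuation such as "." or the ideographic full stop
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    print(preprocess("Call Dad."))
```

With this rule the second text never contains more format characters than the first text, so the FST rules that follow do not need extra arcs for spaces or punctuation.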
  • the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text;
  • the preset FST intent slot model is a preset FST; during state transitions, the preset FST intent slot model adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information is used to label the intent information of the input text, and the preset slot labeling information is used to label the slot information in the input text.
  • the third text is a text containing preset intent labeling information and/or preset slot labeling information after the rule matching is performed on the second text.
  • The FST accepts an empty character, outputs the intent information <CALL>, and transfers from initial state 0 to state 1;
  • The FST accepts the character "electricity", outputs the character "electricity", and transfers from state 2 to state 3;
  • The FST accepts the character "word", outputs the character "word", and transfers from state 3 to state 4;
  • The FST accepts an empty character, outputs the slot labeling information prefix "<", and transfers from state 5 to state 6;
  • The FST accepts the character "small", outputs the character "small", and transfers from state 6 to state 7;
  • The FST accepts the character "Ming", outputs the character "Ming", and transfers from state 7 to state 10;
  • The FST accepts an empty character, outputs the slot labeling information suffix ":contact>", and transfers from state 10 to the end state 14, thereby completing the matching process;
  • the output "<CALL> to call <Xiaoming:contact>" is the third text obtained after rule matching.
  • the third text will contain preset intent labeling information, such as <>, and preset slot labeling information, such as <:contact>.
  • the preset intent labeling information <> indicates that the intent of the input second text is CALL
  • the preset slot labeling information <:contact> indicates that the slot in the input second text is Xiaoming.
  • Fig. 9 is an exemplary FST intent slot model.
  • other preset symbols can also be used as preset intent labeling information and preset slot labeling information according to actual needs, which is not limited here.
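The state-transition walk described above can be illustrated with a very small transducer. The sketch below is an assumption-laden toy, not the model of FIG. 9: the arc layout, the helper names `build_call_fst` and `transduce`, and the English second text "CallDad" are all hypothetical. It only shows how epsilon arcs can emit the intent label and the slot prefix and suffix while ordinary arcs copy input characters to the output.

```python
from typing import Dict, List, Optional, Tuple

EPS = None  # epsilon input: the arc consumes no character
Arc = Tuple[Optional[str], str, int]  # (input symbol, output string, next state)


def build_call_fst() -> Tuple[Dict[int, List[Arc]], int, int]:
    """Build a linear FST for the hypothetical pattern Call + contact 'Dad'."""
    arcs: Dict[int, List[Arc]] = {}
    state = 0

    def add(inp: Optional[str], out: str) -> None:
        nonlocal state
        arcs.setdefault(state, []).append((inp, out, state + 1))
        state += 1

    add(EPS, "<CALL>")          # intent label inserted right after the initial state
    for ch in "Call":
        add(ch, ch)             # identity arcs for the fixed part of the sentence
    add(EPS, "<")               # slot label prefix
    for ch in "Dad":
        add(ch, ch)             # slot content is copied through
    add(EPS, ":contact>")       # slot label suffix
    return arcs, 0, state       # arcs, initial state, end state


def transduce(arcs: Dict[int, List[Arc]], start: int, final: int,
              text: str) -> Optional[str]:
    """Return the labeled output if `text` is accepted, else None."""
    stack = [(start, 0, "")]    # (state, input position, output so far)
    while stack:
        st, pos, out = stack.pop()
        if st == final and pos == len(text):
            return out
        for inp, emitted, nxt in arcs.get(st, []):
            if inp is EPS:
                stack.append((nxt, pos, out + emitted))
            elif pos < len(text) and text[pos] == inp:
                stack.append((nxt, pos + 1, out + emitted))
    return None


if __name__ == "__main__":
    print(transduce(*build_call_fst(), "CallDad"))
```

A production rule set would normally be compiled by an FST toolkit rather than built by hand; the hand-built chain is used here only so that each arc can be compared with the transitions listed above.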
  • the electronic device obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information;
  • After the electronic device obtains the third text containing the preset intent labeling information and the preset slot labeling information, it can extract the intent information and the slot information from the third text according to the positions of the preset intent labeling information and the preset slot labeling information.
  • For example, according to the preset intent labeling information <>, the electronic device can extract the intent information CALL; according to the preset slot labeling information <:contact>, it can extract the slot information Xiaoming.
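Extracting the intent and slots from a labeled third text can be sketched with two regular expressions. The patterns and the function name `extract` are assumptions made for illustration; they presume that intent labels are upper-case identifiers in angle brackets and that slot labels have the form `<value:name>`, which matches the examples in this description but is not stated as a general rule.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed label shapes: <INTENT_NAME> and <slot value:slot_name>
INTENT_RE = re.compile(r"<([A-Z_]+)>")
SLOT_RE = re.compile(r"<([^<>:]+):([A-Za-z_]+)>")


def extract(third_text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, {slot_name: slot_value}) found in the third text."""
    intent_match = INTENT_RE.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = {name: value for value, name in SLOT_RE.findall(third_text)}
    return intent, slots


if __name__ == "__main__":
    print(extract("<CALL>call<Xiaoming:contact>"))
```

Because the labels are plain delimiters inside the text, extraction is a single linear scan and does not need to re-run the FST.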
  • the electronic device outputs the intent information and/or slot information in a structured manner.
  • the electronic device can structure the output of the intent information and the slot information for use by other modules in the electronic device.
  • the structured output may also contain other information, such as the first text, the second text, etc., which are not limited here.
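One possible shape for the structured output is a small record serialized as JSON. The field names (`intent`, `slots`, `first_text`, `second_text`) and the class name `NluResult` are hypothetical; this application does not specify the structure at this point, so the sketch only illustrates the idea of handing other modules a machine-readable result.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class NluResult:
    """Hypothetical container for what the NLU engine hands to other modules."""
    intent: Optional[str]
    slots: Dict[str, str] = field(default_factory=dict)
    first_text: str = ""   # optional extra information
    second_text: str = ""  # optional extra information


def to_json(result: NluResult) -> str:
    # ensure_ascii=False keeps non-ASCII slot values (e.g. Chinese names) readable
    return json.dumps(asdict(result), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    print(to_json(NluResult("CALL", {"contact": "Dad"}, "Call Dad.", "CallDad")))
```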
  • the electronic device uses a preset FST intent slot model to perform intent recognition on user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast owing to the characteristics of FST rule matching. In addition, matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be conveniently extracted once matching is complete. This greatly improves matching speed and accuracy when NLU rules are used for intent recognition.
  • Because the electronic device 100 needs to recognize many different types of intents, a strategy of one FST intent slot model per intent is generally adopted when constructing the FST intent slot models, which can effectively reduce rule conflicts between different intents. When performing rule matching, a multi-intent parallel matching method can be used to minimize the matching delay. Therefore, the electronic device 100 can store a large number of preset FST intent slot models.
  • the preset FST intent slot model may be a preset WFST intent slot model. Since each state transition in WFST will have a weight, the cumulative weight will be obtained after the final matching is completed.
  • the electronic device can extract intent information and slot information from the output result with the highest score.
  • FIG. 11 is a schematic diagram of another process of the intent recognition method according to an embodiment of this application.
  • the corpus producer or researcher compiles the WFST rules of each intent through the front-end rule editor, and adopts the organization method of one intent and one WFST.
  • rule_2: "Open" "<" word+ ":app>"
  • word is a wildcard character, which can match any character, and its weight is 1.
  • the weight for matching other characters is the default weight 0.
  • the intent of WFST rule 1 is: to open the application WeChat.
  • the intent of WFST rule 2 is: to open a short message.
  • the WFST intent slot model file is the WFST intent slot model.
  • FIG. 12 shows WFST intent slot model 1, compiled according to WFST rule 1.
  • the WFST intention slot model 1 includes 10 states. Among them, state 0 represents the initial state, and states 7 and 9 represent the end state.
  • the WFST intent slot model 1 has two paths: 0, 1, 2, 3, 4, 5, 6, 7 and 0, 1, 2, 3, 4, 8, 9:
  • One of the paths is 4, 5, 6, 7:
  • From state 4 to state 8 is to receive an arbitrary character ⁇ any>, output the arbitrary character ⁇ any>, and its weight is 1;
  • From state 8 to state 8 is to receive an arbitrary character ⁇ any>, output the arbitrary character ⁇ any>, and its weight is 1;
  • the calculation method of the cumulative weight of the WFST intention slot model 1 is the addition of the weights in the path.
  • the WFST intent slot model 2 includes 6 states. Among them, state 0 represents the initial state, and state 5 represents the end state.
  • the WFST intent slot model 2 has only one path: 0, 1, 2, 3, 4, 5.
  • the cumulative weight of the WFST intention slot model 2 is calculated by adding the weights in the path.
  • Each output result includes intent identification information, slot identification information and weight;
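The weighted matching described above, with a fixed path of weight 0 and a wildcard path whose arcs each carry weight 1, can be sketched as follows. The state numbers, the English strings "Open" and "WeChat", and the function names are assumptions chosen for illustration; they do not reproduce the states of the model in the figures. The cumulative weight is the sum of the arc weights on the matching path, as stated above.

```python
from typing import Dict, List, Set, Tuple

EPS, ANY = "<eps>", "<any>"           # epsilon input and wildcard input
Arc = Tuple[str, str, float, int]     # (input, output, weight, next state)


def build_open_app_wfst() -> Tuple[Dict[int, List[Arc]], int, Set[int]]:
    """Toy WFST with an exact 'WeChat' branch and a wildcard branch."""
    arcs: Dict[int, List[Arc]] = {}

    def add(src: int, inp: str, out: str, weight: float, dst: int) -> None:
        arcs.setdefault(src, []).append((inp, out, weight, dst))

    def chain(src: int, word: str, first_dst: int) -> int:
        """Add zero-weight identity arcs spelling `word`; return the last state."""
        dst = first_dst
        for ch in word:
            add(src, ch, ch, 0.0, dst)
            src, dst = dst, dst + 1
        return src

    add(0, EPS, "<OPEN_APP>", 0.0, 1)   # intent label
    last = chain(1, "Open", 2)          # states 1..5
    add(last, EPS, "<", 0.0, 6)         # slot prefix
    last = chain(6, "WeChat", 7)        # exact branch, states 6..12
    add(last, EPS, ":app>", 0.0, 13)    # slot suffix, end state 13
    add(6, ANY, ANY, 1.0, 14)           # wildcard branch: first character
    add(14, ANY, ANY, 1.0, 14)          # wildcard self-loop: further characters
    add(14, EPS, ":app>", 0.0, 15)      # slot suffix, end state 15
    return arcs, 0, {13, 15}


def all_matches(arcs: Dict[int, List[Arc]], start: int, finals: Set[int],
                text: str) -> List[Tuple[str, float]]:
    """Return every (output text, cumulative weight) for accepting paths."""
    results = []
    stack = [(start, 0, "", 0.0)]
    while stack:
        state, pos, out, cost = stack.pop()
        if state in finals and pos == len(text):
            results.append((out, cost))
        for inp, emitted, weight, nxt in arcs.get(state, []):
            if inp == EPS:
                stack.append((nxt, pos, out + emitted, cost + weight))
            elif pos < len(text) and (inp == ANY or inp == text[pos]):
                # a wildcard arc copies whatever character it consumed
                copied = text[pos] if inp == ANY else emitted
                stack.append((nxt, pos + 1, out + copied, cost + weight))
    return results


if __name__ == "__main__":
    print(all_matches(*build_open_app_wfst(), "OpenSMS"))
```

In this toy model the input "OpenWeChat" is accepted by both branches with the same output text but different cumulative weights, which is the situation the weights are meant to resolve.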
  • FIG. 14 is a schematic diagram of another process of the intention recognition method in an embodiment of this application.
  • the electronic device loads the preset WFST intent slot model
  • the electronic device can load the preset WFST intent slot model by reading the WFST intent slot model file stored in the electronic device.
  • the electronic device in response to the user's voice input, the electronic device converts the voice input into a first text
  • This step is similar to step S1001 and is not repeated here.
  • the electronic device performs format preprocessing on the first text to obtain the second text
  • This step is similar to step S1002 and is not repeated here.
  • steps S1402 and S1403 can be performed simultaneously with step S1401, can be performed before step S1401, or can be performed after step S1401, which is not limited here.
  • the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain a WFST result;
  • a preset WFST intent slot model is a preset WFST.
  • the preset WFST intent slot model will add preset intent labeling information and/or preset slot labeling information to the input text during the state transition process.
  • the preset intent labeling information is used to label the intent information of the input text.
  • the preset slot labeling information is used to label the slot information in the input text.
  • each state transition has a weight.
  • the WFST result includes the matching text of the second text and the cumulative weight of the matching path.
  • the matching text of the second text is the output text containing the preset intent label information and/or the preset slot label information after successful matching through the matching path.
  • FIG. 15 is an exemplary schematic diagram of performing parallel rule matching on the second text with multiple preset WFST intent slot models in an embodiment of this application.
  • the electronic device adopts a strategy of one WFST intent slot model per intent and stores multiple preset WFST intent slot models, such as CALL.fst, MESSAGE.fst, NAVIGATE.fst, and so on. After the electronic device loads these preset WFST intent slot models, it can perform parallel rule matching on the second text obtained by preprocessing. Because some of the preset WFST intent slot models have wildcards on some paths, the second text may match only one path in one preset WFST intent slot model, or it may successfully match multiple paths across multiple preset WFST intent slot models. Therefore, one WFST result may be obtained, or multiple WFST results may be obtained.
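Parallel matching over one model per intent can be sketched with a thread pool. Everything below is a simplified stand-in: the three matcher functions are hypothetical string checks that only mimic what compiled models such as CALL.fst or MESSAGE.fst would return, and the names `MODELS` and `match_all` are assumptions. In standard CPython, threads do not speed up CPU-bound pure-Python work, so the pool here illustrates the control flow rather than a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

WfstResult = Tuple[str, float]  # (matching text, cumulative weight)


def open_app_model(text: str) -> List[WfstResult]:
    # stand-in for a wildcard rule: each wildcard character costs 1
    if text.startswith("Open") and len(text) > 4:
        app = text[4:]
        return [(f"<OPEN_APP>Open<{app}:app>", float(len(app)))]
    return []


def check_message_model(text: str) -> List[WfstResult]:
    # stand-in for an exact rule with no wildcard, so the weight is 0
    return [("<CHECK_MESSAGE>OpenSMS", 0.0)] if text == "OpenSMS" else []


def call_model(text: str) -> List[WfstResult]:
    if text.startswith("Call") and len(text) > 4:
        name = text[4:]
        return [(f"<CALL>Call<{name}:contact>", float(len(name)))]
    return []


MODELS: Dict[str, Callable[[str], List[WfstResult]]] = {
    "OPEN_APP": open_app_model,
    "CHECK_MESSAGE": check_message_model,
    "CALL": call_model,
}


def match_all(second_text: str) -> List[WfstResult]:
    """Run every intent model on the second text and collect all results."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = [pool.submit(model, second_text) for model in MODELS.values()]
        results: List[WfstResult] = []
        for future in futures:
            results.extend(future.result())
    return results


if __name__ == "__main__":
    print(match_all("OpenSMS"))
```

The input "OpenSMS" yields two results here, one from the wildcard model and one from the exact model, which is the multi-result case that the credibility comparison has to settle.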
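  • when there is one WFST result, the matching text of the second text in that WFST result is the third text, and step S1406 is executed;
  • when there are multiple WFST results, step S1405 needs to be executed, and the WFST result with the highest credibility among them is determined according to the cumulative weight.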
  • weights in the preset WFST intent slot model can be customized, and the calculation method of cumulative weights can also be customized. Therefore, the concept of weight and cumulative weight can be different according to the setting.
  • the calculation method of the weight can be different, and accordingly, the method of determining the WFST result with the highest credibility can also be different, which is not limited here.
  • this is an exemplary schematic diagram of a case where different settings are used for the weight and the cumulative weight in the preset WFST intent slot model in this embodiment of the present application.
  • the following is an exemplary description of these different settings:
  • Case 1 The weight of a state transition that accepts wildcards is greater than that of a state transition that does not accept wildcards, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
  • the weight represents the cost of state transition
  • the cumulative weight represents the cost of matching paths.
  • the greater the cumulative weight the greater the cost of WFST matching the current path, and correspondingly, the lower the credibility. Therefore, when the state transitions, the weight of the state transition that accepts the wildcard is greater than the weight of the state transition that does not accept the wildcard.
  • the WFST intent slot model 1 shown in FIG. 12 and the WFST intent slot model 2 shown in FIG. 13 are WFST intent slot models set according to Case 1. Taking the second text "Open SMS" as an example, when WFST intent slot model 1 and WFST intent slot model 2 are used to perform parallel rule matching on the second text:
  • WFST intent slot model 1 will follow the matching path 0, 1, 2, 3, 6, 10, 10, 11 to get WFST result 1: <OPEN_APP> open <message:app>, Weight: 2.
  • WFST intent slot model 2 will follow the matching path 0, 1, 2, 3, 4, 5 to get WFST result 2: <CHECK_MESSAGE> open the message, Weight: 0.
  • Case 2 The weight of the state transition that accepts the wildcard is greater than the weight of the state transition that does not accept the wildcard, and the greater the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
  • the weight of the state transition can be set to indicate the cost of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
  • the cumulative weight calculation method can be set so that the greater the weight of the state transition on the matching path, the smaller the cumulative weight of the matching path.
  • for example, the cumulative weight may be calculated as the negative of the sum of the state-transition weights on the matching path; other calculation methods may also be used, which is not limited here.
  • Case 3 The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
  • the weight of the state transition can be set to indicate the credibility of the state transition, and the cumulative weight of the matching path indicates the cost of the matching path. The larger the cumulative weight, the higher the cost and the lower the credibility.
  • Case 4 The weight of the state transition that accepts the wildcard is less than the weight of the state transition that does not accept the wildcard, and the smaller the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
  • the weight of the state transition can be set to indicate the credibility of the state transition, and the cumulative weight of the matching path indicates the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
  • the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text;
  • electronic devices can directly use the cumulative weights in the WFST results for comparison. For example, when the cumulative weight represents credibility, it is determined that the higher the cumulative weight, the higher the credibility of the WFST result. When the cumulative weight represents the cost, it is determined that the smaller the cumulative weight, the higher the credibility of the WFST result.
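Direct comparison of cumulative weights can be sketched as below. The function names and the `weight_means` switch are hypothetical; the sketch only encodes the two readings described above (cumulative weight as cost, where smaller is better, and cumulative weight as credibility, where larger is better), plus the negated-sum variant mentioned for Case 2.

```python
from typing import Iterable, List, Optional, Tuple

WfstResult = Tuple[str, float]  # (matching text, cumulative weight)


def cumulative_weight(transition_weights: Iterable[float],
                      negate: bool = False) -> float:
    """Sum the arc weights on a path; optionally negate the sum (Case 2 style)."""
    total = sum(transition_weights)
    return -total if negate else total


def pick_best(results: List[WfstResult],
              weight_means: str = "cost") -> Optional[WfstResult]:
    """Pick the most credible result under the chosen meaning of the weight."""
    if not results:
        return None
    if weight_means == "cost":
        return min(results, key=lambda r: r[1])   # lower cost, higher credibility
    if weight_means == "credibility":
        return max(results, key=lambda r: r[1])   # higher weight, higher credibility
    raise ValueError("weight_means must be 'cost' or 'credibility'")


if __name__ == "__main__":
    demo = [("<OPEN_APP>open<message:app>", 2.0),
            ("<CHECK_MESSAGE>openthemessage", 0.0)]
    print(pick_best(demo, "cost"))
```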
  • Because each preset WFST intent slot model may have a different number of state transitions, the cumulative weights can first be normalized into credibility scores so that the final credibility comparison is fair, and the WFST result with the highest credibility can then be determined according to the credibility score. Specifically:
  • the electronic device performs credibility score calculation on the multiple WFST results
  • the cumulative weight represents the cost of matching paths.
  • a formula for calculating the credibility score can be:
  • w represents the cumulative weight of the matching path in the WFST results that need to be scored
  • w_max represents the largest cumulative weight among the cumulative weights of the matching paths in the multiple WFST results.
  • the score range after the normalized credibility score calculation is [0, 1], the higher the score, the higher the credibility.
  • Take the two results obtained in the above example, WFST result 1: <OPEN_APP> open <message:app>, Weight: 2, and WFST result 2: <CHECK_MESSAGE> open the SMS, Weight: 0, as the inputs to the credibility score calculation:
  • w_max is 2
  • the credibility score calculation formula 1 is an optional credibility score calculation formula when the cumulative weight represents the cost of the matching path. In practical applications, other calculation formulas can also be used to determine the credibility score of the WFST results. It is only necessary to make the higher the score, the higher the credibility.
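The formula itself is not reproduced in this text, so the sketch below does not claim to be credibility score calculation formula 1. It uses one simple normalization, score = 1 - w / w_max, chosen only because it satisfies the properties stated above: the cumulative weight w is a cost, the score lies in [0, 1], and a higher score means higher credibility. The function name and the handling of w_max = 0 are assumptions, and non-negative weights are assumed.

```python
from typing import List, Sequence


def credibility_scores(cumulative_weights: Sequence[float]) -> List[float]:
    """Normalize cost-style cumulative weights into scores in [0, 1].

    Assumed formula (not taken from the source): score = 1 - w / w_max.
    Weights are assumed to be non-negative.
    """
    if not cumulative_weights:
        return []
    w_max = max(cumulative_weights)
    if w_max == 0:
        # every path has zero cost, so all results are equally credible
        return [1.0] * len(cumulative_weights)
    return [1.0 - w / w_max for w in cumulative_weights]


if __name__ == "__main__":
    # WFST result 1 has weight 2 and WFST result 2 has weight 0
    print(credibility_scores([2, 0]))
```

Under this assumed formula the zero-cost result always receives the top score, so the exact-match rule wins over the wildcard rule in the example above.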
  • the electronic device can determine that the matching text "<CHECK_MESSAGE> Open SMS" of the second text in WFST result 2 is the third text.
  • S1406 The electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information;
  • This step is similar to step S1004 and is not repeated here.
  • According to the preset intent labeling information <>, the electronic device can extract the intent information CHECK_MESSAGE from the third text. It is understandable that if the third text contains slot labeling information, the electronic device can also extract the slot information from the third text, which is not limited here.
  • the electronic device outputs the intent information and/or slot information in a structured manner.
  • This step is similar to step S1005 and is not repeated here.
  • the electronic device can structure the intention information:
  • the structured output can be used by other modules in the electronic device.
  • multiple preset WFST intent slot models can be stored in the electronic device.
  • the electronic device can use multiple WFST intent slot models to match user input in parallel. From the multiple WFST results obtained, the WFST result with the highest credibility is determined to extract and output the intention information and slot information. While improving the rate of intent recognition, it also improves the accuracy of intent recognition, and the WFST intent slot model is used for parallel matching, which greatly improves the matching rate and reduces the computing load of electronic devices.
  • the term “when” can be interpreted as meaning “if" or “after” or “in response to determining" or “in response to detecting".
  • the phrase "when it is determined" or "if (a stated condition or event) is detected" can be interpreted as meaning "if it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive).
  • the process can be completed by a computer program instructing relevant hardware.
  • the program can be stored in a computer-readable storage medium. When executed, the program may include the processes of the foregoing method embodiments.
  • the aforementioned storage media include ROM, random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Abstract

An intent recognition method and an electronic device. In the described method, an electronic device converts a voice input into a first text, then uses a preset FST intent slot model to perform rule matching on a second text determined according to the first text, obtains a third text, and then obtains intent information and/or slot information from the third text according to preset intent labeling information and/or preset slot labeling information. By implementing the technical solution provided by the present application, the speed and accuracy of matching are improved when using an NLU rule for intent recognition.

Description

Intent recognition method and electronic device
This application claims priority to Chinese Patent Application No. 202010555603.3, filed with the Chinese Patent Office on June 17, 2020 and entitled "Intent recognition method and electronic device", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of artificial intelligence technologies, and in particular, to an intent recognition method and an electronic device.
Background
Natural language processing (NLP) is a subfield of artificial intelligence (AI). Natural language understanding (NLU) is in turn a subfield of NLP, and one of its most difficult topics. Intent recognition and slot filling are the two most critical NLU tasks, but factors such as the diversity and ambiguity of language, robustness requirements, knowledge dependence, and context make it very difficult for NLU to perform these two tasks well.
Currently, one syntax analysis method relies on a parsing algorithm for context-free grammars (CFG), for example the Cocke-Younger-Kasami (CYK) algorithm. The general-form CFG is first converted into Chomsky normal form (CNF), and the CYK algorithm then performs bottom-up parsing with the resulting CNF grammar. The CYK algorithm uses dynamic programming: starting from the input words, it reduces step by step according to the grammar productions until the initial state is reached, which completes the syntax analysis. Once the analysis is complete, the whole analysis path yields a parse tree, from which the user can extract syntactic features and obtain the desired information, such as parts of speech, entities, and sentence constituents.
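For reference, the CYK procedure mentioned above can be sketched as a short recognizer over a grammar already in Chomsky normal form. The tiny grammar (S -> V N, V -> "call", N -> "dad") and the function name `cyk_recognize` are illustrative assumptions; the sketch only decides whether a word sequence is derivable and does not build the parse tree.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def cyk_recognize(words: List[str],
                  unary: Dict[str, Set[str]],
                  binary: Dict[Tuple[str, str], Set[str]],
                  start: str = "S") -> bool:
    """Return True if `words` can be derived from `start`.

    unary maps a terminal word to the nonterminals A with a rule A -> word.
    binary maps a pair (B, C) to the nonterminals A with a rule A -> B C.
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(unary.get(word, ()))
    for span in range(2, n + 1):              # length of the substring
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):             # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((left, right), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    UNARY = {"call": {"V"}, "dad": {"N"}}
    BINARY = {("V", "N"): {"S"}}
    print(cyk_recognize(["call", "dad"], UNARY, BINARY))
```

The three nested loops over span, start position, and split point give the cubic dependence on sentence length, and every cell also iterates over applicable rules, which is one way to see why parsing slows down as the rule set grows.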
Using a CFG parsing algorithm to obtain the intent and slots for intent recognition is feasible when the number of grammar rules is small. When the number of grammar rules is large, however, parsing speed suffers considerably: parsing becomes very slow and may even make the parsing service unavailable.
Summary
This application provides an intent recognition method and an electronic device, to improve matching speed and accuracy when NLU rules are used for intent recognition.
According to a first aspect, this application provides an intent recognition method. The method includes: in response to a user's voice input, an electronic device converts the voice input into a first text; the electronic device uses a preset FST intent slot model to perform rule matching on a second text to obtain a third text, where the second text is determined according to the first text, the preset FST intent slot model is a preset FST, and the third text includes preset intent labeling information and/or preset slot labeling information; the preset intent labeling information is used to label the intent information of the second text, and the preset slot labeling information is used to label the slot information in the second text; and the electronic device obtains intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
In the foregoing embodiment, the electronic device uses the preset FST intent slot model to perform intent recognition on user input. Because the preset FST intent slot model is an FST, rule matching is extremely fast owing to the characteristics of FST rule matching. In addition, matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be conveniently extracted once matching is complete. This greatly improves matching speed and accuracy when NLU rules are used for intent recognition.
With reference to some embodiments of the first aspect, in some embodiments, before the step in which the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device performs format preprocessing on the first text to obtain the second text, where the second text contains no more format characters than the first text.
In the foregoing embodiment, the electronic device may first perform format preprocessing on the first text to obtain the second text. This reduces the complexity of the preset FST intent slot model used for FST rule matching on the second text and further improves the matching speed.
With reference to some embodiments of the first aspect, in some embodiments, after the step in which the electronic device obtains the intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information, the method further includes: the electronic device outputs the intent information and/or slot information in a structured manner.
In the foregoing embodiment, the electronic device can output the intent information and/or slot information in a structured manner, which makes it easier for other modules in the electronic device to use the intent information and/or slot information.
With reference to some embodiments of the first aspect, in some embodiments, the preset FST intent slot model is a preset WFST intent slot model. A preset WFST intent slot model is a preset WFST, in which every state transition carries a weight.
In the above embodiment, the preset FST intent slot model is a preset WFST intent slot model, so that the weight of each match can be obtained during parallel matching, which facilitates filtering of the matching results.
With reference to some embodiments of the first aspect, in some embodiments, the electronic device using the preset FST intent slot model to perform rule matching on the second text to obtain the third text specifically includes: the electronic device uses multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matched text of the second text and the cumulative weight of the matching path, and the matched text of the second text is the text, containing preset intent annotation information and/or preset slot annotation information, that is output after a successful match along that matching path; when there is one WFST result, the electronic device determines that the matched text of the second text in that WFST result is the third text; when there are multiple WFST results, the electronic device determines that the matched text of the second text in the WFST result with the highest credibility is the third text.
In the above embodiment, the electronic device determines, as the third text, the matched text of the second text in the WFST result with the highest credibility after parallel rule matching. This improves the efficiency of parallel matching while ensuring the accuracy of intent recognition.
With reference to some embodiments of the first aspect, in some embodiments, when there are multiple WFST results, the electronic device determining that the matched text of the second text in the WFST result with the highest credibility is the third text specifically includes: the electronic device calculates a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in those results; the electronic device determines that the matched text of the second text in the WFST result with the highest credibility score is the third text.
In the above embodiment, the electronic device determines, as the third text, the matched text of the second text in the WFST result with the highest credibility score, which improves the accuracy of the credibility evaluation.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not accept a wildcard, and the greater the weight of each state transition on a matching path, the greater the cumulative weight of that matching path.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the cumulative weight of a matching path is equal to the sum of the weights of the state transitions on that matching path.
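As a minimal sketch of the additive embodiment above, the following Python snippet sums transition weights along a matching path. The path contents and the weight values (1.0 for a literal transition, 2.0 for a wildcard transition) are hypothetical and chosen only so that the wildcard transition weighs more than the literal one; the application does not specify them here.

```python
# Each matching path is a list of (input_symbol, weight) state transitions.
# Symbols and weight values are invented for illustration only.
def cumulative_weight(path):
    """Return the sum of the transition weights along one matching path."""
    return sum(weight for _symbol, weight in path)

# A path made only of literal transitions (hypothetical weight 1.0 each).
literal_path = [("play", 1.0), ("music", 1.0)]

# A path whose second transition accepts a wildcard (hypothetical weight 2.0).
wildcard_path = [("play", 1.0), ("<wildcard>", 2.0)]

literal_total = cumulative_weight(literal_path)
wildcard_total = cumulative_weight(wildcard_path)
```

With these assumed values, the path that goes through the wildcard transition accumulates the larger weight. How the cumulative weight maps to a credibility score is given by formula 1 and is not reproduced in this sketch.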
With reference to some embodiments of the first aspect, in some embodiments, the electronic device calculating credibility scores for the multiple WFST results according to the cumulative weights of the matching paths in those results specifically includes: the electronic device uses credibility score calculation formula 1 to calculate the credibility scores of the multiple WFST results;
Credibility score calculation formula 1:
Figure PCTCN2021100475-appb-000001
where w denotes the cumulative weight of the matching path in the WFST result to be scored, and w_max denotes the largest of the cumulative weights of the matching paths in the multiple WFST results.
With reference to some embodiments of the first aspect, in some embodiments, before the step in which the electronic device uses the preset FST intent slot model to perform rule matching on the second text to obtain the third text, the method further includes: the electronic device loads the preset WFST intent slot model.
In a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes one or more processors and a memory; the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform the following: in response to a voice input from a user, converting the voice input into a first text; performing rule matching on a second text using a preset FST intent slot model to obtain a third text, where the second text is determined according to the first text, the preset FST intent slot model is a preset FST, the third text includes preset intent annotation information and/or preset slot annotation information, the preset intent annotation information is used to mark the intent information of the second text, and the preset slot annotation information is used to mark the slot information in the second text; and obtaining the intent information and/or slot information from the third text according to the preset intent annotation information and/or the preset slot annotation information.
In the above embodiment, the electronic device uses the preset FST intent slot model to perform intent recognition on the user input. Because the preset FST intent slot model is itself an FST, rule matching is extremely fast. In addition, matching yields a third text carrying preset intent annotation information and preset slot annotation information, from which the intent information and slot information can be conveniently extracted once matching completes. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: format preprocessing on the first text to obtain the second text, where the number of format characters in the second text is less than or equal to the number of format characters in the first text.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: outputting the intent information and/or slot information in a structured form.
With reference to some embodiments of the first aspect, in some embodiments, the preset FST intent slot model is a preset WFST intent slot model. A preset WFST intent slot model is a preset WFST, in which every state transition carries a weight.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: using multiple preset WFST intent slot models to perform parallel rule matching on the second text to obtain WFST results, where each WFST result includes a matched text of the second text and the cumulative weight of the matching path, and the matched text of the second text is the text, containing preset intent annotation information and/or preset slot annotation information, that is output after a successful match along that matching path; when there is one WFST result, determining that the matched text of the second text in that WFST result is the third text; when there are multiple WFST results, determining that the matched text of the second text in the WFST result with the highest credibility is the third text.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: when there are multiple WFST results, calculating a credibility score for each of the multiple WFST results according to the cumulative weights of the matching paths in those results; and determining that the matched text of the second text in the WFST result with the highest credibility score is the third text.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not accept a wildcard, and the greater the weight of each state transition on a matching path, the greater the cumulative weight of that matching path.
With reference to some embodiments of the first aspect, in some embodiments, in the preset WFST intent slot model, the cumulative weight of a matching path is equal to the sum of the weights of the state transitions on that matching path.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: using credibility score calculation formula 1 to calculate the credibility scores of the multiple WFST results;
Credibility score calculation formula 1:
Figure PCTCN2021100475-appb-000002
where w denotes the cumulative weight of the matching path in the WFST result to be scored, and w_max denotes the largest of the cumulative weights of the matching paths in the multiple WFST results.
With reference to some embodiments of the first aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: loading the preset WFST intent slot model.
In a third aspect, an embodiment of the present application provides a chip system applied to an electronic device. The chip system includes one or more processors, and the processors are configured to invoke computer instructions to cause the electronic device to perform the method described in the first aspect or any possible implementation of the first aspect.
It can be understood that the chip system may include one processor 110 of the electronic device 100 shown in FIG. 5, or may include multiple processors 110 of the electronic device 100 shown in FIG. 5, which is not limited here.
In a fourth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions run on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.
It can be understood that the electronic device provided in the second aspect, the chip system provided in the third aspect, the computer program product provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to perform the method provided in the embodiments of the present application. For their beneficial effects, refer to the beneficial effects of the corresponding method; details are not repeated here.
Description of the Drawings
FIG. 1 is a schematic diagram of the relationship between intents and slots;
FIG. 2 is an exemplary schematic diagram of an FSA;
FIG. 3 is an exemplary schematic diagram of an FST;
FIG. 4 is an exemplary schematic diagram of a WFST;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 6 is a block diagram of the software structure of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic architecture diagram of the intent recognition module in an embodiment of the present application;
FIG. 8 is a schematic diagram of a usage scenario of the intent recognition method in an embodiment of the present application;
FIG. 9 is an exemplary schematic diagram of an FST intent slot model in an embodiment of the present application;
FIG. 10 is a schematic flowchart of the intent recognition method in an embodiment of the present application;
FIG. 11 is another schematic flowchart of the intent recognition method in an embodiment of the present application;
FIG. 12 is an exemplary schematic diagram of a WFST intent slot model in an embodiment of the present application;
FIG. 13 is another exemplary schematic diagram of a WFST intent slot model in an embodiment of the present application;
FIG. 14 is another schematic flowchart of the intent recognition method in an embodiment of the present application;
FIG. 15 is an exemplary schematic diagram of parallel rule matching on the second text using multiple preset WFST intent slot models in an embodiment of the present application;
FIG. 16 is an exemplary schematic diagram of a case in which different settings are used for the weights and cumulative weights in the preset WFST intent slot model in an embodiment of the present application.
Detailed Description
The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. As used in the specification and the appended claims of this application, the singular forms "a", "an", "said", "above", "the" and "this" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in this application refers to and covers any or all possible combinations of one or more of the listed items.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly specifying the number of the indicated technical features. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
Because the embodiments of the present application involve the application of intent recognition technology, the related terms and concepts are introduced first for ease of understanding.
(1) Intent and slot:
1.1. Definition of intent and slot:
Intent refers to what the electronic device identifies as the user's actual or potential need. Fundamentally, intent recognition is a classifier that assigns a user need to a certain type.
An intent and its slots together constitute a "user action". An electronic device cannot understand natural language directly, so the role of intent recognition is to map natural language to a structured semantic representation that a machine can understand.
Intent recognition, also known as SUC (Spoken Utterance Classification), classifies the natural-language utterance input by the user, as the name suggests; the resulting class corresponds to the user's intent. For example, the intent of "what's the weather today" is "ask about the weather". Intent recognition can therefore naturally be regarded as a typical classification problem. As an example, the classification and definition of intents may follow the ISO-24617-2 standard, which contains 56 detailed definitions. The definition of intents depends heavily on the positioning of the system and on the knowledge base it has; that is, intent definitions are strongly domain-dependent. It can be understood that, in the embodiments of the present application, the classification and definition of intents are not limited to the ISO-24617-2 standard.
A slot is a parameter carried by an intent. One intent may correspond to several slots. For example, when asking about a bus route, necessary parameters such as the departure place, destination, and time must be given. These parameters are the slots corresponding to the intent "ask about a bus route".
For example, the main goal of the semantic slot filling task is, given the semantic frame of a specific domain or specific intent, to extract from the input sentence the values of the semantic slots predefined in that frame. The semantic slot filling task can be converted into a sequence labeling task: using the classic IOB notation, each word is marked as the beginning (begin) of a semantic slot, the continuation (inside) of a semantic slot, or not part of any semantic slot (outside).
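The IOB labeling described above can be illustrated with a small decoder that turns a tagged token sequence back into slot values. This is a sketch under stated assumptions: the sentence, the tag strings (`B-<slot>`, `I-<slot>`, `O`), and the slot names `Location` and `Date` are illustrative choices, not taken from the application.

```python
def decode_iob(tokens, tags):
    """Collect (slot_name, value) pairs from tokens labeled with IOB tags."""
    slots = []
    current_name, current_tokens = None, []

    def close_current():
        # Emit the slot that is currently open, if any.
        if current_name is not None:
            slots.append((current_name, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; finish any slot that was open.
            close_current()
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the slot that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag): the token is outside any slot.
            close_current()
            current_name, current_tokens = None, []
    close_current()
    return slots

# Hypothetical example sentence and labels.
tokens = ["weather", "in", "New", "York", "today"]
tags = ["O", "O", "B-Location", "I-Location", "B-Date"]
decoded = decode_iob(tokens, tags)
```

Here `decoded` holds the two slot values recovered from the tags, with the two-word location joined into a single value.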
For a system to work properly, its intents and slots must be designed first. Intents and slots tell the system which specific task to perform and which types of parameters are needed to perform it.
Taking a concrete "ask about the weather" requirement as an example, the design of intents and slots in a task-oriented dialogue system is as follows:
User input example: "How is the weather in Shanghai today";
User intent definition: ask about the weather, Ask_Weather;
Slot definition: slot 1: time, Date; slot 2: place, Location.
FIG. 1 is a schematic diagram of the relationship between intents and slots in an embodiment of the present application. As shown in (a) of FIG. 1, two necessary slots, "time" and "place", are defined for the "ask about the weather" task in this example. For a single task, this definition is sufficient. In a real business environment, however, a system often needs to handle several tasks at once; for example, a weather service should be able to answer not only "ask about the weather" questions but also "ask about the temperature" questions.
For the more complex case in which one system handles multiple tasks, one optimization strategy is to define a higher-level domain, for example by assigning both the "ask about the weather" intent and the "ask about the temperature" intent to the "weather" domain. In this case, a domain can simply be understood as a set of intents. The advantage of defining domains and performing domain recognition first is that it constrains the scope of domain knowledge and reduces the search space for subsequent intent recognition and slot filling. In addition, a deeper understanding of each domain, making good use of task-specific and domain-specific knowledge and features, can often significantly improve natural language understanding (NLU). Accordingly, the example in (a) of FIG. 1 is improved by adding the "weather" domain:
User input examples:
1. "How is the weather in Shanghai today";
2. "What is the temperature in Shanghai right now";
Domain definition: weather, Weather;
User intent definitions:
1. Ask about the weather, Ask_Weather;
2. Ask about the temperature, Ask_Temperature;
Slot definitions:
Slot 1: time, Date;
Slot 2: place, Location.
The intents and slots corresponding to the improved "ask about the weather" requirement are shown in (b) of FIG. 1.
1.2. Intent recognition and slot filling:
Once the intents and slots have been defined, the user intent and the slot values of the corresponding slots can be identified from the user input.
The goal of intent recognition is to identify the user intent from the input. A single task can simply be modeled as a binary classification problem; for example, recognizing the "ask about the weather" intent can be modeled as deciding between "is asking about the weather" and "is not asking about the weather". When the system has to handle multiple tasks, it must be able to distinguish between the intents, and the binary classification problem becomes a multi-class classification problem.
The task of slot filling is to extract information from the data and fill it into the predefined slots. For example, with the intents and slots defined in FIG. 1, for the user input "How is the weather in Shanghai today" the system should be able to extract "today" and "Shanghai" and fill them into the "time" and "place" slots respectively.
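The slot filling step above can be sketched with a simple lexicon lookup that produces a structured frame. This is only an illustration under stated assumptions: the lexicons `DATE_WORDS` and `LOCATION_WORDS` are hypothetical, the intent is assumed to have been recognized already, and the lookup does not represent the FST-based matching that this application describes.

```python
# Hypothetical lexicons; a real system would use far richer resources.
DATE_WORDS = ("today", "tomorrow")
LOCATION_WORDS = ("Shanghai", "Beijing")

def first_match(utterance, candidates):
    """Return the first candidate word found in the utterance, or None."""
    for word in candidates:
        if word in utterance:
            return word
    return None

def fill_slots(utterance, intent="Ask_Weather"):
    """Build a structured frame; the intent is assumed to be known already."""
    slots = {}
    date = first_match(utterance, DATE_WORDS)
    location = first_match(utterance, LOCATION_WORDS)
    if date is not None:
        slots["Date"] = date
    if location is not None:
        slots["Location"] = location
    return {"domain": "Weather", "intent": intent, "slots": slots}

frame = fill_slots("How is the weather in Shanghai today")
```

The resulting `frame` pairs the `Ask_Weather` intent with the `Date` and `Location` slot values, which is the kind of structured representation that other modules can consume.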
(2) Finite state transducer (FST):
FSTs are widely used in speech recognition and in natural language search and processing. In natural language processing, for example, text often has to be modified according to rules. One such rule might be: if c is immediately followed by x in a string, change c to b. An FST is built on mathematical operations over such rules; it merges several rules into a single large rule that is applied in one pass, which effectively improves the efficiency of a rule-based system.
To make FSTs easier to understand, the finite state acceptor (FSA) is introduced first:
For a given input sequence, an FSA returns one of two results: "accept" or "reject".
FIG. 2 is an exemplary schematic diagram of an FSA, in which the nodes correspond to states and the arcs correspond to state transitions. In the FSA shown in FIG. 2, state 0 is the initial state and state 5 is the final state. The transition from state 0 to state 1 accepts the character a, from state 1 to state 1 accepts b, from state 1 to state 2 accepts c, from state 2 to state 5 accepts d, from state 0 to state 3 accepts b, from state 3 to state 4 accepts c, from state 4 to state 4 accepts d, and from state 4 to state 5 accepts e. The regular expression corresponding to the FSA in FIG. 2 is ab*cd|bcd*e, where * means that the preceding character may be repeated any number of times.
For example, the FSA shown in FIG. 2 can accept the symbol sequence "a, b, c, d" along the path 0, 1, 1, 2, 5, in which case the FSA returns "accept". As another example, if the sequence "a, b, d" is input to the FSA in FIG. 2, no path in the FSA produces that sequence, so the FSA returns "reject".
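The FSA of FIG. 2 can be written down as a transition table and simulated in a few lines. The sketch below encodes exactly the transitions listed in the text; the names `FSA_TRANSITIONS` and `accepts` are illustrative.

```python
# (current_state, input_symbol) -> next_state, as listed for FIG. 2.
FSA_TRANSITIONS = {
    (0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "d"): 5,
    (0, "b"): 3, (3, "c"): 4, (4, "d"): 4, (4, "e"): 5,
}
FSA_INITIAL_STATE = 0
FSA_FINAL_STATES = {5}

def accepts(symbols):
    """Return True if the FSA accepts the symbol sequence, else False."""
    state = FSA_INITIAL_STATE
    for symbol in symbols:
        key = (state, symbol)
        if key not in FSA_TRANSITIONS:
            return False  # no arc for this symbol: reject
        state = FSA_TRANSITIONS[key]
    # Accept only if the whole input was consumed in a final state.
    return state in FSA_FINAL_STATES

accepted = accepts("abcd")   # follows the path 0, 1, 1, 2, 5
rejected = accepts("abd")    # state 1 has no arc labeled d
```

The same table also accepts other strings matched by ab*cd|bcd*e, such as "acd" (zero repetitions of b) and "bcdde".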
An FST is an extension of an FSA: each state transition carries an output label in addition to its input label, and the two together are called an input-output label pair. FIG. 3 is an exemplary schematic diagram of an FST. In the FST shown in FIG. 3, state 0 is the initial state and state 5 is the final state. The input-output label pair from state 0 to state 1 is a:z; from state 1 to state 1 it is b:y; from state 1 to state 2 it is c:x; from state 2 to state 5 it is d:w; from state 0 to state 3 it is b:y; from state 3 to state 4 it is c:x; from state 4 to state 4 it is d:w; and from state 4 to state 5 it is e:v.
With such label pairs, an FST can describe a set of rule-based conversions, or the conversion of one set of symbol sequences into another set of symbol sequences. In FIG. 3, for example, the input symbol sequence "a, b, c, d" follows the path 0, 1, 1, 2, 5; because a is converted to z, b to y, c to x, and d to w, the result is another symbol sequence, "z, y, x, w".
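The FST of FIG. 3 differs from the FSA only in that each arc also emits an output symbol. A minimal sketch, using the label pairs listed in the text (the names `FST_ARCS` and `transduce` are illustrative):

```python
# (current_state, input_symbol) -> (next_state, output_symbol), as in FIG. 3.
FST_ARCS = {
    (0, "a"): (1, "z"), (1, "b"): (1, "y"), (1, "c"): (2, "x"), (2, "d"): (5, "w"),
    (0, "b"): (3, "y"), (3, "c"): (4, "x"), (4, "d"): (4, "w"), (4, "e"): (5, "v"),
}
FST_INITIAL_STATE = 0
FST_FINAL_STATES = {5}

def transduce(symbols):
    """Return the output string for an accepted input, or None if rejected."""
    state = FST_INITIAL_STATE
    output = []
    for symbol in symbols:
        arc = FST_ARCS.get((state, symbol))
        if arc is None:
            return None  # no matching arc: the input is not accepted
        state, emitted = arc
        output.append(emitted)
    return "".join(output) if state in FST_FINAL_STATES else None

converted = transduce("abcd")  # path 0, 1, 1, 2, 5
```

For the input "a, b, c, d" the sketch reproduces the conversion described above, and it returns `None` for inputs that no path accepts.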
An FST is an efficient data structure whose underlying theory is graph theory. FSTs are divided into non-deterministic FSTs and deterministic FSTs.
A deterministic FST is a 7-tuple (Q, Σ, Γ, δ, ω, q_0, F), where:
(Definition 1) Q is a finite set of states;
(Definition 2) Σ is a finite set of input characters (the input alphabet);
(Definition 3) Γ is a finite set of output characters (the output alphabet);
(Definition 4) δ: Q×Σ→P(Q) is the state transition function;
(Definition 5) ω: Q×Σ→Γ is the output function;
(Definition 6) q_0 ∈ Q is the initial state;
(Definition 7) F ⊆ Q is the set of accepting states.
A non-deterministic FST is also a 7-tuple, but it may have several choices when making a state transition, so Definition 4 and Definition 5 differ from those of the deterministic FST. In a non-deterministic FST, δ and ω are defined as follows:
(Definition 4) δ: Q×(Σ∪{ε})→P(Q) is the state transition function;
(Definition 5) ω: Q×(Σ∪{ε})×Q→Γ* is the output function.
As the definitions above indicate, an FST has multiple states and multiple state transitions. A matching operation starts from the initial state, passes through a number of states, and is complete when it reaches a final (accepting) state.
Typical features of an FST are flexible use, high matching efficiency, and low memory overhead.
For example, consider three rules:
1. When c is immediately followed by x, change c to b: cx→bx;
2. When a is preceded by rs, change a to b: rsa→rsb;
3. When b is preceded by rs and followed by xy, change b to a: rsbxy→rsaxy.
When the input string is rsaxyrscxy, applying the three rules in order transforms it as follows:
1. rsaxyrscxy→rsaxyrsbxy;
2. rsaxyrsbxy→rsbxyrsbxy;
3. rsbxyrsbxy→rsaxyrsaxy.
Applying the rules one at a time in this way is inefficient: the change made in step 2 is reverted in step 3, so step 2 accomplishes nothing. An FST provides a way to eliminate this wasted work. Once the rules are combined into a single FST, that FST has the same effect as the three rules but needs only one traversal of the input, with no redundant conversions. The time taken by the rule-by-rule approach depends on the number of rules, their lengths, and the number of input characters, whereas the time taken with the FST depends only on the number of characters in the input string.
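The rule-by-rule procedure can be reproduced with plain string replacement, which makes the redundant work visible. The sketch below is an illustration only: it does not build the combined FST, it uses Python's str.replace as a stand-in for a rule engine, and the names RULES and apply_rules_in_sequence are hypothetical.

```python
# Each rule is (pattern, replacement), in the order given in the text.
RULES = [("cx", "bx"), ("rsa", "rsb"), ("rsbxy", "rsaxy")]


def apply_rules_in_sequence(text, rules=RULES):
    """Apply each rule over the whole string in turn and record every intermediate result."""
    steps = [text]
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)  # one full pass over the string per rule
        steps.append(text)
    return steps


for step in apply_rules_in_sequence("rsaxyrscxy"):
    print(step)
```

The recorded steps show one full pass per rule, and the first five characters return to "rsaxy" after the third pass, which is the wasted work that a single combined FST avoids.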
(3) Weighted finite state transducer (WFST):
A WFST is a type of FST in which every state transition has a weight, every initial state has an initial weight, and every final state has a final weight. A weight is generally the probability or cost of a transition or of an initial/final state. Weights are accumulated along each path and are also accumulated across different paths.
How the cumulative weight is calculated can be specified by the particular WFST. For example, the cumulative weight may be the product of all weights on the path, the sum of all weights on the path, or the result of some other calculation over the weights on the path; this is not limited here.
Figure 4 is an exemplary schematic diagram of a WFST. Each state transition is labeled in the form "input label:output label/weight", and the initial state and the final state also have corresponding weights. In the WFST shown in Figure 4, the cumulative weight is the product of all weights on the path. If the character sequence "a, b, c, d" is input, it is converted into the character sequence "z, y, x, w" along the path 0, 1, 1, 2, 5, with a cumulative weight of 0.5*1.2*0.7*3*2*0.1=0.252.
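The cumulative weight in this example can be checked with a few lines of Python. The sketch assumes only the six factors quoted in the text (initial weight, four transition weights, final weight); the helper name accumulate is hypothetical, and the additive variant is shown only because the text mentions summation as an alternative.

```python
import operator
from functools import reduce


def accumulate(weights, combine=operator.mul):
    """Combine the weights met along one path with the given binary operation."""
    return reduce(combine, weights)


# Initial weight, the four transition weights on path 0, 1, 1, 2, 5, and the final weight.
path_weights = [0.5, 1.2, 0.7, 3, 2, 0.1]

product = accumulate(path_weights)             # multiplicative accumulation, as in Figure 4
total = accumulate(path_weights, operator.add)  # additive accumulation, for comparison

print(round(product, 6), round(total, 6))
```

Multiplying the factors gives approximately 0.252, in agreement with the value stated above; the two accumulation modes give different results for the same path, which is why the WFST must specify which one applies.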
A WFST can be defined by an 8-tuple (Σ, Λ, Q, I, F, E, λ, ρ):
(Definition 1) Σ is a finite set of input labels;
(Definition 2) Λ is a finite set of output labels;
(Definition 3) Q is a finite set of states;
(Definition 4) I⊆Q is a set of initial states;
(Definition 5) F⊆Q is a set of final states;
(Definition 6) E⊆Q×(Σ∪{ε})×(Λ∪{ε})×K×Q is a finite set of transitions, where ε is a meta-symbol label that represents an empty input or output, and K is a set of weight elements;
(Definition 7) λ: I→K is the initial weight function;
(Definition 8) ρ: F→K is the final weight function.
By way of example, with the above definition the WFST shown in Figure 4 can be defined as follows:
(Definition 1) Σ={a,b,c,d,e};
(Definition 2) Λ={v,x,y,w,z};
(Definition 3) Q={0,1,2,3,4,5};
(Definition 4) I={0};
(Definition 5) F={5};
(Definition 6) E is the set of transitions given in Figure PCTCN2021100475-appb-000007, where each transition in E consists of (source state, input label, output label, weight, target state);
(Definition 7) λ(0)=0.5;
(Definition 8) ρ(5)=0.1.
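The 8-tuple maps naturally onto a small record type. The Python sketch below is one possible representation, not the structure used by this application: the class name WFST, its field names and the transduce method are hypothetical. It assumes a single initial state and multiplicative accumulation, and it lists only the four transitions whose weights can be read off the worked example (assuming the factors 1.2, 0.7, 3 and 2 appear in path order); the weights of the remaining transitions are not stated in the text and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WFST:
    input_labels: frozenset    # Σ
    output_labels: frozenset   # Λ
    states: frozenset          # Q
    initial_states: frozenset  # I
    final_states: frozenset    # F
    transitions: tuple         # E: (source, input label, output label, weight, target)
    initial_weight: dict       # λ
    final_weight: dict         # ρ

    def transduce(self, symbols):
        """Return (output symbols, cumulative weight), or None if the input is rejected."""
        (state,) = self.initial_states  # sketch assumes exactly one initial state
        weight = self.initial_weight[state]
        output = []
        for symbol in symbols:
            arc = next((t for t in self.transitions
                        if t[0] == state and t[1] == symbol), None)
            if arc is None:
                return None
            _, _, out, arc_weight, state = arc
            output.append(out)
            weight *= arc_weight
        if state not in self.final_states:
            return None
        return output, weight * self.final_weight[state]


# Partial encoding of the Figure 4 example: only the path 0, 1, 1, 2, 5 is included.
figure4_partial = WFST(
    input_labels=frozenset("abcde"),
    output_labels=frozenset("vxywz"),
    states=frozenset(range(6)),
    initial_states=frozenset({0}),
    final_states=frozenset({5}),
    transitions=((0, "a", "z", 1.2, 1), (1, "b", "y", 0.7, 1),
                 (1, "c", "x", 3.0, 2), (2, "d", "w", 2.0, 5)),
    initial_weight={0: 0.5},
    final_weight={5: 0.1},
)

print(figure4_partial.transduce("abcd"))
```

Storing E as 5-tuples keeps the code aligned with the order (source state, input label, output label, weight, target state) given above; a production implementation would more likely index transitions by source state rather than scan the whole tuple.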
The following first describes an exemplary electronic device 100 provided in an embodiment of this application.
Figure 5 is a schematic structural diagram of the electronic device 100 provided in an embodiment of this application.
The embodiments are described in detail below using the electronic device 100 as an example. It should be understood that the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing circuits and/or application-specific integrated circuits.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, may combine some components, may split some components, or may use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals, to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory can hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate over the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial and parallel form. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, to implement the function of playing music through a Bluetooth headset.
The MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface can be configured through software. The GPIO interface can be configured to carry control signals or data signals. In some embodiments, the GPIO interface can be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The SIM interface can be used to communicate with the SIM card interface 195, to transfer data to a SIM card or read data from a SIM card.
The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 can be used to connect a charger to charge the electronic device 100, or to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through them. The interface can further be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in the embodiment of the present invention are merely illustrative and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also use interface connection modes different from those in the foregoing embodiment, or a combination of multiple interface connection modes.
The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger.
The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and pass them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves that are radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves through the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves that are radiated through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP, which converts it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform or similar operation on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record video in multiple encoding formats, for example moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also learn continuously. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos on the external memory card.
The internal memory 121 can be used to store computer-executable program code, which includes instructions. The processor 110 executes the various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by at least one function (such as a face recognition function, a fingerprint recognition function, or a mobile payment function). The data storage area can store data created during use of the electronic device 100 (such as face information template data and fingerprint information templates). In addition, the internal memory 121 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can be used to listen to music or to a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or plays a voice message, the voice can be heard by holding the receiver 170B close to the ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。The microphone 170C, also called "microphone", "microphone", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can make a sound by approaching the microphone 170C through the human mouth, and input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
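The threshold behavior described for the short message application icon can be illustrated with a minimal Python sketch. The threshold value, the `TouchEvent` type, and the instruction names are hypothetical and are introduced only for illustration; the description does not specify them.

```python
from dataclasses import dataclass

# Hypothetical normalized value; the description only names a "first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


@dataclass
class TouchEvent:
    target: str       # control under the touch, e.g. "sms_icon" (hypothetical name)
    intensity: float  # touch intensity derived from the capacitance change


def instruction_for(event: TouchEvent) -> str:
    """Choose an instruction for a touch on the short message icon by its intensity."""
    if event.target != "sms_icon":
        return "ignore"
    if event.intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"
    # Intensity greater than or equal to the threshold.
    return "create_short_message"
```

A light touch such as `TouchEvent("sms_icon", 0.2)` maps to viewing the message, while a touch at or above the threshold maps to creating a new one.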
The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle by which the electronic device 100 shakes, the distance that the lens module needs to compensate is calculated from that angle, and the lens counteracts the shake of the electronic device 100 by moving in the opposite direction, thereby achieving image stabilization. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
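The description does not state how the altitude is derived from the measured air pressure. One commonly used approximation is the international barometric formula, sketched below in Python purely as an illustration; the function name and the default sea-level reference pressure are assumptions, not part of this application.

```python
def altitude_m(pressure_hpa: float, sea_level_hpa: float = 1013.25) -> float:
    """Estimate altitude in meters from air pressure (hPa).

    Uses the international barometric formula; sea_level_hpa is an assumed
    reference pressure and in practice varies with the weather.
    """
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Lower measured pressure yields a higher estimated altitude, and a reading equal to the reference pressure yields zero.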
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip cover case. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover by means of the magnetic sensor 180D. Features such as automatic unlocking on opening can then be set according to the detected open or closed state of the case or of the flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to identify the posture of the electronic device, for applications such as switching between landscape and portrait display and pedometers.
The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance in order to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so that the screen is turned off automatically to save power. The proximity light sensor 180G can also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L is used to sense the brightness of the ambient light. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the characteristics of the collected fingerprint to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing an abnormal shutdown of the electronic device 100. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
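The temperature processing strategy can be sketched as a simple threshold check. In the Python sketch below, all three threshold values and the action names are hypothetical, and combining the three embodiments into a single function is an assumption made only for illustration.

```python
from typing import List

# Hypothetical thresholds in degrees Celsius; the description gives no values.
HIGH_TEMP_C = 45.0
HEAT_BATTERY_BELOW_C = 0.0
BOOST_VOLTAGE_BELOW_C = -10.0


def thermal_actions(temp_c: float) -> List[str]:
    """Return the protective actions that apply at the reported temperature."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        # Reduce the performance of the processor near the sensor.
        actions.append("throttle_processor")
    if temp_c < HEAT_BATTERY_BELOW_C:
        actions.append("heat_battery")
    if temp_c < BOOST_VOLTAGE_BELOW_C:
        actions.append("boost_battery_output_voltage")
    return actions
```

With these assumed values, a normal reading such as 20 degrees triggers no action, while a very low reading triggers both low-temperature measures.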
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen. The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor can pass a detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device 100, at a position different from that of the display screen 194.
The button 190 includes a power button, a volume button, and so on. The button 190 may be a mechanical button or a touch button. The electronic device 100 may receive button input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration prompts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing or audio playback) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customizable.
The indicator 192 may be an indicator light, which may be used to indicate the charging status and changes in battery level, and may also be used to indicate messages, missed calls, notifications, and so on.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by inserting it into or removing it from the SIM card interface 195. The electronic device 100 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and so on. Multiple cards may be inserted into the same SIM card interface 195 at the same time, and the types of those cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication.
In the embodiments of this application, the electronic device 100 may receive the user's voice input and environmental information through the microphone 170C and the sensor module 180. After the audio module 170 converts the user's voice input into digital audio information, the processor 110 may perform speech recognition on it and convert it into text information. The intent recognition method in the embodiments of this application is then executed to recognize the user's intent and slots, which are expressed as a structured semantic representation.
FIG. 6 is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 6, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
In the embodiments of this application, the application layer may further include an intent recognition module, which is used to execute the intent recognition method in the embodiments of this application.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 6, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content provider is used to store and retrieve data and to make that data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the electronic device 100, for example the management of call status (including connected, hung up, and so on).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to announce that a download is complete or to give message reminders. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, for example a notification from an application running in the background, or may appear on the screen in the form of a dialog interface. For example, text information is displayed in the status bar, a prompt tone sounds, the electronic device vibrates, or the indicator light flashes.
The runtime includes core libraries and a virtual machine. The runtime is responsible for scheduling and managing the system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of the system.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides the blending of two-dimensional (2D) and three-dimensional (3D) layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support a variety of audio, video, and image encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and so on.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, a sensor driver, and a virtual card driver.
The following describes, by way of example, the workflow of the software and hardware of the electronic device 100 in a photo capture scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap, where the control corresponding to the tap is the icon of the camera application: the camera application calls an interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures a still image or video through the camera 193.
Specifically, FIG. 7 is a schematic diagram of the architecture of the intent recognition module in an embodiment of this application. The intent recognition module is an NLU engine 700.
The NLU engine 700 is used to perform semantic analysis on user input and to output analysis results, such as the intent and slots, for use by other modules.
The NLU engine 700 includes a text preprocessing unit 701, a rule engine 702, a machine learning engine 703, an entity recognition unit 704, an intent classification unit 705, and a slot filling unit 706.
The text preprocessing unit 701 is used to preprocess the text of the user input, which mainly involves removing format symbols, such as punctuation and spaces, that are not needed for the subsequent semantic analysis. It can be understood that the text of the user input is generally obtained by performing speech recognition on the user's voice information.
The rule engine 702 is used to perform rule matching on the preprocessed text of the user input according to preset FST-based rules, so as to cover high-frequency sentence patterns, and to obtain formatted text marked with intent and slot format tags. The intent recognition method in the embodiments of this application is mainly used to build the rule engine 702.
The machine learning engine 703 is used to process the preprocessed text of the user input by means of machine learning, to obtain formatted text marked with intent and slot format tags.
The entity recognition unit 704 is used to extract entity information from the formatted text marked with intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703.
The intent classification unit 705 is used to extract intent information from the formatted text marked with intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703.
The slot filling unit 706 is used to extract slot information from the formatted text marked with intent and slot format tags that is output by the rule engine 702 or the machine learning engine 703.
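To illustrate how intent and slot information might be extracted from such formatted text, the following Python sketch parses a tagged string. It assumes the tag layout produced by the example FST rules given later in this description, in which the intent appears as a leading tag such as `<CALL>` and each slot appears as `<value:name>`. The function name `parse_tagged` and the regular expressions are hypothetical and do not represent the actual implementation of units 704 to 706.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed tag layout: a leading "<INTENT>" tag, and slots written as "<value:name>".
INTENT_RE = re.compile(r"^<([A-Z_]+)>")
SLOT_RE = re.compile(r"<([^<>:]+):([A-Za-z_]+)>")


def parse_tagged(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, {slot_name: slot_value}) extracted from tagged text."""
    intent_match = INTENT_RE.match(text)
    intent = intent_match.group(1) if intent_match else None
    slots = {name: value for value, name in SLOT_RE.findall(text)}
    return intent, slots


print(parse_tagged("<CALL>打电话给<爸爸:contact>"))
```

Under these assumptions, the tagged form of "打电话给爸爸" ("Call Dad") yields the intent `CALL` and a `contact` slot with the value "爸爸" ("Dad").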
FIG. 8 is a schematic diagram of a usage scenario of the intent recognition method in an embodiment of this application. After the user says "打电话给爸爸" ("Call Dad") to the electronic device 100, the electronic device 100 performs speech recognition on the user input and converts it into text, then performs intent recognition on that text through the NLU engine 700, and outputs the intent "打电话" ("make a call") and the slot "爸爸" ("Dad") to the dialing application. The dialing application dials the phone number of "爸爸" according to the intent and the slot.
For the process in which the NLU engine 700 in the electronic device 100 performs intent recognition on the text of the user input, the FST-based rule matching system used in the intent recognition method of the embodiments of this application was compared with an existing rule matching system based on a CFG parsing algorithm. The actual engineering test results are shown in Table 1 below.
Hardware environment: CPU: Intel(R) Xeon(R) E5-2690 v2 @ 3.00 GHz; memory: 32 GB.
Metric | CFG rule matching system | FST rule matching system
Rule configuration | 250,000 templates | 250,000 rules
Construction method | Manually written rules | Manually written rules
Matching latency | Constraints: single node, 100 tps concurrency; end-to-end: 150 ms | Constraints: single node, 100 tps concurrency; end-to-end: 12 ms
Table 1
The test results in Table 1 show that, with all other variables held the same, the FST-based rule matching system in the intent recognition method of the embodiments of this application significantly reduces rule matching latency compared with the existing rule matching system based on the CFG parsing algorithm (from 150 ms to 12 ms end to end), which greatly increases matching speed.
The intent recognition method in the embodiments of this application is described in detail below with reference to the software and hardware structure of the exemplary electronic device 100 described above.
FIG. 9 is an exemplary schematic diagram of the FST intent slot model in an embodiment of this application.
In the FST intent slot model shown in FIG. 9, intent information is inserted after the initial state of the FST, and slot labeling information is inserted before and after every slot, so that the intent information and the slot labeling information are output during FST matching. The FST intent slot model includes states and transitions between states. Specifically:
(1) An FST state is represented by a circle containing a state number;
(2) A transition between FST states is represented by a directed curve connecting them;
(3) The transition function between FST states is the <key>:<value> mapping on the directed edge: when the FST accepts <key>, it outputs <value>. Here <eps> denotes accepting an empty input, which does not consume any of the current input string;
(4) State 0 and state 14 are the initial state and the final state of the FST, respectively, and the other states are intermediate states.
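The state and transition structure described in items (1) to (4) can be sketched as a small transition table in Python. The sketch below is illustrative only: it covers a single sentence pattern, uses its own state numbering (0 to 5) rather than the 15 states of FIG. 9, and uses word-level arc labels for brevity. The names `ARCS` and `transduce` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

EPS = ""  # stands for <eps>: the arc consumes no input

# Each arc is (input label <key>, output label <value>, next state).
Arc = Tuple[str, str, int]

ARCS: Dict[int, List[Arc]] = {
    0: [(EPS, "<CALL>", 1)],            # insert the intent after the initial state
    1: [("打电话给", "打电话给", 2)],
    2: [(EPS, "<", 3)],                 # slot label before the slot
    3: [("爸爸", "爸爸", 4), ("妈妈", "妈妈", 4), ("小明", "小明", 4)],
    4: [(EPS, ":contact>", 5)],         # slot label after the slot
}
START, FINAL = 0, 5


def transduce(text: str, state: int = START, pos: int = 0) -> Optional[str]:
    """Return the output string if text is accepted from (state, pos), else None."""
    if state == FINAL and pos == len(text):
        return ""
    for inp, out, nxt in ARCS.get(state, []):
        if text.startswith(inp, pos):
            rest = transduce(text, nxt, pos + len(inp))
            if rest is not None:
                return out + rest
    return None
```

For the input "打电话给爸爸" ("Call Dad"), this toy transducer walks states 0 through 5 and emits the input with the intent and slot labels inserted; input that does not match the pattern is rejected.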
The FST intent slot model shown in FIG. 9 is written manually in advance. The specific process may be as follows:
(1) Write the FST rules. For example, the FST rules for the FST intent slot model shown in FIG. 9 may be written as follows, where the quoted literals are the Chinese input strings "爸爸" ("Dad"), "妈妈" ("Mom"), "小明" ("Xiao Ming"), "打电话给" ("call") and "给 ... 打电话" ("give ... a call"):
person = "爸爸" | "妈妈" | "小明";
call_1 = "打电话给" ("" : "<") person ("" : ":contact>");
call_2 = "给" ("" : "<") person ("" : ":contact>") "打电话";
export CALL = ("" : "<CALL>") (call_1 | call_2);
(2) Use a rule compiler to compile the FST rules into the FST intent slot model shown in FIG. 9.
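The effect of the four rules above can be emulated with a few combinators in Python. This is only a sketch of the input-to-output mapping that the rules define; it is not a rule compiler and does not build the state machine of FIG. 9. The helper names `lit`, `ins`, `seq`, `alt`, and `match` are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

# A rule maps (text, position) to every (new position, output) it can produce.
Rule = Callable[[str, int], Iterator[Tuple[int, str]]]


def lit(s: str) -> Rule:
    """Accept the literal s and copy it to the output."""
    def run(text, pos):
        if text.startswith(s, pos):
            yield pos + len(s), s
    return run


def ins(out: str) -> Rule:
    """Consume no input and emit out, like ("" : out) in the rules."""
    def run(text, pos):
        yield pos, out
    return run


def seq(*rules: Rule) -> Rule:
    """Concatenation: apply the rules one after another."""
    def run(text, pos):
        def step(i, p, acc):
            if i == len(rules):
                yield p, acc
                return
            for q, out in rules[i](text, p):
                yield from step(i + 1, q, acc + out)
        yield from step(0, pos, "")
    return run


def alt(*rules: Rule) -> Rule:
    """Union: try each alternative, like the | operator in the rules."""
    def run(text, pos):
        for rule in rules:
            yield from rule(text, pos)
    return run


person = alt(lit("爸爸"), lit("妈妈"), lit("小明"))
call_1 = seq(lit("打电话给"), ins("<"), person, ins(":contact>"))
call_2 = seq(lit("给"), ins("<"), person, ins(":contact>"), lit("打电话"))
CALL = seq(ins("<CALL>"), alt(call_1, call_2))


def match(rule: Rule, text: str) -> List[str]:
    """Return every output for which the rule consumes the whole text."""
    return [out for end, out in rule(text, 0) if end == len(text)]
```

Calling `match(CALL, "给妈妈打电话")` exercises the `call_2` branch, while `match(CALL, "打电话给小明")` exercises `call_1`; text matching neither pattern gives an empty list.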
With reference to the FST intent slot model shown in FIG. 9, the following describes the process, in the intent recognition method of the embodiments of this application, of performing NLU intent recognition and slot filling on the text of the user input based on the FST intent slot model.
FIG. 10 is a schematic flowchart of the intent recognition method in an embodiment of this application.
S1001. In response to a user's voice input, the electronic device converts the voice input into a first text;
For example, as shown in FIG. 8, when the user says "打电话给爸爸" ("Call Dad") to the electronic device 100, the electronic device 100 may convert the voice input into the text "打电话给爸爸".
It can be understood that the electronic device 100 may execute step S1001 only after receiving a certain trigger. For example, the electronic device 100 may execute step S1001 only after detecting that the user has turned on the voice assistant function, or only after detecting that the user has double-tapped the screen. Alternatively, the electronic device 100 may be able to execute step S1001 at any time after it is powered on. This is not limited here.
S1002. The electronic device performs format preprocessing on the first text to obtain a second text;
The main purpose of the format preprocessing is to remove from the first text the format characters that serve no purpose in the subsequent FST intent slot model matching, for example spaces and punctuation marks.
For example, if the first text is "打电话给小明。" ("Call Xiao Ming."), the second text obtained after the format preprocessing in step S1002 is "打电话给小明", with the spaces and the period removed.
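The format preprocessing of step S1002 can be sketched in Python as follows. The sketch assumes that "format characters" means whitespace and Unicode punctuation; the description does not give an exhaustive list, and the function name `preprocess` is hypothetical.

```python
import unicodedata


def preprocess(first_text: str) -> str:
    """Drop whitespace and Unicode punctuation from the first text."""
    return "".join(
        ch
        for ch in first_text
        # Unicode general categories starting with "P" are punctuation.
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )
```

Applied to "打电话给小明。", the function returns "打电话给小明", which can then be passed to the FST intent slot model as the second text.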
可以理解的是,在一些实施例中,也可以不执行步骤S1002,直接对执行步骤S1001后得到第一文本执行步骤S1003,此时,第一文本就是第二文本。因为FST意图槽位模型中也可以加入对第一文本进行格式预处理的相关规则,但是这样会加大FST意图槽位模型的复杂度。具体使用何种方案,可以根据实际情况选择,此处不作限定。It can be understood that, in some embodiments, step S1002 may not be performed, and step S1003 is directly performed on the first text obtained after step S1001 is performed. At this time, the first text is the second text. Because the FST intent slot model can also add relevant rules for formatting the first text, but this will increase the complexity of the FST intent slot model. Which scheme to use can be selected according to the actual situation and is not limited here.
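The format preprocessing of step S1002 can be sketched as follows. This is only an illustration: the function name `preprocess` and the choice of Unicode general categories as the filter are assumptions, since the text says only that spaces, punctuation marks, and similar format characters are removed.

```python
import unicodedata

def preprocess(first_text: str) -> str:
    """Drop whitespace, punctuation and control characters (assumed filter)."""
    kept = []
    for ch in first_text:
        category = unicodedata.category(ch)
        # 'Z*' = separators (spaces), 'P*' = punctuation, 'C*' = control/format
        if category[0] in ("Z", "P", "C"):
            continue
        kept.append(ch)
    return "".join(kept)

second_text = preprocess("打电话给 小明。")
print(second_text)  # 打电话给小明
```

Keeping this step outside the FST, as the paragraph above notes, keeps the model itself free of rules for spaces and punctuation.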
S1003. The electronic device performs rule matching on the second text using a preset FST intent slot model to obtain a third text;
The preset FST intent slot model is a preset FST. During state transitions, it adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information marks the intent information of the input text, and the preset slot labeling information marks the slot information in the input text.
It can be understood that preset FST intent slot models may differ in what they add. Some use only the preset intent labeling information to mark the position of the intent; some insert the intent information directly into the input text and use the preset intent labeling information to mark its position; some use the preset intent labeling information to mark the position of the intent and the preset slot labeling information to mark the position of the slot information; and some insert the intent information directly into the input text, mark its position with the preset intent labeling information, and mark the position of the slot information with the preset slot labeling information. This is not limited here.
The third text is the text obtained after rule matching is performed on the second text, and it contains the preset intent labeling information and/or the preset slot labeling information.
For example, take the second text "打电话给小明" ("Call Xiaoming") and rule matching with the FST intent slot model shown in FIG. 9. The matching process is described below, where the FST intent slot model shown in FIG. 9 is referred to simply as the FST:
A) The FST accepts an empty character, outputs the intent information <CALL>, and moves from initial state 0 to state 1;
B) The FST accepts the character "打", outputs the character "打", and moves from state 1 to state 2;
C) The FST accepts the character "电", outputs the character "电", and moves from state 2 to state 3;
D) The FST accepts the character "话", outputs the character "话", and moves from state 3 to state 4;
E) The FST accepts the character "给", outputs the character "给", and moves from state 4 to state 5;
F) The FST accepts an empty character, outputs the slot labeling prefix "<", and moves from state 5 to state 6;
G) The FST accepts the character "小", outputs the character "小", and moves from state 6 to state 7;
H) The FST accepts the character "明", outputs the character "明", and moves from state 7 to state 10;
I) The FST accepts an empty character, outputs the slot labeling suffix ":contact>", and moves from state 10 to final state 14, which completes the matching process;
J) The FST matching result is output, for example: "<CALL>打电话给<小明:contact>".
The output "<CALL>打电话给<小明:contact>" is the third text obtained after rule matching.
Similarly, according to the FST intent slot model shown in FIG. 9, if the input second text is "给小明打电话" (also "Call Xiaoming", with a different word order), the output third text is "<CALL>给<小明:contact>打电话". The specific matching process is not repeated here.
As the above example shows, the third text contains preset intent labeling information, for example <>, and preset slot labeling information, for example <:contact>. The preset intent labeling information <> marks the intent of the input second text as CALL, and the preset slot labeling information <:contact> marks the slot in the input second text as 小明.
It can be understood that FIG. 9 shows one exemplary FST intent slot model. In practice, other preset symbols may be used as the preset intent labeling information and the preset slot labeling information according to actual needs, which is not limited here.
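Steps A) to I) can be replayed with a small transducer sketch. It encodes only the transitions walked through above; the remaining states and arcs of FIG. 9 are not described in the text and are omitted, and the table layout and the name `transduce` are assumptions made for illustration.

```python
EPS = ""  # empty input symbol (consumes no character)

# (state, input symbol) -> list of (output string, next state).
# Only the path of steps A) to I) is encoded; the rest of FIG. 9 is omitted.
TRANSITIONS = {
    (0, EPS): [("<CALL>", 1)],
    (1, "打"): [("打", 2)],
    (2, "电"): [("电", 3)],
    (3, "话"): [("话", 4)],
    (4, "给"): [("给", 5)],
    (5, EPS): [("<", 6)],
    (6, "小"): [("小", 7)],
    (7, "明"): [("明", 10)],
    (10, EPS): [(":contact>", 14)],
}
FINAL_STATES = {14}

def transduce(text, state=0, pos=0, output=""):
    """Depth-first search for an accepting path; returns its output or None."""
    if pos == len(text) and state in FINAL_STATES:
        return output
    # Try an epsilon move first, then a move that consumes one character.
    candidates = [(EPS, pos)]
    if pos < len(text):
        candidates.append((text[pos], pos + 1))
    for symbol, next_pos in candidates:
        for out, next_state in TRANSITIONS.get((state, symbol), []):
            result = transduce(text, next_state, next_pos, output + out)
            if result is not None:
                return result
    return None

print(transduce("打电话给小明"))  # <CALL>打电话给<小明:contact>
```

An input that leaves the encoded path, such as "打开短信", reaches no final state and yields `None`, which corresponds to a failed rule match.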
S1004. The electronic device obtains intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information;
After obtaining the third text containing the preset intent labeling information and the preset slot labeling information, the electronic device can extract the intent information and the slot information from the third text according to the positions of the preset intent labeling information and the preset slot labeling information.
For example, for the third text "<CALL>打电话给<小明:contact>", the electronic device can extract the intent information CALL according to the preset intent labeling information <>, and the slot information 小明 according to the preset slot labeling information <:contact>.
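One way to sketch the extraction in step S1004 is with two regular expressions over the third text. The patterns below assume the labeling symbols of the FIG. 9 example (`<INTENT>` and `<value:name>`) and that intent names consist of uppercase letters and underscores; other preset symbols would need other patterns.

```python
import re

INTENT_RE = re.compile(r"<([A-Z_]+)>")          # e.g. <CALL>
SLOT_RE = re.compile(r"<([^<>:]+):([^<>:]+)>")  # e.g. <小明:contact>

def extract(third_text):
    """Return (intent, [(slot name, slot value), ...]) from the labeled text."""
    intent_match = INTENT_RE.search(third_text)
    intent = intent_match.group(1) if intent_match else None
    slots = [(name, value) for value, name in SLOT_RE.findall(third_text)]
    return intent, slots

print(extract("<CALL>打电话给<小明:contact>"))  # ('CALL', [('contact', '小明')])
```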
S1005. The electronic device outputs the intent information and/or slot information in a structured form.
After obtaining the intent information and the slot information, the electronic device can output them in a structured form for use by other modules in the electronic device. The structured output may also contain other information, such as the first text or the second text, which is not limited here.
For example, for the first text "打电话给小明。", after steps S1001 to S1004 extract the intent information CALL and the slot information 小明, the intent information, the slot information, and the second text can together form a structured output for use by other modules in the electronic device 100. For example, the following JSON may be output: {"text":"打电话给小明","intent":"CALL","slots":[{"slotName":"contact","slotValue":"小明"}]}.
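The structured output of step S1005 can be produced along the following lines. The field names follow the JSON example above; the function name `structured_output` and the tuple layout of `slots` are assumptions for illustration.

```python
import json

def structured_output(second_text, intent, slots):
    """Serialize the recognition result; slots is a list of (name, value)."""
    payload = {
        "text": second_text,
        "intent": intent,
        "slots": [{"slotName": n, "slotValue": v} for n, v in slots],
    }
    # ensure_ascii=False keeps the Chinese characters readable in the output.
    return json.dumps(payload, ensure_ascii=False, separators=(",", ":"))

print(structured_output("打电话给小明", "CALL", [("contact", "小明")]))
```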
In the embodiments of this application, the electronic device uses a preset FST intent slot model to perform intent recognition on user input. Because the preset FST intent slot model is an FST, rule matching is very fast. In addition, matching yields a third text carrying preset intent labeling information and preset slot labeling information, from which the intent information and slot information can be conveniently extracted once matching is complete. This greatly improves the speed and accuracy of matching when NLU rules are used for intent recognition.
In practice, because the electronic device 100 needs to recognize a great many different kinds of intents, FST intent slot models are generally built with one model per intent, which effectively reduces rule conflicts between different intents. During rule matching, multiple intents can be matched in parallel to minimize matching latency. The electronic device 100 may therefore store a large number of preset FST intent slot models.
However, when one user input is matched against multiple preset FST intent slot models at the same time, multiple outputs may be produced. To help the electronic device determine the credibility of the output intent information and slot information, the preset FST intent slot model may be a preset WFST intent slot model. In a WFST, each state transition carries a weight, and a cumulative weight is obtained once matching is complete.
Therefore, after one user input is matched against multiple preset WFST intent slot models, multiple output results can be obtained, each with a corresponding score. The electronic device can then extract the intent information and slot information from the output result with the highest score.
FIG. 11 is another schematic flowchart of the intent recognition method in an embodiment of this application.
1. Offline rule compilation.
(1) A corpus producer or developer writes the WFST rules for each intent in a front-end rule editor, organized as one WFST per intent.
For example, the following WFST rules are written:
WFST rule 1:
word=wildcard(1)<1>;
rule_1=“打开”“<”“微信”“:app>”;
rule_2=“打开”“<”word+“:app>”;
In WFST rule 1, word is a wildcard that matches any character, with a weight of 1. All other character matches have the default weight of 0.
The intent of WFST rule 1 is: open an application, for example WeChat (微信).
WFST rule 2:
rule_1=“打开短信”;
WFST rule 2 contains no wildcard, so every character match has the default weight of 0.
The intent of WFST rule 2 is: open short messages (短信).
(2) A grammar compiler compiles the rule files to obtain WFST intent slot model files;
Each WFST intent slot model file contains a WFST intent slot model.
For example, FIG. 12 shows WFST intent slot model 1, compiled from WFST rule 1.
WFST intent slot model 1 includes 10 states. State 0 is the initial state, and states 7 and 9 are final states. The model has two paths: 0, 1, 2, 3, 4, 5, 6, 7 and 0, 1, 2, 3, 4, 8, 9:
The two paths are identical from state 0 to state 4:
From state 0 to state 1: accepts an empty character <eps> and outputs the preset intent labeling information together with the intent information, <OPEN_APP>, with weight 0;
From state 1 to state 2: accepts the character "打" and outputs the character "打", with weight 0;
From state 2 to state 3: accepts the character "开" and outputs the character "开", with weight 0;
From state 3 to state 4: accepts an empty character <eps> and outputs the first half of the preset slot labeling information, <, with weight 0;
From state 4 onward the two paths differ:
One path is 4, 5, 6, 7:
From state 4 to state 5: accepts the character "微" and outputs the character "微", with weight 0;
From state 5 to state 6: accepts the character "信" and outputs the character "信", with weight 0;
From state 6 to state 7: accepts an empty character <eps> and outputs the second half of the preset slot labeling information, :app>, with weight 0;
The other path is 4, 8, 9:
From state 4 to state 8: accepts an arbitrary character <any> and outputs that character, with weight 1;
From state 8 to state 8: accepts an arbitrary character <any> and outputs that character, with weight 1;
From state 8 to state 9: accepts an empty character <eps> and outputs the second half of the preset slot labeling information, :app>, with weight 0.
The cumulative weight of WFST intent slot model 1 is calculated by adding up the weights along the path.
FIG. 13 shows WFST intent slot model 2, compiled from WFST rule 2.
WFST intent slot model 2 includes 6 states. State 0 is the initial state, and state 5 is the final state. The model has only one path: 0, 1, 2, 3, 4, 5.
From state 0 to state 1: accepts an empty character <eps> and outputs the preset intent labeling information together with the intent information, <CHECK_MESSAGE>, with weight 0;
From state 1 to state 2: accepts the character "打" and outputs the character "打", with weight 0;
From state 2 to state 3: accepts the character "开" and outputs the character "开", with weight 0;
From state 3 to state 4: accepts the character "短" and outputs the character "短", with weight 0;
From state 4 to state 5: accepts the character "信" and outputs the character "信", with weight 0.
The cumulative weight of WFST intent slot model 2 is calculated by adding up the weights along the path.
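The two models described above can be written down as arc tables and matched with a short search that sums the weights along each accepting path. This is a minimal sketch, not the compiled model format: the tuple layout, the dictionary keys, and the name `match` are assumptions, while the states, symbols, and weights follow the description of FIG. 12 and FIG. 13.

```python
EPS, ANY = "<eps>", "<any>"

# Arcs as (source, input, output, weight, destination).
MODEL_1 = {
    "arcs": [
        (0, EPS, "<OPEN_APP>", 0, 1), (1, "打", "打", 0, 2),
        (2, "开", "开", 0, 3), (3, EPS, "<", 0, 4),
        (4, "微", "微", 0, 5), (5, "信", "信", 0, 6), (6, EPS, ":app>", 0, 7),
        (4, ANY, ANY, 1, 8), (8, ANY, ANY, 1, 8), (8, EPS, ":app>", 0, 9),
    ],
    "finals": {7, 9},
}
MODEL_2 = {
    "arcs": [
        (0, EPS, "<CHECK_MESSAGE>", 0, 1), (1, "打", "打", 0, 2),
        (2, "开", "开", 0, 3), (3, "短", "短", 0, 4), (4, "信", "信", 0, 5),
    ],
    "finals": {5},
}

def match(model, text):
    """Return (output text, cumulative weight) for every accepting path."""
    results = []
    stack = [(0, 0, "", 0)]  # (state, position, output so far, weight so far)
    while stack:
        state, pos, out, weight = stack.pop()
        if pos == len(text) and state in model["finals"]:
            results.append((out, weight))
        for src, sym, emit, w, dst in model["arcs"]:
            if src != state:
                continue
            if sym == EPS:  # epsilon arc: emit without consuming input
                stack.append((dst, pos, out + emit, weight + w))
            elif pos < len(text) and sym in (ANY, text[pos]):
                emitted = text[pos] if sym == ANY else emit
                stack.append((dst, pos + 1, out + emitted, weight + w))
    return results

print(match(MODEL_1, "打开短信"))  # [('<OPEN_APP>打开<短信:app>', 2)]
print(match(MODEL_2, "打开短信"))  # [('<CHECK_MESSAGE>打开短信', 0)]
```

For the input "打开微信", model 1 accepts along both of its paths, once through the literal arcs with cumulative weight 0 and once through the wildcard arcs with cumulative weight 2, which is why a weight is needed to rank the results.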
(3) The WFST intent slot model files are output and persisted.
2. Online rule matching.
(1) First, all compiled WFST intent slot model files are loaded and the NLU engine is initialized;
(2) Parallel rule matching is performed on the user input using the WFST intent slot models to obtain output results, each of which includes intent identification information, slot identification information, and a weight;
(3) The one or more WFST matching results with the highest scores are output in a structured form.
Items (2) and (3) of the online rule matching above are described in detail below with reference to the WFST intent slot models shown in FIG. 12 and FIG. 13.
FIG. 14 is another schematic flowchart of the intent recognition method in an embodiment of this application.
S1401. The electronic device loads preset WFST intent slot models;
The electronic device can load the preset WFST intent slot models by reading the WFST intent slot model files stored on the device.
It can be understood that all preset WFST intent slot models may be loaded, or only a subset may be loaded according to the scenario the terminal is currently in, which is not limited here.
S1402. In response to a user's voice input, the electronic device converts the voice input into a first text;
This step is similar to step S1001 and is not repeated here.
S1403. The electronic device performs format preprocessing on the first text to obtain a second text;
This step is similar to step S1002 and is not repeated here.
It can be understood that steps S1402 and S1403 may be performed at the same time as, before, or after step S1401, which is not limited here.
S1404. The electronic device performs parallel rule matching on the second text using multiple preset WFST intent slot models to obtain WFST results;
Each preset WFST intent slot model is a preset WFST. During state transitions, it adds preset intent labeling information and/or preset slot labeling information to the input text. The preset intent labeling information marks the intent of the input text, and the preset slot labeling information marks the slot information in the input text.
It can be understood that preset WFST intent slot models may differ in what they add. Some use only the preset intent labeling information to mark the position of the intent; some insert the intent information directly into the input text and use the preset intent labeling information to mark its position; some use the preset intent labeling information to mark the position of the intent and the preset slot labeling information to mark the position of the slot information; and some insert the intent information directly into the input text, mark its position with the preset intent labeling information, and mark the position of the slot information with the preset slot labeling information. This is not limited here.
In a preset WFST intent slot model, every state transition has a weight. A WFST result includes the matching text of the second text and the cumulative weight of the matching path. The matching text of the second text is the text, containing the preset intent labeling information and/or preset slot labeling information, that is output after a successful match along the matching path.
FIG. 15 is an exemplary schematic diagram of parallel rule matching of the second text against multiple preset WFST intent slot models in an embodiment of this application. The electronic device uses one WFST intent slot model per intent and stores multiple preset WFST intent slot models, such as CALL.fst, MESSAGE.fst, and NAVIGATE.fst. After loading these models, the electronic device can perform parallel rule matching on the preprocessed second text. Because some paths in some preset WFST intent slot models contain wildcards, the second text may match only one path in one preset model, or it may match multiple paths across multiple preset models. As a result, either one WFST result or multiple WFST results may be obtained.
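The parallel matching of step S1404 can be sketched with a thread pool that runs every loaded model on the same second text and pools whatever results come back. The two model functions and the file names `OPEN_APP.fst` and `CHECK_MESSAGE.fst` are hypothetical stand-ins for compiled WFST intent slot models, written only so that the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for compiled models: each returns a list of
# (matched text, cumulative weight) pairs, empty when nothing matches.
def open_app_model(text):
    if text.startswith("打开") and len(text) > 2:
        app = text[2:]
        weight = 0 if app == "微信" else len(app)  # 1 per wildcard character
        return [(f"<OPEN_APP>打开<{app}:app>", weight)]
    return []

def check_message_model(text):
    return [("<CHECK_MESSAGE>打开短信", 0)] if text == "打开短信" else []

MODELS = {"OPEN_APP.fst": open_app_model, "CHECK_MESSAGE.fst": check_message_model}

def match_all(second_text, models=MODELS):
    """Run every model on the same text concurrently and pool the results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, second_text) for name, fn in models.items()}
        return [(name, text, weight)
                for name, future in futures.items()
                for text, weight in future.result()]

for result in match_all("打开短信"):
    print(result)
```

With the input "打开短信" both stand-in models match, so two WFST results are returned and a credibility comparison is needed; with "打开微信" only the first model matches and its single result can be used directly.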
When only one WFST result is obtained, it can be directly determined to be the WFST result with the highest credibility, and step S1406 is performed;
When multiple WFST results are obtained, step S1405 needs to be performed to determine, according to the cumulative weights, which WFST result has the highest credibility.
It can be understood that, because the weights in a preset WFST intent slot model are customizable and so is the way the cumulative weight is calculated, what the weight and the cumulative weight represent may differ from one configuration to another, as may the calculation of the cumulative weight and, accordingly, the way the most credible WFST result is determined. This is not limited here.
FIG. 16 is an exemplary schematic diagram of different settings for the weight and the cumulative weight in preset WFST intent slot models in an embodiment of this application. These settings are described by way of example below:
Case 1: The weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not, and the greater the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
In this case, the weight represents the cost of a state transition, and the cumulative weight represents the cost of the matching path. The greater the cumulative weight, the greater the cost for the WFST to match the current path and, correspondingly, the lower the credibility. For this reason, a state transition that accepts a wildcard is given a greater weight than one that does not. The weight can also be set according to how broadly a wildcard matches: a wildcard with a high degree of generality and a wide matching range is given a greater weight than one with a low degree of generality and a narrow matching range.
WFST intent slot model 1 shown in FIG. 12 and WFST intent slot model 2 shown in FIG. 13 are WFST intent slot models set according to case 1. Take the second text "打开短信" ("Open SMS") as an example. When parallel rule matching is performed on this second text using WFST intent slot model 1 and WFST intent slot model 2:
WFST intent slot model 1 follows the matching path 0, 1, 2, 3, 4, 8, 8, 9 and yields WFST result 1: <OPEN_APP>打开<短信:app>, Weight: 2.
WFST intent slot model 2 follows the matching path 0, 1, 2, 3, 4, 5 and yields WFST result 2: <CHECK_MESSAGE>打开短信, Weight: 0.
On the matching path of WFST intent slot model 1, the transitions from state 4 to state 8 and from state 8 to state 8 are wildcard matches, so their weight of 1 is higher than the default weight of 0 for state transitions that do not use a wildcard. Because the cumulative weight in WFST intent slot model 1 is the sum of the state transition weights on the matching path, the cumulative weight is 2.
The matching path of WFST intent slot model 2 contains no wildcard state transition, so every state transition has the default weight of 0, and the final cumulative weight is also 0.
Case 2: The weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not, and the greater the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
In some cases, the weight of a state transition can be set to represent the cost of the state transition, while the cumulative weight of the matching path represents the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
In this case, the calculation of the cumulative weight can be set so that the larger the state transition weights on the matching path, the smaller the cumulative weight of the matching path. For example, the cumulative weight may be calculated as the negative of the sum of the state transition weights on the matching path, which is not limited here.
Case 3: The weight of a state transition that accepts a wildcard is less than the weight of a state transition that does not, and the smaller the weight of each state transition on the matching path, the greater the cumulative weight of the matching path.
In some cases, the weight of a state transition can be set to represent the credibility of the state transition, while the cumulative weight of the matching path represents the cost of the matching path. The greater the cumulative weight, the higher the cost and the lower the credibility.
Case 4: The weight of a state transition that accepts a wildcard is less than the weight of a state transition that does not, and the smaller the weight of each state transition on the matching path, the smaller the cumulative weight of the matching path.
In some cases, the weight of a state transition can be set to represent the credibility of the state transition, and the cumulative weight of the matching path represents the credibility of the matching path. The greater the cumulative weight, the higher the credibility.
It can be understood that which case is used to set the preset WFST intent slot models can be decided according to actual needs and is not limited here.
S1405. When there are multiple WFST results, the electronic device determines that the matching text of the second text in the WFST result with the highest credibility is the third text;
In some cases, the electronic device can compare the cumulative weights in the WFST results directly. For example, when the cumulative weight represents credibility, a WFST result with a higher cumulative weight is determined to be more credible. When the cumulative weight represents cost, a WFST result with a smaller cumulative weight is determined to be more credible.
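The direct comparison described above amounts to taking a minimum or a maximum over the cumulative weights, depending on what the cumulative weight represents. A minimal sketch, in which the function name and the `weight_is_cost` flag are assumptions introduced for illustration:

```python
def pick_most_credible(results, weight_is_cost=True):
    """results: list of (matched text, cumulative weight) pairs.

    weight_is_cost=True  -> smaller cumulative weight is more credible
    weight_is_cost=False -> larger cumulative weight is more credible
    """
    if not results:
        return None
    chooser = min if weight_is_cost else max
    return chooser(results, key=lambda r: r[1])

results = [("<OPEN_APP>打开<短信:app>", 2), ("<CHECK_MESSAGE>打开短信", 0)]
print(pick_most_credible(results))  # ('<CHECK_MESSAGE>打开短信', 0)
```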
However, in practice, the preset WFST intent slot models may have different numbers of state transitions. To make the final credibility comparison fair, the cumulative weights can first be normalized to obtain credibility scores, and the WFST result with the highest credibility is then determined according to the credibility scores. Specifically:
1. When there are multiple WFST results, the electronic device calculates a credibility score for each of them;
Specifically, suppose the cumulative weight represents the cost of the matching path.
Then one possible credibility score formula is:
Credibility score formula 1:
$$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
Here, $w$ is the cumulative weight of the matching path in the WFST result being scored, and $w_{\max}$ is the largest cumulative weight among the matching paths of the multiple WFST results. The normalized credibility score lies in the range [0, 1], and a higher score indicates higher credibility.
For example, consider calculating credibility scores for the two results obtained in the example above, WFST result 1: <OPEN_APP>打开<短信:app>, Weight: 2, and WFST result 2: <CHECK_MESSAGE>打开短信, Weight: 0. The largest cumulative weight among WFST result 1 and WFST result 2 is 2, so $w_{\max}$ is 2, and:
The credibility score of WFST result 1 is $\mathrm{score}(w_1, w_{\max}) = 1 - 2/2 = 0$;
The credibility score of WFST result 2 is $\mathrm{score}(w_2, w_{\max}) = 1 - 0/2 = 1$.
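The worked example can be reproduced with a few lines of code. The handling of the case in which every cumulative weight is 0 (so that the largest weight is 0 and the division is undefined) is not covered by the text; returning a score of 1 for every result in that case is an assumption made here.

```python
def credibility_scores(weights):
    """Score each cumulative weight (a cost) as 1 - w / w_max."""
    w_max = max(weights)
    if w_max == 0:
        # Assumption: when no path has any cost, all results score 1.
        return [1.0] * len(weights)
    return [1 - w / w_max for w in weights]

print(credibility_scores([2, 0]))  # [0.0, 1.0]
```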
It can be understood that credibility score calculation formula 1 is only one optional formula for the case where the cumulative weight represents the cost of the matching path. In practice, other formulas may also be used to determine the credibility score of a WFST result, provided that a higher score corresponds to higher credibility.
It can be understood that if the cumulative weight instead represents the credibility of the matching path, a different formula may be used to calculate the normalized credibility score, again such that a higher score indicates higher credibility. Details are not repeated here.
2. The electronic device determines that the matching text of the second text in the WFST result with the highest credibility score is the third text.
For example, of WFST result 1 and WFST result 2, WFST result 2 has the highest credibility score. The electronic device can therefore determine that the matching text of the second text in WFST result 2, "<CHECK_MESSAGE>打开短信", is the third text.
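The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's implementation: the function names are hypothetical, each WFST result is assumed to be a (matching text, cumulative weight) pair, and the handling of the case where every cumulative weight is 0 is an assumption, since the text does not say what happens when w_max is 0.

```python
def credibility_scores(weights):
    """Apply score(w, w_max) = 1 - w / w_max to each cumulative weight."""
    w_max = max(weights)
    if w_max == 0:
        # Assumption (not stated in the text): if every path has zero cost,
        # treat all results as fully credible to avoid dividing by zero.
        return [1.0 for _ in weights]
    return [1 - w / w_max for w in weights]


def pick_third_text(results):
    """Pick the matching text with the highest credibility score.

    results: list of (matching_text, cumulative_weight) pairs (hypothetical shape).
    """
    scores = credibility_scores([weight for _, weight in results])
    best = max(range(len(results)), key=scores.__getitem__)
    return results[best][0], scores


# The two WFST results from the worked example in the text.
example = [("<OPEN_APP>打开<短信:app>", 2), ("<CHECK_MESSAGE>打开短信", 0)]
third_text, scores = pick_third_text(example)
```

With the example weights 2 and 0, the scores follow the arithmetic shown in the text, and the text labeled CHECK_MESSAGE is selected.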
S1406. The electronic device obtains intent information and/or slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
This step is similar to step S1004 and is not described again here.
For example, suppose the third text is: <CHECK_MESSAGE>打开短信. Based on the preset intent labeling information <>, the electronic device can extract the intent information CHECK_MESSAGE from the third text. It can be understood that if the third text also contains slot labeling information, the electronic device can extract slot information from the third text as well; this is not limited here.
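A minimal sketch of this extraction step is given below. The label syntax is an assumption inferred from the two example texts in this section: an intent label is taken to be <NAME> with no colon, and a slot label is taken to be <value:slot_name>. The regular expression and the function name parse_labeled_text are hypothetical and are not taken from the application.

```python
import re

# Assumed label syntax, inferred from the examples in the text:
#   <INTENT_NAME>       intent label (no colon inside the brackets)
#   <value:slot_name>   slot label
LABEL = re.compile(r"<([^<>:]+)(?::([^<>]+))?>")


def parse_labeled_text(third_text):
    """Return (intent, slots) extracted from a labeled third text."""
    intent = None
    slots = {}
    for found in LABEL.finditer(third_text):
        first, second = found.group(1), found.group(2)
        if second is None:
            # No colon: treat the first such label as the intent.
            if intent is None:
                intent = first
        else:
            # "value:slot_name" form: record the slot value under its name.
            slots[second] = first
    return intent, slots
```

For the third text in the example, this returns the intent CHECK_MESSAGE and an empty slot dictionary; for <OPEN_APP>打开<短信:app> it would also return the slot app with the value 短信.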
S1407. The electronic device outputs the intent information and/or slot information in a structured form.
This step is similar to step S1005 and is not described again here.
For example, after extracting the intent information CHECK_MESSAGE from the third text in WFST result 2, the electronic device can output the intent information in a structured form:
JSON (no slots):
[Figure PCTCN2021100475-appb-000009]
This structured output can be used by other modules in the electronic device.
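The JSON shown in the application is only available as a figure, so its exact fields are not reproduced here. Purely as an illustration of what a structured output could look like, the following sketch serializes the extracted information with Python's standard json module. The key names "intent" and "slots" and the function name are hypothetical and may differ from the figure.

```python
import json


def to_structured_output(intent, slots):
    """Serialize extracted intent and slot information as a JSON string.

    The key names "intent" and "slots" are assumptions for illustration only.
    """
    payload = {"intent": intent, "slots": slots}
    # ensure_ascii=False keeps non-ASCII slot values readable in the output.
    return json.dumps(payload, ensure_ascii=False)


output = to_structured_output("CHECK_MESSAGE", {})
```

A downstream module could then parse the string with json.loads and dispatch on the intent name.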
In the embodiments of this application, the electronic device may store multiple preset WFST intent slot models. During intent recognition, the electronic device can use the multiple WFST intent slot models to match the user input in parallel, and then determine the WFST result with the highest credibility among the multiple WFST results obtained, from which the intent information and slot information are extracted and output. This improves both the recognition rate and the accuracy of intent recognition. In addition, parallel matching with WFST intent slot models greatly increases the matching speed and reduces the computational load on the electronic device.
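To make the parallel matching concrete, the following toy sketch runs two hand-built weighted transducers over the same input. Everything here is an assumption for illustration: the dictionary layout of a model, the ANY wildcard marker, the cost of 1 per wildcard transition, and the function names are hypothetical, and a production system would use a real WFST library rather than this search. The cumulative weight is the sum of the weights of the transitions taken, as described for the matching path.

```python
from concurrent.futures import ThreadPoolExecutor

ANY = None  # hypothetical wildcard marker: accept any one character and echo it

# Arcs are (input_symbol, output_string, weight, next_state).
OPEN_APP = {
    "start": 0,
    "arcs": {
        0: [("打", "<OPEN_APP>打", 0, 1)],
        1: [("开", "开<", 0, 2)],
        2: [(ANY, ANY, 1, 3)],  # wildcard arcs cost 1 each (assumed)
        3: [(ANY, ANY, 1, 3)],
    },
    "finals": {3: ":app>"},  # final state -> text emitted on acceptance
}
CHECK_MESSAGE = {
    "start": 0,
    "arcs": {
        0: [("打", "<CHECK_MESSAGE>打", 0, 1)],
        1: [("开", "开", 0, 2)],
        2: [("短", "短", 0, 3)],
        3: [("信", "信", 0, 4)],
    },
    "finals": {4: ""},
}


def match(model, text):
    """Return (matching_text, cumulative_weight) for the cheapest path, or None."""
    best = None
    # Depth-first search over (state, position, output so far, weight so far).
    stack = [(model["start"], 0, "", 0)]
    while stack:
        state, pos, out, weight = stack.pop()
        if pos == len(text):
            if state in model["finals"] and (best is None or weight < best[1]):
                best = (out + model["finals"][state], weight)
            continue
        ch = text[pos]
        for symbol, output, arc_weight, nxt in model["arcs"].get(state, []):
            if symbol is ANY or symbol == ch:
                emitted = ch if output is ANY else output
                stack.append((nxt, pos + 1, out + emitted, weight + arc_weight))
    return best


def match_all(models, text):
    """Match the text against every model concurrently; drop models that fail."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda model: match(model, text), models))
    return [result for result in results if result is not None]


results = match_all([OPEN_APP, CHECK_MESSAGE], "打开短信")
```

With these toy models, the input 打开短信 yields the two WFST results used in the worked example (weights 2 and 0). The thread pool only illustrates the structure of parallel matching; in CPython it does not by itself speed up CPU-bound work.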
The foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.
As used in the foregoing embodiments, depending on the context, the term "when..." may be interpreted as meaning "if...", "after...", "in response to determining...", or "in response to detecting...". Similarly, depending on the context, the phrase "upon determining..." or "if (a stated condition or event) is detected" may be interpreted as meaning "if it is determined...", "in response to determining...", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive).
A person of ordinary skill in the art can understand that all or some of the processes of the methods in the foregoing embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (23)

  1. An intent recognition method, comprising:
    in response to a voice input from a user, converting, by an electronic device, the voice input into a first text;
    performing, by the electronic device, rule matching on a second text by using a preset FST intent slot model to obtain a third text, wherein the second text is determined according to the first text; the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information; the preset intent labeling information is used to label intent information of the second text; and the preset slot labeling information is used to label slot information in the second text; and
    obtaining, by the electronic device, the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
  2. The method according to claim 1, wherein before the electronic device performs rule matching on the second text by using the preset FST intent slot model to obtain the third text, the method further comprises:
    performing, by the electronic device, format preprocessing on the first text to obtain the second text, wherein the number of format characters in the second text is less than or equal to the number of format characters in the first text.
  3. The method according to claim 1 or 2, wherein after the electronic device obtains the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information, the method further comprises:
    outputting, by the electronic device, the intent information and/or the slot information in a structured form.
  4. The method according to any one of claims 1 to 3, wherein the preset FST intent slot model is a preset WFST intent slot model, one preset WFST intent slot model is one preset WFST, and in the preset WFST intent slot model, each state transition has a weight.
  5. The method according to claim 4, wherein the performing, by the electronic device, rule matching on the second text by using the preset FST intent slot model to obtain the third text specifically comprises:
    performing, by the electronic device, parallel rule matching on the second text by using multiple preset WFST intent slot models to obtain WFST results, wherein a WFST result includes a matching text of the second text and a cumulative weight of a matching path, and the matching text of the second text is the text that is output after a successful match along the matching path and that contains the preset intent labeling information and/or the preset slot labeling information;
    when there is one WFST result, determining, by the electronic device, that the matching text of the second text in the WFST result is the third text; and
    when there are multiple WFST results, determining, by the electronic device, that the matching text of the second text in the WFST result with the highest credibility is the third text.
  6. The method according to claim 5, wherein the determining, by the electronic device when there are multiple WFST results, that the matching text of the second text in the WFST result with the highest credibility is the third text specifically comprises:
    when there are multiple WFST results, calculating, by the electronic device, credibility scores of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; and
    determining, by the electronic device, that the matching text of the second text in the WFST result with the highest credibility score is the third text.
  7. The method according to claim 6, wherein in the preset WFST intent slot model, the weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not accept a wildcard, and the larger the weight of each state transition on a matching path, the larger the cumulative weight of the matching path.
  8. The method according to claim 7, wherein in the preset WFST intent slot model, the cumulative weight of a matching path is equal to the sum of the weights of the state transitions on the matching path.
  9. The method according to claim 7 or 8, wherein the calculating, by the electronic device, credibility scores of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results specifically comprises:
    calculating, by the electronic device, the credibility scores of the multiple WFST results by using credibility score calculation formula 1;
    credibility score calculation formula 1:
    $$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
    where $w$ is the cumulative weight of the matching path in the WFST result being scored, and $w_{\max}$ is the largest cumulative weight among the matching paths of the multiple WFST results.
  10. The method according to any one of claims 4 to 9, wherein before the electronic device performs rule matching on the second text by using the preset FST intent slot model to obtain the third text, the method further comprises:
    loading, by the electronic device, the preset WFST intent slot model.
  11. An electronic device, comprising one or more processors and a memory, wherein
    the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform:
    in response to a voice input from a user, converting the voice input into a first text;
    performing rule matching on a second text by using a preset FST intent slot model to obtain a third text, wherein the second text is determined according to the first text; the preset FST intent slot model is a preset FST; the third text includes preset intent labeling information and/or preset slot labeling information; the preset intent labeling information is used to label intent information of the second text; and the preset slot labeling information is used to label slot information in the second text; and
    obtaining the intent information and/or the slot information from the third text according to the preset intent labeling information and/or the preset slot labeling information.
  12. The electronic device according to claim 11, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
    performing format preprocessing on the first text to obtain the second text, wherein the number of format characters in the second text is less than or equal to the number of format characters in the first text.
  13. The electronic device according to claim 11 or 12, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
    outputting the intent information and/or the slot information in a structured form.
  14. The electronic device according to any one of claims 11 to 13, wherein the preset FST intent slot model is a preset WFST intent slot model, one preset WFST intent slot model is one preset WFST, and in the preset WFST intent slot model, each state transition has a weight.
  15. The electronic device according to claim 14, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform:
    performing parallel rule matching on the second text by using multiple preset WFST intent slot models to obtain WFST results, wherein a WFST result includes a matching text of the second text and a cumulative weight of a matching path, and the matching text of the second text is the text that is output after a successful match along the matching path and that contains the preset intent labeling information and/or the preset slot labeling information;
    when there is one WFST result, determining that the matching text of the second text in the WFST result is the third text; and
    when there are multiple WFST results, determining that the matching text of the second text in the WFST result with the highest credibility is the third text.
  16. The electronic device according to claim 15, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform:
    when there are multiple WFST results, calculating credibility scores of the multiple WFST results according to the cumulative weights of the matching paths in the multiple WFST results; and
    determining that the matching text of the second text in the WFST result with the highest credibility score is the third text.
  17. The electronic device according to claim 16, wherein in the preset WFST intent slot model, the weight of a state transition that accepts a wildcard is greater than the weight of a state transition that does not accept a wildcard, and the larger the weight of each state transition on a matching path, the larger the cumulative weight of the matching path.
  18. The electronic device according to claim 17, wherein in the preset WFST intent slot model, the cumulative weight of a matching path is equal to the sum of the weights of the state transitions on the matching path.
  19. The electronic device according to claim 17 or 18, wherein the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform:
    calculating the credibility scores of the multiple WFST results by using credibility score calculation formula 1;
    credibility score calculation formula 1:
    $$\mathrm{score}(w, w_{\max}) = 1 - \frac{w}{w_{\max}}$$
    where $w$ is the cumulative weight of the matching path in the WFST result being scored, and $w_{\max}$ is the largest cumulative weight among the matching paths of the multiple WFST results.
  20. The electronic device according to any one of claims 14 to 19, wherein the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform:
    loading the preset WFST intent slot model.
  21. A chip system applied to an electronic device, wherein the chip system comprises one or more processors, and the processors are configured to invoke computer instructions to cause the electronic device to perform the method according to any one of claims 1 to 10.
  22. A computer program product containing instructions, wherein when the computer program product runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 10.
  23. A computer-readable storage medium, comprising instructions, wherein when the instructions run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 10.
PCT/CN2021/100475 2020-06-17 2021-06-17 Intent recognigion method and electronic device WO2021254411A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010555603.3 2020-06-17
CN202010555603.3A CN113806473A (en) 2020-06-17 2020-06-17 Intention recognition method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2021254411A1 true WO2021254411A1 (en) 2021-12-23

Family

ID=78892632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/100475 WO2021254411A1 (en) 2020-06-17 2021-06-17 Intent recognigion method and electronic device

Country Status (2)

Country Link
CN (1) CN113806473A (en)
WO (1) WO2021254411A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563951A (en) * 2022-10-14 2023-01-03 美的集团(上海)有限公司 Text sequence labeling method and device, storage medium and electronic equipment
CN117034957A (en) * 2023-06-30 2023-11-10 海信集团控股股份有限公司 Semantic understanding method and device
US11934794B1 (en) * 2022-09-30 2024-03-19 Knowbl Inc. Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416993A (en) * 2021-12-30 2023-07-11 华为技术有限公司 Voice recognition method and device
CN115453897A (en) * 2022-08-18 2022-12-09 青岛海尔科技有限公司 Method and device for determining intention instruction, storage medium and electronic device
CN117973394A (en) * 2022-10-25 2024-05-03 华为技术有限公司 Natural language processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8700404B1 (en) * 2005-08-27 2014-04-15 At&T Intellectual Property Ii, L.P. System and method for using semantic and syntactic graphs for utterance classification
CN109543190A (en) * 2018-11-29 2019-03-29 北京羽扇智信息科技有限公司 A kind of intension recognizing method, device, equipment and storage medium
CN110019745A (en) * 2017-10-26 2019-07-16 株式会社日立制作所 Conversational system with self study natural language understanding
CN111078844A (en) * 2018-10-18 2020-04-28 上海交通大学 Task-based dialog system and method for software crowdsourcing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8700404B1 (en) * 2005-08-27 2014-04-15 At&T Intellectual Property Ii, L.P. System and method for using semantic and syntactic graphs for utterance classification
CN110019745A (en) * 2017-10-26 2019-07-16 株式会社日立制作所 Conversational system with self study natural language understanding
CN111078844A (en) * 2018-10-18 2020-04-28 上海交通大学 Task-based dialog system and method for software crowdsourcing
CN109543190A (en) * 2018-11-29 2019-03-29 北京羽扇智信息科技有限公司 A kind of intension recognizing method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU YUKAI: "RESEARCH AND IMPLEMENTATION ON SEMANTIC PROCESSING SYSTEM BASED ON RULE MATCHING", CHINESE SELECTED DOCTORAL DISSERTATIONS AND MASTER'S THESES FULL-TEXT DATABASES (MASTER), INFORMATION SCIENCE AND TECHNOLOGY, 15 February 2020 (2020-02-15), XP055881926 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11934794B1 (en) * 2022-09-30 2024-03-19 Knowbl Inc. Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system
CN115563951A (en) * 2022-10-14 2023-01-03 美的集团(上海)有限公司 Text sequence labeling method and device, storage medium and electronic equipment
CN117034957A (en) * 2023-06-30 2023-11-10 海信集团控股股份有限公司 Semantic understanding method and device
CN117034957B (en) * 2023-06-30 2024-05-31 海信集团控股股份有限公司 Semantic understanding method and device integrating large models

Also Published As

Publication number Publication date
CN113806473A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN110111787B (en) Semantic parsing method and server
WO2021254411A1 (en) Intent recognigion method and electronic device
CN110910872B (en) Voice interaction method and device
CN112567457B (en) Voice detection method, prediction model training method, device, equipment and medium
CN110798506B (en) Method, device and equipment for executing command
JP7252327B2 (en) Human-computer interaction methods and electronic devices
WO2021258797A1 (en) Image information input method, electronic device, and computer readable storage medium
CN111970401B (en) Call content processing method, electronic equipment and storage medium
CN114691839A (en) Intention slot position identification method
CN112256868A (en) Zero-reference resolution method, method for training zero-reference resolution model and electronic equipment
WO2021238371A1 (en) Method and apparatus for generating virtual character
WO2022127130A1 (en) Method for adding operation sequence, electronic device, and system
WO2021031862A1 (en) Data processing method and apparatus thereof
CN113380240B (en) Voice interaction method and electronic equipment
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
CN116052648A (en) Training method, using method and training system of voice recognition model
CN114238554A (en) Text label extraction method
CN111768765A (en) Language model generation method and electronic equipment
CN114817521B (en) Searching method and electronic equipment
WO2023236908A1 (en) Image description method, electronic device and computer-readable storage medium
WO2024140891A1 (en) Compiling method, electronic device, and system
WO2021238338A1 (en) Speech synthesis method and device
CN118276861A (en) Compiling method, electronic equipment and system
CN114692641A (en) Method and device for acquiring characters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21826479

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21826479

Country of ref document: EP

Kind code of ref document: A1