WO2020073248A1 - Human-computer interaction method and electronic device - Google Patents

Human-computer interaction method and electronic device

Info

Publication number
WO2020073248A1
Authority
WO
WIPO (PCT)
Prior art keywords
slot
information
input
user
server
Application number
PCT/CN2018/109704
Other languages
English (en)
French (fr)
Inventor
Zhang Qing (张晴)
Zhang Jinhui (张锦辉)
Zhang Yibo (张轶博)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880093502.XA (CN112154431A)
Priority to EP18936324.5A (EP3855338A4)
Priority to PCT/CN2018/109704 (WO2020073248A1)
Priority to JP2021519867A (JP7252327B2)
Priority to KR1020217013813A (KR20210062704A)
Priority to US17/284,122 (US11636852B2)
Publication of WO2020073248A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • The present application relates to the field of communication technology, and in particular to a human-computer interaction method and an electronic device.
  • A man-machine dialogue system (also called a man-machine dialogue platform, a chatbot, and so on) is a new generation of man-machine interaction interface.
  • A chat robot can converse with the user, recognize the user's intention during the conversation, and provide services such as ordering meals, booking tickets, and hailing taxis.
  • The working process of a chat robot may include open-domain dialogue, access conditions, and closed-domain dialogue.
  • An open-domain dialogue is a dialogue conducted before the chat robot has recognized the user's intention.
  • After a logical judgment (that is, the access conditions) determines the user's intention (for example, a car-hailing service), the chat robot jumps to the closed-domain dialogue.
  • A closed-domain dialogue is a dialogue conducted after the user's intention has been identified, in order to clarify the user's purpose (or the task details).
  • Specifically, the closed-domain dialogue includes slot filling, clarifying questions, and result response.
  • Slot filling is the process of completing information so that the user's intention can be converted into an explicit user instruction.
  • A slot can be understood as a key piece of information that the user uses to express an intention.
  • For example, the slots of the taxi service are the departure place slot, the destination slot, and the departure time slot.
  • The chat robot extracts the information of these slots (for example, the slot values) from its dialogue with the user.
  • When some necessary slot information is missing, the chat robot actively asks questions and the user answers them, so that the chat robot can fill in the necessary slot information from the user's answers; this is the clarification process. After collecting all the slot information, the chat robot can perform the corresponding operation, for example placing a taxi order for the user through the taxi application and notifying the user once the order is placed; this is the result-response process.
  • If the chat robot fails to extract the information of a required slot, it actively asks the user for clarification until the information of the required slot is extracted.
  • If the chat robot does not extract the information of a non-required slot, it does not ask questions and directly performs the corresponding operation with the information it has.
  • However, the chat robot may fail to extract key information carried by a non-required slot, in which case the operation it performs may not meet the user's needs.
  • For example, "carpooling" mentioned by the user may be key information of a non-required slot.
  • If the chat robot does not accurately extract this key information, it may fail to book a carpool for the user, which goes against the user's wishes and seriously degrades the user experience.
  • The human-computer interaction method and electronic device provided by this application can accurately identify the user's purpose, meet the user's needs, and improve the user experience.
  • The method provided in this application can be applied to a human-machine dialogue system and includes: a server receives a first input that contains the user's service requirement; the server determines, according to the first input, a first field corresponding to the first input, where the first field is the task scenario corresponding to the user's service requirement; the server distributes the first input to the intent recognition model corresponding to the first field and recognizes a first intent corresponding to the first input, where the first intent is a sub-scenario of the first field.
  • The server extracts the information of a first slot of the first intent from the first input, where the first slot is pre-configured in the first intent and is a non-required key slot.
  • When the server determines that the information of the first slot has not been extracted, it asks the user whether the information of the first slot is necessary. The server then receives a second input that indicates whether the user confirms the information of the first slot as necessary. If the user confirms that the information of the first slot is necessary, the server extracts the information of the first slot from the second input and performs the operation corresponding to the first intent according to the first intent and the information of the first slot; if the user confirms that it is unnecessary, the server does not extract the information of the first slot and performs the operation corresponding to the first intent according to the first intent.
  • Each of the first input and the second input may be a single utterance from one round of dialogue between the user and the server 200, or multiple utterances from multiple rounds of dialogue between them; this is not limited in the embodiments of the present application.
  • A non-required key slot is a slot whose information the user need not state when expressing an intention. If the user does not state it, the chat robot can ignore that slot; however, if the user does state it, the chat robot must extract it accurately.
  • When the server automatically extracts the information of each preset slot from the user's utterance, if some slot yields no information and that slot is a non-required key slot, the chat robot actively confirms with the user whether the information of that slot may be omitted. If it may not, the chat robot continues to extract the information of that slot from the user's answer; if it may, the chat robot stops extracting it and does not confirm further with the user.
  • In this way, even when the chat robot does not extract the information of a non-required key slot, confirming with the user ensures that the user's purpose is accurately identified, meets the user's needs, and improves the user experience.
  • The server extracting the information of the first slot of the first intent from the first input includes:
  • The server inputs each word or entity identified in the first input into the slot extraction model corresponding to the first slot and calculates a confidence for each word or entity. If the confidence of a first word or first entity is greater than or equal to a first threshold, the server confirms that the first word or first entity is the information of the first slot. If the confidence of every word and entity in the first input is less than the first threshold, the server determines that the information of the first slot is not extracted.
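The confidence rule above can be sketched as follows. This is a minimal illustration, assuming the slot extraction model is available as a callable that returns a confidence for each candidate word or entity; the function name `extract_slot_by_confidence` and the tie-breaking choice (keep the highest-scoring candidate that reaches the first threshold) are assumptions, not taken from the application.

```python
from typing import Callable, Iterable, Optional


def extract_slot_by_confidence(
    candidates: Iterable[str],
    confidence: Callable[[str], float],
    first_threshold: float,
) -> Optional[str]:
    """Hypothetical helper: return the value for one slot, or None.

    `confidence` stands in for the slot's extraction model; it maps a word
    or entity from the first input to a confidence score.
    """
    best_value: Optional[str] = None
    best_score = float("-inf")
    for candidate in candidates:
        score = confidence(candidate)
        # Only candidates at or above the first threshold may fill the slot;
        # among several, this sketch keeps the highest-scoring one.
        if score >= first_threshold and score > best_score:
            best_value, best_score = candidate, score
    # None means every candidate fell below the first threshold, i.e. the
    # information of the slot was not extracted.
    return best_value


# Toy scores standing in for a trained model (illustrative values only).
toy_scores = {"carpool": 0.92, "Shenzhen Bay Park": 0.10, "8:30": 0.05}
print(extract_slot_by_confidence(toy_scores, toy_scores.get, 0.8))  # carpool
```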
  • The method further includes: if the first slot corresponds to a custom slot type, the server calculates the similarity between each entity identified in the first input and each word in the custom slot type.
  • If the similarity between every entity identified in the first input and every word in the custom slot type is less than a second threshold, the server confirms that the first input does not contain the information of the first slot. If the similarity between a second entity in the first input and a second word in the custom slot type is greater than or equal to a third threshold, the server confirms that the second word is the information of the first slot. If the similarity between any entity in the first input and any word in the custom slot type is greater than or equal to the second threshold and less than the third threshold, the server asks the user whether the information of the first slot is necessary.
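For a slot backed by a custom slot type, the two-threshold decision above can be sketched like this. It is a sketch under stated assumptions: `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for whatever similarity measure is used (the application leaves the measure open), and `match_custom_slot`, `SlotDecision`, `string_similarity`, and the threshold values in the example are hypothetical. When one pair reaches the third threshold while another falls in the middle band, the sketch prefers filling the slot; the application does not state a precedence.

```python
from difflib import SequenceMatcher
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class SlotDecision(Enum):
    ABSENT = "absent"    # the first input does not contain the slot information
    FILLED = "filled"    # a custom-slot-type word is taken as the slot value
    CONFIRM = "confirm"  # ask the user whether the slot information is necessary


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; any other measure could be plugged in."""
    return SequenceMatcher(None, a, b).ratio()


def match_custom_slot(
    entities: Iterable[str],
    slot_words: Iterable[str],
    similarity: Callable[[str, str], float],
    second_threshold: float,
    third_threshold: float,
) -> Tuple[SlotDecision, Optional[str]]:
    # Materialize the words so they can be scanned once per entity.
    words = list(slot_words)
    # Best (similarity, word) pair over all entity/word combinations.
    best_sim, best_word = max(
        ((similarity(e, w), w) for e in entities for w in words),
        default=(0.0, None),
    )
    if best_word is not None and best_sim >= third_threshold:
        return SlotDecision.FILLED, best_word
    if best_word is not None and best_sim >= second_threshold:
        return SlotDecision.CONFIRM, None
    return SlotDecision.ABSENT, None


# "carpol" is close to, but not exactly, "carpool": the middle band applies.
print(match_custom_slot(["carpol"], ["carpool"], string_similarity, 0.6, 0.95))
```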
  • For example, a pinyin-similarity algorithm or a string-similarity algorithm may be used to calculate the edit distance between an entity identified in the first input and a keyword in the user dictionary, which indicates how similar the two are. The similarity of words or phrases can also be computed with deep-learning word vectors and sentence vectors.
  • The embodiments of the present application do not limit the method used to calculate the similarity.
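The edit-distance option mentioned above can be illustrated with a plain Levenshtein distance normalized to a similarity score. This is a hedged sketch: the normalization `1 - distance / max(len)` is an assumption, and converting Chinese text to pinyin (which would need a third-party library) is left out. Because the functions accept any sequence, a list of pinyin syllables can be passed instead of a raw string.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # delete item_a
                    current[j - 1] + 1,                    # insert item_b
                    previous[j - 1] + (item_a != item_b),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def edit_similarity(a: Sequence, b: Sequence) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))                      # 3
print(edit_similarity(["shen", "zhen"], ["shen", "zheng"]))  # 0.5
```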
  • In this way, the server 200 can use an error-correction approach: it triggers the confirmation mechanism with the user when it determines that an entity identified in the user's utterance (that is, the first input) is similar to a keyword in the user dictionary. This helps reduce the number of confirmations, avoids disturbing the user excessively, and improves the user experience.
  • The method further includes: if the confidence of every word and entity in the first input is less than a fourth threshold, the server confirms that the first input does not contain the information of the first slot; if the confidence of any word or entity in the first input is less than the first threshold and greater than or equal to the fourth threshold, the server asks the user whether the information of the first slot is necessary.
  • Failing to extract the information of the first slot may be caused by the slot extraction model itself, for example when the slot extraction model trained by the server is not accurate enough. In this case, the user can set a confirmation threshold: when the slot-labeling probability that the slot extraction model assigns to an entity identified in the user's utterance is greater than the confirmation threshold, the server triggers the confirmation mechanism with the user. This helps reduce the number of confirmations, avoids disturbing the user excessively, and improves the user experience.
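The fourth-threshold rule turns the single cut-off into three bands. A minimal sketch, assuming the per-word or per-entity confidences have already been computed by the slot extraction model; the function name `classify_by_confidence` and the returned labels are hypothetical.

```python
from typing import Iterable


def classify_by_confidence(
    confidences: Iterable[float], first_threshold: float, fourth_threshold: float
) -> str:
    """Return "extracted", "confirm", or "absent" for one slot."""
    if not fourth_threshold < first_threshold:
        raise ValueError("the fourth threshold must be below the first threshold")
    best = max(confidences, default=float("-inf"))
    if best >= first_threshold:
        return "extracted"  # the best candidate fills the slot
    if best >= fourth_threshold:
        return "confirm"    # plausible but uncertain: ask the user
    return "absent"         # every candidate is below the fourth threshold


print(classify_by_confidence([0.55, 0.2], first_threshold=0.8, fourth_threshold=0.4))  # confirm
```

Keeping the middle band narrow is what limits how often the user is asked; a wider band catches more model misses at the cost of more confirmation questions.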
  • The server extracting the information of the first slot from the second input includes: if the user confirms that the information of the first slot is necessary, the server extracts the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using rules.
  • The slot extraction model may fail to identify an entity the first time yet identify it correctly the second time. When the user first mentions the entity, the sentence is likely to contain other entities, so the entity appears within a context; if the slot extraction model is not accurate enough, it may fail to recognize the entity because of that context. When the server cannot identify the entity the first time, it asks the user a question about that entity, and the user's answer targets the entity directly. The answer may then contain only the entity, or very little context, so the slot extraction model is likely to identify the entity this time.
  • The entity can also be identified without the slot extraction model; for example, rules can be enabled to identify it.
  • Such rules identify the entity by combining factors such as the context logic of the user's answer, its relevance to the user's intention, and the correspondence between the entity and the first slot. This effectively increases the probability that the server recognizes the entity when the user states it a second or further time.
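The second-pass extraction described above (the slot extraction model on the short answer first, then a rule) might look like the following. It is a sketch with assumptions: the model is passed in as a callable that returns a value or None, and the rule shown is a simple vocabulary lookup (longest phrase first), which is only one of the rule types the application mentions; `extract_from_answer` and `no_model` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def extract_from_answer(
    answer: str,
    model: Callable[[str], Optional[str]],
    slot_vocabulary: Iterable[str],
) -> Optional[str]:
    """Second-pass extraction: model first, then a vocabulary rule."""
    # The reply to a targeted question carries little context,
    # so the model gets another chance at it first.
    value = model(answer)
    if value is not None:
        return value
    # Rule fallback: look for a known slot phrase, longest first so that
    # "no carpool" wins over its substring "carpool".
    text = answer.strip().lower()
    for phrase in sorted(slot_vocabulary, key=len, reverse=True):
        if phrase.lower() in text:
            return phrase
    return None


def no_model(text: str) -> Optional[str]:
    return None  # hypothetical model that misses the entity every time


print(extract_from_answer("No carpool, please", no_model, ["carpool", "no carpool"]))
# no carpool
```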
  • A second slot is also pre-configured in the first intent, and the second slot is a required slot.
  • The method further includes: when the server determines that the information of the second slot is not extracted, the server asks the user for the information of the second slot; the server receives a third input containing the user's answer and extracts the information of the second slot from it; and the server performs the operation corresponding to the first intent according to the first intent, the information of the first slot, and the information of the second slot, or according to the first intent and the information of the second slot.
  • A third slot is also pre-configured in the first intent, and the third slot is a non-required non-key slot.
  • The method further includes: when the server determines that the information of the third slot is not extracted, the server does not attempt to extract it further.
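Taken together, the three slot categories (required, non-required key, non-required non-key) lead to a simple dialogue policy. The sketch below is one possible reading, assuming slot values are held in a dictionary and that the user's earlier "not necessary" answers are tracked as a set; it asks for missing required slots before confirming non-required key slots, although the application does not fix that order. All names, including the `driver_note` slot, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Set, Tuple


class SlotKind(Enum):
    REQUIRED = "required"
    OPTIONAL_KEY = "non-required key"
    OPTIONAL_NON_KEY = "non-required non-key"


@dataclass(frozen=True)
class SlotSpec:
    name: str
    kind: SlotKind


def next_action(
    specs: Iterable[SlotSpec],
    values: Dict[str, Optional[str]],
    declined: Set[str],
) -> Tuple[str, Optional[str]]:
    """Return ("ask", slot), ("confirm", slot), or ("execute", None)."""
    missing = [s for s in specs if values.get(s.name) is None]
    # 1. A missing required slot is always asked for.
    for spec in missing:
        if spec.kind is SlotKind.REQUIRED:
            return "ask", spec.name
    # 2. A missing non-required key slot is confirmed with the user,
    #    unless the user has already said it is not necessary.
    for spec in missing:
        if spec.kind is SlotKind.OPTIONAL_KEY and spec.name not in declined:
            return "confirm", spec.name
    # 3. Missing non-required non-key slots are ignored.
    return "execute", None


taxi_slots = [
    SlotSpec("destination", SlotKind.REQUIRED),
    SlotSpec("carpool", SlotKind.OPTIONAL_KEY),
    SlotSpec("driver_note", SlotKind.OPTIONAL_NON_KEY),  # hypothetical slot
]
print(next_action(taxi_slots, {"destination": "Shenzhen Bay Park"}, set()))
# ('confirm', 'carpool')
```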
  • A server applicable to a human-machine dialogue system includes a communication interface, a memory, and a processor that are coupled to one another. The memory stores computer program code, which includes computer instructions; when the processor reads the computer instructions from the memory, the server performs the following steps:
  • receiving a first input through the communication interface, where the first input contains the user's service requirement; determining, according to the first input, a first field corresponding to the first input, where the first field is the task scenario corresponding to the user's service requirement; distributing the first input to the intent recognition model corresponding to the first field, and identifying a first intent corresponding to the first input, where the first intent is a sub-scenario of the first field;
  • extracting the information of a first slot of the first intent from the first input, where the first slot is pre-configured in the first intent and is a non-required key slot; when determining that the information of the first slot is not extracted, asking the user whether the information of the first slot is necessary;
  • receiving a second input through the communication interface, where the second input indicates whether the user confirms the information of the first slot as necessary; if the user confirms that it is necessary, extracting the information of the first slot from the second input and performing the operation corresponding to the first intent according to the first intent and the information of the first slot; if the user confirms that it is unnecessary, not extracting the information of the first slot and performing the operation corresponding to the first intent according to the first intent.
  • The processor extracting the information of the first slot of the first intent from the first input specifically includes: the processor inputs each word or entity identified in the first input into the slot extraction model corresponding to the first slot and calculates a confidence for each word or entity in the first input; if the confidence of a first word or first entity in the first input is greater than or equal to the first threshold, the processor confirms that the first word or first entity is the information of the first slot; if the confidence of every word and entity in the first input is less than the first threshold, the processor determines that the information of the first slot is not extracted.
  • If the first slot corresponds to a custom slot type, the processor is further configured to calculate the similarity between each entity identified in the first input and each word in the custom slot type.
  • If the similarity between every entity identified in the first input and every word in the custom slot type is less than the second threshold, the processor confirms that the first input does not contain the information of the first slot; if the similarity between a second entity in the first input and a second word in the custom slot type is greater than or equal to the third threshold, the processor confirms that the second word is the information of the first slot; if the similarity between any entity in the first input and any word in the custom slot type is greater than or equal to the second threshold and less than the third threshold, the processor asks the user whether the information of the first slot is necessary.
  • The processor is further configured to confirm that the first input does not contain the information of the first slot if the confidence of every word and entity in the first input is less than the fourth threshold, and to ask the user whether the information of the first slot is necessary if the confidence of any word or entity in the first input is less than the first threshold and greater than or equal to the fourth threshold.
  • The processor extracting the information of the first slot from the second input specifically includes: if the user confirms that the information of the first slot is necessary, the processor extracts the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using rules.
  • When a second slot, which is a required slot, is also pre-configured in the first intent, the processor is further configured to: when determining that the information of the second slot is not extracted, ask the user for the information of the second slot; receive a third input through the communication interface, where the third input contains the user's answer, and extract the information of the second slot from the third input; and perform the operation corresponding to the first intent according to the first intent, the information of the first slot, and the information of the second slot, or according to the first intent and the information of the second slot.
  • When a third slot, which is a non-required non-key slot, is also pre-configured in the first intent, the processor is further configured not to attempt any further extraction of the information of the third slot when determining that it has not been extracted.
  • A computer storage medium includes computer instructions that, when run on a terminal, cause the terminal to perform the method described in the first aspect and any possible implementation thereof.
  • A computer program product, when run on a computer, causes the computer to perform the method described in the first aspect and any possible implementation thereof.
  • FIG. 1A is a schematic diagram of a terminal interface for a human-machine dialogue in the prior art;
  • FIG. 1B is a schematic diagram of a terminal interface for a human-machine dialogue according to an embodiment of the present application;
  • FIG. 2 is a first schematic diagram of the composition of a human-machine dialogue system according to an embodiment of the present application;
  • FIG. 3 is a second schematic diagram of the composition of a human-machine dialogue system according to an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of interfaces of some electronic devices according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of interfaces of some electronic devices according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of interfaces of some electronic devices according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of interfaces of some electronic devices according to an embodiment of the present application;
  • FIG. 9 is a first schematic flowchart of a human-computer interaction method according to an embodiment of the present application;
  • FIG. 10 is a second schematic flowchart of a human-computer interaction method according to an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
  • The terms "first" and "second" are used for description only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. "Plurality" means two or more.
  • The embodiments of the present application provide a human-computer interaction method that further divides non-required slots into non-required key slots and non-required non-key slots.
  • A user confirmation mechanism is configured for the non-required key slots.
  • When the chat robot automatically extracts the information of each preset slot from the user's utterance, if some slot yields no information and that slot is a non-required key slot, the chat robot actively confirms with the user whether the information of that slot may be omitted. If it may not, the chat robot continues to extract the information of that slot from the user's answer; if it may, the chat robot stops extracting it and does not confirm further with the user. In this way, even when the chat robot misses the information of a non-required key slot, confirming with the user ensures that the user's purpose is accurately identified, meets the user's needs, and improves the user experience.
  • FIG. 1B shows an example of a dialogue between a chat robot and a user according to an embodiment of the present application.
  • In the taxi application, "carpooling" is configured as a non-required key slot.
  • In this dialogue, the chat robot did not extract the carpooling information (that is, the information of the non-required key slot).
  • The chat robot therefore needs to confirm with the user by asking whether carpooling is acceptable, and then extract the information of the non-required key slot from the user's answer, ensuring that the user's intention is executed accurately.
  • When the extracted information indicates that the user accepts carpooling, the chat robot can place a carpool order for the user. If the user answers "not carpooling", the user is unwilling to carpool and the information of the non-required key slot is important; after extracting it, the chat robot can place a non-carpool order for the user. If the user answers "it doesn't matter", the information of the non-required key slot is not important, and the chat robot can place the order without considering carpooling.
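The three possible answers in the FIG. 1B example can be mapped to order parameters with a small rule table. The phrases below are illustrative English stand-ins chosen for this sketch (the real dialogue and any production classifier may differ), and `interpret_carpool_answer` is a hypothetical name. The "doesn't matter" phrases are checked first and negations before the bare word, so that "not carpooling" is not read as a yes.

```python
DONT_CARE_PHRASES = ("doesn't matter", "does not matter", "don't care", "either is fine")
NEGATIVE_PHRASES = ("no carpool", "not carpool", "don't carpool", "without carpool")


def interpret_carpool_answer(answer: str) -> str:
    """Return "carpool", "no_carpool", "any", or "unclear" for a user answer."""
    text = answer.strip().lower()
    if any(p in text for p in DONT_CARE_PHRASES):
        return "any"         # slot is unimportant: order regardless of carpooling
    if any(p in text for p in NEGATIVE_PHRASES):
        return "no_carpool"  # slot is important: place a non-carpool order
    if "carpool" in text or text in ("yes", "ok", "sure"):
        return "carpool"     # slot is important: place a carpool order
    return "unclear"         # a real system might ask again here


for reply in ("Carpooling is fine", "Not carpooling", "It doesn't matter"):
    print(reply, "->", interpret_carpool_answer(reply))
```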
  • Confirming the information of a non-required key slot with the user not only completes the extraction of important non-required key slot information but also helps further confirm the user's wishes, which improves the accuracy with which the chat robot executes the user's intent and improves the user experience.
  • The human-computer interaction method provided by the embodiments of the present application can be applied to the human-machine dialogue system shown in FIG. 2.
  • The human-machine dialogue system includes an electronic device 100 and one or more servers 200 (for example, a chat robot).
  • The electronic device 100 may establish a connection with the server 200 over a telecommunication network (such as a 3G/4G/5G network) or a Wi-Fi network; this is not limited in the embodiments of the present application.
  • The electronic device 100 may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, an in-vehicle device, a smart car, a smart speaker, and so on; this application does not specially limit the specific form of the electronic device 100.
  • The server 200 can provide a human-machine dialogue service for the electronic device 100: it recognizes the user's intention from the user utterance input through the electronic device, understands the user's needs, and provides the corresponding service.
  • The server 200 may be a server of the manufacturer of the electronic device 100, for example, a cloud server of the voice assistant on the electronic device 100, or it may be a server of another application; this is not limited in the embodiments of the present application.
  • The server 200 may also establish a communication connection with a server 300 of one or more third-party applications, so that after understanding the user's needs, the server 200 sends a corresponding service request to the server 300 of the relevant third-party application and returns the response from the server 300 to the electronic device 100.
  • The server 200 may also establish a communication connection with an electronic device 400 of a third-party application, so that a developer or administrator of the third-party application can log in to the server 200 through the electronic device 400 to configure and manage the services it provides.
  • FIG. 3 is a framework diagram of another human-machine dialogue system according to an embodiment of the present application.
  • The user can input a user sentence, in voice or text form, to the server 200 through the electronic device 100. For voice input, the electronic device 100 may convert the speech into text before sending it to the server 200, or the server 200 may convert the voice-form user sentence into text; this is not limited in the embodiments of the present application.
  • After the server 200 receives the user sentence, its Natural Language Understanding (NLU) module first performs semantic understanding on it. Specifically, the user sentence passes through three sub-modules of the NLU module: domain classification, intent classification, and slot extraction.
  • The domain classification module first identifies the specific task scenario to which the user sentence belongs and distributes the user utterance to that task scenario.
  • The intent recognition module then recognizes the user's intention and assigns the user utterance to a sub-scenario within that task scenario.
  • The slot extraction module identifies the entities in the user sentence (Named Entity Recognition, NER) and performs slot filling.
  • For example, in FIG. 1B, the domain classification module may determine from the user's "help me get a car" that a taxi task is to be performed for the user (the sub-scenarios may include a premium car task, an express task, and a hitchhiking task). The intent classification module may then determine from the user's "Didi Express" that an express task is to be performed. The slot extraction module can then extract "Shenzhen Bay Park" as the destination slot information and "8:30" as the departure time slot information. Note that the user in FIG. 1B does not state the departure location; the slot extraction module can use the default departure location set by the user, or a position located by GPS, as the departure location slot information.
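The three NLU stages can be illustrated with a toy keyword-and-regex pipeline on an English rendering of the FIG. 1B request. This only shows the data flow (domain, then intent within that domain, then slots); the keyword tables, regular expressions, and the `understand` function are hypothetical stand-ins for the trained classification and extraction models the application describes. As in the text, the departure location is not extracted from the utterance and falls back to a default supplied by the caller.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical keyword tables standing in for trained classifiers.
DOMAIN_KEYWORDS = {"taxi": ("get a car", "taxi", "didi")}
INTENT_KEYWORDS = {
    "taxi": {
        "express": ("express",),
        "premium": ("premium car",),
        "hitchhiking": ("hitchhik",),
    }
}


@dataclass
class NluResult:
    domain: Optional[str]
    intent: Optional[str]
    slots: Dict[str, str] = field(default_factory=dict)


def understand(utterance: str, default_origin: Optional[str] = None) -> NluResult:
    text = utterance.lower()
    # 1. Domain classification: which task scenario does the utterance belong to?
    domain = next(
        (d for d, kws in DOMAIN_KEYWORDS.items() if any(k in text for k in kws)), None
    )
    if domain is None:
        return NluResult(None, None)
    # 2. Intent classification: which sub-scenario within that domain?
    intents = INTENT_KEYWORDS.get(domain, {})
    intent = next((i for i, kws in intents.items() if any(k in text for k in kws)), None)
    # 3. Slot extraction (toy regexes instead of a named-entity model).
    slots: Dict[str, str] = {}
    time_match = re.search(r"\b(\d{1,2}:\d{2})\b", utterance)
    if time_match:
        slots["departure_time"] = time_match.group(1)
    dest_match = re.search(r"\bto ([A-Z][\w ]*?)(?: at\b|[,.]|$)", utterance)
    if dest_match:
        slots["destination"] = dest_match.group(1)
    # The origin is not stated, so fall back to a default (e.g. a GPS fix).
    if default_origin is not None:
        slots["origin"] = default_origin
    return NluResult(domain, intent, slots)


print(understand("Help me get a car, Didi Express to Shenzhen Bay Park at 8:30", "Home"))
```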
  • the output of the natural language understanding module will be used as the input of the Dialog Management module.
  • the dialogue management module includes two parts, status tracking and dialogue strategy.
  • the state tracking module includes all kinds of information for continuous dialogue, and updates the current dialogue state according to the old state, the user state (the information output by the natural language understanding module) and the system state (that is, the query with the database).
  • the dialogue strategy is closely tied to its task scenario and usually serves as the output of the dialogue management module; an example is the mechanism for asking the user about missing required slots.
  • the dialogue strategy further includes a confirmation mechanism for missing non-required key slots.
  • the confirmation mechanism for missing non-required key slots may be executed in parallel with or serially to the questioning mechanism for missing required slots. In other words, the embodiments of this application do not limit the execution order of the confirmation mechanism and the questioning mechanism.
  • the specific confirmation mechanism is described in detail in the following embodiments.
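  • As a rough illustration of the questioning and confirmation mechanisms, the sketch below assumes each skill declares which slots are required and which non-required slots are "key" slots. The `SlotSpec` schema, the `ask:`/`confirm:` action strings, and `next_actions()` are hypothetical; the application's own confirmation mechanism is the one described in the following embodiments, not this code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SlotSpec:
    name: str
    required: bool = False
    key: bool = False  # non-required slot that still warrants confirmation


def next_actions(specs: List[SlotSpec], filled: Dict[str, str]) -> List[str]:
    """Return the system actions for the current dialogue turn.

    Missing required slots trigger a question; missing non-required key
    slots trigger a confirmation. The two lists are computed independently,
    so the mechanisms could equally run in parallel or serially.
    """
    ask = [f"ask:{s.name}" for s in specs
           if s.required and s.name not in filled]
    confirm = [f"confirm:{s.name}" for s in specs
               if not s.required and s.key and s.name not in filled]
    return ask + confirm


# Hypothetical slot schema for the express-taxi skill.
TAXI_SLOTS = [
    SlotSpec("destination", required=True),
    SlotSpec("start", required=True),
    SlotSpec("departure_time", key=True),
    SlotSpec("remarks"),
]
```

  • With only the destination and starting location filled, this sketch asks nothing and only confirms the departure time; an empty result means every required and key slot is present.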
  • the Natural Language Generation (NLG) module generates text based on the output of the dialogue management module and feeds it back to the user, thereby completing the human-computer interaction with the user.
  • the natural language generation module can generate natural language based on templates, grammars, or models. Template-based and grammar-based strategies are essentially rule-based, while a model-based strategy may use, for example, a Long Short-Term Memory (LSTM) network.
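  • Template-based generation, the simplest of the strategies above, can be sketched with Python's standard `string.Template`. The action keys and response wording are hypothetical; a model-based generator such as an LSTM would replace the template lookup with a learned decoder.

```python
from string import Template
from typing import Dict

# Hypothetical response templates keyed by dialogue-management action.
TEMPLATES: Dict[str, Template] = {
    "ask:destination": Template("Where would you like to go?"),
    "confirm:departure_time": Template("Do you want to leave now?"),
    "done": Template("Booking a car ($intent) to $destination at $departure_time."),
}


def generate(action: str, slots: Dict[str, str]) -> str:
    """Fill the template for `action` with slot values (template-based NLG)."""
    template = TEMPLATES.get(action)
    if template is None:
        return "Sorry, I didn't understand."
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return template.safe_substitute(slots)
```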
  • FIG. 4 shows a schematic structural diagram of the electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals based on instruction operation codes and timing signals to control instruction fetching and execution.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • the processor 110 may include one or more interfaces.
  • the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces.
  • for example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
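  • On an I2C bus, a transfer to a device such as the touch sensor 180K starts with a byte that holds the 7-bit target address followed by a read/write bit (0 for write, 1 for read). The helper below sketches only that standard framing; the address 0x38 is a hypothetical example, not a value from this application.

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """Build the first byte of a 7-bit-addressed I2C transfer."""
    if not 0 <= address <= 0x7F:
        raise ValueError("7-bit I2C address expected")
    # Address occupies bits 7..1; bit 0 is R/W (1 = read, 0 = write).
    return (address << 1) | int(read)


TOUCH_ADDR = 0x38  # hypothetical address of a touch controller
print(hex(i2c_address_byte(TOUCH_ADDR, read=False)))  # 0x70
print(hex(i2c_address_byte(TOUCH_ADDR, read=True)))   # 0x71
```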
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication, in which analog signals are sampled, quantized, and encoded.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
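  • The sampling, quantization, and encoding steps of PCM can be illustrated in a few lines. This sketch assumes an 8 kHz sample rate, 16-bit signed samples, and little-endian encoding (common choices for voice, but not specified in this application), and uses a computed 1 kHz sine wave in place of a real analog input.

```python
import math
from typing import Callable, List


def pcm_encode(signal: Callable[[float], float], sample_rate: int,
               n_samples: int, bits: int = 16) -> List[int]:
    """Sample `signal` (values in [-1.0, 1.0]) and quantize to signed ints."""
    max_code = 2 ** (bits - 1) - 1  # e.g. 32767 for 16-bit samples
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                   # sampling instant
        x = max(-1.0, min(1.0, signal(t)))    # clip to full scale
        codes.append(round(x * max_code))     # quantize to the nearest level
    return codes


def to_bytes(codes: List[int], bits: int = 16) -> bytes:
    """Encode quantized samples as little-endian two's-complement words."""
    width = bits // 8
    return b"".join(c.to_bytes(width, "little", signed=True) for c in codes)


def tone(t: float) -> float:
    return math.sin(2 * math.pi * 1000 * t)  # 1 kHz test tone


samples = pcm_encode(tone, sample_rate=8000, n_samples=8)
payload = to_bytes(samples)  # 8 samples x 2 bytes = 16 bytes
```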
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel forms.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
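  • The serial/parallel conversion a UART performs can be illustrated with the common 8N1 frame: the line idles high, a low start bit begins the frame, eight data bits follow least-significant bit first, and a high stop bit ends it. The sketch assumes 8N1 with no parity, which this application does not specify, and models the line as a list of bit values rather than timed voltage levels.

```python
from typing import List


def uart_frame(byte: int) -> List[int]:
    """Parallel to serial: one byte as an 8N1 bit sequence."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data + [1]                     # start bit, data, stop bit


def uart_deframe(bits: List[int]) -> bytes:
    """Serial to parallel: recover bytes from consecutive 8N1 frames."""
    out = bytearray()
    for i in range(0, len(bits), 10):
        frame = bits[i:i + 10]
        if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
            raise ValueError(f"framing error at bit {i}")
        out.append(sum(bit << k for k, bit in enumerate(frame[1:9])))
    return bytes(out)
```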
  • the MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured via software.
  • the GPIO interface can be configured as a control signal or a data signal.
  • the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationships between the modules illustrated in the embodiments of the present invention are merely illustrative and do not constitute a limitation on the structure of the electronic device 100.
  • the electronic device 100 may alternatively use an interface connection manner different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and / or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage and impedance).
  • the power management module 141 may also be disposed in the processor 110.
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • each antenna in the electronic device 100 may be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive the electromagnetic wave from the antenna 1, filter and amplify the received electromagnetic wave, and transmit it to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify a signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be transmitted into a high-frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110, and may be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives the electromagnetic wave via the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor 110.
  • the wireless communication module 160 may also receive the signal to be transmitted from the processor 110, frequency-modulate it, amplify it, and convert it to electromagnetic waves through the antenna 2 to radiate it out.
  • the antenna 1 of the electronic device 100 and the mobile communication module 150 are coupled, and the antenna 2 and the wireless communication module 160 are coupled so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, and the like.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 realizes a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP processes data fed back by the camera 193. For example, during photographing, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal; the photosensitive element then transmits the electrical signal to the ISP, which processes it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on image noise, brightness, and skin tone, and can optimize parameters of the shooting scene such as exposure and color temperature.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it onto the photosensitive element.
  • the photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
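  • As an example of the RGB/YUV formats mentioned above, the function below converts one 8-bit YUV pixel to RGB with the full-range BT.601 coefficients used by JPEG/JFIF. This is an assumption for illustration only; the application does not say which YUV variant or coefficients the DSP uses.

```python
from typing import Tuple


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) YUV to RGB for 8-bit samples."""
    def clamp(x: float) -> int:
        return max(0, min(255, round(x)))

    d = u - 128  # blue-difference (Cb) offset
    e = v - 128  # red-difference (Cr) offset
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)
```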
  • the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor performs a Fourier transform on the energy at that frequency point.
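  • To illustrate the Fourier-transform step, the sketch below computes the energy of a sampled signal at a single frequency bin with a one-bin discrete Fourier transform. The sample rate, block length, and test tone are assumed values; a production DSP would more likely use an FFT or the Goertzel algorithm than this direct sum.

```python
import cmath
import math
from typing import Sequence


def bin_energy(x: Sequence[float], k: int) -> float:
    """|X[k]|^2 for the N-point DFT of x, computed for a single bin k."""
    n_total = len(x)
    acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
              for n in range(n_total))
    return abs(acc) ** 2


fs, n_points = 8000, 64          # assumed sample rate (Hz) and block length
f0 = 1000                        # test tone; falls exactly on bin 8
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(n_points)]
k = round(f0 * n_points / fs)    # bin index of the frequency point
print(k, round(bin_energy(x, k)))  # 8 1024
```

  • A unit-amplitude sine that completes a whole number of cycles in the block has |X[k]| = N/2, so the energy at its bin is (64/2)^2 = 1024, while bins away from the tone are essentially zero.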
  • the video codec is used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, for example, moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
  • NPU is a neural-network (NN) computing processor.
  • the NPU can implement intelligent cognition applications of the electronic device 100, such as image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like.
  • the data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book) and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
  • the electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset interface 170D, the application processor, and the like.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and also used to convert analog audio input into digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the user can listen to music or a hands-free call through the speaker 170A of the electronic device 100.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or plays a voice message, the user can hear the voice by holding the receiver 170B close to the ear.
  • the microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals.
  • the user can input a sound signal into the microphone 170C by speaking with the mouth close to it.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C. In addition to collecting sound signals, it may also implement a noise reduction function.
  • the electronic device 100 may also be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the headset interface 170D is used to connect wired headsets.
  • the headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates made of a conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the SMS application icon, an instruction for viewing SMS messages is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the SMS application icon, an instruction for creating a new SMS message is executed.
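  • The capacitive sensing principle and the SMS-icon example reduce to a parallel-plate formula and a threshold comparison. In the sketch below, the plate dimensions, the normalized intensity scale, the threshold value, and the action names are hypothetical; the application only specifies that an intensity below the first pressure threshold views messages and one at or above it creates a new message.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def plate_capacitance(area_m2: float, gap_m: float, eps_r: float = 1.0) -> float:
    """Parallel-plate capacitance C = eps_r * eps_0 * A / d.

    Pressing the sensor narrows the gap d, so the capacitance rises; the
    device infers touch intensity from that change.
    """
    return eps_r * EPSILON_0 * area_m2 / gap_m


FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical, normalized touch intensity


def sms_icon_action(intensity: float) -> str:
    """Map touch intensity on the SMS icon to an instruction."""
    if intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_messages"
    return "new_message"  # intensity >= first pressure threshold
```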
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • the angular velocity of the electronic device 100 around three axes (namely, the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for shooting anti-shake.
  • the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used in navigation and motion-sensing game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude by using the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
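  • A common way to turn measured air pressure into altitude is the international standard atmosphere approximation h = 44330 × (1 − (p/p0)^(1/5.255)), where p0 is the sea-level reference pressure. The application does not say which formula the device uses; this sketch assumes that approximation and a standard p0 of 1013.25 hPa, whereas a navigation app would normally calibrate p0 to local conditions.

```python
def pressure_to_altitude(p_hpa: float, p0_hpa: float = 1013.25) -> float:
    """Approximate altitude in metres from air pressure (ISA barometric formula)."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))
```

  • Under these assumptions, a reading of about 899 hPa corresponds to roughly 1,000 m above sea level.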
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case.
  • when the electronic device 100 is a clamshell device, it may detect the opening and closing of the clamshell by using the magnetic sensor 180D.
  • features such as automatic unlocking when the flip cover is opened can then be set according to the detected open or closed state of the leather case or the clamshell.
  • the acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally along three axes). When the electronic device 100 is stationary, it can detect the magnitude and direction of gravity. It can also be used to recognize the posture of the electronic device, and is applied in scenarios such as landscape/portrait screen switching and pedometers.
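  • When the device is at rest the accelerometer reading is dominated by gravity, so comparing its axis components is enough for a basic portrait/landscape decision. The axis convention (x across the screen, y along it, z out of the screen) and the margin value are assumptions of this sketch; real implementations typically also add hysteresis and debouncing.

```python
import math
from typing import Optional


def screen_orientation(ax: float, ay: float, az: float,
                       margin: float = 2.0) -> Optional[str]:
    """Classify orientation from a resting accelerometer sample (m/s^2).

    Returns None when the device lies roughly flat or the reading is
    ambiguous, so the caller can keep the current orientation.
    """
    if abs(az) > math.hypot(ax, ay):
        return None                      # lying flat: gravity mostly on z
    if abs(ay) > abs(ax) + margin:
        return "portrait"
    if abs(ax) > abs(ay) + margin:
        return "landscape"
    return None
```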
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outward through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used to automatically unlock and lock the screen in leather case mode and pocket mode.
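  • The near/far decision of the proximity light sensor is a threshold test on reflected infrared intensity, which the device can combine with the call state to turn off the screen. The threshold values and raw-count units below are hypothetical, and the two-threshold hysteresis is an added assumption intended to keep the screen from flickering near the boundary.

```python
class ProximityDetector:
    """Near/far detection from reflected IR intensity, with hysteresis."""

    def __init__(self, near_threshold: int = 400, far_threshold: int = 300):
        # Hypothetical raw ADC counts; near_threshold must exceed far_threshold.
        if near_threshold <= far_threshold:
            raise ValueError("near_threshold must exceed far_threshold")
        self.near_threshold = near_threshold
        self.far_threshold = far_threshold
        self.near = False

    def update(self, reflected: int) -> bool:
        if reflected >= self.near_threshold:
            self.near = True     # enough reflected light: object nearby
        elif reflected <= self.far_threshold:
            self.near = False    # too little reflected light: nothing nearby
        # Between the thresholds, keep the previous state.
        return self.near


def screen_should_be_off(in_call: bool, near: bool) -> bool:
    """Turn the screen off when the phone is held to the ear during a call."""
    return in_call and near
```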
  • the ambient light sensor 180L is used to sense the brightness of ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the characteristics of the collected fingerprint to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J to lower power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the electronic device 100 from shutting down abnormally due to the low temperature. In some other embodiments, when the temperature is lower than still another threshold, the electronic device 100 boosts the output voltage of the battery 142 to prevent an abnormal shutdown caused by the low temperature.
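  • The temperature processing strategy can be summarized as a small policy function. All threshold values are hypothetical, as is the assumption that the voltage-boost threshold lies below the battery-heating threshold; the application gives no numbers and presents the behaviors as alternative embodiments, which this sketch combines for illustration and reports as action names rather than hardware operations.

```python
from typing import List

# Hypothetical thresholds in degrees Celsius.
HIGH_TEMP_C = 45.0        # above this, throttle the nearby processor
HEAT_BATTERY_C = 0.0      # below this, heat the battery 142
BOOST_VOLTAGE_C = -10.0   # below this, also boost the battery output voltage


def thermal_actions(temp_c: float) -> List[str]:
    """Return the actions a temperature processing strategy might take."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        actions.append("throttle_processor")
    if temp_c < HEAT_BATTERY_C:
        actions.append("heat_battery")
    if temp_c < BOOST_VOLTAGE_C:
        actions.append("boost_battery_voltage")
    return actions
```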
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-control screen".
  • the touch sensor 180K is used to detect a touch operation acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may alternatively be disposed on a surface of the electronic device 100 at a position different from that of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone of the human vocal part.
  • the bone conduction sensor 180M can also be in contact with the human pulse to receive a blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in the earphone and combined into a bone conduction earphone.
  • the audio module 170 may parse out a voice signal from the vibration signal, acquired by the bone conduction sensor 180M, of the vibrating bone of the vocal part, to implement a voice function.
  • the application processor may parse heart rate information from the blood pressure pulse signal acquired by the bone conduction sensor 180M to implement a heart rate detection function.
  • the key 190 includes a power key, a volume key, and the like.
  • the key 190 may be a mechanical key or a touch key.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 may generate a vibration prompt.
  • the motor 191 can be used for vibration notification of incoming calls and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects.
  • different application scenarios (for example, time reminders, message receipt, alarm clocks, and games) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effects can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate a charging state, a power change, and may also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • a SIM card can be inserted into or removed from the SIM card interface 195 to come into contact with or be separated from the electronic device 100.
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 195 can insert multiple cards at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through a SIM card to realize functions such as call and data communication.
  • the electronic device 100 may use an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • for the structure of the electronic device 400, refer to the structure of the electronic device 100 shown in FIG. 4; details are not repeated here.
  • a skill developer (for example, a third-party application developer or a service provider) can log in to the server 200 through the electronic device 400 and configure a new service, which can also be called a skill.
  • FIG. 5 to FIG. 8 show some interfaces involved when a skill developer configures a new skill. Configuring a new skill mainly involves the following steps:
  • the skill developer can log in to the skill management page of the man-machine dialogue platform through the electronic device 400 and start configuring the new skill. For example, the web address of the man-machine dialogue platform may be entered in a browser on the electronic device 400 to log in to the skill management page. Alternatively, the corresponding app can be used to log in to the skill management page.
  • the homepage 400 may include controls 401 and 402.
  • skill templates can be provided on the man-machine dialogue platform. These templates cover some usage scenarios, and skill developers can partially modify them to meet their own needs.
  • the skill developer can select the control 402 and use the skill template provided on the man-machine dialogue platform to configure the new skill.
  • alternatively, the skill developer may select the control 401 to add a custom skill based on the services they provide, offering voice interaction and the corresponding services to end users. The following description uses the case in which the skill developer selects the control 401 to add a custom skill as an example.
  • after detecting that the skill developer has selected the control 401, the electronic device 400 enters an interface for adding a custom skill, such as the page 500 for setting the basic information of the new skill shown in (2) in FIG. 5.
  • the basic information of the new skill can be set on the page 500, such as the skill ID, skill name, skill category, and wake-up words.
  • the skill ID is a globally unique ID of a certain skill, and the skill ID of each skill cannot be repeated.
  • the skill name is a descriptive name of the skill that helps the skill developer manage the skills they have created; it does not need to be unique. Skill developers need to choose a category for each skill (which can also be understood as the specific scenario mentioned above); the category is used for searching and filtering when matching user statements. Each skill can belong to only one category. Choosing the skill category accurately helps match user statements with the intents in the skill more quickly and accurately. A wake-up word can be understood as an alias of a skill: after the user speaks the alias, the man-machine dialogue platform can quickly access the skill to provide the corresponding service.
  • the intent-creation page 600 displayed on the electronic device 400 may include the intent name, the preceding context, the following context, and so on.
  • Context is mainly used in multi-round dialogue scenarios.
  • the preceding context is used to trigger the current intent, and the following context is used to associate the intent of the next round.
  • each sentence spoken by the user corresponds to one intention of the user, that is, the purpose of that sentence.
  • each skill is composed of several intents. Intents are matched in order to understand the user's needs and provide the corresponding services. When using a skill, users express the same intention in many different ways. Therefore, in the intent configuration, the skill developer needs to enter as many of the expressions (that is, user statements) that users commonly use to express the intention as possible, so that the intent is recognized more accurately.
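The context mechanism can be pictured as a simple gating rule. The following is a minimal Python sketch, not the platform's actual implementation: it assumes that an intent is eligible only when all of its preceding contexts are active, and that matching an intent activates its following contexts for the next round. The `Intent` class, its field names, and the example intent names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    # Contexts that must all be active for this intent to be triggered (preceding context).
    input_contexts: frozenset = frozenset()
    # Contexts activated for the next round once this intent is matched (following context).
    output_contexts: frozenset = frozenset()


def eligible_intents(intents, active_contexts):
    """Return the intents whose preceding contexts are all currently active."""
    return [i for i in intents if i.input_contexts <= set(active_contexts)]


def next_contexts(matched):
    """Contexts to carry into the next round after `matched` is recognized."""
    return set(matched.output_contexts)


# Hypothetical two-intent weather skill.
INTENTS = [
    Intent("query_weather", output_contexts=frozenset({"weather_followup"})),
    Intent("change_city", input_contexts=frozenset({"weather_followup"})),
]
```

In the first round only `query_weather` is eligible; once it has matched, a follow-up such as "what about Shanghai?" can also be routed to `change_city`.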
  • a page 601 for setting the user statement in the intention creation page is displayed on the electronic device 400.
  • the page 601 may include one or more controls 602 for adding user statements.
  • the page 601 can also display information items 603 of existing user statements.
  • the skill developer can add a new user statement by inputting a new user statement in the text box in the control 602, and clicking a function button of "add".
  • the human-machine dialogue platform can automatically identify the entities in the newly added user statement and associate the identified entities with slots and slot types. In other embodiments, if the human-machine dialogue platform does not mark a slot automatically or marks it incorrectly, the skill developer may manually mark the slot and associate it with a slot type.
  • a slot is the key information used to express the intent in a user statement and can be understood as a keyword in the user statement. Each slot corresponds to one slot type, and the slot can be filled with different words of that type as its value.
  • the slot type can be understood as a collection of vocabulary in a certain field.
  • the slot information in a user statement consists of words of various slot types; words of the same slot type are interchangeable in the corresponding slot position and can be recognized and extracted as that slot's information.
  • the electronic device 400 pops up the dialog box 604 shown in (2) in FIG. 6.
  • the skill developer can view and modify the slots marked in the newly added user statement and the associated slot type through the dialog box 604.
  • the dialog box 604 may also display a control 605 for adding a slot type, so that when no suitable slot type is available while associating a slot type, a corresponding slot type can be added.
  • the dialog 604 may also display a control 606 for viewing a list of slots. In response to the skill developer clicking the control 606, the electronic device 400 displays a page 608 as shown in (3) in FIG.
  • the page 608 may also include a control 607 for adding a new slot, which can be used to add a slot to the user statement.
  • a skill developer can configure a questioning mechanism for required slots and a confirmation mechanism for non-required key slots in interface 608.
  • a question is set for such a slot.
  • the question can be a default question or a question customized by a skill developer.
  • the question in that slot is set to None by default and cannot be changed.
  • for example, the newly added user statement is "Is it raining in the capital this Friday?"
  • the marked slots include time slots and city slots.
  • the slot type of the time slot is sys.time, and the time slot is a non-required key slot. That is, when the human-machine dialogue platform does not extract the information of the time slot, it actively asks the user, with "question 1" as the content of the question, and the user decides whether the information of the time slot may be omitted. If it may not, the information of the time slot is extracted from the user's answer and the subsequent operations are performed. If it may, the human-machine dialogue platform treats the time slot as having no information and directly performs the subsequent operations.
  • the slot type of the city slot is sys.local.city, and the city slot is a required slot. That is, when the human-machine dialogue platform does not extract the information of the city slot, it actively asks the user, with "question 2" as the content of the question, then extracts the information of the city slot from the user's answer and performs the subsequent operations.
  • non-required non-key slots can also be marked in the new user statement. That is, when the man-machine dialogue platform does not extract the information of a non-required non-key slot, it treats that slot as having no information and directly performs the subsequent operations.
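The three slot attributes and the per-slot question can be captured in a small configuration record. This sketch uses a representation of our own: the `SlotAttribute` and `SlotConfig` names are hypothetical, and the validation rule (only slots that can trigger a question carry one) is an assumption drawn from the description above. It encodes the weather example, with "question 1" and "question 2" as placeholder question texts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SlotAttribute(Enum):
    REQUIRED = "required"                          # ask the user when missing
    NON_REQUIRED_KEY = "non_required_key"          # ask whether the user needs it
    NON_REQUIRED_NON_KEY = "non_required_non_key"  # silently skip when missing


@dataclass(frozen=True)
class SlotConfig:
    name: str
    slot_type: str
    attribute: SlotAttribute
    question: Optional[str] = None  # assumed None for non-required non-key slots


def validate(slots):
    """Raise ValueError if a slot's question does not match its attribute (assumed rule)."""
    for s in slots:
        asks = s.attribute is not SlotAttribute.NON_REQUIRED_NON_KEY
        if asks and not s.question:
            raise ValueError(f"slot {s.name!r} needs a question")
        if not asks and s.question is not None:
            raise ValueError(f"slot {s.name!r} must not have a question")


# The weather example: "Is it raining in the capital this Friday?"
WEATHER_SLOTS = [
    SlotConfig("time", "sys.time", SlotAttribute.NON_REQUIRED_KEY, "question 1"),
    SlotConfig("city", "sys.local.city", SlotAttribute.REQUIRED, "question 2"),
]
validate(WEATHER_SLOTS)
```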
  • the slot types mainly include a system slot type and a custom slot type (also called a user dictionary).
  • the system slot type is a slot type preset by the human-machine dialogue platform.
  • the words in a system slot type cannot be enumerated, for example: sys.time, sys.location.city, sys.name, sys.phoneNum.
  • the custom slot type is a slot type defined by the skill developer, and it contains a finite number of words.
  • FIG. 7 shows the slot-type editing page 700 displayed on the electronic device 400.
  • the skill developer can enter the name of the new custom slot type in the input box 701 and press Enter to confirm, enter its values under the value item 702, enter the corresponding synonyms under the synonym item 703, and then click the "Save" button to finish creating the custom slot type.
  • the editing page 700 of the slot type may also implement modification and deletion of the custom slot type through multiple controls shown in the area 704. In some embodiments, the editing page 700 of the slot type may also support adding slot types in batches.
  • a skill developer can add a batch of slot types by clicking the batch add button 705 and choosing to upload files of a specific file type or a specific file format.
  • the file of the specific file type or the specific file format contains one or more pieces of information of the type of the slot to be added.
  • the embodiments of the present application do not limit this.
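Because a custom slot type is a finite list of values with synonyms, it maps naturally onto a lookup table. A minimal sketch, assuming synonyms are normalized to their value case-insensitively; the `CustomSlotType` class is hypothetical and does not describe the platform's storage format or the batch-upload file format.

```python
class CustomSlotType:
    """A finite user dictionary: canonical values plus their synonyms."""

    def __init__(self, name):
        self.name = name
        self._lookup = {}  # lower-cased surface form -> canonical value

    def add_value(self, value, synonyms=()):
        """Register a value together with any synonyms that should map to it."""
        for form in (value, *synonyms):
            self._lookup[form.lower()] = value

    def normalize(self, word):
        """Map a word or synonym to its canonical value, or None if unknown."""
        return self._lookup.get(word.lower())

    def words(self):
        """All surface forms; finite, so they can be traversed (unlike system slot types)."""
        return sorted(self._lookup)
```

For example, after `add_value("Shanghai", ["SH", "Hu"])`, both "sh" and "Hu" normalize to "Shanghai".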
  • the electronic device 400 may display the page 800.
  • the skill developer can click the "start training" control 801 to notify the man-machine dialogue platform to start training the man-machine dialogue model corresponding to the new skill.
  • the man-machine dialogue model corresponding to the new skill trained by the man-machine dialogue platform may include a domain classification model, an intent classification model, a slot extraction model, and so on.
  • the domain classification model can be used to classify domains of user utterances.
  • the intent classification model can be used to further classify user utterances within the corresponding domain and identify which intent of the new skill a user utterance corresponds to.
  • the slot extraction model can be used to extract slot information in user utterances. In this way, the subsequent operation corresponding to the user's intention can be performed according to the user's intention output by the intention classification model and the slot information output by the slot extraction model.
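The way the three trained models are chained can be illustrated with keyword rules standing in for them. In this hypothetical sketch the keyword tables, the intent name `book_ticket`, and the slot names are placeholders; a real platform would use the trained domain classification, intent classification, and slot extraction models instead.

```python
import re

# Hypothetical keyword tables standing in for the trained classifiers.
DOMAIN_KEYWORDS = {"travel": ["ticket", "flight"], "weather": ["rain", "weather"]}
INTENT_KEYWORDS = {
    "travel": {"book_ticket": ["book"]},
    "weather": {"query_weather": ["rain", "weather"]},
}
CITY_WORDS = {"shanghai", "beijing"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def classify_domain(tokens):
    """Domain classification: pick the first domain with a matching keyword."""
    for domain, keys in DOMAIN_KEYWORDS.items():
        if any(k in tokens for k in keys):
            return domain
    return None


def classify_intent(domain, tokens):
    """Intent classification, restricted to the intents of the chosen domain."""
    for intent, keys in INTENT_KEYWORDS.get(domain, {}).items():
        if any(k in tokens for k in keys):
            return intent
    return None


def extract_slots(tokens):
    """Slot extraction: label known words with slot names."""
    slots = {}
    if "tomorrow" in tokens:
        slots["time"] = "tomorrow"
    for t in tokens:
        if t in CITY_WORDS:
            slots["destination"] = t
    return slots


def understand(text):
    tokens = tokenize(text)
    domain = classify_domain(tokens)
    return domain, classify_intent(domain, tokens), extract_slots(tokens)
```

The design point is the ordering: the intent classifier only considers intents of the domain chosen first, and the slot output is what the subsequent operation consumes together with the intent.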
  • the electronic device 400 may display the page 900.
  • the skill developer can click the "release skills" control 902 to notify the man-machine dialogue platform to release the new skill and push the man-machine dialogue model of the new skill online. Other terminals can then converse with the man-machine dialogue platform to obtain the service provided by the new skill.
  • the page 900 may also include a "retraining" control 901 through which the skill developer can retrain the man-machine dialogue model corresponding to the new skill.
  • a human-computer interaction method provided by an embodiment of the present application can be applied to the interaction between the electronic device 100 and the server 200.
  • the method specifically includes the following steps:
  • the server 200 receives the first input.
  • the user may make a corresponding service request to the server 200 in the form of voice or text.
  • if the user inputs voice, the server 200 may convert it into text through the automatic speech recognition module and input the text, as the first input, into the natural language understanding module. If the user inputs text, the server 200 inputs that text into the natural language understanding module as the first input.
  • the first input may be an utterance in a single round of dialogue between the user and the server 200, or multiple utterances in multiple rounds of dialogue between the user and the server 200, which is not limited in this embodiment of the present application.
  • the server 200 performs domain classification according to the first input, and determines the first domain corresponding to the first input.
  • the domain classification module in the natural language understanding module can search and filter based on the first input to determine which specific task scenario (that is, the first domain) the user's intention in the first input belongs to, and distribute the first input to that task scenario (that is, the first domain).
  • the server 200 distributes the first input to the first domain and recognizes that the first input corresponds to the first intention.
  • the intention recognition module in the natural language understanding module can further subdivide the user's intention in the first input into sub-scenarios under specific task scenarios, that is, recognize the user's intention corresponding to the first input (ie, the first intention).
  • the server 200 extracts the information of each slot in the first intent from the first input according to the slot configuration corresponding to the first intent.
  • the first intention is an intention in a certain skill on the server 200.
  • the skill developer will configure corresponding slots for the first intent in the skill, that is, which slots need to be extracted for the first intent and the attributes of each slot. Therefore, after determining the first intention corresponding to the first input, the slot extraction module in the server 200 can find the slot configuration corresponding to the first intention.
  • the slot extraction module in the server 200 can identify the entities contained in the first input and call its stored slot extraction model to determine which slot of the first intent each entity corresponds to, and then label each entity with that slot. In other words, these entities are taken as the values of the corresponding slots, that is, as the extracted information of those slots.
  • for example, the slot extraction module identifies an entity A in the first input, inputs entity A into the algorithm corresponding to each slot in the slot extraction model, and calculates a confidence for each slot. If the confidence calculated by the algorithm for a slot A does not satisfy a preset condition, for example, it is less than a preset threshold C, entity A is considered not to be the information of slot A. If the confidence calculated by the algorithm for a slot B satisfies the preset condition, for example, it is greater than or equal to threshold C, entity A is taken as the information of slot B.
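The per-slot threshold check just described can be sketched as follows. The scorers are hypothetical stand-ins for the per-slot algorithms of the slot extraction model, and picking the highest-scoring slot when several pass threshold C is an assumption the text does not specify.

```python
from typing import Callable, Dict, Optional


def assign_entity(
    entity: str,
    slot_scorers: Dict[str, Callable[[str], float]],
    threshold_c: float,
) -> Optional[str]:
    """Return the slot the entity is labeled with, or None if no slot reaches threshold C."""
    best_slot, best_conf = None, float("-inf")
    for slot, scorer in slot_scorers.items():
        conf = scorer(entity)
        # A slot qualifies only if its confidence meets the preset condition (>= C);
        # among qualifying slots, keep the most confident one (assumption).
        if conf >= threshold_c and conf > best_conf:
            best_slot, best_conf = slot, conf
    return best_slot
```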
  • the information of some slots may be set by the user by default, or may be obtained by other means, and may not necessarily be extracted from the first input.
  • for example, the first intention is "book a ticket".
  • the preset slot configuration of "book a ticket" may include a time slot, a starting slot, and a destination slot. Suppose the user says "book a flight to Shanghai tomorrow" (that is, the first input). The server 200 can recognize multiple entities in the first input, for example "tomorrow" and "Shanghai". The server 200 can input "tomorrow" into the algorithm corresponding to the time slot in the slot extraction model and find that the confidence of "tomorrow" as the time slot satisfies the preset condition, so "tomorrow" can be taken as the value of the time slot in "book a ticket". That is, the server 200 extracts the information of the time slot in the first intention.
  • similarly, the server 200 can input "Shanghai" into the algorithm corresponding to the destination slot in the slot extraction model and find that the confidence of "Shanghai" as the destination slot satisfies the preset condition, so "Shanghai" can be taken as the value of the destination slot in "book a ticket". That is, the server 200 extracts the information of the destination slot in the first intention.
  • the server 200 can obtain the information of the starting slot in the first intention by other means, for example, by using the current position of the user's electronic device 100 obtained through GPS, or the default address set by the user, as the value of the starting slot.
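A sketch of filling the starting slot from sources other than the first input, using the ticket example. The slot names and the preference for the GPS position over the user's default address are assumptions; the text only says either source may be used.

```python
from typing import Dict, Optional


def fill_booking_slots(
    extracted: Dict[str, str],
    gps_city: Optional[str] = None,
    default_city: Optional[str] = None,
) -> Dict[str, str]:
    """Merge slots extracted from the first input with fallbacks for the starting slot."""
    slots = dict(extracted)  # do not mutate the caller's dictionary
    if "departure" not in slots:
        # Assumed preference: the device's current (GPS) position, then the user's default address.
        fallback = gps_city if gps_city is not None else default_city
        if fallback is not None:
            slots["departure"] = fallback
    return slots
```

A value the user actually said always wins; the fallbacks only apply when the starting slot is absent.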
  • the server 200 determines that the information of the first slot in the first intention is not extracted.
  • in step S104, the first input may not contain the information of certain slots in the first intention (for example, the user did not say it, or the user said it but automatic speech recognition made an error or the user made an input error), or the slot extraction model of the server 200 may not be accurate enough. In either case, the server 200 may fail to extract the information of certain slots in the first intention from the first input, so step S106 and the subsequent steps need to be performed.
  • the server 200 determines the attribute of the first slot.
  • the attribute of the first slot may be required slot, non-required key slot, or non-required non-key slot. If the first slot is a required slot, step S107 is performed; if it is a non-required non-key slot, step S110 is performed; if it is a non-required key slot, step S111 is performed.
  • the slot extraction module in the server 200 sends the result that the first slot was not extracted to the dialog management module.
  • the dialog management module judges the attribute of the first slot to determine the subsequent operation according to the attribute of the first slot.
  • the server 200 asks the user for information about the first slot.
  • the dialogue management module sends a question to the user about the first slot according to the attributes of the first slot and the preset dialogue strategy.
  • the server 200 may ask the user to say it again, may repeat a question from an earlier exchange with the user, or may ask a question specifically about the missing first slot.
  • the embodiments of the present application do not limit the content and manner of questions.
  • the server 200 receives the second input.
  • the second input is the user's answer based on the question from the server 200. If the user answers in the form of voice, the automatic voice recognition module in the server 200 can convert the voice into text to obtain the second input. If the user answers in the form of text, the server 200 uses the text input by the user as the second input. The server sends the determined second input to the natural language understanding module.
  • the second input may be an utterance in a single round of dialogue between the user and the server 200, or multiple utterances in multiple rounds of dialogue between the user and the server 200, which is not limited in the embodiments of the present application.
  • the server 200 fills the first slot in the first intention according to the second input.
  • the slot extraction module in the natural language understanding module recognizes the entities in the second input and calls the algorithm corresponding to the first slot in its stored slot extraction model to identify the entity corresponding to the first slot. The identified entity is taken as the value of the first slot, that is, as the extracted information of the first slot. Then, step S116 is executed.
  • the server 200 does not need to fill the first slot.
  • the slot extraction module in the natural language understanding module determines that the information of the first slot is not filled, that is, it is not necessary to determine the value of the first slot. Go to step S116.
  • the server 200 asks the user to confirm whether the information in the first slot is necessary.
  • after the dialog management module determines in step S106 that the first slot is a non-required key slot, it can, regardless of whether the user actually provided information about the first slot, directly ask the user about the information of the first slot and let the user confirm whether that information needs to be supplied.
  • the manner and content of asking questions about the information of the first slot are not limited.
  • the dialog management module may further determine whether the user may have said the information of the first slot.
  • only when the user has most likely said the information of the first slot is the user asked about it and asked to confirm whether it needs to be supplied. Confirming with the user in this targeted way helps reduce unnecessary disturbance to the user.
  • the server 200 may refer to the following description for the process of determining whether the user may say the information of the first slot, which will not be repeated here.
  • the server 200 receives the third input of the electronic device.
  • the third input is the user's answer based on the question from the server 200. If the user answers in the form of voice, the automatic voice recognition module in the server 200 can convert the voice into text to obtain a third input. If the user answers in the form of text, the server 200 uses the text input by the user as the third input. The server sends the determined third input to the natural language understanding module.
  • the third input may be an utterance in a single round of dialogue between the user and the server 200, or multiple utterances in multiple rounds of dialogue between the user and the server 200, which is not limited in this embodiment of the present application.
  • in step S113, the server 200 determines, according to the third input from the electronic device 100, whether the information of the first slot is necessary. If it is, step S114 is executed; otherwise, step S115 is executed.
  • the server 200 fills the first slot according to the third input.
  • for details, refer to step S109; then step S116 is executed.
  • the server 200 does not need to fill the first slot.
  • step S116 is executed.
  • the server 200 performs the operation corresponding to the first intention according to the first intention and the slot information in the extracted first intention.
  • the method may further include step S201, as follows:
  • in step S201, the server 200 determines whether the first input may contain the information of the first slot. If it may, step S111 is executed; otherwise, step S115 is executed.
  • for example, when the user wants to open the settings of the WeChat application, the server 200 may recognize the speech as "open MSI settings" because of the user's accent or pauses, and therefore fails to extract "MSI". If the server 200 does not confirm with the user, it is likely to recognize the user's intention directly as "open settings" and open the system settings, which differs from the user's goal of opening the settings of the WeChat application.
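The branching in steps S106 to S116, including the optional S201 check, can be summarized in one function. This is a minimal sketch: `ask`, `extract`, `needed`, and `may_contain` are hypothetical callbacks standing in for the dialog management module, the slot extraction model, the step S113 judgment, and the step S201 judgment. Re-asking when a required slot is still empty after the second input is not covered.

```python
from enum import Enum
from typing import Callable, Optional


class SlotAttribute(Enum):
    REQUIRED = "required"
    NON_REQUIRED_KEY = "non_required_key"
    NON_REQUIRED_NON_KEY = "non_required_non_key"


def resolve_missing_slot(
    attribute: SlotAttribute,
    question: str,
    ask: Callable[[str], str],                     # sends a question, returns the user's answer
    extract: Callable[[str], Optional[str]],       # slot extraction on an answer
    needed: Callable[[str], bool] = lambda answer: True,  # S113 judgment
    may_contain: Callable[[], bool] = lambda: True,       # S201 judgment
) -> Optional[str]:
    """Return a value for a slot missing from the first input, or None to leave it empty."""
    if attribute is SlotAttribute.REQUIRED:
        # S107-S109: ask the user, then fill the slot from the second input.
        return extract(ask(question))
    if attribute is SlotAttribute.NON_REQUIRED_NON_KEY:
        # S110: the slot does not need to be filled.
        return None
    # Non-required key slot. S201: only confirm if the first input may contain the slot.
    if not may_contain():
        return None  # S115
    # S111-S113: ask for confirmation and judge necessity from the third input.
    answer = ask(question)
    if not needed(answer):
        return None  # S115
    return extract(answer)  # S114
```

Whatever this returns, the caller then performs the operation of the first intention with the collected slots (step S116).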
  • the server 200 trains to generate the slot extraction model.
  • the embodiments of the present application provide the following two methods, which can be used to confirm that the user may say the information of the first slot. details as follows:
  • Method 1: for user input errors or speech recognition errors.
  • the server 200 may adopt an error correction method to determine that the entity identified from the user's utterance (ie, the first input) is similar to the keyword in the user dictionary, and then trigger the confirmation mechanism to the user.
  • for example, an algorithm based on pinyin similarity, a string similarity algorithm, or the like may be used to calculate the edit distance between the entity identified in the first input and a keyword in the user dictionary, so as to determine how similar the two are. Word vectors and sentence vectors from deep learning may also be used to calculate the similarity of words or phrases.
  • the embodiment of the present application does not limit the method for calculating the similarity.
  • the edit distance between two strings is the minimum number of edit operations required to convert one string into the other.
  • the edit operations include replacing one character with another, inserting a character, and deleting a character.
  • when the server 200 confirms that the first slot corresponds to a custom slot type: since the words in a custom slot type are user-defined, their number is finite. The server 200 can therefore traverse all the words in the custom slot type corresponding to the first slot, calculate the edit distance between each entity identified in the user's utterance and each of those words, and determine the minimum of these edit distances. The entity in the first input corresponding to the minimum edit distance can be taken as a potential entity of the first slot, that is, possibly the information of the first slot.
  • the server 200 can compare the minimum value with a threshold A.
  • threshold A may be set by a developer or a user. If the minimum value is less than threshold A, the user can be considered not to have said the information of the first slot, that is, the first input does not contain it, and the server 200 may skip confirming with the user. If the minimum value is greater than or equal to threshold A, the user may have said the information of the first slot, and the server 200 can confirm with the user.
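For reference, the edit distance defined above is the classic Levenshtein distance, computable by dynamic programming over string prefixes. This sketch keeps only one previous row of the table; it is a generic implementation, not one taken from the application.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum replacements, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace (free if the characters match)
            ))
        prev = curr
    return prev[-1]
```

For the typo in the example below, `edit_distance("shangha", "shanghai")` is 1: a single inserted character.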
  • for example, the user intends to enter "book a ticket to Shanghai tomorrow", but actually enters "book a ticket to shangha tomorrow" by mistake.
  • the first intention is "book a ticket".
  • the slots included in the first intention include a time slot, a departure slot and a destination slot. It is assumed here that the destination slot corresponds to the user dictionary 1.
  • the server 200 does not recognize the information of the destination slot.
  • the server 200 can recognize that the entities in the first input are "tomorrow” and "shangha”.
  • the server 200 calculates the edit distance between "tomorrow" and every word in user dictionary 1, and between "shangha" and every word in user dictionary 1.
  • the developer or user may also set a threshold B greater than threshold A. If the minimum value is greater than or equal to threshold B, the potential entity is very similar to a word in the custom slot type, and the user can basically be considered to have said the information of the first slot; the server may then take the potential entity directly as the information of the first slot without confirming with the user. If the minimum value is less than threshold B and greater than threshold A, the user may have said the information of the first slot, that is, the potential entity may be the information of the first slot, so the server can further confirm with the user.
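The threshold logic can be sketched as below. One assumption matters here: the comparisons above only make sense if the compared value grows with similarity, so this sketch converts the edit distance into a normalized similarity, 1 - distance / max(len), and uses the best (largest) score. Thresholds A and B, the lower-casing, and the example dictionary are illustrative. The check applies only to custom slot types; for system slot types, whose words cannot be enumerated, no confirmation is triggered.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (same definition as in the text)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def check_custom_slot(entities, dictionary, threshold_a, threshold_b):
    """Decide what to do with a custom-type slot missing from the first input.

    Returns (decision, potential_entity, closest_word), where decision is
    "fill" (score >= B), "confirm" (A <= score < B) or "skip" (score < A).
    """
    best = (-1.0, None, None)
    for entity in entities:
        for word in dictionary:  # finite, so it can be traversed
            score = similarity(entity.lower(), word.lower())
            if score > best[0]:
                best = (score, entity, word)
    score, entity, word = best
    if entity is None or score < threshold_a:
        return "skip", None, None
    if score >= threshold_b:
        return "fill", entity, word
    return "confirm", entity, word
```

With the entities "tomorrow" and "shangha" and a dictionary containing "Shanghai", the best score is 1 - 1/8 = 0.875, so thresholds A = 0.5 and B = 0.95 lead to a confirmation question about "Shanghai".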
  • the embodiments of the present application are not limited.
  • when the server 200 confirms that the first slot corresponds to a system slot type: because the words in a system slot type cannot be enumerated, the server cannot traverse them to calculate the edit distance between the entities in the first input and every word of the system slot type, and therefore cannot determine whether the user said the information of the first slot. To avoid disturbing the user excessively, the server may skip asking the user to confirm the information of the first slot.
  • the second case is that the slot extraction model is not accurate enough, so that the server 200 does not extract the information of the first slot.
  • the slot extraction model may use, for example, a named entity recognition (Named Entity Recognition, NER) method to identify the entity in the first input, and input the identified entity into the algorithm corresponding to the first slot in the slot extraction model, Calculate the confidence of each entity.
  • alternatively, the slot extraction model may skip entity identification and directly input each word of the first input into the algorithm corresponding to the first slot in the slot extraction model to calculate the confidence of each word.
  • only if the calculated confidence of some entity or word satisfies certain conditions is the user considered to have possibly said the information of the first slot, and the server then confirms with the user.
  • the server 200 may input each entity in the first input into the slot extraction model and calculate its confidence, and confirm with the user only when the confidence of some entity meets certain conditions.
  • that the slot extraction model does not extract the information of the first slot can be understood as follows: the slot labeling probability that the slot extraction model in the server 200 assigns to each entity identified in the user's utterance is below a recognition threshold. The user may therefore set a confirmation threshold, and when the slot labeling probability of an entity identified in the user's utterance is greater than the confirmation threshold, the server 200 triggers the confirmation mechanism. That is, when the confidence that the slot extraction model gives one or more entities in the first input is greater than the confirmation threshold and less than the recognition threshold, the server 200 confirms the information of the first slot with the user.
  • the slot extraction model may fail to identify an entity the first time but identify it correctly the second time. This is because, when the user first mentions the entity, the utterance is likely to contain other content, that is, the entity appears with context, and an insufficiently accurate slot extraction model may miss the entity because of that context. When the server 200 fails to identify the entity the first time and asks the user about it, the user's answer is about the entity itself and may contain only the entity or very little context, so the slot extraction model is likely to identify it this time.
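The band between the confirmation threshold and the recognition threshold can be sketched as follows. The probabilities would come from the slot extraction model; here they are passed in as a plain dictionary, and treating a value equal to the recognition threshold as a successful extraction is an assumption. The function name and return labels are hypothetical.

```python
def slot_decision(confidences, recognition_threshold, confirmation_threshold):
    """Classify a slot from per-entity slot labeling probabilities.

    Returns ("extract", entity) if the best entity reaches the recognition threshold,
    ("confirm", entity) if it lies strictly between the two thresholds,
    and ("none", None) otherwise.
    """
    if not confirmation_threshold < recognition_threshold:
        raise ValueError("confirmation threshold must be below recognition threshold")
    if not confidences:
        return "none", None
    entity, best = max(confidences.items(), key=lambda kv: kv[1])
    if best >= recognition_threshold:
        return "extract", entity
    if best > confirmation_threshold:
        return "confirm", entity
    return "none", None
```

In the "open MSI settings" example, a probability of 0.4 for "MSI" with thresholds 0.7 (recognition) and 0.3 (confirmation) would trigger a confirmation question rather than silently dropping the slot.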
  • the entity can also be identified by a non-slot extraction model, for example, a rule can be enabled to identify the entity.
  • the rule refers to that it can be identified by combining the context logic of the user's answer, the relevance of the user's intention, the correspondence between the entity and the first slot, and other factors. In this way, the probability that the server 200 recognizes the user to speak the entity for the second or more times can also be effectively increased.
  • to implement the foregoing functions, the terminal and the like include corresponding hardware structures and/or software modules for performing each function.
  • the embodiments of this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by hardware driven by computer software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the embodiments of the present invention.
  • in the embodiments of this application, the terminal and the like may be divided into functional modules based on the foregoing method examples.
  • each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the integrated module may be implemented in the form of hardware or of a software functional module. It should be noted that the module division in the embodiments of the present invention is an example and is merely a logical function division; other division manners may be used in actual implementation.
  • FIG. 11 is a schematic diagram of the hardware structure of a server 200 disclosed in the embodiments of this application.
  • the server 200 includes at least one processor 201, at least one memory 202, and at least one communication interface 203.
  • the server 200 may further include an output device and an input device, not shown in the figure.
  • the processor 201, the memory 202, and the communication interface 203 are connected through a bus.
  • the processor 201 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control execution of the programs of the solutions of this application.
  • the processor 201 may also include multiple CPUs, and the processor 201 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, or processing cores for processing data (e.g., computer program instructions).
  • the memory 202 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 202 may exist independently, and is connected to the processor 201 through a bus.
  • the memory 202 may also be integrated with the processor 201.
  • the memory 202 is used to store application program codes for executing the solution of the present application, and is controlled and executed by the processor 201.
  • the processor 201 is used to execute the computer program code stored in the memory 202, so as to implement the human-computer interaction method described in the embodiments of the present application.
  • the communication interface 203 can be used to communicate with other devices or communication networks, such as Ethernet and wireless local area networks (WLAN).
  • the output device communicates with the processor and can display information in a variety of ways.
  • the output device may be a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display device, a cathode ray tube (Cathode Ray Tube, CRT) display device, or a projector (projector), etc.
  • the input device communicates with the processor and can receive user input in a variety of ways.
  • the input device may be a mouse, keyboard, touch screen device, or sensor device.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application.
  • the foregoing storage media include any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

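A human-computer interaction method and an electronic device (100), relating to the field of communications technologies, which help accurately identify a user's purpose, meet the user's needs, and improve user experience. The method specifically includes: during human-machine dialog interaction, when a server (200) extracts slots from a user utterance, if there is a slot for which no information has been extracted and that slot is a non-required key slot, the server (200) asks the user a question to determine whether the information of the slot is necessary; if it is necessary, the server (200) further extracts the information of the slot; if it is not necessary, the server (200) no longer extracts the information of the slot.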

Description

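Human-Computer Interaction Method and Electronic Device
Technical Field
This application relates to the field of communications technologies, and in particular, to a human-computer interaction method and an electronic device.
Background
A human-machine dialog system, also called a human-machine dialog platform or a chatbot, is a new generation of human-computer interaction interface. A chatbot can hold a dialog with a user, identify the user's intent during the dialog, and provide services such as ordering food, booking tickets, or hailing a taxi.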
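FIG. 1A shows an example of a dialog between a chatbot and a user. Using this example to illustrate how a chatbot works, the working process may include an open-domain dialog, an admission condition, and a closed-domain dialog. The open-domain dialog is the dialog held before the chatbot has identified the user's intent. When the user says "Call me a car", the chatbot determines the user's intent (a ride-hailing service) through logical judgment (the admission condition) and then jumps to the closed-domain dialog. The closed-domain dialog is the dialog held after the user's intent is identified, in order to clarify the user's purpose (also called clarifying the task details).
The closed-domain dialog specifically includes slot filling, clarification, and result response. Slot filling is the process of completing information so that the user's intent is turned into an explicit user instruction. A slot can be understood as key information that the user uses to express an intent. For example, in the dialog shown in FIG. 1A, the slots of the taxi service are the departure slot, the destination slot, and the departure time slot. The chatbot extracts the information of these slots (for example, including slot values) from the dialog with the user. When some necessary information is missing from the slot information, the chatbot actively asks a question and the user answers, so that the chatbot can complete the necessary slot information from the user's answer; this process is called clarification. After the chatbot has collected all the slot information, it can perform the corresponding operation, for example placing an order in a taxi app for the user and then informing the user; this is the result response process.
Currently, slots fall into two categories: required slots and non-required slots. When the chatbot fails to extract the information of a required slot, it actively asks the user to clarify until the information of the required slot is extracted. When the chatbot fails to extract the information of a non-required slot, it does not ask, and directly performs the corresponding operation without that slot's information.
In real scenarios, factors such as user input errors, speech recognition errors, or insufficiently accurate slot extraction algorithms often cause the chatbot to miss key information in some non-required slots, so the operation the chatbot later performs may not meet the user's needs. For example, in the dialog shown in FIG. 1A, the user's "Let's carpool" may be key information for a non-required slot. If the chatbot does not accurately extract it, the chatbot may not book a carpool for the user, which goes against the user's wishes and seriously affects the user experience.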
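Summary
The human-computer interaction method and electronic device provided in this application can accurately identify the user's purpose, meet the user's needs, and improve user experience.
According to a first aspect, a method provided in this application can be applied to a human-machine dialog system and includes: a server receives a first input, where the first input contains a service requirement of a user; the server determines, based on the first input, a first domain corresponding to the first input, where the first domain is a task scenario corresponding to the user's service requirement; the server distributes the first input to an intent recognition model corresponding to the first domain and identifies a first intent corresponding to the first input, where the first intent is a sub-scenario in the first domain; the server extracts information of a first slot in the first intent from the first input, where the first slot is preconfigured in the first intent and is a non-required key slot; when the server determines that the information of the first slot has not been extracted, the server asks the user a question to determine whether the information of the first slot is necessary; the server receives a second input, where the second input contains the user's confirmation of whether the information of the first slot is necessary; if the user confirms that the information of the first slot is necessary, the server extracts the information of the first slot from the second input and performs, based on the first intent and the information of the first slot, an operation corresponding to the first intent; if the user confirms that the information of the first slot is unnecessary, the server does not extract the information of the first slot and performs the operation corresponding to the first intent based on the first intent.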
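The first input may be a single utterance in a single-round dialog between the user and the server 200, or multiple utterances in a multi-round dialog between the user and the server 200; this is not limited in the embodiments of this application.
The second input may be a single utterance in a single-round dialog between the user and the server 200, or multiple utterances in a multi-round dialog between the user and the server 200; this is not limited in the embodiments of this application.
It can be understood that a non-required key slot is a slot whose information the user does not necessarily express when stating an intent. If the user does not express the information of the slot, the chatbot can ignore it. However, if the user does express it, the chatbot needs to extract it accurately.
It can be seen that, in the embodiments of this application, while the server automatically extracts the information of each preset slot from the user's utterance, if a slot's information has not been extracted and that slot is a non-required key slot, the chatbot actively confirms with the user whether the information of that slot can be omitted. If it cannot be omitted, the chatbot continues to extract the information of that slot from the user's answer. If it can be omitted, the chatbot no longer extracts it, that is, it does not confirm with the user again. In this way, even when the chatbot fails to extract the information of a non-required key slot, it can confirm with the user, ensuring that the user's purpose is accurately identified, the user's needs are met, and the user experience is improved.
In a possible implementation, the extracting, by the server, information of the first slot in the first intent from the first input includes: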
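The server inputs each word or each entity identified in the first input into a slot extraction model corresponding to the first slot, and calculates a confidence for each word or entity in the first input. If the confidence of a first word or first entity in the first input is greater than or equal to a first threshold, the server determines that the first word or first entity is the information of the first slot. If the confidences of all words or entities in the first input are less than the first threshold, the server determines that the information of the first slot has not been extracted.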
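The first-threshold rule in the preceding paragraph can be sketched in Python as follows. This is a minimal sketch, assuming the slot extraction model can be treated as a function that maps one candidate word or entity to a confidence; the name `extract_slot`, the toy scores, and the choice to return the highest-scoring candidate when several pass the threshold are illustrative assumptions rather than part of the described method.

```python
from typing import Callable, Optional, Sequence


def extract_slot(candidates: Sequence[str],
                 score: Callable[[str], float],
                 first_threshold: float) -> Optional[str]:
    """Return the candidate accepted as the first slot's information, or None.

    `score` stands in for the slot extraction model of the first slot
    (hypothetical interface). A candidate is accepted only if its confidence
    is >= first_threshold; among accepted candidates the highest one wins.
    """
    best: Optional[str] = None
    best_conf = float("-inf")
    for cand in candidates:
        conf = score(cand)
        if conf >= first_threshold and conf > best_conf:
            best, best_conf = cand, conf
    return best  # None means "information of the first slot not extracted"


# Toy confidences for the entities of one utterance (made-up numbers).
toy_scores = {"tomorrow": 0.93, "Shanghai": 0.12, "ticket": 0.05}
print(extract_slot(list(toy_scores), lambda e: toy_scores[e], 0.8))  # tomorrow
```

Returning `None` rather than raising keeps the "not extracted" case as an ordinary outcome, which the later steps handle according to the slot's attribute.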
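In a possible implementation, the method further includes: if the first slot corresponds to a custom slot type, the server separately calculates the similarity between each entity identified in the first input and each word in the custom slot type.
If the similarities between all entities identified in the first input and all words in the custom slot type are less than a second threshold, the server determines that the first input does not contain the information of the first slot. If the similarity between a second entity in the first input and a second word in the custom slot type is greater than or equal to a third threshold, the server determines that the second word is the information of the first slot. If the similarity between any entity in the first input and any word in the custom slot type is greater than or equal to the second threshold but less than the third threshold, the server determines to ask the user a question to determine whether the information of the first slot is necessary.
When determining the similarity between an entity and a keyword in the user dictionary, the server may, for example, use a pinyin-similarity-based algorithm or a string-similarity algorithm to calculate the edit distance between the entity identified in the first input and the keyword in the user dictionary, so as to determine how similar the two are. Alternatively, deep-learning word vectors, sentence vectors, or similar methods may be used to calculate the similarity of words or phrases. The method for calculating similarity is not limited in the embodiments of this application.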
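To make the custom-slot-type check concrete, the sketch below uses plain character-level Levenshtein distance, normalized to a similarity in [0, 1], and then applies the second/third-threshold rule described above. It is a sketch under stated assumptions: pinyin conversion and vector-based similarity are not shown, the normalization by the longer string's length is one possible choice, and the names `similarity` and `match_custom_slot` and the sample dictionary are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def match_custom_slot(entities: Iterable[str], dictionary: Iterable[str],
                      second: float, third: float) -> Tuple[str, Optional[str]]:
    """Return ("fill", word), ("confirm", word), or ("absent", None)."""
    words = list(dictionary)  # iterated once per entity
    best_word: Optional[str] = None
    best_sim = 0.0
    for entity in entities:
        for word in words:
            sim = similarity(entity, word)
            if sim > best_sim:
                best_word, best_sim = word, sim
    if best_sim >= third:
        return "fill", best_word      # confident match: word is the slot value
    if best_sim >= second:
        return "confirm", best_word   # near miss: ask the user
    return "absent", None             # input does not contain the slot


print(match_custom_slot(["carpol"], ["carpool", "express"], 0.6, 0.9))
```

With these illustrative thresholds the misspelled "carpol" (similarity 6/7 to "carpool") lands in the confirmation band, which is exactly the input-error case the confirmation mechanism targets.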
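Considering that user input errors or speech recognition errors may cause the server to miss the information of the first slot even though the user stated it, the server 200 may use an error-correction approach and trigger the mechanism of confirming with the user only when it determines that an entity identified in the user's utterance (the first input) is fairly similar to a keyword in the user dictionary. This helps reduce the number of confirmations, avoids disturbing the user too often, and improves the user experience.
In a possible implementation, the method further includes: if the confidences of all words or entities in the first input are less than a fourth threshold, the server determines that the first input does not contain the information of the first slot; if the confidence of any word or entity in the first input is less than the first threshold and greater than or equal to the fourth threshold, the server determines to ask the user a question to determine whether the information of the first slot is necessary.
This takes into account the case where the user correctly expressed the information of the first slot, but the slot extraction model itself is not accurate enough, so the information was not extracted. For example, before training the slot extraction model, the skill developer may have entered only a small number of user utterances, or utterances that are not accurate enough, so the slot extraction model trained by the server is also not accurate enough. In this case, the user may set a confirmation threshold: when the slot labeling probability that the slot extraction model assigns to an entity identified in the user's utterance is greater than the confirmation threshold, the server triggers the mechanism of confirming with the user. This helps reduce the number of confirmations, avoids disturbing the user too often, and improves the user experience.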
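The first and fourth thresholds split the model's confidence range into three bands. The sketch below assumes the slot extraction model has already produced one confidence per word or entity; the function name, keyword arguments, and return labels are hypothetical.

```python
from typing import Iterable


def decide_by_confidence(confidences: Iterable[float],
                         first_threshold: float,
                         fourth_threshold: float) -> str:
    """Map model confidences for one slot to "fill", "confirm", or "absent"."""
    if not fourth_threshold < first_threshold:
        raise ValueError("fourth (confirmation) threshold must be below the first threshold")
    top = max(confidences, default=0.0)
    if top >= first_threshold:
        return "fill"       # information of the first slot extracted
    if top >= fourth_threshold:
        return "confirm"    # ask the user whether the slot information is necessary
    return "absent"         # treat the first input as not containing the slot


print(decide_by_confidence([0.62, 0.10], first_threshold=0.8, fourth_threshold=0.5))  # confirm
```

Only the maximum confidence matters here, because a single word or entity in a band is enough to trigger the corresponding branch.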
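In a possible implementation, the extracting, by the server, information of the first slot from the second input if the user confirms that the information of the first slot is necessary includes: if the user confirms that the information of the first slot is necessary, the server extracts the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using a rule.
For the same entity and the same slot extraction model, the model may fail to identify the entity the first time but identify it correctly the second time. This is because, when the user says the entity for the first time, the sentence is likely to contain other entities, that is, the entity has context. If the slot extraction model is not accurate enough, it may fail to identify the entity because it does not recognize this context. When the server fails to identify the entity the first time, it asks the user about that entity, so the user's answer is specifically about the entity. The answer may then contain only the entity, or very little context, so the slot extraction model is likely to identify the entity this time. In some other embodiments, the entity in the user's answer may also be identified without the slot extraction model, for example by enabling a rule. Here, a rule means identifying the entity by combining factors such as the contextual logic of the user's answer, its relevance to the user's intent, and the correspondence between the entity and the first slot. This also effectively increases the probability that the server recognizes the entity when the user says it for the second or a later time.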
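A minimal sketch of re-extraction from the user's answer: the slot extraction model is tried first, and a simple rule serves as the fallback. The model is passed in as a callable so that no particular model API is assumed, and the rule shown (accept a word of the slot's vocabulary that appears in the short answer) is only one hypothetical instance of the rules mentioned above.

```python
from typing import Callable, Iterable, Optional


def extract_from_answer(answer: str,
                        model: Callable[[str], Optional[str]],
                        slot_vocabulary: Iterable[str]) -> Optional[str]:
    """Try the slot extraction model first, then fall back to a simple rule."""
    value = model(answer)
    if value is not None:
        return value
    # Hypothetical rule: the answer was prompted by a question about this one
    # slot, so a vocabulary word found in the (short) answer is taken as the value.
    text = answer.casefold()
    for word in slot_vocabulary:
        if word.casefold() in text:
            return word
    return None


def model_that_misses(_: str) -> Optional[str]:
    return None  # simulates a model that still fails on the answer


print(extract_from_answer("An express car, please", model_that_misses, ["express", "premier"]))
```

The rule is only safe because the answer is short and targeted; applied to the original first input, the same substring match would be far more error-prone.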
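In a possible implementation, a second slot is further preconfigured in the first intent, and the second slot is a required slot. The human-computer interaction method further includes: when the server determines that the information of the second slot has not been extracted, the server asks the user a question so as to extract the information of the second slot; the server receives a third input and extracts the information of the second slot from it, where the third input contains the user's answer; and the server performs the operation corresponding to the first intent based on the first intent, the information of the first slot, and the information of the second slot, or based on the first intent and the information of the second slot.
In a possible implementation, a third slot is further preconfigured in the first intent, and the third slot is a non-required non-key slot. The human-computer interaction method further includes: when the server determines that the information of the third slot has not been extracted, the server does not extract the information of the third slot.
According to a second aspect, a server is provided that can be applied to a human-machine dialog system, including a communication interface, a memory, and a processor. The communication interface and the memory are coupled to the processor, the memory is configured to store computer program code, and the computer program code includes computer instructions. When the processor reads the computer instructions from the memory, the server is enabled to perform the following steps:
receiving a first input through the communication interface, where the first input contains a service requirement of a user; determining, based on the first input, a first domain corresponding to the first input, where the first domain is a task scenario corresponding to the user's service requirement; distributing the first input to an intent recognition model corresponding to the first domain, and identifying a first intent corresponding to the first input, where the first intent is a sub-scenario in the first domain; extracting information of a first slot in the first intent from the first input, where the first slot is preconfigured in the first intent and is a non-required key slot; when the server determines that the information of the first slot has not been extracted, asking the user a question to determine whether the information of the first slot is necessary; receiving a second input through the communication interface, where the second input contains the user's confirmation of whether the information of the first slot is necessary; if the user confirms that the information of the first slot is necessary, extracting the information of the first slot from the second input and performing, based on the first intent and the information of the first slot, an operation corresponding to the first intent; and if the user confirms that the information of the first slot is unnecessary, skipping extraction of the information of the first slot and performing the operation corresponding to the first intent based on the first intent.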
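In a possible implementation, the extracting, by the processor, information of the first slot in the first intent from the first input specifically includes: the processor inputs each word or each entity identified in the first input into the slot extraction model corresponding to the first slot and calculates a confidence for each word or entity in the first input; if the confidence of a first word or first entity in the first input is greater than or equal to the first threshold, the processor determines that the first word or first entity is the information of the first slot; if the confidences of all words or entities in the first input are less than the first threshold, the processor determines that the information of the first slot has not been extracted.
In a possible implementation, the processor is further configured to: if the first slot corresponds to a custom slot type, separately calculate the similarity between each entity identified in the first input and each word in the custom slot type.
If the similarities between all entities identified in the first input and all words in the custom slot type are less than the second threshold, the processor determines that the first input does not contain the information of the first slot; if the similarity between a second entity in the first input and a second word in the custom slot type is greater than or equal to the third threshold, it determines that the second word is the information of the first slot; if the similarity between any entity in the first input and any word in the custom slot type is greater than or equal to the second threshold but less than the third threshold, it determines to ask the user a question to determine whether the information of the first slot is necessary.
In a possible implementation, the processor is further configured to: if the confidences of all words or entities in the first input are less than the fourth threshold, determine that the first input does not contain the information of the first slot; if the confidence of any word or entity in the first input is less than the first threshold and greater than or equal to the fourth threshold, determine to ask the user a question to determine whether the information of the first slot is necessary.
In a possible implementation, the extracting, by the processor, information of the first slot from the second input if the user confirms that the information of the first slot is necessary specifically includes: if the user confirms that the information of the first slot is necessary, the processor extracts the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using a rule.
In a possible implementation, when a second slot that is a required slot is further preconfigured in the first intent, the processor is further specifically configured to: when determining that the information of the second slot has not been extracted, ask the user a question so as to extract the information of the second slot; receive a third input through the communication interface and extract the information of the second slot from it, where the third input contains the user's answer; and perform the operation corresponding to the first intent based on the first intent, the information of the first slot, and the information of the second slot, or based on the first intent and the information of the second slot.
In a possible implementation, when a third slot that is a non-required non-key slot is further preconfigured in the first intent, the processor is further specifically configured to: when determining that the information of the third slot has not been extracted, skip extracting the information of the third slot.
According to a third aspect, a computer storage medium is provided, including computer instructions. When the computer instructions run on a terminal, the terminal is enabled to perform the method according to the first aspect or any one of its possible implementations.
According to a fourth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is enabled to perform the method according to the first aspect or any one of its possible implementations.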
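Brief Description of Drawings
FIG. 1A is a schematic diagram of a terminal interface for a human-machine dialog in the prior art;
FIG. 1B is a schematic diagram of a terminal interface for a human-machine dialog according to an embodiment of this application;
FIG. 2 is a first schematic diagram of the composition of a human-machine dialog system according to an embodiment of this application;
FIG. 3 is a second schematic diagram of the composition of a human-machine dialog system according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 5 is a schematic diagram of interfaces of some electronic devices according to an embodiment of this application;
FIG. 6 is a schematic diagram of interfaces of some other electronic devices according to an embodiment of this application;
FIG. 7 is a schematic diagram of interfaces of some other electronic devices according to an embodiment of this application;
FIG. 8 is a schematic diagram of interfaces of some other electronic devices according to an embodiment of this application;
FIG. 9 is a first schematic flowchart of a human-computer interaction method according to an embodiment of this application;
FIG. 10 is a second schematic flowchart of a human-computer interaction method according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of this application.
Detailed Description of Embodiments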
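The following describes the technical solutions in the embodiments of this application with reference to the accompanying drawings. In the descriptions of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may represent A or B. The term "and/or" in this specification merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: only A exists, both A and B exist, and only B exists.
The terms "first" and "second" below are used merely for description and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature limited by "first" or "second" may explicitly or implicitly include one or more such features. In the descriptions of the embodiments of this application, unless otherwise specified, "a plurality of" means two or more.
In the prior art, factors such as user input errors, speech recognition errors, or insufficiently accurate slot extraction algorithms can cause the chatbot to miss key information in some non-required slots, so the operation it later performs does not meet the user's needs. To address this, the embodiments of this application provide a human-computer interaction method that further divides non-required slots into non-required key slots and non-required non-key slots, and configures a user confirmation mechanism for non-required key slots. That is, while the chatbot automatically extracts the information of each preset slot from the user's utterance, if a slot's information has not been extracted and the slot is a non-required key slot, the chatbot actively confirms with the user whether the information of that slot can be omitted. If it cannot, the chatbot continues to extract it from the user's answer; if it can, the chatbot no longer extracts it, that is, it does not confirm with the user again. In this way, even when the chatbot fails to extract the information of a non-required key slot, it can confirm with the user, ensuring that the user's purpose is accurately identified, the user's needs are met, and the user experience is improved.
It can be understood that a non-required key slot is a slot whose information the user does not necessarily express when stating an intent. If the user does not express the information of the slot, the chatbot can ignore it. However, if the user does express it, the chatbot needs to extract it accurately.
For example, FIG. 1B shows an example of a dialog between a chatbot and a user according to an embodiment of this application. "Carpool" is configured as a non-required key slot in the taxi application. In the dialog, the user said "Let's carpool", but the chatbot did not extract this information (that is, the information of the non-required key slot). In this case, the chatbot needs to confirm further by asking the user "Is carpooling OK?", and then extract the information of the non-required key slot from the user's answer to ensure that the user's intent is executed accurately.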
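If the user answers "carpool", the user is willing to carpool and the information of this non-required key slot is important; after extracting it, the chatbot can place a carpool order for the user. If the user answers "no carpool", the user is unwilling to carpool and the information is also important; after extracting it, the chatbot can place a non-carpool order for the user. If the user answers "I don't mind", the information of this slot is not important, and the chatbot can place the order without considering carpooling. It can be understood that confirming the information of a non-required key slot with the user both completes the extraction of important non-required key slot information and further confirms the user's wishes, which improves the accuracy with which the chatbot executes the user's intent and improves the user experience.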
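The three answers in the carpool example answer two questions at once: is the slot necessary, and if so, what value does it take. The sketch below uses English keyword lists invented for illustration; a deployed system would rely on the slot extraction model or rules discussed earlier. Returning `None` for an unclear reply, so that the caller can ask again, is also an assumption.

```python
from typing import Optional, Tuple

# Hypothetical phrase lists. The order of checks matters because
# "no carpool" also contains "carpool".
INDIFFERENT = ("don't mind", "doesn't matter", "either is fine")
NEGATIVE = ("no carpool", "don't carpool", "not carpool")
POSITIVE = ("carpool", "sure")


def interpret_carpool_reply(reply: str) -> Optional[Tuple[bool, Optional[bool]]]:
    """Return (slot_is_necessary, carpool_value), or None if the reply is unclear."""
    text = reply.lower()
    if any(p in text for p in INDIFFERENT):
        return False, None   # slot unnecessary: order without the carpool factor
    if any(p in text for p in NEGATIVE):
        return True, False   # slot necessary: place a non-carpool order
    if any(p in text for p in POSITIVE):
        return True, True    # slot necessary: place a carpool order
    return None


for answer in ("Carpool", "No carpool", "I don't mind", "What?"):
    print(answer, "->", interpret_carpool_reply(answer))
```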
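The human-computer interaction method provided in the embodiments of this application can be applied to the human-machine dialog system shown in FIG. 2. The system includes an electronic device 100 and one or more servers 200 (for example, a chatbot). The electronic device 100 may connect to the server 200 over a telecommunications network (a 3G/4G/5G or similar network), a Wi-Fi network, or the like; this is not limited in the embodiments of this application.
The user can conduct a human-machine dialog with the server 200 through the electronic device 100. The electronic device 100 may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smartwatch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, an in-vehicle device, a smart car, a smart speaker, or the like; the specific form of the electronic device 100 is not specially limited in this application.
The server 200 can provide the electronic device 100 with a human-machine dialog service: based on the user utterance input by the electronic device, it can identify the user's intent, learn the user's needs, and provide the corresponding service. The server 200 may be a server of the manufacturer of the electronic device 100, for example a cloud server of the voice assistant on the electronic device 100, or a server of another application; this is not limited in the embodiments of this application.
In some embodiments, the server 200 may also establish a communication connection with servers 300 of one or more third-party applications, so that after learning the user's needs, the server 200 sends a corresponding service request to the relevant third-party application server 300 and returns the response information of the server 300 to the electronic device 100. In some other embodiments, the server 200 may also establish a communication connection with an electronic device 400 of a third-party application, so that the developer or administrator of the third-party application can log in to the server 200 through the electronic device 400 to configure and manage the services it provides.
FIG. 3 is a framework diagram of another human-machine dialog system according to an embodiment of this application. The human-computer interaction process to which the embodiments of this application apply is first briefly described below with reference to this framework diagram.
First, the user can input a user statement (in voice or text form) to the server 200 through the electronic device 100. If it is in voice form, the electronic device 100 may convert it into text and then send it to the server 200, or the server 200 may convert the voice form of the user statement into text. This is not limited in the embodiments of this application.
After receiving the user statement sent by the electronic device 100, the server 200 first performs semantic understanding on it in its natural language understanding (NLU) module. Specifically, the user statement passes through three submodules of the NLU module: domain classification, intent classification, and slot extraction. Generally, the server 200 integrates multiple specific task scenarios, such as food ordering, taxi hailing, and weather. The domain classification module first identifies which specific task scenario the user statement belongs to and distributes the user utterance to that scenario. The intent recognition module identifies the user's intent and further divides the user utterance into a sub-scenario of that task scenario. The slot extraction module identifies entities in the user statement and performs slot filling. For example, named entity recognition (NER) may be used to identify entities with specific meanings in the user statement, such as person names, place names, times, dates, institution names, organization names, and currencies. Simply put, the features of each word in the user statement are extracted and compared with the features of each predefined entity, so that the corresponding entities are identified in the statement.
For example, still taking the dialog shown in FIG. 1B, the domain classification module can determine from the user's "Call me a car" that a taxi task needs to be performed for the user (sub-scenarios may include premier car, express car, and hitch ride tasks). Then intent classification can determine from the user's "DiDi Express" that an express car task needs to be performed. Next, the slot extraction module can extract "Shenzhen Bay Park" as the destination slot information and "8:30" as the departure time slot information. It should be noted that in FIG. 1B the user does not state the departure location; the slot extraction module can use the default departure location set by the user as the departure slot information, or obtain a location through GPS positioning and use it as the departure slot information.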
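The order of the three NLU stages can be illustrated with a deliberately tiny keyword-and-regex pipeline run on an English rendering of the FIG. 1B request. The keyword tables, regular expressions, and names below are hypothetical stand-ins for the trained domain classification, intent classification, and slot extraction models; they show only what each stage consumes and outputs, not how a real model decides.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical keyword tables standing in for trained classifiers.
DOMAIN_KEYWORDS = {"taxi": ("car", "taxi"), "weather": ("rain", "weather")}
INTENT_KEYWORDS = {"taxi": {"express": ("express",), "premier": ("premier",)}}


@dataclass
class NluResult:
    domain: Optional[str]
    intent: Optional[str]
    slots: Dict[str, str] = field(default_factory=dict)


def understand(utterance: str) -> NluResult:
    text = utterance.lower()
    # 1. Domain classification: pick the task scenario.
    domain = next((d for d, kws in DOMAIN_KEYWORDS.items()
                   if any(k in text for k in kws)), None)
    # 2. Intent classification: pick a sub-scenario inside that domain.
    intents = INTENT_KEYWORDS.get(domain or "", {})
    intent = next((i for i, kws in intents.items()
                   if any(k in text for k in kws)), None)
    # 3. Slot extraction: naive patterns for destination and departure time.
    slots: Dict[str, str] = {}
    dest = re.search(r"\bto (.+?)(?= at |$)", text)
    if dest:
        slots["destination"] = dest.group(1)
    when = re.search(r"\bat (\d{1,2}(?::\d{2})?)", text)
    if when:
        slots["departure_time"] = when.group(1)
    return NluResult(domain, intent, slots)


print(understand("Call me a car, DiDi Express, to Shenzhen Bay Park at 8:30"))
```

Slot values come back lowercased because the toy pipeline matches on a lowercased copy of the utterance; the departure slot is left to defaults or GPS, as the paragraph above describes.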
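The output of the NLU module serves as the input of the dialog management module. The dialog management module includes two parts: state tracking and dialog policy. The state tracking module maintains various information about the ongoing dialog and updates the current dialog state based on the old state, the user state (information output by the NLU module), and the system state (that is, the results of querying the database). The dialog policy is closely related to the task scenario and usually serves as the output of the dialog management module, for example a follow-up questioning mechanism for missing required slots.
In the embodiments of this application, the dialog policy further includes a confirmation mechanism for missing non-required key slots. The confirmation mechanism for missing non-required key slots and the follow-up questioning mechanism for missing required slots may be processed in parallel or in series; that is, the embodiments of this application do not limit their execution order. The confirmation mechanism is described in detail in the following embodiments.
The natural language generation (NLG) module generates text information based on the output of the dialog management module and feeds it back to the user, completing the human-computer interaction with the user. The NLG module may generate natural language in a template-based, grammar-based, or model-based manner. Template-based and grammar-based generation are mainly rule-based strategies, while model-based generation may use, for example, a long short-term memory (LSTM) network. The specific implementation of NLG is not limited in the embodiments of this application.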
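Template-based NLG, the simplest of the three options above, can be sketched with Python's standard `string.Template`. The dialog-act names and template wording are hypothetical; the point is that the dialog management output (an act plus parameters) selects a template and fills its placeholders.

```python
from string import Template
from typing import Dict

# Hypothetical dialog acts produced by dialog management, each with a template.
TEMPLATES: Dict[str, Template] = {
    "request_slot": Template("Please tell me the $slot."),
    "confirm_slot": Template("Is $value OK for you?"),
    "order_placed": Template("Your $service to $destination at $time has been booked."),
}


def generate(act: str, **params: str) -> str:
    """Fill the template for a dialog act; raises KeyError if a parameter is missing."""
    return TEMPLATES[act].substitute(**params)


print(generate("confirm_slot", value="carpooling"))
print(generate("order_placed", service="express car",
               destination="Shenzhen Bay Park", time="8:30"))
```

Using `substitute` rather than `safe_substitute` makes a missing parameter fail loudly instead of leaking a raw `$placeholder` to the user.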
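FIG. 4 is a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, an optical proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the electronic device 100. In some other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. It can generate operation control signals based on instruction operation codes and timing signals to control instruction fetching and execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated access and reduces the waiting time of the processor 110, improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.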
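The I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple groups of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple groups of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel forms. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral components such as the display 194 and the camera 193. MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the photographing function of the electronic device 100, and communicates with the display 194 through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface can be configured by software, either as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C, I2S, UART, or MIPI interface, and so on.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, or to transfer data between the electronic device 100 and peripheral devices. It may also be used to connect a headset to play audio, or to connect other electronic devices such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present invention are merely schematic and do not constitute a structural limitation on the electronic device 100. In some other embodiments of this application, the electronic device 100 may use interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.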
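The charging management module 140 is configured to receive charging input from a charger, which may be a wireless or wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, it may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. It receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. It may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage and impedance). In some other embodiments, the power management module 141 may be disposed in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same component.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization; for example, the antenna 1 may be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. It may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. It can receive electromagnetic waves through the antenna 1, filter and amplify the received waves, and send them to the modem processor for demodulation. It can also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be disposed in the same component.
The modem processor may include a modulator and a demodulator. The modulator modulates a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator demodulates a received electromagnetic wave signal into a low-frequency baseband signal and passes it to the baseband processor for processing. After processing by the baseband processor, the signal is passed to the application processor, which outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, and the like) or displays images or videos through the display 194. In some embodiments, the modem processor may be an independent component. In some other embodiments, the modem processor may be independent of the processor 110 and disposed in the same component as the mobile communication module 150 or another functional module.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more components integrating at least one communication processing module. It receives electromagnetic waves through the antenna 2, performs frequency modulation and filtering on the signals, and sends the processed signals to the processor 110. It can also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves radiated through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. These may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).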
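The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display 194 and the application processor and performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, videos, and the like, and includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or N displays 194, where N is a positive integer greater than 1.
The electronic device 100 can implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is configured to process data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light passes through the lens onto the camera's photosensitive element, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP, which converts it into an image visible to the naked eye. The ISP can also algorithmically optimize image noise, brightness, and skin tone, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is configured to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. It converts the optical signal into an electrical signal and passes it to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency, the digital signal processor performs a Fourier transform or the like on the frequency energy.
The video codec is configured to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transmission mode between neurons in the human brain, it processes input information quickly and can also continuously learn by itself. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos on the external memory card.
The internal memory 121 may be configured to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to perform the various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, applications required by at least one function (such as a sound playback function and an image playback function), and the like. The data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).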
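The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal for output, and to convert analog audio input into a digital audio signal. It may also be configured to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "loudspeaker", is configured to convert an audio electrical signal into a sound signal. The electronic device 100 can play music or hands-free calls through the speaker 170A.
The receiver 170B, also called an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or a voice message, the user can listen by placing the receiver 170B close to the ear. The microphone 170C, also called a "mic" or "mouthpiece", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input a sound signal into it. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In some other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on. The headset jack 170D is configured to connect a wired headset. It may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is configured to sense a pressure signal and convert it into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the pressure intensity based on the change in capacitance. When a touch operation is performed on the display 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A, and can also calculate the touch position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations performed at the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold is performed on the SMS application icon, an instruction for viewing SMS messages is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold is performed on the SMS application icon, an instruction for creating a new SMS message is executed.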
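The gyroscope sensor 180B may be configured to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (x, y, and z) may be determined through the gyroscope sensor 180B. The gyroscope sensor 180B may be used for image stabilization during photographing. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate based on that angle, and lets the lens counteract the shake through reverse motion, implementing stabilization. The gyroscope sensor 180B may also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the air pressure measured by the barometric pressure sensor 180C to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover based on the magnetic sensor 180D, and then set features such as automatic unlocking when the cover is flipped open based on the detected open or closed state of the leather case or flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, it can detect the magnitude and direction of gravity. It can also be used to recognize the posture of the electronic device, for applications such as landscape/portrait switching and pedometers.
The distance sensor 180F is configured to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a photographing scenario, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The optical proximity sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The LED may be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can determine that there is an object near the electronic device 100; when insufficient reflected light is detected, it can determine that there is no object nearby. The electronic device 100 can use the optical proximity sensor 180G to detect that the user is holding it close to the ear during a call, so as to automatically turn off the screen to save power. The optical proximity sensor 180G may also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L is configured to sense ambient light brightness. The electronic device 100 can adaptively adjust the brightness of the display 194 based on the sensed ambient light. The ambient light sensor 180L may also be used to automatically adjust white balance when taking photos, and may cooperate with the optical proximity sensor 180G to detect whether the electronic device 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 180J is configured to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 lowers the performance of a processor located near the temperature sensor 180J to reduce power consumption and implement thermal protection. In some other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display 194, and together they form a touchscreen, also called a "touch screen". The touch sensor 180K is configured to detect touch operations performed on or near it, and can pass a detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display 194. In some other embodiments, the touch sensor 180K may be disposed on the surface of the electronic device 100 at a position different from that of the display 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone of the human vocal part. It can also contact the human pulse and receive blood pressure pulsation signals. In some embodiments, the bone conduction sensor 180M may be disposed in a headset to form a bone conduction headset. The audio module 170 can parse out a voice signal based on the vibration signal of the vocal-part vibrating bone acquired by the bone conduction sensor 180M, to implement a voice function. The application processor can parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, to implement a heart rate detection function.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 can receive button input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. It may be used for incoming call vibration prompts and for touch vibration feedback. For example, touch operations performed on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects, and touch operations on different areas of the display 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. Touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light, and may be used to indicate the charging status and battery level changes, or to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a SIM card. A SIM card can be inserted into or removed from the SIM card interface 195 to come into contact with or separate from the electronic device 100. The electronic device 100 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time, and they may be of the same or different types. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card, which can be embedded in the electronic device 100 and cannot be separated from it.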
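For the structure of the electronic device 400, refer to the structure of the electronic device 100 shown in FIG. 4; details are not described again.
The technical solutions in the following embodiments may all be implemented in the electronic device 100, the server 200, or the electronic device 400 having the foregoing hardware architecture.
The technical solutions provided in the embodiments of this application are described in detail below with reference to the accompanying drawings.
First, a skill developer (which may be a third-party application developer, a service provider, or the like) can log in to the server 200 through the electronic device 400 and configure a new service, which may also be called a skill on the human-machine dialog platform.
FIG. 5 to FIG. 8 show some interfaces involved when a skill developer configures a new skill. Configuring a new skill mainly involves the following steps:
I. Set the basic information of the new skill
The skill developer can log in to the skill management page of the human-machine dialog platform through the electronic device 400 and start configuring the new skill. For example, the developer may enter the URL associated with the platform in the browser of the electronic device 400 to log in to the skill management page, or use a corresponding app to log in.
As shown in (1) of FIG. 5, the home page 400 of the skill management page may include a control 401 and a control 402. The human-machine dialog platform may provide skill templates covering some usage scenarios, and a skill developer can meet personalized needs by partially modifying these templates. In some embodiments, the developer may select the control 402 to configure the new skill using a skill template provided by the platform. In some other embodiments, the developer may select the control 401 to add a custom skill based on the services it provides, offering voice interaction and corresponding services to end users. The following uses the example in which the developer selects the control 401 to add a custom skill.
After detecting that the skill developer selected the control 401, the electronic device 400 enters the interface for adding a custom skill. As shown in (2) of FIG. 5, page 500 is used to set the basic information of the new skill, such as the skill identifier, skill name, skill category, and wake-up word.
The skill identifier is a globally unique identifier of a skill; identifiers must not repeat across skills. The skill name is a descriptive name that helps the developer manage the skills it creates, and need not be unique. The developer needs to select a category for each skill (which can also be understood as the specific scenario mentioned above), used for searching and filtering when matching user utterances. Each skill can belong to only one category, and choosing the category accurately helps match user utterances to the intents in the skill faster and more precisely. The wake-up word can be understood as an alias of a skill: after the user says this alias, the platform can quickly go to that skill to obtain the corresponding service.
It can be understood that the basic information of the new skill may also include other content, which is not enumerated here.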
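II. Create intents in the new skill
After the basic information of the new skill is set, intents in the new skill can be created. As shown in (3) of FIG. 5, page 600 displayed by the electronic device 400 is used to create an intent and may include the intent name, the preceding context, the following context, and the like.
Intent names must not repeat within a skill. Contexts are mainly used in multi-round dialog scenarios: the preceding context is used to trigger the current intent, and the following context is used to associate the next round's intent.
It should be noted that each sentence the user says corresponds to an intent of the user, that is, the purpose for saying it. Each skill consists of several intents; the platform learns the user's needs by matching each sentence to the intents in the skill and provides the corresponding service. Users express their intents in a wide variety of ways when using a skill, so the skill developer needs to enter, in the intent configuration, as many expressions (user utterances) as possible that users might use in daily life to express that intent, so that intents are recognized more accurately.
III. Enter user utterances and mark slots in them (including setting slot attributes and associating slot types)
As shown in (1) of FIG. 6, page 601 is the page for setting user utterances within the intent creation page displayed on the electronic device 400. Page 601 may include one or more controls 602 for adding user utterances, and may also display information items 603 of existing user utterances.
The skill developer can add a user utterance by entering it in the text box of the control 602 and clicking the "Add" button. In some embodiments, the platform can automatically identify entities in the new user utterance and associate them with slots and slot types. In some other embodiments, if the platform does not mark slots automatically or marks them incorrectly, the developer can mark slots manually and associate them with slot types.
A slot is key information in a user utterance used to express an intent, and can be understood as a keyword in the utterance. Each slot corresponds to a slot type, and the slot can take other words of that slot type as its value. A slot type can be understood as a set of words in a certain field; the slot information in user utterances is composed of various slot types, and words of the same slot type are interchangeable in the corresponding slot and can be identified and extracted.
For example, in response to the skill developer entering a new user utterance in the text box of the control 602 and clicking "Add", the electronic device 400 pops up the dialog box 604 shown in (2) of FIG. 6, in which the developer can view and modify the slots marked in the new utterance and their associated slot types. The dialog box 604 may also display a control 605 for adding a slot type, so that a new slot type can be added when no suitable slot type is available for association, and a control 606 for viewing the slot list. In response to the developer clicking the control 606, the electronic device 400 displays page 608 shown in (3) of FIG. 6, which shows the slots contained in the user utterance, the slot type associated with each slot, the attribute of each slot (required slot, non-required key slot, or non-required non-key slot), and the question to ask when slot information is missing. Page 608 may also include a control 607 for adding a slot to the user utterance.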
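Generally, in interface 608 the skill developer can configure a follow-up questioning mechanism for required slots and a confirmation mechanism for non-required key slots; no question needs to be set for non-required non-key slots. That is, when a slot is set as a required slot or a non-required key slot, a question is set for it, which may be a default question or one customized by the developer. When a slot is set as a non-required non-key slot, its question is set to none by default and cannot be changed. For example, suppose the new user utterance is "Will it rain in the capital this Friday", with a marked time slot and city slot. The time slot corresponds to the slot type sys.time and its attribute is non-required key slot. That is, when the platform fails to extract the information of the time slot, it actively asks the user "Question 1" and lets the user decide whether the time slot information can be omitted. If it cannot be omitted, the platform extracts the time slot information from the user's answer and then performs the subsequent operation; if it can be omitted, the platform treats the time slot as having no information and directly performs the subsequent operation.
The city slot corresponds to the slot type sys.local.city, and its attribute is required slot. That is, when the platform fails to extract the information of the city slot, it actively asks the user "Question 2", extracts the city slot information from the user's answer, and then performs the subsequent operation.
Other non-required non-key slots may also be marked in the new user utterance. When the platform fails to extract the information of such a slot, it treats the slot as having no information and directly performs the subsequent operation.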
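The configuration rules above (required and non-required key slots carry a question, non-required non-key slots carry none) can be captured in a small data structure. This is a sketch, not the platform's actual schema: the class names, the validation in `__post_init__`, and the default question text are hypothetical; the slot types and "Question 1"/"Question 2" come from the weather example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SlotAttribute(Enum):
    REQUIRED = "required"
    NON_REQUIRED_KEY = "non-required key"
    NON_REQUIRED_NON_KEY = "non-required non-key"


@dataclass
class SlotConfig:
    name: str
    slot_type: str
    attribute: SlotAttribute
    question: Optional[str] = None

    def __post_init__(self) -> None:
        if self.attribute is SlotAttribute.NON_REQUIRED_NON_KEY:
            # The question is fixed to "none" and cannot be changed.
            if self.question is not None:
                raise ValueError(f"slot {self.name!r}: non-required non-key slots take no question")
        elif self.question is None:
            # Hypothetical default question when the developer supplies none.
            self.question = f"Please tell me the {self.name}."


# The weather example from the text: "Will it rain in the capital this Friday".
weather_slots = [
    SlotConfig("time", "sys.time", SlotAttribute.NON_REQUIRED_KEY, "Question 1"),
    SlotConfig("city", "sys.local.city", SlotAttribute.REQUIRED, "Question 2"),
]
for slot in weather_slots:
    print(slot.name, slot.attribute.value, slot.question)
```

Validating at construction time mirrors the UI behavior, where the question field of a non-required non-key slot is locked.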
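In the embodiments of this application, slot types mainly include system slot types and custom slot types (also called user dictionaries). System slot types are preset by the human-machine dialog platform, and their words cannot be enumerated, for example sys.time, sys.location.city, sys.name, and sys.phoneNum. Custom slot types are defined by the skill developer and contain a finite number of words.
As shown in FIG. 7, page 700 displayed by the electronic device 400 is used to edit slot types. The skill developer can enter the text of a new custom slot type in the input box 701 and press Enter to confirm, enter its values under the value item 702, enter synonyms under the corresponding synonym item 703, and click "Save" to finish adding the custom slot type. Page 700 also allows custom slot types to be modified and deleted through the controls shown in area 704. In some embodiments, page 700 also supports adding slot types in batches: for example, the developer can click the batch add button 705 and upload a file of a specific file type or format that contains information about one or more slot types to be added. This is not limited in the embodiments of this application.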
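A custom slot type is a finite dictionary of values, each with synonyms, so extraction can normalize any synonym back to its canonical value. The sketch below is a minimal illustration; the class name, the in-memory dictionary format (instead of the uploaded batch file, whose format is not specified), and the sample "ride_mode" entries are hypothetical.

```python
from typing import Dict, Iterable, Optional


class CustomSlotType:
    """A finite user dictionary: canonical values, each with optional synonyms."""

    def __init__(self, name: str, entries: Dict[str, Iterable[str]]):
        self.name = name
        self._lookup: Dict[str, str] = {}
        for value, synonyms in entries.items():
            for surface in (value, *synonyms):
                # Case-insensitive lookup from any surface form to its value.
                self._lookup[surface.casefold()] = value

    def normalize(self, text: str) -> Optional[str]:
        """Return the canonical value for a value or synonym, else None."""
        return self._lookup.get(text.strip().casefold())


ride_mode = CustomSlotType("ride_mode", {"carpool": ["share a ride", "pool"],
                                         "private": ["solo"]})
print(ride_mode.normalize("Share a ride"), ride_mode.normalize("bus"))
```

Exact lookup handles synonyms only; near misses such as typos are what the similarity thresholds described earlier are for.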
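IV. Complete the new skill settings, and train and publish the human-machine dialog model corresponding to the new skill
After the skill developer enters the user utterances required by the new skill, marks the slots in them, sets their attributes, and associates slot types, the electronic device 400 can display page 800. The developer can click the "Start training" control 801 to notify the platform to start training the human-machine dialog model for the new skill. The trained model may include a domain classification model, an intent classification model, a slot extraction model, and the like. The domain classification model can be used to classify user utterances by domain; the intent classification model can further subdivide a user utterance within the corresponding domain and identify the intent of the new skill to which it corresponds; and the slot extraction model can be used to extract slot information from user utterances. Subsequent operations corresponding to the user's intent can then be performed based on the intent output by the intent classification model and the slot information output by the slot extraction model.
After the platform generates the model for the new skill, the electronic device 400 can display page 900. The developer can click the "Publish skill" control 902 to notify the platform to publish the new skill and push its model online, after which other terminals can hold dialogs with the platform so that it provides them with the new skill. Page 900 may also include a "Retrain" control 901, through which the developer can retrain the model for the new skill.
As shown in FIG. 9, a human-computer interaction method provided in an embodiment of this application can be applied to the interaction between the electronic device 100 and the server 200, and specifically includes the following steps:
S101. The server 200 receives a first input.
When using the electronic device 100 to converse with the server 200, the user can state a service requirement to the server 200 in voice or text form. If the user inputs voice, the server 200 can recognize the speech through an automatic speech recognition module and convert it into text, which is the first input, and feed it into the NLU module. If the user inputs text, the server 200 feeds the user's text into the NLU module as the first input.
The first input may be a single utterance in a single-round dialog between the user and the server 200, or multiple utterances in a multi-round dialog between the user and the server 200; this is not limited in the embodiments of this application.
S102. The server 200 performs domain classification on the first input and determines the first domain corresponding to the first input.
The first input corresponds to an intent of the user, that is, a service the user wants the server 200 to provide or operations the user wants it to perform. The domain classification module in the NLU module can search and filter based on the first input to determine which specific task scenario (the first domain) the user's intent belongs to, and distribute the first input to that task scenario (the first domain).
S103. The server 200 distributes the first input to the first domain and identifies the first intent corresponding to the first input.
The intent recognition module in the NLU module can further subdivide the user's intent in the first input into a sub-scenario of the specific task scenario, that is, identify the user intent (the first intent) corresponding to the first input.
S104. The server 200 extracts the information of each slot in the first intent from the first input based on the slot configuration of the first intent.
The first intent is an intent in a skill on the server 200. When configuring the skill, the skill developer configures slots for the first intent, that is, which slots the first intent needs to extract and the attribute of each slot. Therefore, after the first intent corresponding to the first input is determined, the slot extraction module in the server 200 can look up the slot configuration of the first intent.
The slot extraction module in the server 200 can identify the entities contained in the first input and call its stored slot extraction model to compute on these entities, determining which slots in the first intent they correspond to and labeling them with those slots. This can also be regarded as taking these entities as the values of the corresponding slots, that is, extracting the information of those slots. For example, the slot extraction module identifies entity A in the first input and inputs it into the algorithm corresponding to each slot in the slot extraction model to compute the corresponding confidences. If the confidence computed by the algorithm for slot A does not satisfy a preset condition, for example it is less than a preset threshold such as threshold C, entity A is considered not to be the information of slot A. If the confidence computed by the algorithm for slot B satisfies the preset condition, for example it is greater than or equal to threshold C, entity A is considered to be the information of slot B.
It should be noted that the information of some slots may be set by the user by default or obtained in other ways, and is not necessarily extracted from the first input.
For example, suppose the first intent is "book a flight", and its preset slot configuration includes a time slot, a departure slot, and a destination slot. If the user says "Book a flight to Shanghai tomorrow" (the first input), the server 200 can identify multiple entities in it, such as "tomorrow" and "Shanghai". The server 200 can input "tomorrow" into the algorithm for the time slot in the slot extraction model and find that the confidence that "tomorrow" belongs to the time slot satisfies the preset condition, so "tomorrow" is taken as the value of the time slot in "book a flight"; that is, the server 200 has extracted the information of the time slot in the first intent. Similarly, the server 200 can input "Shanghai" into the algorithm for the destination slot and find that its confidence satisfies the preset condition, so "Shanghai" is taken as the value of the destination slot; that is, the server 200 has extracted the information of the destination slot in the first intent. The first input contains no entity corresponding to the departure slot. The current location of the electronic device 100 used by the user can be obtained through GPS as the value of the departure slot, or a default address set by the user can be used; that is, the server 200 also obtains the information of the departure slot in the first intent.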
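The per-slot scoring against threshold C and the fallback to default values in the flight example can be combined into one sketch. Each slot's scoring algorithm is modeled as a plain callable, the default-value providers stand in for GPS or the user's saved address, and all names and numbers (including "Shenzhen" as the located position) are hypothetical. As a simplification, nothing stops one entity from being assigned to two slots.

```python
from typing import Callable, Dict, List, Optional

Scorer = Callable[[str], float]  # confidence that an entity belongs to one slot


def fill_intent_slots(entities: List[str],
                      scorers: Dict[str, Scorer],
                      threshold_c: float,
                      defaults: Dict[str, Callable[[], Optional[str]]]) -> Dict[str, Optional[str]]:
    """Fill every configured slot from the entities, then from defaults."""
    filled: Dict[str, Optional[str]] = {}
    for slot, score in scorers.items():
        # Best entity for this slot; accepted only if it reaches threshold C.
        best = max(entities, key=score, default=None)
        if best is not None and score(best) >= threshold_c:
            filled[slot] = best
        elif slot in defaults:
            filled[slot] = defaults[slot]()  # e.g. GPS fix or default address
        else:
            filled[slot] = None              # not extracted: see S105 and S106
    return filled


# Toy per-slot confidence tables for "Book a flight to Shanghai tomorrow".
toy_tables = {"time": {"tomorrow": 0.95}, "destination": {"Shanghai": 0.9}, "departure": {}}
scorers = {slot: (lambda e, t=table: t.get(e, 0.0)) for slot, table in toy_tables.items()}
print(fill_intent_slots(["tomorrow", "Shanghai"], scorers, 0.8,
                        {"departure": lambda: "Shenzhen"}))
```

Slots left as `None` are exactly the ones that trigger the attribute check in the following steps.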
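S105. Server 200 determines that the information of a first slot of the first intent has not been extracted.
While step S104 is performed, the first input may not contain the information of some slots of the first intent (for example, the user did not say it, or the user did say it but automatic speech recognition erred or the user mistyped), or the slot extraction model of server 200 may not be accurate enough. In either case server 200 may fail to extract some slot information of the first intent from the first input, so step S106 and the subsequent steps need to be performed.
S106. Server 200 determines the attribute of the first slot. The attribute of the first slot is one of: required slot, non-required key slot, and non-required non-key slot. If the first slot is a required slot, step S107 is performed; if it is a non-required non-key slot, step S110 is performed; if it is a non-required key slot, step S111 is performed.
Specifically, the slot extraction module in server 200 sends the result that the first slot was not extracted to the dialogue management module. The dialogue management module determines the attribute of the first slot and decides the subsequent operation accordingly.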
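The S106 branch can be expressed as a small dispatch table. This is a minimal sketch: the enum names and the slot configuration for "book a flight" used in the usage example are hypothetical, and the step labels simply echo S107, S110, and S111 above.

```python
from enum import Enum


class SlotAttribute(Enum):
    REQUIRED = "required"
    NON_REQUIRED_KEY = "non-required key"
    NON_REQUIRED_NON_KEY = "non-required non-key"


class NextStep(Enum):
    ASK_FOR_VALUE = "S107"      # ask the user for the slot value
    SKIP = "S110"               # leave the slot unfilled
    CONFIRM_NECESSITY = "S111"  # ask whether the slot is needed at all


def next_step(attribute: SlotAttribute) -> NextStep:
    """S106: choose the follow-up for a slot whose information was not extracted."""
    if attribute is SlotAttribute.REQUIRED:
        return NextStep.ASK_FOR_VALUE
    if attribute is SlotAttribute.NON_REQUIRED_KEY:
        return NextStep.CONFIRM_NECESSITY
    return NextStep.SKIP


if __name__ == "__main__":
    # Hypothetical slot configuration for the "book a flight" intent.
    flight_slots = {
        "time": SlotAttribute.REQUIRED,
        "destination": SlotAttribute.REQUIRED,
        "departure": SlotAttribute.NON_REQUIRED_KEY,
    }
    for slot, attribute in flight_slots.items():
        print(slot, next_step(attribute).value)
```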
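S107. Server 200 asks the user for the information of the first slot.
The dialogue management module asks the user a question about the first slot according to the attribute of the first slot and a preset dialogue policy. For example, server 200 may ask the user to repeat the request, repeat a question from the earlier interaction, or ask specifically about the missing first slot. The embodiments of this application limit neither the content nor the manner of the question.
S108. Server 200 receives a second input.
The second input is the user's answer to the question from server 200. If the user answers by voice, the automatic speech recognition module in server 200 converts the speech into text to obtain the second input. If the user answers by text, server 200 takes the entered text as the second input. The server sends the second input to the natural language understanding module.
The second input may be a single utterance in a single-turn dialogue between the user and server 200, or multiple utterances in a multi-turn dialogue; the embodiments of this application do not limit this.
S109. Server 200 fills the first slot of the first intent based on the second input.
The slot extraction module in the natural language understanding module identifies the entities in the second input and invokes the algorithm for the first slot in its stored slot extraction model to identify the entity corresponding to the first slot. That entity is used as the value of the first slot, that is, the information of the first slot is extracted. Step S116 is then performed.
S110. Server 200 does not fill the first slot.
The slot extraction module in the natural language understanding module decides not to fill the first slot, that is, no value needs to be determined for it. Step S116 is then performed.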
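S111. Server 200 asks the user to confirm whether the information of the first slot is necessary.
In some embodiments of this application, after determining that the first slot is a non-required key slot (step S106), the dialogue management module may directly ask the user about the first slot, regardless of whether the user mentioned it, and let the user confirm whether the first slot information needs to be supplied. The embodiments of this application limit neither the manner nor the content of this question.
In some other embodiments, after determining that the first slot is a non-required key slot (step S106), the dialogue management module may further determine whether the user is likely to have said the first slot information. Only when the user very likely said it does the module then ask the user about the first slot and let the user confirm whether the information needs to be supplied. Confirming with the user in this targeted way helps reduce interruptions to the user. For how server 200 determines whether the user is likely to have said the first slot information, see the description below; details are not repeated here.
S112. Server 200 receives a third input from the electronic device.
The third input is the user's answer to the question from server 200. If the user answers by voice, the automatic speech recognition module in server 200 converts the speech into text to obtain the third input. If the user answers by text, server 200 takes the entered text as the third input. The server sends the third input to the natural language understanding module.
The third input may be a single utterance in a single-turn dialogue between the user and server 200, or multiple utterances in a multi-turn dialogue; the embodiments of this application do not limit this.
S113. Server 200 determines, based on the third input from electronic device 100, whether the information of the first slot is necessary. If it is, step S114 is performed; otherwise, step S115 is performed.
S114. Server 200 fills the first slot based on the third input.
Refer to step S109. Step S116 is then performed.
S115. Server 200 does not fill the first slot.
Step S116 is then performed.
S116. Server 200 performs the operation corresponding to the first intent based on the first intent and the slot information extracted for it.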
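FIG. 10 is a schematic flowchart of another human-machine interaction method according to an embodiment of this application, which additionally determines whether the user is likely to have said the first slot information. That is, step S201 is added after step S106 and before step S111, as follows:
S201. Server 200 determines whether the first input may contain the information of the first slot. If so, step S111 is performed; otherwise, step S115 is performed.
For example, if the user did say the first slot information but server 200 did not extract it, there may be two causes:
Cause 1: a user input error or a speech recognition error prevents the first slot information from being extracted. For example, suppose server 200 has two intents: intent 1 is "open settings" (打开设置), whose operation opens the system settings, and intent 2 is "open WeChat settings" (打开微信设置), whose operation opens the settings of the WeChat application. When the user says "打开微信设置", the user's accent or pauses may cause server 200 to recognize the speech as "打开微星啊设置". Server 200 then fails to extract "微星啊". If server 200 does not confirm with the user, it is likely to recognize the user intent directly as "open settings" and open the system settings, which differs from the user's goal of opening the WeChat application's settings.
Cause 2: the user expressed the first slot information correctly, but the slot extraction model itself is not accurate enough, so the information is not extracted. For example, the skill developer may have entered too few user utterances, or insufficiently accurate ones, before the slot extraction model was trained, so the model that server 200 trains is also not accurate enough.
For these two causes, the embodiments of this application provide the following two methods for determining that the user may have said the first slot information:
Method 1: for user input errors or speech recognition errors.
Server 200 may use an error-correction approach: the mechanism of confirming with the user is triggered only when an entity recognized from the user utterance (the first input) is relatively similar to a keyword in the user dictionary. To judge that similarity, server 200 may, for example, use a pinyin-similarity algorithm or a string-similarity algorithm to compute the edit distance between an entity recognized in the first input and a keyword in the user dictionary. Alternatively, the similarity of words or phrases may be computed with deep-learning methods such as word vectors or sentence vectors. The embodiments of this application do not limit the similarity calculation method.
The following uses edit distance as an example to describe how to determine that the first input may contain the first slot information. The edit distance (Levenshtein distance) between two strings is the minimum number of edit operations needed to transform one into the other. The edit operations include replacing one character with another, inserting a character, and deleting a character.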
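A standard dynamic-programming implementation of the Levenshtein distance defined above (unit cost for substitution, insertion, and deletion) is sketched below. It compares Unicode characters, so it works directly on Chinese strings; it is one possible implementation, not necessarily the one used by server 200.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("上哈", "上海"))       # 1
    print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows makes memory proportional to the length of `b` rather than to the product of both lengths.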
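First, server 200 determines whether the first slot corresponds to a system slot type or to a user dictionary (that is, a custom slot type).
In some embodiments, server 200 determines that the first slot corresponds to a custom slot type. Words in a custom slot type are user-defined and finite in number. Server 200 can therefore traverse all words of the custom slot type corresponding to the first slot, compute the edit distance between each entity identified in the user utterance and each word of the custom slot type, and find the minimum edit distance, which corresponds to the highest similarity (a smaller edit distance means the two strings are more similar). The entity in the first input with the minimum edit distance can be taken as a potential entity of the first slot, that is, it may be the first slot information.
Optionally, server 200 may compare this highest similarity with threshold A, which may be set by the developer or the user. If the similarity is less than threshold A, the user is considered not to have said the first slot information, that is, the first input does not contain it, and server 200 need not confirm with the user. If the similarity is greater than or equal to threshold A, the user may have said the first slot information, and server 200 may confirm with the user.
For example, the user intends "订一张明天去上海的机票" ("book a ticket to Shanghai tomorrow") but mistakenly enters "订一张明天去上哈的机票", with "上哈" in place of "上海". The first intent is "book a flight", whose slots are the time slot, the departure slot, and the destination slot. Assume the destination slot corresponds to user dictionary 1, and server 200 did not extract the destination slot information. Server 200 identifies the entities "明天" and "上哈" in the first input, computes the edit distance between "明天" and every word in user dictionary 1 and between "上哈" and every word in user dictionary 1, and selects the pair with the smallest distance; for example, "上哈" and "上海" in user dictionary 1 have the smallest edit distance. "上哈" is then taken as a potential entity of the first slot. Further, if the similarity between "上哈" and its most similar word in user dictionary 1 is greater than or equal to threshold A, "上哈" may be the first slot information that the user said, and server 200 can confirm with the user.
Optionally, the developer or the user may also set a threshold B greater than threshold A. If the highest similarity is greater than or equal to threshold B, the potential entity is extremely similar to a word of the custom slot type, and the user can basically be considered to have said the first slot information. The server can then skip confirmation and directly take the potential entity as the first slot information. If the highest similarity is less than threshold B and greater than or equal to threshold A, the user may have said the first slot information, that is, the potential entity may be the first slot information, and the server can confirm further with the user. The embodiments of this application do not limit this.
In some other embodiments, server 200 determines that the first slot corresponds to a system slot type. Words of a system slot type cannot be enumerated, so server 200 cannot traverse them all to compute the edit distance between each entity in the first input and every word of the system slot type, and therefore cannot determine whether the user said the first slot information. To avoid disturbing the user too much, server 200 may refrain from confirming the first slot information with the user.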
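The Method 1 decision for one slot can be sketched as follows. Assumptions: similarity is derived from edit distance as `1 - distance / max(len)` (one common normalization; the application leaves the similarity measure open); threshold A and threshold B play the roles described above; following claim 3, an ACCEPT result returns the matching dictionary word; and a system slot type is modeled as `dictionary=None`, for which the sketch never asks the user. The names `Decision`, `similarity`, and `check_slot` are hypothetical.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


class Decision(Enum):
    SKIP = "skip"        # do not ask the user (S115)
    ACCEPT = "accept"    # fill the slot without asking
    CONFIRM = "confirm"  # ask the user (S111)


def check_slot(entities: Iterable[str], dictionary: Optional[Iterable[str]],
               threshold_a: float, threshold_b: float) -> Tuple[Decision, Optional[str]]:
    if dictionary is None:
        # System slot type: values cannot be enumerated, so do not disturb the user.
        return Decision.SKIP, None
    words = list(dictionary)
    best: Optional[Tuple[float, str, str]] = None  # (similarity, entity, word)
    for entity in entities:
        for word in words:
            s = similarity(entity, word)
            if best is None or s > best[0]:
                best = (s, entity, word)
    if best is None or best[0] < threshold_a:
        return Decision.SKIP, None
    if best[0] >= threshold_b:
        return Decision.ACCEPT, best[2]
    return Decision.CONFIRM, best[1]  # potential entity to confirm


if __name__ == "__main__":
    user_dictionary_1 = ["上海", "北京", "广州"]
    # Decision.CONFIRM with potential entity "上哈" (similarity 0.5 to "上海").
    print(check_slot(["明天", "上哈"], user_dictionary_1, 0.4, 0.8))
```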
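Method 2: for the case where the slot extraction model is not accurate enough, so that server 200 fails to extract the first slot information.
The slot extraction model may identify entities in the first input using, for example, named entity recognition (NER), and feed the identified entities to the algorithm for the first slot in the slot extraction model to compute a confidence for each entity. Optionally, the slot extraction model may skip entity recognition and feed each word of the first input directly to the algorithm for the first slot, computing a confidence for each word. Only when the computed confidence of some entity or word satisfies a certain condition is the user considered to have possibly said the first slot information, and only then does the server confirm with the user. The confidence of each entity or word in the first input can be computed with existing classification-based or sequence-labeling-based methods; details are not repeated here.
For example, server 200 may feed each entity in the first input to the slot extraction model to compute its confidence, and confirm with the user only when the confidence of some entity satisfies a certain condition.
That the slot extraction model fails to extract the first slot information can be understood as follows: the slot-labeling probability that the slot extraction model in server 200 assigns to each entity identified from the user utterance is below the recognition threshold. The user can therefore set a confirmation threshold lower than the recognition threshold; when the slot-labeling probability that the model assigns to an entity is greater than the confirmation threshold, server 200 triggers the mechanism of confirming with the user. In other words, server 200 confirms the first slot information with the user only when the first input contains one or more entities whose confidence from the slot extraction model is greater than the confirmation threshold and less than the recognition threshold.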
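The two-threshold band of Method 2 can be sketched as below. Assumptions: confidences arrive as a mapping from entity (or word) to the slot extraction model's probability for the first slot; `classify_by_confidence` is a hypothetical name; and the boundary handling follows claims 2 and 4 (at or above the recognition threshold means extracted; at or above the confirmation threshold but below the recognition threshold means ask).

```python
from typing import Dict, Optional, Tuple


def classify_by_confidence(confidences: Dict[str, float], confirm_threshold: float,
                           recognition_threshold: float) -> Tuple[str, Optional[str]]:
    """Return ("extracted" | "confirm" | "skip", candidate) for the first slot."""
    if not confirm_threshold < recognition_threshold:
        raise ValueError("confirmation threshold must be below the recognition threshold")
    if not confidences:
        return "skip", None
    token, conf = max(confidences.items(), key=lambda kv: kv[1])
    if conf >= recognition_threshold:
        return "extracted", token  # normal slot extraction succeeded
    if conf >= confirm_threshold:
        return "confirm", token    # the user may have said it: ask (S111)
    return "skip", None            # treat the slot as absent (S115)


if __name__ == "__main__":
    print(classify_by_confidence({"明天": 0.2, "上海": 0.45}, 0.3, 0.6))  # ('confirm', '上海')
```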
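It should be noted that, for the same entity and the same slot extraction model, the model may fail to recognize the entity the first time but succeed the second time. This is because, when the user first says the entity, the sentence is likely to contain other entities, that is, the entity appears in context. When the slot extraction model is not accurate enough, it may fail on that context and therefore miss the entity as well. When server 200 fails to recognize the entity the first time and asks the user about it, the user's answer addresses that entity specifically: it may contain only the entity or very little context, so the slot extraction model is likely to recognize the entity this time. In some other embodiments, the entity in the user's answer may also be recognized by a method other than the slot extraction model, for example by rules. Such rules may take into account factors such as the contextual logic of the user's answer, the relevance to the user intent, and the correspondence between the entity and the first slot. This also effectively increases the probability that server 200 recognizes an entity the user says a second or subsequent time.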
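A rule-based fallback for the user's follow-up answer could be as simple as the sketch below: strip punctuation and a few filler characters, then look for a word from the slot's custom dictionary, preferring the longest match. The filler lists, the longest-match rule, and the names `normalize` and `extract_by_rule` are illustrative assumptions, not rules given in the application.

```python
import re
from typing import Iterable, Optional

# Hypothetical filler patterns for short Chinese answers.
PUNCT = re.compile(r"[。，,.!！?？\s]")
PREFIX = re.compile(r"^(?:是|去|到)+")   # leading fillers such as "是" or "去"
SUFFIX = re.compile(r"(?:啊|吧|呢|的)+$")  # trailing particles


def normalize(reply: str) -> str:
    """Remove punctuation first, then leading and trailing filler characters."""
    text = PUNCT.sub("", reply)
    text = PREFIX.sub("", text)
    return SUFFIX.sub("", text)


def extract_by_rule(reply: str, dictionary: Iterable[str]) -> Optional[str]:
    """Return the longest dictionary word contained in the normalized answer."""
    text = normalize(reply)
    for word in sorted(dictionary, key=len, reverse=True):
        if word in text:
            return word
    return None


if __name__ == "__main__":
    print(extract_by_rule("去上海吧。", ["上海", "北京"]))  # 上海
```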
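It can be understood that, to implement the foregoing functions, the terminal and the like include corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of this application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by hardware driven by computer software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the embodiments of the present invention.
In the embodiments of this application, the terminal and the like may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division into modules in the embodiments of the present invention is illustrative and is merely a logical functional division; other divisions are possible in actual implementation.
FIG. 11 is a schematic diagram of the hardware structure of a server 200 disclosed in an embodiment of this application. Server 200 includes at least one processor 201, at least one memory 202, and at least one communication interface 203. Optionally, server 200 may further include an output device and an input device, which are not shown in the figure.
Processor 201, memory 202, and communication interface 203 are connected by a bus. Processor 201 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solutions of this application. Processor 201 may also include multiple CPUs, and processor 201 may be a single-core (single-CPU) or multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, or processing cores for processing data (for example, computer program instructions).
Memory 202 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. Memory 202 may exist independently and be connected to processor 201 by the bus, or may be integrated with processor 201. Memory 202 stores the application program code for executing the solutions of this application, and its execution is controlled by processor 201. Processor 201 executes the computer program code stored in memory 202 to implement the human-machine interaction method described in the embodiments of this application.
Communication interface 203 may be used to communicate with other devices or communication networks, such as Ethernet or a wireless local area network (WLAN).
The output device communicates with the processor and may display information in various ways. For example, the output device may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device communicates with the processor and may receive user input in various ways. For example, the input device may be a mouse, a keyboard, a touchscreen device, or a sensing device.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the foregoing functional modules is used as an example. In actual applications, the foregoing functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.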

Claims (16)

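  1. A human-machine interaction method, applicable to a human-machine dialogue system, comprising:
    receiving, by a server, a first input, wherein the first input comprises a service request of a user;
    determining, by the server based on the first input, a first domain corresponding to the first input, wherein the first domain is a task scenario corresponding to the service request of the user;
    dispatching, by the server, the first input to an intent recognition model corresponding to the first domain, and recognizing a first intent corresponding to the first input, wherein the first intent is a sub-scenario in the first domain;
    extracting, by the server, information of a first slot of the first intent from the first input, wherein the first slot is preconfigured in the first intent and the first slot is a non-required key slot;
    when the server determines that the information of the first slot has not been extracted, asking, by the server, the user a question to determine whether the information of the first slot is necessary;
    receiving, by the server, a second input, wherein the second input comprises information, confirmed by the user, on whether the information of the first slot is necessary;
    if the user confirms that the information of the first slot is necessary information, extracting, by the server, the information of the first slot from the second input, and performing, by the server, an operation corresponding to the first intent based on the first intent and the information of the first slot; and
    if the user confirms that the information of the first slot is unnecessary information, skipping, by the server, extraction of the information of the first slot, and performing, by the server, the operation corresponding to the first intent based on the first intent.
  2. The human-machine interaction method according to claim 1, wherein the extracting, by the server, information of a first slot of the first intent from the first input comprises:
    inputting, by the server, each word or each entity recognized in the first input into a slot extraction model corresponding to the first slot, and computing a confidence corresponding to each word or each entity in the first input;
    if the confidence of a first word or a first entity in the first input is greater than or equal to a first threshold, confirming, by the server, that the first word or the first entity is the information of the first slot; and
    if the confidences of all words or all entities in the first input are less than the first threshold, determining, by the server, that the information of the first slot has not been extracted.
  3. The human-machine interaction method according to claim 1 or 2, further comprising:
    if the first slot corresponds to a custom slot type, computing, by the server, a similarity between each entity recognized in the first input and each word of the custom slot type; and
    if the similarities between all entities recognized in the first input and all words of the custom slot type are less than a second threshold, confirming, by the server, that the first input does not contain the information of the first slot; if the similarity between a second entity in the first input and a second word of the custom slot type is greater than or equal to a third threshold, confirming, by the server, that the second word is the information of the first slot; or if the similarity between any entity in the first input and any word of the custom slot type is greater than or equal to the second threshold and less than the third threshold, determining, by the server, to ask the user a question to determine whether the information of the first slot is necessary.
  4. The human-machine interaction method according to claim 2, further comprising:
    if the confidences of all words or all entities in the first input are less than a fourth threshold, confirming, by the server, that the first input does not contain the information of the first slot; and
    if the confidence of any word or any entity in the first input is less than the first threshold and greater than or equal to the fourth threshold, determining, by the server, to ask the user a question to determine whether the information of the first slot is necessary.
  5. The human-machine interaction method according to any one of claims 1 to 4, wherein, if the user confirms that the information of the first slot is necessary information, the extracting, by the server, the information of the first slot from the second input comprises:
    if the user confirms that the information of the first slot is necessary information, extracting, by the server, the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using rules.
  6. The human-machine interaction method according to any one of claims 1 to 5, wherein a second slot is further preconfigured in the first intent and the second slot is a required slot, and the method further comprises:
    when the server determines that the information of the second slot has not been extracted, asking, by the server, the user a question in order to extract the information of the second slot;
    receiving, by the server, a third input, and extracting the information of the second slot from the third input, wherein the third input comprises an answer of the user; and
    performing, by the server, the operation corresponding to the first intent based on the first intent, the information of the first slot, and the information of the second slot; or performing, by the server, the operation corresponding to the first intent based on the first intent and the information of the second slot.
  7. The human-machine interaction method according to any one of claims 1 to 6, wherein a third slot is further preconfigured in the first intent and the third slot is a non-required non-key slot, and the method further comprises:
    when the server determines that the information of the third slot has not been extracted, skipping, by the server, extraction of the information of the third slot.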
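  8. A server, applicable to a human-machine dialogue system, comprising a communication interface, a memory, and a processor, wherein the communication interface and the memory are coupled to the processor, the memory is configured to store computer program code, the computer program code comprises computer instructions, and when the processor reads the computer instructions from the memory, the server is enabled to perform the following steps:
    receiving a first input through the communication interface, wherein the first input comprises a service request of a user;
    determining, based on the first input, a first domain corresponding to the first input, wherein the first domain is a task scenario corresponding to the service request of the user;
    dispatching the first input to an intent recognition model corresponding to the first domain, and recognizing a first intent corresponding to the first input, wherein the first intent is a sub-scenario in the first domain;
    extracting information of a first slot of the first intent from the first input, wherein the first slot is preconfigured in the first intent and the first slot is a non-required key slot;
    when the server determines that the information of the first slot has not been extracted, asking the user a question to determine whether the information of the first slot is necessary;
    receiving a second input through the communication interface, wherein the second input comprises information, confirmed by the user, on whether the information of the first slot is necessary;
    if the user confirms that the information of the first slot is necessary information, extracting the information of the first slot from the second input, and performing an operation corresponding to the first intent based on the first intent and the information of the first slot; and
    if the user confirms that the information of the first slot is unnecessary information, skipping extraction of the information of the first slot, and performing the operation corresponding to the first intent based on the first intent.
  9. The server according to claim 8, wherein the extracting, by the processor, information of a first slot of the first intent from the first input specifically comprises:
    inputting, by the processor, each word or each entity recognized in the first input into a slot extraction model corresponding to the first slot, and computing a confidence corresponding to each word or each entity in the first input;
    if the confidence of a first word or a first entity in the first input is greater than or equal to a first threshold, confirming that the first word or the first entity is the information of the first slot; and
    if the confidences of all words or all entities in the first input are less than the first threshold, determining that the information of the first slot has not been extracted.
  10. The server according to claim 8 or 9, wherein
    the processor is further configured to: if the first slot corresponds to a custom slot type, compute a similarity between each entity recognized in the first input and each word of the custom slot type; and
    if the similarities between all entities recognized in the first input and all words of the custom slot type are less than a second threshold, confirm that the first input does not contain the information of the first slot; if the similarity between a second entity in the first input and a second word of the custom slot type is greater than or equal to a third threshold, confirm that the second word is the information of the first slot; or if the similarity between any entity in the first input and any word of the custom slot type is greater than or equal to the second threshold and less than the third threshold, determine to ask the user a question to determine whether the information of the first slot is necessary.
  11. The server according to claim 9, wherein
    the processor is further configured to: if the confidences of all words or all entities in the first input are less than a fourth threshold, confirm that the first input does not contain the information of the first slot; and
    if the confidence of any word or any entity in the first input is less than the first threshold and greater than or equal to the fourth threshold, determine to ask the user a question to determine whether the information of the first slot is necessary.
  12. The server according to any one of claims 8 to 11, wherein, if the user confirms that the information of the first slot is necessary information, the extracting, by the processor, the information of the first slot from the second input specifically comprises:
    if the user confirms that the information of the first slot is necessary information, extracting, by the processor, the information of the first slot from the second input by using the slot extraction model corresponding to the first slot or by using rules.
  13. The server according to any one of claims 8 to 12, wherein, when a second slot is further preconfigured in the first intent and the second slot is a required slot,
    the processor is further specifically configured to: when the processor determines that the information of the second slot has not been extracted, ask the user a question in order to extract the information of the second slot;
    receive a third input through the communication interface, and extract the information of the second slot from the third input, wherein the third input comprises an answer of the user; and
    perform the operation corresponding to the first intent based on the first intent, the information of the first slot, and the information of the second slot; or perform the operation corresponding to the first intent based on the first intent and the information of the second slot.
  14. The server according to any one of claims 8 to 13, wherein, when a third slot is further preconfigured in the first intent and the third slot is a non-required non-key slot,
    the processor is further specifically configured to: when determining that the information of the third slot has not been extracted, skip extraction of the information of the third slot.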
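  15. A computer storage medium, comprising computer instructions, wherein when the computer instructions are run on a terminal, the terminal is enabled to perform the human-machine interaction method according to any one of claims 1 to 7.
  16. A computer program product, wherein when the computer program product is run on a computer, the computer is enabled to perform the human-machine interaction method according to any one of claims 1 to 7.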
PCT/CN2018/109704 2018-10-10 2018-10-10 Human-computer interaction method and electronic device WO2020073248A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN201880093502.XA CN112154431A (zh) 2018-10-10 2018-10-10 Human-computer interaction method and electronic device
EP18936324.5A EP3855338A4 (en) 2018-10-10 2018-10-10 HUMAN-COMPUTER INTERACTION PROCESS AND ELECTRONIC DEVICE
PCT/CN2018/109704 WO2020073248A1 (zh) 2018-10-10 2018-10-10 Human-computer interaction method and electronic device
JP2021519867A JP7252327B2 (ja) 2018-10-10 2018-10-10 Human-computer interaction method and electronic device
KR1020217013813A KR20210062704A (ko) 2018-10-10 2018-10-10 Human-computer interaction method and electronic device
US17/284,122 US11636852B2 (en) 2018-10-10 2018-10-10 Human-computer interaction method and electronic device

Publications (1)

Publication Number Publication Date
WO2020073248A1 true WO2020073248A1 (zh) 2020-04-16

Family

ID=70165016

Country Status (6)

Country Link
US (1) US11636852B2 (zh)
EP (1) EP3855338A4 (zh)
JP (1) JP7252327B2 (zh)
KR (1) KR20210062704A (zh)
CN (1) CN112154431A (zh)
WO (1) WO2020073248A1 (zh)

Also Published As

Publication number Publication date
CN112154431A (zh) 2020-12-29
JP7252327B2 (ja) 2023-04-04
EP3855338A4 (en) 2021-10-06
US11636852B2 (en) 2023-04-25
US20210383798A1 (en) 2021-12-09
EP3855338A1 (en) 2021-07-28
KR20210062704A (ko) 2021-05-31
JP2022515005A (ja) 2022-02-17

Legal Events

Code Title Details
ENP Entry into the national phase. Ref document number: 2021519867; country of ref document: JP; kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2018936324; country of ref document: EP; effective date: 20210423
ENP Entry into the national phase. Ref document number: 20217013813; country of ref document: KR; kind code of ref document: A