WO2020221072A1 - Semantic parsing method and server - Google Patents

Semantic parsing method and server

Info

Publication number
WO2020221072A1
Authority
WO
WIPO (PCT)
Prior art keywords
slot
server
skill
entity
intention
Prior art date
Application number
PCT/CN2020/086002
Other languages
English (en)
French (fr)
Inventor
张晴
杨威
张良和
张轶博
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to US17/607,657 (US11900924B2)
Priority to EP20798047.5A (EP3951773A4)
Publication of WO2020221072A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a semantic analysis method and server.
  • With the spread of voice technology and voice interaction, voice assistants are becoming increasingly important in smart electronic devices such as mobile phones. Generally speaking, a voice assistant can be broken down into voice technology and content services.
  • Voice technology includes voice recognition, semantic understanding, and speech synthesis.
  • Figure 1 exemplarily shows a conversation between an existing voice assistant and a user.
  • the voice assistant gives the weather in Beijing for tomorrow.
  • the voice assistant cannot tell which date "that day" refers to.
  • the voice assistant has to ask the user, for example, "Which day do you want to book the ticket for?". This makes the interaction cumbersome and results in a poor user experience.
  • the embodiments of the present application provide a semantic analysis method and server.
  • the server can accurately understand the meaning of pronouns in user sentences without asking the user for the meaning of pronouns, which improves user experience.
  • a semantic analysis method may include: a first server extracts an entity of a first slot from a first user sentence; the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; the first intention and the first skill are determined by the first server according to the first user sentence and match the service demand indicated by the first user sentence; when the entity of the first slot is a pronoun, the first server modifies the entity of the first slot to the entity of a second slot; the second slot is configured as an associated slot of the first slot, and the entity of the second slot is extracted by the first server from a second user sentence; the second user sentence is received by the first server before the first user sentence; the second slot is a slot configured for a second intention, and the second intention is configured as an associated intention of the first intention; the second intention is an intention configured for a second skill, and the second
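  • The replacement step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the pronoun list, the `SlotKey` and `DialogueContext` names, and the skill, intent and slot identifiers are all hypothetical, chosen only to show how an associated-slot table lets a pronoun in a later sentence be replaced by an entity extracted from an earlier one.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical pronoun list; the source does not enumerate which words count as pronouns.
PRONOUNS = {"there", "here", "that day", "then"}


@dataclass(frozen=True)
class SlotKey:
    """Identifies a slot by the skill and intention it is configured under (assumed layout)."""
    skill: str
    intent: str
    slot: str


@dataclass
class DialogueContext:
    # Entities extracted from earlier user sentences, keyed by slot.
    history: Dict[SlotKey, str] = field(default_factory=dict)
    # Associated-slot configuration: first slot -> second (associated) slot.
    associations: Dict[SlotKey, SlotKey] = field(default_factory=dict)

    def remember(self, key: SlotKey, entity: str) -> None:
        self.history[key] = entity

    def resolve(self, key: SlotKey, entity: str) -> str:
        """Replace a pronoun entity with the entity of the associated slot, if one is known."""
        if entity.lower() not in PRONOUNS:
            return entity
        associated = self.associations.get(key)
        if associated is None or associated not in self.history:
            return entity  # nothing to share; the caller may still need to ask the user
        return self.history[associated]


ctx = DialogueContext()
weather_city = SlotKey("weather", "check_weather", "city")
ticket_destination = SlotKey("tickets", "book_ticket", "destination")
ctx.associations[ticket_destination] = weather_city

ctx.remember(weather_city, "Beijing")                # earlier sentence mentions Beijing
resolved = ctx.resolve(ticket_destination, "there")  # later sentence uses a pronoun
print(resolved)
```

  • In this sketch a non-pronoun entity passes through unchanged, and a pronoun with no associated entity is returned as-is so that the caller can decide whether to fall back to asking the user.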
  • the first server receives the first user sentence collected from the electronic device; the first user sentence is an audio user sentence or a text user sentence.
  • the first server receives the first user sentence sent by the voice recognition server, and the voice recognition server converts the user sentence in audio form collected by the electronic device into a user sentence in text form after voice recognition.
  • the method further includes: the first server receives an associated skill request sent by the second server, and the associated skill request is used to request that the second skill be configured as an associated skill of the first skill; the associated skill request contains the indication information of the first skill and the indication information of the second skill; in response to the associated skill request, the first server obtains confirmation information from the third server; the third server is the application server corresponding to the second skill; the confirmation information is used by the third server to confirm that the second skill is configured as the associated skill of the first skill; based on the confirmation information, the first server configures the second skill as the associated skill of the first skill. In this way, the developer of the first skill and the developer of the second skill can view the slot settings of each other's skill to perform further association configuration.
  • the method further includes: the first server receives an associated slot request sent by the second server, and the associated slot request is used to request that the second slot be configured as the associated slot of the first slot; the associated slot request contains the indication information of the first slot and the indication information of the second slot; in response to the associated slot request, the first server configures the second slot as the associated slot of the first slot.
  • the first server can modify the entity in the first slot to the entity in the second slot.
  • the method further includes: the first server determines whether the slot type of the first slot is the same as the slot type of the second slot; if they are the same, the first server configures the second slot as the associated slot of the first slot. This avoids associating slots of different types, which would affect the accuracy of semantic analysis.
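  • The slot-type check can be sketched as follows. This is only an illustration under assumptions: the `Slot` and `SlotAssociationRegistry` names and the type labels such as "sys.location" are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Slot:
    name: str
    slot_type: str  # hypothetical type label, e.g. "sys.location" or "sys.time"


class SlotAssociationRegistry:
    """Holds associated-slot configuration and refuses associations across slot types."""

    def __init__(self) -> None:
        self._associated: Dict[str, str] = {}

    def associate(self, first: Slot, second: Slot) -> bool:
        """Configure `second` as the associated slot of `first` only if the types match."""
        if first.slot_type != second.slot_type:
            return False
        self._associated[first.name] = second.name
        return True

    def associated_slot(self, first_name: str) -> Optional[str]:
        return self._associated.get(first_name)


registry = SlotAssociationRegistry()
destination = Slot("destination", "sys.location")
city = Slot("city", "sys.location")
date = Slot("date", "sys.time")

same_type_ok = registry.associate(destination, city)   # both are location slots
cross_type_ok = registry.associate(destination, date)  # location vs. time is rejected
print(same_type_ok, cross_type_ok, registry.associated_slot("destination"))
```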
  • the method further includes: if the entity configured for the first slot comes from the system thesaurus, the first server associates the slot name of the second slot with the slot name of the first slot; the system thesaurus is a thesaurus provided by the first server for all skills; all slots whose configured entities come from the same system thesaurus have the same entity set; the source of the entity configured for the second slot is the same as the source of the entity configured for the first slot; if the entity configured for the first slot comes from a first custom dictionary, the first server associates the slot name of the second slot with the slot name of the first slot, and the first server associates the first custom dictionary with a second custom dictionary; the first custom dictionary is the entity set configured for the first slot; the first custom dictionary is a dictionary created by the first server for the first skill; the first custom dictionary contains a limited number of words; the second custom dictionary is the entity set configured for the second slot; the second custom dictionary is a dictionary created by the first server for the second
  • the first service result is output by the electronic device; the output method includes at least displaying the first service result on the screen of the electronic device and broadcasting the first service result by voice. In this way, the end user can obtain the service result.
  • a semantic analysis method may include: a second server receives a first service request sent by a first server; the first service request includes indication information of a first intention and an entity of a first slot; when the entity of the first slot extracted from the first user sentence is a pronoun, the entity of the first slot is modified from the pronoun to the entity of a second slot; the second slot is configured as the associated slot of the first slot; the first user sentence is collected by the electronic device and sent to the first server; the first slot is a slot configured for the first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; the second server is the application server corresponding to the first skill; the first skill and the first intention are determined by the first server according to the first user sentence and match the service demand represented by the first user sentence; the second user sentence is collected by the electronic device before the first user sentence; the second slot is a slot configured for a second intention, and the second intention is an intention configured for the second
  • the second server sends an associated skill request to the first server, and the associated skill request is used to request that the second skill be configured as an associated skill of the first skill; the first request contains the indication information of the first skill and the indication information of the second skill. In this way, the first server can associate the first skill with the second skill.
  • the second server sends an associated slot request to the first server; the associated slot request is used to request that the second slot be configured as an associated slot of the first slot;
  • the second request includes the indication information of the first slot and the indication information of the second slot.
  • the first server can associate the first slot with the second slot.
  • a semantic analysis method may include: a first server extracts an entity of a first slot from a first user sentence; the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; the first intention and the first skill are determined by the first server according to the first user sentence and match the service demand represented by the first user sentence; when the entity of the first slot is a pronoun, the first server modifies the entity of the first slot to the first candidate entity corresponding to a first candidate sentence; the first candidate sentence is the highest-scoring candidate sentence after the M candidate sentences are scored and sorted; the M candidate sentences are those of the K candidate sentences whose semantic recognition confidence is greater than a confidence threshold; the K candidate sentences are obtained by replacing the entity of the first slot in the first user sentence with each of K candidate entities; the K candidate entities are entities of the second slot extracted from the second user
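  • The K-to-M-to-best selection described above can be sketched in Python. The sketch makes several assumptions: the function name `pick_candidate_entity` is hypothetical, the confidence and scoring functions are passed in as plain callables, and the toy lookup tables stand in for whatever semantic recognition model and ranking model a real system would use.

```python
from typing import Callable, List, Optional, Tuple


def pick_candidate_entity(
    sentence: str,
    pronoun: str,
    candidate_entities: List[str],
    confidence: Callable[[str], float],
    score: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """Return the candidate entity whose substituted sentence ranks highest, or None."""
    # K candidate sentences: replace the pronoun with each of the K candidate entities.
    candidates: List[Tuple[str, str]] = [
        (entity, sentence.replace(pronoun, entity, 1)) for entity in candidate_entities
    ]
    # M candidate sentences: keep those whose confidence is greater than the threshold.
    kept = [(entity, text) for entity, text in candidates if confidence(text) > threshold]
    if not kept:
        return None
    # The highest-scoring candidate sentence determines the replacement entity.
    best_entity, _ = max(kept, key=lambda pair: score(pair[1]))
    return best_entity


# Toy stand-ins for the confidence and scoring models (values are illustrative only).
toy_confidence = {
    "book a ticket to Beijing": 0.9,
    "book a ticket to Shanghai": 0.8,
    "book a ticket to tomorrow": 0.2,
}
toy_score = {"book a ticket to Beijing": 0.7, "book a ticket to Shanghai": 0.4}

best = pick_candidate_entity(
    "book a ticket to there",
    "there",
    ["Beijing", "tomorrow", "Shanghai"],
    lambda text: toy_confidence.get(text, 0.0),
    lambda text: toy_score.get(text, 0.0),
    threshold=0.5,
)
print(best)
```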
  • a server which is used in a human-machine dialogue system, and includes: a communication interface, a memory, and a processor; the communication interface, the memory are coupled with the processor, and the memory is used to store computer program codes.
  • the program code includes computer instructions.
  • a computer-readable storage medium including instructions, wherein, when the instructions are executed on a server, the server executes any possible implementation manner of the first aspect, any possible implementation manner of the second aspect, or any possible implementation manner of the third aspect.
  • a computer program product which, when run on a computer, causes the computer to execute any possible implementation manner of the first aspect, any possible implementation manner of the second aspect, or any possible implementation manner of the third aspect.
  • FIG. 1 is a schematic diagram of a terminal interface of a human-machine dialogue in the prior art.
  • FIG. 2 is a first schematic diagram of the composition of a human-machine dialogue system provided by an embodiment of this application.
  • FIG. 3 is a second schematic diagram of the composition of a human-machine dialogue system provided by an embodiment of this application.
  • FIGS. 4A-4D are schematic diagrams of some electronic device interfaces for creating skills provided by embodiments of this application.
  • FIGS. 5A-5B are schematic diagrams of some electronic device interfaces for completing skill creation provided by embodiments of this application.
  • FIGS. 6A-6D are schematic diagrams of some electronic device interfaces for skill building provided by embodiments of this application.
  • FIGS. 7A-7B are schematic diagrams of the interaction for skill group building provided by embodiments of this application.
  • FIG. 8 is a schematic diagram of an electronic device interface for configuration between skills provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of some electronic device interfaces for configuration between skills provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of an electronic device interface for viewing shared skills according to an embodiment of this application.
  • FIG. 11 is a schematic diagram of a terminal interface of a human-machine dialogue provided by an embodiment of this application.
  • FIGS. 12A-12B are schematic diagrams of a human-machine system for realizing location entity sharing according to an embodiment of this application.
  • FIG. 13 is a schematic diagram of another terminal interface for human-machine dialogue provided by an embodiment of this application.
  • FIGS. 14A-14B are schematic diagrams of realizing time entity sharing in a human-machine system according to an embodiment of this application.
  • FIG. 16 is a schematic flowchart of another semantic parsing method provided by an embodiment of this application.
  • FIG. 17 is a schematic flowchart of another semantic parsing method provided by an embodiment of this application.
  • FIG. 19 is a schematic structural diagram of a server provided by an embodiment of this application.
  • FIG. 20 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • the electronic device may be a portable electronic device that also includes other functions such as personal digital assistant and/or music player functions, for example a mobile phone, a tablet computer, or a wearable electronic device with a wireless communication function (such as a smart watch).
  • portable electronic devices include, but are not limited to, portable electronic devices carrying [...] or other operating systems.
  • the above-mentioned portable electronic device may also be other portable electronic devices, such as a laptop computer with a touch-sensitive surface or a touch panel. It should also be understood that in some other embodiments, the above-mentioned electronic device may not be a portable electronic device, but a desktop computer with a touch-sensitive surface or a touch panel.
  • the term "user interface (UI)" in the description, claims and drawings of this application refers to a medium interface for interaction and information exchange between an application or operating system and a user, which converts between the internal form of information and a form acceptable to the user.
  • the user interface of an application is source code written in a specific computer language such as Java or extensible markup language (XML).
  • the interface source code is parsed and rendered on the terminal device, and finally presented as content that can be recognized by the user.
  • A control, also called a widget, is the basic element of a user interface. Typical controls include toolbars, menu bars, text boxes, buttons, scrollbars, pictures and text.
  • the attributes and content of the controls in the interface are defined by tags or nodes.
  • XML specifies the controls contained in the interface through nodes such as <Textview>, <ImgView>, and <VideoView>.
  • a node corresponds to a control or attribute in the interface, and the node is parsed and rendered as user-visible content.
  • applications such as hybrid applications, usually include web pages in their interfaces.
  • a web page, also called a page, can be understood as a special control embedded in the application program interface.
  • the web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), JavaScript (JS), etc.
  • web page source code can be loaded and displayed as user-recognizable content by a browser or a web page display component with similar functions.
  • the specific content contained in a web page is also defined by tags or nodes in the source code of the web page.
  • HTML uses <p>, <img>, <video>, and <canvas> to define the elements and attributes of the web page.
  • the commonly used form of user interface is a graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It can be an icon, window, control and other interface elements displayed on the display screen of an electronic device.
  • the control can include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
  • the human-machine dialogue system 10 may include an electronic device 100, a human-machine interaction server 200, and application servers 300 of one or more content providers.
  • the application server of the content provider may be referred to as a third-party application server.
  • the electronic device 100 and the human-computer interaction server 200 may adopt a telecommunication network (3G/4G/5G and other communication networks) communication technology or wireless fidelity (Wireless Fidelity, Wi-Fi) communication technology to establish a communication connection.
  • the human-computer interaction server 200 and the third-party application server 300 may establish a communication connection through a local area network or a wide area network. Specifically:
  • the electronic device 100 can be used to collect user sentences and send the user sentences to the human-computer interaction server 200.
  • the user statement can indicate the user's service requirements. For example, weather query requirements, air ticket reservation requirements, and so on.
  • the electronic device 100 may convert the collected user sentences in audio form into user sentences in text form, and then send the user sentences in text form to the human-computer interaction server 200.
  • the electronic device 100 can also be used to receive service results based on user service requirements fed back by the human-computer interaction server 200, such as weather query results, air ticket reservation results, and so on.
  • the electronic device 100 may also feed back the received service result to the user.
  • These functions may be performed by the electronic device 100 based on a voice assistant.
  • the voice assistant can be installed on the electronic device 100.
  • the voice assistant may be a voice interactive application. Voice assistants can also be called chat assistants, chat robots, and so on. This application does not restrict its naming. Through the voice assistant, the user and the electronic device 100 can perform voice interaction.
  • the electronic device 100 may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a vehicle-mounted device, a smart car, a smart speaker, etc. This application does not impose special restrictions on the specific form of the electronic device 100.
  • the human-computer interaction server 200 may be used to receive user sentences sent by the electronic device 100.
  • the human-computer interaction server 200 performs semantic understanding of the user sentence and thereby determines the skill (such as the "Ink Weather query" skill) and the intention (such as the dialogue intention "check the weather") corresponding to the user sentence. It also extracts from the user sentence the entity (for example, "Beijing") corresponding to a slot configured for the intention (for example, the city slot).
  • the human-computer interaction server 200 sends a service request to the third-party application server 300 based on the intention of the user sentence and the entity corresponding to the slot extracted from the user sentence.
  • the service request sent by the human-computer interaction server 200 matches the service requirement expressed in the user sentence.
  • the service request may include indication information of the intention corresponding to the user sentence (such as "check weather") and the entities corresponding to the slots (such as "tomorrow" and "Beijing").
  • the weather query service request may include the time and city extracted from the user sentence
  • the air ticket reservation service request may include the ticket booking time, departure place, destination, etc. extracted from the user sentence.
  • the human-computer interaction server 200 may also receive service results returned by the third-party application server 300, such as weather query results, air ticket reservation results, and so on. Finally, the human-computer interaction server 200 sends the received service result to the electronic device 100.
  • the third-party application server 300 may be used to receive the service request sent by the human-computer interaction server 200.
  • Based on the indication information of the intention in the service request (such as checking the weather) and the entities extracted from the user sentence (such as "tomorrow" and "Beijing"), the third-party application server 300 obtains the service result of the service request (the weather in Beijing tomorrow).
  • the third-party application server 300 may return the service result of the service request to the human-computer interaction server 200.
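  • The request and response exchange between the human-computer interaction server 200 and the third-party application server 300 might look like the following sketch. JSON as the wire format, the field names `intent` and `slots`, and both function names are assumptions made for illustration; the source only states that the request carries indication information of the intention and the slot entities.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ServiceRequest:
    intent: str            # indication information of the intention
    slots: Dict[str, str]  # slot name -> entity extracted from the user sentence


def build_service_request(intent: str, slots: Dict[str, str]) -> str:
    """Serialize the request the interaction server would send (JSON is an assumption)."""
    return json.dumps(asdict(ServiceRequest(intent, dict(slots))), sort_keys=True)


def handle_service_request(payload: str) -> Dict[str, str]:
    """Stand-in for a third-party application server; it only reports what it would look up."""
    request = json.loads(payload)
    if request["intent"] != "check_weather":
        return {"status": "unsupported_intent"}
    slots = request["slots"]
    return {"status": "ok", "query": f"weather/{slots['city']}/{slots['time']}"}


payload = build_service_request("check_weather", {"city": "Beijing", "time": "tomorrow"})
result = handle_service_request(payload)
print(payload)
print(result)
```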
  • the human-machine dialogue system 10 may also include a voice recognition server (such as the voice recognition server of iFLYTEK or the Baidu voice recognition server).
  • After the electronic device 100 receives the user sentence, it sends the user sentence to the voice recognition server for voice recognition; the user sentence is then converted into text and sent to the human-computer interaction server 200 for semantic analysis.
  • FIG. 3 shows the overall flow of the human-machine dialogue involved in this application, described below:
  • the electronic device 100 can collect the user sentence of the user 301. This user sentence may be referred to as voice input 30a.
  • the electronic device 100 can convert the voice input 30a into a text form through the voice recognition module 302, that is, the text input 30b.
  • the electronic device 100 may send the user sentence in audio form, that is, the voice input 30a to the human-computer interaction server 200.
  • the electronic device 100 may also send the user sentence in text form, that is, the text input 30b to the human-computer interaction server 200.
  • the electronic device 100 may send the text input 30b to the human-computer interaction server 200 through the communication connection 101 shown in FIG. 2.
  • the human-computer interaction server 200 may receive the user sentence (voice input 30a or text input 30b) sent by the electronic device 100.
  • the human-computer interaction server 200 may have a voice recognition module, which is used to convert a user sentence in a voice form into a user sentence in a text form.
  • the voice recognition module in the human-computer interaction server 200 may perform voice recognition on the voice input and convert it into text.
  • the human-computer interaction server 200 can perform semantic understanding of user sentences, and extract the user's service requirements from the user sentences. After that, the human-computer interaction server 200 may also send a service request to the third-party application server 300 based on the user's service requirements.
  • the human-computer interaction server may also receive the service result returned by the third-party application server 300, and send the service result to the electronic device 100.
  • the human-computer interaction server 200 may include: a semantic understanding module 303, a dialogue management module 304, and a natural language generation module 305, where:
  • the semantic understanding module 303 can be used to perform semantic recognition on user sentences (voice input 30a in audio form or text input 30b in text form). Specifically, the semantic understanding module 303 can perform skill classification, intention classification, and slot extraction on user sentences (voice input 30a in audio form or text input 30b in text form).
  • multiple specific skills are integrated on the human-computer interaction server 200, and each skill corresponds to a type of service or function, such as a meal ordering service, a taxi service, or a weather query. How to create skills is described in detail below and is not covered here.
  • Each skill can be configured with one or more intentions.
  • the "Ink Weather query" skill can be configured with the dialogue intent "check weather" and the question-and-answer intent "check weather".
  • Each intent can be configured with one or more slots.
  • the dialogue intention "Check the weather” can be configured with a time slot and a city slot. The intent configuration and slot configuration will also be described in detail below, and will not be repeated here.
  • the Dialog Management module 304 can be used to take the output of the semantic understanding module 303 as an input, and decide the operation to be performed by the human-computer interaction server 200 at this time based on the historical input.
  • the dialogue management module 304 may include two parts: state tracking and dialogue strategy. State tracking continuously tracks various information of the dialogue, and the current dialogue state is updated according to the old state, the user state (information output by the semantic understanding module 303) and the system state (obtained by querying the database).
  • the dialogue strategy is closely related to the task scenario, and is usually used as the output of the dialogue management module 304.
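  • A very small sketch of state tracking and a dialogue strategy is given below. It is illustrative only: the function names, the flat dictionary representation of state, the merge order (user state, then system state, overriding the old state) and the action labels are all assumptions rather than details from the source.

```python
from typing import Dict, List


def update_dialogue_state(
    old_state: Dict[str, str],
    user_state: Dict[str, str],
    system_state: Dict[str, str],
) -> Dict[str, str]:
    """Merge the old state with the latest user and system information (assumed precedence)."""
    new_state = dict(old_state)
    new_state.update(user_state)    # output of semantic understanding for the new sentence
    new_state.update(system_state)  # e.g. facts obtained by querying a database
    return new_state


def dialogue_strategy(state: Dict[str, str], required_slots: List[str]) -> str:
    """Decide the next operation: ask for the first missing slot, else call the service."""
    for slot in required_slots:
        if slot not in state:
            return f"ask_user:{slot}"
    return "send_service_request"


state = update_dialogue_state({"city": "Beijing"}, {"intent": "book_ticket"}, {})
action_before = dialogue_strategy(state, ["city", "time"])
state = update_dialogue_state(state, {"time": "tomorrow"}, {})
action_after = dialogue_strategy(state, ["city", "time"])
print(action_before, action_after)
```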
  • the Natural Language Generation (NLG) module 305 is configured to generate text information for feedback to the user according to the output of the dialogue management module 304.
  • the natural language generation module 305 can generate natural language in a template-based, grammar-based or model-based manner. Template-based and grammar-based are mainly rule-based strategies. Model-based can use, for example, Long Short-Term Memory (LSTM). The embodiments of the present application do not limit the specific implementation of natural language generation.
  • A skill can be a service or function, such as a weather inquiry service or an air ticket reservation service.
  • Skills can be configured by developers of third-party applications (such as "Ink Weather").
  • One or more intents can be configured under a skill. Specifically, a developer of a third-party application can log in to the skill creation page of the human-computer interaction server 200 through the electronic device 400 to create skills.
  • An intent can be a more detailed service or function under a skill. Intents can be divided into dialogue intents and question-and-answer (Q&A) intents. An intent that needs parameters should be a dialogue intent; for example, the intent of booking a train ticket requires parameters such as the train number and departure time. Q&A intents are better suited to FAQ-type questions, for example, "How are refund fees charged?". One intent can be configured with one or more slots.
  • the slot is the key information used to express the user's intention in the user sentence. For example, if the user's intention is a dialogue intent to "check the weather", then the slots that the human-computer interaction server 200 needs to extract from the sentence are the city slot and the time slot.
  • The city slot indicates where the weather is to be queried, and the time slot indicates which day the weather is to be queried for.
  • A slot can include attributes such as a slot name and a slot type. The slot name can be regarded as a specific parameter, and the slot type as the set of values that the parameter can take; each value in the set represents an entity. For example, the city slot and the time slot can be extracted from the sentence "What's the weather in Beijing tomorrow", where the entity of the city slot is "Beijing" and the entity of the time slot is "tomorrow".
  • the slot type is used to indicate which word database (system word database or custom word database) the entity configured for the slot comes from.
  • the entity configured for the city slot can be from a system thesaurus (such as a system location-based thesaurus), or it can be a custom dictionary (such as a custom location-based thesaurus).
  • the system lexicon is a lexicon provided by the human-computer interaction server 200, which can be selected for each skill.
  • the words of the system thesaurus are not enumerable.
  • the source of configuration entities of slots configured under different skills can be the same system dictionary. If the source of entities configured for multiple slots is the same system vocabulary, the set of entities configured for these multiple slots is the same.
  • the custom dictionary is a dictionary established by the human-computer interaction server 200 for a certain skill.
  • the entities in the custom dictionary are limited. For example, if the human-computer interaction server 200 provides a custom vocabulary for the Ink Weather skill, then the custom vocabulary can only be selected for the slots configured under the Ink Weather skill when the entity source is configured. The entity source of the slot configuration of other skills cannot select the custom dictionary.
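  • The relationship between skills, intents, slots, and entity sources described above can be sketched as a small data model. This is a minimal illustration, not the server's actual implementation: the class names, the `add_slot` method, and the lexicon name `custom.places` are hypothetical, and the rule that a custom lexicon is rejected with a `ValueError` for other skills is an assumed way of enforcing the restriction stated in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lexicon:
    """A word database that supplies the entities of a slot."""
    name: str
    owner_skill: Optional[str] = None  # None marks a system lexicon

    @property
    def is_system(self) -> bool:
        return self.owner_skill is None


@dataclass
class Slot:
    name: str
    entity_source: Lexicon


@dataclass
class Intent:
    name: str
    slots: List[Slot] = field(default_factory=list)


@dataclass
class Skill:
    name: str
    intents: Dict[str, Intent] = field(default_factory=dict)

    def add_slot(self, intent_name: str, slot_name: str, source: Lexicon) -> Slot:
        # Assumption: a custom lexicon may only be selected by the skill
        # it was created for; a system lexicon is open to every skill.
        if not source.is_system and source.owner_skill != self.name:
            raise ValueError(f"{source.name} is not available to skill {self.name}")
        intent = self.intents.setdefault(intent_name, Intent(intent_name))
        slot = Slot(slot_name, source)
        intent.slots.append(slot)
        return slot


SYS_CITY = Lexicon("sys.location.city")
SYS_TIME = Lexicon("sys.time")
# Hypothetical custom lexicon created for the weather skill only.
CUSTOM_PLACES = Lexicon("custom.places", owner_skill="ink weather query")

weather = Skill("ink weather query")
weather.add_slot("check weather", "city", SYS_CITY)
weather.add_slot("check weather", "time", SYS_TIME)
weather.add_slot("check weather", "region", CUSTOM_PLACES)

travel = Skill("travel")
# Slots of different skills may draw on the same system lexicon,
# in which case they share the same set of entities.
travel.add_slot("book a flight", "destination", SYS_CITY)
```

  • In this sketch, attempting `travel.add_slot("book a flight", "region", CUSTOM_PLACES)` raises `ValueError`, mirroring the statement that slots of other skills cannot select that custom lexicon.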
  • Slots can include required slots, optional key slots (non-required but key), and optional non-key slots (non-required and non-key).
  • A required slot is a slot that must appear in a sentence. If a required slot is missing from the user input sentence, the human-computer interaction server 200 cannot correctly understand the meaning of the sentence. An optional key slot may be absent from a sentence, but the human-computer interaction server 200 can fill in the slot information according to GPS or default information. If an optional non-key slot is missing from the user input sentence acquired by the human-computer interaction server 200, the semantic understanding of the sentence by the human-computer interaction server 200 is not affected.
  • For example, the time slot corresponding to the entity "today" is an optional key slot, the city slot corresponding to the entity "Shenzhen" is a required slot, and the region slot corresponding to the entity "Nanshan Science and Technology Park" is an optional non-key slot.
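  • The handling of the three slot categories can be sketched as follows. This is a hedged illustration only: the names `Requirement`, `MissingRequiredSlot`, and `fill_slots` are hypothetical, raising an exception for a missing required slot is an assumed way of signalling that the sentence cannot be understood, and the default provider stands in for "GPS or default information".

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Requirement(Enum):
    REQUIRED = "required"
    OPTIONAL_KEY = "optional key"
    OPTIONAL_NON_KEY = "optional non-key"


class MissingRequiredSlot(Exception):
    """Raised when a required slot has no entity (assumed behaviour)."""


def fill_slots(
    extracted: Dict[str, Optional[str]],
    requirements: Dict[str, Requirement],
    defaults: Dict[str, Callable[[], str]],
) -> Dict[str, Optional[str]]:
    filled = dict(extracted)
    for slot, requirement in requirements.items():
        if filled.get(slot) is not None:
            continue  # the sentence already supplied an entity
        if requirement is Requirement.REQUIRED:
            raise MissingRequiredSlot(slot)
        if requirement is Requirement.OPTIONAL_KEY:
            # Filled from a default provider, e.g. positioning or a preset value.
            filled[slot] = defaults[slot]()
        # OPTIONAL_NON_KEY slots are simply left absent.
    return filled


REQUIREMENTS = {
    "city": Requirement.REQUIRED,
    "time": Requirement.OPTIONAL_KEY,
    "region": Requirement.OPTIONAL_NON_KEY,
}
DEFAULTS = {"time": lambda: "today"}

result = fill_slots({"city": "Shenzhen"}, REQUIREMENTS, DEFAULTS)
```

  • Here `result` contains the city taken from the sentence and the time filled in by default, while the region slot stays empty without affecting the outcome.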
  • the skill in the human-computer interaction server 200 that can provide a service that matches the service demand indicated by the first user sentence is called the first skill.
  • the intent configured in the first skill to match the service demand indicated in the first user sentence is called the first intent, and the first intent is configured with the first slot.
  • The entity of the first slot extracted by the human-computer interaction server 200 from the first user sentence can express the key information of the service demand expressed in the first user sentence. The application server corresponding to the first skill can provide the service corresponding to that service demand only after it receives indication information containing the first intention and the entity of the first slot.
  • the first user sentence is "query today's weather in Beijing".
  • the service demand indicated by the first user sentence is to check the weather.
  • the "ink weather query” skill in the human-computer interaction server 200 that matches the service requirement is called the first skill.
  • the dialogue intention "Check weather” configured in the first skill is the first intention.
  • The city slot or the time slot configured for the first intention is the first slot.
  • the following describes the user interface that can be used to create skills, create intentions, and train human-machine dialogue models.
  • FIG. 4A exemplarily shows a user interface 40A displayed by the electronic device 400 that can be used to create skills.
  • a control 401 (“Create Skill") may be displayed in the user interface 40A.
  • the electronic device 400 can detect the selection operation on the control 401.
  • the selection operation may be a mouse operation on the control 401 (such as a mouse click operation), or a touch operation on the control 401 (such as a finger click operation), and so on.
  • the electronic device 400 may refresh the user interface 40A.
  • the refreshed user interface 40A may include a control 402 and a control 403 as shown in FIG. 4B.
  • the control 402 can be used for the user (such as the ink weather developer) to input the skill name
  • the control 403 can be used for the user (such as the ink weather developer) to input the skill classification.
  • the user can set the skill name as "ink weather query” through the control 402, and can set the skill classification as "weather query” through the control 403.
  • the refreshed user interface 40A may also include a control 404 ("Save") as shown in FIG. 4B.
  • the electronic device 400 can detect the selection operation acting on the control 404.
  • the selection operation may be a mouse operation on the control 404 (such as a mouse click operation), or a touch operation on the control 404 (such as a finger click operation), and so on.
  • the electronic device 400 may create a skill based on the skill name and skill classification set by the user.
  • FIG. 4C exemplarily shows a user interface 40C displayed by the electronic device 400 that can be used to create an intent and set the slot associated with the intent.
  • a control 405, a control 406, and a control 407 may be displayed in the user interface 40C.
  • the control 405 can be used by a user (such as an ink weather developer) to input an intention name.
  • the control 406 is used to display the intention name (such as "check weather") input by the user (such as the ink weather developer).
  • The control 407 can be used by a user (such as an ink weather developer) to add new slots.
  • The electronic device 400 can detect the selection operation on the control 407.
  • the selection operation may be a mouse operation on the control 407 (such as a mouse click operation), or a touch operation on the control 407 (such as a finger click operation), and so on.
  • the electronic device 400 may refresh the user interface 40C.
  • the refreshed user interface 40C may include controls as shown in FIG. 4D: control 408, control 409, and control 4010.
  • the control 408 can be used by a user (such as an ink weather developer) to set the city slot in the "weather check" intention.
  • the interface 40D can show that the entity source in the slot type corresponding to the city slot is the system thesaurus sys.location.city, and the attribute of the city slot is a required slot.
  • the control 409 may be used by a user (such as an ink weather developer) to set the time slot in the "weather check” intention.
  • the interface 40D can display that the entity source in the slot type corresponding to the time slot is the system thesaurus sys.time, and the attribute of the time slot is an optional key slot.
  • the entity sources in the slot type are mainly a system dictionary and a custom dictionary (also called a user dictionary).
  • The system lexicon is a lexicon preset by the human-computer interaction server 200, and the entities in a system lexicon are non-enumerable. Examples include sys.time, sys.location.city, sys.name, and sys.phoneNum.
  • the custom lexicon is a lexicon defined by the skill developer, and the number of words in the custom lexicon is limited.
  • FIG. 5A exemplarily shows a user interface 50A displayed by the electronic device 400 that can be used to train a human-machine dialogue model.
  • a control 501 (“Start Training") may be displayed in the user interface 50A.
  • the electronic device 400 can detect the selection operation acting on the control 501.
  • the selection operation may be a mouse operation on the control 501 (such as a mouse click operation), or a touch operation on the control 501 (such as a finger click operation), and so on.
  • the electronic device 400 may refresh the user interface 50A.
  • The man-machine dialogue model of a new skill (such as the "ink weather query" skill) trained by the human-computer interaction server 200 can perform skill classification, intention classification, and slot extraction on user input sentences. For example, suppose the human-computer interaction server 200 has trained the man-machine dialogue model of the "ink weather query" skill. The model can then recognize that a user input sentence (such as "What will the weather be in Beijing tomorrow") corresponds to the skill "ink weather query" and to the dialogue intent "check weather", and can extract the entity of the city slot (Beijing) and the entity of the time slot (tomorrow).
  • the refreshed user interface 50A may include a control 502 and a control 503 as shown in FIG. 5B.
  • the control 502 can be used to retrain the man-machine dialogue model.
  • the control 503 (“release skill") can be used to release the created skill (such as the weather checking skill).
  • the voice assistant can interact with the user through voice, identify the user's service needs, and provide feedback to the user.
  • existing voice assistants cannot determine the specific meanings of pronouns in user sentences. This is because, after identifying the skill and intention corresponding to the user sentence, the human-computer interaction server 200 can further determine the entity corresponding to the slot associated with the intention from the user sentence. If the entity corresponding to the certain slot is a pronoun, the existing human-computer interaction server 200 cannot determine the specific meaning of the pronoun.
  • The existing voice assistant can recognize that the skill corresponding to the user sentence is the "ink weather query" skill and that the intention corresponding to the user sentence is the dialogue intention "check the weather". Moreover, the existing voice assistant can determine, from this user sentence, the entities corresponding to the slots (such as the time slot and the city slot) associated with the dialogue intention "check the weather". Specifically, the entity corresponding to the time slot is "tomorrow", and the entity corresponding to the city slot is "Beijing".
  • The existing voice assistant can recognize that the skill corresponding to the user sentence is the ticket booking skill and that the intent corresponding to the user sentence is the dialogue intent "book a flight".
  • Existing voice assistants can also determine the entity corresponding to the slot (such as time slot, departure slot, destination slot) associated with the dialogue intention "booking a ticket” from this user sentence. Specifically, the entity corresponding to the time slot is the pronoun "that day”, the entity corresponding to the departure slot is the current location of the user, and the entity corresponding to the destination slot is Beijing.
  • the electronic device 100 may determine the departure place through positioning technology (such as GPS positioning, etc.), and notify the human-computer interaction server 200 of the departure place.
  • The following embodiments of the present application provide a human-computer interaction method that can determine the meaning of pronouns in a human-computer dialogue, for example the specific meanings of pronouns such as "here" and "that day". In this way, the efficiency with which the user uses the electronic device during voice interaction can be improved, and the user experience can be improved.
  • the human-computer interaction server 200 can group different skills, and then configure skill 1 (such as the ink weather skill) in the grouped skills as the associated skill of skill 2 (such as the travel skill).
  • the human-computer interaction server 200 detects that the user's service demand is switched from skill 1 to skill 2, and the user input sentence corresponding to skill 2 contains pronouns.
  • skill 1 is configured as a related skill of skill 2.
  • The human-computer interaction server 200 determines the meaning of the pronoun by acquiring the entity of the associated skill of skill 2, that is, the entity in skill 1. How to establish a group between skills, and how to configure entity sharing between skills after the group is established, is described in detail below and is not repeated here.
  • The service demand expressed by the user sentence A (such as "What's the weather in Beijing tomorrow") that the human-computer interaction server 200 first receives from the electronic device 100 corresponds to skill A (such as the "ink weather query" skill).
  • The service demand expressed by the user sentence B (for example, "what is the weather next week") that the human-computer interaction server 200 subsequently receives from the electronic device 100 also corresponds to skill A.
  • The user sentence B received later contains a pronoun. Under the same skill, the slots associated with the same intent are the same. For example, the voice assistant in the electronic device 100 first collects the user sentence A "what is the weather in Beijing tomorrow" and returns the weather query result to the user.
  • The voice assistant in the electronic device 100 then collects the user sentence B "What's the weather next week". User sentence A and user sentence B correspond to the same skill, the "ink weather query" skill, and to the same intention, the dialogue intention "check the weather". Therefore, the slots that the human-computer interaction server 200 needs to extract from user sentence A and user sentence B are the same: the time slot and the city slot.
  • The entity that the human-computer interaction server 200 extracts for the city slot from user sentence B is the pronoun "there".
  • The human-computer interaction server 200 directly replaces "there" with the entity "Beijing" corresponding to the city slot extracted from user sentence A, and thereby determines the meaning of the pronoun.
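  • The same-skill case above, where a pronoun in a later sentence is replaced by the entity of the same slot from an earlier sentence, can be sketched as follows. This is an illustrative sketch under stated assumptions: the `DialogueContext` class and the `PRONOUNS` set are hypothetical, and matching pronouns against a fixed word list is a simplification of whatever detection the server actually uses.

```python
from typing import Dict, Tuple

# Illustrative pronoun list; a real system would detect pronouns differently.
PRONOUNS = {"there", "here", "that day"}


class DialogueContext:
    """Remembers the last entities seen for each (skill, intent) pair."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, str], Dict[str, str]] = {}

    def resolve(self, skill: str, intent: str, slots: Dict[str, str]) -> Dict[str, str]:
        previous = self._last.get((skill, intent), {})
        resolved: Dict[str, str] = {}
        for name, entity in slots.items():
            # Same skill and same intent means same slots, so the pronoun
            # can be replaced by the earlier entity of the same slot.
            if entity in PRONOUNS and name in previous:
                entity = previous[name]
            resolved[name] = entity
        self._last[(skill, intent)] = {**previous, **resolved}
        return resolved


context = DialogueContext()
# User sentence A: city and time are stated explicitly.
slots_a = context.resolve(
    "ink weather query", "check weather", {"city": "Beijing", "time": "tomorrow"}
)
# User sentence B: the city slot holds the pronoun "there".
slots_b = context.resolve(
    "ink weather query", "check weather", {"city": "there", "time": "next week"}
)
```

  • After the second call, the city slot in `slots_b` holds "Beijing" while the time slot keeps the newly stated "next week".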
  • the establishment of a group between skills may refer to the establishment of a mapping relationship between one skill and other skills in the human-computer interaction server 200.
  • For example, the human-computer interaction server 200 establishes a mapping relationship between the ink weather skill and the Qunar Travel skill. After the human-computer interaction server 200 saves the mapping relationship established between the two skills, it allows the two skills to view each other's slot settings.
  • FIGS. 6A-6D exemplarily show the user interfaces that the electronic device 400 displays for establishing a group between skills. These are described in detail below.
  • the user interface 60A may include: a control 601 ("invitation skills") and a control 602 ("received invitation to be confirmed”).
  • the electronic device 400 can detect the selection operation acting on the control 601.
  • the selection operation may be a mouse operation on the control 601 (such as a mouse click operation), or a touch operation on the control 601 (such as a finger click operation), and so on.
  • the electronic device 400 may refresh the user interface 60A.
  • the refreshed user interface 60B may include a control 603 and a control 604 as shown in FIG. 6B.
  • the control 603 can be used for the user (such as the ink weather developer) to select the skills to be invited to the group.
  • the control 604 can be used by a user (such as an ink weather developer) to send a group establishment request to the human-computer interaction server 200.
  • the electronic device 400 can detect a selection operation on the control 602.
  • The selection operation may be a mouse operation on the control 602 (such as a mouse click operation), or a touch operation on the control 602 (such as a finger click operation), and so on.
  • the electronic device 400 may refresh the user interface 60A.
  • The refreshed user interface 60A may be as shown in FIG. 6D, and the electronic device 400 may display the skills from which group invitations have been received. For example, as shown in FIG. 6D, the electronic device 400 may display "received an invitation from JD.com" and "received an invitation from Taobao".
  • Figures 7A-7B exemplarily show the interaction process of two skill group building, which will be described below.
  • FIG. 7A exemplarily shows the process of the Ink Weather application sending a group invitation to the Qunar Travel application.
  • the interface 70A is a user interface of the ink weather application displayed by the electronic device 400 for initiating skill building.
  • the interface 70A may include: a control 701 and a control 702.
  • the control 701 can be used for the ink weather developer to determine the skills to be invited.
  • For example, the ink weather application developer determines through the control 701 that the skill to be invited is the Qunar Travel skill.
  • the control 702 can be used by the ink weather application developer to send skill invitations to the human-computer interaction server 200.
  • the human-computer interaction server 200 may receive the group establishment request 700 sent by the ink weather application through the communication connection 102 shown in FIG. 2. Then, the human-computer interaction server 200 sends the group establishment request 700 to the Qunar travel application through the communication connection 102.
  • The electronic device 400 may display an interface 70B in which the Qunar Travel application receives the group establishment request.
  • the interface 70B may include a control 703 and a control 704.
  • the control 703 can be used for Qunar Travel application developers to approve the group invitation of the ink weather application.
  • The control 704 can be used by the Qunar Travel application developer to reject the group invitation of the ink weather application.
  • The following description takes the case in which the electronic device 400 detects a selection operation on the control 703 as an example.
  • FIG. 7B exemplarily shows the process of Qunar Travel application responding to the invitation of the Ink Weather application group establishment.
  • the electronic device 400 detects a selection operation acting on the control 703 in the interface 70B.
  • The selection operation may be a mouse operation on the control 703 (such as a mouse click operation), or a touch operation on the control 703 (such as a finger click operation), and so on.
  • the electronic device 400 may send a group establishment approval response 707 to the human-computer interaction server 200.
  • the human-computer interaction server 200 receives the group establishment approval response 707 sent by Qunar via the communication connection 102 shown in FIG. 2. Then, the human-computer interaction server 200 sends the group approval response 707 to the ink weather application through the communication connection 102.
  • The human-computer interaction server 200 can generate a mapping relationship between the ink weather skill and the Qunar Travel skill, and can then save this mapping relationship.
  • The electronic device 400 may display an interface 70C in which the ink weather application receives the response approving group establishment.
  • The interface 70C may include: a control 705 and a control 706.
  • The control 705 can be used by the ink weather application developer to perform configuration between skills.
  • the control 706 may be used by the ink weather application developer to open a chat window to send a message to the Qunar Travel skill.
  • The embodiments of this application only take group building between the ink weather skill and the Qunar Travel skill as an example. A group can also be built between other skills, one skill can build groups with multiple skills, and multiple skills can form one group. This should not constitute a restriction.
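  • The group establishment exchange described above (an invitation, an approval or rejection, and a saved mapping relationship) can be sketched as a small registry. This is an assumption-laden sketch: `SkillGroupRegistry` and its methods are hypothetical names, and storing the mapping symmetrically is one plausible reading of "the two skills can view each other's slot settings".

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class SkillGroupRegistry:
    """Tracks pending group invitations and established skill mappings."""

    def __init__(self) -> None:
        self._pending: Set[Tuple[str, str]] = set()
        self._groups: DefaultDict[str, Set[str]] = defaultdict(set)

    def invite(self, inviter: str, invitee: str) -> None:
        # Corresponds to forwarding the group establishment request.
        self._pending.add((inviter, invitee))

    def respond(self, inviter: str, invitee: str, approve: bool) -> bool:
        if (inviter, invitee) not in self._pending:
            raise KeyError(f"no pending invitation from {inviter} to {invitee}")
        self._pending.discard((inviter, invitee))
        if approve:
            # Save the mapping in both directions (assumed symmetric).
            self._groups[inviter].add(invitee)
            self._groups[invitee].add(inviter)
        return approve

    def are_grouped(self, skill_a: str, skill_b: str) -> bool:
        return skill_b in self._groups.get(skill_a, set())


registry = SkillGroupRegistry()
registry.invite("ink weather", "Qunar Travel")
registry.respond("ink weather", "Qunar Travel", approve=True)

# One skill can also be grouped with several other skills.
registry.invite("ink weather", "Taobao")
registry.respond("ink weather", "Taobao", approve=False)
```

  • In this example the first invitation is approved, so the two skills are grouped in both directions; the second is rejected and leaves no mapping behind.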
  • Configuring entity sharing means configuring one skill (such as the ink weather query skill) to share an entity with another skill (such as the Qunar Travel skill).
  • Sharing an entity may mean that, when a pronoun appears in a user sentence A of one skill (such as the ink weather query skill), the human-computer interaction server 200 uses the entity of a slot configured in an intent of another skill associated with that skill (such as the Qunar Travel skill) to replace the pronoun in user sentence A.
  • FIGS. 8-10 exemplarily show the process of configuring entity sharing between the ink weather skill and the Qunar Travel skill.
  • FIG. 8 exemplarily shows a user interface 80A, displayed by the electronic device 400, for configuring entity sharing for the weather checking skill.
  • the user interface 80A may display a control 801.
  • the control 801 can be used by a user (such as an ink weather application developer) to display the skills that can be selected for skill sharing configuration.
  • the electronic device 400 can detect the selection operation acting on the control 801.
  • the selection operation may be a mouse operation (such as a mouse click operation) on the control 801, a touch operation (such as a finger click operation) on the control 801, and so on.
  • the electronic device 400 may refresh the user interface 80A.
  • the refreshed user interface 80A may be as shown in the user interface 90A in FIG. 9.
  • the user interface 90A may display a control 901, and the control 901 may be used to select a skill to be configured for skill sharing in the skill list. For example, as shown in the user interface 90A, the user can select the "where to travel" skill through the control 901 for skill sharing configuration.
  • the electronic device 400 can detect the selection operation acting on the control 901.
  • the selection operation may be a mouse operation on the control 901 (such as a mouse click operation), or a touch operation on the control 901 (such as a finger click operation), and so on. In response to the selection operation, the electronic device 400 may refresh the user interface 90A.
  • the refreshed user interface 90A may display a control 902 ("shared entity") as shown in the user interface 90B.
  • the electronic device 400 can detect the selection operation on the control 902.
  • the selection operation can be a mouse operation on the control 902 (such as a mouse click operation), or a touch operation on the control 902 (such as a finger click operation), and so on.
  • The human-computer interaction server 200 may configure the "city slot" in the "weather check" skill and the "destination slot" in the Qunar Travel ticket booking skill to share entities.
  • The human-computer interaction server 200 configures the "destination slot" as the associated slot of the "city slot". Specifically, when the entity source configured for the "city slot" is a system lexicon (such as sys.location.city) and the entity source configured for the "destination slot" is also a system lexicon (such as sys.location.city), the human-computer interaction server 200 associates the slot name of the "city slot" with the slot name of the "destination slot".
  • When the entities of the "city slot" come from a custom lexicon created by the human-computer interaction server 200 for the ink weather skill, the human-computer interaction server 200 associates the slot name of the "city slot" with the slot name of the "destination slot", and also associates the custom lexicon configured as the entity source of the "city slot" with the system lexicon or custom lexicon configured as the entity source of the "destination slot".
  • The interface for configuring entity sharing is not limited to the one shown in the user interface 90B; entity sharing may also be configured through a command line, which is not limited here.
  • the interface 100A displays the details of the shared entity in the city slot in the ink weather skill and the destination slot in the Qunar Travel skill.
  • The shared-entity details shown in the interface 100A may be stored in the human-computer interaction server 200 in the form of a table. Alternatively, the human-computer interaction server 200 may save a mapping relationship of shared entities between the city slot of the ink weather skill and the destination slot of the Qunar Travel skill. This is not limited here.
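  • One way to picture the stored shared-entity configuration is a table that links slot names and, where needed, lexicons. The sketch below is hypothetical throughout: `SlotRef`, `SharedEntityTable`, and the lexicon name `custom.cities` are invented for illustration, and linking lexicons whenever the two slots do not use the same system lexicon is an assumed generalization of the two cases described in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class SlotRef:
    skill: str
    slot: str
    entity_source: str
    system_source: bool = True  # False for a custom lexicon


class SharedEntityTable:
    """Stores which slot of another skill a slot shares entities with."""

    def __init__(self) -> None:
        self.slot_links: Dict[Tuple[str, str], Tuple[str, str]] = {}
        self.lexicon_links: Set[Tuple[str, str]] = set()

    def share(self, slot: SlotRef, associated: SlotRef) -> None:
        # Always associate the two slot names.
        self.slot_links[(slot.skill, slot.slot)] = (associated.skill, associated.slot)
        same_system_lexicon = (
            slot.system_source
            and associated.system_source
            and slot.entity_source == associated.entity_source
        )
        if not same_system_lexicon:
            # Entities come from different word databases, so the
            # lexicons are associated as well (assumed rule).
            self.lexicon_links.add((slot.entity_source, associated.entity_source))

    def associated_slot(self, skill: str, slot: str) -> Optional[Tuple[str, str]]:
        return self.slot_links.get((skill, slot))


destination = SlotRef("Qunar Travel", "destination", "sys.location.city")

# Case 1: both slots use the same system lexicon.
table = SharedEntityTable()
table.share(SlotRef("ink weather query", "city", "sys.location.city"), destination)

# Case 2: the city slot uses a (hypothetical) custom lexicon.
custom_table = SharedEntityTable()
custom_table.share(
    SlotRef("ink weather query", "city", "custom.cities", system_source=False),
    destination,
)
```

  • In the first case only the slot names are linked; in the second the custom lexicon is additionally linked to the lexicon of the destination slot.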
  • Fig. 11 exemplarily shows the man-machine dialogue on which the first embodiment is based.
  • Fig. 11 exemplarily shows a shared entity scenario where the entity is a place.
  • the electronic device 100 may display a man-machine dialogue interface 110A.
  • the electronic device 100 may display the collected user sentence 1101 "book a flight from Shanghai to Beijing tomorrow" on the interface 110A. Then, the voice assistant in the electronic device 100 can feed back the ticket booking result to the user (not shown).
  • The feedback of the booking result can include, but is not limited to, the following two modes. Mode 1: the electronic device 100 can display the booking result in the form of a web page (not shown) in the interface 110A. Mode 2: the electronic device 100 can also broadcast the booking result to the user by voice.
  • The electronic device 100 can collect the user sentence 1102 "Yes, how is the weather there tomorrow" and display the user sentence 1102 on the interface 110A.
  • the human-computer interaction server 200 extracts from the user sentence 1102 that the entity corresponding to the city slot is the pronoun "there".
  • the human-computer interaction server 200 determines that the city slot and the corresponding destination slot in the user sentence 1101 share an entity.
  • the human-computer interaction server 200 replaces the word "there” with the entity "Beijing" of the destination slot.
  • the electronic device 100 can correctly feed back the weather query result to the user.
  • The feedback of the weather query result may include, but is not limited to, the following two methods. Method 1: the electronic device 100 can display the weather query result in the form of a web page (not shown) in the interface 110A. Method 2: the electronic device 100 can also broadcast the weather query result to the user by voice.
  • the user sentence 1101 may also be "book a ticket to Beijing, departing from Shanghai".
  • The human-computer interaction server 200 can still extract the entity "Beijing" of the destination slot from the user sentence 1101. Then, the human-computer interaction server 200 replaces the pronoun "there" in the user sentence 1102 with the entity "Beijing" of the destination slot. In this way, the electronic device 100 can correctly feed back the weather query result to the user.
  • FIGS. 12A-12B show the implementation of the man-machine dialogue method provided in the first embodiment in the man-machine dialogue system 10.
  • FIG. 12A specifically shows the process of the man-machine dialogue system 10 processing a ticket booking request.
  • the electronic device 100 can collect a user sentence 1101.
  • The electronic device 100 performs voice recognition on the user sentence 1101 and converts it into text 1202.
  • the electronic device 100 sends the text 1202 to the human-computer interaction server 200.
  • the human-computer interaction server 200 can receive the text 1202.
  • the human-computer interaction server 200 performs skill classification, intention classification, and slot extraction on the text 1202.
  • the semantic understanding module 303 in the human-computer interaction server 200 as shown in FIG. 3 can perform skill classification on the text 1202.
  • the human-computer interaction server 200 can use the human-machine dialogue model under the skills corresponding to the text 1202 to perform intent classification and slot extraction of the text 1202.
  • the man-machine dialogue model may be the man-machine dialogue model trained in FIG. 5A.
  • the human-computer interaction server 200 may store the skills and slots corresponding to the text 1202 in the form of a table 1201.
  • The human-computer interaction server 200 sends a ticket booking request 1203 to the Qunar Travel server 301.
  • the ticket booking request 1203 may include request parameters such as "tomorrow, Shanghai, Beijing".
  • the request parameter may be the entity corresponding to the slot extracted from the text 1202 by the human-computer interaction server 200.
  • the specific form of the ticket booking request 1203 is not limited here.
  • The Qunar Travel server 301 can receive the ticket booking request 1203.
  • the Qunar Travel server 301 may obtain the booking result 1204 based on the booking request 1203 and the request parameter "tomorrow, Shanghai, Beijing" included in the booking request 1203.
  • The Qunar Travel server 301 can return the booking result 1204 (the flight from Shanghai to Beijing tomorrow) to the human-computer interaction server 200.
  • The human-computer interaction server 200 may send the booking result 1204 to the electronic device 100 after receiving the booking result 1204 fed back by the Qunar Travel server 301.
  • the human-computer interaction server 200 may send the ticket booking page to the electronic device 100.
  • the human-computer interaction server 200 may also send the ticket booking parameters to the electronic device 100.
  • the electronic device 100 may generate a booking page according to the booking parameters.
  • the electronic device 100 may output (display or voice broadcast) the ticket booking result from Shanghai to Beijing tomorrow after receiving the ticket booking result 1204 sent by the human-computer interaction server 200.
  • FIG. 12B specifically shows the process in which the man-machine dialogue system 10 processes a weather query request.
  • the electronic device 100 can collect a user sentence 1102.
  • the electronic device performs voice recognition on the user sentence 1102 and converts it into text 1206.
  • the electronic device 100 sends the text 1206 to the human-computer interaction server 200.
  • The human-computer interaction server 200 may receive the text 1206.
  • the human-computer interaction server 200 performs skill classification, intention classification and slot extraction on the text 1206.
  • the semantic understanding module 303 in the human-computer interaction server 200 shown in FIG. 3 can perform skill classification on the text 1206.
  • the human-computer interaction server 200 can use the human-computer dialogue model under the skill corresponding to the text 1206 to perform intent classification and slot extraction on the text 1206.
  • the man-machine dialogue model may be the man-machine dialogue model trained in FIG. 5A.
  • the human-computer interaction server 200 may store the skills and slots corresponding to the text 1206 in the form of a table 1205.
  • the human-computer interaction server 200 needs to query whether the city slot in the text 1206 has a shared entity. As can be seen from FIGS. 8-10, the city slot in the ink weather skill and the destination slot in the Qunar Travel skill have been configured with a shared entity. Therefore, the human-computer interaction server 200 directly shares the entity "Beijing", which corresponds to the destination slot in the table 1201 stored in the memory, with the city slot in the table 1205. In this way, the human-computer interaction server 200 knows that the specific intention of the text 1206 is "to query the weather in Beijing tomorrow".
  • the human-computer interaction server 200 sends a query request 1207 to the ink weather server 302.
  • the query request 1207 may include request parameters such as "Tomorrow, Beijing".
  • the request parameter may be the entity corresponding to the slot extracted from the text 1206 by the human-computer interaction server 200.
  • the specific form of the query request 1207 is not limited here.
  • the ink weather server 302 may receive the query request 1207.
  • the ink weather server 302 can obtain the query result 1208 based on the query request 1207 and the parameter "Tomorrow, Beijing" included in the query request 1207.
  • the ink weather server 302 may return a query result 1208 (such as the weather forecast for tomorrow in Beijing) to the human-computer interaction server 200.
  • the human-computer interaction server 200 may send the query result 1208 to the electronic device 100 after receiving the query result 1208 fed back by the ink weather server 302.
  • the human-computer interaction server 200 may send a weather forecast page to the electronic device 100.
  • the human-computer interaction server 200 may also send weather forecast parameters to the electronic device 100.
  • the electronic device 100 may generate a weather forecast page according to weather forecast parameters.
  • the electronic device 100 may output (display or voice broadcast) the weather conditions of Beijing tomorrow after receiving the query result 1208 sent by the human-computer interaction server 200.
  • the city slot of the ink weather skill and the destination slot of the Qunar Travel skill are configured with a shared entity. Therefore, when the entity corresponding to the city slot in the user sentence 1102 is the pronoun "there", the human-computer interaction server 200 can still understand that "there" in the user sentence 1102 refers to "Beijing". The human-computer interaction server 200 does not need to ask the user to confirm the meaning of the pronoun "there" in the user sentence 1102, which improves the user experience.
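  • the shared-entity lookup described for FIG. 12B can be sketched as follows. This is a minimal illustration, not the implementation of the application: the dictionary layout of the tables 1201 and 1205, the pronoun list, and the names `SHARED_SLOTS`, `PRONOUNS` and `resolve_shared_entities` are all assumptions, and the table contents are inferred from the request parameters quoted above.

```python
# Hypothetical sketch: table layout, names and pronoun list are assumptions.

# Shared-entity configuration: (skill, slot) -> (associated skill, associated slot).
SHARED_SLOTS = {
    ("ink weather", "city"): ("Qunar Travel", "destination"),
}

# Pronouns that trigger a lookup (illustrative only).
PRONOUNS = {"there", "that day"}


def resolve_shared_entities(current, history):
    """Return a copy of `current` with pronoun entities replaced by shared entities."""
    resolved = {"skill": current["skill"], "slots": dict(current["slots"])}
    for slot, entity in current["slots"].items():
        if entity not in PRONOUNS:
            continue
        source = SHARED_SLOTS.get((current["skill"], slot))
        if source is None:
            continue  # no shared entity configured; the pronoun stays unresolved
        src_skill, src_slot = source
        # Search the earlier tables, most recent first.
        for table in reversed(history):
            if table["skill"] == src_skill and src_slot in table["slots"]:
                resolved["slots"][slot] = table["slots"][src_slot]
                break
    return resolved


# Table 1201: saved while handling the earlier ticket booking request.
table_1201 = {
    "skill": "Qunar Travel",
    "slots": {"time": "tomorrow", "departure": "Shanghai", "destination": "Beijing"},
}
# Table 1205: extracted from the text 1206, where the city slot holds a pronoun.
table_1205 = {"skill": "ink weather", "slots": {"time": "tomorrow", "city": "there"}}

resolved_1205 = resolve_shared_entities(table_1205, [table_1201])
print(resolved_1205["slots"])
```

  • in this sketch the stored table is left unchanged and a resolved copy is returned, so the original pronoun remains available if the lookup has to be repeated; whether the server keeps both forms is not stated in the application.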
  • Fig. 13 exemplarily shows the man-machine dialogue based on the second embodiment.
  • Fig. 13 shows an exemplary shared entity scenario where the entity is time.
  • the electronic device 100 may display a man-machine dialogue interface 130A.
  • the electronic device 100 can display the collected user sentence 1301 "What's the weather like tomorrow in Beijing" on the interface 130A. Then, the voice assistant in the electronic device 100 may feed back the query result (not shown) to the user.
  • feedback of the query result may include, but is not limited to, the following two ways: Method 1, the electronic device 100 can display the query result in the form of a web page (not shown) in the interface 110A; Method 2, the electronic device 100 can also voice broadcast the query result to the user.
  • the electronic device 100 can collect the user sentence 1302 "book a ticket to Beijing that day".
  • the human-computer interaction server 200 extracts from the user sentence 1302 that the entity corresponding to the time slot is the pronoun "that day".
  • the human-computer interaction server 200 determines that the time slot of the user sentence 1302 and the time slot of the user sentence 1301 are configured with a shared entity. Then, the human-computer interaction server 200 replaces the pronoun "that day" with the entity "tomorrow" of the time slot corresponding to the user sentence 1301. In this way, the electronic device 100 can correctly feed back the ticket booking result to the user.
  • the voice assistant in the electronic device 100 may feed back the ticket booking result to the user (not shown).
  • feedback of the booking result can include, but is not limited to, the following two ways: Method 1, the electronic device 100 can display the booking result in the form of a web page (not shown) in the interface 110A; Method 2, the electronic device 100 can also voice broadcast the booking result to the user.
  • FIGS. 14A-14B show the implementation of the voice interaction method provided in the second embodiment in the man-machine dialogue system 10.
  • FIG. 14A specifically shows the process in which the man-machine dialogue system 10 processes a weather query request.
  • the electronic device 100 can collect a user sentence 1301.
  • the electronic device performs voice recognition on the user sentence 1301 and converts it into text 1402.
  • the electronic device 100 sends the text 1402 to the human-computer interaction server 200.
  • the server 200 may receive the text 1402.
  • the human-computer interaction server 200 performs skill classification, intention classification, and slot extraction on the text 1402.
  • the semantic understanding module 303 in the human-computer interaction server 200 shown in FIG. 3 can perform skill classification on the text 1402.
  • the human-computer interaction server 200 can use the human-machine dialogue model under the skills corresponding to the text 1402 to perform intent classification and slot extraction of the text 1402.
  • the man-machine dialogue model may be the man-machine dialogue model trained in FIG. 5A.
  • the human-computer interaction server 200 may store the skills and slots corresponding to the text 1402 in the form of a table 1401.
  • the human-computer interaction server 200 sends a query request 1403 to the ink weather server 302.
  • the query request 1403 may include request parameters such as "Tomorrow, Beijing".
  • the request parameter may be the entity corresponding to the slot extracted from the text 1402 by the human-computer interaction server 200, and the specific form of the query request 1403 is not limited here.
  • the ink weather server 302 may receive the query request 1403.
  • the ink weather server 302 can obtain the query result 1404 (such as the weather forecast for Beijing tomorrow) based on the query request 1403 and the parameter "Tomorrow, Beijing" included in the query request 1403.
  • the ink weather server 302 may return a query result 1404 (such as the weather forecast for Beijing tomorrow) to the human-computer interaction server 200.
  • the human-computer interaction server 200 may send the query result 1404 to the electronic device 100 after receiving the query result 1404 (such as the weather forecast for Beijing tomorrow) fed back by the ink weather server 302.
  • the human-computer interaction server 200 may send a weather forecast page to the electronic device 100.
  • the human-computer interaction server 200 may also send weather forecast parameters to the electronic device 100.
  • the electronic device 100 may generate a weather forecast page according to weather forecast parameters.
  • the electronic device 100 may output (display or voice broadcast) the query result of the weather in Beijing tomorrow after receiving the query result 1404 sent by the human-computer interaction server 200.
  • FIG. 14B specifically shows the process in which the man-machine dialogue system 10 processes a ticket booking request.
  • the electronic device 100 can collect a user sentence 1302.
  • the electronic device converts the user sentence 1302 into text 1406 after voice recognition.
  • the electronic device 100 sends the text 1406 to the human-computer interaction server 200.
  • the server 200 may receive the text 1406.
  • the human-computer interaction server 200 performs skill classification, intention classification, and slot extraction on the text 1406.
  • the semantic understanding module 303 in the human-computer interaction server 200 shown in FIG. 3 can perform skill classification on the text 1406.
  • the human-computer interaction server 200 can use the human-computer dialogue model under the skills corresponding to the text 1406 to perform intent classification and slot extraction on the text 1406.
  • the man-machine dialogue model may be the man-machine dialogue model trained in FIG. 5A.
  • the human-computer interaction server 200 may save the skills and slots corresponding to the text 1406 in the form of a table 1405.
  • the human-computer interaction server 200 needs to query whether the time slot in the text 1406 has a shared entity. A shared entity has been configured for the time slot in the Qunar Travel skill and the time slot in the ink weather skill. For the process of configuring a shared entity for the time slot of the Qunar Travel skill and the time slot of the ink weather skill, refer to the shared entity configuration process shown in FIGS. 8-10. Therefore, the human-computer interaction server 200 directly shares the entity "tomorrow", which corresponds to the time slot in the table 1401 stored in the memory, with the time slot in the table 1405.
  • the human-computer interaction server 200 knows that the specific intention of the text 1406 is to "book a flight from Shenzhen (a GPS location city) to Beijing tomorrow.” Then, the human-computer interaction server 200 sends a ticket booking request 1407 to the Qunar Travel server 301.
  • the ticket booking request 1407 may include request parameters such as "tomorrow, Shenzhen, Beijing".
  • the request parameter may be the entity corresponding to the slot extracted from the text 1406 by the human-computer interaction server 200.
  • the specific form of the booking request 1407 is not limited here.
  • the Qunar Travel server 301 can receive the ticket booking request 1407.
  • the Qunar Travel server 301 can obtain the booking result 1408 (for example, the flight from Shenzhen to Beijing tomorrow) based on the booking request 1407 and the parameter "tomorrow, Shenzhen, Beijing" included in the booking request 1407.
  • the Qunar Travel server 301 can return the booking result 1408 (such as the flight from Shenzhen to Beijing tomorrow) to the human-computer interaction server 200.
  • the human-computer interaction server 200 may send the booking result 1408 to the electronic device 100 after receiving the booking result 1408 fed back by the Qunar Travel server 301.
  • the human-computer interaction server 200 may send the ticket booking page to the electronic device 100.
  • the human-computer interaction server 200 may also send the ticket booking parameters to the electronic device 100.
  • the electronic device 100 may generate a booking page according to the booking parameters.
  • the electronic device 100 may output (display or voice broadcast) the ticket booking result from Shenzhen to Beijing tomorrow after receiving the ticket booking result 1408 sent by the human-computer interaction server 200.
  • the time slot of the ink weather skill and the time slot of the Qunar Travel skill have been configured with a shared entity. Therefore, when the entity corresponding to the time slot in the user sentence 1302 is the pronoun "that day", the human-computer interaction server 200 can still understand that "that day" in the user sentence 1302 means "tomorrow". The human-computer interaction server 200 does not need to ask the user to confirm the meaning of the pronoun "that day" in the user sentence 1302, which improves the user experience.
  • the human-computer interaction server 200 receives the first user sentence collected by the electronic device 100; the human-computer interaction server 200 extracts the entity of the first slot from the first user sentence; the first slot is a slot configured for the first intention; the first intention is an intention configured for the first skill, and the first skill is configured with one or more intentions; the first intention and the first skill are determined by the human-computer interaction server 200 according to the first user sentence, and match the service requirement indicated by the first user sentence; if the entity of the first slot is a pronoun, the human-computer interaction server 200 modifies the entity of the first slot to the entity of the second slot; the second slot is configured as an associated slot of the first slot, and the entity of the second slot is extracted from the second user sentence by the human-computer interaction server 200; the second user sentence is collected by the electronic device 100 before the first user sentence; the intention configured with the second slot is the second intention, and the second intention is configured as an associated intention of the first intention; the skill configured with the second intention is
  • FIG. 15 shows the overall flow of a semantic analysis method provided by an embodiment of the present application. The details are as follows:
  • Phase 1 Prior voice interaction (S101-S107)
  • the electronic device 100 collects the user sentence A, processes it through the voice recognition module, and sends it to the human-computer interaction server 200.
  • the user sentence A may be the user sentence 1501 "inquire about the weather in Beijing tomorrow" shown in FIG. 15.
  • the voice recognition module in the electronic device 100 performs voice recognition on the user sentence 1501.
  • the user sentence A sent by the electronic device 100 to the human-computer interaction server 200 may be in the form of audio or text. There is no limitation here.
  • the human-computer interaction server 200 receives the user sentence A.
  • when the user uses the electronic device 100 to interact with the human-computer interaction server 200, the user can propose corresponding service requirements to the human-computer interaction server 200 in the form of voice or text. If the user inputs in the form of voice, the human-computer interaction server 200 can recognize the voice, convert it into text, and input the text into the semantic understanding module 303. If the user inputs in text form, the human-computer interaction server 200 inputs the text input by the user into the semantic understanding module 303.
  • the user sentence A may be a sentence in a single round of dialogue between the user and the human-computer interaction server 200, or may be multiple sentences in multiple rounds of dialogue between the user and the human-computer interaction server 200, which is not limited in this embodiment of the application.
  • the human-computer interaction server 200 may receive the user sentence A sent by the electronic device 100 through the communication connection 101 as shown in FIG. 2.
  • the human-computer interaction server 200 extracts the entity of the slot A from the user sentence A; the slot A is a slot configured for the intention A; the intention A is determined according to the user sentence A, and is an intention configured for the skill A.
  • the semantic understanding module 303 can search and filter according to the user sentence A to determine the intention A corresponding to the user sentence A and the slot information (including the slot A) associated with the intention.
  • the intention A is one of the intentions (such as the dialogue intention "check the weather") of the skill A (such as the weather checking skill) on the human-computer interaction server 200.
  • the skill developer configures the corresponding slots (such as the city slot and the time slot) for the intention A in the skill, that is, which slots need to be extracted for the intention A and the attributes of each slot. Therefore, after determining the intention A corresponding to the user sentence A, the human-computer interaction server 200 can use the human-machine dialogue model corresponding to the intention A to output the slot configuration associated with the intention A.
  • for example, when the user sentence A is "query the weather in Beijing tomorrow", the human-computer interaction server 200 can determine that the intention A corresponding to the user sentence A is the dialogue intention "check the weather".
  • the man-machine dialogue model corresponding to the dialogue intention "Check the weather” can output the slots associated with the intent as time slots and city slots.
  • the corresponding entity of the time slot is "Tomorrow”, and the corresponding entity of the city slot is "Beijing".
  • slot A can be a city slot.
  • some slot information may be set by the user by default, or may be obtained by other means (such as GPS positioning), and is not necessarily extracted from the user sentence A.
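  • the point that a slot entity may come from the sentence, from a user default, or from positioning can be illustrated with a short sketch. The priority order used here (sentence first, then user defaults, then GPS) and the name `fill_slots` are assumptions made for illustration; the application does not specify how the sources are combined.

```python
# Hypothetical sketch: the fallback order and all names are assumptions.

def fill_slots(extracted, user_defaults, gps_city=None):
    """Merge slot entities from the sentence, user defaults and GPS positioning."""
    slots = dict(user_defaults)             # values preset by the user
    if gps_city is not None:
        slots.setdefault("city", gps_city)  # positioning fills the city only if unset
    slots.update(extracted)                 # entities in the sentence take priority
    return slots


# "query the weather in Beijing tomorrow": both slots come from the sentence.
from_sentence = fill_slots({"time": "tomorrow", "city": "Beijing"}, {}, gps_city="Shenzhen")
# A sentence that names no city falls back to the positioned city.
from_gps = fill_slots({"time": "tomorrow"}, {}, gps_city="Shenzhen")
print(from_sentence, from_gps)
```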
  • the human-computer interaction server 200 obtains the service result A for the service request A based on the intention A and the entity of the slot A; the service request A includes the indication information of the intention A and the entity of the slot A.
  • after obtaining the specific intent and the slot information associated with the intent, the human-computer interaction server 200 sends a service request to the third-party application server that has a mapping relationship with the intent.
  • the mapping relationship between the intention and the third-party application server may be established before the human-computer interaction server 200 receives the user sentence A.
  • the mapping relationship between the intent and the third-party application server may also be established when the human-computer interaction server creates the skill.
  • the dialogue intent “Check the weather” corresponds to the ink weather server.
  • the dialogue intent "book a ticket" corresponds to the Qunar Travel server.
  • the service request A can be a weather query request or a ticket booking request, which is not limited here.
  • the intention acquired by the human-computer interaction server 200 is the dialogue intention "check weather”.
  • the slots corresponding to the "Check Weather” dialogue intention are the time slot and the city slot.
  • the human-computer interaction server 200 obtains the entity “Tomorrow” corresponding to the time slot and the entity "Beijing" corresponding to the city slot. Then, the human-computer interaction server 200 sends a weather query request to the ink weather server.
  • the weather query request includes the query time "tomorrow” and the query city "Beijing”.
  • the service result obtained by the human-computer interaction server 200 may be the weather forecast for Beijing tomorrow.
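  • the mapping between an intention and a third-party application server, and the way the slot entities become request parameters, can be sketched as follows. The registry `INTENT_TO_SERVER`, the server labels, and both function names are hypothetical; the application leaves the form of the mapping and of the request open.

```python
# Hypothetical sketch: registry layout, server labels and function names are assumptions.

INTENT_TO_SERVER = {}


def register_intent(intent, server):
    """Record the mapping, for example when the skill is created."""
    INTENT_TO_SERVER[intent] = server


def build_service_request(intent, slots):
    """Pick the mapped server and attach the slot entities as request parameters."""
    server = INTENT_TO_SERVER[intent]  # raises KeyError for an unmapped intention
    return {"server": server, "intent": intent, "params": dict(slots)}


register_intent("check the weather", "ink weather server")
register_intent("book a ticket", "Qunar Travel server")

request_a = build_service_request(
    "check the weather", {"time": "tomorrow", "city": "Beijing"}
)
print(request_a)
```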
  • the third-party application 2 server 302 obtains the service result A according to the received service request A, and feeds back the service result A to the human-computer interaction server 200.
  • the third-party application 2 server 302 receives the service request A (such as the weather query request) sent by the human-computer interaction server.
  • the third-party application 2 server 302 obtains the service result A (such as the weather query result of Beijing tomorrow) according to the service request A and the parameters carried in the service request A (such as "Tomorrow, Beijing"). Then the third-party application 2 server 302 returns the service result A to the human-computer interaction server 200.
  • the human-computer interaction server receives the service result A, and sends the service result A to the electronic device 100.
  • the service result A sent by the human-computer interaction server 200 may be in the form of a web page.
  • the service result A may also be in the form of parameters, and the electronic device 100 generates a corresponding web page. There is no limitation here.
  • the electronic device 100 receives the service result A, and outputs the service result A.
  • the electronic device 100 may display the service result A (for example, the weather forecast for tomorrow in Beijing) in the form of a web page on the screen for the user to view.
  • the electronic device 100 may also voice broadcast the service result A to the user.
  • the format in which the electronic device 100 outputs the service result A is not limited here.
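  • the two delivery forms mentioned above (a ready-made web page, or parameters from which the electronic device 100 generates the page) can be sketched on the device side as follows. The result layout with a `kind` field and the HTML list template are assumptions for illustration only.

```python
# Hypothetical sketch: the result layout and the HTML template are assumptions.
from html import escape


def render_result(result):
    """Return the page to display, generating it locally when only parameters arrive."""
    if result["kind"] == "page":
        return result["body"]  # the server already produced the page
    rows = "".join(
        f"<li>{escape(key)}: {escape(str(value))}</li>"
        for key, value in result["params"].items()
    )
    return f"<ul>{rows}</ul>"


page = render_result(
    {"kind": "params", "params": {"city": "Beijing", "time": "tomorrow"}}
)
print(page)
```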
  • the electronic device 100 collects the user sentence B, processes it through the voice recognition module, and sends it to the human-computer interaction server 200.
  • the user sentence B collected by the electronic device 100 may be the user sentence 1502 "book a ticket to go there tomorrow".
  • the user sentence B may be in the form of audio or text.
  • the electronic device 100 may send the user sentence B to the human-computer interaction server 200 through the communication connection 101 shown in FIG. 2.
  • S109 The human-computer interaction server 200 receives the user sentence B.
  • for the process by which the human-computer interaction server 200 receives the user sentence B, refer to the process of receiving the user sentence A in step 102, which will not be repeated here.
  • the human-computer interaction server 200 extracts the entity of the slot B from the user sentence B; the slot B is a slot configured for the intention B; the intention B is determined according to the user sentence B, and is an intention configured for the skill B.
  • the human-computer interaction server 200 recognizes that the skill corresponding to the user sentence B is the air ticket booking skill, and can also recognize that the intent corresponding to the user sentence B is the dialogue intention "book air ticket".
  • the human-computer interaction server 200 can also determine from this user sentence B the entity corresponding to the slot (such as the time slot, the departure slot, and the destination slot) associated with the dialogue intention "booking a ticket”. Specifically, the entity corresponding to the time slot is "tomorrow", the entity corresponding to the departure slot is the current location of the user, and the entity corresponding to the destination slot is the pronoun "there".
  • how the human-computer interaction server 200 performs skill classification, intention recognition, and slot extraction has been described in step 103, and will not be repeated here.
  • the human-computer interaction server 200 modifies the entity of slot B to the entity of slot A; slot A is configured as an associated slot of slot B, and skill A is configured as an associated skill of skill B.
  • the human-computer interaction server 200 cannot determine the specific meaning of the pronoun "there" on its own. However, because the human-computer interaction server 200 has configured the slot A and the slot B with a shared entity, it replaces the entity in the slot B with the entity in the slot A.
  • the human-computer interaction server 200 configures the slot A and the slot B with a shared entity. For example, the "city slot" in the "weather check" skill and the "destination slot" in the "Qunar Travel" skill are configured with a shared entity.
  • when the entity of the "city slot" is a pronoun, the human-computer interaction server 200 replaces the entity of the "city slot" with the entity of the "destination slot".
  • the process of configuring a shared entity in the human-computer interaction server 200 is shown in Figs. 8-10, and will not be repeated here.
  • the human-computer interaction server 200 replaces the entity "there" in the city slot of the table 1205 with the entity "Beijing" corresponding to the destination slot in the table 1201. Then, the entity corresponding to the city slot in the table 1205 is "Beijing".
  • in this case, the first slot is the city slot in the table 1205, and the pronoun "there" means "Beijing".
  • the human-computer interaction server 200 replaces the entity "that day" in the time slot of the table 1405 with the entity "tomorrow" corresponding to the time slot in the table 1401. Then, the entity corresponding to the time slot in the table 1405 is "tomorrow".
  • in this case, the first slot is the time slot in the table 1405, and the pronoun "that day" means "tomorrow".
  • the human-computer interaction server 200 obtains the service result B for the service request B from the third-party application server based on the intention B and the entity of the slot B; the service request B includes the indication information of the intention B and the entity of the slot B.
  • after the human-computer interaction server 200 obtains the specific intent and the slot information corresponding to the intent, it sends the service request B (such as "book a flight from Shenzhen to Beijing tomorrow") to the third-party application server (such as the Qunar Travel server) that has a mapping relationship with the intention B (such as the dialogue intention "book a ticket").
  • the intention acquired by the human-computer interaction server 200 is the dialogue intention of "booking a ticket”.
  • the slots corresponding to the dialogue intention "booking a ticket" are the time slot, the departure slot, and the destination slot.
  • the human-computer interaction server 200 obtains the entity "tomorrow" corresponding to the time slot, the entity "Shenzhen" corresponding to the departure slot, and the entity "Beijing" corresponding to the destination slot. Then, the human-computer interaction server 200 sends the service request B (such as a ticket booking request) to the Qunar Travel server.
  • the air ticket booking request includes the instruction information of the dialogue intention "book air ticket”, the time "tomorrow”, the departure place "Shenzhen” and the destination "Beijing”.
  • the service result B obtained by the human-computer interaction server 200 may be the flight information from Shenzhen to Beijing tomorrow.
  • the instruction information of the dialogue intention "book a ticket” may be the name of the dialogue intention “book a ticket”, or the ID of the dialogue intention “book a ticket”, and so on.
  • the indication information of the dialogue intention "book a ticket” may be used to indicate the intention.
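  • a service request that carries the indication information of the intention (a name or an ID) together with the slot entities could be serialized as in the sketch below. The class `ServiceRequest`, its field names, and the choice of JSON are assumptions; the application states that the specific form of the request is not limited.

```python
# Hypothetical sketch: class name, field names and the JSON encoding are assumptions.
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class ServiceRequest:
    intent_name: Optional[str] = None  # indication by name ...
    intent_id: Optional[str] = None    # ... or by ID; at least one must be set
    slots: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.intent_name is None and self.intent_id is None:
            raise ValueError("the request must indicate an intention")

    def to_json(self):
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


request_b = ServiceRequest(
    intent_name="book a ticket",
    slots={"time": "tomorrow", "departure": "Shenzhen", "destination": "Beijing"},
)
payload = request_b.to_json()
print(payload)
```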
  • the third-party application 1 server 301 obtains the service result B according to the received service request B, and feeds back the service result B to the human-computer interaction server 200.
  • the third-party application 1 server 301 receives the service request B (such as a ticket booking request) sent by the human-computer interaction server 200. Then, the third-party application 1 server 301 obtains the service result B (such as a flight from Shenzhen to Beijing tomorrow) according to the service request B and the parameters carried in the service request B (such as "tomorrow, Shenzhen, Beijing"). After that, the third-party application 1 server 301 sends the service result B to the human-computer interaction server 200. Specifically, the third-party application 1 server 301 may send the service result B to the human-computer interaction server 200 through the communication connection 102 as shown in FIG. 2.
  • the human-computer interaction server 200 receives the service result B, and sends the service result B to the electronic device 100.
  • the service result B sent by the human-computer interaction server 200 may be in the form of a web page.
  • the service result B may also be in the form of parameters, and the corresponding webpage is generated by the electronic device 100. There is no limitation here.
  • the electronic device 100 receives the service result B, and outputs the service result B.
  • the electronic device 100 may display the service result B in the form of a web page on the screen for the user to view.
  • the electronic device 100 may also voice broadcast the service result B to the user. There is no limitation here.
  • the semantic analysis method provided by the embodiment of the present application may collect the first user sentence through the electronic device 100 and send the first user sentence to the human-computer interaction server 200.
  • the human-computer interaction server 200 receives the first user sentence collected by the electronic device 100; the human-computer interaction server 200 extracts the entity of the first slot from the first user sentence; the first slot is a slot configured for the first intention;
  • the first intention is an intention configured for the first skill, and the first skill is configured with one or more intentions; the first intention and the first skill are determined by the human-computer interaction server 200 according to the first user sentence, and match the service requirement indicated by the first user sentence; if the entity of the first slot is a pronoun, the human-computer interaction server 200 modifies the entity of the first slot to the entity of the second slot; the second slot is configured as an associated slot of the first slot, and the entity of the second slot is extracted by the human-computer interaction server 200 from the second user sentence; the second user sentence is collected by the electronic device 100 before the first user sentence.
  • the semantic analysis method provided in this application further includes the steps of creating skills, building groups between skills, and configuring skill sharing as shown in FIG. 16. These steps are as follows:
  • the human-computer interaction server 200 creates a skill corresponding to a third-party application, and the created skill A is configured with intention A, and the intention A is configured with slot A; the created skill B is configured with intention B, and the intention B is configured with slot B.
  • the human-computer interaction server 200 can create skill A (such as the "weather check" skill) based on the skill (such as the ink weather query skill) provided by the third-party application server 301 (such as the ink weather server); the intention A (such as the dialogue intention "check weather") is configured in skill A, and slot A (such as the "city slot") is configured in intention A.
  • similarly, the human-computer interaction server 200 can create skill B (such as the "booking air ticket" skill); the intention B (such as the dialogue intention "book a ticket") is configured in skill B, and slot B (such as the "destination slot") is configured in intention B.
  • the third-party application can be the Ink Weather application, Taobao application, JD application, etc. There is no restriction here. Regarding how to create a skill, please refer to the above description of the skill creation process shown in FIGS. 4A to 4D, which will not be repeated here.
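  • the nesting described in this step (a skill is configured with intentions, and each intention is configured with slots) can be modeled with a small data structure. The classes `Skill`, `Intent` and `Slot` and the helper `create_skill` are hypothetical names introduced for this sketch; the actual creation flow is the one shown in FIGS. 4A to 4D.

```python
# Hypothetical sketch: class, field and function names are assumptions.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Slot:
    name: str


@dataclass
class Intent:
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)


@dataclass
class Skill:
    name: str
    intents: Dict[str, Intent] = field(default_factory=dict)


def create_skill(skill_name, intent_name, slot_names):
    """Create a skill with one intention and the slots configured for it."""
    intent = Intent(intent_name, {n: Slot(n) for n in slot_names})
    return Skill(skill_name, {intent_name: intent})


skill_a = create_skill("weather check", "check weather", ["time", "city"])
skill_b = create_skill(
    "booking air ticket", "book a ticket", ["time", "departure", "destination"]
)
print(sorted(skill_b.intents["book a ticket"].slots))
```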
  • the human-computer interaction server 200 configures skill A as an associated skill of skill B.
  • the human-computer interaction server 200 receives the request A sent by the third-party application server 301 (such as the Ink Weather server) that provides skill A (such as the "Ink Weather" skill).
  • Request A is used to configure skill A (such as the "Ink Weather" skill) as an associated skill of skill B (such as the "where to travel" skill).
  • Request A contains instruction information of skill A and instruction information of skill B.
  • the instruction information of skill A may be the name of skill A, or the ID of skill A and other information that can indicate skill A.
  • the instruction information of skill B can be the name of skill B, or the ID of skill B and other information that can indicate skill B.
  • the human-computer interaction server 200 sends the request A and the instruction information of the skill A and the instruction information of the skill B to the third-party application server 302 (such as the travel server) that provides the skill B (such as the "where to travel” skill).
  • the third-party application server 302 receives the request A, and returns a response A to the request A (such as "agree").
  • the human-computer interaction server 200 configures skill A as the associated skill of skill B after receiving the response (such as “agree”). Then, the human-computer interaction server 200 saves the association relationship between skill A and skill B.
  • for the process by which the human-computer interaction server 200 configures skill A (such as the "Ink Weather" skill) as the associated skill of skill B (such as the "where to travel" skill), refer to the Ink Weather skill configuration shown in FIGS. 7A-7B.
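The request and response exchange above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `SkillAssociationStore`, the method `handle_request`, and the callable standing in for the third-party application server 302 are all hypothetical names, and the association is assumed to be stored per skill B.

```python
from typing import Callable, Dict, Set


class SkillAssociationStore:
    """Hypothetical stand-in for the association records kept by server 200."""

    def __init__(self) -> None:
        # Maps skill B to the set of skills configured as its associated skills.
        self.associated: Dict[str, Set[str]] = {}

    def handle_request(
        self,
        skill_a: str,
        skill_b: str,
        ask_provider_of_b: Callable[[str, str], str],
    ) -> bool:
        # Forward request A, with both skill indications, to the provider of skill B.
        response = ask_provider_of_b(skill_a, skill_b)
        if response != "agree":
            return False
        # Save the association only after an "agree" response.
        self.associated.setdefault(skill_b, set()).add(skill_a)
        return True


store = SkillAssociationStore()
accepted = store.handle_request("ink weather", "where to travel", lambda a, b: "agree")
rejected = store.handle_request("ink weather", "taxi service", lambda a, b: "refuse")
```

Here only the first request leaves a saved association, because the second provider does not agree.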
  • the human-computer interaction server 200 receives the request B sent by the third-party application server 302, and the request B is used to request the human-computer interaction server 200 to configure the slot A as an associated slot of the slot B.
  • Request B contains the indication information of slot A and the indication information of slot B.
  • the human-computer interaction server 200 configures slot A (such as a city slot) as an associated slot of slot B (such as a destination slot) according to the indication information of slot A and the indication information of slot B; that is, shared-entity configuration is performed for slot A and slot B.
  • the indication information of the slot A can be the slot name of the slot A or the ID of the slot A and other information.
  • the indication information of the slot B can be the slot name of the slot B or the ID of the slot B and other information.
  • the configuration process of the shared entity is shown in FIGS. 8-10, and will not be repeated here.
  • if the entity configured for slot A (such as a city slot) comes from the system thesaurus, the human-computer interaction server 200 associates the slot name of slot B (such as the destination slot) with the slot name of slot A; all slots whose configured entities come from the system thesaurus have the same entity set, and the entity configured for slot B also comes from the system thesaurus. If the entity configured for slot A comes from the first custom dictionary, the human-computer interaction server 200 associates the slot name of slot B with the slot name of slot A, and also associates the first custom dictionary with the second custom dictionary; the first custom dictionary is the entity set configured for slot A, the second custom dictionary is the entity set configured for slot B, and the entity set configured for slot A is different from the entity set configured for slot B.
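The two association cases (system thesaurus versus custom dictionaries) might be modelled as below. This is a sketch under stated assumptions: `SlotConfig`, `SharedEntityConfig`, and the example slot and dictionary names "hotel" and "lodging" are hypothetical, and a mix of one system-sourced and one custom-sourced slot is rejected because the text does not describe that case.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

SYSTEM_THESAURUS = "system"


@dataclass(frozen=True)
class SlotConfig:
    name: str
    source: str  # SYSTEM_THESAURUS, or the name of a custom dictionary


@dataclass
class SharedEntityConfig:
    # slot B name -> names of its associated slots
    slot_links: Dict[str, Set[str]] = field(default_factory=dict)
    # custom dictionary of slot B -> custom dictionaries linked to it
    dictionary_links: Dict[str, Set[str]] = field(default_factory=dict)

    def associate(self, slot_a: SlotConfig, slot_b: SlotConfig) -> None:
        a_system = slot_a.source == SYSTEM_THESAURUS
        b_system = slot_b.source == SYSTEM_THESAURUS
        if a_system != b_system:
            # Assumption: mixed sources are outside what the text describes.
            raise ValueError("mixed thesaurus sources are not covered")
        # In both described cases the slot names are associated.
        self.slot_links.setdefault(slot_b.name, set()).add(slot_a.name)
        if not a_system:
            # Custom dictionaries hold different entity sets, so link them too.
            self.dictionary_links.setdefault(slot_b.source, set()).add(slot_a.source)


config = SharedEntityConfig()
config.associate(SlotConfig("city", SYSTEM_THESAURUS),
                 SlotConfig("destination", SYSTEM_THESAURUS))
config.associate(SlotConfig("hotel", "hotel_dict"),
                 SlotConfig("lodging", "lodging_dict"))
```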
  • the human-computer interaction server is configured through skill creation, grouping between skills, and skill-sharing configuration.
  • the human-computer interaction server can replace the pronoun by acquiring the entity associated with the slot.
  • the human-computer interaction server can know the meaning of the pronoun.
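Pronoun replacement through an associated slot can be illustrated with a short sketch. The pronoun list, the function name `resolve_slot`, and the dictionary shapes are assumptions made for illustration; the text only states that the entity of the associated slot, extracted from an earlier user sentence, replaces the pronoun.

```python
from typing import Dict, List

# Illustrative pronoun list; a real system would use its own detection.
PRONOUNS = {"there", "here", "it"}


def resolve_slot(
    slot_name: str,
    entity: str,
    associated_slots: Dict[str, List[str]],
    previous_entities: Dict[str, str],
) -> str:
    """Return the entity for slot_name, replacing a pronoun with the
    entity of an associated slot taken from an earlier user sentence."""
    if entity.lower() not in PRONOUNS:
        return entity
    for other_slot in associated_slots.get(slot_name, []):
        if other_slot in previous_entities:
            return previous_entities[other_slot]
    # No associated slot has an entity, so the pronoun stays unresolved.
    return entity


associated = {"destination slot": ["city slot"]}
earlier = {"city slot": "Beijing", "time slot": "tomorrow"}
resolved = resolve_slot("destination slot", "there", associated, earlier)
```

With the city slot configured as an associated slot of the destination slot, "there" is replaced by "Beijing".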
  • this application also provides another semantic analysis method.
  • in this method, the human-computer interaction server 200 has not extracted an entity corresponding to the second slot from the second user sentence, and no shared entity is configured for the second slot in the human-computer interaction server 200 either.
  • This method can use the scoring and ranking model to find candidate entities to fill the second slot.
  • FIG. 17 shows the overall flow of another human-machine dialogue method provided by this application. The details are as follows:
  • the human-computer interaction server 200 receives the user sentence A collected by the electronic device 100.
  • step S102 the reception of the user sentence A collected by the electronic device 100 by the human-computer interaction server 200 has been described, and will not be repeated here.
  • the collection of the user sentence A by the electronic device 100 has been described in step S101, and will not be repeated here.
  • the human-computer interaction server 200 extracts the entity of slot A from the user sentence A, where slot A is a slot configured for intention A, intention A is determined according to the user sentence A, and intention A is an intention configured for skill A.
  • Step S302 can refer to step S103, which will not be repeated here.
  • S303-S308 Use the scoring ranking model to find a candidate entity to replace the entity in the second slot.
  • the human-computer interaction server 200 extracts all the entities corresponding to the slots of the user sentence B, and the user sentence B is received by the human-computer interaction server 200 before the user sentence A.
  • the human-computer interaction server 200 extracts the slot in the user sentence B stored in the dialog management module and the entity corresponding to the slot. For example, suppose that the user sentence B saved by the human-computer interaction server 200 is "what is the weather in Beijing tomorrow", and its slots are time slots and city slots. The corresponding entity of the time slot is "Tomorrow”, and the corresponding entity of the city slot is "Beijing”. The human-computer interaction server 200 extracts both the entity “tomorrow” in the time slot and the entity "Beijing" in the city slot.
  • the user sentence B may be a sentence in a single-round dialogue between the user and the human-computer interaction server 200, or may be multiple sentences in a multi-round dialogue between the user and the human-computer interaction server 200, which is not limited in the embodiment of the present application.
  • the human-computer interaction server 200 finds K candidate entities with the same entity information type as the slot A.
  • the human-computer interaction server 200 filters the saved slots and corresponding entity information according to the information of slot A. For example, if the entity corresponding to slot A is a location, then each selected candidate entity is also an entity representing a location. In this way, K candidate entities are obtained, and K is a natural number greater than 1. For example, if slot A is a city slot, the corresponding entity should be of the location type. Suppose the slots and corresponding entities extracted by the human-computer interaction server 200 from the user sentence B are "time slot, tomorrow", "time slot, today", "departure slot, Beijing", "destination slot, Shanghai", and "city slot, Shenzhen". Then the human-computer interaction server 200 will select "Beijing", "Shanghai", and "Shenzhen" as candidate entities.
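The type-based filtering in this step can be sketched as follows. The slot-to-type table `SLOT_TYPES` and the function name `find_candidates` are assumptions for illustration; the slot and entity pairs are the ones listed in the example above.

```python
from typing import Dict, List, Tuple

# Assumed mapping from slot name to the entity type that slot expects.
SLOT_TYPES: Dict[str, str] = {
    "time slot": "time",
    "departure slot": "location",
    "destination slot": "location",
    "city slot": "location",
}


def find_candidates(history: List[Tuple[str, str]], target_slot: str) -> List[str]:
    """Keep the saved entities whose slot has the same entity type as target_slot."""
    wanted_type = SLOT_TYPES[target_slot]
    candidates: List[str] = []
    for slot, entity in history:
        if SLOT_TYPES.get(slot) == wanted_type and entity not in candidates:
            candidates.append(entity)
    return candidates


saved = [
    ("time slot", "tomorrow"),
    ("time slot", "today"),
    ("departure slot", "Beijing"),
    ("destination slot", "Shanghai"),
    ("city slot", "Shenzhen"),
]
candidates = find_candidates(saved, "city slot")
# candidates == ["Beijing", "Shanghai", "Shenzhen"]
```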
  • S305 The human-computer interaction server 200 replaces the entity in slot A with each of the K candidate entities to obtain K candidate sentences.
  • the human-computer interaction server 200 fills the K candidate entities into the slot A in the user sentence A respectively, and can obtain K candidate sentences. For example, suppose user sentence A is "book a ticket to go there tomorrow". The intention of user sentence A is to book tickets. The slots under the booking intention are time slot, departure slot, and destination slot. The corresponding entity of the time slot in the user sentence A is "tomorrow", the entity of the departure slot is not reflected, but it can default to the GPS location city (such as Shenzhen), and the destination slot only has the pronoun "there”. Therefore, the human-computer interaction server 200 needs to find the entity corresponding to the destination slot. Assume that the candidate entities found by the human-computer interaction server 200 in step S303 are "Beijing" and "Shanghai”. Then the candidate sentences are "book a ticket to Beijing tomorrow" and "book a ticket to Shanghai tomorrow".
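Generating the candidate sentences can be sketched with plain string substitution, which here stands in for real slot filling. The function name `build_candidate_sentences` is hypothetical, and the replaced span "go there" is chosen only so that the English output matches the candidate sentences in the example.

```python
from typing import List


def build_candidate_sentences(sentence: str, pronoun_span: str,
                              entities: List[str]) -> List[str]:
    """Fill each candidate entity into the position of the pronoun."""
    if pronoun_span not in sentence:
        raise ValueError("pronoun span not found in the sentence")
    # Replace only the first occurrence, which is the slot being filled.
    return [sentence.replace(pronoun_span, entity, 1) for entity in entities]


sentences = build_candidate_sentences(
    "book a ticket to go there tomorrow", "go there", ["Beijing", "Shanghai"]
)
# sentences == ["book a ticket to Beijing tomorrow",
#               "book a ticket to Shanghai tomorrow"]
```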
  • the human-computer interaction server 200 uses the natural language understanding model to identify K candidate sentences, and outputs the semantics and corresponding confidence of the K candidate sentences.
  • candidate sentence 1 is "book a ticket to Beijing tomorrow.”
  • candidate sentence 2 is "book a ticket to Shanghai tomorrow”.
  • the human-computer interaction server 200 will use the natural language understanding model to output the semantics and confidence of the candidate sentence 1 and candidate sentence 2.
  • candidate sentence 1 has a confidence of 0.9
  • candidate sentence 2 has a confidence of 0.9.
  • the human-computer interaction server 200 sorts the candidate sentence 1 and the candidate sentence 2 through the scoring and ranking model.
  • the scoring and ranking model can be a model constructed by a neural network, or a model constructed by a sorting algorithm such as bubble sort or selection sort.
  • the training data of the scoring and ranking model can come from online questionnaires. A questionnaire gives a dialogue scenario, for example: the user first says "book a ticket from Shenzhen to Beijing", and then says "how is the weather there". Respondents are then asked to rate whether "there" refers to "Shenzhen" or "Beijing". The respondents' scores are counted, and the result with the highest score is selected as the output of the scoring and ranking model.
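Turning questionnaire answers into a training label could look like the sketch below. The vote counts are invented purely for illustration, and simple majority counting is an assumption; the text only says that scores are counted and the high-scoring result is selected.

```python
from collections import Counter
from typing import List, Tuple


def label_from_votes(votes: List[str]) -> Tuple[str, float]:
    """Return the most chosen entity and the share of respondents who chose it."""
    if not votes:
        raise ValueError("no votes collected")
    counts = Counter(votes)
    winner, count = counts.most_common(1)[0]
    return winner, count / len(votes)


# Hypothetical answers for: "book a ticket from Shenzhen to Beijing",
# followed by "how is the weather there".
votes = ["Beijing"] * 8 + ["Shenzhen"] * 2
label, share = label_from_votes(votes)
```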
  • S308 The human-computer interaction server 200 replaces the entity in slot A with the candidate entity in the candidate sentence with the highest score.
  • candidate sentence 1 mentioned in S150 is scored 90 points, and candidate sentence 2 is scored 95 points. Then, the human-computer interaction server 200 will select "Shanghai" to fill slot A.
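Selecting the entity from the highest-scoring candidate sentence can be sketched as below. The tuple layout and the function name `rank_candidates` are assumptions; the scores 90 and 95 are the ones given in the example, and the scoring model itself is not modelled.

```python
from typing import List, Tuple

# (candidate entity, candidate sentence, score from the scoring and ranking model)
Candidate = Tuple[str, str, float]


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidate sentences from highest to lowest score."""
    return sorted(candidates, key=lambda c: c[2], reverse=True)


scored = [
    ("Beijing", "book a ticket to Beijing tomorrow", 90),
    ("Shanghai", "book a ticket to Shanghai tomorrow", 95),
]
ranked = rank_candidates(scored)
best_entity = ranked[0][0]  # entity used to fill slot A
```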
  • the human-computer interaction server 200 obtains the service result A for the service request A based on the intention A and the entity of the slot A; the service request A includes the indication information of the intention A and the entity of the slot A.
  • Step S309 can refer to step S104, which will not be repeated here.
  • FIG. 18 shows an exemplary application of the method.
  • the electronic device 100 obtains the user sentence 1803 "take a taxi there now".
  • the user sentence 1803 currently acquired by the electronic device 100 is "take a taxi there now".
  • the electronic device 100 has also provided a human-computer interaction service for the user before.
  • the electronic device 100 as shown in FIG. 18 has received a user sentence 1801 before receiving the user sentence 1803, and provided an execution result 1802 based on the user sentence 1801.
  • the human-computer interaction server 200 receives the user sentence 1803 sent by the electronic device 100, analyzes skills and intentions and extracts slots through the semantic understanding module.
  • after receiving the user sentence 1803, the human-computer interaction server 200 analyzes the text through the semantic understanding module. The analysis shows that the skill corresponding to the user sentence 1803 is "taxi service", the intention is "taxi", and the slots are "time" and "taxi destination". However, the entity in the slot "taxi destination" is the pronoun "there". The human-computer interaction server 200 therefore needs to query whether the "taxi service" skill has a shared skill; if it does, a shared entity of the location category can be extracted through the shared skill to replace the entity "there" in the slot "taxi destination".
  • S403 The human-computer interaction server 200 does not find the shared skills, and calls the dialog management module to query the slot and entity information in the user sentence 1801.
  • the human-computer interaction server 200 first needs to query whether there is a shared skill, and when there is no shared skill, it retrieves the historical dialogue rounds from the dialogue management module.
  • the historical dialogue round given in this embodiment is user sentence 1801, "Use Gaode to check the road conditions from Huawei to KFC".
  • the entity corresponding to slot 1701 "Departure Place” is "Huawei”
  • the entity corresponding to slot “Destination” is "KFC”.
  • Both "Huawei" and "KFC” are entities of the location category, which are the same as the entity type of "Taxi Destination".
  • the human-computer interaction server 200 calls the dialogue management module to replace the entity "there" of the slot "taxi destination" in the user sentence 1803 with the entities from the user sentence 1801 to obtain candidate sentences.
  • the human-computer interaction server 200 calls the dialogue management module to fill the slot "taxi destination" in the user sentence 1803 with the entities "Huawei" and "KFC" respectively, obtaining candidate sentence 1 "Take a taxi to Huawei" and candidate sentence 2 "Take a taxi to KFC".
  • the human-computer interaction server 200 performs semantic recognition on candidate sentences through the semantic understanding module.
  • the human-computer interaction server 200 obtains the semantic recognition results and confidences of candidate sentence 1 and candidate sentence 2 through the semantic understanding module 303. With "Huawei" or "KFC" substituted for the entity of the slot "taxi destination", each candidate sentence has a confidence of 0.9.
  • the human-computer interaction server 200 may preset a confidence threshold, and filter out candidate sentences whose confidence is lower than the preset confidence threshold.
  • the preset confidence threshold in the embodiment of the present application is 0.8, and the confidences of candidate sentence 1 and candidate sentence 2 are both higher than the preset confidence threshold.
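The threshold filtering can be sketched as below, using the 0.8 threshold and the 0.9 confidences from this example. The function name `filter_by_confidence` and the pair layout are assumptions, and keeping candidates exactly at the threshold is a choice the text does not specify.

```python
from typing import List, Tuple


def filter_by_confidence(
    candidates: List[Tuple[str, float]], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Drop candidate sentences whose confidence is below the threshold."""
    return [(sentence, conf) for sentence, conf in candidates if conf >= threshold]


recognized = [("Take a taxi to Huawei", 0.9), ("Take a taxi to KFC", 0.9)]
kept = filter_by_confidence(recognized)
# Both sentences pass, so both go on to the scoring and ranking model.
```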
  • the human-computer interaction server 200 uses the scoring and ranking model in the dialogue management module to score and rank the candidate sentences whose confidence is higher than the confidence threshold, and selects the entity in the candidate sentence with the highest score to replace the entity "there" in the slot "taxi destination".
  • the human-computer interaction server 200 takes the candidate sentence 1 and the candidate sentence 2 as the input of the scoring and ranking model, and obtains the scoring and ranking result.
  • candidate sentence 2 "Take a taxi to KFC" ranks first with a score of 98
  • candidate sentence 1 "Take a taxi to Huawei" ranks second with a score of 95. Therefore, the first-ranked candidate is selected.
  • the highest-scoring "KFC" serves as the entity of the slot "taxi destination", and the corresponding taxi-hailing service is executed.
  • the human-computer interaction server 200 generates natural language feedback for the user through the natural language generation module from the result of executing the taxi service.
  • the human-computer interaction server 200 sends the taxi-hailing intention and slot information to the server corresponding to the taxi-hailing skill, and obtains the taxi-hailing result returned by the taxi-hailing skill server.
  • the natural language generation module in the human-computer interaction server 200 generates the natural language of the taxi-hailing result and sends it to the electronic device 100.
  • the electronic device 100 displays the taxi service result to the user.
  • the electronic device 100 displays a taxi service page or voice broadcasts the taxi result, which is not limited here.
  • the human-computer interaction server finds the entity to replace the pronoun in the user sentence through the scoring and ranking model. Therefore, the human-computer interaction server can know the meaning of the pronouns in the user sentence without asking the user, which can improve the user experience.
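The overall decision flow of the FIG. 18 example (use a shared entity if one exists, otherwise fall back to the dialogue history, the confidence threshold, and ranking) can be summarised in one sketch. The function `resolve_pronoun` and its parameters are hypothetical, and the natural language understanding model and the scoring and ranking model are replaced by stub callables that return the values quoted in the example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def resolve_pronoun(
    slot: str,
    shared_entity: Optional[str],
    history: List[Tuple[str, str]],
    slot_types: Dict[str, str],
    confidence_of: Callable[[str], float],
    score_of: Callable[[str], float],
    threshold: float = 0.8,
) -> Optional[str]:
    # 1. Prefer an entity obtained through a shared skill, if there is one.
    if shared_entity is not None:
        return shared_entity
    # 2. Otherwise collect same-type entities from earlier user sentences.
    wanted = slot_types.get(slot)
    candidates = [e for s, e in history if slot_types.get(s) == wanted]
    # 3. Keep candidates whose sentence confidence reaches the threshold.
    confident = [e for e in candidates if confidence_of(e) >= threshold]
    if not confident:
        return None
    # 4. Pick the candidate with the highest ranking score.
    return max(confident, key=score_of)


types = {"departure": "location", "destination": "location",
         "taxi destination": "location"}
history_1801 = [("departure", "Huawei"), ("destination", "KFC")]
scores = {"Huawei": 95, "KFC": 98}
chosen = resolve_pronoun("taxi destination", None, history_1801, types,
                         lambda e: 0.9, lambda e: scores[e])
```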
  • the above-mentioned terminal and the like include hardware structures and/or software modules corresponding to each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the embodiments of the present invention.
  • the embodiment of the present application may divide the above-mentioned terminal and the like into functional modules according to the above method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiment of the present invention is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • this embodiment of the application discloses a schematic diagram of the hardware structure of a server 200.
  • the server 200 includes at least one processor 201, at least one memory 202, and at least one communication interface 203.
  • the server 200 may also include an output device and an input device, which are not shown in the figure.
  • the processor 201, the memory 202, and the communication interface 203 are connected by a bus.
  • the processor 201 can be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), or one or more integrated circuits used to control the execution of the program of this application.
  • the processor 201 may also include multiple CPUs, and the processor 201 may be a single-CPU (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, or processing cores for processing data (for example, computer program instructions).
  • the memory 202 can be a read-only memory (Read-Only Memory, ROM) or another type of static storage device that can store static information and instructions, a random access memory (Random Access Memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 202 may exist independently and is connected to the processor 201 through a bus.
  • the memory 202 may also be integrated with the processor 201.
  • the memory 202 is used to store application program codes for executing the solutions of the present application, and the processor 201 controls the execution.
  • the processor 201 is configured to execute the computer program code stored in the memory 202, so as to implement the human-computer interaction method described in the embodiment of the present application.
  • the communication interface 203 can be used to communicate with other devices or communication networks, such as Ethernet, wireless local area networks (WLAN), and so on.
  • the output device communicates with the processor and can display information in a variety of ways.
  • the output device may be a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display device, a cathode ray tube (Cathode Ray Tube, CRT) display device, or a projector (projector), etc.
  • the input device communicates with the processor and can receive user input in a variety of ways.
  • the input device may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include pressure sensor 180A, gyroscope sensor 180B, air pressure sensor 180C, magnetic sensor 180D, acceleration sensor 180E, distance sensor 180F, proximity light sensor 180G, fingerprint sensor 180H, temperature sensor 180J, touch sensor 180K, ambient light Sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
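The three PCM stages named above (sample, quantize, encode) can be illustrated with a small sketch. This is a generic textbook illustration, not a model of the PCM bus interface of the electronic device 100: the function name `pcm_encode`, the uniform quantizer, and the unsigned integer codes are all assumptions.

```python
from typing import Callable, List


def pcm_encode(signal: Callable[[float], float], sample_rate: int,
               duration: float, bits: int = 8) -> List[int]:
    """Sample an analog signal in [-1, 1], quantize it uniformly,
    and encode each sample as an unsigned integer code."""
    levels = 2 ** bits
    codes: List[int] = []
    for i in range(int(sample_rate * duration)):
        x = signal(i / sample_rate)                # sampling
        x = max(-1.0, min(1.0, x))                 # clip to the input range
        level = int((x + 1.0) / 2.0 * levels)      # uniform quantization
        codes.append(min(levels - 1, level))       # encode as 0 .. levels - 1
    return codes


silence = pcm_encode(lambda t: 0.0, sample_rate=8, duration=1.0)
# A constant zero input maps to the mid-scale code 128 for 8-bit samples.
```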
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves via the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (including but not limited to the speaker 170A and the receiver 170B), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include global positioning system (GPS), global navigation satellite system (GLONASS), Beidou navigation satellite system (BDS), quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can adopt liquid crystal display (LCD), organic light-emitting diode (OLED), active-matrix organic light-emitting diode (AMOLED), flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the electronic device 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats.
  • the electronic device 100 may include 1 or N cameras 193, and N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects the frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in a variety of encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
  • the NPU is a neural-network (NN) computing processor.
  • the NPU can realize applications such as intelligent cognition of the electronic device 100, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
  • the storage data area can store data (such as audio data and a phone book) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a “loudspeaker”, is used to convert audio electrical signals into sound signals.
  • through the speaker 170A of the electronic device 100, the user can listen to music or to a hands-free call.
  • the receiver 170B, also called an “earpiece”, is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or plays a voice message, the user can listen to the voice by bringing the receiver 170B close to the ear.
  • the microphone 170C, also called a “mic”, is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C, so that the sound signal is input to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch location but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and features such as automatic unlocking on opening can be set according to the detected state.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and used in applications such as horizontal and vertical screen switching, pedometers and so on.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, etc.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
  • the electronic device 100 may also boost the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also called a “touch panel”.
  • the touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a “touch screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can obtain the vibration signal of the bone mass that vibrates when a person speaks.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse the voice signal from the vibration signal of the voice-vibrated bone mass obtained by the bone conduction sensor 180M, and realize the voice function.
  • the application processor may parse heart rate information from the blood pressure pulse signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
  • the button 190 includes a power button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191.
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
  • the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 may also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the microphone 170C can collect user voice, and the processor 110 is used to process the user voice collected by the microphone 170C. Then, the mobile communication module 150 and the wireless communication module 160 may establish a communication connection with the human-computer interaction server 200, for example, the communication connection 101 shown in FIG. 2.
  • the display screen 194 can display the voice processing result fed back by the human-computer interaction server 200 to the user.
  • the speaker 170A and the receiver 170B can broadcast the voice processing result fed back by the human-computer interaction server 200 to the user.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • such a computer readable storage medium stores several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.

Abstract

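A semantic parsing method, including: a first server extracts an entity of a first slot from a first user sentence; if the entity of the first slot is a pronoun, the first server modifies the entity of the first slot to an entity of a second slot; the first server sends a first service request to a second server and obtains, from the second server, a first service result responding to the first service request, where the first service request includes indication information of a first intent and the entity of the first slot, the second server is an application server that provides a first skill, and the first service result is determined by the second server according to the first intent and the entity of the first slot. The first server returns the first service result to an electronic device, and the first service result is output by the electronic device. In this way, the first server can accurately understand the meaning of a pronoun in a user sentence without asking the user what the pronoun means, which improves the user experience.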

Description

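Semantic Parsing Method and Server
This application claims priority to Chinese Patent Application No. 201910370839.7, filed with the China National Intellectual Property Administration on April 30, 2019 and entitled "Semantic Parsing Method and Server", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular, to a semantic parsing method and a server.
Background
With the spread of speech technology and the popularity of voice interaction, voice assistants are becoming increasingly important in smart electronic devices such as mobile phones. A voice assistant can be broadly divided into two parts: speech technology and content services. Speech technology includes speech recognition, semantic understanding, and speech synthesis; most phone manufacturers rely on technology companies for these. Content services, such as encyclopedia search, weather query, and news browsing, are mostly provided by content service providers.
FIG. 1 shows an example dialogue between an existing voice assistant and a user. As shown in FIG. 1, when the user says "What will the weather be like in Beijing tomorrow?", the voice assistant gives tomorrow's weather in Beijing. However, when the user then says "Book a flight to Beijing for that day", the voice assistant cannot determine which date "that day" refers to. It has to ask the user, for example, "Which day would you like to book the ticket for?" This makes the interaction cumbersome and degrades the user experience.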
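Summary
Embodiments of this application provide a semantic parsing method and a server. During semantic parsing, the server can accurately understand the meaning of a pronoun in a user sentence without asking the user what the pronoun means, which improves the user experience.
The foregoing and other objectives are achieved by the features of the independent claims. Further implementations are set out in the dependent claims, the description, and the drawings.
According to a first aspect, a semantic parsing method is provided. The method may include: a first server extracts an entity of a first slot from a first user sentence, where the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intent; the first intent is an intent configured for a first skill, and the first skill is configured with one or more intents; the first intent and the first skill are determined by the first server according to the first user sentence and match the service requirement expressed by the first user sentence. If the entity of the first slot is a pronoun, the first server modifies the entity of the first slot to an entity of a second slot, where the second slot is configured as an associated slot of the first slot; the entity of the second slot is extracted by the first server from a second user sentence; the second user sentence is received by the first server before the first user sentence; the second slot is a slot configured for a second intent, and the second intent is configured as an associated intent of the first intent; the second intent is an intent configured for a second skill, and the second skill is configured as an associated skill of the first skill. The first server sends a first service request to a second server and obtains, from the second server, a first service result responding to the first service request, where the first service request includes at least indication information of the first intent and the entity of the first slot; the second server is an application server corresponding to the first skill; and the first service result is determined by the second server according to the indication information of the first intent and the entity of the first slot. In this way, the first server can accurately understand the meaning of a pronoun in a user sentence without having to confirm it with the user, which improves the user experience.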
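The pronoun replacement of the first aspect can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the pronoun set, the skill, intent, and slot identifiers, and the in-memory association table are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field

# Hypothetical pronoun list; the application does not enumerate pronouns.
PRONOUNS = {"that day", "there", "it", "then"}

# Hypothetical association table: (skill, intent, first slot) -> (skill, intent, second slot).
ASSOCIATED_SLOTS = {
    ("flight_booking", "book_ticket", "time"): ("weather_query", "check_weather", "time"),
}


@dataclass
class DialogueMemory:
    """Stores the entity most recently extracted for each (skill, intent, slot)."""
    entities: dict = field(default_factory=dict)

    def remember(self, skill, intent, slot, entity):
        self.entities[(skill, intent, slot)] = entity


def resolve_entity(memory, skill, intent, slot, entity):
    """Replace a pronoun entity with the entity of the associated slot, if one exists."""
    if entity not in PRONOUNS:
        return entity  # not a pronoun: keep the extracted entity
    source = ASSOCIATED_SLOTS.get((skill, intent, slot))
    if source is None:
        return entity  # no associated slot configured: leave the pronoun unresolved
    return memory.entities.get(source, entity)


# Second user sentence (received earlier): "What will the weather be like in Beijing tomorrow?"
memory = DialogueMemory()
memory.remember("weather_query", "check_weather", "time", "tomorrow")

# First user sentence: "Book a flight to Beijing for that day"
resolved = resolve_entity(memory, "flight_booking", "book_ticket", "time", "that day")
print(resolved)
```

In this sketch the first server would then place the resolved entity in the first service request instead of the pronoun.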
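With reference to the first aspect, in a possible implementation, the first server receives the first user sentence collected by an electronic device; the first user sentence is a user sentence in audio form or in text form.
Optionally, the first server receives the first user sentence sent by a speech recognition server, which performs speech recognition on the audio-form user sentence collected by the electronic device and converts it into a text-form user sentence.
With reference to the first aspect, in a possible implementation, the method further includes: the first server receives an associated-skill request sent by the second server, where the associated-skill request is used to request that the second skill be configured as an associated skill of the first skill, and contains indication information of the first skill and indication information of the second skill; in response to the associated-skill request, the first server obtains confirmation information from a third server, where the third server is an application server corresponding to the second skill, and the confirmation information is used by the third server to confirm that the second skill is to be configured as an associated skill of the first skill; based on the confirmation information, the first server configures the second skill as an associated skill of the first skill. In this way, the developer of the first skill and the developer of the second skill can view the slot settings of each other's skill and carry out further association configuration.
With reference to the first aspect, in a possible implementation, the method further includes: the first server receives an associated-slot request sent by the second server, where the associated-slot request is used to request that the second slot be configured as an associated slot of the first slot, and contains indication information of the first slot and indication information of the second slot; in response to the associated-slot request, the first server configures the second slot as an associated slot of the first slot. In this way, when the entity of the first slot is a pronoun, the first server can modify the entity of the first slot to the entity of the second slot.
With reference to the first aspect, in a possible implementation, the method further includes: the first server determines whether the slot type of the first slot is the same as the slot type of the second slot; if they are the same, the first server configures the second slot as an associated slot of the first slot. This prevents slots of different types from being associated, which would impair the accuracy of semantic parsing.
With reference to the first aspect, in a possible implementation, the method further includes: if the entities configured for the first slot come from a system lexicon, the first server associates the slot name of the second slot with the slot name of the first slot; the system lexicon is a lexicon that the first server provides to all skills, so all slots whose configured entities come from the same system lexicon have the same entity set; the entity source configured for the second slot is the same as that configured for the first slot. If the entities configured for the first slot come from a first custom lexicon, the first server associates the slot name of the second slot with the slot name of the first slot, and also associates the first custom lexicon with a second custom lexicon; the first custom lexicon is the entity set configured for the first slot, is created by the first server for the first skill, and contains a finite number of words; the second custom lexicon is the entity set configured for the second slot, is created by the first server for the second skill, and contains a finite number of words. In this way, slots whose entities come from custom lexicons can also be associated successfully.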
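The two association cases above (system lexicon versus custom lexicon) can be sketched as follows. This is only an illustration under stated assumptions: `SlotConfig`, `associate_slots`, the skill names, and the use of the lexicon identifier as a stand-in for the slot type are all hypothetical; the application does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotConfig:
    skill: str
    name: str
    lexicon: str      # entity source, e.g. "sys.time" or a custom lexicon id
    is_system: bool   # True if the entity source is a system lexicon


def associate_slots(first, second, slot_links, lexicon_links):
    """Try to configure `second` as the associated slot of `first`.

    Returns True if the association was recorded, False if it was rejected.
    """
    if first.is_system != second.is_system:
        return False
    if first.is_system:
        # System lexicon: both slots must draw entities from the same lexicon,
        # then only the slot names need to be linked.
        if first.lexicon != second.lexicon:
            return False
        slot_links[(first.skill, first.name)] = (second.skill, second.name)
        return True
    # Custom lexicons: link the slot names and also link the two lexicons.
    slot_links[(first.skill, first.name)] = (second.skill, second.name)
    lexicon_links[first.lexicon] = second.lexicon
    return True


slot_links, lexicon_links = {}, {}
flight_time = SlotConfig("flight_booking", "time", "sys.time", True)
weather_time = SlotConfig("weather_query", "time", "sys.time", True)
weather_city = SlotConfig("weather_query", "city", "sys.location.city", True)

ok_same = associate_slots(flight_time, weather_time, slot_links, lexicon_links)
ok_diff = associate_slots(flight_time, weather_city, slot_links, lexicon_links)
```

Rejecting the time-to-city pairing mirrors the slot-type check described above; how a real server compares slot types is not detailed in this section.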
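With reference to the first aspect, in a possible implementation, the first service result is output by the electronic device; the output modes include at least displaying the first service result on the screen of the electronic device and announcing the first service result by voice on the electronic device. In this way, the end user can obtain the service result.
According to a second aspect, a semantic parsing method is provided. The method may include: a second server receives a first service request sent by a first server, where the first service request includes indication information of a first intent and an entity of a first slot; if the entity of the first slot extracted from a first user sentence is a pronoun, the entity of the first slot has been modified from the pronoun to an entity of a second slot; the second slot is configured as an associated slot of the first slot; the first user sentence is collected by an electronic device and sent to the first server; the first slot is a slot configured for the first intent; the first intent is an intent configured for a first skill, and the first skill is configured with one or more intents; the second server is an application server corresponding to the first skill; the first skill and the first intent are determined by the first server according to the first user sentence and match the service requirement expressed by the first user sentence; a second user sentence is collected by the electronic device before the first user sentence; the second slot is a slot configured for a second intent, and the second intent is an intent configured for a second skill; the second skill is configured as an associated skill of the first skill; the second skill and the second intent are determined by the first server according to the second user sentence and match the service requirement expressed by the second user sentence. In response to the first service request, the second server sends a first service result to the first server; the first service result is determined by the second server according to the indication information of the first intent and the entity of the first slot.
With reference to the second aspect, in a possible implementation, the second server sends an associated-skill request to the first server, where the associated-skill request is used to request that the second skill be configured as an associated skill of the first skill, and contains indication information of the first skill and indication information of the second skill. This enables the first server to associate the first skill with the second skill.
With reference to the second aspect, in a possible implementation, the second server sends an associated-slot request to the first server, where the associated-slot request is used to request that the second slot be configured as an associated slot of the first slot, and contains indication information of the first slot and indication information of the second slot. This enables the first server to associate the first slot with the second slot.
According to a third aspect, a semantic parsing method is provided. The method may include: a first server extracts an entity of a first slot from a first user sentence, where the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intent; the first intent is an intent configured for a first skill, and the first skill is configured with one or more intents; the first intent and the first skill are determined by the first server according to the first user sentence and match the service requirement expressed by the first user sentence. If the entity of the first slot is a pronoun, the first server modifies the entity of the first slot to a first candidate entity corresponding to a first candidate sentence, where the first candidate sentence is the highest-scoring candidate sentence after M candidate sentences are scored and ranked; the M candidate sentences are those of K candidate sentences whose semantic recognition confidence is greater than a confidence threshold; the K candidate sentences are obtained by replacing the entity of the first slot in the first user sentence with each of K candidate entities; the K candidate entities are entities of a second slot extracted by the first server from a second user sentence; the slot type of the second slot is the same as the slot type of the first slot; the second user sentence is received by the first server before the first user sentence; K>=1; M<=K. The first server obtains, based on the first intent and the entity of the first slot, a first service result for a first service request, where the first service request contains indication information of the first intent and the entity of the first slot; the first server returns the first service result to an electronic device; the first service result is determined by a second server according to the indication information of the first intent and the entity of the first slot; the second server is an application server corresponding to the first skill. In this way, the first server can understand the meaning of a pronoun in a user sentence.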
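The candidate-sentence procedure of the third aspect (K candidates, confidence filtering to M, then picking the top score) can be sketched as below. The confidence and scoring functions are passed in as parameters because the application does not define them here; the lookup tables and numbers used in the usage lines are purely illustrative placeholders, not results.

```python
def resolve_by_candidates(sentence, pronoun, candidate_entities,
                          confidence, score, threshold):
    """Return the candidate entity whose substituted sentence scores highest.

    Returns None if no candidate sentence passes the confidence threshold.
    """
    # K candidate sentences: replace the pronoun with each candidate entity.
    candidates = [(entity, sentence.replace(pronoun, entity, 1))
                  for entity in candidate_entities]
    # M candidate sentences: keep those whose confidence exceeds the threshold.
    kept = [(entity, text) for entity, text in candidates
            if confidence(text) > threshold]
    if not kept:
        return None
    # First candidate sentence: the highest-scoring one among the M kept.
    best_entity, _ = max(kept, key=lambda pair: score(pair[1]))
    return best_entity


# Illustrative stand-ins for the semantic-recognition confidence and the scorer.
CONFIDENCE = {
    "book a flight to Beijing for tomorrow": 0.9,
    "book a flight to Beijing for the day after tomorrow": 0.4,
}
SCORE = {"book a flight to Beijing for tomorrow": 0.8}

best = resolve_by_candidates(
    "book a flight to Beijing for that day",
    "that day",
    ["tomorrow", "the day after tomorrow"],
    confidence=lambda text: CONFIDENCE.get(text, 0.0),
    score=lambda text: SCORE.get(text, 0.0),
    threshold=0.5,
)
```

Compared with the first aspect, this variant needs no preconfigured slot association; it only requires that the earlier slot has the same slot type.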
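According to a fourth aspect, a server is further provided. The server is used in a human-computer dialogue system and includes a communication interface, a memory, and a processor. The communication interface and the memory are coupled to the processor. The memory is configured to store computer program code that includes computer instructions; when the processor reads the computer instructions from the memory, the server is caused to perform the method in any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.
According to a fifth aspect, a computer readable storage medium is provided, including instructions that, when run on a server, cause the server to perform the method in any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.
According to a sixth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is caused to perform the method in any possible implementation of the first aspect, any possible implementation of the second aspect, or any possible implementation of the third aspect.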
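Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the embodiments are described below.
FIG. 1 is a schematic diagram of a terminal interface of a human-computer dialogue in the prior art;
FIG. 2 is a first schematic diagram of the composition of a human-computer dialogue system according to an embodiment of this application;
FIG. 3 is a second schematic diagram of the composition of a human-computer dialogue system according to an embodiment of this application;
FIG. 4A to FIG. 4D are schematic diagrams of some electronic device interfaces for creating a skill according to an embodiment of this application;
FIG. 5A and FIG. 5B are schematic diagrams of some electronic device interfaces for completing skill creation according to an embodiment of this application;
FIG. 6A to FIG. 6D are schematic diagrams of some electronic device interfaces for forming a skill group according to an embodiment of this application;
FIG. 7A and FIG. 7B are schematic diagrams of the interaction for forming a group between skills according to an embodiment of this application;
FIG. 8 is a schematic diagram of an electronic device interface for configuration between skills according to an embodiment of this application;
FIG. 9 is a schematic diagram of some electronic device interfaces for configuration between skills according to an embodiment of this application;
FIG. 10 is a schematic diagram of an electronic device interface for viewing shared skills according to an embodiment of this application;
FIG. 11 is a schematic diagram of a terminal interface of a human-computer dialogue according to an embodiment of this application;
FIG. 12A and FIG. 12B are schematic diagrams of a human-computer system implementing location entity sharing according to an embodiment of this application;
FIG. 13 is a schematic diagram of a terminal interface of another human-computer dialogue according to an embodiment of this application;
FIG. 14A and FIG. 14B are schematic diagrams of a human-computer system implementing time entity sharing according to an embodiment of this application;
FIG. 15 is a schematic flowchart of a semantic parsing method according to an embodiment of this application;
FIG. 16 is a schematic flowchart of another semantic parsing method according to an embodiment of this application;
FIG. 17 is a schematic flowchart of still another semantic parsing method according to an embodiment of this application;
FIG. 18 is a flowchart of an example application of still another semantic parsing method according to an embodiment of this application;
FIG. 19 is a schematic structural diagram of a server according to an embodiment of this application;
FIG. 20 is a schematic structural diagram of an electronic device according to an embodiment of this application.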
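Detailed Description
The terms used in the following embodiments of this application are intended only to describe specific embodiments and are not intended to limit this application. As used in the description and the appended claims of this application, the singular forms "a", "an", "the", "the foregoing", and "this" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in this application refers to and includes any or all possible combinations of one or more of the listed items.
The following describes an electronic device, a user interface for such an electronic device, and embodiments for using such an electronic device. In some embodiments, the electronic device may be a portable electronic device that also has other functions such as personal digital assistant and/or music player functions, for example a mobile phone, a tablet computer, or a wearable electronic device with wireless communication capability (such as a smart watch). Example embodiments of the portable electronic device include but are not limited to portable electronic devices running the operating systems shown in the original as images (Figure PCTCN2020086002-appb-000001, Figure PCTCN2020086002-appb-000002) or other operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface or touch panel. It should also be understood that in some other embodiments the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface or touch panel.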
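The term "user interface (UI)" in the description, claims, and drawings of this application is a medium interface for interaction and information exchange between an application program or operating system and a user; it converts between the internal form of information and a form the user can accept. The user interface of an application program is source code written in a specific computer language such as Java or extensible markup language (XML). The interface source code is parsed and rendered on the terminal device and finally presented as content the user can recognize, such as pictures, text, buttons, and other controls. A control, also called a widget, is a basic element of the user interface; typical controls include toolbars, menu bars, text boxes, buttons, scrollbars, pictures, and text. The attributes and content of the controls in an interface are defined by tags or nodes; for example, XML uses nodes such as <Textview>, <ImgView>, and <VideoView> to specify the controls contained in the interface. One node corresponds to one control or attribute in the interface, and after parsing and rendering the node is presented as content visible to the user. In addition, the interfaces of many application programs, such as hybrid applications, usually also contain web pages. A web page, also called a page, can be understood as a special control embedded in the application interface. A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS); the web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with browser-like functionality. The specific content of a web page is also defined by tags or nodes in the web page source code; for example, HTML uses <p>, <img>, <video>, and <canvas> to define the elements and attributes of a web page.
A common form of user interface is the graphical user interface (GUI), which is a user interface related to computer operations that is displayed graphically. It may be an interface element such as an icon, window, or control displayed on the display screen of the electronic device, where controls may include visible interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.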
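FIG. 2 shows an example of the human-computer dialogue system 10 involved in this application. As shown in FIG. 2, the human-computer dialogue system 10 may include an electronic device 100, a human-computer interaction server 200, and application servers 300 of one or more content providers. In this application, a content provider's application server may be called a third-party application server. A communication connection between the electronic device 100 and the human-computer interaction server 200 may be established using telecommunication network (3G/4G/5G, etc.) technology or wireless fidelity (Wi-Fi) technology. A communication connection between the human-computer interaction server 200 and the third-party application server 300 may be established over a local area network or a wide area network. Specifically:
The electronic device 100 may be used to collect a user sentence and send it to the human-computer interaction server 200. The user sentence may express the user's service requirement, for example a weather query or a flight booking. Optionally, the electronic device 100 may convert a collected user sentence in audio form into text form and then send the text-form user sentence to the human-computer interaction server 200. The electronic device 100 may also receive the service result that the human-computer interaction server 200 returns for the user's service requirement, for example a weather query result or a flight booking result, and may present the received service result to the user. These functions may be performed by the electronic device 100 through a voice assistant installed on the electronic device 100. The voice assistant may be a voice interaction application and may also be called a chat assistant, a chatbot, and so on; this application does not limit its name. Through the voice assistant, the user and the electronic device 100 can interact by voice.
Specifically, the electronic device 100 may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, an in-vehicle device, a smart car, a smart speaker, or the like. This application places no special limitation on the specific form of the electronic device 100.
The human-computer interaction server 200 may be used to receive the user sentence sent by the electronic device 100. The human-computer interaction server 200 performs semantic understanding on the user sentence and thereby determines the skill (for example, the "Moji Weather Query" skill) and the intent (for example, the dialogue intent "check weather") corresponding to the user sentence. It also extracts from the user sentence the entities (for example, "Beijing") of the slots (for example, the "city slot") configured under that intent. Then, based on the intent of the user sentence and the slot entities extracted from it, the human-computer interaction server 200 sends a service request to the third-party application server 300. The service request matches the service requirement expressed in the user sentence. It may contain indication information of the intent corresponding to the user sentence (for example, "check weather") and the entities of the corresponding slots (for example, "tomorrow, Beijing"). For example, a weather query service request may contain the time and city extracted from the user sentence, and a flight booking service request may contain the booking time, departure place, and destination extracted from the user sentence. The human-computer interaction server 200 may also receive the service result returned by the third-party application server 300, for example a weather query result or a flight booking result. Finally, the human-computer interaction server 200 sends the received service result to the electronic device 100.
The third-party application server 300 may be used to receive the service request sent by the human-computer interaction server 200. According to the indication information of the intent contained in the service request (for example, "check weather" in a weather query request) and the entities extracted from the user sentence (for example, "tomorrow, Beijing"), the third-party application server 300 obtains the service result of the service request (tomorrow's weather in Beijing). The third-party application server 300 may return the service result to the human-computer interaction server 200.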
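The exchange between the human-computer interaction server 200 and the third-party application server 300 can be sketched as a request that carries the intent indication and the slot entities. The JSON encoding, the field names (`intent`, `slots`, `status`, `query`), and the stub handler are assumptions for illustration only; the application does not define a wire format.

```python
import json


def build_service_request(intent, slot_entities):
    """Serialize a service request carrying the intent indication and slot entities."""
    return json.dumps({"intent": intent, "slots": slot_entities})


def handle_service_request(raw_request):
    """Stub for the third-party application server: echoes the parsed query.

    A real weather service would look up a forecast here; this stub returns no data.
    """
    request = json.loads(raw_request)
    if request["intent"] != "check_weather":
        return {"status": "unsupported_intent"}
    slots = request["slots"]
    return {"status": "ok", "query": {"city": slots["city"], "time": slots["time"]}}


raw = build_service_request("check_weather", {"city": "Beijing", "time": "tomorrow"})
result = handle_service_request(raw)
```

The point of the sketch is that the application server never sees the raw user sentence, only the intent and the already-resolved slot entities.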
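Optionally, the human-computer dialogue system 10 may also include a speech recognition server (for example, an iFLYTEK speech recognition server or a Baidu speech recognition server). After receiving a user sentence, the electronic device 100 sends it to the speech recognition server for speech recognition; the user sentence is then converted into text and sent to the human-computer interaction server 200 for semantic parsing.
Based on the human-computer dialogue system 10 shown in FIG. 2, FIG. 3 shows the overall flow of the human-computer dialogue involved in this application, described below.
The electronic device 100 may collect a user sentence from a user 301. This user sentence may be called voice input 30a. The electronic device 100 may convert the voice input 30a into text form, that is, text input 30b, through a speech recognition module 302. The electronic device 100 may then send the user sentence in audio form, that is, the voice input 30a, to the human-computer interaction server 200, or it may send the user sentence in text form, that is, the text input 30b. Specifically, the electronic device 100 may send the text input 30b to the human-computer interaction server 200 through the communication connection 101 shown in FIG. 2.
The human-computer interaction server 200 may receive the user sentence (voice input 30a or text input 30b) sent by the electronic device 100. The human-computer interaction server 200 may contain a speech recognition module that converts a user sentence in speech form into text form. When the user sentence is the voice input 30a, this speech recognition module may perform speech recognition on the voice input and convert it into text. The human-computer interaction server 200 may perform semantic understanding on the user sentence and extract the user's service requirement from it. It may then send a service request to the third-party application server 300 based on the user's service requirement. The human-computer interaction server 200 may also receive the service result returned by the third-party application server 300 and send it to the electronic device 100. The human-computer interaction server 200 may include a semantic understanding module 303, a dialogue management module 304, and a natural language generation module 305, where:
The semantic understanding module 303 may be used to perform semantic recognition on the user sentence (voice input 30a in audio form or text input 30b in text form). Specifically, the semantic understanding module 303 may perform skill classification, intent classification, and slot extraction on the user sentence. In general, multiple specific skills are integrated on the human-computer interaction server 200, and each skill corresponds to one type of service or function, for example meal ordering, ride hailing, or weather query. How a skill is created is described in detail later. One or more intents may be configured under each skill. For example, the "Moji Weather Query" skill may be configured with the dialogue intent "check weather" and the question-and-answer intent "check weather". One or more slots may be configured under each intent. For example, the dialogue intent "check weather" may be configured with a time slot and a city slot. The configuration of intents and slots is also described in detail later.
The dialogue management module 304 may take the output of the semantic understanding module 303 as its input and, based on the input history, decide which operation the human-computer interaction server 200 should perform at this point. The dialogue management module 304 may include two parts: state tracking and dialogue policy. State tracking continuously tracks the various pieces of information in the dialogue and updates the current dialogue state according to the old state, the user state (the information output by the semantic understanding module 303), and the system state (that is, the result of querying the database). The dialogue policy is closely tied to the task scenario and usually serves as the output of the dialogue management module 304.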
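The two parts of dialogue management described above, state tracking and dialogue policy, can be sketched in a few lines. This is a generic illustration, not the method of the application: the dictionary-based state, the `_system` key, and the action names `ask_user` and `call_service` are all hypothetical.

```python
def update_state(old_state, user_state, system_state):
    """State tracking: merge the old state with the latest user and system states."""
    new_state = dict(old_state)          # keep everything tracked so far
    new_state.update(user_state)         # overwrite with the newest NLU output
    new_state["_system"] = system_state  # e.g. the result of a database query
    return new_state


def choose_action(state, required_slots):
    """Dialogue policy: ask for the first missing required slot, else call the service."""
    for slot in required_slots:
        if slot not in state:
            return ("ask_user", slot)
    return ("call_service", None)


state = update_state({}, {"intent": "check_weather", "time": "tomorrow"},
                     {"db": "not_queried"})
action_1 = choose_action(state, ["city"])

state = update_state(state, {"city": "Beijing"}, {"db": "not_queried"})
action_2 = choose_action(state, ["city"])
```

Because the state accumulates across turns, entities from earlier sentences remain available when a later sentence contains a pronoun.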
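The natural language generation (NLG) module 305 is used to generate text information to be fed back to the user according to the output of the dialogue management module 304. The natural language generation module 305 may generate natural language in a template-based, grammar-based, or model-based manner. The template-based and grammar-based manners are mainly rule-based strategies, and the model-based manner may use, for example, a long short-term memory (LSTM) network. The embodiments of this application do not limit the specific implementation of natural language generation.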
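Of the three generation manners mentioned, the template-based one is the simplest to illustrate. The sketch below uses Python's standard `string.Template`; the template text, the intent key, and the `summary` value are illustrative placeholders and not data from the application.

```python
from string import Template

# Hypothetical reply templates, one per intent.
TEMPLATES = {
    "check_weather": Template("The weather in $city $time: $summary."),
}


def generate_reply(intent, values):
    """Fill the template registered for the intent with the given slot values."""
    return TEMPLATES[intent].substitute(values)


reply = generate_reply(
    "check_weather",
    {"city": "Beijing", "time": "tomorrow", "summary": "sunny"},  # placeholder values
)
print(reply)
```

A grammar-based or model-based generator would replace `generate_reply` while keeping the same inputs.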
关于上面内容中提及的一些基本概念,如技能、意图、槽位,下面进行说明。
(1)技能
技能可以是一项服务或功能,例如天气查询服务、机票预定服务等等。技能可以由第三方应用(如“墨迹天气”)的开发者来配置。一个技能下可以配置有一个或多个意图。具体地,第三方应用的开发者可以通过电子设备400登录人机交互服务器200的技能创建页面来创建技能。
(2)意图
一个意图可以是一个技能下更为细化的服务或功能。意图可以分为对话意图和问答意图。需要参数化的应该使用对话意图,比如订购火车票意图,里面需要车次,出发时间等参数,则应该使用对话意图。问答意图更偏好于解决FAQ类型的问题。比如退票费怎么收?一个意图下可以配置有一个或多个槽位。
(3)槽位
槽位为用户语句中用来表达用户意图的关键信息,例如,用户意图为对话意图“查天气”,那么人机交互服务器200需要从语句中提取的槽位为城市槽位和时间槽位。城市槽位用来表明查询“哪里”的天气,时间槽位用来表明查询“哪天”的天气。
槽位可以包含槽位名称和槽位类型等属性。举例来说,若槽位名称相当于槽位的具体 参数,那么槽位类型就是参数的取值集合,取值集合中一个值代表着一个实体。例如,“北京明天天气怎么样”这句话中可以提取城市槽位、时间槽位,其中城市槽位的实体为“北京”,时间槽位的实体为“明天”。
槽位类型用来指示槽位被配置的实体来源于哪一个词库(系统词库或自定义词库)。例如,城市槽位被配置的实体可以是来自系统词库(如系统地点类词库),也可以是自定义词库(如自定义地点类词库)。系统词库是人机交互服务器200提供的词库,可以供每个技能选择。系统词库的词是不可枚举的。不同技能下被配置的槽位的配置实体来源可以是同一个系统词库。多个槽位被配置的实体来源若为同一个系统词库,则这多个槽位被配置的实体集合是相同的。自定义词库是人机交互服务器200为某个技能建立的词库。自定义词库中的实体是有限的。例如,人机交互服务器200为墨迹天气技能提供的自定义词库,那么该自定义词库只可以供墨迹天气技能下配置的槽位在被配置实体来源时选择。其他技能的槽位配置的实体来源不可以选择该自定义词库。
槽位可以包括必填槽位、非必填关键槽位、非必填非关键槽位。必填槽位是一句话中必须要有的槽位。如果用户输入语句中必填槽位缺失,人机交互服务器200会无法正确的理解用户输入语句的含义。非必填关键槽位可以在一句话中没有出现,但是人机交互服务器200可以根据GPS或者默认信息等等来填充该槽位信息。人机交互服务器200获取到的用户输入语句如果缺失非必填非关键槽位,不会影响人机交互服务器200对该用户输入语句的语义理解。举例来说,“今天深圳南山科技园的天气怎么样”这句话中,实体“今天”对应的时间槽位为非必填关键槽位,实体“深圳”对应的城市槽位为必填槽位,实体“南山科技园”对应的地区槽位为非必填非关键槽位。
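The relationship between skills, intents, and slots described above can be sketched as a small data model. This is only an illustrative sketch: the class names, the `missing_required` and `fill_defaults` helpers, and the lexicon name `custom.district` are assumptions made for the example and do not come from the application; only `sys.location.city` and `sys.time` appear in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class SlotKind(Enum):
    REQUIRED = "required"                  # must appear in the sentence
    OPTIONAL_KEY = "optional_key"          # may be filled from GPS or defaults
    OPTIONAL_NON_KEY = "optional_non_key"  # may be absent without harm


@dataclass
class Slot:
    name: str
    lexicon: str   # entity source, e.g. the system lexicon "sys.location.city"
    kind: SlotKind


@dataclass
class Intent:
    name: str
    slots: List[Slot] = field(default_factory=list)

    def missing_required(self, entities: Dict[str, str]) -> List[str]:
        """Names of required slots for which no entity was extracted."""
        return [s.name for s in self.slots
                if s.kind is SlotKind.REQUIRED and s.name not in entities]

    def fill_defaults(self, entities: Dict[str, str],
                      defaults: Dict[str, str]) -> Dict[str, str]:
        """Fill absent optional key slots from defaults (e.g. GPS, today)."""
        filled = dict(entities)
        for s in self.slots:
            if (s.kind is SlotKind.OPTIONAL_KEY
                    and s.name not in filled and s.name in defaults):
                filled[s.name] = defaults[s.name]
        return filled


@dataclass
class Skill:
    name: str
    intents: List[Intent] = field(default_factory=list)


# The weather example from the text; "custom.district" is a made-up lexicon.
CHECK_WEATHER = Intent("查天气", [
    Slot("city", "sys.location.city", SlotKind.REQUIRED),
    Slot("time", "sys.time", SlotKind.OPTIONAL_KEY),
    Slot("district", "custom.district", SlotKind.OPTIONAL_NON_KEY),
])
MOJI_WEATHER = Skill("墨迹天气查询", [CHECK_WEATHER])
```

In this sketch a sentence that lacks the city entity is reported through `missing_required`, while an absent time entity is filled from a default, mirroring the three slot kinds in the paragraph above.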
在本申请中,人机交互服务器200中能够提供与第一用户语句表示的服务需求相匹配服务的技能被称为第一技能。第一技能中配置的匹配第一用户语句中表示的服务需求的意图被称为第一意图,第一意图配置有第一槽位。人机交互服务器200从第一用户语句中提取出的第一槽位的实体可以表达该第一用户语句中表示的服务需求的关键信息。第一技能对应的应用服务器接收到了包含第一意图的指示信息和第一槽位的实体才能够提供该第一用户语句中表示的服务需求相对应的服务。举例来说,第一用户语句为“查询今天北京的天气”。第一用户语句表示的服务需求是查天气。那么人机交互服务器200中与该服务需求相匹配的“墨迹天气查询”技能被称为第一技能。第一技能中被配置的对话意图“查天气”就是第一意图。第一意图被配置的城市槽位或者时间槽位为第一槽位。
下面接着说明可用于创建技能、创建意图、训练人机对话模型等配置工作的用户界面。
1.创建技能
图4A示例性示出了电子设备400显示的可用于创建技能的用户界面40A。如图4A所示,用户界面40A中可以显示有控件401(“创建技能”)。电子设备400可以检测到作用于控件401的选择操作。该选择操作可以是在控件401上的鼠标操作(如鼠标单击操作),也可以是在控件401上的触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以刷新用户界面40A。
刷新后的用户界面40A可以如图4B所示可以包括:控件402、控件403。其中控件402可用于用户(如墨迹天气开发人员)输入技能名称,控件403可用于用户(如墨迹天气开发人员)输入技能分类。例如,如图4B所示,用户可以通过控件402设置技能名称 为“墨迹天气查询”,可以通过控件403设置技能分类为“查询天气”。
刷新后的用户界面40A可以如图4B所示还可以包括:控件404(“保存”)。电子设备400可以检测到作用于控件404的选择操作。该选择操作可以是在控件404上的鼠标操作(如鼠标单击操作),也可以是在控件404上的触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以基于用户设置的技能名称、技能分类来创建技能。
2.创建意图以及设置意图关联的槽位
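Figure 4C shows an example of user interface 40C, displayed by the electronic device 400, which can be used to create an intent and set the slots associated with the intent. As shown in Figure 4C, user interface 40C may display control 405, control 406, and control 407. Control 405 lets the user (for example, a Moji Weather developer) enter an intent name. Control 406 displays the intent name entered by the user (for example, "check weather"). Control 407 lets the user (for example, a Moji Weather developer) add a slot. The electronic device 400 may detect a selection operation on control 407. The selection operation may be a mouse operation on control 407 (such as a mouse click) or a touch operation on control 407 (such as a finger tap), and so on. In response to the selection operation, the electronic device 400 may refresh user interface 40C.
The refreshed user interface 40C may be as shown in Figure 4D and may include control 408, control 409, and control 4010. Control 408 lets the user (for example, a Moji Weather developer) set the city slot in the "check weather" intent. The interface shown in Figure 4D may show that the entity source in the slot type of the city slot is the system lexicon sys.location.city and that the city slot is a required slot. Control 409 lets the user set the time slot in the "check weather" intent. The interface shown in Figure 4D may show that the entity source in the slot type of the time slot is the system lexicon sys.time and that the time slot is an optional key slot. In the embodiments of this application, the entity source in a slot type is mainly a system lexicon or a custom lexicon (also called a user dictionary). A system lexicon is a lexicon preset by the human-computer interaction server 200, and its entities cannot be enumerated, for example: sys.time, sys.location.city, sys.name, sys.phoneNum. A custom lexicon is defined by the skill developer and contains a finite number of words.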
3.训练人机对话模型
图5A示例性示出了电子设备400显示的可用于训练人机对话模型的用户界面50A。如图5A所示,用户界面50A中可以显示有控件501(“开始训练”)。电子设备400可以检测到作用于控件501的选择操作。该选择操作可以是在控件501上的鼠标操作(如鼠标单击操作),也可以是在控件501上的触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以刷新用户界面50A。
人机交互服务器200训练出来的新技能(如“墨迹天气查询”技能)的人机对话模型可以将用户输入语句进行技能分类、意图分类以及槽位提取。举例来说,假设人机交互服务器200训练的为“墨迹天气查询”技能的人机对话模型。那么该人机对话模型就能够识别出用户输入语句(如:明天北京天气如何)对应的技能为“墨迹天气查询”,对应的意图为对话意图“查天气”,以及提取出城市槽位对应实体(北京)和时间槽位对应实体(明天)。
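The refreshed user interface 50A may be as shown in Figure 5B and may include control 502 and control 503. Control 502 may be used to retrain the human-computer dialog model. Control 503 ("Publish skill") may be used to publish the created skill (for example, the weather query skill).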
经过创建技能、创建意图、训练人机对话模型等一系列的配置工作,语音助手便可以和用户进行语音交互,识别用户的服务需求,并向用户反馈服务结果。
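However, existing voice assistants cannot determine what a pronoun in a user sentence refers to. After identifying the skill and intent corresponding to a user sentence, the human-computer interaction server 200 further determines, from the user sentence, the entities corresponding to the slots associated with the intent. If the entity corresponding to one of these slots is a pronoun, the existing human-computer interaction server 200 cannot determine what the pronoun refers to.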
例如,当用户说“查询明天北京的天气”时,现有的语音助手可识别出这个用户语句对应的技能为“墨迹天气查询”技能,还可以识别出这个用户语句对应的意图为对话意图“查天气”。而且,现有的语音助手还可以从这个用户语句中确定出对话意图“查天气”关联的槽位(如时间槽位、城市槽位)对应的实体。具体的,时间槽位对应的实体为明天,城市槽位对应的实体为北京。
当用户接着说“帮我订一张那天去北京的机票”时,现有的语音助手可识别出这个用户语句对应的技能为机票预订技能,还可以识别出这个用户语句对应的意图为对话意图“订机票”。现有的语音助手还可以从这个用户语句中确定出对话意图“订机票”关联的槽位(如时间槽位、出发地槽位、目的地槽位)对应的实体。具体的,时间槽位对应的实体为代词“那天”,出发地槽位对应的实体为用户当前所处的位置,目的地槽位对应的实体为北京。电子设备100可以通过定位技术(如GPS定位等)确定出发地,并将该出发地告知人机交互服务器200。
由于时间槽位对应的实体为代词“那天”,因此现有的语音助手便不能确定代词“那天”所指代的具体含义,无法准确确定用户的服务需求。
为了解决现有的语音助手存在的问题,本申请以下实施例提供了一种人机交互方法,可以确定出人机对话中的代词所指代的含义。如“这里”、“那天”等等代词所指代的具体含义。这样可以提高语音交互过程中用户使用电子设备的效率,提升用户体验。
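In this application, the human-computer interaction server 200 may group different skills and then configure skill 1 in the group (for example, the Moji Weather skill) as an associated skill of skill 2 (for example, the Qunar Travel skill). When the human-computer interaction server 200 detects that the user's service demand has switched from skill 1 to skill 2, and the user input sentence corresponding to skill 2 contains a pronoun, the server determines the meaning of the pronoun by obtaining an entity from skill 1, the associated skill of skill 2. How skills are grouped, and how entity sharing between grouped skills is configured, is described in detail below and is not elaborated here.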
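User sentence A, which the human-computer interaction server 200 receives first from the electronic device 100 (for example, "What is the weather like in Beijing tomorrow"), expresses a service demand corresponding to skill A (for example, the "Moji Weather Query" skill). User sentence B, which the server receives later (for example, "What will the weather be like there next week"), expresses a service demand that also corresponds to skill A, and user sentence B contains a pronoun. Under the same skill, the slots associated with the same intent are the same. For example, the voice assistant in the electronic device 100 first collects user sentence A and returns the weather query result to the user, and then collects user sentence B. User sentence A and user sentence B correspond to the same skill, the "Moji Weather Query" skill, and to the same intent, the dialog intent "check weather". Therefore the slots that the human-computer interaction server 200 needs to extract from user sentence A and user sentence B are also the same, namely the time slot and the city slot. When the entity of the city slot that the server extracts from user sentence B is the pronoun "there" (“那里”), the server directly replaces "there" with the city-slot entity "Beijing" extracted from user sentence A, thereby determining what the pronoun refers to.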
下面对技能间建群、配置技能共享进行详细说明。
1.技能间建群
技能间建群可以指一个技能和其他技能在人机交互服务器200中建立映射关系。例如,人机交互服务器200将墨迹天气技能与去哪儿旅行技能建立映射关系。人机交互服务器200保存这两个技能建立的映射关系后,人机交互服务器200就允许这两个技能互相查看对方的槽位设置。
Figures 6A to 6D show examples of the user interfaces in which the electronic device 400 groups skills. They are described in detail below.
如图6A所示,用户界面60A中可以包括:控件601(“邀请技能”)、控件602(“收到待确认邀请”)。电子设备400可以检测到作用于控件601的选择操作。该选择操作可以是控件601上的鼠标操作(如鼠标单击操作),也可以是在控件601上触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以刷新用户界面60A。
刷新后的用户界面60B可以如图6B所示可以包括:控件603、控件604。其中,控件603可用于用户(如墨迹天气开发人员)选择要进行建群邀请的技能。控件604可用于用户(如墨迹天气开发人员)向人机交互服务器200发送建群请求。
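As shown in Figure 6C, the electronic device 400 may detect a selection operation on control 602. The selection operation may be a mouse operation on control 602 (such as a mouse click) or a touch operation on control 602 (such as a finger tap), and so on. In response to the selection operation, the electronic device 400 may refresh user interface 60A.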
刷新后的用户界面60A可以如图6D所示,电子设备400可以显示具体收到了哪些技能的建群邀请。例如,如图6D所示,电子设备400可以显示用户“收到京东的邀请”、“收到淘宝的邀请”。
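Figures 7A and 7B show an example of the interaction by which two skills form a group, described below.
Figure 7A shows an example of the Moji Weather application sending a grouping invitation to the Qunar Travel application. Interface 70A is the user interface, displayed by the electronic device 400, in which the Moji Weather application initiates skill grouping. Interface 70A may include control 701 and control 702. Control 701 lets the Moji Weather developer choose the skill to invite. For example, as shown in Figure 7A, the Moji Weather developer chooses the "Qunar Travel" skill through control 701. Control 702 lets the Moji Weather developer send the skill invitation to the human-computer interaction server 200. Specifically, the human-computer interaction server 200 may receive grouping request 700 sent by the Moji Weather application over communication connection 102 shown in Figure 2. The human-computer interaction server 200 then sends grouping request 700 to the Qunar Travel application over communication connection 102. The electronic device 400 may display interface 70B, in which the Qunar Travel application receives the grouping request. Interface 70B may contain control 703 and control 704. Control 703 lets the Qunar Travel developer accept the grouping invitation from the Moji Weather application. Control 704 lets the Qunar Travel developer reject the grouping invitation from the Moji Weather application. The following description takes as an example the case in which the electronic device detects a selection operation on control 703.
Figure 7B shows an example of the Qunar Travel application responding to the grouping invitation from the Moji Weather application. The electronic device 400 detects a selection operation on control 703 in interface 70B. The selection operation may be a mouse operation on control 703 (such as a mouse click) or a touch operation on control 703 (such as a finger tap), and so on. In response to the selection operation, the electronic device 400 may send accept-grouping response 707 to the human-computer interaction server 200. Specifically, the human-computer interaction server 200 receives accept-grouping response 707 sent by the Qunar Travel application over communication connection 102 shown in Figure 2, and then sends accept-grouping response 707 to the Moji Weather application over communication connection 102.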
墨迹天气和去哪儿旅行在人机交互服务器200中建群成功后,人机交互服务器200可以生成墨迹天气和去哪儿旅行之间的映射关系。然后,人机交互服务器200可以保存墨迹天气和去哪儿旅行之间的映射关系。
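The electronic device 400 may display interface 70C, in which the Moji Weather application receives the accept-grouping response. Interface 70C may include control 705 and control 706. Control 705 lets the Moji Weather developer perform inter-skill configuration. Control 706 lets the Moji Weather developer open a chat window to send messages to the Qunar Travel skill.
The embodiments of this application take the grouping of the Moji Weather skill and the Qunar Travel skill only as an example. Other skills may be grouped, one skill may be grouped with multiple skills, multiple skills may form one group, and so on; this should not be construed as a limitation.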
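The invitation and acceptance exchange described for Figures 7A and 7B can be sketched as a small registry kept by the human-computer interaction server. The class and method names below are hypothetical and chosen only for illustration; the application does not specify how the mapping between grouped skills is stored.

```python
from typing import FrozenSet, Set, Tuple


class SkillGroupRegistry:
    """Tracks pending grouping invitations and established skill groups."""

    def __init__(self) -> None:
        self._pending: Set[Tuple[str, str]] = set()
        self._groups: Set[FrozenSet[str]] = set()

    def invite(self, inviter: str, invitee: str) -> None:
        """Record a grouping request (request 700 in the text)."""
        self._pending.add((inviter, invitee))

    def respond(self, inviter: str, invitee: str, accept: bool) -> bool:
        """Handle the invitee's answer; save the mapping only on acceptance."""
        if (inviter, invitee) not in self._pending:
            raise KeyError("no pending invitation for this pair of skills")
        self._pending.remove((inviter, invitee))
        if accept:
            self._groups.add(frozenset((inviter, invitee)))
        return accept

    def are_grouped(self, a: str, b: str) -> bool:
        """True if the two skills have an established mapping."""
        return frozenset((a, b)) in self._groups
```

Storing each group as an unordered pair reflects the statement that, once grouped, the two skills may view each other's slot settings; a directional store would work equally well if the deployment needs it.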
2.配置实体共享
配置实体共享为配置一个技能(如墨迹天气查询技能)与另一个技能(如去哪儿旅行技能)共享实体。共享实体可以是指一个技能(如墨迹天气查询技能)对应的用户语句A中出现代词时,人机交互服务器200将与墨迹天气技能关联的另一个技能(如去哪儿旅行技能)关联意图中配置的槽位的实体拿来替换用户语句A中的代词。
图8-图10示例性示出了墨迹天气技能与去哪儿旅行技能配置实体共享的过程。
图8示例性示出了电子设备400显示的用于查天气配置实体共享的用户界面80A。如图8所示,用户界面80A可以显示有控件801。控件801可用于用户(如墨迹天气应用开发人员)显示可以选择的能够进行技能共享配置的技能。电子设备400可以检测到作用于控件801的选择操作。该选择操作可以是在控件801上的鼠标操作(如鼠标单击操作),也可以是在控件801上的触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以刷新用户界面80A。
刷新后的用户界面80A可以如图9中用户界面90A所示。用户界面90A可以显示控件901,控件901可用于选择技能列表中的要进行技能共享配置的技能。例如,如用户界面90A所示,用户可以通过控件901选择“去哪儿旅行”技能进行技能共享配置。电子设备400可以检测到作用于控件901的选择操作。该选择操作可以是在控件901上的鼠标操作(如鼠标单击操作),也可以是在控件901上的触控操作(如手指点击操作),等等。响应该选择操作,电子设备400可以刷新用户界面90A。
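The refreshed user interface 90A may be as shown in user interface 90B and may display control 902 ("Share entity"). The electronic device 400 may detect a selection operation on control 902. The selection operation may be a mouse operation on control 902 (such as a mouse click) or a touch operation on control 902 (such as a finger tap), and so on. In response to the selection operation, the human-computer interaction server 200 may configure the "city slot" in the "check weather" skill and the "destination slot" in the "Qunar Travel ticket booking" skill to share entities. The human-computer interaction server 200 configures the "destination slot" as an associated slot of the "city slot". Specifically, when the entity source configured for the "city slot" is a system lexicon (such as sys.location.city) and the entity source configured for the "destination slot" is the same system lexicon, the human-computer interaction server 200 associates the slot name of the "city slot" with the slot name of the "destination slot". When the entities of the "city slot" come from a custom lexicon that the human-computer interaction server 200 created for the Moji Weather skill, the server associates the two slot names and also associates the custom lexicon that is the entity source of the "city slot" with the system lexicon or custom lexicon that is the entity source of the "destination slot". The interface for configuring entity sharing is not limited to user interface 90B; entity sharing may also be configured through a command line, which is not limited here.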
如图10所示,界面100A显示了墨迹天气技能中城市槽位和去哪儿旅行技能中目的地槽位共享实体后的详情。其中,界面100A可以是表格的形式存储在人机交互服务器200中。也可以是,人机交互服务器200保存了墨迹天气技能城市槽位和去哪儿旅行技能目的地槽位之间共享实体的映射关系。此处不做限制。
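The shared-entity mapping that the server saves, whether as a table like interface 100A or in some other form, can be sketched as a lookup from one (skill, slot) pair to its associated pairs. The names `SharedEntityTable`, `share`, and `associated_slots` are assumptions for this sketch, and treating the association as symmetric is also an assumption; the text uses it in both directions across its examples but does not state how it is stored.

```python
from typing import Dict, FrozenSet, Set, Tuple

SlotRef = Tuple[str, str]  # (skill name, slot name)


class SharedEntityTable:
    """Maps a slot to the slots it shares entities with."""

    def __init__(self) -> None:
        self._links: Dict[SlotRef, Set[SlotRef]] = {}

    def share(self, a: SlotRef, b: SlotRef) -> None:
        """Associate two slots; stored in both directions (assumption)."""
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def associated_slots(self, ref: SlotRef) -> FrozenSet[SlotRef]:
        """All slots configured to share entities with `ref`."""
        return frozenset(self._links.get(ref, set()))


table = SharedEntityTable()
table.share(("墨迹天气", "city"), ("去哪儿旅行", "destination"))
```

A lookup such as `table.associated_slots(("墨迹天气", "city"))` then yields the Qunar Travel destination slot, which is the row shown for Figure 10.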
下面结合实施例一至实施例二来详细介绍本申请提供的语音交互方法。
实施例一
图11示例性示出了实施例一所基于的人机对话。图11中示例性的展示了实体为地点的共享实体场景。如图11所示,电子设备100可以显示人机对话界面110A。
首先,电子设备100可以在界面110A显示采集到的用户语句1101“订一张明天从上海到北京的机票”。然后,电子设备100中的语音助手可以向用户反馈订票结果(未示出)。这里,反馈订票结果可包括但不限于以下两种方式:方式1.电子设备100可以将订票结果以网页的形式(未示出)显示在界面110A中;方式2.电子设备100也可以将订票结果语音播报给用户。
接下来,电子设备100可以采集到用户语句1102“对了,明天那里的天气怎么样”,并在界面110A中显示1102。人机交互服务器200从用户语句1102中提取到城市槽位对应的实体为代词“那里”。人机交互服务器200确定城市槽位与用户语句1101中对应的目的地槽位共享实体。然后,人机交互服务器200将目的地槽位的实体“北京”代替代词“那里”。这样,电子设备100能够正确地将天气查询结果反馈给用户。这里,反馈天气查询结果可包括但不限于以下两种方式:方式1、电子设备100可以将天气查询结果以网页的形式(未示出)显示在界面110A中;方式2、电子设备100也可以将天气查询结果语音播报给用户。
用户语句1101也可以是“订一张去北京的票,从上海出发的”。人机交互服务器200仍然可以从用户语句1101中提取出目的地槽位的实体“北京”,然后,人机交互服务器200将目的地槽位的实体“北京”代替用户语句1102中的代词“那里”。这样,电子设备100能够正确地将天气查询结果反馈给用户。
基于图11所示的人机对话,图12A-图12B示出了实施例一提供的人机对话方法在人机对话系统10中的实施。
图12A具体示出了人机对话系统10处理订票请求的过程。
1.发送订票请求
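As shown in Figure 12A, the electronic device 100 may collect user sentence 1101. Optionally, the electronic device performs speech recognition on user sentence 1101 and converts it into text 1202. The electronic device 100 sends text 1202 to the human-computer interaction server 200.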
如图12A所示,人机交互服务器200可以接收到文本1202。人机交互服务器200中对文本1202进行技能分类、意图分类和槽位提取。具体地,如图3所示的人机交互服务器200中的语义理解模块303可以对文本1202进行技能分类。然后,人机交互服务器200可以利用文本1202对应技能下的人机对话模型来对文本1202进行意图分类以及槽位提取。具体地,该人机对话模型可以是图5A中训练的人机对话模型。人机交互服务器200可以如表1201的形式保存文本1202对应的技能和槽位。从表1201可以看出,文本1202对应的技能为去哪儿旅行技能。因此,人机交互服务器200向去哪儿旅行服务器301发送订票请求1203。订票请求1203中可以包含请求参数如“明天,上海,北京”。该请求参数可以为人机交互服务器200从文本1202中提取到的槽位对应的实体。此处对订票请求1203的具体形式不作限定。
如图12A所示,去哪儿旅行服务器301可以接收到订票请求1203。去哪儿旅行服务器301可以基于订票请求1203以及订票请求1203中包含的请求参数“明天,上海,北京”获取到订票结果1204。
2.反馈订票结果
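As shown in Figure 12A, the Qunar Travel server 301 may return booking result 1204 (flights from Shanghai to Beijing tomorrow) to the human-computer interaction server 200.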
如图12A所示,人机交互服务器200可以在接收到去哪儿旅行服务器301反馈的订票结果1204之后,向电子设备100发送订票结果1204。可选地,人机交互服务器200可以将订票页面发送给电子设备100。人机交互服务器200也可以将订票参数发送给电子设备100。电子设备100可以根据订票参数后生成订票页面。
如图12A所示,电子设备100可以在接收到人机交互服务器200发送的订票结果1204之后,输出(显示或语音播报)明天从上海到北京的订票结果。
图12B具体示出了人机对话系统10处理天气查询请求的过程。
1.发送查询请求
如图12B所示,电子设备100可以采集到用户语句1102。可选地,电子设备对用户语句1102进行语音识别后转换成文本1206。电子设备100将文本1206发送给人机交互服务器200。
如图12B所示,服务器200可以接收到文本1206。人机交互服务器200中对文本1206进行技能分类、意图分类和槽位提取。具体地,如图3所示的人机交互服务器200中的语义理解模块303可以对文本1206进行技能分类。然后,人机交互服务器200可以利用文本1206对应技能下的人机对话模型来对文本1206进行意图分类以及槽位提取。具体地,该人机对话模型可以是图5A中训练的人机对话模型。人机交互服务器200可以如表1205的形式保存文本1206对应的技能和槽位。
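As shown in Figure 12B, the entity at the city slot of text 1206, shown in table 1205, is the pronoun "there" (“那里”). Therefore, the human-computer interaction server 200 needs to check whether the city slot in text 1206 has a shared entity. As Figures 8 to 10 show, the city slot in the Moji Weather skill and the destination slot in the Qunar Travel skill have been configured to share entities. The human-computer interaction server 200 therefore directly shares the entity "Beijing" of the destination slot in table 1201, which is saved in memory, with the city slot in table 1205. In this way, the server knows that the specific intent of text 1206 is "check the weather in Beijing tomorrow". The human-computer interaction server 200 then sends query request 1207 to the Moji Weather server 302. Query request 1207 may contain request parameters such as "tomorrow, Beijing". The request parameters may be the entities corresponding to the slots that the server extracted from text 1206. The specific form of query request 1207 is not limited here.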
如图12B所示,墨迹天气服务器302可以接收到查询请求1207。墨迹天气服务器302可以基于查询请求1207以及查询请求1207中包含的参数“明天,北京”获取到查询结果1208。
2.反馈查询结果
如图12B所示,墨迹天气服务器302可以向人机交互服务器200返回查询结果1208(如北京明天的天气预报)。
如图12B所示,人机交互服务器200可以在接收到墨迹天气服务器302反馈的查询结果1208之后,向电子设备100发送查询结果1208。可选地,人机交互服务器200可以将天气预报页面发送给电子设备100。人机交互服务器200也可以将天气预报参数发送给电子设备100。电子设备100可以根据天气预报参数生成天气预报页面。
如图12B所示,电子设备100可以在接收到人机交互服务器200发送的查询结果1208之后,输出(显示或语音播报)明天北京的天气情况。
在本申请实施例一中,由于墨迹天气技能的城市槽位和去哪儿旅行技能的目的地槽位配置过了共享实体。所以,当用户语句1102中城市槽位对应实体为代词“那里”,人机交互服务器200仍能理解用户语句1102的“那里”是指“北京”。人机交互服务器200不需要向用户确认用户语句1102中的代词“那里”的含义。用户体验提升。
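The pronoun replacement in Embodiment 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the pronoun set, the dictionary layout of the saved tables, and the function names are illustrative and not taken from the application; the history list plays the role of table 1201 and the `shared` mapping plays the role of the entity-sharing configuration of Figures 8 to 10.

```python
from typing import Dict, Iterable, List, Optional, Tuple

SlotRef = Tuple[str, str]                   # (skill name, slot name)
Turn = Tuple[str, Dict[str, str]]           # (skill name, {slot: entity})

PRONOUNS = {"那里", "这里", "那天"}          # "there", "here", "that day"


def lookup_shared_entity(slot: SlotRef, history: List[Turn],
                         shared: Dict[SlotRef, Iterable[SlotRef]]) -> Optional[str]:
    """Search earlier turns, newest first, for an associated slot's entity."""
    for past_skill, past_slots in reversed(history):
        for assoc_skill, assoc_slot in shared.get(slot, ()):
            if assoc_skill != past_skill:
                continue
            entity = past_slots.get(assoc_slot)
            if entity and entity not in PRONOUNS:
                return entity
    return None


def resolve_pronouns(skill: str, slots: Dict[str, str], history: List[Turn],
                     shared: Dict[SlotRef, Iterable[SlotRef]]) -> Dict[str, str]:
    """Replace pronoun entities with entities of associated slots, if any."""
    resolved = dict(slots)
    for name, entity in slots.items():
        if entity in PRONOUNS:
            found = lookup_shared_entity((skill, name), history, shared)
            if found is not None:
                resolved[name] = found
    return resolved
```

For the dialog of Figure 11, the history holds the booking turn with destination "北京", and resolving the weather turn whose city slot is "那里" fills the city slot with "北京". A pronoun with no associated slot in the history is left unchanged, so a caller can still fall back to asking the user.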
实施例二
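Figure 13 shows an example of the human-computer dialog on which Embodiment 2 is based, a shared-entity scenario in which the entity is a time. As shown in Figure 13, the electronic device 100 may display human-computer dialog interface 130A.
First, the electronic device 100 may display the collected user sentence 1301, "What is the weather like in Beijing tomorrow", in interface 130A. Then the voice assistant in the electronic device 100 may return the query result to the user (not shown). Returning the query result may include, but is not limited to, the following two ways: way 1, the electronic device 100 may display the query result as a web page (not shown) in interface 130A; way 2, the electronic device 100 may announce the query result to the user by voice.
Next, the electronic device 100 may collect user sentence 1302, "Book a flight ticket to Beijing for that day". The human-computer interaction server 200 extracts from user sentence 1302 that the entity corresponding to the time slot is the pronoun "that day" (“那天”). The server determines that this time slot shares entities with the time slot corresponding to user sentence 1301. The server then replaces the pronoun "that day" with the entity "tomorrow" of the time slot corresponding to user sentence 1301. In this way, the electronic device 100 can return the correct booking result to the user. The voice assistant in the electronic device 100 may return the booking result to the user (not shown). Returning the booking result may include, but is not limited to, the following two ways: way 1, the electronic device 100 may display the booking result as a web page (not shown) in interface 130A; way 2, the electronic device 100 may announce the booking result to the user by voice.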
基于图13所示的人机对话,图14A-图14B示出了实施例二提供的语音交互方法在人机对话系统10的实施。
图14A具体示出了人机对话系统10处理天气查询请求的过程。
1.发送查询请求
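As shown in Figure 14A, the electronic device 100 may collect user sentence 1301. Optionally, the electronic device performs speech recognition on user sentence 1301 and converts it into text 1402. The electronic device 100 sends text 1402 to the human-computer interaction server 200.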
如图14A所示,服务器200可以接收到文本1402。人机交互服务器200中对文本1402进行技能分类、意图分类和槽位提取。具体地,如图3所示的人机交互服务器200中的语义理解模块303可以对文本1402进行技能分类。然后,人机交互服务器200可以利用文本1402对应技能下的人机对话模型来对文本1402进行意图分类以及槽位提取。具体地,该人机对话模型可以是图5A中训练的人机对话模型。人机交互服务器200可以如表1401的形式保存文本1402对应的技能和槽位。从表1401可以看出,文本1402对应的技能为墨迹天气技能。因此,人机交互服务器200向墨迹天气服务器302发送查询请求1403。查询请求1403中可以包含请求参数如“明天,北京”。该请求参数可以为人机交互服务器200从文本1402中提取到的槽位对应的实体,此处对查询请求1403的具体形式不作限定。
如图14A所示,墨迹天气服务器302可以接收到查询请求1403。墨迹天气服务器302可以基于查询请求1403以及查询请求1403中包含的参数“明天,北京”获取到查询结果1404(如明天北京的天气预报)。
2. Returning the query result
如图14A所示,墨迹天气服务器302可以向人机交互服务器200返回查询结果1404(如明天北京的天气预报)。
如图14A所示,人机交互服务器200可以在接收到墨迹天气服务器302反馈的查询结果(如明天北京的天气预报)1404之后,向电子设备100发送查询结果1404。可选地,人机交互服务器200可以将天气预报页面发送给电子设备100。人机交互服务器200也可以将天气预报参数发送给电子设备100。电子设备100可以根据天气预报参数后生成天气预报页面。
如图14A所示,电子设备100可以在接收到人机交互服务器200发送的查询结果1404之后,输出(显示或语音播报)明天北京的天气查询结果。
图14B具体示出了人机对话系统10处理订票请求的过程。
1.发送订票请求
如图14B所示,电子设备100可以采集到用户语句1302。可选地,电子设备对用户语句1302进行语音识别后转换成文本1406。电子设备100将文本1406发送给人机交互服务器200。
如图14B所示,服务器200可以接收到文本1406。人机交互服务器200中对文本1406进行技能分类、意图分类和槽位提取。具体地,如图3所示的人机交互服务器200中的语义理解模块303可以对文本1406进行技能分类。然后,人机交互服务器200可以利用文本1406对应技能下的人机对话模型来对文本1406进行意图分类以及槽位提取。具体地,该人机对话模型可以是图5A中训练的人机对话模型。人机交互服务器200可以如表1405的形式保存文本1406对应的技能和槽位。
如图14B所示,表1405中示出的文本1406中时间槽位处的实体为代词“那天”。因此,人机交互服务器200需要查询文本1406中的时间槽位是否有共享实体。去哪儿旅行技能中时间槽位和墨迹天气技能中时间槽位已配置过共享实体。去哪儿旅行技能中时间槽位和墨迹天气技能中时间槽位配置共享实体过程可参考图8-图10示出的共享实体配置过程。所以,人机交互服务器200会直接将存储器中保存的表1401中时间槽位对应的实体“明天”共享给表1405中的时间槽位。这样,人机交互服务器200就知道文本1406的具体意图是“订明天从深圳(GPS定位城市)到北京的机票”。然后,人机交互服务器200发送订票请求1407给去哪儿旅行服务器301。其中,订票请求1407中可以包含请求参数如“明天,深圳,北京”。该请求参数可以为人机交互服务器200从文本1406中提取的槽位对应的实体。此处对订票请求1407的具体形式不作限定。
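As shown in Figure 14B, the Qunar Travel server 301 may receive booking request 1407. Based on booking request 1407 and the parameters "tomorrow, Shenzhen, Beijing" contained in it, the Qunar Travel server 301 may obtain booking result 1408 (for example, flights from Shenzhen to Beijing tomorrow).
2. Returning the booking result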
如图14B所示,去哪儿旅行服务器301可以向人机交互服务器200返回订票结果1408(如明天深圳到北京的航班)。
如图14B所示,人机交互服务器200可以在接收到去哪儿旅行服务器301反馈的订票结果1408之后,向电子设备100发送订票结果1408。可选地,人机交互服务器200可以将订票页面发送给电子设备100。人机交互服务器200也可以将订票参数发送给电子设备100。电子设备100可以根据订票参数后生成订票页面。
如图14B所示,电子设备100可以在接收到人机交互服务器200发送的订票结果1408之后,输出(显示或语音播报)明天从深圳到北京的订票结果。
在本申请实施例二中,由于墨迹天气技能的时间槽位和去哪儿旅行技能的时间槽位配置过了共享实体。所以,当用户语句1302中时间槽位对应实体为代词“那天”,人机交互服务器200仍能理解用户语句1302的“那天”是指“明天”。人机交互服务器200不需要向用户确认语句1302中的代词“那天”的含义。用户体验提升。
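In the voice interaction method provided in this application, the human-computer interaction server 200 receives a first user sentence collected by the electronic device 100. The server extracts the entity of a first slot from the first user sentence; the first slot is a slot configured for a first intent; the first intent is an intent configured for a first skill, and the first skill is configured with one or more intents; the first intent and the first skill are determined by the server based on the first user sentence and match the service demand expressed by the first user sentence. If the entity of the first slot is a pronoun, the server modifies the entity of the first slot to the entity of a second slot; the second slot is configured as an associated slot of the first slot, and the entity of the second slot was extracted by the server from a second user sentence; the second user sentence was collected by the electronic device 100 before the first user sentence; the intent configured with the second slot is a second intent, which is configured as an associated intent of the first intent; the skill configured with the second intent is a second skill, which is configured as an associated skill of the first skill. The server sends a first service request to the third-party application server 300 and obtains from it a first service result that responds to the first service request; the first service request includes indication information of the first intent and the entity of the first slot; the third-party application server 300 is the application server that provides the first skill; the first service result is determined by the third-party application server 300 based on the first intent and the entity of the first slot. The human-computer interaction server 200 returns the first service result to the electronic device 100, and the first service result is output by the electronic device 100.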
图15示出了本申请实施例提供的一种语义解析方法的总体流程。下面展开:
阶段1:在前的语音交互(S101-S107)
S101、电子设备100采集用户语句A,并通过语音识别模块处理后发送给人机交互服务器200。
用户语句A可以是图15中示出的用户语句1501“查询明天北京的天气”。电子设备100中语音识别模块对用户语句1501进行语音识别。可选地,电子设备100发送人机交互服务器200的用户语句A可以是音频形式的,可以是文本形式的。此处不做限定。
S102、人机交互服务器200接收用户语句A。
用户在使用电子设备100与人机交互服务器200进行对话交互时,可以通过语音的形式,也可以通过文本的形式,向人机交互服务器200提出相应的服务需求。若用户以语音形式输入时,人机交互服务器200可以对语音进行识别,识别为文本形式,并输入到语义理解模块303中。若用户以文本形式输入时,则人机交互服务器200将用户输入的文本输入到语义理解模块303中。
其中,用户语句A可以是用户与人机交互服务器200的单轮对话中的一次语句,也可以是用户与人机交互服务器200的多轮对话中的多次语句,本申请实施例不做限定。
人机交互服务器200可以通过如图2中所示的通信连接101接收电子设备100发送的用户语句A。
S103、人机交互服务器200从用户语句A中提取槽位A的实体,槽位A为意图A被配置的槽位,意图A根据用户语句A确定,意图A为技能A被配置的意图。
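User sentence A can express the user's service demand, that is, a service that the user wants the human-computer interaction server 200 to provide. The semantic understanding module 303 may search and filter based on user sentence A to determine intent A corresponding to user sentence A and the slot information associated with the intent (including slot A).
Intent A is an intent (for example, the dialog intent "check weather") in skill A (for example, the weather query skill) on the human-computer interaction server 200. When configuring the skill, the skill developer configures the corresponding slots (for example, the city slot and the time slot) for intent A, that is, which slots intent A needs to extract and the attributes of each slot. Therefore, after determining intent A corresponding to user sentence A, the server may use the human-computer dialog model corresponding to intent A to output the slot configuration associated with intent A. For example, when user sentence A is "Check the weather in Beijing tomorrow", the server may determine that intent A is the dialog intent "check weather". The human-computer dialog model corresponding to the dialog intent "check weather" may output that the slots associated with the intent are the time slot and the city slot. The entity corresponding to the time slot is "tomorrow", and the entity corresponding to the city slot is "Beijing". Here, slot A may be the city slot.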
需要说明的是,有一些槽位的信息可以是用户默认设置的,或者可以通过其他方式(如,GPS定位)获取的信息,并不一定是从用户语句A中提取的。
S104、人机交互服务器200基于意图A和槽位A的实体,获取针对服务请求A的服务结果A;服务请求A包含意图A的指示信息和槽位A的实体。
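After obtaining the specific intent and the slot information associated with the intent, the human-computer interaction server 200 sends a service request to the third-party application server that has a mapping relationship with the intent. The mapping relationship between an intent and a third-party application server may be established before the server receives user sentence A, or when the server creates the skill; this is not limited here. For example, the dialog intent "check weather" corresponds to the Moji Weather server, and the dialog intent "book flight ticket" corresponds to the Qunar Travel server. Service request A may be a weather query request or a ticket booking request, which is not limited here. In this example, the intent obtained by the server is the dialog intent "check weather", whose slots are the time slot and the city slot. The server obtains the entity "tomorrow" corresponding to the time slot and the entity "Beijing" corresponding to the city slot. The server then sends a weather query request to the Moji Weather server. The weather query request includes the query time "tomorrow" and the query city "Beijing". Correspondingly, the service result obtained by the server may be tomorrow's weather forecast for Beijing.
S105. The third-party application 2 server 302 obtains service result A based on the received service request A and returns service result A to the human-computer interaction server 200.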
第三方应用2服务器302(如墨迹天气应用服务器)接收人机交互服务器发送的服务请求A(如查询天气请求)。第三方应用2服务器302根据服务请求A和服务请求A中携带的参数(如“明天,北京”)获取服务结果A(如明天北京的天气查询结果)。然后第三方应用2服务器302将服务结果A返回给人机交互服务器200。
S106、人机交互服务器接收服务结果A,将服务结果A发送给电子设备100。
具体地,人机交互服务器200发送的服务结果A可以是网页的形式。服务结果A也可以是参数的形式,由电子设备100生成相应的网页。此处不做限定。
S107、电子设备100接收服务结果A,并输出服务结果A。
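The electronic device 100 may display service result A (for example, tomorrow's weather forecast for Beijing) as a web page on the screen for the user to view. The electronic device 100 may also announce service result A to the user by voice. The form in which the electronic device 100 outputs service result A is not limited here.
Stage 2: the subsequent voice interaction (S108-S115)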
S108、电子设备100采集用户语句B,并通过语音识别模块处理后发送给人机交互服务器200。
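User sentence B collected by the electronic device 100 may be sentence 1502, "Book a flight ticket to go there tomorrow". Optionally, user sentence B may be in audio form or in text form.
Specifically, the electronic device 100 may send user sentence B to the human-computer interaction server 200 over communication connection 101 shown in Figure 2.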
S109、人机交互服务器200接收用户语句B。
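For the process by which the human-computer interaction server 200 receives user sentence B, refer to the process in step S102 by which the server receives user sentence A; details are not repeated here.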
S110、人机交互服务器200从用户语句B中提取出槽位B的实体;槽位B为意图B被配置的槽位;意图B根据用户语句B确定,意图B为技能B被配置的意图。
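The human-computer interaction server 200 identifies that the skill corresponding to user sentence B is the flight booking skill and that the intent corresponding to user sentence B is the dialog intent "book flight ticket". The server may also determine from user sentence B the entities corresponding to the slots associated with the dialog intent "book flight ticket" (such as the time slot, the departure slot, and the destination slot). Specifically, the entity corresponding to the time slot is "tomorrow", the entity corresponding to the departure slot is the user's current location, and the entity corresponding to the destination slot is the pronoun "there" (“那里”). How the server performs skill classification, intent recognition, and slot extraction has been described in step S103 and is not repeated here.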
S111、如果槽位B的实体为代词,人机交互服务器200将槽位B的实体修改为槽位A的实体,槽位A被配置为槽位B的关联槽位,技能A被配置为技能B的关联技能。
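Specifically, suppose the entity of slot B is a pronoun, as with the destination slot in user sentence 1502, "Book a flight ticket to go there tomorrow", whose entity is the pronoun "there" (“那里”). On its own, the human-computer interaction server 200 cannot determine what "there" refers to. Because the server has configured slot A and slot B to share entities, it replaces the entity of slot B with the entity of slot A. For example, slot A (the "city slot" in the "check weather" skill) and the "destination slot" in the "Qunar Travel" skill are configured to share entities. When the entity of the "destination slot" is a pronoun, the server replaces it with the entity of the "city slot". The process of configuring shared entities on the server is shown in Figures 8 to 10 and is not repeated here.
The same replacement applies in the embodiments above. As shown in Figure 12B, the human-computer interaction server 200 replaces the entity "there" of the city slot in table 1205 with the entity "Beijing" corresponding to the destination slot in table 1201. The entity corresponding to the city slot in table 1205 is then "Beijing". Here, the slot whose pronoun entity is replaced is the city slot in table 1205, and the pronoun "there" means "Beijing". Likewise, as shown in Figure 14B, the server replaces the entity "that day" of the time slot in table 1405 with the entity "tomorrow" corresponding to the time slot in table 1401. The entity corresponding to the time slot in table 1405 is then "tomorrow". Here, the slot whose pronoun entity is replaced is the time slot in table 1405, and the pronoun "that day" means "tomorrow".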
S112、人机交互服务器200基于意图B和槽位B的实体,从第三方应用服务器获取针对服务请求B的服务结果B;服务请求B包含意图B的指示信息和槽位B的实体。
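After obtaining the specific intent and the slot information corresponding to the intent, the human-computer interaction server 200 sends service request B (for example, "book a flight ticket from Shenzhen to Beijing tomorrow") to the third-party application server (for example, the Qunar Travel server) that has a mapping relationship with intent B (for example, the dialog intent "book flight ticket"). For example, the intent obtained by the server is the dialog intent "book flight ticket", whose slots are the time slot, the departure slot, and the destination slot. The server obtains the entity "tomorrow" corresponding to the time slot, the entity "Shenzhen" corresponding to the departure slot, and the entity "Beijing" corresponding to the destination slot. The server then sends service request B (for example, a flight booking request) to the Qunar Travel server. The flight booking request includes indication information of the dialog intent "book flight ticket", the time "tomorrow", the departure "Shenzhen", and the destination "Beijing". Service result B obtained by the server may be information on flights from Shenzhen to Beijing tomorrow. The indication information of the dialog intent "book flight ticket" may be the name of the intent, the ID of the intent, or the like, and is used to indicate the intent.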
S113、第三方应用1服务器301根据接收到的服务请求B获取服务结果B,向人机交互服务器200反馈服务结果B。
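The third-party application 1 server 301 receives service request B (for example, a booking request) sent by the human-computer interaction server 200. Then, based on service request B and the parameters of service request B (for example, "tomorrow, Shenzhen, Beijing"), the third-party application 1 server 301 obtains service result B (for example, flights from Shenzhen to Beijing tomorrow). The third-party application 1 server 301 then sends service result B to the human-computer interaction server 200. Specifically, it may send service result B over communication connection 102 shown in Figure 2.
S114. The human-computer interaction server 200 receives service result B and sends service result B to the electronic device 100.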
人机交互服务器200发送的服务结果B可以是网页的形式。服务结果B也可以是参数的形式,由电子设备100生成相应的网页。此处不作限定。
S115、电子设备100接收服务结果B,并输出服务结果B。
具体地,电子设备100可以将服务结果B以网页的形式显示在屏幕上供用户查看。电子设备100还可以将服务结果B语音播报给用户。此处不做限定。
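The service requests in steps S104 and S112 carry indication information of the intent together with the slot entities, and are sent to the third-party server mapped to that intent. The application does not limit the concrete form of the request, so the JSON layout, the intent identifiers, and the server names below are purely hypothetical choices for illustration.

```python
import json
from typing import Dict, Tuple

# Hypothetical mapping from an intent ID to the third-party server handling it.
INTENT_TO_SERVER: Dict[str, str] = {
    "check_weather": "moji-weather-server",
    "book_flight": "qunar-travel-server",
}


def build_service_request(intent_id: str, slots: Dict[str, str]) -> Tuple[str, str]:
    """Return (target server, request body) for an intent and its slot entities."""
    target = INTENT_TO_SERVER[intent_id]
    body = json.dumps({"intent": intent_id, "slots": slots},
                      ensure_ascii=False, sort_keys=True)
    return target, body


# Service request B after the pronoun in the destination slot has been resolved.
target, body = build_service_request(
    "book_flight",
    {"time": "明天", "departure": "深圳", "destination": "北京"},
)
```

Here the intent ID serves as the indication information; the text notes that the intent name could be used instead.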
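In the semantic parsing method provided in the embodiments of this application, the electronic device 100 collects a first user sentence and sends it to the human-computer interaction server 200. The server receives the first user sentence collected by the electronic device 100 and extracts the entity of a first slot from it; the first slot is a slot configured for a first intent; the first intent is an intent configured for a first skill, and the first skill is configured with one or more intents; the first intent and the first skill are determined by the server based on the first user sentence and match the service demand expressed by the first user sentence. If the entity of the first slot is a pronoun, the server modifies the entity of the first slot to the entity of a second slot; the second slot is configured as an associated slot of the first slot, and the entity of the second slot was extracted by the server from a second user sentence; the second user sentence was collected by the electronic device 100 before the first user sentence; the intent configured with the second slot is a second intent, which is configured as an associated intent of the first intent; the skill configured with the second intent is a second skill, which is configured as an associated skill of the first skill. The server sends a first service request to the third-party application server 300 and obtains from it a first service result that responds to the first service request; the first service request includes indication information of the first intent and the entity of the first slot; the third-party application server 300 is the application server corresponding to the first skill; the first service result is determined by the third-party application server 300 based on the first intent and the entity of the first slot. The server returns the first service result to the electronic device 100, which outputs it. The server does not need to ask the user what the pronoun means, which improves user experience.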
在本申请中图15提供的语义解析方法中步骤S101之前,本申请提供的语义解析方法还包括如图16所示的创建技能、技能间建群、配置技能共享等步骤。这些步骤具体如下:
S201、人机交互服务器200创建第三方应用对应的技能,创建的技能A中配置意图A,意图A中配置槽位A;创建的技能B中配置意图B,意图B中配置槽位B。
具体地,人机交互服务器200可以基于第三方应用服务器301(如墨迹天气服务器)提供的技能(墨迹天气查询技能)创建技能A(如“查天气”技能),技能A中配置意图A(如对话意图“查天气”),意图A中配置槽位A(如“城市槽位”)。人机交互服务器200可以基于第三方应用服务器302(如去哪儿旅行服务器)提供的技能(如去哪儿旅行订票技能)创建技能B(如“订机票”技能),技能B中配置意图B(如对话意图“订机票”),意图B中配置槽位B(如“目的地槽位”)。第三方应用可以是墨迹天气应用,也可以是淘 宝应用、京东应用等等,此处不做限制。关于如何创建技能,请参考上文对图4A-图4D示出的创建技能的过程的描述,此处不再赘述。
S202-S204、人机交互服务器200将技能A配置为技能B的关联技能。
具体地,人机交互服务器200接收提供技能A(如“墨迹天气”技能)的第三方应用服务器301(如墨迹天气服务器)发送的请求A。请求A用于将技能A(如“墨迹天气”技能)配置为技能B(如“去哪儿旅行”技能)的关联技能。请求A中包含技能A的指示信息和技能B的指示信息。技能A的指示信息可以是技能A的名称,也可以是技能A的ID等可以表示技能A的信息。技能B的指示信息可以是技能B的名称,也可以是技能B的ID等可以表示技能B的信息。人机交互服务器200将请求A以及技能A的指示信息和技能B的指示信息发送给提供技能B(如“去哪儿旅行”技能)的第三方应用服务器302(如去哪儿旅行服务器)。第三方应用服务器302收到请求A,返回针对请求A的响应A(如“同意”请求)。人机交互服务器200接收到响应(如“同意”)后将技能A配置为技能B的关联技能。然后,人机交互服务器200会保存技能A与技能B之间的关联关系。
人机交互服务器200将技能A(如“墨迹天气”技能)配置为技能B(如“去哪儿旅行”技能)的关联技能的过程可参考图7A-图7B所示出的墨迹天气技能被配置为去哪儿旅行技能关联技能的过程,此处不再赘述。
S205、人机交互服务器200接收第三方应用服务器302发送的请求B,请求B用于请求人机交互服务器200将槽位A配置为槽位B的关联槽位。请求B中包含槽位A的指示信息和槽位B的指示信息。
具体地,人机交互服务器200根据槽位A的指示信息和槽位B的指示信息将槽位A(如城市槽位)配置为槽位B(如目的地槽位)的关联槽位,即将槽位A与槽位B进行共享实体配置。槽位A的指示信息可以是槽位A的槽位名称,也可以是槽位A的ID等信息。槽位B的指示信息可以是槽位B的槽位名称,也可以是槽位B的ID等信息。共享实体的配置过程如图8-图10所示,此处不再赘述。
可选地,若槽位A(如城市槽位)被配置的实体来源于系统词库,则人机交互服务器200将槽位B(如目的地槽位)的槽位名称与槽位A的槽位名称关联;系统词库使得配置的实体来源于系统词库的所有槽位的实体集合相同;槽位B被配置的实体来源于该系统词库;若槽位A(如城市槽位)被配置的实体来源于第一自定义词库,则人机交互服务器200将槽位B(如目的地槽位)的槽位名称与槽位A的槽位名称关联;人机交互服务器200将第一自定义词库与第二自定义词库关联;第一自定义词库为槽位A被配置的实体集合;第二自定义词库为所述槽位B的被配置实体集合;所述槽位A被配置的实体集合与槽位B被配置的实体集合不同。
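The two cases in the paragraph above, one for a shared system lexicon and one for custom lexicons, can be sketched as follows. The `SlotConfig` type, the function names, and the rule that a lexicon whose name starts with `sys.` is a system lexicon are assumptions based only on the example names in the text (sys.time, sys.location.city); the lexicon name `custom.moji.city` is invented for the example. The handling of a slot pair that matches neither case is also an assumption.

```python
from typing import NamedTuple, Set, Tuple


class SlotConfig(NamedTuple):
    skill: str
    name: str
    lexicon: str   # configured entity source


def is_system_lexicon(lexicon: str) -> bool:
    # Assumption: system lexicons are recognisable by a "sys." prefix.
    return lexicon.startswith("sys.")


def associate_slots(slot_a: SlotConfig, slot_b: SlotConfig,
                    name_links: Set[Tuple[Tuple[str, str], Tuple[str, str]]],
                    lexicon_links: Set[Tuple[str, str]]) -> None:
    """Configure slot A as the associated slot of slot B."""
    # The slot names are associated in both cases.
    name_links.add(((slot_b.skill, slot_b.name), (slot_a.skill, slot_a.name)))
    same_system = (is_system_lexicon(slot_a.lexicon)
                   and slot_a.lexicon == slot_b.lexicon)
    if not same_system:
        # Entity sets differ, so the two lexicons are associated as well.
        lexicon_links.add((slot_a.lexicon, slot_b.lexicon))
```

When both slots draw on the same system lexicon their entity sets are identical, so linking the names is enough; otherwise the lexicons themselves are linked so that an entity from one can be mapped into the other.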
在本申请提供的实施例中,人机交互服务器通过创建技能、技能间建群、配置技能共享等配置。当人机交互服务器接收到用户语句中对应的槽位为代词时,人机交互服务器可以通过获取关联槽位的实体来替代该代词。从而,人机交互服务器可以知道该代词的含义。
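In addition, this application provides another semantic parsing method. When the human-computer interaction server 200 has not extracted an entity corresponding to the second slot from the second input, and no shared entity has been configured for the second slot on the server, this method can use a scoring and ranking model to find a candidate entity to fill the second slot.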
图17示出了本申请提供的另一种人机对话方法的总体流程。下面展开:
S301、人机交互服务器200接收电子设备100采集到的用户语句A。
步骤S102中已对人机交互服务器200接收电子设备100采集的用户语句A的进行过描述,此处不再赘述。电子设备100采集用户语句A在步骤S101中已有描述,此处不再赘述。
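S302. The human-computer interaction server 200 extracts the entity of slot A from user sentence A; slot A is a slot configured for intent A; intent A is determined from user sentence A; intent A is an intent configured for skill A.
For step S302, refer to step S103; details are not repeated here.
S303-S308. A scoring and ranking model is used to find a candidate entity to replace the entity of slot A.
S303. If the entity of slot A is a pronoun, the human-computer interaction server 200 extracts the entities corresponding to all slots of user sentence B, where user sentence B was received by the server before user sentence A.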
人机交互服务器200提取出对话管理模块中保存的用户语句B中的槽位以及槽位对应的实体。举例来说,假设人机交互服务器200保存的用户语句B为“明天北京天气怎么样”,其槽位为时间槽位和城市槽位。时间槽位对应实体为“明天”,城市槽位对应实体为“北京”。人机交互服务器200会将时间槽位的实体“明天”以及城市槽位的实体“北京”都提取出来。用户语句B可以是用户与人机交互服务器200的单轮对话中的一次语句,也可以是用户与人机交互服务器200的多轮对话中的多次语句,本申请实施例不做限定。
S304、人机交互服务器200找出与槽位A的实体信息类型相同的K个候选实体。
人机交互服务器200根据槽位A的信息来筛选保存的槽位以及对应的实体信息,例如,若槽位A对应的实体是地点,那么筛选出来的候选实体也是表示地点的实体。这样得到K个候选实体,K为大于1的自然数。举例来说,假设槽位A是城市槽位,对应的实体就应该是地点类型的。若人机交互服务器200从用户语句B中提取到的槽位以及对应的实体有:“时间槽位,明天”,“时间槽位,今天”,“出发地槽位,北京”,“目的地槽位,上海”,“城市槽位,深圳”。那么人机交互服务器200会选择“北京”,“上海”,“深圳”作为候选实体。
S305、人机交互服务器200将K个候选实体分别替换槽位A的实体得到K个候选语句。
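The human-computer interaction server 200 fills the K candidate entities, one at a time, into slot A in user sentence A, obtaining K candidate sentences. For example, suppose user sentence A is "Book a flight ticket to go there tomorrow". The intent of user sentence A is ticket booking, and the slots under the ticket booking intent are the time slot, the departure slot, and the destination slot. In user sentence A, the entity of the time slot is "tomorrow"; the entity of the departure slot does not appear but may default to the GPS-located city (for example, Shenzhen); the destination slot has only the pronoun "there" (“那里”). The server therefore needs to find the entity for the destination slot. Suppose the candidate entities found by the server in step S304 are "Beijing" and "Shanghai". Then the candidate sentences are "Book a flight ticket to Beijing tomorrow" and "Book a flight ticket to Shanghai tomorrow".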
S306、人机交互服务器200利用自然语言理解模型识别K个候选语句,并输出K个候选语句的语义和对应的置信度。
举例来说,假设候选语句1为“订一张明天去北京的机票”。候选语句2为“订一张明天去上海的机票”。那么人机交互服务器200会利用自然语言理解模型输出候选语句1和候选语句2的语义以及置信度。候选语句1的置信度为0.9,候选语句2的置信度为0.9。
S307、人机交互服务器200将K个候选语句中对应置信度大于预设值的M个候选语句利用打分排序模型进行排序,其中,M<=K。
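For example, suppose the confidence threshold is 0.85. The confidences of candidate sentence 1 and candidate sentence 2 from step S306 are then both greater than the threshold. The human-computer interaction server 200 ranks candidate sentence 1 and candidate sentence 2 with the scoring and ranking model. Here K=2 and M=2. The scoring and ranking model may be a model built with a neural network, or a model built with a sorting algorithm such as bubble sort or selection sort; this is not limited here. The training data of the scoring and ranking model may come from online questionnaires. A questionnaire presents a dialog scenario, for example: the user first says "Book a flight ticket from Shenzhen to Beijing" and then says "What is the weather like there". Respondents are asked to score whether "there" refers to "Shenzhen" or "Beijing". The respondents' scores are then tallied, and the highest-scoring result is taken as the output of the scoring and ranking model.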
S308、人机交互服务器200将打分最高的候选语句中的候选实体替代槽位A的实体。
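Specifically, suppose candidate sentence 1 from step S306 scores 90 and candidate sentence 2 scores 95. The human-computer interaction server 200 then selects "Shanghai" to fill slot A.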
S309、人机交互服务器200基于意图A和槽位A的实体,获取针对服务请求A的服务结果A;服务请求A包括意图A的指示信息和槽位A的实体。
步骤S309可参考步骤S104,此处不再赘述。
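Steps S303 to S308 can be sketched as a single function: collect entities of the right type from the earlier sentence, substitute each for the pronoun, keep the candidates whose confidence exceeds the threshold, and take the top-scored one. The function name and signature are assumptions, and the `confidence` and `score` callables merely stand in for the natural language understanding model and the scoring and ranking model, whose internals the application does not fix.

```python
from typing import Callable, List, Optional, Tuple


def pick_entity_by_ranking(sentence: str, pronoun: str, slot_type: str,
                           history: List[Tuple[str, str]],
                           confidence: Callable[[str], float],
                           score: Callable[[str], float],
                           threshold: float) -> Optional[str]:
    """Return the candidate entity whose rewritten sentence ranks highest.

    `history` holds (entity, entity type) pairs from the earlier sentence.
    """
    # S304: keep entities of the same type as the slot to be filled.
    candidates = [entity for entity, etype in history if etype == slot_type]
    # S305: substitute each candidate for the pronoun.
    rewritten = [(e, sentence.replace(pronoun, e, 1)) for e in candidates]
    # S306-S307: drop candidates at or below the confidence threshold.
    kept = [(e, s) for e, s in rewritten if confidence(s) > threshold]
    if not kept:
        return None
    # S308: choose the entity of the highest-scored candidate sentence.
    best_entity, _ = max(kept, key=lambda pair: score(pair[1]))
    return best_entity
```

With the numbers used in the text (both candidates at confidence 0.9, a threshold of 0.85, and scores of 90 for the Beijing sentence and 95 for the Shanghai sentence), the function returns "上海". Returning `None` when nothing passes the threshold leaves room for a fallback such as asking the user.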
基于图17示出的另一语义解析方法,图18示出了该方法的示例性应用。
S401、电子设备100获取用户语句1803“现在打车去那里”。
电子设备100当前获取的用户语句1803为“现在打车去那里”。电子设备100之前还为用户提供过人机交互服务,如图18中所述的电子设备100在接收用户语句1803之前接收过用户语句1801,并基于用户语句1801给出了执行结果1802。
S402、人机交互服务器200接收电子设备100发送的用户语句1803,通过语义理解模块分析技能和意图以及提取槽位。
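After receiving user sentence 1803, the human-computer interaction server 200 analyzes the text through the semantic understanding module. The server determines that the skill corresponding to user sentence 1803 is "taxi service", the intent is "take a taxi", and the slots are "time" and "taxi destination". However, the entity of the "taxi destination" slot is the pronoun "there" (“那里”). The server needs to check whether the "taxi service" skill has a shared skill; if so, a location-type shared entity can be extracted from the shared skill to replace the entity "there" of the "taxi destination" slot.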
S403、人机交互服务器200未查询到共享技能,调用对话管理模块来查询用户语句1801中的槽位以及实体信息。
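The human-computer interaction server 200 first checks whether there is a shared skill; finding none, it retrieves the historical dialog turns from the dialog management module. In this embodiment, the historical turn is user sentence 1801, "Use Amap (高德) to check the traffic from Huawei to KFC". In user sentence 1801, the entity of the "departure" slot is "Huawei" and the entity of the "destination" slot is "KFC". "Huawei" and "KFC" are both location entities, the same type as the entity of the "taxi destination" slot.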
S404、人机交互服务器200调用对话管理模块将用户语句1801的实体替换用户语句1803中的槽位“打车目的地”的实体“那里”,得到候选语句。
人机交互服务器200调用对话管理模块用实体“华为”和“肯德基”分别替换用户语句1803中的槽位“打车目的地”,得到候选语句1“打车去华为”和候选语句2“打车去肯德基”。
S405、人机交互服务器200通过语义理解模块对候选语句进行语义识别。
人机交互服务器200通过语义理解模块303得到候选语句1和候选语句2的语义识别结果和置信度。“华为”和“肯德基”替换槽位“打车目的地”的实体的置信度均为0.9。人机交互服务器200可以预设置信度阈值,筛掉低于预设置信度阈值的候选语句。本申请实施例中的预设置信度阈值为0.8,候选语句1和候选语句2均高于预设置信度阈值。
S406、人机交互服务器200通过对话管理模块中的打分排序模型对高于置信度阈值的候选输入进行打分排序,选择打分最高的候选语句中的实体来替代槽位“打车目的地”的实体“那里”。
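The human-computer interaction server 200 feeds candidate sentence 1 and candidate sentence 2 to the scoring and ranking model and obtains the ranking result. As shown in step S406 of Figure 18, candidate sentence 2, "Take a taxi to KFC", ranks first with a score of 98, and candidate sentence 1, "Take a taxi to Huawei", ranks second with a score of 95. Therefore "KFC", from the top-ranked and higher-scoring sentence, is selected as the entity of the "taxi destination" slot, and the corresponding taxi service is executed.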
S407、人机交互服务器200通过自然语言生成模块将收到的执行打车服务的结果生成自然语言反馈给用户。
人机交互服务器200将打车意图以及槽位信息发给对应打车技能的服务器,获得打车技能服务器返回的打车结果,人机交互服务器200中的自然语言生成模块将打车结果生成自然语言后发送给电子设备100。
S408、电子设备100给用户展示打车服务结果。
电子设备100显示打车服务页面或者语音播报打车结果,此处不做限定。
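In the semantic parsing method provided in the embodiments of this application, the human-computer interaction server uses a scoring and ranking model to find an entity to replace the pronoun in a user sentence. The server can therefore learn what the pronoun in the user sentence means without asking the user, which improves user experience.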
可以理解的是,上述终端等为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请实施例能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明实施例的范围。
本申请实施例可以根据上述方法示例对上述终端等进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本发明实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
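Figure 19 is a schematic diagram of the hardware structure of a server 200 disclosed in the embodiments of this application. The server 200 includes at least one processor 201, at least one memory 202, and at least one communication interface 203. Optionally, the server 200 may also include an output device and an input device, which are not shown in the figure.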
处理器201、存储器202和通信接口203通过总线相连接。处理器201可以是一个通用中央处理器(Central Processing Unit,CPU)、微处理器、特定应用集成电路(Application-Specific Integrated Circuit,ASIC),或者一个或多个用于控制本申请方案程序执行的集成电路。处理器201也可以包括多个CPU,并且处理器201可以是一个单核(single-CPU)处理器或多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路或用于处理数据(例如计算机程序指令)的处理核。
存储器202可以是只读存储器(Read-Only Memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备、随机存取存储器(Random Access Memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器202可以是独立存在,通过总线与处理器201相连接。存储器202也可以和处理器201集成在一起。其中,存储器202用于存储执行本申请方案的应用程序代码,并由处理器201来控制执行。处理器201用于执行存储器202中存储的计算机程序代码,从而实现本申请实施例中所述人机交互的方法。
通信接口203,可用于与其他设备或通信网络通信,如以太网,无线局域网(wireless local area networks,WLAN)等。
输出设备和处理器通信,可以以多种方式来显示信息。例如,输出设备可以是液晶显示器(Liquid Crystal Display,LCD),发光二级管(Light Emitting Diode,LED)显示设备,阴极射线管(Cathode Ray Tube,CRT)显示设备,或投影仪(projector)等。输入设备和处理器通信,可以以多种方式接收用户的输入。例如,输入设备可以是鼠标、键盘、触摸屏设备或传感设备等。
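Figure 20 is a schematic structural diagram of an electronic device 100 disclosed in the embodiments of this application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.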
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
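The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.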
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial  interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信 号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and so on. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
The display 194 is used to display images, video, and so on. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP 还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或 发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
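The pressure sensor 180A senses pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation using the pressure sensor 180A. The electronic device 100 can also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different intensities can correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, the instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.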
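The threshold rule in the short message example can be sketched in a few lines of Python. This is only an illustration: the function name, the instruction labels, and the numeric threshold are assumptions, since the description names a "first pressure threshold" without giving a value or an API.

```python
# Hypothetical normalized threshold; the description gives no numeric value.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity: float,
                         threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction.

    Below the threshold the messages are viewed; at or above it a new
    message is created, mirroring the example in the description.
    """
    if touch_intensity < threshold:
        return "view_short_messages"
    return "create_short_message"
```

A light tap such as `sms_icon_instruction(0.2)` selects the view instruction, while a firm press at or above the threshold selects the create instruction.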
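The gyroscope sensor 180B can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (namely the x, y, and z axes) can be determined through the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle by which the electronic device 100 shakes, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shake of the electronic device 100 by moving in the opposite direction. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios.
The air pressure sensor 180C measures air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.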
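The description does not say how altitude is derived from pressure. One common choice, shown here purely as an assumption, is the international barometric formula with a standard sea-level pressure of 1013.25 hPa; the function and constant names are illustrative.

```python
SEA_LEVEL_PRESSURE_HPA = 1013.25  # standard atmosphere; an assumed reference value


def altitude_m(pressure_hpa: float,
               sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Estimate altitude in metres with the international barometric formula.

    h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

In practice the reference pressure varies with the weather, so a device would typically calibrate `sea_level_hpa` from another source before relying on the result.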
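The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip cover. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D, and then set features such as automatic unlocking on flip-open according to the detected open or closed state of the cover or the flip.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to identify the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and similar applications.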
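As a rough sketch of how the gravity direction can drive landscape/portrait switching, the following Python function compares the gravity components along the device axes. The axis convention (x along the short edge, y along the long edge, z out of the screen) and the returned labels are assumptions, not taken from the description.

```python
def screen_orientation(ax: float, ay: float, az: float) -> str:
    """Classify device posture from accelerometer readings taken at rest (m/s^2).

    Assumed axes: x along the short edge, y along the long edge,
    z perpendicular to the screen.
    """
    if abs(az) > max(abs(ax), abs(ay)):
        # Gravity is mostly perpendicular to the screen: device lies flat,
        # so a caller would normally keep the previous orientation.
        return "flat"
    if abs(ay) >= abs(ax):
        return "portrait"
    return "landscape"
```

A real implementation would usually add filtering and hysteresis so that small hand movements do not flip the screen back and forth.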
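The distance sensor 180F measures distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scene, the electronic device 100 can use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so that the screen is turned off automatically to save power. The proximity light sensor 180G can also be used for automatic unlocking and screen locking in holster mode and pocket mode.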
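The "sufficient reflected light" decision and the screen-off behaviour during a call can be sketched as follows. The normalized reading and the threshold value are hypothetical; the description does not quantify what counts as sufficient.

```python
# Hypothetical normalized threshold for "sufficient" reflected infrared light.
REFLECTION_THRESHOLD = 0.6


def object_nearby(reflected_ir: float,
                  threshold: float = REFLECTION_THRESHOLD) -> bool:
    """Return True when enough reflected infrared light is detected."""
    return reflected_ir >= threshold


def screen_should_turn_off(in_call: bool, reflected_ir: float) -> bool:
    """Turn the screen off only while a call is active and an object is close."""
    return in_call and object_nearby(reflected_ir)
```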
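The ambient light sensor 180L senses the brightness of ambient light. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos, and can cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H collects fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to the application lock, fingerprint-triggered photography, fingerprint-based call answering, and the like.
The temperature sensor 180J detects temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature handling policy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing the electronic device 100 to shut down abnormally. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.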
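The three threshold rules of the temperature handling policy can be pictured with the sketch below. The threshold values and action names are assumptions chosen for illustration, and the description presents the three rules as separate embodiments; they are combined in one function here only for compactness.

```python
from typing import List


def thermal_actions(temp_c: float,
                    high: float = 45.0,        # hypothetical over-temperature threshold
                    low_heat: float = 0.0,     # hypothetical battery-heating threshold
                    low_boost: float = -10.0   # hypothetical voltage-boost threshold
                    ) -> List[str]:
    """Return the protective actions triggered by the reported temperature."""
    actions: List[str] = []
    if temp_c > high:
        # Reduce the performance of the processor near the sensor.
        actions.append("throttle_processor")
    if temp_c < low_heat:
        # Warm the battery to avoid a low-temperature shutdown.
        actions.append("heat_battery")
    if temp_c < low_boost:
        # Boost the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions
```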
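The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch-control screen". The touch sensor 180K detects touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone mass that vibrates when a person speaks. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 can parse a voice signal from the vibration signal of the vocal bone mass acquired by the bone conduction sensor 180M to implement voice functions. The application processor can parse heart rate information from the blood pressure pulsation signal acquired by the bone conduction sensor 180M to implement heart rate detection.
The button 190 includes a power button, a volume button, and the like. The button 190 may be a mechanical button or a touch button. The electronic device 100 can receive button input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration prompts and for touch vibration feedback. For example, touch operations acting on different applications (such as photography or audio playback) can correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light. It can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by inserting it into or removing it from the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and the cards may be of the same type or of different types. The SIM card interface 195 can also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from it.
In this application, the microphone 170C can collect the user's voice, and the processor 110 processes the user's voice collected by the microphone 170C. The mobile communication module 150 and the wireless communication module 160 can then establish a communication connection with the human-computer interaction server 200, for example the communication connection 101 shown in FIG. 2. The display screen 194 can show the user the voice processing result fed back by the human-computer interaction server 200. The speaker 170A and the receiver 170B can announce to the user the voice processing result fed back by the human-computer interaction server 200.
From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is only an example. In actual applications, the functions above can be allocated to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the systems, apparatuses, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes flash memory, removable hard disks, read-only memory, random access memory, magnetic disks, optical discs, and other media that can store program code.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.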

Claims (14)

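  1. A semantic parsing method, comprising:
    extracting, by a first server, an entity of a first slot from a first user sentence, wherein the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; and the first intention and the first skill are determined by the first server based on the first user sentence and match the service requirement expressed by the first user sentence;
    when the entity of the first slot is a pronoun, modifying, by the first server, the entity of the first slot to an entity of a second slot, wherein the second slot is configured as an associated slot of the first slot; the entity of the second slot is extracted by the first server from a second user sentence; the second user sentence is received by the first server before the first user sentence; the second slot is a slot configured for a second intention, and the second intention is configured as an associated intention of the first intention; and the second intention is an intention configured for a second skill, and the second skill is configured as an associated skill of the first skill; and
    sending, by the first server, a first service request to a second server, and obtaining, from the second server, a first service result in response to the first service request, wherein the first service request comprises at least indication information of the first intention and the entity of the first slot; the second server is an application server corresponding to the first skill; and the first service result is determined by the second server based on the indication information of the first intention and the entity of the first slot.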
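  2. The method according to claim 1, wherein the first server receives the first user sentence collected by an electronic device, and the first user sentence is a user sentence in audio form or a user sentence in text form.
  3. The method according to claim 1, wherein before the first server extracts the entity of the first slot from the first user sentence, the method comprises:
    receiving, by the first server, an associated skill request sent by the second server, wherein the associated skill request is used to request that the second skill be configured as an associated skill of the first skill, and the associated skill request comprises indication information of the first skill and indication information of the second skill;
    in response to the associated skill request, obtaining, by the first server, confirmation information from a third server, wherein the third server is an application server corresponding to the second skill, and the confirmation information is used by the third server to confirm that the second skill is to be configured as an associated skill of the first skill; and
    configuring, by the first server based on the confirmation information, the second skill as an associated skill of the first skill.
  4. The method according to claim 3, wherein before the first server extracts the entity of the first slot from the first user sentence, the method comprises:
    receiving, by the first server, an associated slot request sent by the second server, wherein the associated slot request is used to request that the second slot be configured as an associated slot of the first slot, and the associated slot request comprises indication information of the first slot and indication information of the second slot; and
    in response to the associated slot request, configuring, by the first server, the second slot as an associated slot of the first slot.
  5. The method according to claim 4, wherein the configuring, by the first server, the second slot as an associated slot of the first slot comprises:
    determining, by the first server, whether the slot type of the first slot is the same as the slot type of the second slot; and
    if they are the same, configuring, by the first server, the second slot as an associated slot of the first slot.
  6. The method according to claim 5, wherein the configuring, by the first server, the second slot as an associated slot of the first slot comprises:
    if the entities configured for the first slot come from a system lexicon, associating, by the first server, the slot name of the second slot with the slot name of the first slot, wherein the system lexicon is a lexicon provided by the first server to all skills; the system lexicon makes the entity sets of all slots whose configured entities come from the same system lexicon identical; and the source of the entities configured for the second slot is the same as the source of the entities configured for the first slot; and
    when the entities configured for the first slot come from a first custom lexicon, associating, by the first server, the slot name of the second slot with the slot name of the first slot, and associating, by the first server, the first custom lexicon with a second custom lexicon, wherein the first custom lexicon is the entity set configured for the first slot; the first custom lexicon is a lexicon created by the first server for the first skill; the first custom lexicon contains a finite number of words; the second custom lexicon is the entity set configured for the second slot; the second custom lexicon is a lexicon created by the first server for the second skill; and the second custom lexicon contains a finite number of words.
  7. The method according to claim 1, wherein the first service result is output by the electronic device, and the manner of output comprises at least displaying the first service result on the screen of the electronic device and announcing the first service result by voice on the electronic device.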
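  8. A semantic parsing method, comprising:
    receiving, by a second server, a first service request sent by a first server, wherein the first service request comprises indication information of a first intention and an entity of a first slot; when the entity of the first slot extracted from a first user sentence is a pronoun, the entity of the first slot has been modified from the pronoun to an entity of a second slot; the second slot is configured as an associated slot of the first slot; the first user sentence is collected by an electronic device and sent to the first server; the first slot is a slot configured for the first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; the second server is an application server corresponding to the first skill; the first skill and the first intention are determined by the first server based on the first user sentence and match the service requirement expressed by the first user sentence; the second user sentence is collected by the electronic device before the first user sentence; the second slot is a slot configured for a second intention, and the second intention is an intention configured for a second skill; the second skill is configured as an associated skill of the first skill; and the second skill and the second intention are determined by the first server based on the second user sentence and match the service requirement expressed by the second user sentence; and
    in response to the first service request, sending, by the second server, a first service result to the first server, wherein the first service result is determined by the second server based on the indication information of the first intention and the entity of the first slot.
  9. The method according to claim 8, wherein before the second server receives the first service request sent by the first server, the method comprises:
    sending, by the second server, an associated skill request to the first server, wherein the associated skill request is used to request that the second skill be configured as an associated skill of the first skill, and the associated skill request comprises indication information of the first skill and indication information of the second skill.
  10. The method according to claim 8, wherein before the second server receives the first service request sent by the first server, the method comprises:
    sending, by the second server, an associated slot request to the first server, wherein the associated slot request is used to request that the second slot be configured as an associated slot of the first slot, and the associated slot request comprises indication information of the first slot and indication information of the second slot.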
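  11. A semantic parsing method, comprising:
    extracting, by a first server, an entity of a first slot from a first user sentence, wherein the first user sentence is a user sentence received by the first server; the first slot is a slot configured for a first intention; the first intention is an intention configured for a first skill, and the first skill is configured with one or more intentions; and the first intention and the first skill are determined by the first server based on the first user sentence and match the service requirement expressed by the first user sentence;
    when the entity of the first slot is a pronoun, modifying, by the first server, the entity of the first slot to a first candidate entity corresponding to a first candidate sentence, wherein the first candidate sentence is the candidate sentence with the highest score after M candidate sentences are scored and ranked; the M candidate sentences are the candidate sentences, among K candidate sentences, whose semantic recognition confidence is greater than a confidence threshold; the K candidate sentences are obtained by replacing the entity of the first slot in the first user sentence with each of K candidate entities; the K candidate entities are entities of a second slot extracted by the first server from a second user sentence; the slot type of the second slot is the same as the slot type of the first slot; the second user sentence is received by the first server before the first user sentence; K>=1; and M<=K;
    obtaining, by the first server based on the first intention and the entity of the first slot, a first service result for a first service request, wherein the first service request comprises indication information of the first intention and the entity of the first slot; and
    returning, by the first server, the first service result to the electronic device, wherein the first service result is determined by a second server based on the indication information of the first intention and the entity of the first slot, and the second server is an application server corresponding to the first skill.
  12. A server, applied in a human-computer dialogue system, comprising a communication interface, a memory, and a processor, wherein the communication interface and the memory are coupled to the processor; the memory is configured to store computer program code, and the computer program code comprises computer instructions; and when the processor reads the computer instructions from the memory, the server is caused to perform the method according to any one of claims 1 to 11.
  13. A computer storage medium, comprising computer instructions, wherein when the computer instructions are run on a server, the server is caused to perform the method according to any one of claims 1 to 11.
  14. A computer program product, wherein when the computer program product is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 11.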
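The two pronoun-resolution strategies recited in claims 1 and 11 can be pictured with a small Python sketch. It is not the applicant's implementation: the pronoun list, the `SlotRef` structure, the association table, and the confidence and scoring callables are all hypothetical stand-ins for components that the claims describe only functionally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

PRONOUNS = {"there", "it", "that place"}  # hypothetical pronoun list


@dataclass(frozen=True)
class SlotRef:
    """Identifies a slot by the skill and intention it is configured for."""
    skill: str
    intent: str
    slot: str


def resolve_by_association(current: SlotRef, entity: str,
                           associations: Dict[SlotRef, List[SlotRef]],
                           history: Dict[SlotRef, str]) -> str:
    """Claim 1 style: replace a pronoun with the entity of an associated slot
    that was filled by an earlier user sentence."""
    if entity not in PRONOUNS:
        return entity
    for earlier in associations.get(current, []):
        if earlier in history:
            return history[earlier]
    return entity  # nothing to substitute; leave the pronoun unchanged


def resolve_by_candidates(sentence: str, pronoun: str, candidates: List[str],
                          confidence: Callable[[str], float],
                          score: Callable[[str], float],
                          threshold: float) -> Optional[str]:
    """Claim 11 style: build K rewritten sentences, keep the M whose semantic
    recognition confidence exceeds the threshold, return the best-scored entity."""
    rewritten = [(sentence.replace(pronoun, c, 1), c) for c in candidates]  # K
    kept = [(s, c) for s, c in rewritten if confidence(s) > threshold]      # M
    if not kept:
        return None
    return max(kept, key=lambda pair: score(pair[0]))[1]


# Usage with made-up skills: an earlier flight request filled "destination".
weather_city = SlotRef("weather", "query_weather", "city")
flight_dest = SlotRef("flight", "book_ticket", "destination")
resolved = resolve_by_association(
    weather_city, "there",
    associations={weather_city: [flight_dest]},
    history={flight_dest: "Beijing"},
)
```

The first function relies on associations configured in advance between skills, intentions, and slots, while the second needs no such configuration and instead filters and ranks candidate rewrites, which is the distinction the two independent claims draw.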
PCT/CN2020/086002 2019-04-30 2020-04-22 Semantic parsing method and server WO2020221072A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/607,657 US11900924B2 (en) 2019-04-30 2020-04-22 Semantic parsing method and server
EP20798047.5A EP3951773A4 (en) 2019-04-30 2020-04-22 SEMANTIC ANALYSIS METHOD AND SERVER

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910370839.7 2019-04-30
CN201910370839.7A CN110111787B (zh) 2019-04-30 2019-04-30 一种语义解析方法及服务器

Publications (1)

Publication Number Publication Date
WO2020221072A1 true WO2020221072A1 (zh) 2020-11-05

Family

ID=67488248

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086002 WO2020221072A1 (zh) 2019-04-30 2020-04-22 Semantic parsing method and server

Country Status (4)

Country Link
US (1) US11900924B2 (zh)
EP (1) EP3951773A4 (zh)
CN (1) CN110111787B (zh)
WO (1) WO2020221072A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112882679A (zh) * 2020-12-21 2021-06-01 广州橙行智动汽车科技有限公司 一种语音交互的方法和装置

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111787B (zh) 2019-04-30 2021-07-09 华为技术有限公司 一种语义解析方法及服务器
CN110797012B (zh) * 2019-08-30 2023-06-23 腾讯科技(深圳)有限公司 一种信息提取方法、设备及存储介质
CN110798506B (zh) * 2019-09-27 2023-03-10 华为技术有限公司 执行命令的方法、装置及设备
CN112579217B (zh) * 2019-09-27 2023-10-03 百度在线网络技术(北京)有限公司 技能展示方法及装置、服务器、电子设备和存储介质
CN110705267B (zh) * 2019-09-29 2023-03-21 阿波罗智联(北京)科技有限公司 语义解析方法、装置及存储介质
CN110688473A (zh) * 2019-10-09 2020-01-14 浙江百应科技有限公司 一种机器人动态获取信息的方法
CN110795547B (zh) * 2019-10-18 2023-04-07 腾讯科技(深圳)有限公司 文本识别方法和相关产品
CN112786022B (zh) * 2019-11-11 2023-04-07 青岛海信移动通信技术股份有限公司 终端、第一语音服务器、第二语音服务器及语音识别方法
CN111128153B (zh) * 2019-12-03 2020-10-02 北京蓦然认知科技有限公司 一种语音交互方法及装置
CN111191018B (zh) * 2019-12-30 2023-10-20 华为技术有限公司 对话系统的应答方法和装置、电子设备、智能设备
CN111177358B (zh) * 2019-12-31 2023-05-12 华为技术有限公司 意图识别方法、服务器及存储介质
CN111294616A (zh) * 2020-01-16 2020-06-16 北京钛星数安科技有限公司 一种自适应远程浏览视觉流传输方法
CN111402888B (zh) * 2020-02-19 2023-12-08 北京声智科技有限公司 语音处理方法、装置、设备及存储介质
CN113555015A (zh) * 2020-04-23 2021-10-26 百度在线网络技术(北京)有限公司 语音交互方法、语音交互设备、电子设备及存储介质
CN111681647B (zh) * 2020-06-10 2023-09-05 北京百度网讯科技有限公司 用于识别词槽的方法、装置、设备以及存储介质
CN113806469A (zh) * 2020-06-12 2021-12-17 华为技术有限公司 语句意图识别方法及终端设备
CN112148847B (zh) * 2020-08-27 2024-03-12 出门问问创新科技有限公司 一种语音信息的处理方法及装置
CN112738207B (zh) * 2020-12-25 2023-06-16 青岛海尔科技有限公司 关键字数据的传输方法及装置、存储介质、电子装置
CN112559723A (zh) * 2020-12-28 2021-03-26 广东国粒教育技术有限公司 一种基于深度学习的faq检索式问答构建方法及系统
CN113591470A (zh) * 2021-06-24 2021-11-02 海信视像科技股份有限公司 一种语义理解方法及装置
US20230008868A1 (en) * 2021-07-08 2023-01-12 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
CN117373445A (zh) * 2022-07-01 2024-01-09 华为技术有限公司 一种语音指令处理方法、装置、系统以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462053A (zh) * 2013-09-22 2015-03-25 江苏金鸽网络科技有限公司 一种文本内的基于语义特征的人称代词指代消解方法
CN107886948A (zh) * 2017-11-16 2018-04-06 百度在线网络技术(北京)有限公司 语音交互方法及装置,终端,服务器及可读存储介质
CN107943793A (zh) * 2018-01-10 2018-04-20 威盛电子股份有限公司 自然语言的语义解析方法
US10019434B1 (en) * 2012-06-01 2018-07-10 Google Llc Resolving pronoun ambiguity in voice queries
CN108920497A (zh) * 2018-05-23 2018-11-30 北京奇艺世纪科技有限公司 一种人机交互方法及装置
CN109241524A (zh) * 2018-08-13 2019-01-18 腾讯科技(深圳)有限公司 语义解析方法及装置、计算机可读存储介质、电子设备
CN110111787A (zh) * 2019-04-30 2019-08-09 华为技术有限公司 一种语义解析方法及服务器

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8238541B1 (en) * 2006-01-31 2012-08-07 Avaya Inc. Intent based skill-set classification for accurate, automatic determination of agent skills
US9171542B2 (en) 2013-03-11 2015-10-27 Nuance Communications, Inc. Anaphora resolution using linguisitic cues, dialogue context, and general knowledge
US9754591B1 (en) 2013-11-18 2017-09-05 Amazon Technologies, Inc. Dialog management context sharing
CN104572626A (zh) * 2015-01-23 2015-04-29 北京云知声信息技术有限公司 语义模板自动生成方法、装置和语义分析方法、系统
US10055403B2 (en) * 2016-02-05 2018-08-21 Adobe Systems Incorporated Rule-based dialog state tracking
US10467509B2 (en) * 2017-02-14 2019-11-05 Microsoft Technology Licensing, Llc Computationally-efficient human-identifying smart assistant computer
US11004444B2 (en) * 2017-09-08 2021-05-11 Amazon Technologies, Inc. Systems and methods for enhancing user experience by communicating transient errors
US10878808B1 (en) * 2018-01-09 2020-12-29 Amazon Technologies, Inc. Speech processing dialog management
CN109063035B (zh) * 2018-07-16 2021-11-09 哈尔滨工业大学 一种面向出行领域的人机多轮对话方法
US11455987B1 (en) * 2019-03-06 2022-09-27 Amazon Technologies, Inc. Multiple skills processing
US11461311B2 (en) * 2019-04-26 2022-10-04 Oracle International Corporation Bot extensibility infrastructure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3951773A4

Also Published As

Publication number Publication date
US20220208182A1 (en) 2022-06-30
CN110111787A (zh) 2019-08-09
US11900924B2 (en) 2024-02-13
CN110111787B (zh) 2021-07-09
EP3951773A1 (en) 2022-02-09
EP3951773A4 (en) 2022-05-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20798047

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020798047

Country of ref document: EP

Effective date: 20211026

NENP Non-entry into the national phase

Ref country code: DE