WO2021196981A1 - Voice interaction method and apparatus, and terminal device - Google Patents

Voice interaction method and apparatus, and terminal device

Info

Publication number
WO2021196981A1
WO2021196981A1 PCT/CN2021/079479 CN2021079479W WO2021196981A1 WO 2021196981 A1 WO2021196981 A1 WO 2021196981A1 CN 2021079479 W CN2021079479 W CN 2021079479W WO 2021196981 A1 WO2021196981 A1 WO 2021196981A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
entity information
target
user
historical
Prior art date
Application number
PCT/CN2021/079479
Other languages
French (fr)
Chinese (zh)
Inventor
刘杰
张晴
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021196981A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a voice interaction method, device and terminal equipment.
  • Natural language processing is an important part of artificial intelligence (Artificial Intelligence, AI), and its typical application scenarios include task-oriented dialogue systems and machine translation.
  • AI Artificial Intelligence
  • DST Dialogue State Tracking
  • the DST method based on machine learning requires the model to understand the content of multiple rounds of dialogue well. This places extremely high demands on the model and largely limits the accuracy of this type of DST method.
  • due to the high degree of abstraction of natural language and the complexity of multiple rounds of dialogue, it is difficult for current machine learning technology to fully and accurately understand multiple rounds of dialogue in practical application scenarios; that is, it is difficult to accurately track the state of multiple rounds of dialogue and determine user intent.
  • the embodiments of the present application provide a voice interaction method, device, and terminal device, which can solve the prior-art problems that the state of multiple rounds of dialogue is difficult to track and the user's intention cannot be accurately determined.
  • an embodiment of the present application provides a voice interaction method, including:
  • the historical dialogue data is obtained, and the target entity information in the user sentence and the historical entity information in the historical dialogue data are identified through the named entity recognition model; then, the key entity information associated with the user sentence is extracted from the historical entity information, and the current user sentence can be rewritten according to the target entity information and the key entity information to generate a target interaction sentence; by outputting a reply sentence corresponding to the target interaction sentence, the user's interaction needs can be met.
  • the dialogue state tracking problem in multiple rounds of dialogue can thus be converted into a single-round dialogue problem, which makes it convenient to use existing mature single-round dialogue technology to reply to the user's intention, helps to improve the accuracy of user intention recognition, and enhances the language processing ability of the dialogue system.
  • when the key entity information associated with the user sentence is extracted from the historical entity information, the candidate user intents that match the user sentence can first be determined based on the target entity information and the historical entity information; then the distribution probability of each piece of historical entity information in the historical dialogue data is calculated, so that the key entity information can be extracted from the historical entity information according to the distribution probability and the candidate user intents.
  • separately calculating the distribution probability of each piece of historical entity information in the historical dialogue data can be implemented by calling a preset pointer generation network model. The pointer generation network model includes an encoding module, which can be used to separately encode each piece of historical entity information to obtain the corresponding distribution probability.
  • the candidate entity information corresponding to the target probability value can be determined to be the key entity information, where the target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
  • when generating the target interactive sentence based on the target entity information and key entity information, the target basic sentence can be determined first, and then the target entity information and key entity information can be used to rewrite the target basic sentence to obtain the target interactive sentence, which reduces the difficulty of directly generating the target interactive sentence.
  • multiple basic sentences can be obtained from user sentences containing target entity information and historical dialogue data containing key entity information; the matching degree between each basic sentence and the entity information to be evaluated can then be calculated, and the basic sentence corresponding to the maximum matching degree is identified as the current target basic sentence.
  • the above-mentioned entity information to be evaluated includes all target entity information and key entity information.
  • any basic sentence may respectively include multiple semantic slots
  • the matching degree between the entity information to be evaluated and each basic sentence may be based on the ratio between the number of key slots and the number of semantic slots in the basic sentence. Therefore, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated can first be counted; then the number of key slots in the basic sentence that match the entity information to be evaluated is determined; after the ratio between the number of key slots and the number of semantic slots in the basic sentence is calculated, the ratio can be used as the matching degree between the entity information to be evaluated and the basic sentence.
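The ratio-based matching degree described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `BasicSentence` type, the slot-type names, and the rule that a slot counts as a "key slot" when an entity of the same slot type is available are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BasicSentence:
    """Hypothetical container: a candidate sentence and its semantic slot types."""
    text: str
    semantic_slots: List[str]


def matching_degree(sentence: BasicSentence, entities: Dict[str, str]) -> float:
    """Ratio of key slots (slots filled by an entity to be evaluated) to all slots.

    `entities` maps an assumed slot type to an entity value.
    """
    if not sentence.semantic_slots:
        return 0.0
    key_slots = [slot for slot in sentence.semantic_slots if slot in entities]
    return len(key_slots) / len(sentence.semantic_slots)


def select_target_basic_sentence(candidates: List[BasicSentence],
                                 entities: Dict[str, str]) -> BasicSentence:
    """Pick the candidate with the maximum matching degree."""
    return max(candidates, key=lambda s: matching_degree(s, entities))


# Illustrative data only; the slot types below are invented for this sketch.
entities_to_evaluate = {"date": "Friday", "weather_attribute": "temperature", "city": "Beijing"}
weather_query = BasicSentence("What is the temperature on Friday?",
                              ["date", "weather_attribute", "city"])
restaurant_query = BasicSentence("Please tell me the nearest restaurant.",
                                 ["city", "cuisine", "price_range"])
target = select_target_basic_sentence([weather_query, restaurant_query], entities_to_evaluate)
```

With these assumed slots, the weather query fills all three of its slots while the restaurant query fills only one of three, so the weather query is selected.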
  • the pointer generation network model may also include a decoding module, which may be obtained by training on multiple kinds of training data, including multiple pieces of entity information and the basic sentence corresponding to each piece of entity information. Therefore, the decoding module can be used to rewrite the target basic sentence and generate the target interactive sentence. Specifically, if the target basic sentence is the current user sentence, the decoding module can be used to decode the key entity information and the target basic sentence and output the target interactive sentence; if the target basic sentence is a user sentence in the historical dialogue data, the decoding module can be used to decode the target entity information, the key entity information and the target basic sentence, and output the target interactive sentence.
  • after the target interaction sentence is obtained, it is also possible to verify whether the rewritten target interaction sentence is correct.
  • This embodiment provides a two-layer verification mechanism. First, multiple pieces of entity information in the target interaction sentence can be extracted, and it can be verified whether they match the preset semantic slots of the target user's intention in the knowledge base, where the target user's intention is any one of the candidate user intents.
  • the target interactive sentence can be verified a second time according to the sentence type of the target interactive sentence. In the second verification, it is possible to judge whether the target interactive sentence is a task-type sentence by calling a preset natural language understanding model.
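The two-layer verification can be sketched as a small decision function. This is an illustration under stated assumptions: the knowledge base is modelled as a plain dictionary from intent name to expected slot types, "matches" is interpreted as every expected slot being covered, and `is_task_sentence` is a hypothetical stand-in for the preset natural language understanding model.

```python
from typing import Callable, Dict, Set


def verify_target_sentence(sentence: str,
                           entity_slot_types: Set[str],
                           target_intent: str,
                           knowledge_base: Dict[str, Set[str]],
                           is_task_sentence: Callable[[str], bool]) -> str:
    """Return "reply" if the rewritten sentence passes either check, else "reprompt"."""
    # Layer 1: do the extracted entities cover the intent's preset semantic slots?
    expected_slots = knowledge_base.get(target_intent, set())
    if expected_slots and expected_slots <= entity_slot_types:
        return "reply"
    # Layer 2: fall back to the sentence type reported by the NLU model.
    if is_task_sentence(sentence):
        return "reply"
    # Neither check passed: ask the user to re-enter the sentence.
    return "reprompt"


# Illustrative knowledge base; intent and slot names are invented for this sketch.
kb = {"query_weather": {"city", "date", "weather_attribute"}}
```

A caller would output the reply sentence on "reply" and prompt the user to re-enter the user sentence on "reprompt".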
  • an embodiment of the present application provides a voice interaction device, including:
  • the historical dialogue data acquisition module is used to acquire historical dialogue data when the user sentence to be replied is received
  • the target entity information identification module is used to identify the target entity information in the user sentence.
  • the historical entity information identification module is used to identify the historical entity information in the historical dialogue data
  • a key entity information extraction module for extracting key entity information associated with the user sentence from the historical entity information
  • a target interactive sentence generating module configured to generate a target interactive sentence according to the target entity information and the key entity information
  • the reply sentence output module is used to output the reply sentence corresponding to the target interactive sentence.
  • the key entity information extraction module may specifically include the following submodules:
  • a candidate user intention determination sub-module configured to determine a candidate user intention that matches the user sentence according to the target entity information and the historical entity information;
  • the distribution probability calculation sub-module is used to separately calculate the distribution probability of each historical entity information in the historical dialogue data
  • the key entity information extraction sub-module is used to extract key entity information from the historical entity information according to the distribution probability and the candidate user's intention.
  • the distribution probability calculation submodule may specifically include the following units:
  • the first pointer generation network model calling unit is configured to call a preset pointer generation network model, and use the encoding module of the pointer generation network model to separately encode each piece of historical entity information to obtain the distribution probability corresponding to each piece of historical entity information.
  • the key entity information extraction submodule may specifically include the following units:
  • a candidate entity information extraction unit configured to extract candidate entity information associated with any candidate user's intention from the historical entity information
  • the key entity information extraction unit is configured to extract candidate entity information whose distribution probability is greater than a preset probability threshold as key entity information.
  • the key entity information extraction submodule may further include the following units:
  • the query sentence generating unit is configured to: if the target probability value is less than the preset probability threshold and the difference between the two is less than a preset difference, generate, according to the candidate entity information corresponding to the target probability value and the key entity information, a query sentence that asks the user to confirm the candidate entity information corresponding to the target probability value;
  • the key entity information determining unit is configured to determine the candidate entity information corresponding to the target probability value as key entity information when the user's confirmation of the query sentence is received, where the target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
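The threshold rule and the near-threshold confirmation step handled by these units can be sketched as below. The function name, the threshold of 0.8, and the allowed difference of 0.1 are assumptions chosen for illustration; the application does not state concrete values.

```python
from typing import Dict, List, Tuple


def split_candidates(probabilities: Dict[str, float],
                     threshold: float,
                     max_difference: float) -> Tuple[List[str], List[str]]:
    """Split candidate entities into key entities and entities needing confirmation.

    An entity above the threshold is key entity information. An entity below
    the threshold, but by less than `max_difference`, is put to the user in a
    query sentence. Everything else is discarded as redundant.
    """
    key_entities: List[str] = []
    to_confirm: List[str] = []
    for entity, probability in probabilities.items():
        if probability > threshold:
            key_entities.append(entity)
        elif probability < threshold and threshold - probability < max_difference:
            to_confirm.append(entity)
    return key_entities, to_confirm


def apply_confirmation(key_entities: List[str], entity: str, confirmed: bool) -> List[str]:
    """Promote a near-threshold entity to key entity information if the user confirms it."""
    return key_entities + [entity] if confirmed else list(key_entities)


# Hypothetical distribution probabilities and parameters.
probs = {"Temperature": 0.86, "Friday": 0.72, "Haidilao": 0.05}
keys, pending = split_candidates(probs, threshold=0.8, max_difference=0.1)
```

Under these assumed parameters, "Temperature" is accepted directly, "Friday" is put to the user for confirmation, and "Haidilao" is dropped.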
  • the target interactive sentence generation module may specifically include the following submodules:
  • the target basic sentence determination sub-module is used to determine the target basic sentence
  • the target interactive sentence generating sub-module is used to use the target entity information and the key entity information to rewrite the target basic sentence to generate a target interactive sentence.
  • the target basic sentence determination submodule may specifically include the following units:
  • the basic sentence obtaining unit is configured to obtain a plurality of basic sentences from the user sentence containing the target entity information and the historical dialogue data containing the key entity information;
  • a matching degree calculation unit configured to calculate the matching degree between the plurality of basic sentences and the entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
  • the target basic sentence identification unit is used to identify the basic sentence corresponding to the maximum matching degree as the current target basic sentence.
  • any basic sentence may respectively include multiple semantic slots
  • the matching degree calculation unit may specifically include the following sub-units:
  • the statistics subunit is used to count the number of semantic slots in the basic sentence and the number of entity information to be evaluated for any basic sentence;
  • the determining subunit is used to determine the number of key slots in the basic sentence that match the entity information to be evaluated;
  • the calculation subunit is used to calculate the ratio between the number of key slots and the number of semantic slots in the basic sentence, and use the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
  • the pointer generation network model may also include a decoding module, which is obtained by training a variety of training data.
  • the aforementioned training data includes multiple pieces of entity information and the basic sentence corresponding to each piece of entity information; the aforementioned target interactive sentence generating sub-module may specifically include the following units:
  • the second pointer generation network model calling unit is configured to: if the target basic sentence is the user sentence, use the decoding module to decode the key entity information and the target basic sentence, and output a target interactive sentence; if the target basic sentence is from the historical dialogue data, use the decoding module to decode the target entity information and the target basic sentence, and output a target interactive sentence.
  • the target interactive sentence generation submodule may further include the following units:
  • the target interactive sentence entity information extraction unit is used to extract multiple entity information in the target interactive sentence
  • the target interactive sentence verification unit is used to verify whether the multiple pieces of entity information in the target interactive sentence match a preset semantic slot of the target user's intention in the knowledge base, where the target user's intention is any one of the candidate user intentions; if the entity information in the target interaction sentence matches the semantic slot of the target user's intention, it is determined that the generated target interaction sentence is correct, and a reply sentence corresponding to the target interaction sentence is output; if the entity information in the target interaction sentence does not match the semantic slot of the target user's intention, the target interaction sentence is verified according to its sentence type.
  • the target interactive sentence verification unit is further configured to call a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence; if the target interactive sentence is a task-type sentence, the reply sentence corresponding to the target interactive sentence is output; if the target interactive sentence is not a task-type sentence, the user is prompted to re-enter the user sentence, and the target interactive sentence is generated again according to the re-entered user sentence.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the voice interaction method described in any one of the foregoing first aspect is implemented.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor of a terminal device, the voice interaction method described in any one of the foregoing first aspect is implemented.
  • the embodiments of the present application provide a computer program product, which when the computer program product runs on a terminal device, causes the terminal device to execute the voice interaction method described in any one of the above-mentioned first aspects.
  • the embodiments of the present application include the following beneficial effects:
  • the actual intention of the user can be determined based on the above two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate a target interactive sentence, so that applications such as the voice assistant in the terminal device can respond according to the target interactive sentence.
  • the existing mature single-round dialogue technology can be used to reply to the user's intention, and the accuracy of dialogue state tracking and user intention recognition can be improved. It can improve the natural language processing capabilities of the dialogue system, and enhance the rationality of the dialogue system’s reply during multiple rounds of dialogue, so that the system’s reply can better match the actual needs of the user and reduce the number of interactions between the user and the dialogue system.
  • FIG. 1 is a schematic diagram of the operation process of a multi-round dialogue state tracking scheme based on knowledge base reasoning in the prior art
  • FIG. 2 is a schematic diagram of the operation process of a multi-round dialogue state tracking solution based on a learning model in the prior art
  • FIG. 3 is a schematic diagram of the operation process of a voice interaction method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the hardware structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
  • FIG. 6 is a schematic diagram of the software structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
  • FIG. 7 is a schematic step flowchart of a voice interaction method provided by an embodiment of the present application.
  • FIG. 8 is a schematic step flowchart of a voice interaction method provided by another embodiment of the present application.
  • FIG. 9 is a schematic diagram of a calculation process of the distribution probability of entity information provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the operation process of a voice interaction method provided by another embodiment of the present application.
  • FIG. 11 is a structural block diagram of a voice interaction device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of the operation process of a multi-round dialogue state tracking solution in the prior art.
  • This scheme is a scheme based on question and answer (Question&Answering, QA) knowledge base reasoning, and its specific operation process is as follows:
  • the corresponding candidate multi-round dialogue set can be obtained, as shown in box 102 in FIG. 1.
  • the similarity between the current dialogue and the candidate dialogue is calculated.
  • the specific strategies include: calculating the semantic similarity between the current input and each candidate question as the first similarity; calculating the semantic similarity between the context of the current input and the context of each candidate question as the second similarity; and calculating the similarity between the summary information of the current multiple rounds of dialogue and that of each candidate multiple rounds of dialogue as the third similarity.
  • the weighted summation of the three similarities obtains the similarity between each candidate question and the current input, and the response corresponding to the candidate question with the largest similarity is used as the output response. This step is shown in box 103 in FIG. 1.
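The weighted summation in this prior-art scheme can be illustrated with a short sketch. The weights and similarity values below are invented for the example; the scheme as described does not specify them.

```python
from typing import List, Sequence, Tuple


def combined_similarity(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the first, second and third similarities."""
    return sum(w * s for w, s in zip(weights, similarities))


def pick_response(candidates: List[Tuple[str, Sequence[float]]],
                  weights: Sequence[float]) -> str:
    """Return the response of the candidate question with the largest combined similarity."""
    best_response, _ = max(candidates, key=lambda c: combined_similarity(c[1], weights))
    return best_response


# Hypothetical weights and per-candidate (first, second, third) similarities.
weights = (0.5, 0.3, 0.2)
candidates = [
    ("response A", (0.9, 0.2, 0.1)),
    ("response B", (0.6, 0.8, 0.7)),
]
chosen = pick_response(candidates, weights)
```

Here candidate B wins even though candidate A has the higher first similarity, because the context and summary similarities also contribute to the score.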
  • the key information extracted from multiple rounds of dialogue is not prioritized, that is, key information related to the current round of input is not singled out, and the redundant information that is extracted affects the accuracy of dialogue state tracking;
  • the accuracy of dialogue state tracking largely depends on the coverage of the knowledge base, and given the complexity of natural language dialogue in real scenes, it is difficult in practice to obtain an ideal knowledge base with extensive coverage;
  • the way the scheme obtains state tracking results depends on various predefined rules, which also greatly limits the generalization ability and robustness of the model.
  • FIG. 2 is a schematic diagram of the operation process of another multi-round dialogue state tracking solution in the prior art.
  • This solution is a DST solution based on a learning model. It tracks the state information of each round of dialogue in turn, and updates the state of the current round of dialogue through a copy flow mechanism, thereby tracking the long-term dialogue state.
  • the specific operation process includes steps S201-S204:
  • the key information in the current round of dialogue and the previous round of dialogue is extracted through the semi-supervised neural network model, and the keyword sequences corresponding to the above two rounds of sentences are generated.
  • a new encoder-decoder network based on the copy-stream mechanism is adopted to express dialogue status information by displaying a sequence of words.
  • the copy flow mechanism can transmit the information flow of the dialogue history through copying, and finally participate in the generation of the target sentence for the current round of dialogue replies.
  • the decoder module is used to automatically generate the target sentence of the current round of dialogue reply, and then complete the response to the user's inquiry.
  • the key information of historical dialogue is extracted based on the semi-supervised neural network model, which may lead to the loss or mis-extraction of key information, which will affect the understanding of historical dialogue;
  • tracking the historical dialogue round by round and updating the dialogue state at each round easily leads to higher time complexity of the model and to error accumulation.
  • this solution relies too heavily on the model's ability to understand historical dialogue, and it is difficult for current encoder-decoder networks to reach an accuracy that meets the needs of practical scenarios.
  • the core idea of the embodiments of the present application is that, in the multi-round dialogue state tracking process of the dialogue system, the sentence of the current round is rewritten based on the key information in the historical dialogue to complete the information omitted from the current round of dialogue, thereby converting a multi-round dialogue problem into a single-round dialogue problem.
  • the voice interaction method provided by the embodiments of the present application tracks user intentions based on key information of historical dialogues, overcomes the shortcomings of various solutions in the prior art to a certain extent, and improves the accuracy of state tracking in multiple rounds of dialogue and the accuracy of the dialogue system's responses.
  • the voice interaction method uses a named entity recognition (Named Entity Recognition, NER) module to extract entity information from historical dialogues, and then uses knowledge bases (Knowledge Bases, KBs) and a pointer-generator network (Pointer-Generator Networks, PGN) model to calculate the attention distribution between the entity information in the historical dialogue and the entities in the current round of dialogue. By filtering the entity information in the historical dialogue, redundant entities are discarded and the key information that participates in the current round of dialogue state tracking is determined. Such a processing method not only reduces the impact of redundant information on dialogue state tracking, but also provides effective key information for subsequent steps.
  • NER Named Entity Recognition
  • KBs Knowledge Bases
  • PGN Pointer-Generator Networks
  • the role of key entity information in the dialogue state (represented by the distribution probability) is calculated, and a feedforward neural network is used as part of the overall model to determine whether the key entity information directly affects the dialogue state. This avoids tracking the dialogue state round by round and reduces the accumulation of errors.
  • a mature single-round dialogue related module is used to generate multiple rounds of dialogue reply sentences to improve the accuracy of the dialogue system's response.
  • FIG. 3 is a schematic diagram of the operation process of the voice interaction method provided by an embodiment of the present application.
  • this method first extracts key information directly related to the dialogue state in the historical dialogue, and then rewrites the current round of sentences in combination with the model to complete the tracking and fusion of dialogue state information. Then, using the existing single-round dialogue processing module, on the basis of rewriting the sentence, the corresponding reply to the user's inquiry in the multi-round dialogue is completed.
  • the operation process of this method can be realized by the following multiple modules:
  • the probability distribution of key information is calculated based on a supervised feedforward network.
  • the decoder link of the PGN model is used to complete or rewrite the current round of sentences.
  • FIG. 4 is a schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application.
  • the user’s query sentence is: "Please tell me the nearest restaurant.”
  • the dialogue system replied: "The nearest restaurant is on Nongda South Road, Haidian District.
  • the current dialogue can be regarded as the current round of dialogue, and the first two rounds of dialogue are historical dialogue.
  • entities in the historical dialogue can be extracted by the entity extraction module, including "Nearest", "Restaurant", "Nongda South Road in Haidian District", "Haidilao", "Friday" and "Temperature".
  • the entity screening module is then used, in combination with the predefined KBs and the PGN model, to calculate the probability distribution of the above entities in the historical dialogue and obtain the key entities related to the dialogue state, namely "Friday" and "Temperature".
  • other entities, such as "Nearest", "Restaurant", "Nongda South Road in Haidian District" and "Haidilao", are redundant entities.
  • the dialogue system can determine the basic rewrite sentence based on the key entities obtained, that is, "What is the temperature on Friday?"
  • the key information distribution prediction module can be used to further predict the probability distribution of key information related to the current dialogue state based on the feedforward network, and the probability of obtaining "temperature” is 0.86, and the probability of "Friday” is 0.72.
  • the dialogue system can invite the user to confirm, that is, the dialogue system in FIG. 4 can ask the user: "Do you want to query the temperature information for "Friday" in Beijing?" The sentence rewriting module is then used to generate the rewritten sentence "What is the temperature in Beijing on Friday?"
  • the dialogue system can use the generating reply module to generate a reply to the rewritten sentence, that is, the reply of the dialogue system in Figure 4: "The temperature in Beijing on Friday is."
  • the basic rewritten sentence may come from historical dialogue data or from the current dialogue. That is, the basic sentence may be selected from a certain user sentence in the historical dialogue or the current user sentence.
  • the selection criterion can be the number of pieces of target entity information and key entity information contained in a sentence. For example, in the above example, the historical dialogue sentence "What is the temperature on Friday?" contains two pieces of key entity information, "Friday" and "Temperature", while the current dialogue sentence "Beijing" contains only one piece of target entity information, so the historical dialogue sentence "What is the temperature on Friday?" can be chosen as the basic sentence for rewriting.
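The count-based choice in this example can be sketched as follows. Matching entities by case-insensitive substring search is an assumption made only to keep the sketch short; a real system would rely on the entity recognition results rather than string matching.

```python
from typing import List


def count_entities(sentence: str, entities: List[str]) -> int:
    """Count how many of the given entities appear in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return sum(1 for entity in entities if entity.lower() in lowered)


def choose_basic_sentence(sentences: List[str], entities: List[str]) -> str:
    """Choose the sentence containing the most target and key entity information."""
    return max(sentences, key=lambda s: count_entities(s, entities))


# Sentences and entities taken from the FIG. 4 example.
entities = ["Friday", "Temperature", "Beijing"]
sentences = ["What is the temperature on Friday?", "Beijing"]
basic_sentence = choose_basic_sentence(sentences, entities)
```

The historical sentence contains two of the entities and the current sentence only one, so the historical sentence is chosen, in line with the example.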
  • the voice interaction method provided by the embodiments of this application can be applied to mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA).
  • AR augmented reality
  • VR virtual reality
  • UMPC ultra-mobile personal computer
  • netbooks
  • PDA personal digital assistant
  • Fig. 5 shows a block diagram of a part of the structure of a mobile phone provided by an embodiment of the present application.
  • the mobile phone includes: a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (Wi-Fi) module 570, a processor 580, and a power supply 590.
  • the RF circuit 510 can be used for receiving and sending signals during information transmission or communication. In particular, after downlink information is received from the base station, it is passed to the processor 580 for processing; in addition, the relevant uplink data is sent to the base station.
  • the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • the RF circuit 510 can also communicate with the network and other devices through wireless communication.
  • the above-mentioned wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Email, Short Messaging Service (SMS), etc.
  • the memory 520 may be used to store software programs and modules.
  • the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520.
  • the memory 520 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created through the use of the mobile phone (such as audio data, phone book, etc.), etc.
  • the memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the input unit 530 may be used to receive inputted digital or character information, and generate key signal input related to user settings and function control of the mobile phone 500.
  • the input unit 530 may include a touch panel 531 and other input devices 532.
  • the touch panel 531, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 531 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the touch panel 531 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 580, and it can also receive and execute the commands sent by the processor 580.
  • the touch panel 531 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 530 may also include other input devices 532.
  • the other input device 532 may include, but is not limited to, one or more of a physical keyboard, function keys (such as a volume control button, a switch button, etc.), a trackball, a mouse, and a joystick.
  • the display unit 540 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
  • the display unit 540 may include a display panel 541.
  • the display panel 541 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • the touch panel 531 can cover the display panel 541. When the touch panel 531 detects a touch operation on or near it, the operation is transmitted to the processor 580 to determine the type of the touch event, and the processor 580 then provides corresponding visual output on the display panel 541 according to the type of the touch event.
  • the touch panel 531 and the display panel 541 are used as two independent components to realize the input and output functions of the mobile phone, but in some embodiments, the touch panel 531 and the display panel 541 can be integrated to realize the input and output functions of the mobile phone.
  • the mobile phone 500 may also include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor can include an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light.
  • the proximity sensor can turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear.
  • the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three-axis), and can detect the magnitude and direction of gravity when it is stationary.
  • the audio circuit 560, the speaker 561, and the microphone 562 can provide an audio interface between the user and the mobile phone.
  • the audio circuit 560 can transmit the electric signal converted from the received audio data to the speaker 561, and the speaker 561 converts it into a sound signal for output; on the other hand, the microphone 562 converts the collected sound signal into an electric signal, which the audio circuit 560 receives and converts into audio data. The audio data is then output to the processor 580 for processing and sent to, for example, another mobile phone via the RF circuit 510, or output to the memory 520 for further processing.
  • Wi-Fi is a short-distance wireless transmission technology.
  • through the Wi-Fi module 570, the mobile phone can help users send and receive emails, browse web pages, and access streaming media; it provides users with wireless broadband Internet access.
  • although FIG. 5 shows the Wi-Fi module 570, it can be understood that it is not a necessary component of the mobile phone 500 and can be omitted as needed without changing the essence of the invention.
  • the processor 580 is the control center of the mobile phone. It uses various interfaces and lines to connect the various parts of the entire mobile phone, and it performs the various functions of the mobile phone and processes data, thereby monitoring the mobile phone as a whole.
  • the processor 580 may include one or more processing units; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, application programs, etc., and the modem processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 580.
  • the mobile phone 500 also includes a power source 590 (such as a battery) for supplying power to various components.
  • the power source can be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the mobile phone 500 may also include a camera.
  • the position of the camera on the mobile phone 500 may be front or rear, which is not limited in the embodiment of the present application.
  • the mobile phone 500 may include a single camera, a dual camera, or a triple camera, etc., which is not limited in the embodiment of the present application.
  • the mobile phone 500 may include three cameras, of which one is a main camera, one is a wide-angle camera, and one is a telephoto camera.
  • the multiple cameras may be all front-mounted, or all rear-mounted, or partly front-mounted and another part rear-mounted, which is not limited in the embodiment of the present application.
  • the mobile phone 500 may also include a Bluetooth module, etc., which will not be repeated here.
  • FIG. 6 is a schematic diagram of the software structure of a mobile phone 500 according to an embodiment of the present application.
  • the Android system is divided into four layers, namely the application layer, the application framework layer (framework, FWK), the system layer, and the hardware abstraction layer. The layers communicate with each other through software interfaces.
  • the application layer may include a series of application packages, which may include applications such as short message, calendar, camera, video, navigation, gallery, and call.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.
  • the application framework layer can include a window manager, a resource manager, and a notification manager.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
  • a notification from the notification manager can also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog window. For example, text is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or an indicator light flashes.
  • the application framework layer can also include:
  • a view system which includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the mobile phone 500. For example, the management of the call status (including connecting, hanging up, etc.).
  • the system layer can include multiple functional modules. For example: sensor service module, physical state recognition module, 3D graphics processing library (for example: OpenGL ES), etc.
  • the sensor service module is used to monitor the sensor data uploaded by various sensors at the hardware layer and determine the physical state of the mobile phone 500;
  • the physical state recognition module is used to analyze and recognize user gestures, faces, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the system layer can also include:
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the hardware abstraction layer is the layer between hardware and software.
  • the hardware abstraction layer can include display drivers, camera drivers, sensor drivers, etc., used to drive related hardware at the hardware layer, such as display screens, cameras, sensors, and so on.
  • the following embodiments can be implemented on the mobile phone 500 having the above hardware structure/software structure.
  • the following embodiments will take the mobile phone 500 as an example to describe the voice interaction method provided by the embodiments of the present application.
  • Referring to FIG. 7, a schematic step flowchart of a voice interaction method provided by an embodiment of the present application is shown.
  • the method may be applied to the above-mentioned mobile phone 500, and the method may specifically include the following steps:
  • the user sentence may be a certain sentence directly spoken by the user when using an application such as a voice assistant in a terminal device. For example, if a user wants to inquire about the weather tomorrow, the user can wake up the voice assistant in the phone and say "what is the weather tomorrow" or a similar sentence.
  • the user can make multiple rounds of dialogue with the voice assistant to prompt the voice assistant to fully and accurately understand the user's intention, and return information that satisfies the intention.
  • the user sentence to be replied in this embodiment may be a sentence or word spoken by the user during a non-first round of dialogue, that is, the voice assistant has completed at least one round of dialogue with the user before receiving the user sentence to be replied.
  • the voice assistant and other programs can obtain the dialogue data of the previous rounds of dialogue between the user and the voice assistant in the current dialogue process, and determine the real intention of the user in this round of dialogue in combination with the historical dialogue data.
  • the historical dialogue data can be all the dialogue data since the user woke up the voice assistant this time, or it can be the dialogue data of specific previous rounds, such as the three rounds preceding the current round, which is not limited in this embodiment.
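  • The two ways of scoping the historical dialogue data mentioned above (all rounds since wake-up, or only a fixed number of preceding rounds) can be sketched as follows. The class name `DialogueHistory` and its methods are hypothetical and serve only to illustrate the idea.

```python
from collections import deque


class DialogueHistory:
    """Stores (user_sentence, assistant_reply) pairs for the current session."""

    def __init__(self, max_rounds=None):
        # max_rounds=None keeps every round since wake-up;
        # an integer keeps only the most recent max_rounds rounds.
        self.rounds = deque(maxlen=max_rounds)

    def add_round(self, user_sentence, assistant_reply):
        self.rounds.append((user_sentence, assistant_reply))

    def snapshot(self):
        """Return the retained rounds, oldest first."""
        return list(self.rounds)


full_history = DialogueHistory()                 # all rounds since wake-up
recent_history = DialogueHistory(max_rounds=3)   # only the three preceding rounds
for i in range(1, 5):
    full_history.add_round(f"user sentence {i}", f"reply {i}")
    recent_history.add_round(f"user sentence {i}", f"reply {i}")
```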
  • S702 Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
  • Entity is a term often used in the information world to represent a conceptual thing.
  • nouns can be used to represent entity information, such as names of persons, places, organizations, etc.; a small amount of entity information can also be represented by other part-of-speech words, such as adjectives.
  • the entity information in the user sentence and in the historical dialogue data can be identified based on the NER model.
  • the sentence can be segmented first, and then each word after the segmentation is judged one by one whether it belongs to an entity word, and each entity word is labeled.
  • the entity information identified from the user sentences of the current round of dialogue can be used as historical entity information in the next round and subsequent dialogue rounds. Therefore, for the entity information in the historical dialogue data, each sentence in the historical dialogue data can be obtained and segmented to find the entity information; alternatively, the words that were already marked as entity information in the previous rounds can be extracted directly and used as historical entity information, which is not limited in this embodiment.
  • since the user sentence to be replied is the dialogue sentence of the current round, and the entity information contained therein is basically closely related to the user's intention, all the target entity information contained in the user sentence can be retained. For historical entity information, it is necessary to distinguish which is useful information for the current round of dialogue, and which is redundant information.
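  • The segment-then-label procedure described above can be illustrated with the following minimal sketch. A small hand-written lexicon stands in for the trained NER model, and the lexicon contents, label names, and function names are all assumptions made for illustration.

```python
import re

# Hypothetical entity lexicon standing in for a trained NER model
ENTITY_LEXICON = {
    "friday": "time",
    "tomorrow": "time",
    "beijing": "location",
    "temperature": "weather_condition",
}


def segment(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[A-Za-z]+", sentence.lower())


def tag_entities(sentence):
    """Judge each token in turn and label the ones that are entity words."""
    return [(word, ENTITY_LEXICON[word]) for word in segment(sentence) if word in ENTITY_LEXICON]


history_entities = tag_entities("What is the temperature on Friday?")
target_entities = tag_entities("Beijing")
```

  • Entities labeled in one round can be stored and reused as historical entity information in later rounds, which avoids segmenting the same sentences again.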
  • the key entity information associated with the current round of dialogue sentences can be filtered from the historical entity information.
  • These key entity information can be regarded as information that has obvious benefits in identifying the user's intention.
  • multiple user intentions can be set in the voice assistant according to different application scenarios, and multiple associated entity information can be configured for each user intention.
  • for an intention that contains the target entity information, the other entity information that the intention may include can be determined, and the key entity information can then be identified from the historical entity information accordingly.
  • For example, for the intention of "weather forecast", multiple pieces of entity information such as "time", "location", and "weather conditions" can be configured. If the target entity information is "weather conditions", the historical entity information that meets the "time" and "location" requirements can be identified as key entity information.
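  • A minimal sketch of this filtering step is given below, assuming each entity is represented as a (value, slot type) pair and each intention is configured with a set of slot types. The data layout and the names `INTENT_SLOTS` and `filter_key_entities` are hypothetical.

```python
# Hypothetical intention configuration: intention name -> associated entity types
INTENT_SLOTS = {
    "weather forecast": {"time", "location", "weather conditions"},
}


def filter_key_entities(intent, target_entities, historical_entities):
    """Keep the historical entities whose type the intention needs but the current sentence lacks."""
    filled = {slot for _, slot in target_entities}
    missing = INTENT_SLOTS[intent] - filled
    return [(value, slot) for value, slot in historical_entities if slot in missing]


target = [("temperature", "weather conditions")]
history = [("Friday", "time"), ("Beijing", "location"), ("Haidilao", "restaurant")]
key_entities = filter_key_entities("weather forecast", target, history)
```

  • Here "Friday" and "Beijing" are kept as key entity information, while "Haidilao" is treated as redundant for the weather intention.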
  • S704 Generate a target interaction sentence according to the target entity information and the key entity information.
  • the target interaction sentence matching the actual intention of the user can be generated based on the above two kinds of information.
  • if the target entity information and key entity information include the time information "Friday", the location information "Beijing", and the weather condition information "Temperature", it can be recognized that the user currently wants to query the temperature of Beijing on Friday.
  • the target interaction sentence corresponding to this can be "What is the temperature in Beijing this Friday", or other similar sentences.
  • the above-mentioned target interaction sentence is also the expression sentence pattern of the information that the user wants to query.
  • S705 Output a reply sentence corresponding to the target interactive sentence.
  • the function of the voice assistant is to facilitate users to query certain information by voice. Therefore, after identifying the target interaction sentence that matches the user's actual intention, the voice assistant can search for the sentence and find the corresponding reply sentence.
  • the corresponding reply sentence may be "The temperature in Beijing on Friday is 17 degrees Celsius”.
  • the reply sentence can be broadcast to the user by voice, or displayed in the mobile phone interface in the form of text, or sent to the user's mobile phone in other information formats, which is not limited in this embodiment.
  • the actual intention of the user can be determined based on the above two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate the target interactive sentence, so that applications such as the voice assistant in the terminal device can respond according to the target interactive sentence.
  • the existing mature single-round dialogue technology can be used to reply to the user's intention, and the accuracy of dialogue state tracking and user intention recognition can be improved. It can improve the natural language processing capabilities of the dialogue system, and enhance the rationality of the dialogue system’s reply during multiple rounds of dialogue, so that the system’s reply can better match the actual needs of the user and reduce the number of interactions between the user and the dialogue system.
  • Referring to FIG. 8, a schematic step flowchart of a voice interaction method provided by another embodiment of the present application is shown.
  • the method may specifically include the following steps:
  • this embodiment takes the terminal device as a mobile phone as an example for subsequent introduction. That is, when a user uses an application such as a voice assistant in a mobile phone, this type of application identifies the entity information in the user's current round and previous rounds to determine the corresponding user intention, rewrites the user sentence of the current round based on that intention, and outputs a reply sentence corresponding to the rewritten user sentence to meet the actual needs of the user.
  • the user sentence to be replied may refer to a certain sentence directly uttered by the user during the interaction with the voice assistant.
  • This sentence may be a sentence that fully expresses a certain intention of the user, or it may be one or more words.
  • When the voice assistant receives a certain sentence from the user, it can first determine whether it can give a corresponding reply to the sentence. If the voice assistant can give a reply directly based on the sentence, no other processing is required, and the reply sentence can be provided to the user directly. For example, if the user's sentence is "What is the temperature in Beijing this Friday", the sentence alone is enough to determine that the user's intention is to inquire about the weather in Beijing this Friday, so the voice assistant can directly output the query result to the user.
  • the user's intention can be re-determined by combining the user's expressions in the previous rounds.
  • the historical dialogue data between the user and the voice assistant can be obtained.
  • the aforementioned historical dialogue data can be the dialogue data of all rounds from when the user woke up the voice assistant this time until the current round, or it can be the dialogue data of several consecutive rounds before the current round, which is not limited in this embodiment.
  • S802 Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
  • the entity information in the user sentence and in the historical dialogue data can be identified based on the NER model.
  • the historical entity information in the historical dialogue data may include the entity information in the sentence spoken by the user in a certain round, and may also include the entity information in the reply sentence when the voice assistant replies to the user.
  • the historical entity information may include the "restaurant” in the user's sentence, as well as the entity information such as "Nongda South Road, Haidian District” and "Haidilao" in the voice assistant's reply sentence.
  • S803 According to the target entity information and the historical entity information, determine a candidate user intention that matches the user sentence.
  • the candidate user intentions preliminarily determined based on the target entity information and the historical entity information may also include multiple types.
  • the KBs can be combined to preliminarily determine the current possible intentions of the user.
  • multiple user intentions can be preset in the KBs, and each user intention can include multiple semantic slots. After the target entity information and the historical entity information are identified, the slots corresponding to each user intention can be matched against the two kinds of entity information; any user intention whose slots contain part of the identified entity information is preliminarily determined to be a candidate user intention.
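  • The slot-based matching of candidate user intentions can be sketched as follows. The knowledge base contents and the rule "an intention is a candidate if at least one of its slots is covered by an identified entity type" are assumptions chosen to mirror the description above.

```python
# Hypothetical knowledge base: intention name -> semantic slots
KB_INTENTS = {
    "weather forecast": {"time", "location", "weather conditions"},
    "restaurant search": {"location", "cuisine", "restaurant"},
    "play music": {"song", "artist"},
}


def candidate_intents(entity_slot_types):
    """Return every intention whose slots overlap the identified entity types."""
    types = set(entity_slot_types)
    return [name for name, slots in KB_INTENTS.items() if slots & types]


# Entity types found in the current sentence and the historical dialogue
candidates = candidate_intents(["time", "location"])
```

  • Because "location" appears in the slots of two intentions, more than one candidate user intention is returned, which is why a later filtering step based on distribution probability is useful.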
  • S804 Calculate the distribution probability of each historical entity information in the historical dialogue data.
  • the distribution probability of each historical entity information in the historical dialogue data may be calculated first.
  • the distribution probability of each historical entity information can be determined based on the PGN model. First, each historical entity information is symbolized; then the PGN model is called, and its encoding module encodes the symbolized historical entity information and calculates the distribution probability of each historical entity information in the encoding stage.
  • the prediction model can be trained with training data and KBs to enhance the key information extraction capability of the PGN model.
  • the above-mentioned training data may be pre-collected multiple rounds of dialogue data, including entity information in a certain round (current round) of the dialogue and historical entity information (rounds before the current round) in the pre-collected training data.
  • the corresponding attention distribution can be output after converting it into a text vector; at the same time, combining the encoding module and decoding module of the PGN model to obtain the generation probability of historical entity information.
  • the above-mentioned various types of probabilities can be added together to output the final distribution probability.
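  • The text states only that the different probabilities are added together. As a hedged illustration, and assuming that PGN here denotes a pointer-generator network, the sketch below uses the mixing rule of the published pointer-generator architecture: a generation probability weights the vocabulary distribution, and its complement weights the attention mass copied from the source tokens. The function name and the toy numbers are assumptions, not values from this application.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generation distribution with the copy (attention) distribution.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping word -> generation probability
    attention     -- attention weight for each source position
    source_tokens -- the source token at each position
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        # Attention on repeated source tokens accumulates on the same word.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"temperature": 0.6, "weather": 0.4},
    attention=[0.7, 0.3],
    source_tokens=["friday", "temperature"],
)
```

  • A word that appears only in the dialogue history, such as "friday" in this toy input, still receives probability mass through the copy term.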
  • the user's confirmation information can also be combined to improve the output distribution probability and the reliability of the identified key entity information.
  • the key entity information associated with the user's intention is found, that is, the entity information that has a greater correlation with the user's intention is selected from all the historical entity information.
  • the candidate entity information associated with any candidate user intention can be extracted from the historical entity information, and the candidate entity information whose distribution probability is greater than a certain preset probability threshold can then be extracted as the key entity information related to that intention.
  • the probability threshold may be set to 0.8. Therefore, candidate entity information whose distribution probability is greater than 0.8 can be identified as key entity information.
  • the user may be invited to identify the entity information.
  • the candidate entity information corresponding to the target probability value and the key entity information can be used to generate a query sentence that instructs the user to confirm the candidate entity information corresponding to the target probability value, where the target probability value is the distribution probability of any candidate entity information in the historical dialogue data.
  • the candidate entity information corresponding to the target probability value can be identified as key entity information.
  • the probability value of the historical entity information "temperature" is calculated to be 0.86, which is greater than the set probability threshold of 0.8, so the entity information "temperature" can be identified as key entity information.
  • the calculated probability value of the historical entity information "Friday” is 0.72, which is less than the above-mentioned probability threshold of 0.8, but is in the vicinity of the threshold.
  • the target entity information in the current round of user sentences is "Beijing"
  • the aforementioned historical entity information "Friday” can also be identified as key entity information.
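  • The threshold decision and the near-threshold confirmation described above can be sketched as follows. The threshold of 0.8 comes from the example in the text; the width of the "vicinity of the threshold" band (`MARGIN`) is a hypothetical value chosen for illustration.

```python
THRESHOLD = 0.8  # probability threshold from the example above
MARGIN = 0.1     # hypothetical width of the "near the threshold" band


def triage(probabilities, threshold=THRESHOLD, margin=MARGIN):
    """Split candidate entities into key entities and entities the user should confirm."""
    key, ask_user = [], []
    for entity, p in probabilities.items():
        if p > threshold:
            key.append(entity)        # confidently key entity information
        elif threshold - p < margin:
            ask_user.append(entity)   # close to the threshold: invite the user to confirm
    return key, ask_user


key, ask_user = triage({"temperature": 0.86, "Friday": 0.72, "restaurant": 0.30})
```

  • With these values "temperature" is accepted directly, "Friday" is referred to the user for confirmation, and "restaurant" is discarded.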
  • the target interaction sentence matching the actual intention of the user can be generated based on the above two kinds of information.
  • in order to reduce the difficulty of generating the target interactive sentence, the target basic sentence can be determined first, and rewriting can then be performed on the basis of the target basic sentence to obtain the final target interactive sentence.
  • the target basic sentence may be determined based on key entity information and/or target entity information.
  • the aforementioned entity information to be evaluated includes all target entity information and key entity information.
  • the basic sentence may be the current user sentence or a certain sentence of the user sentence in the historical dialogue data.
  • the degree of matching between the entity information to be evaluated and each basic sentence can be determined according to the degree of matching between the entity information to be evaluated and the semantic slot.
  • the number of semantic slots in the basic sentence and the number of entity information to be evaluated can be counted, that is, how many slots are included in the basic sentence to be calculated, and the number of slots to be identified can be counted.
  • the target entity information and key entity information can be used to rewrite the sentence to obtain the final target interactive sentence.
  • whether to use the target entity information or the key entity information for sentence rewriting depends on whether the target basic sentence is the current user sentence or a user sentence in the historical dialogue. If the target basic sentence is the current user sentence, the key entity information identified from the historical dialogue can be used for rewriting, since the user sentence already contains all the target entity information; if the target basic sentence is a user sentence in the historical dialogue, all the key entity information and the target entity information can be used for rewriting, since that sentence may contain only part of the key entity information.
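  • The rule above for choosing which entity information to feed into the rewriting step can be sketched as a small function. The function name and the list-based representation are assumptions made for this illustration.

```python
def entities_for_rewrite(base_is_current_sentence, target_entities, key_entities):
    """Select the entity information used to rewrite the target basic sentence."""
    if base_is_current_sentence:
        # The current sentence already contains every target entity,
        # so only the key entities from the history need to be added.
        return list(key_entities)
    # A historical sentence may hold only part of the key entities,
    # so all key entities and all target entities are supplied.
    extra = [e for e in target_entities if e not in key_entities]
    return list(key_entities) + extra


from_current = entities_for_rewrite(True, ["Beijing"], ["Friday", "temperature"])
from_history = entities_for_rewrite(False, ["Beijing"], ["Friday", "temperature"])
```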
  • the target interactive sentence may be output based on the PGN model.
  • the PGN model may also include a decoding module.
  • the decoding module can be obtained by training various types of training data.
  • the various types of training data can include multiple entity information and basic sentences corresponding to each entity information.
  • the decoding module of the PGN model can be used to decode the target entity information, key entity information, and target basic sentence, and output the target interactive sentence.
  • the target interactive sentence output by the PGN model it can be verified whether the sentence is rewritten correctly.
  • multiple entity information in the target interaction sentence may be extracted first, and it is verified whether the multiple entity information in the target interaction sentence matches the preset semantic slot of the target user's intention in the knowledge base.
  • the target user intention is any one of all candidate user intentions.
  • step S808 is executed to output a reply sentence corresponding to the target interaction sentence.
  • the target interactive sentence can be verified a second time according to the sentence type of the target interactive sentence.
  • step S808 can also be executed to output a reply sentence corresponding to the target interactive sentence; if the target interactive sentence is not a task-type sentence, it means that the voice assistant cannot perform specific intent recognition on the sentence or the recognized intention lacks a change.
  • the user can be prompted to re-enter the user sentence at this time, and the voice assistant can recognize the user's intention again according to the re-entered user sentence, and generate a new target interaction sentence.
  • S808 Output a reply sentence corresponding to the target interactive sentence.
  • the user sentence in the current round can be rewritten, so as to convert the dialogue state tracking problem in multi-round dialogue into a single-round dialogue problem to a certain extent.
  • FIG. 10 it is a schematic diagram of the operation process of the voice interaction method provided by an embodiment of the present application.
  • the entire voice interaction may include the following steps:
  • the improved model still cannot determine whether the entity should appear in the output sentence.
  • the user may be invited to participate in configuring the key entities; then, based on the above determination, the key entities whose probability is greater than the threshold are combined.
  • the decoding module of the PGN model is used to generate the sentence.
  • the reason for inviting users to participate in entity configuration is to increase the recall rate of key information extracted by the model.
  • the threshold can be set higher. But too high a threshold may also result in the loss of some key information. Therefore, it is necessary to invite users to participate in the configuration for entity information close to the threshold to further improve the recall rate of the key information extracted by the model.
  • the reliability of the output sentences obtained in this way is also high, which can be used as training corpus to iteratively optimize the model, which partially solves the problem of difficulty in obtaining high-quality multi-round dialogue materials.
  • this embodiment designs a two-layer feedback mechanism, and the specific method is as follows:
  • the rewritten sentence is matched against the slot values corresponding to the intent in the knowledge bases (KBs). If the match succeeds, the rewriting is considered correct; if the match fails, the rewritten sentence can be verified by using natural language understanding technology. If the sentence is recognized as a task-type sentence, the rewriting can be considered correct; if it is recognized as not being a task-type sentence, the rewriting can be considered wrong. In that case, the user can be guided to restate the intention, and the restatement can serve as follow-up training corpus.
  • the DST problem in multi-round dialogue can be converted, to a certain extent, into a single-round dialogue problem, and existing mature single-round dialogue technology can be used to respond to user intentions, improving the capabilities and user experience of the task-oriented multi-round dialogue system.
  • FIG. 11 shows a structural block diagram of a voice interaction device provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
  • the device can be applied to terminal equipment, and specifically can include the following modules:
  • the historical dialogue data acquisition module 1101 is used to acquire historical dialogue data when a user sentence to be replied is received;
  • the target entity information identification module 1102 is used to identify the target entity information in the user sentence; and,
  • the historical entity information identification module 1103 is used to identify historical entity information in the historical dialogue data;
  • the key entity information extraction module 1104 is configured to extract key entity information associated with the user sentence from the historical entity information;
  • the target interactive sentence generating module 1105 is configured to generate a target interactive sentence according to the target entity information and the key entity information;
  • the reply sentence output module 1106 is used to output a reply sentence corresponding to the target interactive sentence.
  • the key entity information extraction module may specifically include the following submodules:
  • a candidate user intention determination sub-module configured to determine a candidate user intention that matches the user sentence according to the target entity information and the historical entity information;
  • the distribution probability calculation sub-module is used to separately calculate the distribution probability of each historical entity information in the historical dialogue data;
  • the key entity information extraction sub-module is used to extract key entity information from the historical entity information according to the distribution probability and the candidate user's intention.
  • the distribution probability calculation sub-module may specifically include the following units:
  • the first pointer generation network model calling unit is configured to call a preset pointer generation network model, and use the coding module of the pointer generation network model to encode each historical entity information separately, to obtain the distribution probability corresponding to each historical entity information.
  • the key entity information extraction submodule may specifically include the following units:
  • a candidate entity information extraction unit configured to extract candidate entity information associated with any candidate user's intention from the historical entity information;
  • the key entity information extraction unit is configured to extract candidate entity information whose distribution probability is greater than a preset probability threshold as key entity information.
  • the key entity information extraction submodule may further include the following units:
  • the query sentence generating unit is configured to: if the difference between a target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, generate a query sentence according to the candidate entity information corresponding to the target probability value and the key entity information, to instruct the user to identify the candidate entity information corresponding to the target probability value;
  • the key entity information determining unit is configured to determine the candidate entity information corresponding to the target probability value as key entity information when the user's confirmation information for the query sentence is received, where the target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
  • the target interactive sentence generation module may specifically include the following sub-modules:
  • the target basic sentence determination sub-module is used to determine the target basic sentence;
  • the target interactive sentence generating sub-module is used to use the target entity information and the key entity information to rewrite the target basic sentence to generate a target interactive sentence.
  • the target basic sentence determination submodule may specifically include the following units:
  • the basic sentence obtaining unit is configured to obtain a plurality of basic sentences from the user sentence containing the target entity information and the historical dialogue data containing the key entity information;
  • a matching degree calculation unit configured to calculate the matching degree between the plurality of basic sentences and the entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
  • the target basic sentence identification unit is used to identify the basic sentence corresponding to the maximum matching degree as the current target basic sentence.
  • any basic sentence includes multiple semantic slots
  • the matching degree calculation unit may specifically include the following subunits:
  • the statistics subunit is used to count the number of semantic slots in the basic sentence and the number of entity information to be evaluated for any basic sentence;
  • the determining subunit is used to determine the number of key slots in the basic sentence that respectively match the entity information to be evaluated;
  • the calculation subunit is used to calculate the ratio between the number of key slots and the number of semantic slots in the basic sentence, and use the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
  • the pointer generation network model further includes a decoding module, which is obtained by training on various types of training data, and the various types of training data include multiple entity information and basic sentences corresponding to each entity information.
  • the target interactive sentence generation sub-module may specifically include the following units:
  • the second pointer generation network model calling unit is configured to use the decoding module to decode the target entity information, the key entity information, and the target basic sentence, and output a target interactive sentence.
  • the target interactive sentence generation submodule may further include the following units:
  • the target interactive sentence entity information extraction unit is used to extract multiple entity information in the target interactive sentence;
  • the target interactive sentence verification unit is used to verify whether the multiple entity information in the target interactive sentence matches the preset semantic slots of the target user intention in the knowledge base, where the target user intention is any one of the candidate user intentions; if the multiple entity information in the target interactive sentence matches the semantic slots of the target user intention, determine that the generated target interactive sentence is correct and perform the step of outputting the reply sentence corresponding to the target interactive sentence; if the multiple entity information does not match the semantic slots of the target user intention, verify the target interactive sentence a second time according to its sentence type.
  • the target interactive sentence verification unit is further configured to: call a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence; if the target interactive sentence is a task-type sentence, call the reply sentence output module to output a reply sentence corresponding to the target interactive sentence; if the target interactive sentence is not a task-type sentence, prompt the user to re-enter the user sentence, and generate the target interactive sentence again according to the re-entered user sentence.
  • the description is relatively simple, and for related parts, please refer to the description of the method embodiment part.
  • the terminal device 1200 of this embodiment includes: a processor 1210, a memory 1220, and a computer program 1221 that is stored in the memory 1220 and can run on the processor 1210.
  • when the processor 1210 executes the computer program 1221, the steps in the various embodiments of the voice interaction method described above are implemented, for example, steps S701 to S705 shown in FIG. 7.
  • alternatively, when the processor 1210 executes the computer program 1221, the functions of the modules/units in the foregoing device embodiments are implemented, for example, the functions of the modules 1101 to 1106 shown in FIG. 11.
  • the computer program 1221 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 1220 and executed by the processor 1210 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments may be used to describe the execution process of the computer program 1221 in the terminal device 1200.
  • the computer program 1221 can be divided into a historical dialogue data acquisition module, a target entity information recognition module, a historical entity information recognition module, a key entity information extraction module, a target interactive sentence generation module, and a reply sentence output module.
  • the specific functions of each module are as follows:
  • the historical dialogue data acquisition module is used to acquire historical dialogue data when the user sentence to be replied is received
  • the target entity information identification module is used to identify the target entity information in the user sentence
  • the historical entity information identification module is used to identify the historical entity information in the historical dialogue data
  • a key entity information extraction module for extracting key entity information associated with the user sentence from the historical entity information
  • a target interactive sentence generating module configured to generate a target interactive sentence according to the target entity information and the key entity information
  • the reply sentence output module is used to output the reply sentence corresponding to the target interactive sentence.
  • the terminal device 1200 may be a computing device such as a desktop computer, a notebook, or a palmtop computer.
  • the terminal device 1200 may include, but is not limited to, a processor 1210 and a memory 1220.
  • FIG. 12 is only an example of the terminal device 1200, and does not constitute a limitation on the terminal device 1200. The terminal device may include more or fewer components than those shown in the figure, may combine some components, or may use different components.
  • the terminal device 1200 may also include input and output devices, network access devices, buses, and so on.
  • the processor 1210 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 1220 may be an internal storage unit of the terminal device 1200, such as a hard disk or a memory of the terminal device 1200.
  • the memory 1220 may also be an external storage device of the terminal device 1200, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 1200.
  • the memory 1220 may also include both an internal storage unit of the terminal device 1200 and an external storage device.
  • the memory 1220 is used to store the computer program 1221 and other programs and data required by the terminal device 1200.
  • the memory 1220 can also be used to temporarily store data that has been output or will be output.
  • the embodiment of the present application also discloses a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the aforementioned voice interaction method can be realized.
  • the disclosed voice interaction method, device, and terminal device can be implemented in other ways.
  • the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer program can be stored in a computer-readable storage medium. When executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may include at least: any entity or device capable of carrying computer program code to a voice interaction device or terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a floppy disk, or a CD-ROM.
  • computer-readable media cannot be electrical carrier signals and telecommunication signals.

Abstract

A voice interaction method and apparatus and a terminal device, applicable to the technical field of artificial intelligence. The method comprises: obtaining historical dialogue data when a user statement to be replied is received; identifying target entity information in the user statement, and identifying historical entity information in the historical dialogue data; extracting key entity information associated with the user statement from the historical entity information; generating a target interaction statement according to the target entity information and the key entity information; and outputting a reply statement corresponding to the target interaction statement. According to the method, the accuracy of dialogue state tracking and user intention recognition can be improved, the natural language processing capacity of the dialogue system is improved, and the reply rationality of the dialogue system in the multi-round dialogue process is enhanced.

Description

Voice interaction method, apparatus, and terminal device
This application claims priority to Chinese Patent Application No. 202010244784.8, filed with the China National Intellectual Property Administration on March 31, 2020 and entitled "Voice Interaction Method, Apparatus, and Terminal Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular, to a voice interaction method, apparatus, and terminal device.
Background
Natural language processing (NLP) is an important part of artificial intelligence (AI). Typical application scenarios include task-oriented dialogue systems and machine translation. In multi-round dialogue scenarios based on natural language, tracking and determining the user's intention is a crucial step. During dialogue state tracking (DST), the model needs to be adjusted dynamically in combination with the historical dialogue to extract the key information contained in the user sentence, determine the user's intention, and complete the corresponding response in conjunction with the dialogue system.
In the prior art, DST methods based on machine learning require the model to understand the content of multiple rounds of dialogue well, which places extremely high demands on the model and largely limits the accuracy of such methods. However, because natural language is highly abstract and multi-round dialogue is complex, current machine learning technology can hardly understand multi-round dialogue completely and accurately in practical application scenarios; that is, it is difficult to accurately track the state of a multi-round dialogue and determine the user's intention.
Summary
The embodiments of the present application provide a voice interaction method, apparatus, and terminal device, which can solve the prior-art problem that the state of a multi-round dialogue is difficult to track and the user's intention cannot be accurately determined.
In a first aspect, an embodiment of the present application provides a voice interaction method, including:
When a user sentence to be replied to is received, historical dialogue data is obtained, and a named entity recognition model is used to identify the target entity information in the user sentence and the historical entity information in the historical dialogue data. Key entity information associated with the user sentence is then extracted from the historical entity information, so that the current user sentence can be rewritten according to the target entity information and the key entity information to generate a target interactive sentence. Outputting a reply sentence corresponding to the target interactive sentence meets the user's interaction needs. In this embodiment, by acquiring the entity information in historical dialogue rounds and combining it with the entity information in the current round to rewrite the sentence of the current round, the dialogue state tracking problem in multi-round dialogue can be converted into a single-round dialogue problem. This makes it possible to use existing mature single-round dialogue technology to respond to the user's intention, which helps improve the accuracy of user intention recognition and the language processing capability of the dialogue system.
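The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the dictionary lookup in `recognize_entities` stands in for a trained named entity recognition model, the type-based rule in `select_key_entities` stands in for the probability-based extraction described in the implementations below, and all names (`ENTITY_LEXICON`, `rewrite`, `build_target_sentence`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon standing in for a trained named entity recognition model.
ENTITY_LEXICON: Dict[str, str] = {
    "Shanghai": "city",
    "Beijing": "city",
    "tomorrow": "date",
    "weather": "topic",
}


def recognize_entities(sentence: str) -> Dict[str, str]:
    """Return {entity text: entity type} for lexicon entries found in the sentence."""
    lowered = sentence.lower()
    return {e: t for e, t in ENTITY_LEXICON.items() if e.lower() in lowered}


def select_key_entities(target: Dict[str, str], historical: Dict[str, str]) -> Dict[str, str]:
    """Keep historical entities whose type the current sentence does not already supply.

    Assumption: this simple rule replaces the probability-based selection.
    """
    present_types = set(target.values())
    return {e: t for e, t in historical.items() if t not in present_types}


def rewrite(user_sentence: str, key: Dict[str, str]) -> str:
    """Attach the key entities so the sentence can be understood without the history."""
    if not key:
        return user_sentence
    return f"{user_sentence.rstrip('?.! ')} ({', '.join(key)})"


def build_target_sentence(user_sentence: str, history: List[str]) -> str:
    """Identify entities, pick key entities from the history, and rewrite the sentence."""
    target = recognize_entities(user_sentence)
    historical: Dict[str, str] = {}
    for past_sentence in history:
        historical.update(recognize_entities(past_sentence))
    key = select_key_entities(target, historical)
    return rewrite(user_sentence, key)


history = ["What is the weather in Shanghai tomorrow?"]
print(build_target_sentence("What about Beijing?", history))
```

In this toy run, the city from the earlier round is dropped because the current sentence supplies its own city, while the date and topic are carried over, so a single-round system could answer the rewritten sentence without access to the history.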
In a possible implementation of the first aspect, to extract the key entity information associated with the user sentence from the historical entity information, candidate user intentions that match the user sentence may first be preliminarily determined according to the target entity information and the historical entity information. The distribution probability of each historical entity information in the historical dialogue data is then calculated separately, so that the key entity information can be extracted from the historical entity information according to the distribution probabilities and the candidate user intentions.
In a possible implementation of the first aspect, the distribution probability of each historical entity information in the historical dialogue data may be calculated by calling a preset pointer generation network model. The pointer generation network model includes an encoding module, which can be used to encode each historical entity information separately to obtain the distribution probability corresponding to each historical entity information.
In a possible implementation of the first aspect, when the key entity information is extracted, candidate entity information associated with any candidate user intention may first be extracted from the historical entity information; the candidate entity information whose distribution probability value is greater than a preset probability threshold is then extracted as the key entity information.
In a possible implementation of the first aspect, for candidate entity information whose target probability value is less than the preset probability threshold and differs from the preset probability threshold by less than a preset difference, a query sentence may be generated according to the candidate entity information corresponding to the target probability value and the key entity information, inviting the user to identify that candidate entity information. If the user's confirmation information for the query sentence is received, the candidate entity information corresponding to the target probability value can be determined as key entity information. The target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
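The threshold rule and the confirmation step for near-threshold candidates described in the two implementations above can be illustrated with a short sketch. The values of `threshold` and `margin`, the wording of the query, and the function names are assumptions made for illustration; the distribution probabilities are taken as given rather than computed by a pointer generation network model.

```python
from typing import Callable, Dict, List, Tuple


def split_candidates(
    probabilities: Dict[str, float],
    threshold: float = 0.5,
    margin: float = 0.1,
) -> Tuple[List[str], List[str]]:
    """Split candidates into key entities and entities to confirm with the user.

    threshold and margin are illustrative values, not taken from the application.
    """
    key_entities: List[str] = []
    to_confirm: List[str] = []
    for entity, p in probabilities.items():
        if p > threshold:
            # Above the preset probability threshold: accepted directly.
            key_entities.append(entity)
        elif p < threshold and threshold - p < margin:
            # Below the threshold but within the preset difference: ask the user.
            to_confirm.append(entity)
    return key_entities, to_confirm


def make_query(entity: str, key_entities: List[str]) -> str:
    """Build a query sentence from the uncertain candidate and the accepted key entities."""
    context = ", ".join(key_entities) if key_entities else "your request"
    return f"Should '{entity}' also be considered together with {context}?"


def resolve_key_entities(
    probabilities: Dict[str, float],
    confirm: Callable[[str], bool],
    threshold: float = 0.5,
    margin: float = 0.1,
) -> List[str]:
    """Return key entities, adding near-threshold candidates the user confirms."""
    key_entities, pending = split_candidates(probabilities, threshold, margin)
    for entity in pending:
        if confirm(make_query(entity, key_entities)):
            key_entities.append(entity)
    return key_entities


probs = {"Shanghai": 0.8, "tomorrow": 0.45, "umbrella": 0.1}
print(resolve_key_entities(probs, confirm=lambda question: True))
```

The `confirm` callback is where a real system would present the query sentence to the user; passing it in as a parameter keeps the selection logic testable without a dialogue front end.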
In a possible implementation of the first aspect, when the target interactive sentence is generated according to the target entity information and the key entity information, a target basic sentence may be determined first, and the target entity information and the key entity information are then used to rewrite the target basic sentence to obtain the target interactive sentence, which reduces the difficulty of generating the target interactive sentence directly.
In a possible implementation of the first aspect, multiple basic sentences may be obtained from the user sentence containing the target entity information and from the historical dialogue data containing the key entity information. The matching degree between each basic sentence and the entity information to be evaluated is then calculated, and the basic sentence with the maximum matching degree is identified as the current target basic sentence. The entity information to be evaluated includes all of the target entity information and the key entity information.
In a possible implementation of the first aspect, each basic sentence may include multiple semantic slots, and the matching degree between the entity information to be evaluated and a basic sentence may be determined based on the ratio of the number of key slots to the number of semantic slots in the basic sentence. Therefore, for any basic sentence, the number of semantic slots in the basic sentence and the number of entity information to be evaluated may be counted first; the number of key slots in the basic sentence that respectively match the entity information to be evaluated is then determined; and after the ratio of the number of key slots to the number of semantic slots in the basic sentence is calculated, the ratio can be used as the matching degree between the entity information to be evaluated and the basic sentence.
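The ratio-based matching degree and the selection of the target basic sentence described above can be sketched as follows. The representation is an assumption: each basic sentence is reduced to a list of slot types, each entity to be evaluated is reduced to its type, and a slot counts as a key slot when its type equals the type of some entity to be evaluated. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def matching_degree(slots: List[str], entity_types: List[str]) -> float:
    """Ratio of key slots (slots matched by an entity to be evaluated) to all slots."""
    if not slots:
        return 0.0
    available = set(entity_types)
    key_slots = sum(1 for slot in slots if slot in available)
    return key_slots / len(slots)


def pick_target_basic_sentence(
    basic_sentences: Dict[str, List[str]],
    entity_types: List[str],
) -> Tuple[str, float]:
    """Return the basic sentence with the maximum matching degree and its score."""
    scored = {
        sentence: matching_degree(slots, entity_types)
        for sentence, slots in basic_sentences.items()
    }
    best = max(scored, key=lambda sentence: scored[sentence])
    return best, scored[best]


# Slot annotations are written by hand here; a real system would derive them.
candidates = {
    "Book a flight to Beijing tomorrow": ["city", "date"],
    "Call Zhang in Beijing": ["person", "city"],
}
print(pick_target_basic_sentence(candidates, ["city", "date"]))
```

Here the first sentence has both of its slots matched (2/2) while the second has one of two matched (1/2), so the first is chosen as the target basic sentence.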
In a possible implementation of the first aspect, the pointer generation network model may further include a decoding module. The decoding module may be obtained by training on multiple types of training data, which include multiple entity information and the basic sentence corresponding to each entity information. The decoding module can therefore be used to rewrite the target basic sentence and generate the target interactive sentence. Specifically, if the target basic sentence is the current user sentence, the decoding module can decode the key entity information and the target basic sentence and output the target interactive sentence; if the target basic sentence is a user sentence in the historical dialogue data, the decoding module can decode the target entity information, the key entity information, and the target basic sentence and output the target interactive sentence.
In a possible implementation of the first aspect, after the target interactive sentence is obtained, whether the rewritten target interactive sentence is correct may also be verified. This embodiment provides a two-layer verification mechanism. First, multiple entity information in the target interactive sentence can be extracted, and it is verified whether this entity information matches the semantic slots of the target user intention in a preset knowledge base, where the target user intention is any one of the candidate user intentions. If the multiple entity information in the target interactive sentence matches the semantic slots of the target user intention, the generated target interactive sentence can be determined to be correct, and the step of outputting a reply sentence corresponding to the target interactive sentence is performed. If the multiple entity information does not match the semantic slots of the target user intention, the target interactive sentence can be verified a second time according to its sentence type. In the second verification, a preset natural language understanding model can be called to determine whether the target interactive sentence is a task-type sentence. If it is, a corresponding reply sentence can be output for the sentence; if it is not, the user needs to be prompted to re-enter the user sentence and restate the intention, and the target interactive sentence is generated again according to the re-entered user sentence.
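The two-layer verification described above can be sketched as a small decision function. Several details are assumptions made for illustration: the knowledge base is modeled as a mapping from intention names to sets of slot types, a "match" is taken to mean that every slot of the intention is filled by some entity in the sentence, and the natural language understanding model is replaced by a caller-supplied predicate `is_task_sentence`.

```python
from enum import Enum
from typing import Callable, Dict, Set


class Outcome(Enum):
    REPLY = "reply"
    ASK_USER_TO_RESTATE = "ask_user_to_restate"


def slots_match(entity_types: Set[str], intent_slots: Set[str]) -> bool:
    """Assumption: a match means every slot of the intention is filled by some entity."""
    return bool(intent_slots) and intent_slots <= entity_types


def verify_rewrite(
    sentence: str,
    entity_types: Set[str],
    knowledge_base: Dict[str, Set[str]],
    target_intent: str,
    is_task_sentence: Callable[[str], bool],
) -> Outcome:
    """Apply the slot check first, then fall back to the sentence-type check."""
    # Layer 1: compare the entities against the slots stored for the intention.
    if slots_match(entity_types, knowledge_base.get(target_intent, set())):
        return Outcome.REPLY
    # Layer 2: ask the (stand-in) language understanding model for the sentence type.
    if is_task_sentence(sentence):
        return Outcome.REPLY
    # Both layers failed: the user is asked to restate the intention.
    return Outcome.ASK_USER_TO_RESTATE


kb = {"query_weather": {"city", "date"}}
result = verify_rewrite(
    "What is the weather in Beijing tomorrow?",
    {"city", "date"},
    kb,
    "query_weather",
    is_task_sentence=lambda s: False,
)
print(result)
```

Ordering the checks this way means the second layer only runs when the slot comparison fails, which mirrors the fallback structure described in the text.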
In a second aspect, an embodiment of the present application provides a voice interaction apparatus, including:
a historical dialogue data acquisition module, configured to acquire historical dialogue data when a user sentence to be replied to is received;
a target entity information identification module, configured to identify target entity information in the user sentence; and
a historical entity information identification module, configured to identify historical entity information in the historical dialogue data;
a key entity information extraction module, configured to extract, from the historical entity information, key entity information associated with the user sentence;
a target interactive sentence generation module, configured to generate a target interactive sentence according to the target entity information and the key entity information;
a reply sentence output module, configured to output a reply sentence corresponding to the target interactive sentence.
In a possible implementation of the second aspect, the key entity information extraction module may specifically include the following submodules:
a candidate user intention determination submodule, configured to determine, according to the target entity information and the historical entity information, candidate user intentions that match the user sentence;
a distribution probability calculation submodule, configured to calculate the distribution probability of each piece of historical entity information in the historical dialogue data;
a key entity information extraction submodule, configured to extract key entity information from the historical entity information according to the distribution probability and the candidate user intentions.
In a possible implementation of the second aspect, the distribution probability calculation submodule may specifically include the following unit:
a first pointer-generator network model calling unit, configured to call a preset pointer-generator network model and use the encoding module of the pointer-generator network model to encode each piece of historical entity information separately, obtaining the distribution probability corresponding to each piece of historical entity information.
In a possible implementation of the second aspect, the key entity information extraction submodule may specifically include the following units:
a candidate entity information extraction unit, configured to extract, from the historical entity information, candidate entity information associated with any candidate user intention;
a key entity information extraction unit, configured to extract, as key entity information, the candidate entity information whose distribution probability value is greater than a preset probability threshold.
In a possible implementation of the second aspect, the key entity information extraction submodule may further include the following units:
a query sentence generation unit, configured to: if the difference between a target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, generate a query sentence according to the key entity information and the candidate entity information corresponding to the target probability value, so as to instruct the user to identify the candidate entity information corresponding to the target probability value;
a key entity information determination unit, configured to, upon receiving the user's confirmation of the query sentence, determine the candidate entity information corresponding to the target probability value as key entity information, where the target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
In a possible implementation of the second aspect, the target interactive sentence generation module may specifically include the following submodules:
a target basic sentence determination submodule, configured to determine a target basic sentence;
a target interactive sentence generation submodule, configured to rewrite the target basic sentence using the target entity information and the key entity information to generate the target interactive sentence.
In a possible implementation of the second aspect, the target basic sentence determination submodule may specifically include the following units:
a basic sentence acquisition unit, configured to acquire a plurality of basic sentences from the user sentence containing the target entity information and from the historical dialogue data containing the key entity information;
a matching degree calculation unit, configured to calculate the matching degree between each of the plurality of basic sentences and entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
a target basic sentence identification unit, configured to identify the basic sentence with the maximum matching degree as the current target basic sentence.
In a possible implementation of the second aspect, each basic sentence may include multiple semantic slots, and the matching degree calculation unit may specifically include the following subunits:
a statistics subunit, configured to count, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated;
a determining subunit, configured to determine the number of key slots in the basic sentence that match the entity information to be evaluated;
a calculation subunit, configured to calculate the ratio of the number of key slots to the number of semantic slots in the basic sentence, and use the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
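The ratio computed by these subunits, together with the selection of the basic sentence with the maximum matching degree, can be sketched as below. The sketch assumes that each basic sentence and each piece of entity information has already been mapped to semantic slot names; the slot names, sentences, and function names used here are hypothetical.

```python
from typing import Dict, Set


def matching_degree(sentence_slots: Set[str], entity_slots: Set[str]) -> float:
    """Ratio of key slots (slots matched by an entity) to all slots of the sentence."""
    if not sentence_slots:
        return 0.0  # a sentence without slots cannot match anything
    key_slots = sentence_slots & entity_slots
    return len(key_slots) / len(sentence_slots)


def pick_base_sentence(candidates: Dict[str, Set[str]], entity_slots: Set[str]) -> str:
    """Return the candidate basic sentence with the largest matching degree."""
    return max(candidates, key=lambda s: matching_degree(candidates[s], entity_slots))


# Hypothetical candidates: sentence text mapped to the slots it contains.
candidates = {
    "sentence A": {"date", "city", "weather_attribute"},
    "sentence B": {"restaurant", "distance", "address"},
}
print(pick_base_sentence(candidates, {"date", "city"}))  # prints "sentence A"
```

On ties, `max` returns the first candidate in iteration order; the text does not say how ties are resolved, so that behaviour is an artefact of the sketch.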
In a possible implementation of the second aspect, the pointer-generator network model may further include a decoding module obtained by training on multiple kinds of training data, which include multiple pieces of entity information and a basic sentence corresponding to each piece of entity information; the target interactive sentence generation submodule may specifically include the following unit:
a second pointer-generator network model calling unit, configured to: if the target basic sentence is the user sentence, use the decoding module to decode the key entity information and the target basic sentence and output the target interactive sentence; and if the target basic sentence is the historical dialogue data, use the decoding module to decode the target entity information and the target basic sentence and output the target interactive sentence.
In a possible implementation of the second aspect, the target interactive sentence generation submodule may further include the following units:
a target interactive sentence entity information extraction unit, configured to extract multiple pieces of entity information from the target interactive sentence;
a target interactive sentence verification unit, configured to verify whether the multiple pieces of entity information in the target interactive sentence match the semantic slots of a target user intention in a preset knowledge base, where the target user intention is any one of the candidate user intentions; if the multiple pieces of entity information match the semantic slots of the target user intention, determine that the generated target interactive sentence is correct and output a reply sentence corresponding to the target interactive sentence; and if they do not match the semantic slots of the target user intention, verify the target interactive sentence according to its sentence type.
In a possible implementation of the second aspect, the target interactive sentence verification unit is further configured to call a preset natural language understanding model to judge whether the target interactive sentence is a task-type sentence; if the target interactive sentence is a task-type sentence, output a reply sentence corresponding to the target interactive sentence; and if it is not a task-type sentence, prompt the user to re-enter the user sentence and generate the target interactive sentence again from the re-entered user sentence.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the voice interaction method according to any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor of a terminal device, implements the voice interaction method according to any one of the implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product that, when run on a terminal device, causes the terminal device to execute the voice interaction method according to any one of the implementations of the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects:
In the embodiments of the present application, the target entity information in the current dialogue round is identified and key entity information is extracted from the historical dialogue data. The user's actual intention can be determined from these two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate a target interactive sentence, so that applications such as a voice assistant on the terminal device can reply according to the target interactive sentence. By converting the dialogue state tracking (DST) problem in multi-round dialogue, to a certain extent, into a single-round dialogue problem, this embodiment can use existing, mature single-round dialogue technology to reply to the user's intention. This improves the accuracy of dialogue state tracking and user intention recognition, strengthens the natural language processing capability of the dialogue system, and makes the system's replies in multi-round dialogue more reasonable, so that the replies better match the user's actual needs and fewer interactions between the user and the dialogue system are required.
Description of the drawings
FIG. 1 is a schematic diagram of the operation process of a prior-art multi-round dialogue state tracking solution based on knowledge base reasoning;
FIG. 2 is a schematic diagram of the operation process of a prior-art multi-round dialogue state tracking solution based on a learning model;
FIG. 3 is a schematic diagram of the operation process of a voice interaction method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
FIG. 6 is a schematic diagram of the software structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
FIG. 7 is a schematic flowchart of the steps of a voice interaction method provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of the steps of a voice interaction method provided by another embodiment of the present application;
FIG. 9 is a schematic diagram of the calculation process of the distribution probability of entity information provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of the operation process of a voice interaction method provided by another embodiment of the present application;
FIG. 11 is a structural block diagram of a voice interaction apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed description
For ease of understanding, several typical multi-round dialogue state tracking solutions in the prior art are introduced first.
FIG. 1 is a schematic diagram of the operation process of a multi-round dialogue state tracking solution in the prior art. This solution is based on reasoning over a question answering (QA) knowledge base, and its specific operation process is as follows:
First, the keywords of the multi-round dialogue are determined from the current multi-round dialogue and the current input and used as the input for the current dialogue state; the knowledge base is then searched according to predefined rules. This step is shown in box 101 in FIG. 1.
Then, the search yields the corresponding set of candidate multi-round dialogues, as shown in box 102 in FIG. 1.
Finally, the similarity between the current dialogue and each candidate dialogue is calculated according to predefined similarity calculation rules. The specific strategy is: calculate the semantic similarity between the current input and the candidate question as the first similarity; calculate the semantic similarity between the context of the current input and the context of each candidate question as the second similarity; and calculate the similarity between the summary information of the current multi-round dialogue and that of each candidate multi-round dialogue as the third similarity. The weighted sum of the three similarities gives the similarity between each candidate question and the current input, and the reply corresponding to the candidate question with the largest similarity is used as the output reply. This step is shown in box 103 in FIG. 1.
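The weighted combination in box 103 can be illustrated with a short sketch. The weights and similarity values below are made-up illustrative numbers (the text does not give any), and the function names are hypothetical.

```python
from typing import Dict, Tuple

Sims = Tuple[float, float, float]  # (first, second, third) similarity


def combined_similarity(sims: Sims, weights: Sims) -> float:
    """Weighted sum of the three similarities for one candidate question."""
    return sum(w * s for w, s in zip(weights, sims))


def best_reply(candidates: Dict[str, Sims], weights: Sims) -> str:
    """Return the reply whose candidate question has the largest combined similarity."""
    return max(candidates, key=lambda reply: combined_similarity(candidates[reply], weights))


# Illustrative values only: reply text mapped to its three similarities.
candidates = {
    "reply A": (0.9, 0.2, 0.1),
    "reply B": (0.5, 0.6, 0.7),
}
print(best_reply(candidates, weights=(0.5, 0.3, 0.2)))  # prints "reply B"
```

With these numbers, reply A scores 0.53 and reply B scores 0.57, so a candidate with a weaker direct match can still win when its context and summary similarities are higher.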
The solution shown in FIG. 1 has three drawbacks. First, key information is extracted from the multi-round dialogue without distinguishing primary from secondary: the key information relevant to the current round's input is not singled out, and the redundant information that is extracted degrades the accuracy of dialogue state tracking. Second, the accuracy of dialogue state tracking depends heavily on the coverage of the knowledge base, and given the complexity of natural language dialogue in real scenarios, an ideal knowledge base with broad coverage is in practice very hard to obtain. Third, the way the solution obtains the state tracking result relies on various predefined rules, which greatly limits the generalization ability and robustness of the model.
FIG. 2 is a schematic diagram of the operation process of another multi-round dialogue state tracking solution in the prior art. This solution is a DST solution based on a learning model. It tracks the state information of each round of dialogue in turn and updates the state of the current round through a copy-flow mechanism, thereby tracking the long-term dialogue state. Its specific operation process includes steps S201-S204:
First, a semi-supervised neural network model extracts the key information in the current round of dialogue and the previous round of dialogue, and generates the keyword sequences corresponding to these two rounds of sentences.
Then, a new encoder-decoder network based on the copy-flow mechanism represents the dialogue state information as an explicit word sequence. The copy-flow mechanism passes the information flow of the dialogue history along by copying, so that it ultimately takes part in generating the target sentence of the current round's reply.
Finally, based on the state information of the current round of dialogue obtained above, the decoder module automatically generates the target sentence of the current round's reply, completing the response to the user's inquiry.
The solution shown in FIG. 2 also has three drawbacks. First, extracting the key information of the historical dialogue with a semi-supervised neural network model may lose key information or extract it incorrectly, which impairs the understanding of the historical dialogue. Second, tracking the historical dialogue round by round and updating the dialogue state tends to give the model a high time complexity and to accumulate errors. Third, the solution relies too heavily on the model's ability to understand the historical dialogue, and current encoder-decoder networks can hardly reach the accuracy required in real scenarios.
To solve the above problems, the core idea of the embodiments of the present application is as follows: during multi-round dialogue state tracking in a dialogue system, the current-round sentence is rewritten based on the key information in the historical dialogue so as to fill in the information omitted from the current round, thereby converting the multi-round dialogue problem into a single-round dialogue. The voice interaction method provided by the embodiments of the present application tracks the user's intention based on the key information of the historical dialogue, overcomes to a certain extent the shortcomings of the prior-art solutions, and improves both the accuracy of state tracking in multi-round dialogue and the accuracy of the dialogue system's responses.
The voice interaction method provided by the embodiments of the present application uses a Named Entity Recognition (NER) module to extract the entity information in the historical dialogue. It then combines Knowledge Bases (KBs), the attention distribution over that entity information computed by a Pointer-Generator Networks (PGN) model, and the entities in the current round of dialogue to filter the entity information in the historical dialogue, discarding redundant entities and determining the key information that takes part in state tracking for the current round. This not only reduces the effect of redundant information on dialogue state tracking but also provides effective key information for the subsequent steps.
Then, combining the knowledge base with a supervised feedforward neural network, the role of the key entity information in the dialogue state (expressed as a distribution probability) is calculated, and the feedforward neural network is used as part of the overall model to determine whether the key entity information directly affects the dialogue state. This avoids tracking the dialogue state round by round and reduces error accumulation. After the current-round sentence is rewritten in the decoding step of the PGN model, mature single-round dialogue modules are used to generate the reply sentence for the multi-round dialogue, improving the accuracy of the dialogue system's response.
FIG. 3 is a schematic diagram of the operation process of the voice interaction method provided by an embodiment of the present application. In the process shown in FIG. 3, the method first extracts the key information in the historical dialogue that is directly related to the dialogue state, and then uses the model to rewrite the current-round sentence, completing the tracking and fusion of the dialogue state information. An existing single-round dialogue processing module then uses the rewritten sentence to produce the reply to the user's inquiry in the multi-round dialogue.
Based on the above, the operation process of the method can be implemented by the following modules:
1. Entity extraction module
Extracts the entity information in the historical dialogue as candidate key information.
2. Entity screening module
Combines the KBs, the attention distribution over entity information computed by the PGN model, and the entities in the current-round sentence to filter out the key information related to the current round's dialogue state.
3. Key information distribution prediction module
Uses the KBs and the key information to compute the probability distribution of the key information with a supervised feedforward network.
4. Sentence rewriting module
Based on the probability distribution of the key information, uses the decoder stage of the PGN model to complete or rewrite the current-round sentence.
5. Reply generation module
Uses existing, mature single-round dialogue processing technology to generate the reply sentence in the multi-round dialogue.
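The five modules above form a linear pipeline, which can be sketched as a small class that wires them together. This is only a structural illustration: each module is represented by a caller-supplied function, and the class name, field names, and signatures are assumptions rather than anything specified in the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RewritePipeline:
    """Chains the five modules; every field is a stand-in for a real model."""
    extract_entities: Callable[[List[str]], List[str]]            # module 1
    screen_entities: Callable[[List[str], str], List[str]]        # module 2
    predict_distribution: Callable[[List[str]], Dict[str, float]] # module 3
    rewrite: Callable[[str, Dict[str, float]], str]               # module 4
    single_turn_reply: Callable[[str], str]                       # module 5

    def respond(self, history: List[str], current: str) -> str:
        candidates = self.extract_entities(history)            # candidate key info
        key_info = self.screen_entities(candidates, current)   # drop redundant entities
        distribution = self.predict_distribution(key_info)     # probability per key entity
        rewritten = self.rewrite(current, distribution)        # complete the current sentence
        return self.single_turn_reply(rewritten)               # reuse a single-round system
```

Because only the last stage produces the reply, any existing single-round dialogue component could be plugged in as `single_turn_reply` without changes to the earlier stages.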
FIG. 4 is a schematic diagram of an application scenario of the voice interaction method provided by an embodiment of the present application. In the typical application scenario shown in FIG. 4, in the first round of dialogue the user asks: "Please tell me the nearest restaurant." The dialogue system replies: "The nearest restaurant is the Haidilao on Nongda South Road, Haidian District." In the second round of dialogue, the user replies "Okay." and goes on to ask "What is the temperature on Friday?" Because the user's question contains no location-related words, the dialogue system asks the user which place the temperature is wanted for: "Which city's temperature on Friday would you like to check?" The user replies: "Beijing." This exchange can be regarded as the current round of dialogue, and the previous two rounds as the historical dialogue.
For the historical dialogue, the entity extraction module can extract the entities it contains, including "nearest", "restaurant", "Nongda South Road, Haidian District", "Haidilao", "Friday", and "temperature". On this basis, the entity screening module combines the predefined KBs and the PGN model to calculate the probability distribution of these entities in the historical dialogue and obtains the key entities related to the dialogue state, namely "Friday" and "temperature". The other entities in the historical dialogue, such as "nearest", "restaurant", "Nongda South Road, Haidian District", and "Haidilao", are redundant. From the key entities obtained, the dialogue system can determine the basic sentence to rewrite, namely "What is the temperature on Friday?"
Specifically, the key information distribution prediction module can use the feedforward network to further predict the probability that each piece of key information is related to the current dialogue state, giving a probability of 0.86 for "temperature" and 0.72 for "Friday". For an entity whose probability value is close to but below the threshold of 0.8, the dialogue system can ask the user to take part in the decision; in FIG. 4, the dialogue system can ask the user: "Do you want to check the temperature for 'Friday' in Beijing?" The sentence rewriting module then generates the rewritten sentence "What is the temperature in Beijing on Friday?"
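The near-threshold handling in this example can be sketched as follows. The threshold 0.8 and the probabilities 0.86 and 0.72 are taken from the example above; the `margin` of 0.1 stands in for the preset difference, whose value the text does not state, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_threshold(
    probs: Dict[str, float], threshold: float, margin: float
) -> Tuple[List[str], List[str]]:
    """Separate entities accepted as key entities from those needing user confirmation."""
    key_entities: List[str] = []
    to_confirm: List[str] = []
    for entity, p in probs.items():
        if p > threshold:
            # Above the threshold: accepted as key entity information directly.
            key_entities.append(entity)
        elif p < threshold and threshold - p < margin:
            # Just below the threshold: ask the user before accepting it.
            to_confirm.append(entity)
        # Anything further below the threshold is treated as redundant.
    return key_entities, to_confirm


key_entities, to_confirm = split_by_threshold(
    {"temperature": 0.86, "Friday": 0.72}, threshold=0.8, margin=0.1
)
print(key_entities, to_confirm)  # prints ['temperature'] ['Friday']
```

An entity in `to_confirm` becomes key entity information only after the user confirms the query sentence, which is how "Friday" ends up in the rewritten sentence in the example.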
For the rewritten user sentence, the dialogue system can use the reply generation module to generate a reply to it, which is the dialogue system's reply in FIG. 4: "The temperature in Beijing on Friday is..."
It should be noted that the basic sentence to rewrite may come from the historical dialogue data or from the current dialogue. That is, the basic sentence may be selected from among the user sentences in the historical dialogue and the current user sentence. The selection criterion can be the number of pieces of target entity information and key entity information that a sentence contains. In the above example, the historical sentence "What is the temperature on Friday?" contains two pieces of key entity information, "Friday" and "temperature", whereas the current sentence "Beijing" contains only one piece of target entity information, so the historical sentence "What is the temperature on Friday?" can be selected as the basic sentence to rewrite.
The voice interaction method of the present application is described in detail below with reference to specific embodiments.
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present application can be thoroughly understood. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in the specification and the appended claims of the present application, the singular expressions "a", "an", "said", "above", "the", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it.
The voice interaction method provided by the embodiments of the present application may be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA). The embodiments of the present application place no restriction on the specific type of terminal device.
Take a mobile phone as an example of the terminal device. FIG. 5 is a block diagram of part of the structure of a mobile phone provided by an embodiment of the present application. Referring to FIG. 5, the mobile phone includes components such as a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (Wi-Fi) module 570, a processor 580, and a power supply 590. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the mobile phone are described in detail below with reference to FIG. 5:
The RF circuit 510 may be used to receive and send signals while information is being sent or received or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 580 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. The RF circuit 510 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
The memory 520 may be used to store software programs and modules. The processor 580 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
The input unit 530 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the mobile phone 500. Specifically, the input unit 530 may include a touch panel 531 and other input devices 532. The touch panel 531, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 531 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 531 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 580; it can also receive and execute commands sent by the processor 580. The touch panel 531 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 531, the input unit 530 may include other input devices 532. Specifically, the other input devices 532 may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The display unit 540 may be used to display information input by the user, information provided to the user, and the various menus of the mobile phone. The display unit 540 may include a display panel 541. Optionally, the display panel 541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 531 may cover the display panel 541. When the touch panel 531 detects a touch operation on or near it, it transmits the operation to the processor 580 to determine the type of the touch event, and the processor 580 then provides the corresponding visual output on the display panel 541 according to the type of the touch event. Although in FIG. 5 the touch panel 531 and the display panel 541 are shown as two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 531 and the display panel 541 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone 500 may also include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the mobile phone may also be equipped with, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described further here.
The audio circuit 560, a speaker 561, and a microphone 562 may provide an audio interface between the user and the mobile phone. The audio circuit 560 may convert received audio data into an electrical signal and transmit it to the speaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; the audio data is then output to the processor 580 for processing and sent via the RF circuit 510 to, for example, another mobile phone, or output to the memory 520 for further processing.
Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 570, the mobile phone can help the user send and receive email, browse web pages, and access streaming media; it provides the user with wireless broadband Internet access. Although FIG. 5 shows the Wi-Fi module 570, it is understood that the module is not an essential component of the mobile phone 500 and may be omitted as needed without changing the essence of the invention.
The processor 580 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the mobile phone as a whole. Optionally, the processor 580 may include one or more processing units. Preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor need not be integrated into the processor 580.
The mobile phone 500 also includes a power supply 590 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
Although not shown, the mobile phone 500 may also include a camera. Optionally, the camera may be located on the front or on the rear of the mobile phone 500, which is not limited in the embodiments of the present application.
Optionally, the mobile phone 500 may include a single camera, dual cameras, triple cameras, or the like, which is not limited in the embodiments of the present application.
For example, the mobile phone 500 may include three cameras: a main camera, a wide-angle camera, and a telephoto camera.
Optionally, when the mobile phone 500 includes multiple cameras, they may all be front-facing, all be rear-facing, or be partly front-facing and partly rear-facing, which is not limited in the embodiments of the present application.
In addition, although not shown, the mobile phone 500 may also include a Bluetooth module and the like, which are not described further here.
FIG. 6 is a schematic diagram of the software structure of the mobile phone 500 according to an embodiment of the present application. Taking the Android system as an example of the operating system of the mobile phone 500, in some embodiments the Android system is divided into four layers: the application layer, the application framework layer (framework, FWK), the system layer, and the hardware abstraction layer. The layers communicate with one another through software interfaces.
As shown in FIG. 6, the application layer may include a series of application packages, which may include applications such as Messages, Calendar, Camera, Video, Navigation, Gallery, and Phone.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer may include some predefined functions, for example a function for receiving events sent by the application framework layer.
As shown in FIG. 6, the application framework layer may include a window manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. It can obtain the size of the display, determine whether there is a status bar, lock the screen, capture the screen, and so on. The content provider is used to store and retrieve data and to make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables an application to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay, without user interaction. For example, the notification manager is used to announce that a download is complete or to give message reminders. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog window. For example, text is shown in the status bar, a prompt tone sounds, the electronic device vibrates, or an indicator light flashes.
The application framework layer may also include:
A view system, which includes visual controls such as controls that display text and controls that display pictures. The view system may be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short-message notification icon may include a view that displays text and a view that displays pictures.
A phone manager, which is used to provide the communication functions of the mobile phone 500, for example management of the call state (including connecting and hanging up).
The system layer may include multiple functional modules, for example a sensor service module, a physical state recognition module, and a three-dimensional graphics processing library (for example, OpenGL ES).
The sensor service module is used to monitor the sensor data uploaded by the various sensors at the hardware layer and to determine the physical state of the mobile phone 500.
The physical state recognition module is used to analyze and recognize user gestures, faces, and the like.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like.
The system layer may also include:
A surface manager, which is used to manage the display subsystem and provides compositing of 2D and 3D layers for multiple applications.
A media library, which supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio, video, and image encoding formats, for example MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The hardware abstraction layer is the layer between hardware and software. It may include a display driver, a camera driver, a sensor driver, and the like, which drive the related hardware at the hardware layer, such as the display, the camera, and the sensors.
The following embodiments may be implemented on the mobile phone 500 having the hardware structure and software structure described above. The following embodiments take the mobile phone 500 as an example to describe the voice interaction method provided by the embodiments of the present application.
Referring to FIG. 7, a schematic flowchart of the steps of a voice interaction method provided by an embodiment of the present application is shown. By way of example and not limitation, the method may be applied to the mobile phone 500 described above and may specifically include the following steps:
S701. When a user sentence to be replied to is received, obtain historical dialogue data.
In the embodiments of the present application, the user sentence may be a sentence that the user speaks directly when using an application such as a voice assistant on the terminal device. For example, a user who wants to check tomorrow's weather can wake up the voice assistant on the mobile phone and say "What is the weather like tomorrow" or a similar sentence.
Generally, the user can conduct multiple rounds of dialogue with the voice assistant so that the assistant understands the user's intention completely and accurately and returns information that satisfies that intention. In this embodiment, the user sentence to be replied to may be a sentence or word spoken by the user in a round other than the first; that is, before receiving the user sentence to be replied to, the voice assistant has already completed at least one round of dialogue with the user.
In the embodiments of the present application, to better understand the user's intention, after receiving the user sentence, a program such as the voice assistant can obtain the dialogue data of the preceding rounds between the user and the voice assistant in the current dialogue, and determine the user's real intention in the current round in combination with this historical dialogue data.
In a specific implementation, the historical dialogue data may be all of the dialogue data since the user most recently woke up the voice assistant, or it may be the dialogue data of a specific number of preceding rounds, such as the three rounds before the current round, which is not limited in this embodiment.
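The two ways of selecting historical dialogue data described above can be sketched as follows. This is a minimal illustration only: the `Turn` class, the `get_history` function, and the `max_rounds` parameter are hypothetical names introduced here and are not part of the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One completed dialogue round (hypothetical structure)."""
    user: str       # what the user said in this round
    assistant: str  # what the voice assistant replied


def get_history(turns: List[Turn], max_rounds: Optional[int] = None) -> List[Turn]:
    """Select historical dialogue data for the current round.

    max_rounds=None returns every round since the assistant was woken up;
    otherwise only the most recent max_rounds rounds are returned.
    """
    if max_rounds is None:
        return list(turns)
    if max_rounds <= 0:
        return []
    return turns[-max_rounds:]
```

Calling `get_history(turns, 3)` corresponds to the "three rounds before the current round" option, while `get_history(turns)` corresponds to using the whole session.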
S702. Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
Entity is a term frequently used in the information field to denote a conceptual thing. Entity information is usually expressed by nouns, such as names of persons, places, and organizations; a small amount of entity information may also be expressed by words of other parts of speech, such as adjectives.
In the embodiments of the present application, the entity information in the user sentence and in the historical dialogue data may be identified based on a named entity recognition (NER) model.
In a specific implementation, a received user sentence may first be segmented into words; each resulting word is then checked to determine whether it is an entity word, and each entity word is labeled.
Entity information identified in the user sentence of the current round can, of course, be used as historical entity information in the next and subsequent rounds. Therefore, the entity information in the historical dialogue data may be found by obtaining each sentence in the historical dialogue data and segmenting it, or the words already labeled as entity information in the preceding rounds may be extracted directly as historical entity information, which is not limited in this embodiment.
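The segment-then-label procedure, and the reuse of labels from earlier rounds, can be illustrated with a small sketch. It assumes a toy dictionary lookup in place of a trained NER model; `ENTITY_LEXICON`, `segment`, `tag_entities`, `history_entities`, and the entity type names are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon standing in for a trained NER model.
ENTITY_LEXICON: Dict[str, str] = {
    "tomorrow": "time",
    "friday": "time",
    "beijing": "location",
    "weather": "weather_condition",
    "temperature": "weather_condition",
}

Entity = Tuple[str, str]  # (word, entity type)


def segment(sentence: str) -> List[str]:
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tag_entities(sentence: str) -> List[Entity]:
    """Check each token in turn and label the ones that are entity words."""
    return [(w, ENTITY_LEXICON[w]) for w in segment(sentence) if w in ENTITY_LEXICON]


def history_entities(
    sentences: List[str], cache: Optional[Dict[str, List[Entity]]] = None
) -> List[Entity]:
    """Collect entities from earlier rounds, reusing labels stored in cache."""
    cache = {} if cache is None else cache
    result: List[Entity] = []
    for s in sentences:
        if s not in cache:          # not labeled in an earlier round: tag it now
            cache[s] = tag_entities(s)
        result.extend(cache[s])     # otherwise reuse the stored labels
    return result
```

Passing the same `cache` dictionary on every round models the second option in the text, where words already labeled in earlier rounds are extracted directly instead of being segmented again.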
S703. Extract, from the historical entity information, the key entity information associated with the user sentence.
Because the user sentence to be replied to is the dialogue sentence of the current round, the entity information it contains is essentially all closely related to the user's intention, so all of the target entity information in the user sentence can be retained. For the historical entity information, however, it is necessary to distinguish which items are useful for the current round of dialogue and which are redundant.
Therefore, after the historical entity information is extracted, the key entity information associated with the dialogue sentence of the current round can be filtered out of it. This key entity information can be regarded as information that clearly helps identify the user's intention.
In the embodiments of the present application, multiple user intentions may be set in the voice assistant according to different application scenarios, and multiple items of associated entity information may be configured for each user intention. After the target entity information in the user sentence is identified, the other entity information that an intention containing the target entity information may include can be determined from that intention, and the matching key entity information can then be identified in the historical entity information. For example, the intention "weather forecast" may be configured with entity information such as "time", "location", and "weather condition". If the target entity information is of the type "weather condition", the historical entity information that satisfies the "time" and "location" requirements can be identified as key entity information.
Of course, other methods may also be used to determine the key entity information according to actual usage requirements, which is not limited in this embodiment.
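The intention-based filtering described above might be sketched as follows. The `INTENT_SLOTS` table and the `extract_key_entities` function are hypothetical, and keeping only the most recent historical value for each entity type is an assumption of this sketch rather than something the application states.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (value, entity type)

# Hypothetical configuration: each intention lists its associated entity types.
INTENT_SLOTS: Dict[str, Set[str]] = {
    "weather_forecast": {"time", "location", "weather_condition"},
    "restaurant_search": {"restaurant_name", "cuisine"},
}


def extract_key_entities(target: List[Entity], history: List[Entity]) -> List[Entity]:
    """Pick the historical entities that complete an intention of the current sentence."""
    target_types = {etype for _, etype in target}

    # Entity types that belong to an intention containing a target entity type
    # but that the current sentence does not supply itself.
    wanted: Set[str] = set()
    for slots in INTENT_SLOTS.values():
        if slots & target_types:
            wanted |= slots - target_types

    # Assumption: when a type occurs several times, the latest mention wins.
    latest: Dict[str, str] = {}
    for value, etype in history:
        if etype in wanted:
            latest[etype] = value
    return [(value, etype) for etype, value in latest.items()]
```

With a target entity of type `weather_condition`, historical entities of type `time` and `location` are kept as key entity information, while unrelated items such as a restaurant name are treated as redundant.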
S704. Generate a target interaction sentence according to the target entity information and the key entity information.
In the embodiments of the present application, after the target entity information in the user sentence of the current round and the key entity information in the historical dialogue sentences are determined, a target interaction sentence that matches the user's actual intention can be generated from these two kinds of information.
For example, if the target entity information and the key entity information contain the time information "Friday", the location information "Beijing", and the weather condition information "temperature", it can be recognized that the user currently wants to query the temperature in Beijing on Friday. The corresponding target interaction sentence may be "What is the temperature in Beijing this Friday" or another similar sentence. The target interaction sentence is thus the sentence form that expresses the information the user wants to query.
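The "Beijing / Friday / temperature" example can be reproduced with a simple template fill. This sketch assumes a fixed template for one intention; the application does not prescribe how the sentence is generated, and `build_target_sentence` and the template text are hypothetical.

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (value, entity type)

# Hypothetical template for the weather-forecast intention.
WEATHER_TEMPLATE = "What is the {weather_condition} in {location} this {time}?"
REQUIRED_TYPES = ("weather_condition", "location", "time")


def build_target_sentence(target: List[Entity], key: List[Entity]) -> Optional[str]:
    """Merge current-round and key historical entities into one complete query."""
    slots = {etype: value for value, etype in key}
    # Entities from the current round override historical ones of the same type.
    slots.update({etype: value for value, etype in target})
    if not all(etype in slots for etype in REQUIRED_TYPES):
        return None  # not enough information to rewrite the sentence
    return WEATHER_TEMPLATE.format(**slots)
```

For instance, a current-round entity `("temperature", "weather_condition")` combined with the key entities `("Beijing", "location")` and `("Friday", "time")` yields "What is the temperature in Beijing this Friday?".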
S705. Output a reply sentence corresponding to the target interaction sentence.
The function of the voice assistant is to make it convenient for users to query information by voice. Therefore, after identifying the target interaction sentence that matches the user's actual intention, the voice assistant can search with that sentence and find the corresponding reply sentence.
For example, for the interaction sentence "What is the temperature in Beijing this Friday", the corresponding reply sentence may be "The temperature in Beijing on Friday is 17 degrees Celsius". The reply sentence may be announced to the user by voice, displayed as text in the mobile phone interface, or sent to the user's mobile phone in another information format, which is not limited in this embodiment.
In the embodiments of the present application, by identifying the target entity information in the current dialogue round and extracting the key entity information from the historical dialogue data, the user's actual intention can be determined from these two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate the target interaction sentence, so that an application such as the voice assistant on the terminal device can reply according to the target interaction sentence. By converting the DST problem in multi-round dialogue, to a certain extent, into a single-round dialogue problem, this embodiment can use existing, mature single-round dialogue technology to reply to the user's intention. This improves the accuracy of dialogue state tracking and user intention recognition, improves the natural language processing capability of the dialogue system, and makes the system's replies in multi-round dialogue more reasonable, so that the replies better match the user's actual needs and the number of interactions between the user and the dialogue system is reduced.
参照图8,示出了本申请另一实施例提供的语音交互方法的示意性步骤流程图,该方法具体可以包括如下步骤:Referring to FIG. 8, there is shown a schematic step flowchart of a voice interaction method provided by another embodiment of the present application. The method may specifically include the following steps:
S801、当接收到待回复的用户语句时,获取历史对话数据。S801: When a user sentence to be replied is received, historical dialogue data is obtained.
It should be noted that this method can be applied to terminal devices such as mobile phones and tablet computers; this embodiment does not limit the specific type of terminal device.
For ease of understanding, the following description takes a mobile phone as the terminal device. That is, when a user uses an application such as a voice assistant on the mobile phone, the application identifies the entity information in the current round and in each previous round to determine the corresponding user intention, rewrites the user sentence of the current round based on that intention, and outputs a reply sentence corresponding to the rewritten user sentence, so as to meet the user's actual needs.
In the embodiments of this application, the user sentence to be replied to may be something the user says directly while interacting with the voice assistant. It may be a sentence that fully expresses a user intention, or it may be one or more words.
When the voice assistant receives a sentence from the user, it may first determine whether it can reply to that sentence directly. If it can, no further processing is needed and the reply sentence can be provided to the user directly. For example, if the user sentence is "What is the temperature in Beijing this Friday?", the sentence itself shows that the user intends to ask about the weather in Beijing this Friday, so the voice assistant can output the query result to the user directly.
If the corresponding result cannot be found from the user sentence of the current round alone, the user's intention can be re-determined in combination with what the user expressed in previous rounds. In this case, the historical dialogue data between the user and the voice assistant can be obtained. The historical dialogue data may be the dialogue data of all rounds from the time the user woke up the voice assistant up to, but not including, the current round, or it may be the dialogue data of several consecutive rounds before the current round; this embodiment does not limit this.
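The two ways of delimiting the historical dialogue data can be illustrated with a minimal sketch. The `Turn` layout, the function name `select_history`, and the `max_rounds` parameter are assumptions made for illustration and do not appear in the application.

```python
from typing import List, Optional, Tuple

# Hypothetical layout of one completed round: (user sentence, assistant reply).
Turn = Tuple[str, str]


def select_history(turns: List[Turn], max_rounds: Optional[int] = None) -> List[Turn]:
    """Return the rounds that precede the current round.

    `turns` holds every completed round since the assistant was woken up.
    With max_rounds=None all of them are kept; otherwise only the last
    `max_rounds` consecutive rounds are kept.
    """
    if max_rounds is None:
        return list(turns)
    if max_rounds <= 0:
        return []
    return turns[-max_rounds:]
```

Either choice yields a list of rounds that the later entity-recognition step can consume unchanged.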
S802. Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
In the embodiments of this application, the entity information in the user sentence and in the historical dialogue data may be identified based on a named entity recognition (NER) model.
It should be noted that the historical entity information in the historical dialogue data may include entity information in a sentence spoken by the user in a given round, and may also include entity information in the reply sentence with which the voice assistant answered the user.
For example, in a historical dialogue round, the user asks the voice assistant "Which is the nearest restaurant?" and the voice assistant replies "The nearest restaurant is Haidilao on Nongda South Road, Haidian District". For the historical dialogue data of this round, the historical entity information may include "restaurant" from the user sentence, and should also include entity information such as "Nongda South Road, Haidian District" and "Haidilao" from the voice assistant's reply sentence.
S803. Determine, according to the target entity information and the historical entity information, the candidate user intentions that match the user sentence.
It should be noted that, because the user sentence may contain many pieces of target entity information and the historical dialogue data may contain many pieces of historical entity information, more than one candidate user intention may be preliminarily determined from them.
In the embodiments of this application, after the target entity information and the historical entity information have been identified, the knowledge bases (KBs) can be used to preliminarily determine what the user's current intention might be.
In a specific implementation, multiple user intentions may be preset in the KBs, and each user intention may include multiple semantic slots. After the target entity information and the historical entity information have been identified, both kinds of entity information can be matched against the slots of each user intention, and any user intention whose slots contain some of the identified entity information is preliminarily determined to be a candidate user intention.
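The slot-matching step can be sketched as follows. This is only an illustration: the knowledge-base layout (intent name, then slot name, then the set of entity values that can fill the slot), the intent names, and the function `candidate_intents` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical knowledge base: intent -> slot -> entity values that fill the slot.
KB: Dict[str, Dict[str, Set[str]]] = {
    "query_weather": {
        "city": {"Beijing", "Shanghai"},
        "time": {"Friday", "tomorrow"},
        "condition": {"temperature", "rain"},
    },
    "find_restaurant": {
        "place_type": {"restaurant"},
        "area": {"Haidian District"},
    },
}


def candidate_intents(
    target_entities: Iterable[str],
    historical_entities: Iterable[str],
    kb: Dict[str, Dict[str, Set[str]]] = KB,
) -> List[str]:
    """Keep every intent with at least one slot filled by an identified entity."""
    entities = set(target_entities) | set(historical_entities)
    result = []
    for intent, slots in kb.items():
        if any(entities & values for values in slots.values()):
            result.append(intent)
    return result
```

Because a single matching slot is enough, the result may contain several intents, which is why the later steps narrow them down using the distribution probabilities.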
S804. Calculate the distribution probability of each piece of historical entity information in the historical dialogue data.
In the embodiments of this application, in order to determine the user's actual intention accurately, the distribution probability of each piece of historical entity information in the historical dialogue data may be calculated first.
In a specific implementation, the distribution probability of each piece of historical entity information may be determined based on a pointer-generator network (PGN) model. First, each piece of historical entity information is symbolized. The PGN model is then called, its encoding module encodes each symbolized piece of historical entity information, and the distribution probability of each piece of historical entity information is calculated in the encoding stage.
FIG. 9 is a schematic diagram of the process of calculating the distribution probability of entity information based on the PGN model according to an embodiment of this application. First, a prediction model can be trained with training data and the KBs to strengthen the key-information extraction capability of the PGN model. The training data may be pre-collected multi-round dialogue data, including the entity information in a given round (the current round) of the pre-collected dialogue and the historical entity information (from the rounds before the current round). The historical entity information to be calculated is converted into a text vector, from which the corresponding attention distribution is output; at the same time, the encoding module and the decoding module of the PGN model are combined to obtain the generation probability of the historical entity information. These probabilities are added together to output the final distribution probability. In addition, when the distribution probability is determined, the user's confirmation information can also be taken into account to improve the reliability of the output distribution probability and of the identified key entity information.
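The application does not spell out how the attention distribution and the generation probability are added. As a point of reference only, the following sketch shows the mixing rule commonly used in pointer-generator networks, where a generation probability `p_gen` weights the vocabulary distribution and `1 - p_gen` weights the attention mass copied from the source tokens. All names here are illustrative assumptions, not the application's own formulation.

```python
from typing import Dict, List


def final_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Mix a generation distribution with a copy (attention) distribution.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on positions
    whose source token is w.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final
```

If both input distributions sum to one, the mixed distribution does as well, so the result can be compared directly with a probability threshold.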
S805. Extract key entity information from the historical entity information according to the distribution probabilities and the candidate user intentions.
In the embodiments of this application, the key entity information associated with the user intention is found according to the distribution probability of each piece of historical entity information; that is, the entity information that is highly relevant to the user intention is selected from all of the historical entity information.
In a specific implementation, the candidate entity information associated with any candidate user intention may be extracted from the historical entity information, and the candidate entity information whose distribution probability is greater than a preset probability threshold is then extracted as the key entity information related to that intention.
As an example in this embodiment, the probability threshold may be set to 0.8, so that candidate entity information whose distribution probability is greater than 0.8 is identified as key entity information.
In the embodiments of this application, for entity information whose probability value does not exceed the probability threshold but is close to it, the user may be invited to identify that entity information.
In a specific implementation, if a target probability value is less than the probability threshold and the difference between the two is less than a preset difference, a query sentence may be generated from the candidate entity information corresponding to the target probability value and the key entity information, asking the user to identify that candidate entity information. Here, the target probability value is the probability value of the distribution probability of any piece of candidate entity information in the historical dialogue data.
When the user's confirmation of the query sentence is received, the user can be considered to have approved the entity information, and the candidate entity information corresponding to the target probability value can then be identified as key entity information.
For example, in a dialogue round between the user and the voice assistant, the probability value calculated for the historical entity information "temperature" is 0.86, which is greater than the set probability threshold of 0.8, so "temperature" can be identified as key entity information. The probability value calculated for the historical entity information "Friday" is 0.72, which is less than the threshold of 0.8 but close to it. Assuming that the target entity information in the user sentence of the current round is "Beijing", the target entity information and the existing key entity information can be combined to generate the query sentence "Do you want to check the temperature in Beijing on "Friday"?". If the user confirms, the historical entity information "Friday" can also be identified as key entity information.
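The threshold rule and the confirmation step can be sketched as below. The threshold of 0.8 comes from the example above; the preset difference of 0.1, the function name, and the `confirm` callback (standing in for the query sentence and the user's answer) are assumptions for illustration.

```python
from typing import Callable, Dict, List


def extract_key_entities(
    probs: Dict[str, float],
    confirm: Callable[[str], bool],
    threshold: float = 0.8,
    margin: float = 0.1,  # hypothetical "preset difference"
) -> List[str]:
    """Select key entities from candidate entities and their distribution probabilities.

    Entities above the threshold are accepted outright. Entities below the
    threshold but within `margin` of it are accepted only if the user confirms.
    """
    key = [entity for entity, p in probs.items() if p > threshold]
    for entity, p in probs.items():
        if p < threshold and threshold - p < margin and confirm(entity):
            key.append(entity)
    return key
```

With the probabilities from the example, "temperature" (0.86) is accepted directly, while "Friday" (0.72) is added only when `confirm("Friday")` returns true.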
S806. Determine the target basic sentence.
In the embodiments of this application, after the target entity information in the user sentence of the current round and the key entity information in the historical dialogue sentences have been determined, a target interactive sentence that matches the user's actual intention can be generated from these two kinds of information.
In a specific implementation, to make the target interactive sentence easier to generate, a target basic sentence may be determined first and then rewritten to obtain the final target interactive sentence.
In the embodiments of this application, the target basic sentence may be determined based on the key entity information and/or the target entity information.
In a specific implementation, multiple basic sentences may first be obtained from the user sentence containing the target entity information and from the historical dialogue data containing the key entity information. The matching degree between each of the basic sentences and the entity information to be evaluated is then calculated, and the basic sentence with the highest matching degree is identified as the current target basic sentence. The entity information to be evaluated includes all of the target entity information and the key entity information.
In the embodiments of this application, a basic sentence may be the current user sentence, or it may be a user sentence from the historical dialogue data. The matching degree between the entity information to be evaluated and each basic sentence can be determined from the degree to which the entity information to be evaluated matches the semantic slots.
Specifically, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated are counted. The number of key slots, that is, the slots in the basic sentence that match the entity information to be evaluated, is then determined. Finally, the ratio of the number of key slots to the number of semantic slots in the basic sentence is calculated and used as the matching degree between the entity information to be evaluated and the basic sentence.
For example, if a basic sentence includes four semantic slots, and the entity information to be evaluated, "Friday" and "temperature", matches the time slot and the weather-condition slot respectively, then the matching degree between the entity information to be evaluated and the basic sentence is 50%.
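The matching-degree ratio and the selection of the target basic sentence can be sketched as follows. Representing each basic sentence as a mapping from slot name to the set of entity values that fill the slot is an assumption for illustration, as are the function names.

```python
from typing import Dict, Iterable, Set

# Hypothetical representation of one basic sentence: slot -> accepted entity values.
Slots = Dict[str, Set[str]]


def matching_degree(sentence_slots: Slots, entities: Iterable[str]) -> float:
    """Ratio of key slots (slots matched by some entity) to all semantic slots."""
    if not sentence_slots:
        return 0.0
    entity_set = set(entities)
    key_slots = sum(1 for values in sentence_slots.values() if entity_set & values)
    return key_slots / len(sentence_slots)


def pick_target_basic_sentence(candidates: Dict[str, Slots], entities: Iterable[str]) -> str:
    """Return the basic sentence with the highest matching degree."""
    entity_list = list(entities)
    return max(candidates, key=lambda s: matching_degree(candidates[s], entity_list))
```

For a sentence with four slots of which the time slot and the weather-condition slot are matched by "Friday" and "temperature", `matching_degree` returns 0.5, in line with the 50% example above.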
S807. Rewrite the target basic sentence using the target entity information and the key entity information to generate the target interactive sentence.
After the target basic sentence has been determined, it can be rewritten using the target entity information and the key entity information to obtain the final target interactive sentence.
Whether the target entity information or the key entity information is used for rewriting depends on whether the target basic sentence is the current user sentence or a user sentence from the historical dialogue. If the target basic sentence is the current user sentence, which already contains all of the target entity information, it can be rewritten using the key entity information identified from the historical dialogue. If the target basic sentence is a sentence from the historical dialogue, which may contain only part of the key entity information, it can be rewritten using all of the key entity information and the target entity information.
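The choice of which entity information to feed into the rewriting step can be expressed as a small rule. The function name and the boolean flag below are hypothetical; the sketch only selects the entities and does not perform the rewriting itself.

```python
from typing import Iterable, List


def entities_for_rewrite(
    base_is_current_sentence: bool,
    target_entities: Iterable[str],
    key_entities: Iterable[str],
) -> List[str]:
    """Pick the entity information used to rewrite the target basic sentence."""
    if base_is_current_sentence:
        # The current user sentence already contains all target entities,
        # so only the key entities from the history need to be added.
        return list(key_entities)
    # A historical sentence may hold only some key entities, so use all key
    # entities plus the target entities (order kept, duplicates dropped).
    return list(dict.fromkeys([*key_entities, *target_entities]))
```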
In the embodiments of this application, the target interactive sentence may be output based on the PGN model. In addition to the encoding module, the PGN model may include a decoding module, which can be obtained by training on multiple kinds of training data; the training data may include multiple pieces of entity information and the basic sentence corresponding to each piece of entity information.
Therefore, after the target basic sentence has been determined, the decoding module of the PGN model can decode the target entity information, the key entity information, and the target basic sentence, and output the target interactive sentence.
In the embodiments of this application, the target interactive sentence output by the PGN model can be verified to check whether it has been rewritten correctly.
In the embodiments of this application, whether the target interactive sentence has been rewritten correctly may be determined by two-layer verification.
Specifically, multiple pieces of entity information may first be extracted from the target interactive sentence, and it is verified whether they match the semantic slots of a target user intention in the preset knowledge base, where the target user intention is any one of the candidate user intentions.
If the pieces of entity information in the target interactive sentence match the semantic slots of the target user intention, the generated target interactive sentence can be judged correct, and step S808 is executed to output the reply sentence corresponding to the target interactive sentence.
If the pieces of entity information in the target interactive sentence do not match the semantic slots of the target user intention, a secondary verification can be performed on the target interactive sentence according to its sentence type.
The secondary verification may be performed based on a natural language understanding model. By calling a preset natural language understanding model, it can be determined whether the target interactive sentence is a task-type sentence. If it is, the user's intention can be specifically identified from the sentence and responded to; in this case, step S808 can also be executed to output the reply sentence corresponding to the target interactive sentence. If the target interactive sentence is not a task-type sentence, the voice assistant cannot identify a specific intention from it, or the identified intention lacks sufficiently definite information; in this case, the user may be prompted to re-enter the user sentence, and the voice assistant can identify the user's intention again from the re-entered sentence and generate a new target interactive sentence.
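The two-layer verification can be sketched as below. Two points are assumptions made for illustration: "matching" is taken to mean that every extracted entity fills some slot of the target intention, and the natural language understanding model is represented by a hypothetical callback `is_task_sentence`.

```python
from typing import Callable, Dict, Iterable, Set


def verify_rewrite(
    sentence: str,
    entities: Iterable[str],
    intent_slots: Dict[str, Set[str]],
    is_task_sentence: Callable[[str], bool],
) -> str:
    """Return "reply" if the rewrite passes either layer, else "ask_user_to_rephrase"."""
    entity_set = set(entities)
    # Layer 1: every extracted entity must fill some slot of the target intention.
    slot_values = set().union(*intent_slots.values()) if intent_slots else set()
    if entity_set and entity_set <= slot_values:
        return "reply"
    # Layer 2: fall back to a sentence-type check by an NLU model (stubbed here).
    if is_task_sentence(sentence):
        return "reply"
    return "ask_user_to_rephrase"
```

A caller would proceed to step S808 on "reply" and otherwise prompt the user to re-enter the sentence.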
S808. Output the reply sentence corresponding to the target interactive sentence.
In the embodiments of this application, by combining the entity information in the historical dialogue data, the user sentence of the current round can be rewritten, which to some extent converts the dialogue state tracking problem in multi-round dialogue into a single-round dialogue problem. Existing single-round dialogue technology can then be used to reply to the user's intention. This improves the natural language processing capability of the dialogue system, ensures the accuracy of user intention recognition, makes the system's replies in multi-round dialogue more reasonable and better matched to the user's actual needs, and reduces the number of interactions between the user and the dialogue system.
For ease of understanding, the voice interaction method of this application is described below with a specific example. FIG. 10 is a schematic diagram of the operation process of the voice interaction method according to an embodiment of this application. According to the operation process shown in FIG. 10, the entire voice interaction may include the following steps:
1. For an input sentence in a multi-round dialogue, first determine whether the input sentence of the current round needs to be rewritten. If not, the reply sentence can be output directly. If so, the NER module is first used to extract the entity information in the historical dialogue data and determine all of the historical entity information. Because an entity may consist of several words, each piece of entity information needs to be symbolized to facilitate the subsequent encoding and generation in the PGN model.
2. Based on the PGN model, calculate the attention distribution of each piece of historical entity information in the encoding stage.
3. Combine the target entity information in the current dialogue round, the distribution probabilities of the historical entity information, and the knowledge bases (KBs) to train a prediction model, and select from the historical entity information the entity information with the highest probability of being related to the user's intention as the key entity information, while discarding redundant entity information in the historical dialogue data. On this basis, the corresponding basic rewrite sentence is determined from the key entity information. The decoding module of the PGN model then generates the target interactive sentence on the basis of the basic rewrite sentence. Determining the basic rewrite sentence in advance reduces the difficulty of generating the output sentence with the PGN model.
4. With the KBs as a prior, combine the historical entity information and the target entity information in the current dialogue round to train a neural network model, so as to strengthen the PGN model's ability to extract key information from the historical dialogue. It should be noted that the entities in the sentence of the current dialogue round are almost always closely related to the user's intention and are therefore all retained. The output of the neural network is then incorporated into the loss function of the PGN model and used to calculate the output probability corresponding to the historical entity information.
5. Based on the output probability of each piece of historical entity information, and in combination with the PGN model, calculate the final distribution probability of each piece of historical entity information, that is, the probability that the entity ultimately reflects the user's intention in the model.
6. If the distribution probability is less than the threshold, that is, the improved model still cannot determine whether the entity should appear in the output sentence, the user may be invited to help configure the key entity. The decoding module of the PGN model then generates the sentence from the basic rewrite sentence determined above, together with the key entities whose probability exceeds the threshold. It should be noted that the user is invited to participate in entity configuration because, to improve the recall of the model's key information extraction, the threshold can generally be set relatively high, but an excessively high threshold may cause some key information to be lost. The user is therefore invited to configure entity information whose probability is close to the threshold, which further improves the recall of the model's key information extraction. In addition, output sentences obtained with user participation in entity configuration are more reliable and can be used as training corpus to iteratively optimize the model, which partly addresses the difficulty of obtaining high-quality multi-round dialogue corpora.
7. To determine the validity of the output target interactive sentence, this embodiment provides a two-layer feedback mechanism, as follows:
The rewritten sentence is matched against the slot values of the corresponding intention in the KBs. If the match succeeds, the rewrite is considered correct. If the match fails, the rewritten sentence is further verified using natural language understanding technology: if the sentence is recognized as a task-type sentence, the rewrite can be considered correct; if the sentence is recognized as not being a task-type sentence, the rewrite is considered incorrect. In the latter case, the user can be guided to restate the intention, and the restatement can serve as subsequent training corpus.
8. Based on the rewritten target interactive sentence, the dialogue state tracking (DST) problem in multi-round dialogue can to some extent be converted into a single-round dialogue problem, and existing mature single-round dialogue technology can be used to reply to the user's intention, improving the capability and user experience of the task-oriented multi-round dialogue system.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
Corresponding to the voice interaction method described in the foregoing embodiments, FIG. 11 shows a structural block diagram of a voice interaction apparatus according to an embodiment of this application. For ease of description, only the parts related to the embodiments of this application are shown.
Referring to FIG. 11, the apparatus can be applied to a terminal device, and may specifically include the following modules:
a historical dialogue data acquisition module 1101, configured to acquire historical dialogue data when a user sentence to be replied to is received;
a target entity information identification module 1102, configured to identify target entity information in the user sentence; and
a historical entity information identification module 1103, configured to identify historical entity information in the historical dialogue data;
a key entity information extraction module 1104, configured to extract, from the historical entity information, key entity information associated with the user sentence;
a target interactive sentence generation module 1105, configured to generate a target interactive sentence according to the target entity information and the key entity information;
a reply sentence output module 1106, configured to output a reply sentence corresponding to the target interactive sentence.
In the embodiments of this application, the key entity information extraction module may specifically include the following submodules:
a candidate user intention determination submodule, configured to determine, according to the target entity information and the historical entity information, candidate user intentions that match the user sentence;
a distribution probability calculation submodule, configured to calculate the distribution probability of each piece of historical entity information in the historical dialogue data;
a key entity information extraction submodule, configured to extract key entity information from the historical entity information according to the distribution probabilities and the candidate user intentions.
In the embodiments of this application, the distribution probability calculation submodule may specifically include the following unit:
a first pointer-generator network model calling unit, configured to call a preset pointer-generator network model and use the encoding module of the pointer-generator network model to encode each piece of historical entity information, to obtain the distribution probability corresponding to each piece of historical entity information.
In the embodiments of this application, the key entity information extraction submodule may specifically include the following units:
a candidate entity information extraction unit, configured to extract, from the historical entity information, candidate entity information associated with any candidate user intention;
a key entity information extraction unit, configured to extract, as key entity information, the candidate entity information whose distribution probability is greater than a preset probability threshold.
In the embodiments of this application, the key entity information extraction submodule may further include the following units:
a query sentence generation unit, configured to: if a target probability value is less than the preset probability threshold and the difference between the target probability value and the preset probability threshold is less than a preset difference, generate a query sentence according to the candidate entity information corresponding to the target probability value and the key entity information, to ask the user to identify the candidate entity information corresponding to the target probability value;
a key entity information determination unit, configured to determine, when the user's confirmation of the query sentence is received, the candidate entity information corresponding to the target probability value as key entity information, where the target probability value is the probability value of the distribution probability of any piece of candidate entity information in the historical dialogue data.
In this embodiment of the present application, the target interactive sentence generation module may specifically include the following submodules:
A target basic sentence determination submodule, configured to determine a target basic sentence;
A target interactive sentence generation submodule, configured to rewrite the target basic sentence using the target entity information and the key entity information, generating a target interactive sentence.
In this embodiment of the present application, the target basic sentence determination submodule may specifically include the following units:
A basic sentence acquisition unit, configured to obtain a plurality of basic sentences from the user sentence containing the target entity information and from the historical dialogue data containing the key entity information;
A matching degree calculation unit, configured to calculate the matching degree between each of the plurality of basic sentences and the entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
A target basic sentence identification unit, configured to identify the basic sentence with the maximum matching degree as the current target basic sentence.
In this embodiment of the present application, each basic sentence includes a plurality of semantic slots, and the matching degree calculation unit may specifically include the following subunits:
A counting subunit, configured to count, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated;
A determination subunit, configured to determine the number of key slots in the basic sentence, that is, the slots that match the entity information to be evaluated;
A calculation subunit, configured to calculate the ratio of the number of key slots to the number of semantic slots in the basic sentence, and to use this ratio as the matching degree between the entity information to be evaluated and the basic sentence.
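As a rough sketch of the matching degree and the selection of the target basic sentence, the ratio can be computed as below. The representation of a basic sentence as a template plus a list of slot labels, and matching by slot label, are assumptions for illustration; the source does not fix how a slot is judged to match an entity.

```python
def matching_degree(sentence_slots, entity_slots):
    """Ratio of key slots (slots matched by an entity) to all slots in the sentence."""
    if not sentence_slots:
        return 0.0  # a sentence without slots cannot match anything
    key_slots = [s for s in sentence_slots if s in entity_slots]
    return len(key_slots) / len(sentence_slots)


def select_target_basic_sentence(basic_sentences, entity_slots):
    """Pick the basic sentence with the highest matching degree (first wins ties)."""
    return max(basic_sentences, key=lambda item: matching_degree(item[1], entity_slots))


basic_sentences = [
    ("What is the weather in [city] on [date]?", ["city", "date"]),
    ("Book a flight from [origin] to [city] on [date]", ["origin", "city", "date"]),
]
# Entity information to be evaluated: target entity plus key entities, keyed by slot.
entities = {"city": "Shanghai", "date": "tomorrow"}

target, _ = select_target_basic_sentence(basic_sentences, entities)
print(target)  # What is the weather in [city] on [date]?
```

Here the weather template scores 2/2 and the flight template scores 2/3, so the weather template is selected. Dividing by the sentence's own slot count favours templates that the available entities fill completely.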
In this embodiment of the present application, the pointer-generator network model further includes a decoding module. The decoding module is obtained by training on multiple kinds of training data, which include a plurality of pieces of entity information and the basic sentence corresponding to each piece of entity information. The target interactive sentence generation submodule may specifically include the following unit:
A second pointer-generator network model invoking unit, configured to use the decoding module to decode the target entity information, the key entity information, and the target basic sentence, and to output the target interactive sentence.
In this embodiment of the present application, the target interactive sentence generation submodule may further include the following units:
A target interactive sentence entity information extraction unit, configured to extract a plurality of pieces of entity information from the target interactive sentence;
A target interactive sentence verification unit, configured to verify whether the plurality of pieces of entity information in the target interactive sentence match the semantic slots of a target user intent in a preset knowledge base, where the target user intent is any one of the candidate user intents. If the entity information matches the semantic slots of the target user intent, the generated target interactive sentence is judged to be correct, and the step of outputting the reply sentence corresponding to the target interactive sentence is executed. If the entity information does not match the semantic slots of the target user intent, the target interactive sentence is verified according to its sentence type.
In this embodiment of the present application, the target interactive sentence verification unit is further configured to: invoke a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence; if it is a task-type sentence, invoke the reply sentence output module to output the reply sentence corresponding to the target interactive sentence; if it is not a task-type sentence, prompt the user to re-enter the user sentence, and generate the target interactive sentence again from the re-entered user sentence.
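The two-stage verification described above might be organised as in the sketch below. The dictionary representation of the extracted entities, the slot-label comparison, and the `is_task_sentence` callback (standing in for the preset natural language understanding model) are hypothetical names chosen for the example.

```python
def verify_target_sentence(entities, intent_slots, is_task_sentence):
    """Return "reply" if the rewritten sentence may be answered, else "reprompt".

    entities: mapping of slot label -> entity text extracted from the sentence.
    intent_slots: slot labels of the target user intent in the knowledge base.
    is_task_sentence: zero-argument callable standing in for the NLU model.
    """
    # Check 1: every extracted entity fills a semantic slot of the target intent.
    if entities and all(slot in intent_slots for slot in entities):
        return "reply"
    # Check 2: fall back to sentence-type verification.
    return "reply" if is_task_sentence() else "reprompt"


weather_slots = {"city", "date"}
print(verify_target_sentence({"city": "Shanghai"}, weather_slots, lambda: False))  # reply
print(verify_target_sentence({"mood": "happy"}, weather_slots, lambda: False))     # reprompt
```

Passing the NLU check as a callable means the model is only consulted when the cheaper slot check fails.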
Because the apparatus embodiments are essentially similar to the method embodiments, they are described only briefly; for related details, refer to the description of the method embodiments.
FIG. 12 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in FIG. 12, the terminal device 1200 of this embodiment includes a processor 1210, a memory 1220, and a computer program 1221 that is stored in the memory 1220 and can run on the processor 1210. When the processor 1210 executes the computer program 1221, it implements the steps of the voice interaction method embodiments described above, for example steps S701 to S705 shown in FIG. 7. Alternatively, when the processor 1210 executes the computer program 1221, it implements the functions of the modules/units in the apparatus embodiments described above, for example the functions of modules 1101 to 1106 shown in FIG. 11.
For example, the computer program 1221 may be divided into one or more modules/units, which are stored in the memory 1220 and executed by the processor 1210 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program 1221 in the terminal device 1200. For example, the computer program 1221 may be divided into a historical dialogue data acquisition module, a target entity information recognition module, a historical entity information recognition module, a key entity information extraction module, a target interactive sentence generation module, and a reply sentence output module, whose functions are as follows:
A historical dialogue data acquisition module, configured to obtain historical dialogue data when a user sentence to be replied to is received;
A target entity information recognition module, configured to recognize the target entity information in the user sentence;
A historical entity information recognition module, configured to recognize the historical entity information in the historical dialogue data;
A key entity information extraction module, configured to extract, from the historical entity information, the key entity information associated with the user sentence;
A target interactive sentence generation module, configured to generate a target interactive sentence according to the target entity information and the key entity information;
A reply sentence output module, configured to output the reply sentence corresponding to the target interactive sentence.
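To show how the six modules listed above might be chained, here is a toy sketch. Every helper passed in (the entity recognizer, the key-entity filter, the sentence generator, and the reply function) is a hypothetical stand-in written for this example; in the described scheme these roles are filled by the trained models and the knowledge base.

```python
def answer(user_sentence, get_history, recognize_entities, extract_key, generate, reply):
    """Run the six-module pipeline once for a single user sentence."""
    history = get_history()                                  # historical dialogue data
    target = recognize_entities(user_sentence)               # target entity information
    historical = [e for turn in history for e in recognize_entities(turn)]
    key = extract_key(historical, user_sentence)             # key entity information
    target_sentence = generate(target, key)                  # target interactive sentence
    return reply(target_sentence)                            # reply sentence


# Toy stand-ins, for illustration only.
KNOWN_ENTITIES = {"Beijing", "Shanghai", "tomorrow"}


def recognize(sentence):
    words = sentence.replace("?", "").split()
    return [w for w in words if w in KNOWN_ENTITIES]


result = answer(
    "And Shanghai?",
    get_history=lambda: ["What is the weather in Beijing tomorrow?"],
    recognize_entities=recognize,
    extract_key=lambda hist, sentence: [e for e in hist if e == "tomorrow"],
    generate=lambda target, key: f"What is the weather in {target[0]} {key[0]}?",
    reply=lambda sentence: f"[reply to] {sentence}",
)
print(result)  # [reply to] What is the weather in Shanghai tomorrow?
```

The elliptical follow-up "And Shanghai?" is expanded into a complete question by combining its own entity with the date carried over from the earlier turn, which is the behaviour the module chain is meant to produce.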
The terminal device 1200 may be a computing device such as a desktop computer, a notebook computer, or a palmtop computer. The terminal device 1200 may include, but is not limited to, the processor 1210 and the memory 1220. Those skilled in the art will understand that FIG. 12 is merely an example of the terminal device 1200 and does not limit it; the terminal device 1200 may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device 1200 may further include input/output devices, network access devices, buses, and the like.
The processor 1210 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 1220 may be an internal storage unit of the terminal device 1200, such as a hard disk or main memory of the terminal device 1200. The memory 1220 may also be an external storage device of the terminal device 1200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 1200. Further, the memory 1220 may include both an internal storage unit and an external storage device of the terminal device 1200. The memory 1220 is used to store the computer program 1221 and other programs and data required by the terminal device 1200. The memory 1220 may also be used to temporarily store data that has been output or is about to be output.
An embodiment of the present application further discloses a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the voice interaction method described above can be implemented.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed voice interaction method, apparatus, and terminal device may be implemented in other ways. For example, the division into modules or units is only a division by logical function; other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the voice interaction apparatus or terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
Finally, it should be noted that the above are only specific implementations of the present application, and the protection scope of the present application is not limited to them. Any change or substitution within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. A voice interaction method, characterized by comprising:
    obtaining historical dialogue data when a user sentence to be replied to is received;
    recognizing target entity information in the user sentence, and recognizing historical entity information in the historical dialogue data;
    extracting, from the historical entity information, key entity information associated with the user sentence;
    generating a target interactive sentence according to the target entity information and the key entity information;
    outputting a reply sentence corresponding to the target interactive sentence.
  2. The method according to claim 1, characterized in that the extracting, from the historical entity information, key entity information associated with the user sentence comprises:
    determining, according to the target entity information and the historical entity information, candidate user intents that match the user sentence;
    calculating separately the distribution probability of each piece of historical entity information in the historical dialogue data;
    extracting the key entity information from the historical entity information according to the distribution probability and the candidate user intents.
  3. The method according to claim 2, characterized in that the calculating separately the distribution probability of each piece of historical entity information in the historical dialogue data comprises:
    invoking a preset pointer-generator network model, the pointer-generator network model comprising an encoding module;
    using the encoding module to encode each piece of historical entity information separately, obtaining the distribution probability corresponding to each piece of historical entity information.
  4. The method according to claim 2 or 3, characterized in that the extracting the key entity information from the historical entity information according to the distribution probability and the candidate user intents comprises:
    extracting, from the historical entity information, candidate entity information associated with any candidate user intent;
    extracting, as the key entity information, the candidate entity information whose distribution probability value is greater than a preset probability threshold.
  5. The method according to claim 4, characterized by further comprising:
    if the difference between a target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, generating a query sentence according to the candidate entity information corresponding to the target probability value and the key entity information, so as to prompt the user to identify the candidate entity information corresponding to the target probability value;
    when the user's confirmation of the query sentence is received, determining the candidate entity information corresponding to the target probability value as the key entity information, the target probability value being the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
  6. The method according to claim 3, characterized in that the generating a target interactive sentence according to the target entity information and the key entity information comprises:
    determining a target basic sentence;
    rewriting the target basic sentence using the target entity information and the key entity information, to generate the target interactive sentence.
  7. The method according to claim 6, characterized in that the determining a target basic sentence comprises:
    obtaining a plurality of basic sentences from the user sentence containing the target entity information and from the historical dialogue data containing the key entity information;
    calculating separately the matching degree between each of the plurality of basic sentences and entity information to be evaluated, the entity information to be evaluated comprising the target entity information and the key entity information;
    identifying the basic sentence with the maximum matching degree as the current target basic sentence.
  8. The method according to claim 7, characterized in that each basic sentence comprises a plurality of semantic slots, and the calculating separately the matching degree between each of the plurality of basic sentences and the entity information to be evaluated comprises:
    for any basic sentence, counting the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated;
    determining the number of key slots in the basic sentence that match the entity information to be evaluated;
    calculating the ratio of the number of key slots to the number of semantic slots in the basic sentence, and using the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
  9. The method according to claim 7 or 8, characterized in that the pointer-generator network model further comprises a decoding module, the decoding module being obtained by training on multiple kinds of training data, the multiple kinds of training data comprising a plurality of pieces of entity information and the basic sentence corresponding to each piece of entity information;
    the rewriting the target basic sentence using the target entity information and the key entity information, to generate the target interactive sentence, comprises:
    using the decoding module to decode the target entity information, the key entity information, and the target basic sentence, and outputting the target interactive sentence.
  10. The method according to claim 9, characterized by further comprising:
    extracting a plurality of pieces of entity information from the target interactive sentence;
    verifying whether the plurality of pieces of entity information in the target interactive sentence match the semantic slots of a target user intent in a preset knowledge base, the target user intent being any one of the candidate user intents;
    if the plurality of pieces of entity information in the target interactive sentence match the semantic slots of the target user intent, judging that the generated target interactive sentence is correct, and executing the step of outputting the reply sentence corresponding to the target interactive sentence;
    if the plurality of pieces of entity information in the target interactive sentence do not match the semantic slots of the target user intent, verifying the target interactive sentence according to the sentence type of the target interactive sentence.
  11. The method according to claim 10, characterized in that the verifying the target interactive sentence according to the sentence type of the target interactive sentence comprises:
    invoking a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence;
    if the target interactive sentence is a task-type sentence, executing the step of outputting the reply sentence corresponding to the target interactive sentence;
    if the target interactive sentence is not a task-type sentence, prompting the user to re-enter the user sentence, and generating the target interactive sentence again according to the re-entered user sentence.
  12. A voice interaction apparatus, characterized by comprising:
    a historical dialogue data acquisition module, configured to obtain historical dialogue data when a user sentence to be replied to is received;
    a target entity information recognition module, configured to recognize target entity information in the user sentence; and
    a historical entity information recognition module, configured to recognize historical entity information in the historical dialogue data;
    a key entity information extraction module, configured to extract, from the historical entity information, key entity information associated with the user sentence;
    a target interactive sentence generation module, configured to generate a target interactive sentence according to the target entity information and the key entity information;
    a reply sentence output module, configured to output a reply sentence corresponding to the target interactive sentence.
  13. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the voice interaction method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the voice interaction method according to any one of claims 1 to 11.
PCT/CN2021/079479 2020-03-31 2021-03-08 Voice interaction method and apparatus, and terminal device WO2021196981A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010244784.8A CN111428483B (en) 2020-03-31 2020-03-31 Voice interaction method and device and terminal equipment
CN202010244784.8 2020-03-31

Publications (1)

Publication Number Publication Date
WO2021196981A1 true WO2021196981A1 (en) 2021-10-07

Family

ID=71557320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079479 WO2021196981A1 (en) 2020-03-31 2021-03-08 Voice interaction method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN111428483B (en)
WO (1) WO2021196981A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545002A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Method, device, storage medium and equipment for model training and business processing
CN115579008A (en) * 2022-12-05 2023-01-06 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN115934922A (en) * 2023-03-09 2023-04-07 杭州心识宇宙科技有限公司 Conversation service execution method and device, storage medium and electronic equipment
US20230290347A1 (en) * 2020-11-20 2023-09-14 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice interaction method and apparatus, device and computer storage medium
CN116933800A (en) * 2023-09-12 2023-10-24 深圳须弥云图空间科技有限公司 Template-based generation type intention recognition method and device
CN117076620A (en) * 2023-06-25 2023-11-17 北京百度网讯科技有限公司 Dialogue processing method and device, electronic equipment and storage medium
CN117172732A (en) * 2023-07-31 2023-12-05 北京五八赶集信息技术有限公司 Recruitment service system, method, equipment and storage medium based on AI
WO2024008056A1 (en) * 2022-07-08 2024-01-11 中国疾病预防控制中心慢性非传染性疾病预防控制中心 System for helping operator question help seeker
CN117421416A (en) * 2023-12-19 2024-01-19 数据空间研究院 Interactive search method and device and electronic equipment
WO2024077878A1 (en) * 2022-10-13 2024-04-18 深圳市人马互动科技有限公司 Voice outbound call processing method and related apparatus

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428483B (en) * 2020-03-31 2022-05-24 华为技术有限公司 Voice interaction method and device and terminal equipment
CN111966803B (en) * 2020-08-03 2024-04-12 深圳市欢太科技有限公司 Dialogue simulation method and device, storage medium and electronic equipment
CN112084768A (en) * 2020-08-06 2020-12-15 珠海格力电器股份有限公司 Multi-round interaction method and device and storage medium
CN111949793B (en) * 2020-08-13 2024-02-27 深圳市欢太科技有限公司 User intention recognition method and device and terminal equipment
CN112183105A (en) * 2020-08-28 2021-01-05 华为技术有限公司 Man-machine interaction method and device
CN112100349B (en) * 2020-09-03 2024-03-19 深圳数联天下智能科技有限公司 Multi-round dialogue method and device, electronic equipment and storage medium
CN112256229B (en) * 2020-09-11 2024-05-14 北京三快在线科技有限公司 Man-machine voice interaction method and device, electronic equipment and storage medium
CN112183097B (en) * 2020-09-27 2024-06-21 深圳追一科技有限公司 Entity recall method and related device
CN112199473A (en) * 2020-10-16 2021-01-08 上海明略人工智能(集团)有限公司 Multi-turn dialogue method and device in knowledge question-answering system
CN112331201A (en) * 2020-11-03 2021-02-05 珠海格力电器股份有限公司 Voice interaction method and device, storage medium and electronic device
CN112395887A (en) * 2020-11-05 2021-02-23 北京文思海辉金信软件有限公司 Dialogue response method, dialogue response device, computer equipment and storage medium
CN112527998A (en) * 2020-12-22 2021-03-19 深圳市优必选科技股份有限公司 Reply recommendation method, reply recommendation device and intelligent device
CN112632251B (en) * 2020-12-24 2023-12-29 北京百度网讯科技有限公司 Reply content generation method, device, equipment and storage medium
CN112735374B (en) * 2020-12-29 2023-01-06 北京三快在线科技有限公司 Automatic voice interaction method and device
CN112699228B (en) * 2020-12-31 2023-07-14 青岛海尔科技有限公司 Service access method, device, electronic equipment and storage medium
CN112650846A (en) * 2021-01-13 2021-04-13 北京智通云联科技有限公司 Question-answer intention knowledge base construction system and method based on question frame
CN112783324B (en) * 2021-01-14 2023-12-01 科大讯飞股份有限公司 Man-machine interaction method and device and computer storage medium
CN112836030B (en) * 2021-01-29 2023-04-25 成都视海芯图微电子有限公司 Intelligent dialogue system and method
CN112989008A (en) * 2021-04-21 2021-06-18 上海汽车集团股份有限公司 Multi-turn dialog rewriting method and device and electronic equipment
CN113436752B (en) * 2021-05-26 2023-04-28 山东大学 Semi-supervised multi-round medical dialogue reply generation method and system
CN113536788B (en) * 2021-07-28 2023-12-05 平安科技(上海)有限公司 Information processing method, device, storage medium and equipment
CN113590750A (en) * 2021-07-30 2021-11-02 北京小米移动软件有限公司 Man-machine conversation method, device, electronic equipment and storage medium
CN113806508A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Multi-turn dialogue method and device based on artificial intelligence and storage medium
CN114861680B (en) * 2022-05-27 2023-07-25 马上消费金融股份有限公司 Dialogue processing method and device
CN115759122A (en) * 2022-11-03 2023-03-07 支付宝(杭州)信息技术有限公司 Intention identification method, device, equipment and readable storage medium
CN116521850B (en) * 2023-07-04 2023-12-01 北京红棉小冰科技有限公司 Interaction method and device based on reinforcement learning
CN116975654B (en) * 2023-08-22 2024-01-05 腾讯科技(深圳)有限公司 Object interaction method and device, electronic equipment and storage medium
CN117078270B (en) * 2023-10-17 2024-02-02 彩讯科技股份有限公司 Intelligent interaction method and device for network product marketing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885756A (en) * 2016-09-30 2018-04-06 华为技术有限公司 Dialogue method, device and equipment based on deep learning
CN109101492A (en) * 2018-07-25 2018-12-28 南京瓦尔基里网络科技有限公司 Method and system for entity extraction using historical dialogue behavior in natural language processing
CN109697282A (en) * 2017-10-20 2019-04-30 阿里巴巴集团控股有限公司 User intention recognition method and device for sentences
CN110334201A (en) * 2019-07-18 2019-10-15 中国工商银行股份有限公司 Intention recognition method, apparatus and system
CN111428483A (en) * 2020-03-31 2020-07-17 华为技术有限公司 Voice interaction method and device and terminal equipment

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369443B (en) * 2017-06-29 2020-09-25 北京百度网讯科技有限公司 Dialog management method and device based on artificial intelligence
CN108228764A (en) * 2017-12-27 2018-06-29 神思电子技术股份有限公司 Method for fusing single-round dialogue and multi-round dialogue
US10593350B2 (en) * 2018-04-21 2020-03-17 International Business Machines Corporation Quantifying customer care utilizing emotional assessments
CN109086329B (en) * 2018-06-29 2021-01-05 出门问问信息科技有限公司 Topic keyword guide-based multi-turn conversation method and device
CN109461039A (en) * 2018-08-28 2019-03-12 厦门快商通信息技术有限公司 Text processing method and intelligent customer service method
CN110162675B (en) * 2018-09-25 2023-05-02 腾讯科技(深圳)有限公司 Method and device for generating answer sentence, computer readable medium and electronic device
CN109582767B (en) * 2018-11-21 2024-05-17 北京京东尚科信息技术有限公司 Dialogue system processing method, device, equipment and readable storage medium
CN109918673B (en) * 2019-03-14 2021-08-03 湖北亿咖通科技有限公司 Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN110263330B (en) * 2019-05-22 2024-06-25 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for rewriting problem statement
CN110209791B (en) * 2019-06-12 2021-03-26 百融云创科技股份有限公司 Multi-round dialogue intelligent voice interaction system and device
CN110442676A (en) * 2019-07-02 2019-11-12 北京邮电大学 Patent retrieval method and device based on multi-round dialogue
CN110390108B (en) * 2019-07-29 2023-11-21 中国工商银行股份有限公司 Task type interaction method and system based on deep reinforcement learning
CN110704596B (en) * 2019-09-29 2023-03-31 北京百度网讯科技有限公司 Topic-based conversation method and device and electronic equipment

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230290347A1 (en) * 2020-11-20 2023-09-14 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice interaction method and apparatus, device and computer storage medium
WO2024008056A1 (en) * 2022-07-08 2024-01-11 中国疾病预防控制中心慢性非传染性疾病预防控制中心 System for helping operator question help seeker
WO2024077878A1 (en) * 2022-10-13 2024-04-18 深圳市人马互动科技有限公司 Voice outbound call processing method and related apparatus
CN115545002B (en) * 2022-11-29 2023-03-31 支付宝(杭州)信息技术有限公司 Model training and business processing method, device, storage medium and equipment
CN115545002A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Method, device, storage medium and equipment for model training and business processing
CN115579008A (en) * 2022-12-05 2023-01-06 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN115579008B (en) * 2022-12-05 2023-03-31 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN115934922B (en) * 2023-03-09 2024-01-30 杭州心识宇宙科技有限公司 Dialogue service execution method and device, storage medium and electronic equipment
CN115934922A (en) * 2023-03-09 2023-04-07 杭州心识宇宙科技有限公司 Conversation service execution method and device, storage medium and electronic equipment
CN117076620A (en) * 2023-06-25 2023-11-17 北京百度网讯科技有限公司 Dialogue processing method and device, electronic equipment and storage medium
CN117172732A (en) * 2023-07-31 2023-12-05 北京五八赶集信息技术有限公司 Recruitment service system, method, equipment and storage medium based on AI
CN116933800A (en) * 2023-09-12 2023-10-24 深圳须弥云图空间科技有限公司 Template-based generation type intention recognition method and device
CN116933800B (en) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 Template-based generation type intention recognition method and device
CN117421416A (en) * 2023-12-19 2024-01-19 数据空间研究院 Interactive search method and device and electronic equipment
CN117421416B (en) * 2023-12-19 2024-03-26 数据空间研究院 Interactive search method and device and electronic equipment

Also Published As

Publication number Publication date
CN111428483A (en) 2020-07-17
CN111428483B (en) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2021196981A1 (en) Voice interaction method and apparatus, and terminal device
CN112071302B (en) Synthetic speech selection for computing agents
AU2014281049B2 (en) Environmentally aware dialog policies and response generation
JP6987814B2 (en) Visual presentation of information related to natural language conversation
KR102436293B1 (en) Determining an agent to perform an action based at least in part on the image data
CN111261144B (en) Voice recognition method, device, terminal and storage medium
US11556698B2 (en) Augmenting textual explanations with complete discourse trees
US11893993B2 (en) Interfacing with applications via dynamically updating natural language processing
US11514896B2 (en) Interfacing with applications via dynamically updating natural language processing
CN110825863B (en) Text pair fusion method and device
US10474439B2 (en) Systems and methods for building conversational understanding systems
CN113826089A (en) Contextual feedback with expiration indicators for natural understanding systems in chat robots
US11521619B2 (en) System and method for modifying speech recognition result
CN109643540A (en) System and method for artificial intelligent voice evolution
CN110325987A (en) Context voice driven depth bookmark
US20220366904A1 (en) Active Listening for Assistant Systems
US20210250438A1 (en) Graphical User Interface for a Voice Response System
US9183196B1 (en) Parsing annotator framework from external services
US12008988B2 (en) Electronic apparatus and controlling method thereof
WO2019071607A1 (en) Voice information processing method and device, and terminal
CN112002313B (en) Interaction method and device, sound box, electronic equipment and storage medium
US20220415309A1 (en) Priority and context-based routing of speech processing
CN114138943A (en) Dialog message generation method and device, electronic equipment and storage medium
CN113590769A (en) State tracking method and device in task-driven multi-turn dialogue system
US20210264910A1 (en) User-driven content generation for virtual assistant

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21778927; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21778927; Country of ref document: EP; Kind code of ref document: A1)