WO2021196981A1 - Voice interaction method and apparatus, and terminal device
- Publication number: WO2021196981A1 (application PCT/CN2021/079479)
- Authority: WIPO (PCT)
- Prior art keywords: sentence, entity information, target, user, historical
Classifications
- G06F16/332—Query formulation
- G06F16/3343—Query execution using phonetics
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- This application relates to the field of artificial intelligence technology, and in particular to a voice interaction method, device and terminal equipment.
- Natural language processing is an important part of artificial intelligence (AI), and its typical application scenarios include task-oriented dialogue systems and machine translation.
- Dialogue state tracking (DST) methods based on machine learning require the model to understand the content of multiple rounds of dialogue well, which places extremely high requirements on the model and largely limits the accuracy of this type of DST method.
- Due to the high abstraction of natural language and the complexity of multi-round dialogue, it is difficult for current machine learning technology to fully and accurately understand multiple rounds of dialogue in practical application scenarios; that is, it is difficult to accurately track the state of multi-round dialogue and determine the user's intent.
- In view of this, the embodiments of the present application provide a voice interaction method, device, and terminal device, which can address the difficulty in the prior art of tracking the state of multi-round dialogue and accurately determining the user's intention.
- In a first aspect, an embodiment of the present application provides a voice interaction method, including:
- When a user sentence to be replied is received, historical dialogue data is obtained, and the target entity information in the user sentence and the historical entity information in the historical dialogue data are identified through a named entity recognition model; then, the key entity information associated with the user sentence is extracted from the historical entity information, and the current user sentence is rewritten according to the target entity information and the key entity information to generate a target interaction sentence; by outputting a reply sentence corresponding to the target interaction sentence, the user's interaction needs can be met.
- In this way, the dialogue state tracking problem in multi-round dialogue can be converted into a single-round dialogue problem, making it convenient to use existing mature single-round dialogue technology to reply to the user's intention, which helps improve the accuracy of user intention recognition and enhance the language processing capability of the dialogue system.
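- To make the flow above concrete, the following is a minimal, self-contained Python sketch that walks the "Beijing"/"Friday"/"temperature" example used later in this application. The dictionary-based helpers are illustrative stand-ins for the NER, filtering, and rewriting models, not the actual implementation of the embodiments.

```python
# Toy stand-ins for the NER, filtering, and rewriting models of the method;
# the lexicon and helpers are illustrative assumptions, not the patented modules.
ENTITY_LEXICON = {"friday": "time", "beijing": "location", "temperature": "weather"}

def extract_entities(sentence):
    # Stand-in for the named entity recognition model.
    words = sentence.lower().replace("?", "").split()
    return {w: ENTITY_LEXICON[w] for w in words if w in ENTITY_LEXICON}

def extract_key_entities(history_entities, target_entities):
    # Stand-in for the filtering step: keep historical entities whose slot
    # type is not already filled by the current round; the rest is redundant.
    return {w: t for w, t in history_entities.items()
            if t not in target_entities.values()}

def rewrite(user_sentence, key_entities):
    # Stand-in for the decoder-based rewrite: splice in the omitted context.
    return f"{user_sentence} + context({', '.join(sorted(key_entities))})"

history = ["What is the temperature on Friday?"]
current = "Beijing"
target = extract_entities(current)            # {'beijing': 'location'}
hist = {}
for turn in history:
    hist.update(extract_entities(turn))       # friday, temperature
key = extract_key_entities(hist, target)
print(rewrite(current, key))                  # Beijing + context(friday, temperature)
```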
- When the key entity information associated with the user sentence is extracted from the historical entity information, candidate user intents that match the user sentence can first be determined based on the target entity information and the historical entity information; then the distribution probability of each piece of historical entity information in the historical dialogue data is calculated, so that the key entity information can be extracted from the historical entity information according to the distribution probability and the candidate user intents.
- Separately calculating the distribution probability of each piece of historical entity information in the historical dialogue data can be implemented by calling a preset pointer-generator network model. The pointer-generator network model includes an encoding module, which can be used to separately encode each piece of historical entity information to obtain the distribution probability corresponding to each piece.
- When the user's confirmation information for a query sentence is received, the candidate entity information corresponding to the target probability value can be determined to be key entity information, where the aforementioned target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
- When generating the target interactive sentence based on the target entity information and the key entity information, a target basic sentence can be determined first, and then the target entity information and the key entity information can be used to rewrite the target basic sentence to obtain the target interactive sentence, which reduces the difficulty of directly generating the target interactive sentence.
- Multiple basic sentences can be obtained from the user sentence containing the target entity information and the historical dialogue data containing the key entity information; then the matching degree between each of the multiple basic sentences and the entity information to be evaluated can be calculated separately, and the basic sentence corresponding to the maximum matching degree is identified as the current target basic sentence.
- The above-mentioned entity information to be evaluated includes all of the target entity information and the key entity information.
- In a possible implementation, any basic sentence may include multiple semantic slots, and the matching degree between the entity information to be evaluated and a basic sentence may be based on the ratio between the number of key slots and the number of semantic slots in the basic sentence. Therefore, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated can first be counted; then the number of key slots in the basic sentence that match the entity information to be evaluated is determined; after calculating the ratio between the number of key slots and the number of semantic slots in the basic sentence, the ratio can be used as the matching degree between the entity information to be evaluated and the basic sentence.
- The pointer-generator network model may also include a decoding module, which may be obtained by training on a variety of training data including multiple pieces of entity information and the basic sentence corresponding to each piece. The decoding module can therefore be used to rewrite the target basic sentence and generate the target interactive sentence. Specifically, if the target basic sentence is the current user sentence, the decoding module can be used to decode the key entity information and the target basic sentence and output the target interactive sentence; if the target basic sentence is a user sentence in the historical dialogue data, the decoding module can be used to decode the target entity information, the key entity information, and the target basic sentence, and output the target interactive sentence.
- After obtaining the target interaction sentence, it is also possible to verify whether the rewritten target interaction sentence is correct.
- This embodiment provides a two-layer verification mechanism. First, multiple pieces of entity information in the target interaction sentence can be extracted, and it can be verified whether they match the preset semantic slots of the target user intent in the knowledge base, where the target user intent is any one of the candidate user intents.
- If they do not match, the target interactive sentence can be verified a second time according to its sentence type. In the second verification, a preset natural language understanding model can be called to judge whether the target interactive sentence is a task-type sentence.
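- As a rough illustration of this two-layer verification, the sketch below checks slot matching first and falls back to a task-sentence test; the slot table and the task-type heuristic are assumptions standing in for the knowledge base and the natural language understanding model.

```python
# Minimal sketch of the two-layer verification; INTENT_SLOTS and the
# task-sentence heuristic are illustrative assumptions.
INTENT_SLOTS = {"weather_forecast": {"time", "location", "weather"}}

def first_check(entity_types, intent):
    # Layer 1: do the extracted entity types fit the intent's semantic slots?
    return set(entity_types) <= INTENT_SLOTS[intent]

def second_check(sentence):
    # Layer 2: stand-in for the NLU model judging "task-type" sentences.
    return sentence.rstrip("?").lower().startswith(("what", "when", "where", "how"))

sentence = "What is the temperature in Beijing on Friday?"
entity_types = ["time", "location", "weather"]
ok = first_check(entity_types, "weather_forecast") or second_check(sentence)
print("reply" if ok else "ask the user to re-enter the sentence")
```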
- In a second aspect, an embodiment of the present application provides a voice interaction device, including:
- the historical dialogue data acquisition module is used to acquire historical dialogue data when the user sentence to be replied is received
- the target entity information identification module is used to identify the target entity information in the user sentence.
- the historical entity information identification module is used to identify the historical entity information in the historical dialogue data
- a key entity information extraction module for extracting key entity information associated with the user sentence from the historical entity information
- a target interactive sentence generating module configured to generate a target interactive sentence according to the target entity information and the key entity information
- the reply sentence output module is used to output the reply sentence corresponding to the target interactive sentence.
- the key entity information extraction module may specifically include the following submodules:
- a candidate user intention determination sub-module configured to determine a candidate user intention that matches the user sentence according to the target entity information and the historical entity information;
- the distribution probability calculation sub-module is used to separately calculate the distribution probability of each historical entity information in the historical dialogue data
- the key entity information extraction sub-module is used to extract key entity information from the historical entity information according to the distribution probability and the candidate user's intention.
- the distribution probability calculation submodule may specifically include the following units:
- the first pointer-generator network model calling unit is configured to call a preset pointer-generator network model, and use the encoding module of the pointer-generator network model to respectively encode each piece of historical entity information to obtain the distribution probability corresponding to each piece of historical entity information.
- the key entity information extraction submodule may specifically include the following units:
- a candidate entity information extraction unit configured to extract candidate entity information associated with any candidate user's intention from the historical entity information
- the key entity information extraction unit is configured to extract candidate entity information whose distribution probability is greater than a preset probability threshold as key entity information.
- the key entity information extraction submodule may further include the following units:
- the query sentence generating unit is configured to: if the difference between the target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, generate a query sentence according to the candidate entity information corresponding to the target probability value and the key entity information, to instruct the user to confirm the candidate entity information corresponding to the target probability value;
- the key entity information determining unit is configured to determine the candidate entity information corresponding to the target probability value as the key entity information when the user's confirmation information for the query sentence is received, where the target probability value is the probability value of the distribution probability of any candidate entity information in the historical dialogue data.
- the target interactive sentence generation module may specifically include the following submodules:
- the target basic sentence determination sub-module is used to determine the target basic sentence
- the target interactive sentence generating sub-module is used to use the target entity information and the key entity information to rewrite the target basic sentence to generate a target interactive sentence.
- the target basic sentence determination submodule may specifically include the following units:
- the basic sentence obtaining unit is configured to obtain a plurality of basic sentences from the user sentence containing the target entity information and the historical dialogue data containing the key entity information;
- a matching degree calculation unit configured to calculate the matching degree between the plurality of basic sentences and the entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
- the target basic sentence identification unit is used to identify the basic sentence corresponding to the maximum matching degree as the current target basic sentence.
- Any basic sentence may include multiple semantic slots.
- the matching degree calculation unit may specifically include the following sub-units:
- the statistics subunit is used to count, for any basic sentence, the number of semantic slots in the basic sentence and the number of pieces of entity information to be evaluated;
- the determining subunit is used to determine the number of key slots in the basic sentence that respectively match the entity information to be evaluated;
- the calculation subunit is used to calculate the ratio between the number of key slots and the number of semantic slots in the basic sentence, and use the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
- the pointer-generator network model may also include a decoding module, which is obtained by training on a variety of training data; the aforementioned training data includes multiple pieces of entity information and the basic sentence corresponding to each piece of entity information. The aforementioned target interactive sentence generating sub-module may specifically include the following units:
- the second pointer-generator network model calling unit is configured to: if the target basic sentence is the current user sentence, use the decoding module to decode the key entity information and the target basic sentence and output a target interactive sentence; if the target basic sentence is a user sentence in the historical dialogue data, use the decoding module to decode the target entity information, the key entity information, and the target basic sentence, and output a target interactive sentence.
- the target interactive sentence generation submodule may further include the following units:
- the target interactive sentence entity information extraction unit is used to extract multiple pieces of entity information from the target interactive sentence;
- the target interactive sentence verification unit is used to verify whether the multiple pieces of entity information in the target interactive sentence match the preset semantic slots of the target user intent in the knowledge base, where the target user intent is any one of the candidate user intents; if the multiple pieces of entity information in the target interaction sentence match the semantic slots of the target user intent, it is determined that the generated target interaction sentence is correct, and a reply sentence corresponding to the target interaction sentence is output; if they do not match, the target interaction sentence is verified according to its sentence type;
- the target interactive sentence verification unit is further configured to call a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence; if it is, the reply sentence corresponding to the target interactive sentence is output; if it is not, the user is prompted to re-enter the user sentence, and the target interactive sentence is generated again according to the re-entered user sentence.
- In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the voice interaction method described in any one of the foregoing first aspect is implemented.
- In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor of a terminal device, the voice interaction method described in any one of the foregoing first aspect is implemented.
- In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the voice interaction method described in any one of the foregoing first aspect.
- The embodiments of the present application include the following beneficial effects:
- The actual intention of the user can be determined based on the above two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate a target interactive sentence, so that applications such as the voice assistant in the terminal device can respond according to the target interactive sentence.
- The existing mature single-round dialogue technology can be used to reply to the user's intention, improving the accuracy of dialogue state tracking and user intention recognition. It can improve the natural language processing capability of the dialogue system and enhance the rationality of the dialogue system's replies during multi-round dialogue, so that the system's replies better match the actual needs of the user and the number of interactions between the user and the dialogue system is reduced.
- FIG. 1 is a schematic diagram of the operation process of a multi-round dialogue state tracking scheme based on knowledge base reasoning in the prior art;
- FIG. 2 is a schematic diagram of the operation process of a multi-round dialogue state tracking solution based on a learning model in the prior art
- FIG. 3 is a schematic diagram of the operation process of a voice interaction method provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of the hardware structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
- FIG. 6 is a schematic diagram of the software structure of a mobile phone to which the voice interaction method provided by an embodiment of the present application is applicable;
- FIG. 7 is a schematic step flowchart of a voice interaction method provided by an embodiment of the present application.
- FIG. 8 is a schematic step flowchart of a voice interaction method provided by another embodiment of the present application.
- FIG. 9 is a schematic diagram of a calculation process of the distribution probability of entity information provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of the operation process of a voice interaction method provided by another embodiment of the present application.
- FIG. 11 is a structural block diagram of a voice interaction device provided by an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- As shown in FIG. 1, it is a schematic diagram of the operation process of a multi-round dialogue state tracking solution in the prior art.
- This scheme is based on question-and-answer (QA) knowledge base reasoning, and its specific operation process is as follows:
- First, for the current input, the corresponding candidate multi-round dialogue set can be obtained, as shown in box 102 in FIG. 1.
- the similarity between the current dialogue and the candidate dialogue is calculated.
- The specific strategies include: calculating the semantic similarity between the current input and each candidate question as the first similarity; calculating the semantic similarity between the context of the current input and the context of each candidate question as the second similarity; and calculating the similarity between the summary information of the current multi-round dialogue and that of each candidate multi-round dialogue as the third similarity.
- A weighted summation of the three similarities then yields the similarity between each candidate question and the current input, and the response corresponding to the candidate question with the largest similarity is used as the output response. This step is shown in box 103 in FIG. 1.
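- The weighted-summation step of this prior-art scheme can be pictured with the following sketch; the candidate scores and the weights are made-up illustrative values, not data from the scheme itself.

```python
# Minimal sketch of combining three similarities by weighted summation and
# returning the best candidate's response. All numbers are illustrative.
candidates = [
    # (candidate question, reply, sim1: input, sim2: context, sim3: summary)
    ("What is the weather today?", "Sunny.",      0.9, 0.4, 0.5),
    ("Where is the nearest bank?", "On Main St.", 0.2, 0.3, 0.1),
]
weights = (0.5, 0.3, 0.2)  # assumed weights for the three similarities

def combined(candidate):
    return sum(w * s for w, s in zip(weights, candidate[2:]))

best = max(candidates, key=combined)
print(best[1])  # the response of the most similar candidate question
```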
- However, the key information extraction in multi-round dialogue makes no distinction between primary and secondary information; that is, key information related to the current round of input is not specifically extracted, and the extracted redundant information affects the accuracy of dialogue state tracking;
- The accuracy of dialogue state tracking largely depends on the coverage of the knowledge base; given the complexity of natural language dialogue in real scenes, it is difficult in practice to obtain an ideal knowledge base with extensive coverage;
- The method of obtaining the state tracking results in this scheme depends on various pre-defined rules, which also greatly limits the generalization ability and robustness of the model.
- As shown in FIG. 2, it is a schematic diagram of the operation process of another multi-round dialogue state tracking solution in the prior art.
- This solution is a DST solution based on a learning model. It tracks the state information of each round of dialogue in turn and updates the state of the current round of dialogue through a copy-flow mechanism, thereby realizing long-term dialogue state tracking.
- The specific operation process includes steps S201-S204:
- the key information in the current round of dialogue and the previous round of dialogue is extracted through the semi-supervised neural network model, and the keyword sequences corresponding to the above two rounds of sentences are generated.
- Then, a new encoder-decoder network based on the copy-flow mechanism is adopted to express dialogue state information explicitly as a sequence of words.
- The copy-flow mechanism can transmit the information flow of the dialogue history through copying, and this information finally participates in generating the target sentence for the current round's reply.
- the decoder module is used to automatically generate the target sentence of the current round of dialogue reply, and then complete the response to the user's inquiry.
- The key information of the historical dialogue is extracted based on a semi-supervised neural network model, which may lead to the loss or mis-extraction of key information and affect the understanding of the historical dialogue;
- Tracking the historical dialogue and updating the dialogue state round by round easily leads to high time complexity of the model and error accumulation;
- This solution relies too much on the model's ability to understand historical dialogue, and at this stage it is difficult for an encoder-decoder network to achieve an accuracy that meets the actual needs of the scene.
- In view of this, the core idea of the embodiments of the present application is that, in the multi-round dialogue state tracking process of the dialogue system, the sentence of the current round is rewritten based on the key information in the historical dialogue to complete the information omitted in the current round, thereby converting a multi-round dialogue problem into a single-round dialogue problem.
- The voice interaction method provided by the embodiments of the present application tracks user intentions based on the key information of historical dialogues, overcomes the shortcomings of the prior art solutions to a certain extent, and improves both the accuracy of state tracking in multi-round dialogue and the accuracy of the dialogue system's responses.
- Specifically, the voice interaction method uses a named entity recognition (NER) model to extract the entity information in historical conversations, and then uses knowledge bases (KBs) and a pointer-generator network (PGN) model to calculate the attention distribution between the entity information in the historical dialogue and the entities in the current round of dialogue. By filtering the entity information in the historical dialogue, redundant entities are discarded and the key information participating in the current round of dialogue state tracking is determined. Such processing not only reduces the impact of redundant information on dialogue state tracking, but also provides effective key information for subsequent steps.
- Then, the role of the key entity information in the dialogue state (represented by a distribution probability) is calculated, and a feedforward neural network is used as part of the overall model to determine whether each piece of key entity information directly affects the dialogue state. This avoids tracking the dialogue state round by round and reduces error accumulation.
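- The feedforward scoring step can be pictured with the small network below; its size, weights, and input features are illustrative assumptions (in the embodiments the network is trained as part of the overall model).

```python
# Minimal sketch of a feedforward network scoring how strongly a piece of
# key entity information affects the dialogue state. All weights and the
# choice of input features are illustrative assumptions.
import math

def feedforward(features, w1, b1, w2, b2):
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))   # probability in (0, 1)

# Features could be, e.g., (attention weight, recency, slot-match flag).
features = (0.8, 0.5, 1.0)
w1, b1 = [(0.7, 0.1, 0.4), (-0.2, 0.9, 0.3)], (0.0, -0.1)
w2, b2 = (1.2, 0.8), -0.5
print(round(feedforward(features, w1, b1, w2, b2), 2))
```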
- Finally, a mature single-round dialogue module is used to generate the reply sentences for the multi-round dialogue, improving the accuracy of the dialogue system's responses.
- As shown in FIG. 3, it is a schematic diagram of the operation process of the voice interaction method provided by an embodiment of the present application.
- Overall, this method first extracts the key information directly related to the dialogue state from the historical dialogue, and then rewrites the current round's sentence in combination with the model to complete the tracking and fusion of dialogue state information. Then, using an existing single-round dialogue processing module, the corresponding reply to the user's inquiry in the multi-round dialogue is completed on the basis of the rewritten sentence.
- The operation process of this method can be realized by the following modules: an entity extraction module extracts the entities in the historical dialogue; an entity screening module filters them in combination with the predefined KBs and the PGN model; a key information distribution prediction module calculates the probability distribution of key information based on a supervised feedforward network; a sentence rewrite module uses the decoder of the PGN model to complete or rewrite the current round's sentence; and a reply generation module generates the reply.
- As shown in FIG. 4, it is a schematic diagram of an application scenario of the voice interaction method provided by an embodiment of the present application.
- The user's query sentence is: "Please tell me the nearest restaurant."
- The dialogue system replies: "The nearest restaurant is on Nongda South Road, Haidian District."
- The current dialogue can be regarded as the current round of dialogue, and the first two rounds of dialogue are historical dialogues.
- First, the entities in the historical dialogues can be extracted by the entity extraction module, including "Nearest", "Restaurant", "Nongda South Road in Haidian District", "Haidilao", "Friday", and "Temperature".
- Then, the entity screening module, combined with the predefined KBs and the PGN model, calculates the probability distribution of the above entities in the historical dialogue and obtains the key entities related to the dialogue state, namely "Friday" and "Temperature".
- Other entities, such as "Nearest", "Restaurant", "Nongda South Road in Haidian District", and "Haidilao", are redundant entities.
- Next, the dialogue system can determine the basic sentence for rewriting based on the key entities obtained, that is, "What is the temperature on Friday?"
- At the same time, the key information distribution prediction module can further predict, based on the feedforward network, the probability distribution of the key information related to the current dialogue state: the probability obtained for "Temperature" is 0.86, and that for "Friday" is 0.72.
- Because the probability for "Friday" is close to the threshold, the dialogue system can invite the user to participate in confirmation; that is, the dialogue system in FIG. 4 can ask the user: "Do you want to query the temperature information for Friday in Beijing?" It then uses the sentence rewrite module to generate the rewritten sentence "What is the temperature in Beijing on Friday?"
- Finally, the dialogue system can use the reply generation module to generate a reply to the rewritten sentence, that is, the reply of the dialogue system in FIG. 4: "The temperature in Beijing on Friday is ..."
- It should be noted that the basic sentence for rewriting may come from the historical dialogue data or from the current dialogue; that is, the basic sentence may be a certain user sentence in the historical dialogue or the current user sentence.
- The selection criterion can be the number of pieces of target entity information and key entity information contained in the sentence. For example, in the above example, the historical dialogue sentence "What is the temperature on Friday?" contains two pieces of key entity information, "Friday" and "Temperature", while the current dialogue "Beijing" contains only one piece of target entity information, so the historical dialogue sentence "What is the temperature on Friday?" can be chosen as the basis for rewriting.
- The voice interaction method provided by the embodiments of this application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of this application impose no restriction on the specific type of the terminal device.
- FIG. 5 shows a block diagram of part of the structure of a mobile phone provided by an embodiment of the present application.
- The mobile phone includes: a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (Wi-Fi) module 570, a processor 580, and a power supply 590.
- The RF circuit 510 can be used for receiving and sending signals during information transmission or a call. In particular, downlink information from the base station is received and handed to the processor 580 for processing, and uplink data is sent to the base station.
- the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
- the RF circuit 510 can also communicate with the network and other devices through wireless communication.
- The above-mentioned wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
- the memory 520 may be used to store software programs and modules.
- the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520.
- the memory 520 may mainly include a program storage area and a data storage area.
- The program storage area may store an operating system and application programs required by at least one function (such as a sound playback function, an image playback function, etc.); the data storage area may store data created by the use of the mobile phone (such as audio data, a phone book, etc.).
- The memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the input unit 530 may be used to receive inputted digital or character information, and generate key signal input related to user settings and function control of the mobile phone 500.
- the input unit 530 may include a touch panel 531 and other input devices 532.
- The touch panel 531, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 531 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
- the touch panel 531 may include two parts: a touch detection device and a touch controller.
- The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 580, and can receive and execute the commands sent by the processor 580.
- the touch panel 531 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
- the input unit 530 may also include other input devices 532.
- the other input device 532 may include, but is not limited to, one or more of a physical keyboard, function keys (such as a volume control button, a switch button, etc.), a trackball, a mouse, and a joystick.
- the display unit 540 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
- the display unit 540 may include a display panel 541.
- the display panel 541 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
- The touch panel 531 can cover the display panel 541. When the touch panel 531 detects a touch operation on or near it, the operation is transmitted to the processor 580 to determine the type of the touch event, and the processor 580 then provides a corresponding visual output on the display panel 541 according to the type of the touch event.
- In FIG. 5, the touch panel 531 and the display panel 541 are used as two independent components to realize the input and output functions of the mobile phone, but in some embodiments, the touch panel 531 and the display panel 541 can be integrated to realize the input and output functions of the mobile phone.
- the mobile phone 500 may also include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors.
- the light sensor can include an ambient light sensor and a proximity sensor.
- the ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light.
- The proximity sensor can turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear.
- the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three-axis), and can detect the magnitude and direction of gravity when it is stationary.
- the audio circuit 560, the speaker 561, and the microphone 562 can provide an audio interface between the user and the mobile phone.
- The audio circuit 560 can transmit the electrical signal converted from received audio data to the speaker 561, and the speaker 561 converts it into a sound signal for output; on the other hand, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; after the audio data is processed by the processor 580, it is sent via the RF circuit 510 to, for example, another mobile phone, or output to the memory 520 for further processing.
- Wi-Fi is a short-distance wireless transmission technology.
- Through the Wi-Fi module 570, the mobile phone can help users send and receive emails, browse web pages, and access streaming media, providing users with wireless broadband Internet access.
- Although FIG. 5 shows the Wi-Fi module 570, it is understandable that it is not an essential component of the mobile phone 500 and can be omitted as needed without changing the essence of the invention.
- The processor 580 is the control center of the mobile phone. It uses various interfaces and lines to connect the various parts of the entire mobile phone, performs the various functions of the mobile phone, and processes data, thereby monitoring the mobile phone as a whole.
- The processor 580 may include one or more processing units; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 580.
- The mobile phone 500 also includes a power source 590 (such as a battery) for supplying power to the various components.
- The power source can be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
- the mobile phone 500 may also include a camera.
- the position of the camera on the mobile phone 500 may be front or rear, which is not limited in the embodiment of the present application.
- the mobile phone 500 may include a single camera, a dual camera, or a triple camera, etc., which is not limited in the embodiment of the present application.
- the mobile phone 500 may include three cameras, of which one is a main camera, one is a wide-angle camera, and one is a telephoto camera.
- the multiple cameras may be all front-mounted, or all rear-mounted, or partly front-mounted and another part rear-mounted, which is not limited in the embodiment of the present application.
- the mobile phone 500 may also include a Bluetooth module, etc., which will not be repeated here.
- FIG. 6 is a schematic diagram of the software structure of a mobile phone 500 according to an embodiment of the present application.
- The Android system is divided into four layers, namely the application layer, the application framework layer (framework, FWK), the system layer, and the hardware abstraction layer, and the layers communicate with each other through software interfaces.
- the application layer may include a series of application packages, which may include applications such as short message, calendar, camera, video, navigation, gallery, and call.
- the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.
- The application framework layer can include a window manager, a content provider, a resource manager, a notification manager, and the like.
- the window manager is used to manage window programs.
- the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- the content provider is used to store and retrieve data and make these data accessible to applications.
- the data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
- the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
- the notification manager is used to notify download completion, message reminders, and so on.
- The notification manager can also present notifications that appear in the status bar at the top of the system in the form of a graph or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window; for example, text messages are prompted in the status bar, a prompt sound is made, the electronic device vibrates, or an indicator light flashes.
- the application framework layer can also include:
- a view system which includes visual controls, such as controls that display text, controls that display pictures, and so on.
- the view system can be used to build applications.
- the display interface can be composed of one or more views.
- a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
- the phone manager is used to provide the communication function of the mobile phone 500. For example, the management of the call status (including connecting, hanging up, etc.).
- the system layer can include multiple functional modules. For example: sensor service module, physical state recognition module, 3D graphics processing library (for example: OpenGL ES), etc.
- the sensor service module is used to monitor the sensor data uploaded by various sensors at the hardware layer and determine the physical state of the mobile phone 500;
- The physical state recognition module is used to analyze and recognize user gestures, faces, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
- the system layer can also include:
- the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the hardware abstraction layer is the layer between hardware and software.
- the hardware abstraction layer can include display drivers, camera drivers, sensor drivers, etc., used to drive related hardware at the hardware layer, such as display screens, cameras, sensors, and so on.
- the following embodiments can be implemented on the mobile phone 500 having the above hardware structure/software structure.
- the following embodiments will take the mobile phone 500 as an example to describe the voice interaction method provided by the embodiments of the present application.
- Referring to FIG. 7, a schematic step flowchart of a voice interaction method provided by an embodiment of the present application is shown.
- The method may be applied to the above-mentioned mobile phone 500, and may specifically include the following steps:
- S701: When a user sentence to be replied is received, obtain historical dialogue data.
- The user sentence may be a certain sentence directly spoken by the user when using an application such as a voice assistant in a terminal device. For example, if a user wants to inquire about tomorrow's weather, the user can wake up the voice assistant in the phone and say "What is the weather tomorrow?" or a similar sentence.
- the user can make multiple rounds of dialogue with the voice assistant to prompt the voice assistant to fully and accurately understand the user's intention, and return information that satisfies the intention.
- The user sentence to be replied in this embodiment may be a sentence or word spoken by the user during a non-first round of dialogue; that is, the voice assistant has completed at least one round of dialogue with the user before receiving the user sentence to be replied.
- Therefore, when receiving the user sentence to be replied, the voice assistant or a similar program can obtain the dialogue data of the previous rounds of dialogue between the user and the voice assistant in the current dialogue process, and determine the user's real intention in this round of dialogue in combination with the historical dialogue data.
- The historical dialogue data can be all the dialogue data after the user wakes up the voice assistant this time, or the dialogue data of specific previous rounds, such as the three rounds preceding the current round; this embodiment places no restriction on this.
- S702: Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
- Entity is a term often used in the information world to represent a conceptual thing.
- nouns can be used to represent entity information, such as names of persons, places, organizations, etc.; a small amount of entity information can also be represented by other part-of-speech words, such as adjectives.
- user sentences and entity information in historical dialogue data can be identified based on the NER model.
- During recognition, the sentence can be segmented first, then each word after segmentation is judged one by one as to whether it is an entity word, and each entity word is labeled.
- It should be noted that the entity information identified from the user sentences of the current round of dialogue can serve as historical entity information in subsequent rounds. Therefore, for the entity information in the historical dialogue data, each sentence in the historical dialogue data can be segmented to find the entity information; alternatively, the words already labeled as entity information in previous rounds can be directly extracted as the historical entity information. This embodiment places no restriction on this.
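- A minimal sketch of this segment-judge-label procedure, with a tiny lexicon standing in for the trained NER model:

```python
# Minimal sketch: segment the sentence, judge each word against an entity
# lexicon, and label the entity words. The lexicon is an illustrative
# stand-in for a trained NER model.
import re

LEXICON = {"friday": "TIME", "beijing": "LOC", "temperature": "WEATHER"}

def tag_entities(sentence):
    tokens = re.findall(r"\w+", sentence)          # crude word segmentation
    return [(tok, LEXICON.get(tok.lower(), "O"))   # "O" marks non-entities
            for tok in tokens]

print(tag_entities("What is the temperature in Beijing on Friday?"))
# [('What', 'O'), ..., ('temperature', 'WEATHER'), ('in', 'O'),
#  ('Beijing', 'LOC'), ('on', 'O'), ('Friday', 'TIME')]
```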
- Since the user sentence to be replied is the dialogue sentence of the current round, and the entity information it contains is generally closely related to the user's intention, all the target entity information contained in the user sentence can be retained. For the historical entity information, it is necessary to distinguish which pieces are useful for the current round of dialogue and which are redundant.
- S703: Extract the key entity information associated with the user sentence from the historical entity information.
- That is, the key entity information associated with the current round's dialogue sentence can be filtered out from the historical entity information; this key entity information can be regarded as information that clearly helps in identifying the user's intention.
- In specific implementations, multiple user intents can be set in the voice assistant according to different application scenarios, and multiple pieces of associated entity information can be configured for each user intent.
- In this way, starting from the intents that contain the target entity information, the other entity information that each such intent may include can be determined, and the key entity information can then be identified from the historical entity information.
- For example, for the intent "weather forecast", multiple pieces of entity information such as "time", "location", and "weather conditions" can be configured. If the target entity information is "weather conditions", those pieces of historical entity information that satisfy the "time" and "location" requirements can be identified as key entity information.
- S704: Generate a target interaction sentence according to the target entity information and the key entity information.
- After identifying the target entity information in the current round and the key entity information in the historical dialogue, the target interaction sentence matching the actual intention of the user can be generated based on the above two kinds of information.
- For example, if the target entity information and the key entity information include the time information "Friday", the location information "Beijing", and the weather condition information "Temperature", it can be recognized that the user currently wants to query the temperature in Beijing on Friday.
- The target interaction sentence corresponding to this can be "What is the temperature in Beijing this Friday?", or another similar sentence.
- The target interaction sentence is thus a sentence-level expression of the information the user wants to query.
- S705: Output a reply sentence corresponding to the target interactive sentence.
- The function of the voice assistant is to let users query certain information by voice. Therefore, after identifying the target interaction sentence that matches the user's actual intention, the voice assistant can search according to that sentence and find the corresponding reply sentence.
- For the above example, the corresponding reply sentence may be "The temperature in Beijing on Friday is 17 degrees Celsius".
- the reply sentence can be broadcast to the user by voice, or displayed in the mobile phone interface in the form of text, or sent to the user's mobile phone in other information formats, which is not limited in this embodiment.
- In summary, in the embodiments of the present application, the actual intention of the user can be determined based on the above two kinds of entity information, and the user sentence of the current round can be rewritten according to that intention to generate the target interactive sentence, so that applications such as the voice assistant in the terminal device can respond according to the target interactive sentence.
- The existing mature single-round dialogue technology can be used to reply to the user's intention, improving the accuracy of dialogue state tracking and user intention recognition. It can improve the natural language processing capability of the dialogue system and enhance the rationality of the dialogue system's replies during multi-round dialogue, so that the system's replies better match the actual needs of the user and the number of interactions between the user and the dialogue system is reduced.
- Referring to FIG. 8, there is shown a schematic step flowchart of a voice interaction method provided by another embodiment of the present application.
- the method may specifically include the following steps:
- For ease of presentation, this embodiment takes a mobile phone as the terminal device in the subsequent introduction. That is, when a user uses an application such as a voice assistant in a mobile phone, this type of application identifies the user's entity information in the current round and previous rounds to determine the corresponding user intention, rewrites the current round's user sentence based on that intention, and outputs a reply sentence corresponding to the rewritten user sentence to meet the actual needs of the user.
- S801: When a user sentence to be replied is received, obtain historical dialogue data.
- In specific applications, the user sentence to be replied may refer to a certain sentence directly uttered by the user during the interaction with the voice assistant.
- This sentence may be a sentence that fully expresses a certain user intention, or it may be one or more words.
- When the voice assistant receives a certain sentence from the user, it can first determine whether it can give a corresponding reply to that sentence. If the voice assistant can directly reply based on the sentence, no other processing is required, and the reply sentence can be provided to the user directly. For example, if the user's sentence is "What is the temperature in Beijing this Friday?", the sentence alone determines that the user's intention is to inquire about the weather in Beijing this Friday, so the voice assistant can directly output the query result to the user.
- If the voice assistant cannot determine the user's intention from the sentence alone, the user's intention can be re-determined by combining it with the user's expressions in the previous rounds.
- the historical dialogue data between the user and the voice assistant can be obtained.
- The aforementioned historical dialogue data can be the dialogue data of all rounds from when the user woke up the voice assistant this time until the current round, or the dialogue data of several consecutive rounds before the current round; this embodiment places no restriction on this.
- S802: Identify the target entity information in the user sentence, and identify the historical entity information in the historical dialogue data.
- user sentences and entity information in historical dialogue data can be identified based on the NER model.
- the historical entity information in the historical dialogue data may include the entity information in the sentence spoken by the user in a certain round, and may also include the entity information in the reply sentence when the voice assistant replies to the user.
- the historical entity information may include the "restaurant” in the user's sentence, as well as the entity information such as "Nongda South Road, Haidian District” and "Haidilao" in the voice assistant's reply sentence.
- S803: According to the target entity information and the historical entity information, determine a candidate user intent that matches the user sentence.
- In specific implementations, the candidate user intents preliminarily determined based on the target entity information and the historical entity information may be of multiple types.
- That is, the KBs can be combined to preliminarily determine the user's currently possible intents.
- Specifically, multiple user intents can be preset in the KBs, and each user intent can include multiple semantic slots. After identifying the target entity information and the historical entity information, the slots corresponding to each user intent can be matched against the two kinds of entity information, so that the user intents whose slots contain part of the identified entity information are found and preliminarily determined as candidate user intents.
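- The slot-based preliminary matching can be pictured as follows; the intent/slot table is an illustrative assumption for the KBs, and, consistent with the text above, more than one candidate intent may be returned:

```python
# Minimal sketch of preliminary intent matching against the KBs: an intent
# becomes a candidate when its semantic slots cover part of the identified
# entity information. The intent/slot table is an illustrative assumption.
KBS = {
    "weather_forecast":  {"time", "location", "weather"},
    "restaurant_search": {"location", "cuisine", "distance"},
}

def candidate_intents(entity_types):
    found = set(entity_types)
    return [intent for intent, slots in KBS.items() if slots & found]

# "Beijing" (location) plus historical "Friday" (time) and "temperature"
# (weather) match weather_forecast; restaurant_search also matches via
# "location", so both are returned as candidates.
print(candidate_intents(["location", "time", "weather"]))
```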
- S804: Calculate the distribution probability of each piece of historical entity information in the historical dialogue data.
- In order to extract the key entity information from the historical entity information, the distribution probability of each piece of historical entity information in the historical dialogue data may be calculated first.
- In specific implementations, the distribution probability of each piece of historical entity information can be determined based on the PGN model: first, symbolize each piece of historical entity information, then call the PGN model and use its encoding module to encode each symbolized piece of historical entity information, and calculate the distribution probability of each piece of historical entity information in the encoding stage.
- the prediction model can be trained with training data and KBs to enhance the key information extraction capability of the PGN model.
- the above-mentioned training data may be pre-collected multiple rounds of dialogue data, including entity information in a certain round (current round) of the dialogue and historical entity information (rounds before the current round) in the pre-collected training data.
- For each piece of historical entity information, the corresponding attention distribution can be output after converting it into a text vector; at the same time, the encoding module and decoding module of the PGN model are combined to obtain the generation probability of the historical entity information.
- Then, the above types of probabilities can be added together to output the final distribution probability.
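- A minimal sketch of combining the two probability types; the text above describes adding them, shown here as a pointer-generator-style weighted sum with made-up scores chosen so that the result reproduces the 0.86 and 0.72 values of the FIG. 4 example:

```python
def final_distribution(attention, generation, p_gen=0.5):
    # Pointer-generator-style mixture of the attention distribution and the
    # generation probability; p_gen and all scores are illustrative.
    return {e: round(p_gen * attention[e] + (1 - p_gen) * generation[e], 3)
            for e in attention}

attention  = {"Friday": 0.60, "Temperature": 0.90, "Haidilao": 0.10}
generation = {"Friday": 0.84, "Temperature": 0.82, "Haidilao": 0.05}
print(final_distribution(attention, generation))
# {'Friday': 0.72, 'Temperature': 0.86, 'Haidilao': 0.075}
```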
- the user's confirmation information can also be combined to improve the output distribution probability and the reliability of the identified key entity information.
- the key entity information associated with the user's intention is then found; that is, the entity information most strongly correlated with the user's intention is selected from all the historical entity information.
- specifically, the candidate entity information associated with any candidate user intention can first be extracted from the historical entity information, and then the candidate entity information whose distribution probability is greater than a preset probability threshold can be extracted as the key entity information related to that intention.
- for example, the probability threshold may be set to 0.8, so that candidate entity information whose distribution probability is greater than 0.8 is identified as key entity information.
- in addition, for candidate entity information whose distribution probability is close to the probability threshold, the user may be invited to confirm the entity information.
- specifically, if the difference between a target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, the candidate entity information corresponding to the target probability value and the key entity information can be used
- to generate a query sentence that asks the user to confirm the candidate entity information corresponding to the target probability value; the target probability value is the distribution probability of any candidate entity information in the historical dialogue data.
- when the user's confirmation information for the query sentence is received, the candidate entity information corresponding to the target probability value can be identified as key entity information.
- for example, suppose the probability value of the historical entity information "temperature" is calculated to be 0.86, which is greater than the set probability threshold of 0.8;
- the entity information "temperature" can then be identified as key entity information.
- suppose the calculated probability value of the historical entity information "Friday" is 0.72, which is less than the above-mentioned probability threshold of 0.8 but close to it.
- if the target entity information in the current round's user sentence is "Beijing", a query sentence can be generated to ask the user to confirm "Friday";
- upon the user's confirmation, the aforementioned historical entity information "Friday" can also be identified as key entity information.
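- a sketch of this selection flow, using the 0.8 threshold from the example and assuming a hypothetical near-threshold margin of 0.1 (the embodiment only requires some preset difference), is as follows:

```python
PROB_THRESHOLD = 0.8   # threshold from the example above
NEAR_MARGIN = 0.1      # assumed value for the "preset difference"

def select_key_entities(candidates: dict, confirm) -> list:
    """`candidates` maps candidate entity -> distribution probability;
    `confirm(entity)` asks the user about one near-threshold entity."""
    key_entities = []
    for entity, prob in candidates.items():
        if prob > PROB_THRESHOLD:
            key_entities.append(entity)             # e.g. "temperature" at 0.86
        elif PROB_THRESHOLD - prob < NEAR_MARGIN and confirm(entity):
            key_entities.append(entity)             # e.g. "Friday" at 0.72
    return key_entities

# Stand-in for generating a query sentence and receiving the user's reply.
ask = lambda entity: True
print(select_key_entities({"temperature": 0.86, "Friday": 0.72}, ask))
# -> ['temperature', 'Friday'] once the user confirms "Friday"
```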
- the target interaction sentence matching the actual intention of the user can be generated based on the above two kinds of information.
- in order to reduce the difficulty of generating the target interactive sentence, the target basic sentence can be determined first, and the final target interactive sentence can then be obtained by rewriting on the basis of the target basic sentence.
- the target basic sentence may be determined based on key entity information and/or target entity information.
- the aforementioned entity information to be evaluated includes all target entity information and key entity information.
- the basic sentence may be the current user sentence or a certain user sentence in the historical dialogue data.
- the degree of matching between the entity information to be evaluated and each basic sentence can be determined according to the degree of matching between the entity information to be evaluated and the semantic slots of that basic sentence.
- specifically, for any basic sentence, the number of semantic slots in the sentence and the number of entity information to be evaluated can be counted; the matching degree can then be taken as the ratio between the number of slots matched by the entity information to be evaluated and the total number of semantic slots in the basic sentence, and the basic sentence with the maximum matching degree is identified as the target basic sentence.
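- a minimal sketch of this matching-degree calculation follows; the slot types and example sentences are hypothetical:

```python
def matching_degree(sentence_slots: set, entities_to_evaluate: dict) -> float:
    """Ratio of slots matched by the entity information to be evaluated
    to the total number of semantic slots in the basic sentence."""
    if not sentence_slots:
        return 0.0
    return len(sentence_slots & set(entities_to_evaluate)) / len(sentence_slots)

def pick_target_basic_sentence(basic_sentences: dict, entities: dict) -> str:
    """Identify the basic sentence with the maximum matching degree."""
    return max(basic_sentences,
               key=lambda s: matching_degree(basic_sentences[s], entities))

sentences = {
    "What's the weather on Friday?": {"date", "weather"},
    "How about Beijing on Friday?": {"location", "date"},
}
entities = {"location": "Beijing", "date": "Friday"}  # target + key entities
print(pick_target_basic_sentence(sentences, entities))
# -> "How about Beijing on Friday?" (matching degree 2/2 vs. 1/2)
```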
- the target entity information and key entity information can be used to rewrite the sentence to obtain the final target interactive sentence.
- whether to use the target entity information or the key entity information for sentence rewriting depends on whether the target basic sentence is the current user sentence or a user sentence in the historical dialogue. If the target basic sentence is the current user sentence, since that sentence already contains all the target entity information, the key entity information identified from the historical dialogue can be used for rewriting; if the target basic sentence is a user sentence in the historical dialogue, since that sentence may contain only part of the key entity information, all of the key entity information and the target entity information can be used for rewriting.
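- this selection rule can be sketched as follows; the function and parameter names are illustrative only:

```python
def rewrite_inputs(target_basic: str, current_sentence: str,
                   target_entities: dict, key_entities: dict) -> dict:
    """Choose the entity information to feed into sentence rewriting."""
    if target_basic == current_sentence:
        # The current user sentence already carries all target entities,
        # so only the key entities recovered from history are needed.
        return dict(key_entities)
    # A historical sentence may contain only part of the key entities,
    # so supply all key entity and target entity information.
    return {**key_entities, **target_entities}
```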
- the target interactive sentence may be output based on the PGN model.
- the PGN model may also include a decoding module.
- the decoding module can be obtained by training on various types of training data.
- the various types of training data can include multiple entity information and basic sentences corresponding to each entity information.
- the decoding module of the PGN model can be used to decode the target entity information, key entity information, and target basic sentence, and output the target interactive sentence.
- for the target interactive sentence output by the PGN model, it can be verified whether the sentence has been rewritten correctly.
- multiple entity information in the target interaction sentence may be extracted first, and it is verified whether the multiple entity information in the target interaction sentence matches the preset semantic slot of the target user's intention in the knowledge base.
- the target user intention is any one of all candidate user intentions.
- if the entity information matches, step S808 is executed to output a reply sentence corresponding to the target interaction sentence.
- if it does not match, the target interactive sentence can be verified a second time according to the sentence type of the target interactive sentence.
- if the target interactive sentence is a task-type sentence, step S808 can likewise be executed to output a reply sentence corresponding to the target interactive sentence; if the target interactive sentence is not a task-type sentence, it means that the voice assistant cannot perform specific intent recognition on the sentence, or that the recognized intention deviates from the user's actual intention.
- the user can be prompted to re-enter the user sentence at this time, and the voice assistant can recognize the user's intention again according to the re-entered user sentence and generate a new target interaction sentence.
- S808 Output a reply sentence corresponding to the target interactive sentence.
- by the above method, the user sentence in the current round can be rewritten, so that the dialogue state tracking problem in multi-round dialogue is, to a certain extent, converted into a single-round dialogue problem.
- FIG. 10 is a schematic diagram of the operation process of the voice interaction method provided by an embodiment of the present application.
- the entire voice interaction may include the following steps:
- in some cases, the improved model still cannot determine whether an entity should appear in the output sentence.
- in such cases, the user may be invited to participate in the confirmation of the key entity; then, based on the above determination, the key entities whose probabilities are greater than the threshold are combined.
- the decoding module of the PGN model is used to generate the sentence.
- the reason for inviting users to participate in entity confirmation is to increase the recall rate of the key information extracted by the model.
- in practice, the threshold can be set relatively high, but too high a threshold may also cause some key information to be lost. Therefore, for entity information close to the threshold, it is necessary to invite the user to participate in confirmation so as to further improve the recall rate of the key information extracted by the model.
- the reliability of the output sentences obtained in this way is also high, and they can be used as training corpus to iteratively optimize the model, which partially solves the problem that high-quality multi-round dialogue corpora are difficult to obtain.
- this embodiment designs a two-layer feedback mechanism, and the specific method is as follows:
- first, the rewritten sentence is matched against the slot values corresponding to the intent in the KBs. If the match succeeds, the rewriting is considered correct; if the match fails, the rewritten sentence can be verified using natural language understanding technology. If natural language understanding technology recognizes the sentence as a task-type sentence, the rewriting can be considered correct; if the sentence is recognized as not being a task-type sentence, the rewriting can be considered wrong, and at this point the user can be guided to restate the intention, which is then used as follow-up training corpus.
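- a sketch of this two-layer check, assuming hypothetical `extract_entities` and `is_task_sentence` helpers (neither is defined by this application), is as follows:

```python
def verify_rewrite(rewritten: str, intent_slots: set,
                   extract_entities, is_task_sentence) -> str:
    """Layer 1: match the rewritten sentence's entities against the
    intent's slots in the KBs; layer 2: fall back to an NLU check of
    whether the sentence is a task-type sentence."""
    entity_slots = set(extract_entities(rewritten))   # e.g. {"location", "date"}
    if entity_slots and entity_slots <= intent_slots:
        return "rewrite correct"                      # layer 1 passed
    if is_task_sentence(rewritten):
        return "rewrite correct"                      # layer 2 passed
    return "guide user to restate intention"          # collected as corpus
```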
- in this way, the DST problem in multi-round dialogue can be converted, to a certain extent, into a single-round dialogue problem, and existing mature single-round dialogue technology can be used to respond to user intentions, improving the capabilities and user experience of a task-oriented multi-round dialogue system.
- FIG. 11 shows a structural block diagram of a voice interaction device provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
- the device can be applied to terminal equipment, and specifically can include the following modules:
- the historical dialogue data acquisition module 1101 is used to acquire historical dialogue data when a user sentence to be replied is received;
- the target entity information identification module 1102 is used to identify the target entity information in the user sentence; and,
- the historical entity information identification module 1103 is used to identify historical entity information in the historical dialogue data;
- the key entity information extraction module 1104 is configured to extract key entity information associated with the user sentence from the historical entity information;
- the target interactive sentence generating module 1105 is configured to generate a target interactive sentence according to the target entity information and the key entity information;
- the reply sentence output module 1106 is used to output a reply sentence corresponding to the target interactive sentence.
- the key entity information extraction module may specifically include the following submodules:
- a candidate user intention determination sub-module configured to determine a candidate user intention that matches the user sentence according to the target entity information and the historical entity information;
- the distribution probability calculation sub-module is used to separately calculate the distribution probability of each historical entity information in the historical dialogue data;
- the key entity information extraction sub-module is used to extract key entity information from the historical entity information according to the distribution probability and the candidate user's intention.
- the distribution probability calculation sub-module may specifically include the following units:
- the first pointer generation network model calling unit is configured to call a preset pointer generation network model, and use the encoding module of the pointer generation network model to respectively encode each historical entity information, so as to obtain the distribution probability corresponding to each historical entity information.
- the key entity information extraction submodule may specifically include the following units:
- a candidate entity information extraction unit configured to extract candidate entity information associated with any candidate user's intention from the historical entity information;
- the key entity information extraction unit is configured to extract candidate entity information whose distribution probability is greater than a preset probability threshold as key entity information.
- the key entity information extraction submodule may further include the following units:
- the query sentence generating unit is configured to: if the difference between the target probability value and the preset probability threshold is less than a preset difference, and the target probability value is less than the preset probability threshold, generate, according to the candidate entity information corresponding to the target probability value and the key entity information, a query sentence to instruct the user to confirm the candidate entity information corresponding to the target probability value;
- the key entity information determining unit is configured to determine the candidate entity information corresponding to the target probability value as key entity information when the user's confirmation information for the query sentence is received, where the target probability value is the distribution probability of any candidate entity information in the historical dialogue data.
- the target interactive sentence generation module may specifically include the following sub-modules:
- the target basic sentence determination sub-module is used to determine the target basic sentence;
- the target interactive sentence generating sub-module is used to use the target entity information and the key entity information to rewrite the target basic sentence to generate a target interactive sentence.
- the target basic sentence determination submodule may specifically include the following units:
- the basic sentence obtaining unit is configured to obtain a plurality of basic sentences from the user sentence containing the target entity information and the historical dialogue data containing the key entity information;
- a matching degree calculation unit configured to calculate the matching degree between the plurality of basic sentences and the entity information to be evaluated, where the entity information to be evaluated includes the target entity information and the key entity information;
- the target basic sentence identification unit is used to identify the basic sentence corresponding to the maximum matching degree as the current target basic sentence.
- in some embodiments, any basic sentence includes multiple semantic slots, and the matching degree calculation unit may specifically include the following subunits:
- the statistics subunit is used to count the number of semantic slots in the basic sentence and the number of entity information to be evaluated for any basic sentence;
- the determining subunit is used to determine the number of key slots in the basic sentence that respectively match the entity information to be evaluated;
- the calculation subunit is used to calculate the ratio between the number of key slots and the number of semantic slots in the basic sentence, and use the ratio as the matching degree between the entity information to be evaluated and the basic sentence.
- in some embodiments, the pointer generation network model further includes a decoding module, which is obtained by training on various types of training data, and the various types of training data include multiple entity information and the basic sentences corresponding to each entity information.
- the target interactive sentence generation sub-module may specifically include the following units:
- the second pointer generation network model calling unit is configured to use the decoding module to decode the target entity information, the key entity information, and the target basic sentence, and output a target interactive sentence.
- the target interactive sentence generation submodule may further include the following units:
- the target interactive sentence entity information extraction unit is used to extract multiple entity information in the target interactive sentence;
- the target interactive sentence verification unit is used to verify whether the multiple entity information in the target interactive sentence matches the preset semantic slots of the target user's intention in the knowledge base, where the target user's intention is any one of the candidate user intentions; if the multiple entity information in the target interaction sentence matches the semantic slots of the target user's intention, it is determined that the generated target interaction sentence is correct, and the step of outputting the reply sentence corresponding to the target interaction sentence is executed; if the multiple entity information in the target interaction sentence does not match the semantic slots of the target user's intention, the target interaction sentence is verified according to the sentence type of the target interaction sentence.
- the target interactive sentence verification unit is further configured to: call a preset natural language understanding model to determine whether the target interactive sentence is a task-type sentence; if the target interactive sentence is a task-type sentence, call the reply sentence output module to output a reply sentence corresponding to the target interactive sentence; if the target interactive sentence is not a task-type sentence, prompt the user to re-enter the user sentence, and regenerate the target interactive sentence according to the re-entered user sentence.
- since the device embodiment basically corresponds to the method embodiment, the description is relatively simple; for related parts, please refer to the description of the method embodiment.
- the terminal device 1200 of this embodiment includes: a processor 1210, a memory 1220, and a computer program 1221 that is stored in the memory 1220 and can run on the processor 1210.
- when the processor 1210 executes the computer program 1221, the steps in the various embodiments of the voice interaction method described above are implemented, for example, steps S701 to S705 shown in FIG. 7.
- alternatively, when the processor 1210 executes the computer program 1221, the functions of the modules/units in the foregoing device embodiments are implemented, for example, the functions of the modules 1101 to 1106 shown in FIG. 11.
- the computer program 1221 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 1220 and executed by the processor 1210 to complete this application.
- the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments may be used to describe the execution process of the computer program 1221 in the terminal device 1200.
- the computer program 1221 can be divided into a historical dialogue data acquisition module, a target entity information recognition module, a historical entity information recognition module, a key entity information extraction module, a target interactive sentence generation module, and a reply sentence output module.
- the specific functions of each module are as follows:
- the historical dialogue data acquisition module is used to acquire historical dialogue data when the user sentence to be replied is received
- the target entity information identification module is used to identify the target entity information in the user sentence
- the historical entity information identification module is used to identify the historical entity information in the historical dialogue data
- a key entity information extraction module for extracting key entity information associated with the user sentence from the historical entity information
- a target interactive sentence generating module configured to generate a target interactive sentence according to the target entity information and the key entity information
- the reply sentence output module is used to output the reply sentence corresponding to the target interactive sentence.
- the terminal device 1200 may be a computing device such as a desktop computer, a notebook, or a palmtop computer.
- the terminal device 1200 may include, but is not limited to, a processor 1210 and a memory 1220.
- FIG. 12 is only an example of the terminal device 1200 and does not constitute a limitation on the terminal device 1200, which may include more or fewer components than those shown in the figure, combine some components, or have different components.
- the terminal device 1200 may also include input and output devices, network access devices, buses, and so on.
- the processor 1210 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the memory 1220 may be an internal storage unit of the terminal device 1200, such as a hard disk or a memory of the terminal device 1200.
- the memory 1220 may also be an external storage device of the terminal device 1200, such as a plug-in hard disk equipped on the terminal device 1200, a smart memory card (Smart Media Card, SMC), and a Secure Digital (SD) Card, Flash Card, etc.
- the memory 1220 may also include both an internal storage unit of the terminal device 1200 and an external storage device.
- the memory 1220 is used to store the computer program 1221 and other programs and data required by the terminal device 1200.
- the memory 1220 can also be used to temporarily store data that has been output or will be output.
- the embodiment of the present application also discloses a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the aforementioned voice interaction method can be realized.
- the disclosed voice interaction method, device, and terminal device can be implemented in other ways.
- the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation.
- multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- the computer program can be stored in a computer-readable storage medium. When executed by the processor, the steps of the foregoing method embodiments can be implemented.
- the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
- the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the voice interaction device or terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a floppy disk, or a CD-ROM.
- in some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electric carrier signals or telecommunication signals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
This application, which belongs to the technical field of artificial intelligence, relates to a voice interaction method and apparatus, and a terminal device. The method comprises: obtaining historical dialogue data when a user sentence requiring a reply is received; identifying target entity information in the user sentence and identifying historical entity information in the historical dialogue data; extracting key entity information associated with the user sentence from the historical entity information; generating a target interaction sentence according to the target entity information and the key entity information; and outputting a reply sentence corresponding to the target interaction sentence. According to the method, the accuracy of dialogue state tracking and user intention recognition can be improved, the natural language processing capability of the dialogue system is enhanced, and the reasonableness of the dialogue system's replies in a multi-round dialogue process is improved.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010244784.8 | 2020-03-31 | ||
CN202010244784.8A CN111428483B (zh) | 2020-03-31 | 2020-03-31 | 语音交互方法、装置和终端设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196981A1 true WO2021196981A1 (fr) | 2021-10-07 |
Family
ID=71557320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/079479 WO2021196981A1 (fr) | 2020-03-31 | 2021-03-08 | Procédé et appareil d'interaction vocale, et dispositif terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111428483B (fr) |
WO (1) | WO2021196981A1 (fr) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357950A (zh) * | 2021-12-31 | 2022-04-15 | 科大讯飞股份有限公司 | 数据改写方法、装置、存储介质及计算机设备 |
CN114739408A (zh) * | 2022-03-21 | 2022-07-12 | 深圳市优必选科技股份有限公司 | 一种语义导航方法、语义导航装置及机器人 |
CN115545002A (zh) * | 2022-11-29 | 2022-12-30 | 支付宝(杭州)信息技术有限公司 | 一种模型训练和业务处理的方法、装置、存储介质及设备 |
CN115579008A (zh) * | 2022-12-05 | 2023-01-06 | 广州小鹏汽车科技有限公司 | 语音交互方法、服务器及计算机可读存储介质 |
CN115934922A (zh) * | 2023-03-09 | 2023-04-07 | 杭州心识宇宙科技有限公司 | 一种对话业务执行方法、装置、存储介质及电子设备 |
CN116052312A (zh) * | 2023-01-10 | 2023-05-02 | 广东好太太智能家居有限公司 | 智能锁的控制方法及相关设备 |
US20230290347A1 (en) * | 2020-11-20 | 2023-09-14 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Voice interaction method and apparatus, device and computer storage medium |
CN116933800A (zh) * | 2023-09-12 | 2023-10-24 | 深圳须弥云图空间科技有限公司 | 一种基于模版的生成式意图识别方法及装置 |
CN117076620A (zh) * | 2023-06-25 | 2023-11-17 | 北京百度网讯科技有限公司 | 一种对话处理方法、装置、电子设备及存储介质 |
CN117172732A (zh) * | 2023-07-31 | 2023-12-05 | 北京五八赶集信息技术有限公司 | 基于ai的招聘服务系统、方法、设备及存储介质 |
WO2024008056A1 (fr) * | 2022-07-08 | 2024-01-11 | 中国疾病预防控制中心慢性非传染性疾病预防控制中心 | Système permettant d'aider un opérateur à poser des questions à un chercheur d'aide |
CN117421416A (zh) * | 2023-12-19 | 2024-01-19 | 数据空间研究院 | 交互检索方法、装置和电子设备 |
WO2024077878A1 (fr) * | 2022-10-13 | 2024-04-18 | 深圳市人马互动科技有限公司 | Procédé de traitement d'appel sortant vocal et appareil associé |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111428483B (zh) * | 2020-03-31 | 2022-05-24 | 华为技术有限公司 | 语音交互方法、装置和终端设备 |
CN111966803B (zh) * | 2020-08-03 | 2024-04-12 | 深圳市欢太科技有限公司 | 对话模拟方法、装置、存储介质及电子设备 |
CN112084768A (zh) * | 2020-08-06 | 2020-12-15 | 珠海格力电器股份有限公司 | 一种多轮交互方法、装置及存储介质 |
CN111949793B (zh) * | 2020-08-13 | 2024-02-27 | 深圳市欢太科技有限公司 | 用户意图识别方法、装置及终端设备 |
CN112183105A (zh) * | 2020-08-28 | 2021-01-05 | 华为技术有限公司 | 人机交互方法及装置 |
CN112100349B (zh) * | 2020-09-03 | 2024-03-19 | 深圳数联天下智能科技有限公司 | 一种多轮对话方法、装置、电子设备及存储介质 |
CN112256229B (zh) * | 2020-09-11 | 2024-05-14 | 北京三快在线科技有限公司 | 人机语音交互方法、装置、电子设备及存储介质 |
CN112183097B (zh) * | 2020-09-27 | 2024-06-21 | 深圳追一科技有限公司 | 一种实体召回方法及相关装置 |
CN112199473A (zh) * | 2020-10-16 | 2021-01-08 | 上海明略人工智能(集团)有限公司 | 一种知识问答系统中的多轮对话方法与装置 |
CN112331201A (zh) * | 2020-11-03 | 2021-02-05 | 珠海格力电器股份有限公司 | 语音的交互方法和装置、存储介质、电子装置 |
CN112395887A (zh) * | 2020-11-05 | 2021-02-23 | 北京文思海辉金信软件有限公司 | 对话应答方法、装置、计算机设备和存储介质 |
CN112527998B (zh) * | 2020-12-22 | 2024-08-20 | 深圳市优必选科技股份有限公司 | 一种答复推荐方法、答复推荐装置及智能设备 |
CN112632251B (zh) * | 2020-12-24 | 2023-12-29 | 北京百度网讯科技有限公司 | 回复内容的生成方法、装置、设备和存储介质 |
CN112735374B (zh) * | 2020-12-29 | 2023-01-06 | 北京三快在线科技有限公司 | 一种自动语音交互的方法及装置 |
CN112699228B (zh) * | 2020-12-31 | 2023-07-14 | 青岛海尔科技有限公司 | 业务访问方法、装置、电子设备及存储介质 |
CN112650846B (zh) * | 2021-01-13 | 2024-08-23 | 北京智通云联科技有限公司 | 一种基于问句框架的问答意图知识库构建系统及方法 |
CN112783324B (zh) * | 2021-01-14 | 2023-12-01 | 科大讯飞股份有限公司 | 人机交互方法及设备、计算机存储介质 |
CN112836030B (zh) * | 2021-01-29 | 2023-04-25 | 成都视海芯图微电子有限公司 | 一种智能对话系统及方法 |
CN112989008A (zh) * | 2021-04-21 | 2021-06-18 | 上海汽车集团股份有限公司 | 一种多轮对话改写方法、装置和电子设备 |
CN113436752B (zh) * | 2021-05-26 | 2023-04-28 | 山东大学 | 一种半监督的多轮医疗对话回复生成方法及系统 |
CN113536788B (zh) * | 2021-07-28 | 2023-12-05 | 平安科技(上海)有限公司 | 信息处理方法、装置、存储介质及设备 |
CN113590750B (zh) * | 2021-07-30 | 2024-09-13 | 北京小米移动软件有限公司 | 人机对话方法、装置、电子设备及存储介质 |
CN113806508A (zh) * | 2021-09-17 | 2021-12-17 | 平安普惠企业管理有限公司 | 基于人工智能的多轮对话方法、装置及存储介质 |
CN114138958A (zh) * | 2021-11-25 | 2022-03-04 | 北京声智科技有限公司 | 信息交互方法、装置、设备及存储介质 |
CN115146041A (zh) * | 2022-05-27 | 2022-10-04 | 阿里巴巴(中国)有限公司 | 信息提取方法及装置 |
CN114861680B (zh) * | 2022-05-27 | 2023-07-25 | 马上消费金融股份有限公司 | 对话处理方法及装置 |
CN115759122A (zh) * | 2022-11-03 | 2023-03-07 | 支付宝(杭州)信息技术有限公司 | 一种意图识别方法、装置、设备及可读存储介质 |
CN116246629A (zh) * | 2023-02-13 | 2023-06-09 | 深圳市优必选科技股份有限公司 | 人机对话方法、装置及电子设备 |
CN116521850B (zh) * | 2023-07-04 | 2023-12-01 | 北京红棉小冰科技有限公司 | 一种基于强化学习的交互方法及装置 |
CN116975654B (zh) * | 2023-08-22 | 2024-01-05 | 腾讯科技(深圳)有限公司 | 对象互动方法、装置、电子设备及存储介质 |
CN117078270B (zh) * | 2023-10-17 | 2024-02-02 | 彩讯科技股份有限公司 | 用于网络产品营销的智能交互方法和装置 |
CN118394430A (zh) * | 2024-04-23 | 2024-07-26 | 深圳领驭科技有限公司 | 一种应用于制造业领域的智能管理方法以及设备 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885756A (zh) * | 2016-09-30 | 2018-04-06 | 华为技术有限公司 | 基于深度学习的对话方法、装置及设备 |
CN109101492A (zh) * | 2018-07-25 | 2018-12-28 | 南京瓦尔基里网络科技有限公司 | 一种自然语言处理中使用历史对话行为进行实体提取的方法及系统 |
CN109697282A (zh) * | 2017-10-20 | 2019-04-30 | 阿里巴巴集团控股有限公司 | 一种语句的用户意图识别方法和装置 |
CN110334201A (zh) * | 2019-07-18 | 2019-10-15 | 中国工商银行股份有限公司 | 一种意图识别方法、装置及系统 |
CN111428483A (zh) * | 2020-03-31 | 2020-07-17 | 华为技术有限公司 | 语音交互方法、装置和终端设备 |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107369443B (zh) * | 2017-06-29 | 2020-09-25 | 北京百度网讯科技有限公司 | 基于人工智能的对话管理方法及装置 |
CN108228764A (zh) * | 2017-12-27 | 2018-06-29 | 神思电子技术股份有限公司 | 一种单轮对话和多轮对话的融合方法 |
US10593350B2 (en) * | 2018-04-21 | 2020-03-17 | International Business Machines Corporation | Quantifying customer care utilizing emotional assessments |
CN109086329B (zh) * | 2018-06-29 | 2021-01-05 | 出门问问信息科技有限公司 | 基于话题关键词引导的进行多轮对话方法及装置 |
CN109461039A (zh) * | 2018-08-28 | 2019-03-12 | 厦门快商通信息技术有限公司 | 一种文本处理方法及智能客服方法 |
CN110162675B (zh) * | 2018-09-25 | 2023-05-02 | 腾讯科技(深圳)有限公司 | 应答语句的生成方法、装置、计算机可读介质及电子设备 |
CN109582767B (zh) * | 2018-11-21 | 2024-05-17 | 北京京东尚科信息技术有限公司 | 对话系统处理方法、装置、设备及可读存储介质 |
CN109918673B (zh) * | 2019-03-14 | 2021-08-03 | 湖北亿咖通科技有限公司 | 语义仲裁方法、装置、电子设备和计算机可读存储介质 |
CN110263330B (zh) * | 2019-05-22 | 2024-06-25 | 腾讯科技(深圳)有限公司 | 问题语句的改写方法、装置、设备和存储介质 |
CN110209791B (zh) * | 2019-06-12 | 2021-03-26 | 百融云创科技股份有限公司 | 一种多轮对话智能语音交互系统及装置 |
CN110442676A (zh) * | 2019-07-02 | 2019-11-12 | 北京邮电大学 | 基于多轮对话的专利检索方法及装置 |
CN110390108B (zh) * | 2019-07-29 | 2023-11-21 | 中国工商银行股份有限公司 | 基于深度强化学习的任务型交互方法和系统 |
CN110704596B (zh) * | 2019-09-29 | 2023-03-31 | 北京百度网讯科技有限公司 | 基于话题的对话方法、装置和电子设备 |
- 2020-03-31: CN CN202010244784.8A patent/CN111428483B/zh active Active
- 2021-03-08: WO PCT/CN2021/079479 patent/WO2021196981A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885756A (zh) * | 2016-09-30 | 2018-04-06 | 华为技术有限公司 | 基于深度学习的对话方法、装置及设备 |
CN109697282A (zh) * | 2017-10-20 | 2019-04-30 | 阿里巴巴集团控股有限公司 | 一种语句的用户意图识别方法和装置 |
CN109101492A (zh) * | 2018-07-25 | 2018-12-28 | 南京瓦尔基里网络科技有限公司 | 一种自然语言处理中使用历史对话行为进行实体提取的方法及系统 |
CN110334201A (zh) * | 2019-07-18 | 2019-10-15 | 中国工商银行股份有限公司 | 一种意图识别方法、装置及系统 |
CN111428483A (zh) * | 2020-03-31 | 2020-07-17 | 华为技术有限公司 | 语音交互方法、装置和终端设备 |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230290347A1 (en) * | 2020-11-20 | 2023-09-14 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Voice interaction method and apparatus, device and computer storage medium |
US12118992B2 (en) * | 2020-11-20 | 2024-10-15 | Beijing Baidu Netcom Science Technology Co., Ltd. | Voice interaction method and apparatus, device and computer storage medium |
CN114357950A (zh) * | 2021-12-31 | 2022-04-15 | 科大讯飞股份有限公司 | 数据改写方法、装置、存储介质及计算机设备 |
CN114739408A (zh) * | 2022-03-21 | 2022-07-12 | 深圳市优必选科技股份有限公司 | 一种语义导航方法、语义导航装置及机器人 |
WO2024008056A1 (fr) * | 2022-07-08 | 2024-01-11 | 中国疾病预防控制中心慢性非传染性疾病预防控制中心 | Système permettant d'aider un opérateur à poser des questions à un chercheur d'aide |
WO2024077878A1 (fr) * | 2022-10-13 | 2024-04-18 | 深圳市人马互动科技有限公司 | Procédé de traitement d'appel sortant vocal et appareil associé |
CN115545002A (zh) * | 2022-11-29 | 2022-12-30 | 支付宝(杭州)信息技术有限公司 | 一种模型训练和业务处理的方法、装置、存储介质及设备 |
CN115545002B (zh) * | 2022-11-29 | 2023-03-31 | 支付宝(杭州)信息技术有限公司 | 一种模型训练和业务处理的方法、装置、存储介质及设备 |
CN115579008B (zh) * | 2022-12-05 | 2023-03-31 | 广州小鹏汽车科技有限公司 | 语音交互方法、服务器及计算机可读存储介质 |
CN115579008A (zh) * | 2022-12-05 | 2023-01-06 | 广州小鹏汽车科技有限公司 | 语音交互方法、服务器及计算机可读存储介质 |
CN116052312A (zh) * | 2023-01-10 | 2023-05-02 | 广东好太太智能家居有限公司 | 智能锁的控制方法及相关设备 |
CN115934922B (zh) * | 2023-03-09 | 2024-01-30 | 杭州心识宇宙科技有限公司 | 一种对话业务执行方法、装置、存储介质及电子设备 |
CN115934922A (zh) * | 2023-03-09 | 2023-04-07 | 杭州心识宇宙科技有限公司 | 一种对话业务执行方法、装置、存储介质及电子设备 |
CN117076620A (zh) * | 2023-06-25 | 2023-11-17 | 北京百度网讯科技有限公司 | 一种对话处理方法、装置、电子设备及存储介质 |
CN117172732A (zh) * | 2023-07-31 | 2023-12-05 | 北京五八赶集信息技术有限公司 | 基于ai的招聘服务系统、方法、设备及存储介质 |
CN116933800A (zh) * | 2023-09-12 | 2023-10-24 | 深圳须弥云图空间科技有限公司 | 一种基于模版的生成式意图识别方法及装置 |
CN116933800B (zh) * | 2023-09-12 | 2024-01-05 | 深圳须弥云图空间科技有限公司 | 一种基于模版的生成式意图识别方法及装置 |
CN117421416A (zh) * | 2023-12-19 | 2024-01-19 | 数据空间研究院 | 交互检索方法、装置和电子设备 |
CN117421416B (zh) * | 2023-12-19 | 2024-03-26 | 数据空间研究院 | 交互检索方法、装置和电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN111428483B (zh) | 2022-05-24 |
CN111428483A (zh) | 2020-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021196981A1 (fr) | Procédé et appareil d'interaction vocale, et dispositif terminal | |
CN111261144B (zh) | 一种语音识别的方法、装置、终端以及存储介质 | |
KR102436293B1 (ko) | 이미지 데이터에 적어도 부분적으로 기초하여 액션을 수행하기 위한 에이전트 결정 | |
US11556698B2 (en) | Augmenting textual explanations with complete discourse trees | |
JP2020034897A (ja) | 自然言語会話に関連する情報の視覚的提示 | |
US11574634B2 (en) | Interfacing with applications via dynamically updating natural language processing | |
US11514896B2 (en) | Interfacing with applications via dynamically updating natural language processing | |
CN110825863B (zh) | 一种文本对融合方法及装置 | |
KR102701423B1 (ko) | 음성 인식을 수행하는 전자 장치 및 전자 장치의 동작 방법 | |
US11521619B2 (en) | System and method for modifying speech recognition result | |
CN113220848B (zh) | 用于人机交互的自动问答方法、装置和智能设备 | |
CN113590769B (zh) | 任务驱动型多轮对话系统中的状态追踪方法及装置 | |
US10474439B2 (en) | Systems and methods for building conversational understanding systems | |
CN113826089A (zh) | 对聊天机器人中的自然理解系统的具有到期指标的上下文反馈 | |
CN109643540A (zh) | 用于人工智能语音演进的系统和方法 | |
US12008988B2 (en) | Electronic apparatus and controlling method thereof | |
CN110325987A (zh) | 语境语音驱动深度书签 | |
CN112148836B (zh) | 多模态信息处理方法、装置、设备及存储介质 | |
US20210250438A1 (en) | Graphical User Interface for a Voice Response System | |
US9183196B1 (en) | Parsing annotator framework from external services | |
WO2019071607A1 (fr) | Procédé et dispositif de traitement d'informations vocales, et terminal | |
CN112002313B (zh) | 交互方法及装置、音箱、电子设备和存储介质 | |
CN118265981A (zh) | 用于为预训练的语言模型处置长文本的系统和技术 | |
US20210264910A1 (en) | User-driven content generation for virtual assistant | |
CN113535926A (zh) | 主动对话方法、装置及语音终端 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21778927; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21778927; Country of ref document: EP; Kind code of ref document: A1