CN118093815A - Electronic device, intention recognition method, intention recognition device, and storage medium - Google Patents

Electronic device, intention recognition method, intention recognition device, and storage medium

Info

Publication number
CN118093815A
Authority
CN
China
Prior art keywords
information
intention
candidate reply
candidate
user
Legal status
Pending
Application number
CN202410158683.7A
Other languages
Chinese (zh)
Inventor
巨荣辉
Current Assignee
Hisense Electronic Technology Wuhan Co ltd
Original Assignee
Hisense Electronic Technology Wuhan Co ltd
Application filed by Hisense Electronic Technology Wuhan Co ltd filed Critical Hisense Electronic Technology Wuhan Co ltd
Priority to CN202410158683.7A
Publication of CN118093815A

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention relate to the technical field of artificial intelligence and disclose an electronic device, an intention recognition method, an intention recognition device, and a storage medium. The electronic device includes a controller configured to: process first question information based on an intention recognition model to obtain a plurality of pieces of candidate reply information; determine a target similarity of the plurality of pieces of candidate reply information; if the target similarity is smaller than a similarity threshold, acquire at least two keywords corresponding to the plurality of pieces of candidate reply information; generate intention question information according to the at least two keywords and output the intention question information; and, in response to intention reply information input by the user for the intention question information, determine and output target reply information corresponding to the first question information. By actively asking follow-up questions, the invention can recognize user intentions in the open domain, so that when the user's intention is unclear, a targeted follow-up question accurately locates the intention and corresponding feedback is given.

Description

Electronic device, intention recognition method, intention recognition device, and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to an electronic device, an intention recognition method, an intention recognition apparatus, and a storage medium.
Background
An artificial intelligence (AI) dialogue system is an emerging form of intelligent interaction. It can perform voice and text recognition, intention recognition, sentiment analysis, and similar functions, and can respond in appropriate natural language, thereby enabling natural interaction between human and machine. Intention recognition (Intent Detection) is a key function of an AI dialogue system; an intention is the user's requirement, i.e., what the user wants to do. After understanding the user's intention, the AI dialogue system can further provide accurate feedback on the user's question or command.
However, an AI dialogue system in the related art generally cannot recognize user intentions in the open domain, and the user intention recognition methods in the related art therefore need further improvement.
Disclosure of Invention
The embodiments of the invention provide an electronic device, an intention recognition method, an intention recognition device, and a storage medium, which can recognize user intentions in the open domain, accurately locate the user's intention, and give corresponding feedback.
According to an aspect of the embodiments of the present invention, there is provided an electronic device including: a communicator configured to receive first question information input by a user; and a controller coupled to the communicator, the controller being configured to: process the first question information based on an intention recognition model to obtain a plurality of pieces of candidate reply information; determine a target similarity of the plurality of pieces of candidate reply information; if the target similarity is smaller than a similarity threshold, acquire at least two keywords corresponding to the plurality of pieces of candidate reply information; generate intention question information according to the at least two keywords and output the intention question information; and, in response to intention reply information input by the user for the intention question information, determine and output target reply information corresponding to the first question information.
In some embodiments, the controller is specifically configured to: in response to the intention reply information input by the user for the intention question information, determine second question information according to the first question information and the intention reply information; and process the second question information based on the intention recognition model to obtain and output the target reply information.
In some embodiments, the controller is specifically configured to: determine the similarity between any two pieces of candidate reply information among the plurality of pieces of candidate reply information; and determine the target similarity according to the similarities between the pairs of candidate reply information and the number of such similarities.
In some embodiments, the controller is specifically configured to: cluster the plurality of pieces of candidate reply information to obtain a plurality of candidate reply information groups, where different candidate reply information groups correspond to different categories; extract the keyword corresponding to each candidate reply information group; and generate the intention question information according to the keywords corresponding to the candidate reply information groups.
In some embodiments, each candidate reply information group includes at least one piece of candidate reply information among the plurality of pieces of candidate reply information, and the controller is specifically configured to: process each candidate reply information group based on a keyword extraction model to obtain the keyword corresponding to each candidate reply information group; and process the keywords corresponding to the candidate reply information groups based on a question information generation model to obtain the intention question information.
In some embodiments, the controller is further configured to: if the target similarity is greater than or equal to the similarity threshold, output any one piece of candidate reply information among the plurality of pieces of candidate reply information.
According to still another aspect of the embodiments of the present invention, there is provided an intention recognition method applied to an electronic device, the method including: receiving first question information input by a user; processing the first question information based on an intention recognition model to obtain a plurality of pieces of candidate reply information; determining a target similarity of the plurality of pieces of candidate reply information; if the target similarity is smaller than a similarity threshold, acquiring at least two keywords corresponding to the plurality of pieces of candidate reply information; generating intention question information according to the at least two keywords and outputting the intention question information; and, in response to intention reply information input by the user for the intention question information, determining and outputting target reply information corresponding to the first question information.
In some embodiments, determining and outputting, in response to the intention reply information input by the user for the intention question information, the target reply information corresponding to the first question information includes: in response to the intention reply information input by the user for the intention question information, determining second question information according to the first question information and the intention reply information; and processing the second question information based on the intention recognition model to obtain and output the target reply information.
In some embodiments, determining the target similarity of the plurality of pieces of candidate reply information includes: determining the similarity between any two pieces of candidate reply information among the plurality of pieces of candidate reply information; and determining the target similarity according to the similarities between the pairs of candidate reply information and the number of such similarities.
In some embodiments, acquiring the at least two keywords corresponding to the plurality of pieces of candidate reply information includes: clustering the plurality of pieces of candidate reply information to obtain a plurality of candidate reply information groups, where different candidate reply information groups correspond to different categories; extracting the keyword corresponding to each candidate reply information group; and generating the intention question information according to the keywords corresponding to the candidate reply information groups.
In some embodiments, each candidate reply information group includes at least one piece of candidate reply information among the plurality of pieces of candidate reply information. Acquiring the at least two keywords corresponding to the plurality of pieces of candidate reply information includes: processing each candidate reply information group based on a keyword extraction model to obtain the keyword corresponding to each candidate reply information group. Generating the intention question information according to the at least two keywords includes: processing the keywords corresponding to the candidate reply information groups based on a question information generation model to obtain the intention question information.
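For illustration only, the following minimal sketch shows one way the clustering, keyword extraction, and intention question generation described above could fit together. The bag-of-words embedding, the 0.7 grouping threshold, and the frequency-based keyword picker are assumptions of this sketch, not the patent's clustering algorithm, keyword extraction model, or question information generation model.

```python
from collections import Counter
import math

def embed(text):
    # Hypothetical stand-in for a real sentence embedding.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster(candidates, group_threshold=0.7):
    """Greedy grouping: a candidate reply joins the first group whose first
    member it resembles closely enough, otherwise it starts a new group."""
    groups = []
    for reply in candidates:
        for group in groups:
            if cosine(embed(reply), embed(group[0])) >= group_threshold:
                group.append(reply)
                break
        else:
            groups.append([reply])
    return groups

def extract_keywords_per_group(groups, stopwords=("a", "an", "the", "to", "for", "about")):
    # One keyword per candidate reply information group (its most frequent content word).
    keywords = []
    for group in groups:
        words = [w for w in " ".join(group).lower().split() if w not in stopwords]
        keywords.append(Counter(words).most_common(1)[0][0] if words else "")
    return keywords

def build_intent_question(keywords):
    # Combine the per-group keywords into a single clarifying (intention) question.
    return "Which of the following do you mean: " + ", ".join(k for k in keywords if k) + "?"
```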
In some embodiments, the above method further includes: if the target similarity is greater than or equal to the similarity threshold, outputting any one piece of candidate reply information among the plurality of pieces of candidate reply information.
According to still another aspect of the embodiments of the present invention, there is provided an intention recognition apparatus configured in an electronic device, the apparatus including: a receiving module for receiving first question information input by a user; a processing module for processing the first question information based on an intention recognition model to obtain a plurality of pieces of candidate reply information; a determining module for determining a target similarity of the plurality of pieces of candidate reply information; an acquisition module for acquiring, if the target similarity is smaller than a similarity threshold, at least two keywords corresponding to the plurality of pieces of candidate reply information; and a generation module for generating intention question information according to the at least two keywords and outputting the intention question information; where the determining module is further configured to determine and output, in response to intention reply information input by the user for the intention question information, target reply information corresponding to the first question information.
According to yet another aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored therein at least one executable instruction that, when executed on an electronic device, causes the electronic device to perform the operations of the intent recognition method as described above.
According to yet another aspect of the embodiments of the present invention, there is provided a computer program product which, when run on an electronic device, causes the electronic device to implement the operations of the intention recognition method described above.
According to the electronic device, the intention recognition method, the intention recognition device, and the storage medium provided by the embodiments of the invention, the first question information can be processed based on the intention recognition model to obtain a plurality of pieces of candidate reply information, and the target similarity of the plurality of pieces of candidate reply information is determined. If the target similarity is smaller than the similarity threshold, at least two keywords corresponding to the plurality of pieces of candidate reply information are acquired, intention question information is generated according to the at least two keywords, and the intention question information is output. Finally, in response to the intention reply information input by the user for the intention question information, the target reply information corresponding to the first question information is determined and output.
By applying the technical scheme of the invention, whether the intention recognition model has accurately recognized the user's intention can be judged based on the first question information input by the user; when the intention cannot be accurately recognized, intention question information is generated to further elicit the user's intention, so that the intention is accurately recognized and target reply information meeting the user's requirement is output. Compared with related-art schemes that rely on a pre-built intention database and can only recognize intentions within the domain covered by that database, this scheme can recognize intentions in the open domain without depending on an intention database, thereby expanding the application scenarios of the AI dialogue system in intention recognition and improving the user experience.
Drawings
Fig. 1 shows an interaction schematic diagram of an electronic device and a control device according to an embodiment of the present invention;
fig. 2 shows a block diagram of a configuration of a control device in an embodiment of the present invention;
Fig. 3 is a block diagram showing a hardware configuration of an electronic device according to an embodiment of the present invention;
FIG. 4 is a flowchart of an intent recognition method provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an AI dialogue system according to an embodiment of the invention;
FIGS. 6A and 6B illustrate a scene graph of intent recognition provided by an embodiment of the present invention;
FIG. 7 is a flowchart of obtaining at least two keywords corresponding to a plurality of candidate reply messages according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of keyword extraction according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of generating intent query information according to an embodiment of the present invention;
FIG. 10 shows a flowchart for determining and outputting target reply information corresponding to first question information according to an embodiment of the present invention;
FIG. 11 is a flowchart of another method for intent recognition provided by an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an intent recognition device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects and embodiments of the present invention more apparent, exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings in which they are illustrated. It is apparent that the described exemplary embodiments are only some, rather than all, of the embodiments of the present invention.
It should be noted that the brief description of the terminology in the present invention is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present invention. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms first, second, third, and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and any variations thereof herein are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
In the related art, intention recognition schemes generally work as follows: question information that users may input, together with the corresponding intentions, is first sorted out manually to construct an intention database; an intention recognition model is then trained on that database by machine learning, deep learning, or similar means, and the user's intention is recognized with the trained model. Alternatively, building on the capabilities of an AI dialogue model, the model is fine-tuned in a supervised manner with labeled intention data to identify user intentions. In short, the intention recognition methods in the related art rely on intention data collected in advance, so recognition is confined to the domain formed by that data and cannot be performed in the open domain. As a result, when the user's intention is ambiguous, the precise intention cannot be further elicited and recognized, feedback meeting the user's real requirement cannot be given, and the user experience is greatly reduced.
The embodiments of the invention provide an electronic device and an intention recognition method, and the intention recognition method can be applied to the electronic device. By actively asking follow-up questions, the technical scheme of the invention can recognize intentions in the open domain, so that when the user's intention is unclear, a targeted follow-up question accurately locates the intention and corresponding feedback is given, improving the user experience.
The intention recognition method provided by the present application is described in detail below with reference to the accompanying drawings.
The electronic device provided in the embodiment of the invention may have various implementation forms, for example, may be a mobile terminal, a tablet computer, a notebook computer, a television, an electronic whiteboard (electronic bulletin board), an electronic desktop (electronic table), and the like, and the specific form of the electronic device is not limited in the embodiment of the invention.
Fig. 1 shows an interaction schematic diagram of an electronic device and a control device according to an embodiment of the present invention. As shown in fig. 1, a user may operate the electronic apparatus 200 through the mobile terminal 300 or the control device 100. The control device 100 may be a remote controller; the remote controller and the electronic apparatus 200 may communicate through an infrared protocol or a Bluetooth protocol, and the remote controller may also control the electronic apparatus 200 through other wireless or wired means.
The user may control the electronic device 200 by inputting user instructions through keys on a remote control, voice input, a control panel, etc. For example, the user may control the electronic device 200 to switch the displayed page through up-down keys on the remote controller, control the video played by the electronic device 200 to play or pause through play pause keys, and input a voice command through a voice input key to control the electronic device 200 to perform a corresponding operation.
In some embodiments, the user may also control the electronic device 200 using a mobile terminal, tablet, computer, notebook, and other smart device. For example, a user may control the electronic device 200 through an application installed on the smart device that, by configuration, may provide the user with various controls in an intuitive user interface on a screen associated with the smart device.
In some embodiments, the mobile terminal 300 may implement connection communication with a software application installed on the electronic device 200 through a network communication protocol for the purpose of one-to-one control operation and data communication. For example, a control command protocol may be established between the mobile terminal 300 and the electronic device 200, a remote control keyboard may be synchronized with the mobile terminal 300, a function of controlling the electronic device 200 may be implemented by controlling a user interface on the mobile terminal 300, or a function of transmitting content displayed on the mobile terminal 300 to the electronic device 200, and a synchronous display may be implemented.
As shown in fig. 1, the electronic device 200 and the server 400 may exchange data via a variety of communication means, which may allow the electronic device 200 to be communicatively coupled through a local area network (Local Area Network, LAN), a wireless local area network (Wireless Local Area Network, WLAN), and other networks. The server 400 may provide various content and interactions to the electronic device 200. For example, the electronic device 200 may receive software program updates, or access a remotely stored digital media library, by sending and receiving messages and through electronic program guide (Electronic Program Guide, EPG) interactions. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.
The electronic device 200 may be a liquid crystal display, an Organic Light-Emitting Diode (OLED) display, a projection electronic device, a smart terminal, such as a mobile phone, a tablet computer, a smart television, a laser projection device, an electronic desktop (electronic table), etc. The specific electronic device type, size, resolution, etc. are not limited.
Fig. 2 shows a block diagram of a configuration of the control device 100 in an exemplary embodiment of the present invention, and as shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an operation instruction input by a user, and convert the operation instruction into an instruction recognizable and responsive to the electronic device 200, and may perform an interaction between the user and the electronic device 200.
Taking an electronic device as an example, fig. 3 shows a hardware configuration block diagram of an electronic device 200 according to an embodiment of the present invention. As shown in fig. 3, the electronic device 200 includes: a modem 210, a receiver 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, and at least one of a memory, a power supply, and a user interface.
The modem 210 may receive broadcast television signals through a wired or wireless reception manner and demodulate an audio/video signal, such as an EPG data signal, from a plurality of wireless or wired broadcast television signals. The detector 230 may be used to collect signals of the external environment or interaction with the outside.
In some embodiments, the frequency point demodulated by the modem 210 is controlled by the controller 250, and the controller 250 may issue a control signal according to the user selection, so that the modem responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried by the frequency.
The broadcast television signal may be classified into a terrestrial broadcast signal, a cable broadcast signal, a satellite broadcast signal, an internet broadcast signal, or the like according to different broadcasting systems of the television signal. Or may be differentiated into digital modulation signals, analog modulation signals, etc., depending on the type of modulation. And further, the signals are classified into digital signals, analog signals and the like according to different signal types.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, the receiver 220 may be a component for communicating with an external device or external server according to various communication protocol types. For example: the receiver may include at least one of a Wifi chip, a bluetooth communication protocol chip, a wired ethernet communication protocol chip, or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver.
In some embodiments, the detector 230 may be used to collect signals of or interact with the external environment, may include an optical receiver and a temperature sensor, etc.
The light receiver is a sensor for acquiring the intensity of the ambient light, so that display parameters and the like can be adjusted adaptively according to the ambient light intensity; the temperature sensor may be used to sense the ambient temperature, so that the electronic device 200 can adaptively adjust the display color temperature of the image, for example shifting the displayed image toward a colder color temperature when the ambient temperature is high, or toward a warmer color temperature when the ambient temperature is low.
In some embodiments, the detector 230 may further include an image collector, such as a camera, a video camera, etc., which may be used to collect external environmental scenes, collect attributes of a user or interact with a user, adaptively change display parameters, and recognize a user gesture to realize an interaction function with the user.
In some embodiments, the detector 230 may also include a sound collector or the like, such as a microphone, that may be used to receive the user's sound. For example, a voice signal including control instructions for a user to control the electronic device 200, or collecting ambient sound for identifying an ambient scene type, so that the electronic device 200 can adapt to ambient noise.
In some embodiments, external device interface 240 may include, but is not limited to, the following: any one or more interfaces such as a high-definition multimedia interface (High Definition Multimedia Interface, HDMI), an analog or data high-definition component input interface, a composite video input interface, a universal serial bus (Universal Serial Bus, USB) input interface, an RGB port, or the like, or an input/output interface in which the plurality of interfaces form a composite can be used.
As shown in fig. 3, the controller 250 may include at least one of a central processor, a video processor, an audio processor, a graphic processor, a random access Memory (Random Access Memory, RAM), a Read-Only Memory (ROM), and a first interface to an nth interface for input/output. Wherein the communication bus connects the various components.
In some embodiments, the controller 250 may control the operation of the electronic device and respond to user operations through various software control programs stored on an external memory. For example, a user may input a user command through a graphical user interface (Graphic User Interface, GUI) displayed on the display 260, the user input interface receives the user input command through the graphical user interface, or the user may input the user command by inputting a specific sound or gesture, the user input interface recognizes the sound or gesture through the sensor, and receives the user input command.
A "user interface" is a medium for interaction and information exchange between an application or operating system and a user; it converts between an internal form of information and a form acceptable to the user. A commonly used presentation form of a user interface is the graphical user interface, i.e., a user interface related to computer operations that is displayed graphically. Its controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
In some embodiments, RAM may be used to store temporary data for the operating system or other running programs; ROM may be used to store instructions for various system start-ups, for example instructions for starting the basic input/output system (Basic Input Output System, BIOS). ROM can be used to complete the power-on self-test of the system, the initialization of each functional module in the system, the loading of the system's basic input/output drivers, and the booting of the operating system.
In some embodiments, upon receipt of a power-on signal, the electronic device 200 starts to boot: the central processor runs the system boot instructions in ROM and copies the temporary data of the operating system stored in memory into RAM so that the operating system can be started or run. After the operating system has started, the central processor copies the temporary data of the various application programs in memory into RAM so that those application programs can then be started or run conveniently.
In some embodiments, the central processor may be configured to execute operating system and application instructions stored in memory, and to execute various applications, data, and content in accordance with various interactive instructions received from external inputs, to ultimately display and play various audio-visual content.
In some example embodiments, the central processor may include a plurality of processors. The plurality of processors may include one main processor and one or more sub-processors. A main processor for performing some operations of the electronic device 200 in a pre-power-up mode and/or displaying a screen in a normal mode. One or more sub-processors for one operation in a standby mode or the like.
In some embodiments, the video processor may be configured to receive an external video signal, perform video processing in accordance with standard codec protocols for input signals, decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, transparency settings, image composition, etc., and may result in a signal that is directly displayable or playable on the electronic device 200.
In some embodiments, the video processor may include a demultiplexing module, a video decoding module, an image compositing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is used for demultiplexing the input audio/video data stream, such as input moving picture expert group standard 2 (Moving Picture Experts Group-2, MPEG-2), and demultiplexes the input audio/video data stream into video signals, audio signals and the like; the video decoding module is used for processing the demultiplexed video signal, including decoding and scaling, transparency setting, etc.
The image synthesis module, such as an image synthesizer, superimposes and mixes the GUI signal that is input by the user or generated by the graphics generator with the scaled video image, so as to generate an image signal for display. The frame rate conversion module converts the input video frame rate, for example from a 60 Hz frame rate to a 120 Hz or 240 Hz frame rate, usually by frame interpolation. The display formatting module converts the frame-rate-converted video into a video output signal conforming to the display format, for example an RGB data signal.
In some embodiments, the audio processor may be configured to receive an external audio signal, decompress and decode the audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain a sound signal that may be played in a speaker.
In some embodiments, the video processor may comprise one or more chips. The audio processor may also comprise one or more chips. Meanwhile, the video processor and the audio processor may be a single chip, or may be integrated with the controller in one or more chips.
In some embodiments, the input/output interface may be used for audio output, that is, it receives the sound signal output by the audio processor under the control of the controller 250 and outputs the sound signal to a speaker. Besides the speaker carried by the electronic device 200 itself, the sound signal may also be output to a sound output terminal of an external device, for example an external audio interface or an earphone interface. The audio output may also go through a near field communication module in the communication interface, for example a Bluetooth module that outputs sound through a speaker connected to it.
In some embodiments, the graphics processor may be used to generate various graphical objects, such as icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor may include an arithmetic unit that receives the various interactive instructions input by the user, performs operations, and displays various objects according to their display attributes, and a renderer that renders the objects obtained by the arithmetic unit so that they can be displayed on a display.
In some embodiments, the graphics processor and the video processor may be integrated or configured separately. In the integrated configuration they jointly process the graphics signals output to the display; in the separate configuration they perform different functions, for example in a graphics processing unit (Graphics Processing Unit, GPU) plus frame rate conversion (Frame Rate Conversion, FRC) architecture.
The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the display 260 may be used to display a user interface, such as may be used to display an interface corresponding to an electronic device, for example, the display interface may be a channel search interface in an electronic device, or may also be a display interface of some application program, etc.
In some embodiments, the display 260 may be used to receive audio and video signals output by the audio processor and video processor, display video content and images, play audio of the video content, and display components of a menu manipulation interface.
In some embodiments, the display 260 may be used to present a user-operated UI interface generated in the electronic device 200 and used to control the electronic device 200.
In some embodiments, the electronic device 200 may establish control signal and data signal transmission and reception between the receiver 220 and the control apparatus 100 or the content providing device.
In some embodiments, the memory may include storage of various software modules for driving the electronic device 200. Such as: various software modules stored in the first memory, including: at least one of a basic module, a detection module, a communication module, a display control module, a browser module, various service modules and the like.
The base module is a bottom-layer software module for signal communication among the hardware of the electronic device 200 and for sending processing and control signals to the upper-layer modules. The detection module is used for collecting various information from the sensors or user input interfaces, and for performing digital-to-analog conversion, analysis, and management.
The display control module can be used for controlling the display to display the image content and can be used for playing the multimedia image content, the UI interface and other information. And the communication module can be used for carrying out control and data communication with external equipment. And the browser module can be used for executing data communication between the browsing servers. And the service module is used for providing various services and various application programs. Meanwhile, the memory may also store images of various items in various user interfaces, visual effect patterns of the focus object, and the like, which receive external data and user data.
In some embodiments, the user interface may be used to receive control device 100, such as: an infrared control signal transmitted by an infrared remote controller, etc.
The power supply may supply power to the electronic device 200 through power input from an external power source under the control of the controller 250.
In some embodiments, the electronic device 200 may receive a query instruction input by a user through the receiver 220. For example, when the receiver 220 is a touch component, the touch component may together form a touch screen with the display 260. On the touch screen, a user can input different control instructions through touch operation, for example, the user can input touch instructions such as clicking, sliding, long pressing, double clicking and the like, and different touch instructions can represent different control functions.
To implement the different touch actions, the touch assembly may generate different electrical signals when the user inputs the different touch actions, and transmit the generated electrical signals to the controller 250. The controller 250 may perform feature extraction on the received electrical signal to determine a control function to be performed by the user based on the extracted features.
For example, when a user inputs a click touch action at a search location in the display interface, the touch component will sense the touch action to generate an electrical signal. After receiving the electrical signal, the controller 250 may determine the duration of the level corresponding to the touch action in the electrical signal, and recognize that the user inputs the click command when the duration is less than the preset time threshold. The controller 250 then extracts the location features generated by the electrical signals to determine the touch location. When the touch position is within the search position range, it is determined that the user has input a click touch instruction at the search position. Then, the controller 250 may start a media search function and receive a search instruction input by the user, such as a search keyword, a voice search instruction, etc.
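For illustration only, a minimal sketch of the duration-and-position check described above; the 300 ms threshold, the signal fields, and the rectangular search area are assumptions of this sketch rather than values taken from the patent.

```python
CLICK_MAX_DURATION_MS = 300  # stands in for the "preset time threshold"

def classify_touch(duration_ms, position):
    """Classify a touch by how long the corresponding electrical level lasted."""
    gesture = "click" if duration_ms < CLICK_MAX_DURATION_MS else "long_press"
    return gesture, position

def handle_touch(duration_ms, position, search_area):
    # search_area is assumed to be a rectangle (x0, y0, x1, y1).
    gesture, (x, y) = classify_touch(duration_ms, position)
    x0, y0, x1, y1 = search_area
    if gesture == "click" and x0 <= x <= x1 and y0 <= y <= y1:
        return "start_media_search"  # the click landed in the search position range
    return "ignore"
```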
In some embodiments, the user may trigger the query operation through a specific gesture operation on the touch screen, for example, when the user performs two continuous double-click operations on the display interface, the controller 250 may determine an interval time between two continuous double-clicks, and when the interval time is less than a preset time threshold, recognize that the user inputs the continuous double-click operation, and determine that the user triggers the media resource search operation.
In some embodiments, a user may enter voice instructions on a touch screen via a touch operation, such as a user may trigger a voice query operation on display 260 via a voice-triggered gesture.
In some embodiments, the receiver 220 may also be an external control component, such as a mouse, remote control, or the like, that establishes a communication connection with the electronic device. When the user performs different control operations on the external control component, the external control component may generate different control signals in response to the control operations of the user and transmit the generated control signals to the controller 250. The controller 250 may perform feature extraction on the received control signal to determine a control function to be performed by the user according to the extracted features.
For example, when a user clicks the left mouse button at any position in the channel display interface through the external control component, the external control component senses the control action and generates a control signal. After receiving the control signal, the controller 250 may determine, according to the control signal, the dwell time of the action at that position, and identify that a click command has been input through the external control component when the dwell time is less than the preset time threshold. In the current scenario, the click command is used to trigger the input of a query instruction or to switch the media resource page.
For another example, when the user presses a voice key on the remote control, the remote control may initiate a voice entry function, and during the process of the user entering a voice command, the remote control may synchronize the voice command to the display 260, at which time the display 260 may display a voice entry identifier to indicate that the user is entering a voice command.
In some embodiments, the receiver 220 may also be a control component coupled to the display 260, such as a desktop computer, for example, and the control component may be a keyboard coupled to the display. The user can input different control instructions, such as media information switching instructions, inquiry instructions and the like through the keyboard.
Illustratively, the user may input a click command, a voice command, etc. through the corresponding shortcut key. For example, the user may trigger the sliding operation by selecting the "Tab" key and the direction key, that is, when the user selects the "Tab" key and the direction key on the keyboard at the same time, the controller 250 may receive the key signal, determine that the user triggers the operation of performing the switching operation in the direction corresponding to the direction key, and then, the controller 250 may control to turn or scroll the display interface in the media presentation page to display the corresponding media options.
Correspondingly, the user can also input voice instructions through corresponding shortcut keys. For example, when the user selects the "Ctrl" key and the "V" key, the controller 250 may receive a key signal to determine that the user triggers a voice search operation, and then the controller 250 may receive a voice command input by the user and control the display 260 to perform a corresponding operation, such as displaying a query result page corresponding to the voice command, according to the voice command.
In order to facilitate the detailed description of the intention recognition method provided by the embodiment of the present invention, fig. 4 shows a flowchart of an intention recognition method provided by the embodiment of the present invention, and the method may be applied to the electronic device 200 shown in fig. 1.
Among other things, the electronic device 200 may include a display 260, a receiver 220, and a controller 250 coupled to the display 260 and the receiver 220, respectively.
In some embodiments, the display 260 may be used to display a user interface, which is the interface through which the user interacts with the electronic device 200; the user may send instructions to the electronic device 200 through control operations, such as touch operations and gesture operations, to accomplish a certain task. A well-designed interaction flow in the user interface allows the user to complete tasks easily.
In some embodiments, the receiver 220 may be configured to receive first question information input by a user, and the controller 250 may process the first question information based on the intent recognition model to obtain a plurality of candidate reply information.
According to the intention recognition method provided by the embodiment of the invention, the electronic device processes the first question information based on the intention recognition model to obtain a plurality of pieces of candidate reply information and determines the target similarity of the plurality of pieces of candidate reply information. If the target similarity is smaller than the similarity threshold, at least two keywords corresponding to the plurality of pieces of candidate reply information are acquired, intention question information is generated according to the at least two keywords, and the intention question information is output. Finally, in response to the intention reply information input by the user for the intention question information, the target reply information corresponding to the first question information is determined and output.
By applying the technical scheme of the invention, whether the intention recognition model has accurately recognized the user's intention can be judged based on the first question information input by the user; when the intention cannot be accurately recognized, intention question information is generated to further elicit the user's intention, so that the intention is accurately recognized and target reply information meeting the user's requirement is output. Compared with related-art schemes that rely on a pre-built intention database and can only recognize intentions within the domain covered by that database, this scheme can recognize intentions in the open domain without depending on an intention database, thereby expanding the application scenarios of the AI dialogue system in intention recognition and improving the user experience.
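For illustration only, the overall control flow just described can be sketched as follows. The callables passed in are hypothetical stand-ins for the intention recognition model, the similarity computation, the question information generation, and the user interaction, and the 0.6 threshold and four samples are assumed values, not figures taken from the patent.

```python
def handle_first_question(question,
                          generate,                      # intention recognition model (hypothetical)
                          average_pairwise_similarity,   # similarity step (hypothetical)
                          make_intent_question,          # question information generation (hypothetical)
                          ask_user,                      # user interaction (hypothetical)
                          similarity_threshold=0.6,
                          n_samples=4):
    # Sample the intention recognition model several times for the same
    # first question information to obtain multiple candidate replies.
    candidates = [generate(question) for _ in range(n_samples)]

    # Measure how much the candidate replies agree with each other.
    target_similarity = average_pairwise_similarity(candidates)

    # Candidates agree: the intention is clear, so output any candidate reply.
    if target_similarity >= similarity_threshold:
        return candidates[0]

    # Candidates diverge: the intention is ambiguous, so ask back.
    intent_question = make_intent_question(candidates)  # intention question information
    intent_reply = ask_user(intent_question)             # intention reply information
    second_question = question + "\n" + intent_reply     # second question information
    return generate(second_question)                      # target reply information
```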
Referring to a schematic structure of an AI dialog system shown in fig. 5, the intention recognition method provided by the present invention may be applied to the AI dialog system 510 shown in fig. 5.
For example, as shown in fig. 5, an AI dialog system 510 may be established in an electronic device, and an application layer, an interface layer, and a functional layer may be included in the AI dialog system 510.
The application layer is used for displaying a user interface, so that a user can input related text question information or voice question information (such as first question information) in the user interface and view reply information (such as intention question information and target reply information) output by the AI dialogue system in the user interface.
It will be appreciated that the first question information (prompt) refers to an instruction or query that indicates the user's intent and needs, for example: "Please recommend a smartphone", "How is the weather today?", or "You are a medical expert; please explain the causes and treatment of myocardial infarction". The form of the first question information may include a voice form and a text form, and the form of the first question information is not limited in this embodiment.
The function layer is used for realizing functions of similarity calculation, intention recognition, question information generation and the like through a related model and an algorithm, wherein the related model and the algorithm can comprise a similarity matching model, a similarity algorithm, an intention recognition model, a keyword extraction model, a question information generation model, a clustering algorithm and the like.
The interface layer is used for defining related interfaces so as to realize information transfer between the application layer and the functional layer. For example, after a user inputs text question information in a user interface of an application layer, an interface layer receives the text question information and inputs the text question information to an intention recognition model in a function layer, so that the intention recognition model carries out relevant processing on the text question information.
In some embodiments, the user may change the degree of freedom of the intent recognition model by adjusting its temperature coefficient. The degree of freedom of the intent recognition model refers to the similarity between the multiple replies that the model outputs for the same piece of question information input by the user, i.e., the divergence of the model's output. It can be understood that the higher the degree of freedom of the intention recognition model, the lower the similarity between the replies output for the same piece of question information; and the lower the degree of freedom, the higher the similarity between those replies.
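For illustration only, the toy sampler below shows how a temperature coefficient controls the divergence of sampled outputs; it is not the intention recognition model itself, and the logits are assumed inputs.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# As the temperature approaches 0, the distribution sharpens and repeated samples
# become near-identical (low degree of freedom); as the temperature grows, the
# distribution flattens and the sampled replies diverge more (high degree of freedom).
```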
Note that the structure of the AI dialog system 510 shown in fig. 5 is merely an example, and this is not a limitation in the present embodiment.
As shown in fig. 4, the controller 250 is configured to perform the following steps S410 to S460:
s410: first question information input by a user is received.
Referring to fig. 6A and 6B, a scene graph of intent recognition is shown.
In an exemplary embodiment, as shown in fig. 6A and 6B, a user may input first question information in the user interface 601 of the AI dialog system 510, after which the interface layer of the AI dialog system 510 may receive the first question information through the relevant interface.
For example, as shown in fig. 6A, the first question information input by the user may be first question information 602: "please help me write a lecture about basketball game"; alternatively, as shown in fig. 6B, the first question information input by the user may be first question information 602': "please help me write a lecture," etc. Thereafter, the interface layer of the AI dialog system 510 may receive the first question information 602 or the first question information 602' via the relevant interface.
S420: and processing the first question information based on the intention recognition model to obtain a plurality of candidate reply information.
In an exemplary embodiment, with continued reference to fig. 6A and 6B, after determining the degrees of freedom of the intent recognition model, the interface layer may send the above-described first question information 602 or first question information 602' to the intent recognition model 603 in the functional layer through the relevant interface.
Next, the AI dialog system 510 may input the first question information 602 or the first question information 602' to the intent recognition model 603 a plurality of times, so that the intent recognition model 603 processes the first question information 602 or the first question information 602', and obtains candidate reply information generated after the intent recognition model 603 processes the first question information 602 or the first question information 602' each time, thereby finally obtaining a plurality of candidate reply information generated by the intent recognition model 603.
The number of times the AI dialog system 510 inputs the first question information into the intent recognition model 603 may be set according to actual situations, which is not limited in this embodiment.
For example, the plurality of candidate reply messages may include: candidate reply information a, candidate reply information B, candidate reply information C, candidate reply information D, and so on corresponding to the first question information 602; or candidate reply information a ', candidate reply information B ', candidate reply information C ', candidate reply information D ', etc. corresponding to the first question information 602 '.
It will be appreciated that the process of obtaining the plurality of candidate reply messages output by the intent recognition model 603 is performed in the background of the AI dialog system 510, that is, the plurality of candidate reply messages are not yet output to the user interface 601 through the interface layer.
S430: based on the plurality of candidate reply messages, a target similarity of the plurality of candidate reply messages is determined.
In an exemplary embodiment, reference is continued to fig. 6A and 6B. After obtaining the plurality of candidate reply messages, the AI dialog system 510 may input the plurality of candidate reply messages into the similarity matching model 604 to determine target similarities for the plurality of candidate reply messages based on the similarity matching model 604; or the AI dialog system 510 may determine the target similarity for the plurality of candidate reply messages based on a similarity algorithm.
The similarity matching model 604 may include, among others, a Cross-Encoder model, a Bi-Encoder model, a Late Interaction model, an Attention-based Aggregator model, and so on.
The similarity algorithm may include distance-based similarity measures, for example the Euclidean distance (Euclidean Distance) and the Manhattan distance (Manhattan Distance); angle-based measures, such as the cosine similarity (Cosine) and the Tanimoto coefficient (Tanimoto Coefficient); the Jaccard similarity coefficient (Jaccard Similarity Coefficient); and the like.
For example, the AI dialog system 510 may first perform a vectorization process on the plurality of candidate reply messages, thereby processing the plurality of candidate reply messages based on the similarity matching model 604 or the similarity algorithm. In the process of processing the vectorized plurality of candidate reply messages, a similarity between any two candidate reply messages of the plurality of candidate reply messages may be first determined. And then determining the target similarity corresponding to the plurality of candidate reply messages according to the similarity between any two candidate reply messages and the number of the similarities between any two candidate reply messages.
The similarity between any two candidate reply messages may be the similarity between specific contents of any two candidate reply messages; the target similarity refers to average similarity corresponding to the candidate reply messages.
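For illustration only, a minimal sketch of the pairwise-average computation described above; the similarity function passed in stands for whichever similarity matching model or similarity algorithm is actually applied to the vectorized candidate replies.

```python
from itertools import combinations

def average_pairwise_similarity(vectors, similarity):
    # One score per unordered pair of candidate replies: n * (n - 1) / 2 scores in total.
    pair_scores = [similarity(u, v) for u, v in combinations(vectors, 2)]
    # The target similarity is the mean over all pairwise similarities.
    return sum(pair_scores) / len(pair_scores)
```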
For example, assume that, after the candidate reply information A, B, C, and D are processed based on the similarity matching model 604 or the similarity algorithm, the pairwise similarities x are as follows: x1 = 0.72 between candidate reply information A and B, x2 = 0.81 between A and C, x3 = 0.70 between A and D, x4 = 0.75 between B and C, x5 = 0.88 between B and D, and x6 = 0.83 between C and D.
Then, according to the pairwise similarities (x1, x2, x3, x4, x5, x6) and their number (n = 6), the target similarity corresponding to candidate reply information A, B, C, and D is: (x1 + x2 + x3 + x4 + x5 + x6)/n = (0.72 + 0.81 + 0.70 + 0.75 + 0.88 + 0.83)/6 ≈ 0.78.
For another example, assume that, after the candidate reply information A', B', C', and D' are processed based on the similarity matching model 604 or the similarity algorithm, the pairwise similarities x are as follows: x1 = 0.62 between candidate reply information A' and B', x2 = 0.51 between A' and C', x3 = 0.37 between A' and D', x4 = 0.48 between B' and C', x5 = 0.55 between B' and D', and x6 = 0.34 between C' and D'.
Then, according to the pairwise similarities (x1, x2, x3, x4, x5, x6) and their number (n = 6), the target similarity corresponding to candidate reply information A', B', C', and D' is: (x1 + x2 + x3 + x4 + x5 + x6)/n = (0.62 + 0.51 + 0.37 + 0.48 + 0.55 + 0.34)/6 ≈ 0.48.
It should be noted that, the above method for determining the target similarity of the plurality of candidate reply messages is merely an example, and the present embodiment is not limited thereto.
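To make the computation above concrete, the following is a minimal sketch in Python, assuming the candidate reply messages have already been vectorized and using cosine similarity as the pairwise measure; the function names and the choice of cosine similarity are illustrative assumptions rather than the only possible implementation.

```python
from itertools import combinations
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Angle cosine between two vectorized candidate replies.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def target_similarity(reply_vectors: list) -> float:
    # Average the similarity over every pair of candidate replies.
    pair_scores = [cosine_similarity(u, v) for u, v in combinations(reply_vectors, 2)]
    return sum(pair_scores) / len(pair_scores)

# The worked example above, using the pairwise scores directly:
pairwise = [0.62, 0.51, 0.37, 0.48, 0.55, 0.34]
print(round(sum(pairwise) / len(pairwise), 2))  # 0.48, below a 0.75 threshold
```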
In an exemplary embodiment, the AI dialog system 510 may next compare the target similarity with a preset similarity threshold. If the target similarity is smaller than the similarity threshold, it indicates that the intention of the first question information input by the user is not clear, so that a large difference exists among the plurality of candidate reply information generated by the intention recognition model 603 according to the first question information; if the target similarity is greater than or equal to the similarity threshold, it indicates that the intention of the first question information input by the user is clear, so that the difference among the plurality of candidate reply information generated by the intention recognition model 603 according to the first question information is small.
For example, for the first question information 602 "please help me write a lecture about basketball game", the intention expressed by the question information is clear: the user has described the theme or application scenario of the lecture as "basketball game". Thus, the intent recognition model 603 may output a plurality of candidate reply messages that are all lectures about a basketball game, so that the difference between the contents of these candidate reply messages may be small.
For the first question information 602' "please help me write a lecture", the intention expressed by the question information is not clear, because the user has not described the theme or application scenario of the lecture. Thus, the intent recognition model 603 may output a plurality of candidate reply messages having different subjects or application scenes, for example, a lecture about a basketball game, a lecture for a seminar, a lecture about a football game, a lecture for winning a prize, and the like, so that the difference between the contents of these candidate reply messages may be large.
S440: if the target similarity is greater than or equal to a preset threshold, outputting any one candidate reply message of the plurality of candidate reply messages.
In an exemplary embodiment, reference is continued to FIG. 6A. In the case that the target similarity is greater than or equal to the similarity threshold, the AI dialog system 510 may select any one candidate reply message 605 from the plurality of candidate reply messages corresponding to the first question information 602, and output the candidate reply message 605 to the user interface 601 through the relevant interface of the interface layer.
S440': and if the target similarity is smaller than the similarity threshold, acquiring at least two keywords corresponding to the candidate reply messages.
In an exemplary embodiment, if the target similarity is less than the similarity threshold, the AI dialog system 510 can further ascertain the intent of the user by actively asking the user a question.
With continued reference to fig. 6B, following the example above, the target similarity of the plurality of candidate reply messages (e.g., candidate reply information A', B', C', and D') corresponding to the first question information 602' is 0.48; assuming the similarity threshold is 0.75, the target similarity is less than the similarity threshold.
In some embodiments, referring to fig. 7, the controller 250 may obtain at least two keywords corresponding to the plurality of candidate reply messages by:
s710: and clustering the candidate reply messages to obtain a plurality of candidate reply message groups.
Wherein each candidate reply information group of the plurality of candidate reply information groups corresponds to a different candidate reply information category.
In an exemplary embodiment, if the target similarity is less than the similarity threshold, the AI dialog system 510 may cluster the plurality of candidate reply messages and obtain a plurality of candidate reply message groups. It is understood that each candidate reply message group includes at least one candidate reply message; each candidate reply message group corresponds to a different candidate reply message category, i.e., the contents of candidate reply messages included in different candidate reply message groups belong to different categories.
The clustering algorithm may be a traditional clustering algorithm such as K-means or hierarchical clustering, or a deep-learning-based clustering algorithm such as DeepCluster; the clustering algorithm is not limited in this embodiment.
For example, regarding the candidate reply information a ', the candidate reply information B', the candidate reply information C ', and the candidate reply information D' corresponding to the first question information 602, it is assumed that the contents of the candidate reply information a 'and the candidate reply information C' are lectures regarding basketball game, the contents of the candidate reply information B 'are lectures regarding seminar, and the contents of the candidate reply information D' are lectures regarding winning winnings. Obviously, the candidate reply message a 'and the candidate reply message C' belong to the same category, the candidate reply message B 'is one category, and the candidate reply message D' is another category.
Therefore, after the 4 candidate reply messages are clustered based on a clustering algorithm, the obtained multiple candidate reply message groups are respectively: candidate reply information set 1 (including: candidate reply information a 'and candidate reply information C'), candidate reply information set 2 (including: candidate reply information B '), and candidate reply information set 3 (including: candidate reply information D').
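As one concrete way to perform the clustering in S710, the sketch below groups vectorized candidate replies with K-means from scikit-learn; the use of K-means, the toy vectors, and the number of groups are assumptions for illustration, not a required part of the method.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_candidate_replies(reply_vectors: np.ndarray, n_groups: int) -> dict:
    # Cluster the vectorized candidate replies into n_groups categories.
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(reply_vectors)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(int(label), []).append(idx)  # reply indices per group
    return groups

# Four toy reply vectors; the first and third are close and usually land in one group,
# mirroring candidate reply information A' and C' above.
vectors = np.array([[0.9, 0.1], [0.1, 0.9], [0.85, 0.15], [0.5, 0.5]])
print(group_candidate_replies(vectors, n_groups=3))
```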
S720: and extracting keywords corresponding to each candidate reply information group.
In an exemplary embodiment, reference is continued to FIG. 6B. Next, the AI dialog system 510 may perform keyword extraction processing on the plurality of candidate reply information groups based on the keyword extraction model 606, and obtain keywords corresponding to each candidate reply information group. It will be appreciated that the keywords are used to represent representative information corresponding to each candidate reply information group, i.e., to distinguish between different candidate reply information groups.
The keyword extraction model 606 may be obtained by training a sequence-to-sequence language model (Sequence to Sequence Language Model, seq-to-Seq LM), and the like, and in this embodiment, the type of the keyword extraction model 606 is not limited.
Reference is made to a schematic diagram of extracting keywords as shown in fig. 8.
For example, as shown in fig. 8, the AI dialog system 510 may sequentially input a plurality of candidate reply information groups (e.g., candidate reply information group 1, candidate reply information group 2, ..., candidate reply information group m) into the keyword extraction model 606, so as to extract keywords from the candidate reply information in each group (e.g., s11, s12, ..., s1n; s21, s22, ..., s2n; ...; sm1, sm2, ..., smn) based on the keyword extraction model 606, and finally obtain the keyword corresponding to each candidate reply information group (e.g., k1, k2, ..., km).
For example, for the candidate reply information a ', the candidate reply information B', the candidate reply information C ', and the candidate reply information D' corresponding to the first question information 602, it is assumed that after the keyword extraction is performed on the candidate reply information a '(i.e. s 11) and the candidate reply information C' (i.e. s 12) in the candidate reply information group 1 based on the keyword extraction model 606, the obtained keywords are: k1 = "basketball game"; after extracting the keyword of the candidate reply information B' (i.e. s 21) in the candidate reply information group 2, the obtained keyword is: k2 = "seminar"; after extracting the keywords of the candidate reply information D' (i.e. s 31) in the candidate reply information group 3, the obtained keywords are: k3 = "winning prize".
In some embodiments, when keyword extraction is performed on the candidate reply information in the plurality of candidate reply information groups, keywords may also be extracted based on Named Entity Recognition (NER) or keyword matching, in addition to models such as the Seq-to-Seq LM; or keywords may be obtained by inputting question information into the intent recognition model 603, for example: "Please extract the keyword corresponding to each candidate reply information group" (where the candidate reply information in each group is provided as input), and so on. In this embodiment, the manner of extracting the keyword corresponding to each candidate reply information group is not limited.
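As a simple stand-in for the keyword extraction model 606, the sketch below picks one representative term per candidate reply information group using a TF-IDF ranking; the TF-IDF substitute and the toy group contents are assumptions made only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def keyword_per_group(groups: dict) -> dict:
    # One "document" per group: concatenate the candidate replies it contains.
    group_ids = list(groups)
    docs = [" ".join(groups[g]) for g in group_ids]
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(docs)
    vocab = tfidf.get_feature_names_out()
    # The highest-weighted term of each group stands in for k1, k2, ..., km.
    return {g: vocab[matrix[row].toarray().ravel().argmax()]
            for row, g in enumerate(group_ids)}

groups = {
    1: ["a lecture for a basketball game", "an opening speech for the basketball final"],
    2: ["a lecture for an academic seminar"],
    3: ["an acceptance speech for winning a prize"],
}
print(keyword_per_group(groups))
```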
S450: and generating intention questioning information according to at least two keywords, and outputting the intention questioning information.
In an exemplary embodiment, continued reference is made to FIG. 6B. Thereafter, the AI dialog system 510 may process the keywords corresponding to each candidate reply information group based on the question information generation model 607 to obtain the intent question information 608.
The question information generation model 607 may be obtained by training a sequence-to-sequence language model (Sequence to Sequence Language Model, Seq-to-Seq LM) or the like; in this embodiment, the type of the question information generation model 607 is not limited.
Reference is made to a schematic diagram of generating intent query information shown in fig. 9.
For example, as shown in FIG. 9, the AI dialog system 510 may combine the keywords (e.g., k1, k2, ..., km) corresponding to each candidate reply information group, using a symbol such as "|" to separate and distinguish the different keywords. The combined keywords are then input into the question information generation model 607, so that the question information generation model 607 generates the intention question information 608 based on the keywords.
For example, for the keywords k1 = "basketball game", k2 = "seminar", and k3 = "winning a prize", the AI dialog system 510 may combine these keywords to obtain the keyword set Q = {basketball game | seminar | winning a prize}. The keyword set Q is then input into the question information generation model 607 to obtain the intention question information 608 output by the model, and the intention question information 608 may be: "Do you want to write a lecture about a basketball game, a lecture for a seminar, or a lecture about winning a prize?"
In some embodiments, in addition to models such as the Seq-to-Seq LM, the intent question information 608 may also be generated from the keywords corresponding to each candidate reply information group by filling the different keywords into a pre-written question information template or rule; or the intent question information 608 may be obtained by inputting question information into the intent recognition model 603, for example: "Please generate an intent question based on these keywords (e.g., basketball game, seminar, winning a prize)", and so on. The manner of determining the intention question information 608 is not limited in this embodiment.
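For the template-based alternative mentioned above, a minimal sketch could look like the following; the template wording and the helper name are illustrative assumptions and are not the trained question information generation model 607.

```python
def build_intent_question(keywords: list) -> str:
    # Fold the group keywords into one clarifying question for the user.
    if len(keywords) < 2:
        raise ValueError("at least two keywords are needed for a clarifying question")
    head = ", ".join(f"a lecture about {k}" for k in keywords[:-1])
    return f"Do you want to write {head}, or a lecture about {keywords[-1]}?"

print(build_intent_question(["a basketball game", "a seminar", "winning a prize"]))
# Do you want to write a lecture about a basketball game,
# a lecture about a seminar, or a lecture about winning a prize?
```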
In an exemplary embodiment, with continued reference to FIG. 6B, after determining the intent question information 608, the AI dialog system 510 can obtain the intent question information 608 via the relevant interface in the interface layer and output it to the user interface 601.
S460: and responding to the intention reply information input by the user on the intention questioning information, and determining and outputting target reply information corresponding to the first questioning information.
In an exemplary embodiment, continued reference is made to FIG. 6B. After the user views the above-mentioned intention question information 608 in the user interface 601, the user may reply to it, that is, input intention reply information 609 in the user interface 601, so that the intention of the user can be further acquired. The AI dialog system 510 may obtain the intention reply information 609 via an associated interface in the interface layer.
For example, in response to the intention question information 608 "Do you want to write a lecture about a basketball game, a lecture for a seminar, or a lecture about winning a prize?", the intention reply information 609 input by the user is: "About the artificial intelligence seminar".
In some embodiments, referring to fig. 10, the controller 250 may determine and output the target reply information corresponding to the first question information by:
s1010: and determining second questioning information according to the first questioning information and the intention reply information.
In an exemplary embodiment, continued reference is made to FIG. 6B. The AI dialog system 510 can reconstruct the second question information from the first question information 602 and the intention reply information 609 entered by the user and input it into the intention recognition model 603. That is, the second question information constructed by the AI dialog system 510 integrates the user's intention, which is more clear than the first question information 602.
It will be appreciated that, like the first question information, the second question information (prompt) refers to an instruction or query that indicates the user's intent and needs, for example: "please recommend a smartphone", "how is the weather today?", "you are a medical expert, please explain the cause and treatment of myocardial infarction", and so on. The form of the second question information may include a voice form and a text form, and the form of the second question information is not limited in this embodiment.
For example, when the second question information is constructed, the second question information may be reconstructed according to a preset question information template, or the second question information may be reconstructed using an external question information construction service, etc., and in this embodiment, the manner of determining the second question information is not limited.
For example, with continued reference to fig. 6B, the second question information generated according to the first question information 602' "please help me write a lecture" and the intention reply information 609 "about the artificial intelligence seminar" may be: "Please write a lecture about the artificial intelligence seminar".
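A preset question information template, as mentioned above, can be as simple as splicing the two inputs together; the sketch below is one illustrative way to rebuild the second question information, and the template text is an assumption.

```python
def build_second_question(first_question: str, intent_reply: str) -> str:
    # Splice the user's clarification back into the original request (prompt).
    return f"{first_question.rstrip('.?! ')} {intent_reply.strip()}."

print(build_second_question("Please help me write a lecture",
                            "about the artificial intelligence seminar"))
# Please help me write a lecture about the artificial intelligence seminar.
```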
S1020: and processing the second question information based on the intention recognition model to obtain and output target reply information.
In an exemplary embodiment, continued reference is made to FIG. 6B. Next, the AI dialog system 510 may input the second question information into the intent recognition model 603, such that the intent recognition model 603 processes the second question information and generates corresponding target reply information 610.
It should be noted that the intention reply information 609 input by the user may not include any of the keywords contained in the intention question information 608; for example, the intention reply information 609 may be: "A lecture about football". This case has no effect on the subsequent processing; that is, the second question information is still generated according to the first question information 602' of the user and the intention reply information 609, and the second question information is processed based on the intention recognition model to obtain the target reply information 610.
In some embodiments, the first question information 602' and the intention reply information 609 may also be processed directly based on the intention recognition model 603, thereby obtaining the target reply information 610.
In some embodiments, after the second question information is input to the intent recognition model 603, the above steps S420 to S430 may be performed on the second question information, so as to further determine whether the intent of the user of the second question information is clear.
For example, the intention reply information 609 input by the user may be: "About a seminar". In this case, the second question information generated by the AI dialog system 510 according to the first question information 602' and the intention reply information 609 may be: "Please write a lecture about a seminar".
After the second question information is processed based on the intention recognition model 603, the target similarity of the obtained plurality of candidate reply information may be greater than or equal to the similarity threshold, that is, the intention of the user is now clear. In this case, S440 may be executed: any candidate reply information is selected from the plurality of candidate reply information corresponding to the second question information (that candidate reply information being the target reply information) and output.
However, after the second question information is processed based on the intent recognition model 603, the target similarity of the obtained plurality of candidate reply information may still be smaller than the similarity threshold. For example, the plurality of candidate reply information may include lectures on topics such as an artificial intelligence seminar, a literature writing seminar, an architectural design seminar, and the like. That is, the intention of the user is still not clear, and the AI dialog system 510 may continue to perform S440' to S460 described above, and so on, until the intention of the user is clear and the reply information corresponding to the first question information can be output.
Embodiments of the above-described intention recognition method are summarized below.
Referring to a flowchart of another intention recognition method shown in fig. 11.
S1110: in response to the first question information entered by the user.
S1120: a plurality of candidate reply messages are generated.
As shown in fig. 11, in summary, the technical solution of the present invention can process the first question information based on the intention recognition model in response to the first question information input by the user, so as to obtain a plurality of candidate reply messages.
S1130: and calculating the similarity of the targets.
S1140: and judging whether the target similarity is smaller than a similarity threshold value.
And then, determining the target similarity of the plurality of candidate reply messages based on the plurality of candidate reply messages, and judging whether the target similarity is smaller than a similarity threshold value.
S1150: and if the target similarity is greater than or equal to the similarity threshold, outputting the reply information.
If the target similarity is greater than or equal to the similarity threshold, it indicates that the user intention corresponding to the first question information is clear. In this case, any one of the plurality of candidate reply messages may be output.
S1160: and if the target similarity is smaller than the similarity threshold, clustering the candidate reply messages.
S1170: and extracting keywords.
If the target similarity is smaller than the similarity threshold, the user intention corresponding to the first question information is not clear. In this case, the plurality of candidate reply messages may be clustered, and at least two keywords corresponding to the plurality of candidate reply messages may be extracted.
S1180: and generating intention question information.
Then, intention questioning information is generated according to the at least two keywords, and the intention questioning information is output to further determine the intention of the user.
S1190: and generating second questioning information according to the first questioning information and the intention reply information.
And finally, responding to the intention reply information input by the user to the intention inquiry information, and generating second inquiry information according to the first inquiry information and the intention reply information. At this time, the above-described S1120 to S1140 may be continuously performed on the second question information to again determine whether the user intention corresponding to the second question information is clear. And so on, until S1150 is completed, that is, the user intention can be clarified and the reply information corresponding to the first question information can be output.
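Putting S1110 to S1190 together, the overall control flow can be sketched as a loop. Every callable passed in below is a hypothetical placeholder for the corresponding model or interface described above (the intention recognition model, the similarity computation, the clustering-plus-keyword-plus-question steps, and the user-facing round trip); the sketch only illustrates the control flow, not a concrete implementation.

```python
from typing import Callable, List

def answer_with_clarification(
    first_question: str,
    generate_candidates: Callable[[str], List[str]],       # S1120: intent recognition model
    score_agreement: Callable[[List[str]], float],         # S1130: target similarity
    make_clarifying_question: Callable[[List[str]], str],  # S1160-S1180: cluster, keywords, question
    ask_user: Callable[[str], str],                        # output the question, return the reply
    threshold: float = 0.75,
    max_rounds: int = 3,
) -> str:
    question = first_question
    for _ in range(max_rounds):
        candidates = generate_candidates(question)          # S1120
        if score_agreement(candidates) >= threshold:        # S1140
            return candidates[0]                            # S1150: intent is clear
        clarifying = make_clarifying_question(candidates)   # S1160-S1180
        reply = ask_user(clarifying)                        # intention reply information
        question = f"{question} ({reply})"                  # S1190: second question information
    return generate_candidates(question)[0]                 # best effort after max_rounds
```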
By applying the technical scheme of the invention, whether the intention recognition model accurately recognizes the user intention can be judged based on the first question information input by the user, and the intention question information is generated to further acquire the intention of the user under the condition that the user intention cannot be accurately recognized, so that the accurate recognition of the user intention is realized, and the target reply information meeting the user requirement is output. According to the method and the device, intention recognition is realized without depending on an intention database, so that intention recognition in an open domain can be realized, and accurate reply information can be output aiming at the questioning information input by a user, so that user experience is improved.
The embodiment of the present invention further provides an intention recognition apparatus, referring to fig. 12, the intention recognition apparatus 1200 may be applied to an electronic device, and the intention recognition apparatus 1200 may include: a receiving module 1210, a processing module 1220, a determining module 1230, an obtaining module 1240, and a generating module 1250.
A receiving module 1210 for: first question information input by a user is received.
A processing module 1220 for: and processing the first question information based on the intention recognition model to obtain a plurality of candidate reply information.
A determining module 1230 for: based on the plurality of candidate reply messages, a target similarity of the plurality of candidate reply messages is determined.
An obtaining module 1240, configured to: and if the target similarity is smaller than the similarity threshold, acquiring at least two keywords corresponding to the candidate reply messages.
A generation module 1250 for: and generating intention questioning information according to at least two keywords, and outputting the intention questioning information.
The determination module 1230 is also configured to: and responding to the intention reply information input by the user on the intention questioning information, and determining and outputting target reply information corresponding to the first questioning information.
In some embodiments, the determining module 1230 is specifically configured to: responding to the intention reply information input by the user on the intention questioning information, and determining second questioning information according to the first questioning information and the intention reply information; and processing the second question information based on the intention recognition model to obtain and output target reply information.
In some embodiments, the determining module 1230 is specifically configured to: determining the similarity between any two candidate reply messages in the plurality of candidate reply messages; and determining the target similarity according to the similarity between any two candidate reply messages and the number of the similarities between any two candidate reply messages.
In some embodiments, the obtaining module 1240 is specifically configured to: clustering the candidate reply messages to obtain a plurality of candidate reply message groups; wherein, the corresponding categories among the candidate reply information groups in the candidate reply information groups are different; extracting keywords corresponding to each candidate reply information group; and generating intention question information according to the keywords corresponding to each candidate reply information group.
In some embodiments, each candidate reply message group includes at least one candidate reply message of the plurality of candidate reply messages, and the obtaining module 1240 is specifically configured to: processing each candidate reply information group based on the keyword extraction model to obtain keywords corresponding to each candidate reply information group; the generating module 1250 specifically is configured to: and processing keywords corresponding to each candidate reply information group based on the questioning information generation model to obtain the intention questioning information.
As shown in fig. 12, the intention recognition apparatus 1200 may further include: an output module 1260.
In some embodiments, output module 1260 is for: if the target similarity is greater than or equal to a preset threshold, outputting any one candidate reply message of the plurality of candidate reply messages.
Correspondingly, the specific details of each part of the intention recognition apparatus have already been described in the embodiments of the electronic device; for details not disclosed here, reference may be made to those embodiments, and the details are not repeated.
Embodiments of the present invention provide a computer readable storage medium storing at least one executable instruction that, when executed on an electronic device/intent recognition apparatus, causes the electronic device/intent recognition apparatus to perform the intent recognition method in any of the method embodiments described above.
The executable instructions may be specifically for causing the electronic device/intention recognition apparatus to perform the above-described intention recognition method.
In this embodiment, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. In addition, embodiments of the present invention are not directed to any particular programming language.
In the description provided herein, numerous specific details are set forth. It will be appreciated, however, that embodiments of the invention may be practiced without such specific details. Similarly, in the above description of exemplary embodiments of the invention, various features of embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. Wherein the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may furthermore be divided into a plurality of sub-modules or sub-units or sub-components, except where at least some of such features and/or processes or elements are mutually exclusive.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (14)

1. An electronic device, comprising:
A communicator configured to receive first question information input by a user;
a controller coupled with the communicator, the controller configured to:
processing the first question information based on an intention recognition model to obtain a plurality of candidate reply information;
determining target similarity of the plurality of candidate reply messages based on the plurality of candidate reply messages;
if the target similarity is smaller than a similarity threshold, acquiring at least two keywords corresponding to the candidate reply messages;
generating intention questioning information according to the at least two keywords, and outputting the intention questioning information;
And responding to the intention reply information input by the user to the intention question information, and determining and outputting target reply information corresponding to the first question information.
2. The electronic device of claim 1, wherein the controller is specifically configured to:
responding to the intention reply information input by the user to the intention question information, and determining second question information according to the first question information and the intention reply information;
And processing the second question information based on the intention recognition model to obtain and output the target reply information.
3. The electronic device of claim 1, wherein the controller is specifically configured to:
determining the similarity between any two candidate reply messages in the plurality of candidate reply messages;
And determining the target similarity according to the similarity between any two pieces of candidate reply information and the number of the similarities between any two pieces of candidate reply information.
4. The electronic device of any one of claims 1-3, wherein the controller is specifically configured to:
Clustering the candidate reply information to obtain a plurality of candidate reply information groups; wherein, the corresponding categories among the candidate reply information groups in the candidate reply information groups are different;
extracting keywords corresponding to each candidate reply information group;
and generating the intention questioning information according to the keywords corresponding to the candidate reply information groups.
5. The electronic device of claim 4, wherein each candidate reply message group includes at least one candidate reply message of the plurality of candidate reply messages, the controller being specifically configured to:
processing each candidate reply information group based on a keyword extraction model to obtain keywords corresponding to each candidate reply information group;
And processing keywords corresponding to each candidate reply information group based on a question information generation model to obtain the intention question information.
6. The electronic device of any one of claims 1-3, wherein the controller is further configured to:
And if the target similarity is greater than or equal to the preset threshold, outputting any one candidate reply message of the plurality of candidate reply messages.
7. An intention recognition method, applied to an electronic device, comprising:
receiving first question information input by a user;
processing the first question information based on an intention recognition model to obtain a plurality of candidate reply information;
determining target similarity of the plurality of candidate reply messages based on the plurality of candidate reply messages;
if the target similarity is smaller than a similarity threshold, acquiring at least two keywords corresponding to the candidate reply messages;
generating intention questioning information according to the at least two keywords, and outputting the intention questioning information;
And responding to the intention reply information input by the user to the intention question information, and determining and outputting target reply information corresponding to the first question information.
8. The method of claim 7, wherein the determining and outputting the target reply information corresponding to the first question information in response to the intention reply information input by the user to the intention question information comprises:
responding to the intention reply information input by the user to the intention question information, and determining second question information according to the first question information and the intention reply information;
And processing the second question information based on the intention recognition model to obtain and output the target reply information.
9. The method of claim 7, wherein the determining the target similarity of the plurality of candidate reply messages based on the plurality of candidate reply messages comprises:
determining the similarity between any two candidate reply messages in the plurality of candidate reply messages;
And determining the target similarity according to the similarity between any two pieces of candidate reply information and the number of the similarities between any two pieces of candidate reply information.
10. The method according to any one of claims 7 to 9, wherein the obtaining at least two keywords corresponding to the plurality of candidate reply messages includes:
Clustering the candidate reply information to obtain a plurality of candidate reply information groups; wherein, the corresponding categories among the candidate reply information groups in the candidate reply information groups are different;
extracting keywords corresponding to each candidate reply information group;
and generating the intention questioning information according to the keywords corresponding to the candidate reply information groups.
11. The method of claim 10, wherein each candidate reply message group includes at least one candidate reply message of the plurality of candidate reply messages, and the obtaining at least two keywords corresponding to the plurality of candidate reply messages includes:
processing each candidate reply information group based on a keyword extraction model to obtain keywords corresponding to each candidate reply information group;
the generating the intention question information according to the at least two keywords comprises the following steps:
And processing keywords corresponding to each candidate reply information group based on a question information generation model to obtain the intention question information.
12. The method according to any one of claims 7 to 9, further comprising:
And if the target similarity is greater than or equal to the preset threshold, outputting any one candidate reply message of the plurality of candidate reply messages.
13. An intention recognition device, which is disposed in an electronic apparatus, comprising:
A receiving module for: receiving first question information input by a user;
a processing module for: processing the first question information based on an intention recognition model to obtain a plurality of candidate reply information;
a determining module for: determining target similarity of the plurality of candidate reply messages based on the plurality of candidate reply messages;
An acquisition module for: if the target similarity is smaller than a similarity threshold, acquiring at least two keywords corresponding to the candidate reply messages;
a generation module for: generating intention questioning information according to the at least two keywords, and outputting the intention questioning information;
The determining module is further configured to: and responding to the intention reply information input by the user to the intention question information, and determining and outputting target reply information corresponding to the first question information.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the intention recognition method according to any one of claims 7 to 12.
CN202410158683.7A 2024-02-04 2024-02-04 Electronic device, intention recognition method, intention recognition device, and storage medium Pending CN118093815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410158683.7A CN118093815A (en) 2024-02-04 2024-02-04 Electronic device, intention recognition method, intention recognition device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410158683.7A CN118093815A (en) 2024-02-04 2024-02-04 Electronic device, intention recognition method, intention recognition device, and storage medium

Publications (1)

Publication Number Publication Date
CN118093815A true CN118093815A (en) 2024-05-28

Family

ID=91159306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410158683.7A Pending CN118093815A (en) 2024-02-04 2024-02-04 Electronic device, intention recognition method, intention recognition device, and storage medium

Country Status (1)

Country Link
CN (1) CN118093815A (en)


Legal Events

Date Code Title Description
PB01 Publication