CN111984763A

CN111984763A - Question answering processing method and intelligent equipment

Info

Publication number: CN111984763A
Application number: CN202010889260.4A
Authority: CN
Inventors: 李俊彦; 芮智琦; 詹乐
Original assignee: Hisense Electronic Technology Wuhan Co ltd
Current assignee: Hisense Electronic Technology Wuhan Co ltd
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2020-11-24
Anticipated expiration: 2040-08-28
Also published as: CN111984763B

Abstract

The invention discloses a question-answering processing method and an intelligent device. In response to receiving a target question input by a user, a plurality of candidate questions similar to the target question, together with their answer information, are obtained. The core words of every question in a first set are calculated, the first set comprising the target question and the plurality of candidate questions. Each question in the first set is combined with its core words to form a new sentence, yielding a second set. The matching candidate question with the highest probability score is calculated from the second set, and the answer information corresponding to that matching candidate question is output. Because the matching uses the core words interactively, the method matches questions to answers well even when questions differ in only a single word or a few characters, so the output answer is more accurate and the generalization capability of current open-domain question-answering systems can be improved.

Description

Question answering processing method and intelligent equipment

Technical Field

The invention relates to the technical field of open domain question answering, in particular to a question answering processing method and intelligent equipment.

Background

The open-domain question-answering system is an important part of intelligent devices such as smart televisions, smart home devices, and smart speakers. When a user poses a target question through the intelligent device, the open-domain question-answering system retrieves similar candidate questions and their answers from a large data set, and then provides the user with the closest answer by evaluating the similarity of each candidate to the target question, thereby realizing interactive question answering between the intelligent device and the user.

An open-domain system can use a conventional interactive matching network model for question-answering processing, but this approach has a drawback: if questions differ in only a single word or a few characters, the question-answer matching effect is poor. As shown in fig. 1, a user asks "how to drink milk tea in the morning", and the open-domain system gives the answer "the milk tea is drunk in the morning generally to the body". The user then asks "the milk tea is drunk at night". Because the two questions differ only in the words "morning" and "night", the output answer may still be "the milk tea is drunk in the morning generally to the body", which is a wrong answer and not the answer the user expects.

Disclosure of Invention

In order to solve the technical problem, the invention provides a question answering processing method and intelligent equipment.

A question answering processing method provided in a first aspect includes:

responding to a target question input by a user, and acquiring a plurality of candidate questions similar to the target question and answer information thereof;

calculating core words included in all question sentences in a first set, wherein the first set comprises a target question sentence and a plurality of candidate question sentences;

combining each question sentence in the first set with its core words to form a new sentence, thereby obtaining a second set;

and calculating the matching candidate question with the highest probability score according to the second set, and outputting answer information corresponding to the matching candidate question.
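The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `retrieve_candidates`, `extract_core_words`, and `score` are hypothetical stand-ins (word overlap, stop-word filtering, and cosine similarity of word counts) for the similarity query, the core word extraction algorithm, and the question-answer matching model, and the answers in the sample corpus are placeholders.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Assumption: a toy stop-word list; every other word counts as a core word.
STOP_WORDS = {"a", "at", "can", "i", "in", "is", "the", "to"}


@dataclass
class Candidate:
    question: str
    answer: str


def tokenize(sentence):
    return sentence.lower().split()


def retrieve_candidates(target, corpus, top_k=3):
    """Step 1 (stand-in): keep the stored questions sharing the most words with the target."""
    target_words = set(tokenize(target))
    ranked = sorted(
        corpus,
        key=lambda c: len(target_words & set(tokenize(c.question))),
        reverse=True,
    )
    return ranked[:top_k]


def extract_core_words(sentence):
    """Step 2 (stand-in): non-stop-words serve as core words."""
    return [w for w in tokenize(sentence) if w not in STOP_WORDS]


def augment(sentence):
    """Step 3: form the new sentence from the question and its core words."""
    return tokenize(sentence) + extract_core_words(sentence)


def score(tokens_a, tokens_b):
    """Step 4 (stand-in): cosine similarity of word counts, not a trained model."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer_question(target, corpus):
    candidates = retrieve_candidates(target, corpus)
    first_set = [target] + [c.question for c in candidates]
    second_set = [augment(q) for q in first_set]
    target_aug, candidate_augs = second_set[0], second_set[1:]
    best = max(zip(candidates, candidate_augs), key=lambda pair: score(target_aug, pair[1]))
    return best[0].answer


corpus = [
    Candidate("drink milk tea in the morning", "ANSWER_MORNING"),  # placeholder answers
    Candidate("drink milk tea at night", "ANSWER_NIGHT"),
]
result = answer_question("can i drink milk tea at night", corpus)
```

In this toy setup the candidate containing "night" scores highest, because the appended core words give the differing word more weight in the comparison.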

A second aspect provides a smart device comprising:

the input interface is used for receiving a target question sentence input by a user and sending the target question sentence to the controller;

the controller configured to perform:

When a target question input by a user is received, a similarity query is performed on the target question to obtain candidate questions and their answer information, thereby obtaining a first set. The core words of each question in the first set can be calculated with a core word extraction algorithm. Core words are the key words that determine the semantics of a question, and using them in the calculation improves the accuracy of question-answer matching. Each question is then combined with its core words to form a new sentence. For example, if the question is "drink milk tea in the morning" and the core words are "morning" and "milk tea", the new sentence is the question followed by its core words: "drink milk tea in the morning" + "morning milk tea". This not only packs the original question and its core words together but also increases the weight of the core words in the new sentence. After every question in the first set has generated a new sentence in this way, a second set is obtained. From the second set, a question-answer matching algorithm, for example an interactive matching network model, calculates the matching candidate question with the highest probability score, so that the answer information output for the matching candidate question matches the target question more closely and the answer is more accurate. Because the matching uses the core words interactively, the method has good question-answer matching capability even when questions differ in only a single word or a few characters, which makes the output answer more accurate and helps improve the generalization capability of current open-domain question-answering systems.
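The effect of appending the core words on their weight in the new sentence can be checked with a short sketch. It uses the example from the paragraph above; the whitespace tokenization of the English rendering, the single token `milk_tea`, and the helper name `core_word_share` are assumptions made for illustration only.

```python
from collections import Counter


def core_word_share(tokens, core_words):
    """Fraction of the tokens in a sentence that are core-word tokens."""
    counts = Counter(tokens)
    core_total = sum(counts[w] for w in core_words)
    return core_total / len(tokens)


# Assumption: "milk tea" is treated as one token.
question = ["drink", "milk_tea", "in", "the", "morning"]
core_words = ["morning", "milk_tea"]

# New sentence = original question followed by its core words.
new_sentence = question + core_words

before = core_word_share(question, core_words)      # 2 of 5 tokens
after = core_word_share(new_sentence, core_words)   # 4 of 7 tokens
```

Under these assumptions the share of core-word tokens rises from 2/5 in the original question to 4/7 in the new sentence.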

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a UI diagram illustrating a conventional interactive matching network model for question-answering processing;

FIG. 2 is a schematic diagram illustrating an operation scenario between the display device 200 and the control apparatus 100;

FIG. 3 is a block diagram illustrating a hardware configuration of the display device 200 in FIG. 2;

FIG. 4 is a block diagram illustrating a hardware configuration of the control apparatus 100 in FIG. 2;

FIG. 5 is a schematic diagram illustrating a software configuration in the display device 200 in FIG. 2;

FIG. 6 is a schematic diagram illustrating an icon control interface display of an application on the display device 200;

FIG. 7 is a flow chart illustrating a question-answering processing method;

FIG. 8 is a schematic architectural diagram of the ESIM model;

FIG. 9 is a UI diagram illustrating question-answering processing that combines dynamic core words with the ESIM model.

Detailed Description

To make the objects, embodiments, and advantages of the present application clearer, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described exemplary embodiments are only some, not all, of the embodiments of the present application.

All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive effort fall within the scope of the claims appended hereto. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also separately constitute a complete embodiment.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," and the like in the description and claims of this application and in the above-described drawings are used to distinguish between similar objects or entities and, unless otherwise indicated, are not necessarily intended to imply a particular order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that, for example, the embodiments of the application can be implemented in sequences other than those illustrated or otherwise described herein.

Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.

The present application is mainly directed to application scenarios that involve question-answering processing, such as an open-domain question-answering system built into an intelligent device. A user can input a target question by voice, typing, or the like, and the question-answering processing finally outputs the answer that best matches the target question, thereby realizing intelligent question-answering interaction between the user and the intelligent device. Intelligent devices include, but are not limited to, smart televisions, mobile terminals, smart home devices, smart customer service systems, smart speakers, and smart robots. The following description takes a display device (smart TV) as an example embodiment.

The term "module," as referred to herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

The term "remote control" as referred to in this application refers to a component of an electronic device, such as the display device disclosed in this application, that can typically control the electronic device wirelessly over a relatively short distance. It generally connects to the electronic device using infrared and/or radio frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB, Bluetooth, and motion sensors. For example, a hand-held touch remote controller replaces most of the physical built-in hard keys of a common remote control device with a user interface on a touch screen.

The term "gesture" as referred to in this application refers to a user action through a change in hand shape or hand motion to convey an intended idea, action, purpose, or result.

Fig. 2 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 2, the user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100.

In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication, and other short-distance communication methods, and the display device 200 is controlled wirelessly or by wire. The user may input user commands through keys on the remote controller, voice input, control panel input, and the like to control the display device 200. For example, the user can input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right keys, voice input key, menu key, power key, and the like on the remote controller to control the display device 200.

In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200, for example through an application running on the smart device. The application may be configured to provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.

In some embodiments, the mobile terminal 300 may install a software application with the display device 200 to implement connection and communication through a network communication protocol, for the purpose of one-to-one control operation and data communication. For example, the mobile terminal 300 and the display device 200 can establish a control instruction protocol, synchronize a remote control keyboard to the mobile terminal 300, and control the display device 200 through a user interface on the mobile terminal 300. The audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize a synchronous display function.

As also shown in fig. 2, the display apparatus 200 also performs data communication with the server 400 through various communication means. The display device 200 may be allowed to be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display apparatus 200. Illustratively, the display device 200 receives software program updates, or accesses a remotely stored digital media library, by sending and receiving information, as well as Electronic Program Guide (EPG) interactions. The server 400 may be a cluster or a plurality of clusters, and may include one or more types of servers. Other web service contents such as video on demand and advertisement services are provided through the server 400.

The display device 200 may be a liquid crystal display, an OLED display, a projection display device. The particular display device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.

In addition to the broadcast-receiving television function, the display apparatus 200 may provide a smart network TV function with computer support, including, but not limited to, network TV, smart TV, and Internet Protocol TV (IPTV).

A hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment is exemplarily shown in fig. 3.

In some embodiments, at least one of the controller 250, the tuner demodulator 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240 is included in the display apparatus 200.

In some embodiments, a display 275 receives image signals originating from the first processor output and displays video content and images and components of the menu manipulation interface.

In some embodiments, the display 275 includes a display component for presenting a picture and a drive component that drives the display of an image.

In some embodiments, the video content is displayed from broadcast television content, or alternatively, from various broadcast signals that may be received via wired or wireless communication protocols. Alternatively, various image contents received from the network communication protocol and sent from the network server side can be displayed.

In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the display apparatus 200 and used to control the display apparatus 200.

In some embodiments, a driver assembly for driving the display is also included, depending on the type of display 275.

In some embodiments, display 275 is a projection display and may also include a projection device and a projection screen.

In some embodiments, the communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator may include at least one of a WiFi module 221, a Bluetooth module 222, a wired Ethernet module 223, other network communication protocol modules or near field communication protocol modules, and an infrared receiver. Under the control of the controller 250, the communicator 220 can receive control signals from the control device 100, and a control signal may be implemented as a WiFi signal, a Bluetooth signal, a radio frequency signal, or another signal type.

In some embodiments, the display apparatus 200 may establish control signal and data signal transmission and reception with the external control device 100 or the content providing apparatus through the communicator 220.

In some embodiments, the user interface 265 may be configured to receive infrared control signals from a control device 100 (e.g., an infrared remote control, etc.).

In some embodiments, the detector 230 is a component used by the display device 200 to collect signals from the external environment or to interact with the outside.

In some embodiments, the detector 230 includes a light receiver, that is, a sensor for collecting the intensity of ambient light, so that display parameters can be adapted to the collected ambient light.

In some embodiments, the detector 230 may further include an image collector, such as a camera, etc., which may be configured to collect external environment scenes, collect attributes of the user or gestures interacted with the user, adaptively change display parameters, and recognize user gestures, so as to implement a function of interaction with the user.

In some embodiments, the detector 230 may also include a temperature sensor or the like for sensing the ambient temperature.

In some embodiments, the display apparatus 200 may adaptively adjust the display color temperature of an image. For example, the display apparatus 200 may be adjusted to display a cool tone in a high-temperature environment, or a warm tone in a low-temperature environment.
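The adaptive adjustment described above amounts to a simple mapping from sensed ambient temperature to a display tone. The sketch below is illustrative only: the function name, the threshold values, and the "neutral" fallback between the two thresholds are assumptions, since the text gives no numeric values.

```python
def select_color_tone(ambient_temp_c, high_c=28.0, low_c=15.0):
    """Pick a display tone from the ambient temperature in degrees Celsius.

    The thresholds and the "neutral" fallback are assumed for illustration.
    """
    if ambient_temp_c >= high_c:
        return "cool"   # high-temperature environment: cool tone
    if ambient_temp_c <= low_c:
        return "warm"   # low-temperature environment: warm tone
    return "neutral"


tones = [select_color_tone(t) for t in (35.0, 20.0, 5.0)]
```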

In some embodiments, the detector 230 may also include a sound collector, such as a microphone, which may be used to receive the user's voice, for example a voice signal containing a control instruction with which the user controls the display device 200, or to collect ambient sound for recognizing the type of ambient scene so that the display device 200 can adapt to ambient noise.

In some embodiments, as shown in fig. 3, the input/output interface 255 is configured to allow data transfer between the controller 250 and external other devices or other controllers 250. Such as receiving video signal data and audio signal data of an external device, or command instruction data, etc.

In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of a High-Definition Multimedia Interface (HDMI), an analog or data high-definition component input interface, a composite video input interface, a USB input interface, an RGB port, and the like. A plurality of these interfaces may form a composite input/output interface.

In some embodiments, as shown in fig. 3, the tuner demodulator 210 is configured to receive broadcast television signals in a wired or wireless manner, perform modulation and demodulation processing such as amplification, mixing, and resonance, and demodulate audio and video signals from a plurality of wireless or wired broadcast television signals. The audio and video signals may include the television audio and video signals carried on the television channel frequency selected by the user, as well as EPG data signals.

In some embodiments, the frequency demodulated by the tuner demodulator 210 is controlled by the controller 250. The controller 250 can send a control signal according to the user's selection, so that the tuner demodulator 210 responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried on that frequency.

In some embodiments, the broadcast television signal may be classified into a terrestrial broadcast signal, a cable broadcast signal, a satellite broadcast signal, an internet broadcast signal, or the like according to the broadcasting system of the television signal. Or may be classified into a digital modulation signal, an analog modulation signal, and the like according to a modulation type. Or the signals are classified into digital signals, analog signals and the like according to the types of the signals.

In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may be located in a device external to the main device where the controller 250 is located, such as an external set-top box. In this case, the set-top box outputs the television audio and video signals obtained by modulating and demodulating the received broadcast television signals to the main device, and the main device receives the audio and video signals through the first input/output interface.

In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.

In some embodiments, the object may be any one of selectable objects, such as a hyperlink or an icon. Operations related to the selected object, such as: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon. The user command for selecting the UI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the display apparatus 200 or a voice command corresponding to a voice spoken by the user.

As shown in fig. 3, the controller 250 includes at least one of a Random Access Memory 251 (RAM), a Read-Only Memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a Graphics Processing Unit (GPU)), a Central Processing Unit 254 (CPU), a communication interface, and a communication bus 256 that connects the respective components.

In some embodiments, RAM 251 is used to store temporary data for the operating system or other programs that are running.

In some embodiments, ROM252 is used to store instructions for various system boots.

In some embodiments, the ROM252 is used to store a Basic Input Output System (BIOS). The system is used for completing power-on self-test of the system, initialization of each functional module in the system, a driver of basic input/output of the system and booting an operating system.

In some embodiments, when the power-on signal is received, the display device 200 starts to power up, the CPU executes the system boot instruction in the ROM252, and copies the temporary data of the operating system stored in the memory to the RAM 251 so as to start or run the operating system. After the start of the operating system is completed, the CPU copies the temporary data of the various application programs in the memory to the RAM 251, and then, the various application programs are started or run.

In some embodiments, the processor 254 is used to execute the operating system and application program instructions stored in memory, and to execute various application programs, data, and contents according to the interactive instructions received from the outside, so as to finally display and play various audio and video contents.

In some exemplary embodiments, the processor 254 may include a plurality of processors, namely a main processor and one or more sub-processors. The main processor performs some operations of the display apparatus 200 in a pre-power-up mode and/or displays the screen in the normal mode. The one or more sub-processors perform operations in a standby mode or the like.

In some embodiments, the graphics processor 253 is used to generate various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. It includes an arithmetic unit, which performs operations on the interactive instructions input by the user and displays the objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit for display on the display.

In some embodiments, the video processor 270 is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to the standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the display device 200.

In some embodiments, video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.

The demultiplexing module demultiplexes the input audio and video data stream. For example, if an MPEG-2 stream is input, the demultiplexing module demultiplexes it into a video signal and an audio signal.
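As an illustration of the demultiplexing step, the sketch below splits a byte stream into video and audio packets. It assumes the input is an MPEG-2 transport stream (188-byte packets with sync byte 0x47 and a 13-bit packet identifier, PID, in the header), which the text does not specify. The PID values are hypothetical and passed in directly; in a real stream they would be read from the program tables.

```python
TS_PACKET_SIZE = 188  # fixed size of an MPEG-2 transport stream packet
SYNC_BYTE = 0x47      # first byte of every transport stream packet


def packet_pid(packet):
    """Extract the 13-bit PID from bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demultiplex(stream, video_pid, audio_pid):
    """Split a transport stream into lists of video and audio packets."""
    video, audio = [], []
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # skip packets that fail the sync check
        pid = packet_pid(packet)
        if pid == video_pid:
            video.append(packet)
        elif pid == audio_pid:
            audio.append(packet)
    return video, audio


def make_packet(pid):
    """Build a dummy packet with the given PID and an all-zero payload (test data only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Hypothetical PIDs for illustration.
VIDEO_PID, AUDIO_PID = 0x100, 0x101
stream = make_packet(VIDEO_PID) + make_packet(AUDIO_PID) + make_packet(VIDEO_PID)
video_packets, audio_packets = demultiplex(stream, VIDEO_PID, AUDIO_PID)
```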

The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like.

The image synthesis module superimposes and mixes the GUI signal, which the graphics generator produces from user input or generates by itself, with the scaled video image to generate an image signal for display.

The frame rate conversion module converts the frame rate of the input video, for example from 60 Hz to 120 Hz or 240 Hz, which is commonly implemented by, for example, frame interpolation.
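As an illustration of doubling the frame rate by frame interpolation, the sketch below inserts one blended frame after each input frame. Linear blending of pixel values is an assumption chosen for brevity; the text does not say which interpolation method is used, and practical converters often rely on motion-compensated interpolation instead. Frames are modeled as flat lists of pixel intensities.

```python
def blend(frame_a, frame_b):
    """Average two frames pixel by pixel."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]


def double_frame_rate(frames):
    """Return twice as many frames, e.g. 60 input frames become 120."""
    if not frames:
        return []
    out = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)
        out.append(blend(current, following))
    # The last frame has no successor to blend with, so it is repeated.
    out.append(frames[-1])
    out.append(frames[-1])
    return out


frames_60hz = [[0, 0], [10, 20], [20, 40]]
frames_120hz = double_frame_rate(frames_60hz)
```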

The display format module converts the frame-rate-converted video output signal into a signal that conforms to the display format, for example by outputting an RGB data signal.

In some embodiments, the graphics processor 253 and the video processor may be integrated or configured separately. When integrated, they jointly process the graphics signals output to the display; when configured separately, they perform different functions, for example in a GPU + FRC (Frame Rate Conversion) architecture.

In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain an audio signal that can be played in a speaker.

In some embodiments, video processor 270 may comprise one or more chips. The audio processor may also comprise one or more chips.

In some embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.

In some embodiments, the audio output, under the control of the controller 250, receives the sound signals output by the audio processor 280. It may include the speaker 286 and, in addition to the speaker carried by the display device 200 itself, an external sound output terminal that can output to the sound-generating device of an external device, such as an external sound interface or an earphone interface. It may also include a near field communication module in the communication interface, such as a Bluetooth module for outputting sound through a Bluetooth speaker.

The power supply 290 supplies power to the display device 200 from the power input from the external power source under the control of the controller 250. The power supply 290 may include a built-in power supply circuit installed inside the display apparatus 200, or may be a power supply interface installed outside the display apparatus 200 to provide an external power supply in the display apparatus 200.

The user interface 265 receives an input signal from the user and then transmits the received user input signal to the controller 250. The user input signal may be a remote controller signal received through an infrared receiver, and various user control signals may also be received through the network communication module.

In some embodiments, the user inputs a user command through the control apparatus 100 or the mobile terminal 300, the user input interface receives the user input, and the display device 200 responds to the user input through the controller 250.

In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.

In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display of the electronic device, where the control may include a visual interface element such as an icon, button, menu, tab, text box, dialog box, status bar, navigation bar, Widget, etc.

The memory 260 stores various software modules for driving the display device 200. For example, the software modules stored in the first memory include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.

The base module is a bottom layer software module for signal communication between various hardware in the display device 200 and for sending processing and control signals to the upper layer module. The detection module is used for collecting various information from various sensors or user input interfaces, and the management module is used for performing digital-to-analog conversion and analysis management.

For example, the voice recognition module comprises a voice analysis module and a voice instruction database module. The display control module is used for controlling the display to display image content, and can be used for playing multimedia image content, UI interfaces and other information. The communication module is used for control and data communication with external equipment. The browser module is used for data communication with browsing servers. The service module is used for providing various services and includes various application programs. Meanwhile, the memory 260 may also store received external data and user data, images of various items in various user interfaces, visual effect maps of focus objects, and the like.

Fig. 4 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 4, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.

The control apparatus 100 is configured to control the display device 200 and may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200. Such as: the user operates the channel up/down key on the control device 100, and the display device 200 responds to the channel up/down operation.

In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications that control the display device 200 according to user demands.

In some embodiments, as shown in fig. 2, the mobile terminal 300 or other intelligent electronic device may function similarly to the control apparatus 100 after installing an application for manipulating the display device 200. For example, by installing an application, the user may use the function keys or virtual buttons of a graphical user interface provided on the mobile terminal 300 or other intelligent electronic device to implement the functions of the physical keys of the control apparatus 100.

The controller 110 includes a processor 112 and RAM 113 and ROM 114, a communication interface 130, and a communication bus. The controller is used for controlling the operation of the control device 100, as well as the communication cooperation among the internal components and the external and internal data processing functions.

The communication interface 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.

A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. Such as: the user can realize a user instruction input function through actions such as voice, touch, gesture, pressing, and the like, and the input interface converts the received analog signal into a digital signal and converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the display device 200.

The output interface includes an interface that transmits the received user instruction to the display apparatus 200. In some embodiments, the interface may be an infrared interface or a radio frequency interface. For example, when the infrared signal interface is used, the user input instruction is converted into an infrared control signal according to an infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the radio frequency signal interface is used, the user input instruction is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then transmitted to the display device 200 through the radio frequency transmitting terminal.

In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input-output interface 140. When the control device 100 is configured with the communication interface 130, for example a WiFi, Bluetooth or NFC module, the user input command may be encoded according to the WiFi protocol, the Bluetooth protocol or the NFC protocol and transmitted to the display device 200.

And a memory 190 for storing various operation programs, data and applications for driving and controlling the control apparatus 100 under the control of the controller. The memory 190 may store various control signal commands input by a user.

And a power supply 180 for providing operation power support for each element of the control device 100 under the control of the controller. The power supply 180 may be a battery and associated control circuitry.

In some embodiments, the system may include a Kernel (Kernel), a command parser (shell), a file system, and an application program. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and access the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.

Referring to fig. 5, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer, respectively, from top to bottom.

In some embodiments, at least one application program runs in the application program layer, and the application programs can be Window (Window) programs carried by an operating system, system setting programs, clock programs, camera applications and the like; or may be an application developed by a third party developer such as a hi program, a karaoke program, a magic mirror program, or the like. In specific implementation, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which is not limited in this embodiment of the present application.

The framework layer provides an Application Programming Interface (API) and a programming framework for the application programs of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides how the applications in the application layer act. Through the API, an application program can access the resources in the system and obtain the services of the system during execution.

As shown in fig. 5, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), a View System (View System), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.

In some embodiments, the activity manager is used for managing the life cycle of each application program and the general navigation and back functions, such as controlling the exit of an application program (including switching the user interface currently displayed in the display window to the system desktop), opening, and going back (including switching the user interface currently displayed in the display window to the previous user interface), and the like.

In some embodiments, the window manager is used to manage all window processes, such as obtaining the display size, determining whether there is a status bar, locking the screen, capturing the screen, and controlling display window changes (e.g., zooming out, shaking or distorting the display window).

In some embodiments, the system runtime layer provides support for the upper layer, i.e., the framework layer, and when the framework layer is accessed, the Android operating system runs the C/C++ library included in the system runtime layer to implement the functions to be implemented by the framework layer.

In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 5, the kernel layer includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WiFi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.

In some embodiments, the kernel layer further comprises a power driver module for power management.

In some embodiments, software programs and/or modules corresponding to the software architecture of fig. 5 are stored in the first memory or the second memory shown in fig. 3 or fig. 4.

In some embodiments, taking the magic mirror application (photographing application) as an example, when the remote control receiving device receives a remote control input operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including information such as the value of the input operation and the timestamp of the input operation), and the raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event according to the current position of the focus. Taking the case where the input operation is a confirmation operation and the corresponding control is the magic mirror application icon as an example, the magic mirror application calls an interface of the application framework layer to start the magic mirror application, and then calls the kernel layer to start the camera driver, so that a still image or a video is captured through the camera.

In some embodiments, for a display device with a touch function, taking a split screen operation as an example, the display device receives an input operation (such as a split screen operation) applied to the display by a user, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The activity manager of the application framework layer sets the window mode (such as multi-window mode) corresponding to the input operation, as well as the position and size of the window. The window manager of the application framework layer draws the window according to the settings of the activity manager and then sends the drawn window data to the display driver of the kernel layer, and the display driver displays the corresponding application interfaces in different display areas of the display.

In some embodiments, as shown in fig. 6, the application layer containing at least one application may display a corresponding icon control in the display, such as: a live television application icon control, a Video On Demand (VOD) application icon control, a media center application icon control, an application center icon control, a game application icon control, and the like.

In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may access input providing television signals from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.

In some embodiments, a video-on-demand application may provide video from different storage sources. Unlike a live television application, video on demand displays video from a storage source. For example, the video on demand may come from the server side of cloud storage or from local hard disk storage containing stored video programs.

In some embodiments, the media center application may provide various applications for multimedia content playback. For example, the media center may provide services other than live television or video on demand, through which the user can access various images or audio via the media center application.

In some embodiments, an application center may provide storage for various applications. An application may be a game, an application program, or some other application that is associated with a computer system or other device and can run on the smart television. The application center may obtain these applications from different sources and store them in local storage, from which they can run on the display device 200.

In some embodiments, the display device further has an input interface, which is used to receive a target question input by a user and send it to the controller 250; the controller 250 executes and controls the question-answer processing flow. The input interface may be a sound collection interface that collects a question spoken by the user, in which case the target question is corpus information in the form of voice. Alternatively, the input interface may be the user interface 265, through which a target question manually entered by the user with a device such as a remote controller or a keyboard is received, in which case the target question is text information. The form of presentation and the input method of the target question are not limited. Other types of intelligent devices include at least an input interface and a controller for executing question-answer processing, with the open-domain question answering system built into the controller; the other hardware/software structures of the intelligent device are not limited and depend on the actual application.

As can be seen from fig. 1, although the existing interactive matching network model can overcome the semantic gap, when two questions have high overall similarity and differ only in one or a few words, it cannot give an accurately matched answer and easily produces a wrong answer. To solve this technical problem, as shown in fig. 7, an embodiment of the present application provides a question answering processing method executed by the controller 250, that is, the controller 250 is the execution subject of the method. The method includes:

Step S10, in response to receiving a target question input by a user, acquiring a plurality of candidate questions similar to the target question and their answer information.

After the user inputs the target question, candidate questions similar to the target question are retrieved, the matching candidate question with the highest similarity to the target question is selected from them, and the answer information corresponding to the matching candidate question is output, thereby realizing a question-and-answer exchange between the user and the intelligent device. In some embodiments, the candidate questions and their corresponding answer information may be pre-stored in a designated database, or may be queried through a search engine such as Elasticsearch. The controller 250 generates a query instruction according to the target question and sends the query instruction to the database or the search engine.

In some embodiments, the database/search engine may retrieve many pieces of question-answer information whose questions have different degrees of similarity to the target question. These necessarily include some whose similarity to the target question is relatively low and which obviously match the target question poorly, so further screening is required. If M pieces of question-answer information similar to the target question are retrieved, they are ranked by question similarity, and the N pieces with the highest question similarity are selected as the candidate questions and their answer information, where M is greater than or equal to N. For example, if they are sorted in descending order of question similarity, the first N pieces are selected; if they are sorted in ascending order, the last N pieces are selected. In this embodiment, question-answer information with low similarity is filtered out by a quantity constraint. In other embodiments, a threshold constraint on question similarity may be used instead, selecting the question-answer information whose question similarity is greater than a threshold. The screening rule for candidate questions is not limited.
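The two screening rules described above (quantity constraint and threshold constraint) can be illustrated with a minimal Python sketch. The `QAPair` type, the function names and the sample similarity values are hypothetical and only serve the illustration; the embodiment does not prescribe a data structure.

```python
from typing import List, NamedTuple


class QAPair(NamedTuple):
    """One retrieved piece of question-answer information (hypothetical type)."""
    question: str
    answer: str
    similarity: float  # retrieval similarity between this question and the target question


def select_top_n(retrieved: List[QAPair], n: int) -> List[QAPair]:
    """Quantity constraint: keep the N pairs with the highest question similarity."""
    ranked = sorted(retrieved, key=lambda pair: pair.similarity, reverse=True)
    return ranked[:n]


def select_by_threshold(retrieved: List[QAPair], threshold: float) -> List[QAPair]:
    """Threshold constraint: keep every pair whose similarity exceeds the threshold."""
    return [pair for pair in retrieved if pair.similarity > threshold]


# M = 3 retrieved pairs with made-up similarity values.
retrieved = [
    QAPair("Is it good to drink milk tea at night", "answer A", 0.91),
    QAPair("Is it good to drink milk tea in the morning", "answer B", 0.97),
    QAPair("How is milk tea made", "answer C", 0.42),
]
candidates = select_top_n(retrieved, n=2)
```

Either rule yields the candidate questions that, together with the target question, form the first set used in step S20.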

Step S20, calculating core words included in each question in a first set, where the first set includes a target question and a plurality of candidate questions.

In the present application, the target question originally posed by the user and the obtained candidate questions form a first set, that is, the first set includes N+1 questions, and the core words of these N+1 questions are extracted in step S20. Core words are the key words that determine the semantics of a question and directly affect the accuracy of question-answer matching. For example, in fig. 1, for the target question "Is it good to drink milk tea in the morning", the core words include "morning" and "milk tea". Any suitable core word extraction algorithm may be adopted. In some embodiments, the present application adopts a core word extraction method based on an ELMO (Embeddings from Language Models, which obtains word vectors from a language model) model and an SIF (Smooth Inverse Frequency) model.

The ELMO model is a language model based on a multi-layer bidirectional LSTM (Long Short-Term Memory network) that is pre-trained on a large amount of text corpora. The lower LSTM layers represent simpler grammatical information, and the upper LSTM layers capture semantic information. When a sentence is input into the language model, the model processes it and outputs context-dependent word vector representations. The ELMO model generally includes an input layer (equivalent to an Embedding layer) and bidirectional LSTM layers, and is used to obtain different features (grammar and semantics) of words and to solve the problem of word ambiguity; with conventional word vectors, by contrast, words and vectors are in one-to-one correspondence and do not change with part of speech or meaning. For the details of ELMO, reference may be made to the related prior art, and they are not repeated in this application.

The SIF model is a weighted bag-of-words model, which only considers the weights of all words without considering the context relationship between the words, similar to putting all words into a bag, each word is independent, and the weighting coefficient of each word in the bag is calculated by a smooth inverse word frequency algorithm. Each word vector in the sentence is input into the SIF model, and after calculation processing by the SIF model, a vector representation of the sentence can be obtained, which is named as a sentence vector. Specific contents of the SIF model may refer to the description of the related prior art, and are not described in detail in this application.

Word vectors of the words in each question in the first set are calculated using the ELMO model, the sentence vector of each question in the first set is calculated using the SIF model, and the core words of each question are then obtained according to the similarity between the word vectors and the sentence vector of the same question. In some embodiments, the Euclidean distance between each word vector and the sentence vector of the same question is calculated; the smaller the Euclidean distance, the closer the corresponding word is to being a core word of the question. A threshold can therefore be set, and when the Euclidean distance is smaller than the threshold, the word corresponding to the word vector is determined to be a core word of the question. In this way the core words of the target question and of the candidate questions can be calculated, and information such as the weight of each core word in its question can also be calculated through the SIF model.
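The distance-threshold rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two-dimensional word vectors are hand-picked stand-ins for ELMO outputs, the word probabilities and the threshold are made up, the weighting uses the commonly cited SIF form a / (a + p(w)), and the principal-component removal step of the full SIF method is omitted.

```python
import math
from typing import Dict, List


def sif_sentence_vector(words: List[str],
                        vectors: Dict[str, List[float]],
                        word_prob: Dict[str, float],
                        a: float = 1e-3) -> List[float]:
    """SIF-weighted average of the word vectors (principal-component removal omitted)."""
    dim = len(vectors[words[0]])
    total = [0.0] * dim
    for word in words:
        weight = a / (a + word_prob[word])  # frequent words receive small weights
        for k in range(dim):
            total[k] += weight * vectors[word][k]
    return [value / len(words) for value in total]


def extract_core_words(words: List[str],
                       vectors: Dict[str, List[float]],
                       word_prob: Dict[str, float],
                       threshold: float) -> List[str]:
    """Keep the words whose vector lies within `threshold` of the sentence vector."""
    sentence = sif_sentence_vector(words, vectors, word_prob)
    return [w for w in words if math.dist(vectors[w], sentence) < threshold]


# Toy stand-ins for ELMO word vectors and corpus word probabilities (assumed values).
words = ["morning", "drink", "milk tea", "good"]
vectors = {
    "morning": [1.0, 0.0],
    "drink": [-1.0, -1.0],
    "milk tea": [0.0, 1.0],
    "good": [-1.0, 1.0],
}
word_prob = {"morning": 0.001, "drink": 0.02, "milk tea": 0.0005, "good": 0.03}

core = extract_core_words(words, vectors, word_prob, threshold=1.0)
```

With these toy values the rare words "morning" and "milk tea" dominate the sentence vector and fall inside the threshold, while the frequent words "drink" and "good" do not.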

Step S30, forming a new sentence from each question in the first set and its core words, to obtain a second set.

The new sentence consists of the question followed by its core word sequence; the core word sequence is obtained by deleting the non-core words from the question while preserving the original order of the core words in the question.

After the core words of each question in the first set are calculated, each question and its core words are formed into a new sentence. For example, the question is "Is it good to drink milk tea in the morning", and the core words extracted in step S20 are "morning" and "milk tea". The non-core words, such as "drink" and "good", are deleted, and the word order of the original question is kept, with "morning" before "milk tea", so the core word sequence is "morning milk tea" and the new sentence is "Is it good to drink milk tea in the morning + morning milk tea". After a new sentence has been generated in this way for each question in the first set (including the target question and the candidate questions), the second set is obtained.
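The construction of a new sentence can be sketched in a few lines of Python. The token list below is a word-by-word gloss of the example question in its original word order, and the "+" separator token is an assumption taken from the way the example is written; the embodiment does not state how the two parts are delimited.

```python
from typing import List, Set


def core_word_sequence(tokens: List[str], core_words: Set[str]) -> List[str]:
    """Delete the non-core words while keeping the core words in their original order."""
    return [token for token in tokens if token in core_words]


def build_new_sentence(tokens: List[str], core_words: Set[str], sep: str = "+") -> List[str]:
    """New sentence = question tokens, a separator (assumed), then the core word sequence."""
    return tokens + [sep] + core_word_sequence(tokens, core_words)


# Gloss of the example question, segmented in the original word order.
question = ["morning", "drink", "milk tea", "good"]
new_sentence = build_new_sentence(question, {"milk tea", "morning"})

# Applying the same function to every question of the first set gives the second set.
first_set = [question, ["night", "drink", "milk tea", "good"]]
core_sets = [{"morning", "milk tea"}, {"night", "milk tea"}]
second_set = [build_new_sentence(q, c) for q, c in zip(first_set, core_sets)]
```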

And step S40, calculating the matching candidate question with the highest probability score according to the second set, and outputting answer information corresponding to the matching candidate question.

Step S40 is calculated using a question-answer matching algorithm. In some embodiments, the present application uses an interactive matching model, for example the Enhanced Sequential Inference Model (ESIM), which is an interactive matching model based on an attention mechanism. In the ESIM model architecture shown in fig. 8, the left side is the ESIM model and the right side is a network model that incorporates syntactic parsing information in a tree LSTM (Tree-LSTM). The ESIM model is configured as follows:

(A) The bottom layer is an input encoding (Input Encoding) layer. Each new sentence in the second set is input into the encoding layer, which produces the core word encoding and the question (original sentence) encoding of each new sentence. In the ESIM model, the two questions are each passed through an embedding layer and a BiLSTM (bidirectional LSTM network). If one of the two questions is taken as the premise (Premise) and the other as the hypothesis (Hypothesis), the BiLSTM encodes the premise and the hypothesis respectively, to obtain:

$$\bar{a}_i = \mathrm{BiLSTM}(a, i), \quad \forall i \in [1, \ldots, l_a]$$

$$\bar{b}_j = \mathrm{BiLSTM}(b, j), \quad \forall j \in [1, \ldots, l_b]$$

In the above formulas, $\bar{a}_i$ is the encoded premise question, $\bar{b}_j$ is the encoded hypothesis question, $a$ represents the premise question, $b$ represents the hypothesis question, $i$ is the index of a word in the premise question, $j$ is the index of a word in the hypothesis question, $l_a$ is the number of words in the premise question (equivalent to the sentence length), and $l_b$ is the number of words in the hypothesis question (equivalent to the sentence length). Using the BiLSTM, the relationship between a word in a sentence and its context can be learned; this can also be understood as re-encoding each word vector in the current context to obtain a new embedding vector. In the present application, the premise question is the target question and the hypothesis question is a candidate question.

(B) The local inference layer implements Local Inference Modeling. Before local inference modeling is carried out, the two questions are aligned, and the similarity between the words of the two questions is then calculated. The alignment is realized by an attention mechanism, as follows: the word sequence of the premise (or hypothesis) is treated as a bag-of-words embedding vector, and the "alignment" (or attention) between the questions is calculated so that each word of the premise is semantically aligned with the words of the hypothesis. Based on the core word encodings and the question encodings obtained by the encoding layer, the inter-core-word similarity matrix matrix_key and the inter-question similarity matrix matrix_seq are calculated by dot product. Taking matrix_seq as an example, its element in row $i$ and column $j$ can be calculated by the following formula, where $\bar{a}_i$ and $\bar{b}_j$ are the encoded vectors of the $i$-th word of the premise and the $j$-th word of the hypothesis:

$$\mathrm{matrix\_seq}_{ij} = \bar{a}_i^{\mathsf{T}} \bar{b}_j$$

The inter-core-word similarity matrix matrix_key can be calculated in a similar manner to matrix_seq, using the encoded core word vectors of the target question and of the candidate question, and is not described again here. Both matrix_key and matrix_seq are two-dimensional matrices. For example, if the premise question is "China is very beautiful" and the hypothesis question is "I am Chinese" (segmented as "I / am / China / person"), the resulting matrix_seq is as shown in Table 1:

TABLE 1

	China	very	beautiful
I	0.5	0.2	0.1
am	0.1	0.3	0.1
China	1	0.2	0.4
person	0.4	0.2	0.1
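The dot-product similarity matrix described above can be sketched in plain Python. The encoded vectors below are small made-up stand-ins for BiLSTM outputs, and the function name is hypothetical; the same function would be applied to the core word encodings to obtain matrix_key.

```python
from typing import List

Vector = List[float]


def similarity_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the dot product of the i-th vector of a and the j-th vector of b."""
    return [[sum(x * y for x, y in zip(a_i, b_j)) for b_j in b] for a_i in a]


# Stand-ins for the encoded target question (2 words) and candidate question (3 words).
query_encoded = [[1.0, 0.0], [0.0, 2.0]]
candidate_encoded = [[1.0, 1.0], [0.5, 0.0], [0.0, -1.0]]

matrix_seq = similarity_matrix(query_encoded, candidate_encoded)
# matrix_seq has one row per target-question word and one column per candidate-question word.
```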

Then, local inference begins: using the inter-core-word similarity matrix matrix_key and the inter-question similarity matrix matrix_seq obtained above, the combination of the target question and the combination of the candidate question are calculated respectively.

The combination of the target question = [Query; Query'; Query_keyword'];

The combination of the candidate question = [Candidate; Candidate'; Candidate_keyword'].

Query represents the original sentence of the target question; Query' is the target question represented by the candidate question during interactive processing in the ESIM model; and Query_keyword' is the core words of the target question represented by the core words of the candidate question during interactive processing in the ESIM model. Candidate represents the original sentence of the candidate question; Candidate' is the candidate question represented by the target question during interactive processing in the ESIM model; and Candidate_keyword' is the core words of the candidate question represented by the core words of the target question during interactive processing in the ESIM model.

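Taking the calculation of Query' and Candidate' as an example, this is equivalent to combining the target question and the candidate question so that each generates a similarity-weighted representation of the other while keeping its dimensions unchanged. With $\bar{a}_i$ and $\bar{b}_j$ denoting the encoded words of the target question and of the candidate question, the calculation formulas are as follows:

$$\tilde{a}_i = \sum_{j=1}^{l_b} \frac{\exp(\mathrm{matrix\_seq}_{ij})}{\sum_{k=1}^{l_b} \exp(\mathrm{matrix\_seq}_{ik})} \, \bar{b}_j, \quad \forall i \in [1, \ldots, l_a]$$

$$\tilde{b}_j = \sum_{i=1}^{l_a} \frac{\exp(\mathrm{matrix\_seq}_{ij})}{\sum_{k=1}^{l_a} \exp(\mathrm{matrix\_seq}_{kj})} \, \bar{a}_i, \quad \forall j \in [1, \ldots, l_b]$$

As can be seen from the above formulas, $\tilde{a}_i$ (the $i$-th word of Query') is the result of a weighted summation over $\{\bar{b}_j\}$, where the weights represent the degree of correlation between each word in $\{\bar{b}_j\}$ and $\bar{a}_i$; $\tilde{b}_j$ (the $j$-th word of Candidate') is obtained in the same way from $\{\bar{a}_i\}$. Query_keyword' and Candidate_keyword' are calculated on the same principle as Query' and Candidate', using matrix_key, and are not described again.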

After Query' and Query_keyword' are calculated, they are spliced with Query to obtain the combination of the target question; after Candidate' and Candidate_keyword' are calculated, they are spliced with Candidate to obtain the combination of the candidate question, thereby obtaining the local inference information.
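The local inference step can be sketched as follows, assuming the standard ESIM soft alignment (softmax-normalised dot-product attention). All vectors are made-up stand-ins for BiLSTM encodings and the function names are hypothetical. Splicing along the sequence axis is also an assumption: the embodiment does not state the axis, and the sequence axis is used here because the core word sequence usually has a different length from the question.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per word, one column per feature


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of similarity scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def soft_align(a: Matrix, b: Matrix) -> Matrix:
    """Represent every row of a as an attention-weighted sum of the rows of b."""
    dim = len(b[0])
    aligned = []
    for a_i in a:
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append([sum(w * b_j[k] for w, b_j in zip(weights, b)) for k in range(dim)])
    return aligned


def combine(original: Matrix, aligned: Matrix, keyword_aligned: Matrix) -> Matrix:
    """[X; X'; X_keyword'] spliced along the sequence axis (assumed axis)."""
    return original + aligned + keyword_aligned


query = [[1.0, 0.0], [0.0, 1.0]]                  # encoded target question
candidate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # encoded candidate question
query_key = [[1.0, 0.0]]                          # encoded core words of the target question
candidate_key = [[0.0, 1.0]]                      # encoded core words of the candidate question

query_prime = soft_align(query, candidate)
candidate_prime = soft_align(candidate, query)
query_keyword_prime = soft_align(query_key, candidate_key)
candidate_keyword_prime = soft_align(candidate_key, query_key)

query_combination = combine(query, query_prime, query_keyword_prime)
candidate_combination = combine(candidate, candidate_prime, candidate_keyword_prime)
```

Note that `query_prime` keeps the shape of `query`, which matches the statement that the dimensions remain unchanged.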

(C) The inference composition (Inference Composition) layer calculates a context representation vector of the local inference information from the combination of the target question and the combination of the candidate question. The local inference information is passed through a BiLSTM once more for extraction; when the context representation vector is calculated, both average pooling (Average Pooling) and max pooling (Max Pooling) are applied, and all pooled values are concatenated to form a fixed-length feature vector V, which is input to the prediction (Prediction) layer.
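The pooling step can be sketched as follows. The input matrices stand in for the BiLSTM outputs of the composition layer (the BiLSTM itself is omitted), and the function names are hypothetical. The point of the sketch is that V has a fixed length regardless of how many rows each sequence has.

```python
from typing import List

Matrix = List[List[float]]  # one row per time step, one column per feature


def pool(sequence: Matrix) -> List[float]:
    """Concatenate the column-wise average and the column-wise maximum of a sequence."""
    dim = len(sequence[0])
    average = [sum(row[k] for row in sequence) / len(sequence) for k in range(dim)]
    maximum = [max(row[k] for row in sequence) for k in range(dim)]
    return average + maximum


def feature_vector(query_composed: Matrix, candidate_composed: Matrix) -> List[float]:
    """Fixed-length V: pooled target-question features followed by pooled candidate features."""
    return pool(query_composed) + pool(candidate_composed)


# Stand-ins for the composed sequences; they have different lengths on purpose.
query_composed = [[1.0, 4.0], [3.0, 2.0]]
candidate_composed = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]

V = feature_vector(query_composed, candidate_composed)
```

With a feature dimension of 2, V always has 8 entries here (average and maximum for each of the two sequences).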

(D) The prediction layer predicts and ranks the probability scores of the candidate questions according to the feature vector V. The prediction layer may consist of two fully connected layers, and dropout may be added between the pooling layer and the fully connected layers to prevent overfitting. Of the fully connected layers, the first may use the ReLU activation function and the second uses softmax as the prediction output. Softmax calculates the probability score of each candidate question, which measures the question-answer matching degree of that candidate question, that is, the probability that the candidate question is selected.

After softmax calculates the probability scores of the N candidate questions, they are ranked by probability score, the candidate question with the highest probability score is determined as the matching candidate question, and the answer information corresponding to it is output. In some embodiments, if the user inputs the target question by voice, the answer information may be played as speech through a sound playing device such as a speaker; if the user manually inputs the target question as text, the text of the target question is displayed on the display 275 of the display device, and the matched answer information is also displayed on the display 275. It should be noted that the form in which questions and answers are presented is not limited to that described in this embodiment.
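The final ranking can be sketched as follows. It is assumed here, for illustration only, that the prediction layer emits two logits per (target question, candidate question) pair, ordered as [no match, match], and that the probability score is the softmax probability of the "match" class; the logits and answer strings are made up.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(
    candidates: List[Tuple[str, str, List[float]]]
) -> List[Tuple[float, str, str]]:
    """Return (probability score, question, answer) sorted by descending score."""
    scored = [(softmax(logits)[1], question, answer)
              for question, answer, logits in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical prediction-layer outputs for N = 2 candidate questions.
candidates = [
    ("Is it good to drink milk tea at night", "answer for the night question", [1.5, 0.3]),
    ("Is it good to drink milk tea in the morning", "answer for the morning question", [0.2, 2.1]),
]

ranking = rank_candidates(candidates)
best_score, best_question, best_answer = ranking[0]  # matching candidate question and its answer
```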

The ESIM model described above is configured in the controller. The present application adds attention interaction between core words to the processing logic of the conventional ESIM model; this core-word attention interaction is similar to the interaction between question sentences. For further details of the ESIM model, reference may be made to the related prior art, and they are not repeated in this embodiment.
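The core-word attention interaction can be pictured as running the same attention step twice, once on the question encodings and once on the core-word encodings. The sketch below assumes the dot-product similarity and softmax-weighted alignment of the conventional ESIM formulation; the toy encodings and all function names are hypothetical, and the formulas themselves are not spelled out in this section.

```python
import math
from typing import List

Matrix = List[List[float]]  # one encoding vector per word


def similarity_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Entry [i][j] is the dot product of word i of a and word j of b."""
    return [[sum(x * y for x, y in zip(a_i, b_j)) for b_j in b] for a_i in a]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(a: Matrix, b: Matrix) -> Matrix:
    """Represent each word of a as an attention-weighted sum of the words of b."""
    dim = len(b[0])
    aligned = []
    for row in similarity_matrix(a, b):
        weights = softmax(row)
        aligned.append([sum(w * b_j[d] for w, b_j in zip(weights, b)) for d in range(dim)])
    return aligned


# Toy encodings (stand-ins for BiLSTM outputs).
query_enc = [[1.0, 0.0], [0.0, 1.0]]
candidate_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_kw_enc = [[0.0, 1.0]]
candidate_kw_enc = [[0.0, 1.0], [1.0, 1.0]]

question_sim = similarity_matrix(query_enc, candidate_enc)       # question level
keyword_sim = similarity_matrix(query_kw_enc, candidate_kw_enc)  # core-word level
query_aligned = soft_align(query_enc, candidate_enc)             # target question expressed by the candidate
query_kw_aligned = soft_align(query_kw_enc, candidate_kw_enc)    # its core words expressed by the candidate's core words
```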

As shown in fig. 9, when question answering is performed according to the above technical solution of the present application, the user asks "how to drink milk tea in the morning", and the answer information displayed by the smart device is "the milk tea is drunk in the morning generally to the body"; the user then asks "how to drink milk tea at night", and the answer information displayed by the smart device is "drinking milk tea at night is not good for the body; it easily leads to weight gain and also tends to increase the gastrointestinal burden".

In natural language, the core degree (also called weight) of a word differs from sentence to sentence. Combining the ELMO model with the unsupervised SIF model allows the weight of each word within a sentence to be obtained dynamically, that is, the importance of the word in the complete sentence, and the most important core word or core words of each sentence can then be selected according to these weights. Attention interaction is therefore performed separately on the question sentences and on the core words, which yields two different similarity matrices: one gives the similarity at the question level and the other gives the similarity at the core-word level. This provides richer matching features between sentences, after which processing logic such as alignment, combination and concatenation, and prediction is carried out to obtain a more accurate matching score.
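Core-word selection and the construction of the new sentence can be sketched as follows, using the criterion that a word counts as a core word when the Euclidean distance between its word vector and the sentence vector is below a threshold. The toy vectors stand in for ELMO word vectors and an SIF sentence vector and are not real model outputs; the threshold value and all function names are hypothetical.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def core_words(tokens: List[str], word_vectors: List[List[float]],
               sentence_vector: List[float], threshold: float) -> List[str]:
    """Keep the words whose vector lies close to the sentence vector.

    Iterating in token order preserves the original order of the core words.
    """
    return [
        token
        for token, vector in zip(tokens, word_vectors)
        if euclidean(vector, sentence_vector) < threshold
    ]


def build_new_sentence(tokens: List[str], core: List[str]) -> List[str]:
    """new sentence = question + core word sequence."""
    return tokens + core


# Toy data: one contextual vector per token, one vector for the sentence.
tokens = ["drink", "milk", "tea", "at", "night"]
word_vectors = [[0.0, 0.0], [0.9, 1.0], [1.0, 0.8], [3.0, 3.0], [1.2, 1.0]]
sentence_vector = [1.0, 1.0]

core = core_words(tokens, word_vectors, sentence_vector, threshold=0.5)
new_sentence = build_new_sentence(tokens, core)
print(new_sentence)
```

Because the word vectors are contextual, the same word may be a core word in one question and not in another, which is why the selection is made per sentence rather than from a fixed keyword list.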

According to the above technical solution, the attention-mechanism matching network fused with dynamic core word interaction calculates the similarity of question sentences in semantic space by means of a deep interactive matching network that combines dynamic core words with the attention mechanism. Even as the data and the question-answering domains of the current open-domain question-answering system expand rapidly, the reply accuracy and the generalization capability of the open-domain question-answering system are thereby ensured.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. In a specific implementation, the present invention further provides a computer storage medium storing a program; when the computer storage medium is located in a smart device, the program, when executed, may include all program steps of the question answering processing method that the controller is configured to perform. The computer storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

In this specification, the same and similar parts between the display device embodiment and the method embodiment may be referred to each other, and related contents are not described again.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The specification and examples are to be regarded as illustrative only and are not intended to limit the scope of the present invention, the true scope and spirit of the invention being indicated by the following claims.

Claims

1. A question answering processing method, comprising:

2. The method of claim 1, wherein the new sentences in the second set are represented as:

new sentence = question + core word sequence

wherein the core word sequence is obtained by deleting the non-core words from the question and retaining the original order of the core words in the question.

3. The method of claim 1, further comprising: inputting the second set into an interactive matching model configured to:

acquiring core word codes and question sentence codes of each new sentence in the second set;

calculating a core word similarity matrix and a question similarity matrix according to the core word codes and the question codes;

respectively calculating a combination of the target question and a combination of the candidate question according to the core word similarity matrix and the question similarity matrix;

and predicting and ranking the probability scores of the candidate question sentences according to the combination of the target question and the combination of the candidate question.

4. The method of claim 3,

the combination of the target question sentence = [Query; Query'; Query_keyword'];

the combination of the candidate question sentence = [Candidate; Candidate'; Candidate_keyword'];

wherein Query represents the original sentence of the target question sentence, Query' is the target question sentence as represented by the candidate question sentence during interactive processing in the interactive matching model, and Query_keyword' is the core words of the target question sentence as represented by the core words of the candidate question sentence during interactive processing in the interactive matching model; Candidate represents the original sentence of the candidate question sentence, Candidate' is the candidate question sentence as represented by the target question sentence during interactive processing in the interactive matching model, and Candidate_keyword' is the core words of the candidate question sentence as represented by the core words of the target question sentence during interactive processing in the interactive matching model.

5. The method of claim 1, wherein calculating the core words included in the question sentences of the first set comprises:

calculating word vectors corresponding to words included in the question sentences in the first set by using an ELMO model;

calculating sentence vectors of the question sentences in the first set by using a smooth inverse frequency (SIF) model;

calculating the Euclidean distance between the word vector and the sentence vector;

and when the Euclidean distance is smaller than a threshold value, determining that the words corresponding to the word vector are core words in the question sentence.

6. The method according to claim 1, wherein the obtaining of candidate question sentences similar to the target question sentence and answer information thereof comprises:

retrieving M question-answer information similar to the target question from a database;

ranking the M pieces of question-answer information according to question similarity;

and selecting the N pieces of question-answer information ranked highest in question similarity as the candidate question sentences and answer information thereof.

7. A smart device, comprising:

the controller configured to perform:

8. The smart device of claim 7 wherein the new sentences in the second set are represented as:

new sentence = question + core word sequence

9. The smart device of claim 7, wherein the controller is further configured to perform:

inputting the second set into an interactive matching model configured to:

10. The smart device of claim 9,

wherein Query represents the original sentence of the target question sentence, Query' is the target question sentence as represented by the candidate question sentence during interactive processing in the interactive matching model, and Query_keyword' is the core words of the target question sentence as represented by the core words of the candidate question sentence during interactive processing in the interactive matching model; Candidate represents the original sentence of the candidate question sentence, Candidate' is the candidate question sentence as represented by the target question sentence during interactive processing in the interactive matching model, and Candidate_keyword' is the core words of the candidate question sentence as represented by the core words of the target question sentence during interactive processing in the interactive matching model.