CN111984763B - Question answering processing method and intelligent device


Info

Publication number
CN111984763B
CN111984763B (application CN202010889260.4A)
Authority
CN
China
Prior art keywords
question
candidate
word
core
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010889260.4A
Other languages
Chinese (zh)
Other versions
CN111984763A (en)
Inventor
李俊彦
芮智琦
詹乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology Wuhan Co ltd
Original Assignee
Hisense Electronic Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology Wuhan Co ltd filed Critical Hisense Electronic Technology Wuhan Co ltd
Priority to CN202010889260.4A
Publication of CN111984763A
Application granted
Publication of CN111984763B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis

Abstract

The invention discloses a question answering processing method and an intelligent device. In response to a target question input by a user, a plurality of candidate questions similar to the target question are acquired together with their answer information; the core words contained in each question of a first set are computed, the first set comprising the target question and the candidate questions; each question in the first set is combined with its core words to form a new sentence, yielding a second set; and the matching candidate question with the highest probability score is computed from the second set, and the answer information corresponding to that matching candidate question is output. By performing interactive matching with core words, the invention retains good question-answer matching even when questions differ by only a single word or a few individual words, makes the output answers more accurate, and can also improve the generalization ability of current open-domain question answering systems.

Description

Question answering processing method and intelligent device
Technical Field
The invention relates to the technical field of open-domain question answering, and in particular to a question answering processing method and an intelligent device.
Background
The open-domain question answering system is an important component of intelligent devices such as smart TVs, smart home appliances, and smart speakers. When a user poses a target question through the intelligent device, the open-domain QA system retrieves similar candidate questions and their answers from a large data set, and then provides the closest answer by evaluating the similarity between each candidate question and the target question, thereby realizing interactive question answering between the intelligent device and the user.
The open-domain system can perform question-answer processing with a conventional interactive matching network model, but this approach has a drawback: when two questions differ in only a single word or a few individual words, the question-answer matching effect is poor. As shown in fig. 1, the user first asks "is it good to drink milk tea in the morning", and the open-domain system answers "drinking milk tea in the morning is neutral for the body"; the user then asks "is it good to drink milk tea in the evening". This question differs from the previous one only in the single word "morning" versus "evening", so the output answer may still be the one about drinking milk tea in the morning, producing a wrong answer instead of the answer the user wants.
Disclosure of Invention
To solve the above technical problem, the invention provides a question answering processing method and an intelligent device.
The question answering processing method provided in the first aspect comprises the following steps:
in response to a target question input by a user, acquiring a plurality of candidate questions similar to the target question, together with their answer information;
computing the core words contained in each question of a first set, the first set comprising the target question and the plurality of candidate questions;
forming, for each question in the first set, a new sentence from the question and its core words, to obtain a second set;
and computing, from the second set, the matching candidate question with the highest probability score, and outputting the answer information corresponding to the matching candidate question.
The intelligent device provided in the second aspect includes:
the input interface is used for receiving a target question input by a user and sending the target question to the controller;
the controller is configured to perform:
in response to a target question input by a user, acquiring a plurality of candidate questions similar to the target question, together with their answer information;
computing the core words contained in each question of a first set, the first set comprising the target question and the plurality of candidate questions;
forming, for each question in the first set, a new sentence from the question and its core words, to obtain a second set;
and computing, from the second set, the matching candidate question with the highest probability score, and outputting the answer information corresponding to the matching candidate question.
When the target question input by the user is received, a similarity query is performed on it to acquire candidate questions and their answer information, which yields the first set. Using a core word extraction algorithm, the core words contained in each question of the first set can be computed; core words are the key terms that carry the semantics of a question, and involving them in the computation improves the accuracy of question-answer matching. Each question is then combined with its core words to form a new sentence: for example, for the question "is it good to drink milk tea in the morning" with extracted core words "morning" and "milk tea", the new sentence is the original question followed by its core word sequence. This not only packages the original question together with its core words, but also raises the weight of the core words within the new sentence. After every question in the first set has generated a new sentence in this way, the second set is obtained. From the second set, a question-answer matching algorithm, for example an interactive matching network model, computes the matching candidate question with the highest probability score, so that the answer information output for that matching candidate question matches the target question more closely and the answer is more accurate. By performing interactive matching with core words, the application retains good question-answer matching even when questions differ by only a single word or a few individual words, makes the output answers more accurate, and also helps improve the generalization ability of current open-domain QA systems. A minimal sketch of this pipeline is given below.
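The following Python sketch outlines the four steps as one function. It is only an illustration of the flow described above: the callables retrieve, core_words, and match_score are hypothetical stand-ins for the retrieval, core word extraction, and interactive matching components detailed later, not names from any actual library.

```python
from typing import Callable, Dict, List, Tuple

def answer_question(
    target: str,
    retrieve: Callable[[str], List[Tuple[str, str]]],  # S10: question -> [(candidate, answer)]
    core_words: Callable[[str], List[str]],            # S20: question -> core word sequence
    match_score: Callable[[str, str], float],          # S40: sentence pair -> probability score
) -> str:
    candidates = retrieve(target)                       # S10: similar questions + answers
    first_set = [target] + [q for q, _ in candidates]   # target question + candidate questions
    # S30: new sentence = question + core word sequence (concatenated without
    # spaces, as in the Chinese original).
    second_set: Dict[str, str] = {q: q + "".join(core_words(q)) for q in first_set}
    # S40: the candidate whose new sentence best matches the target's wins.
    best_question, best_answer = max(
        candidates,
        key=lambda qa: match_score(second_set[target], second_set[qa[0]]),
    )
    return best_answer
```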
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
A UI schematic diagram of a conventional interactive matching network model for question-answering processing is exemplarily shown in fig. 1;
a schematic diagram of an operation scenario between the display device 200 and the control apparatus 100 is exemplarily shown in fig. 2;
a hardware configuration block diagram of the display device 200 in fig. 2 is exemplarily shown in fig. 3;
a hardware configuration block diagram of the control apparatus 100 in fig. 2 is exemplarily shown in fig. 4;
a schematic diagram of the software configuration in the display device 200 in fig. 2 is exemplarily shown in fig. 5;
an icon control interface display schematic for an application in display device 200 is shown schematically in fig. 6;
a flowchart of a question-answering processing method is exemplarily shown in fig. 7;
an architectural diagram of the ESIM model is shown schematically in fig. 8;
a UI schematic of a question-answer process of combining dynamic core words with ESIM models is exemplarily shown in fig. 9.
Detailed Description
For the purposes of making the objects, embodiments, and advantages of the present application more apparent, exemplary embodiments of the present application are described below with reference to the accompanying drawings, in which those exemplary embodiments are shown. It should be understood that the exemplary embodiments described are only some, not all, of the embodiments of the application.
Based on the exemplary embodiments described herein, all other embodiments that may be obtained by one of ordinary skill in the art without making any inventive effort are within the scope of the appended claims. Furthermore, while the present disclosure has been described in terms of an exemplary embodiment or embodiments, it should be understood that each aspect of the disclosure can be practiced separately from the other aspects.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second", and the like in the description, the claims, and the drawings above are used to distinguish between similar objects or entities, and do not necessarily describe a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can, for example, be practiced in sequences other than those illustrated or described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The application is mainly directed at application scenarios involving question-answering processing, such as an open-domain QA system built into an intelligent device: the user can input a target question by voice, typing, or other means, and after question-answering processing the answer with the highest degree of matching to the target question is output, realizing intelligent QA interaction between the user and the intelligent device. Such intelligent devices include, but are not limited to, smart TVs, mobile terminals, smart home appliances, intelligent customer service systems, smart speakers, and intelligent robots. An embodiment based on a display device (smart TV) is provided below.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The term "remote control" as used herein refers to a component of an electronic device (such as a display device as disclosed herein) that is typically capable of wirelessly controlling the electronic device over a relatively short distance. Typically, the electronic device is connected with infrared and/or Radio Frequency (RF) signals and/or Bluetooth, and can also comprise functional modules such as WiFi, wireless USB, bluetooth, motion sensors and the like. For example: the hand-held touch remote controller replaces most of the physical built-in hard keys in a general remote control device with a touch screen user interface.
The term "gesture" as used herein refers to a user action by a user through a change in hand shape or hand movement, for expressing an intended idea, action, purpose, or result.
A schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment is exemplarily shown in fig. 2. As shown in fig. 2, a user may operate the display apparatus 200 through the mobile terminal 300 and the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller; communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication, or other short-range communication modes, and the display device 200 is controlled wirelessly or by another wired mode. The user may control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, and so on. For example, the user can input corresponding control instructions through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote controller to control the functions of the display device 200.
In some embodiments, mobile terminals, tablet computers, notebook computers, and other smart devices may also be used to control the display device 200, for example through an application running on the smart device. Through configuration, the application can provide the user with various controls in an intuitive user interface (UI) on a screen associated with the smart device.
In some embodiments, a software application may be installed on both the mobile terminal 300 and the display device 200 to implement connection and communication through a network communication protocol, achieving one-to-one control operation and data communication. For example, a control command protocol can be established between the mobile terminal 300 and the display device 200, the remote-control keyboard can be synchronized onto the mobile terminal 300, and the display device 200 can be controlled through the user interface on the mobile terminal 300. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 for synchronized display.
As also shown in fig. 2, the display device 200 is also in data communication with the server 400 via a variety of communication means. The display device 200 may be permitted to make communication connections via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200. By way of example, display device 200 receives software program updates, or accesses a remotely stored digital media library by sending and receiving information, as well as Electronic Program Guide (EPG) interactions. The server 400 may be a cluster, or may be multiple clusters, and may include one or more types of servers. Other web service content such as video on demand and advertising services are provided through the server 400.
The display device 200 may be a liquid crystal display, an OLED display, a projection display device. The particular display device type, size, resolution, etc. are not limited, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.
The display apparatus 200 may additionally provide a smart network television function of a computer support function, including, but not limited to, a network television, a smart television, an Internet Protocol Television (IPTV), etc., in addition to the broadcast receiving television function.
A hardware configuration block diagram of the display device 200 according to an exemplary embodiment is illustrated in fig. 3.
In some embodiments, at least one of the controller 250, the modem 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240 is included in the display apparatus 200.
In some embodiments, the display 275 is configured to receive image signals from the first processor output, and to display video content and images and components of the menu manipulation interface.
In some embodiments, display 275 includes a display assembly for presenting pictures, and a drive assembly to drive the display of images.
In some embodiments, the video content is displayed from broadcast television content, or alternatively, from various broadcast signals that may be received via a wired or wireless communication protocol. Alternatively, various image contents received from the network server side transmitted from the network communication protocol may be displayed.
In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the display device 200 and used to control the display device 200.
In some embodiments, depending on the type of display 275, a drive assembly for driving the display is also included.
In some embodiments, display 275 is a projection display and may further include a projection device and a projection screen.
In some embodiments, the communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator may include at least one of a WiFi module 221, a Bluetooth module 222, a wired Ethernet module 223, another network communication protocol module or near-field communication protocol module, and an infrared receiver, so that the communicator 220 can receive the control signal of the control device 100 under the control of the controller 250, in the form of a WiFi signal, Bluetooth signal, radio-frequency signal, or other signal type.
In some embodiments, the display apparatus 200 may establish control signal and data signal transmission and reception between the communicator 220 and the external control device 100 or the content providing apparatus.
In some embodiments, the user interface 265 may be used to receive infrared control signals from the control device 100 (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is a component that the display device 200 uses to capture signals from, or interact with, the external environment.
In some embodiments, the detector 230 includes an optical receiver, i.e., a sensor for capturing the intensity of ambient light, so that display parameters can be adapted to the captured ambient light, and so on.
In some embodiments, the detector 230 may further include an image collector, such as a camera, a video camera, etc., which may be used to collect external environmental scenes, collect attributes of a user or interact with a user, adaptively change display parameters, and recognize a user gesture to realize an interaction function with the user.
In some embodiments, the detector 230 may also include a temperature sensor or the like, such as by sensing ambient temperature.
In some embodiments, the display device 200 may adaptively adjust the display color temperature of the image. For example, when the ambient temperature is high, the display device 200 may be adjusted to show the image in a cooler color temperature, and when the temperature is low, in a warmer color tone.
In some embodiments, the detector 230 may also include a sound collector such as a microphone, which may be used to receive the user's voice. Illustratively, it receives a voice signal containing a control instruction with which the user controls the display device 200, or collects ambient sound to recognize the type of environmental scene, so that the display device 200 can adapt to ambient noise.
In some embodiments, as shown in fig. 3, the input/output interface 255 is configured to enable data transfer between the controller 250 and external other devices or other controllers 250. Such as receiving video signal data and audio signal data of an external device, command instruction data, or the like.
In some embodiments, external device interface 240 may include, but is not limited to, the following: any one or more interfaces of a high definition multimedia interface HDMI interface, an analog or data high definition component input interface, a composite video input interface, a USB input interface, an RGB port, and the like can be used. The plurality of interfaces may form a composite input/output interface.
In some embodiments, as shown in fig. 3, the modem 210 is configured to receive the broadcast television signal by a wired or wireless receiving manner, and may perform modulation and demodulation processes such as amplification, mixing, and resonance, and demodulate the audio/video signal from a plurality of wireless or wired broadcast television signals, where the audio/video signal may include a television audio/video signal carried in a television channel frequency selected by a user, and an EPG data signal.
In some embodiments, the frequency point demodulated by the modem 210 is controlled by the controller 250, and the controller 250 may send a control signal according to the user selection, so that the modem responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried by the frequency.
In some embodiments, the broadcast television signal may be classified into a terrestrial broadcast signal, a cable broadcast signal, a satellite broadcast signal, an internet broadcast signal, or the like according to a broadcasting system of the television signal. Or may be differentiated into digital modulation signals, analog modulation signals, etc., depending on the type of modulation. Or it may be classified into digital signals, analog signals, etc. according to the kind of signals.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like. In this way, the set-top box outputs the television audio and video signals modulated and demodulated by the received broadcast television signals to the main body equipment, and the main body equipment receives the audio and video signals through the first input/output interface.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored on the memory. The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user command to select to display a UI object on the display 275, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink or an icon. Operations related to the selected object, such as: displaying an operation of connecting to a hyperlink page, a document, an image, or the like, or executing an operation of a program corresponding to the icon. The user command for selecting the UI object may be an input command through various input means (e.g., mouse, keyboard, touch pad, etc.) connected to the display device 200 or a voice command corresponding to a voice uttered by the user.
As shown in fig. 3, the controller 250 includes at least one of a random access memory 251 (RAM), a read-only memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a graphics processing unit, GPU), a central processing unit 254 (CPU), a communication interface, and a communication bus 256 that connects the respective components.
In some embodiments, RAM 251 is used to store temporary data for the operating system or other on-the-fly programs.
In some embodiments, ROM 252 is used to store instructions for various system boots.
In some embodiments, the ROM 252 is used to store a Basic Input Output System (BIOS), which comprises the drive programs and the boot operating system, and is used to complete the power-on self-test of the system, the initialization of each functional module in the system, and the system's basic input/output functions.
In some embodiments, upon receipt of a power-on signal, the display device 200 starts up: the CPU runs the system boot instructions in the ROM 252 and copies the temporary data of the operating system stored in memory into the RAM 251 in order to boot or run the operating system. After the operating system has started, the CPU copies the temporary data of the various applications in memory to the RAM 251, making it convenient to start or run those applications.
In some embodiments, processor 254 is used to execute operating system and application program instructions stored in memory. And executing various application programs, data and contents according to various interactive instructions received from the outside, so as to finally display and play various audio and video contents.
In some example embodiments, the processor 254 may include a plurality of processors, for example one main processor and one or more sub-processors. The main processor performs some operations of the display device 200 in the pre-power-up mode and/or displays pictures in the normal mode; the one or more sub-processors handle operations in standby and similar modes.
In some embodiments, the graphics processor 253 is configured to generate various graphical objects, such as icons, operation menus, and graphics displayed for user input instructions. It comprises an arithmetic unit, which performs operations on the various interaction instructions input by the user and displays the various objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit for display on the display.
In some embodiments, the video processor 270 is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image composition, to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, video processor 270 includes a demultiplexing module, a video decoding module, an image compositing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is used to demultiplex the input audio/video data stream, e.g., an input MPEG-2 stream, into video signals, audio signals, and the like.
And the video decoding module is used for processing the demultiplexed video signals, including decoding, scaling and the like.
The image synthesis module, e.g., an image synthesizer, superimposes and mixes the GUI signal input by the user or generated by the graphics generator with the scaled video image, to generate an image signal for display.
The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60 Hz frame rate into a 120 Hz or 240 Hz frame rate, commonly by frame insertion.
The display formatting module is used to convert the frame-rate-converted video into a video output signal conforming to the display format, e.g., an RGB data signal.
In some embodiments, the graphics processor 253 may be integrated with the video processor or configured separately. In the integrated configuration, graphics signals output to the display are processed together; in the separate configuration, different functions are performed respectively, for example a GPU + FRC (Frame Rate Conversion) architecture.
In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the audio signal according to a standard codec protocol of an input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing, so as to obtain a sound signal that can be played in a speaker.
In some embodiments, video processor 270 may include one or more chips. The audio processor may also comprise one or more chips.
In some embodiments, video processor 270 and audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.
In some embodiments, the audio output receives, under the control of the controller 250, the sound signal output by the audio processor 280. Besides the speaker 286 carried by the display device 200 itself, it includes an external sound output terminal that can output to a sound-generating device of an external device, such as an external sound interface or earphone interface, and may also include a near-field communication module in the communication interface, e.g., a Bluetooth module for sound output to a Bluetooth speaker.
The power supply 290 supplies power input from an external power source to the display device 200 under the control of the controller 250. The power supply 290 may be a built-in power circuit installed inside the display device 200, or an external power supply, with a power interface provided in the display device 200 for connecting the external power supply.
The user interface 265 is used to receive an input signal from a user and then transmit the received user input signal to the controller 250. The user input signal may be a remote control signal received through an infrared receiver, and various user control signals may be received through a network communication module.
In some embodiments, a user inputs a user command through the control apparatus 100 or the mobile terminal 300; the user input interface passes the input to the controller 250, and the display device 200 then responds to the user input.
In some embodiments, a user may input a user command through a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display of the electronic device, where the control may include a visual interface element such as an icon, button, menu, tab, text box, dialog box, status bar, navigation bar, widget, etc.
The memory 260 includes memory storing various software modules for driving the display device 200. Such as: various software modules stored in the first memory, including: at least one of a base module, a detection module, a communication module, a display control module, a browser module, various service modules, and the like.
The base module is a bottom software module for signal communication between the various hardware in the display device 200 and for sending processing and control signals to the upper modules. The detection module is used for collecting various information from various sensors or user input interfaces and carrying out digital-to-analog conversion and analysis management.
For example, the voice recognition module includes a voice analysis module and a voice instruction database module. The display control module is used for controlling the display to display the image content, and can be used for playing the multimedia image content, the UI interface and other information. And the communication module is used for carrying out control and data communication with external equipment. And the browser module is used for executing data communication between the browsing servers. And the service module is used for providing various services and various application programs. Meanwhile, the memory 260 also stores received external data and user data, images of various items in various user interfaces, visual effect maps of focus objects, and the like.
Fig. 4 exemplarily shows a block diagram of a configuration of the control apparatus 100 in accordance with an exemplary embodiment. As shown in fig. 4, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control apparatus 100 is configured to control the display device 200: it receives the user's input operation instructions and converts them into instructions that the display device 200 can recognize and respond to, acting as an intermediary between the user and the display device 200. For example, when the user operates the channel up/down keys on the control apparatus 100, the display device 200 responds with the channel up/down operation.
In some embodiments, the control apparatus 100 may be a smart device. Such as: the control apparatus 100 may install various applications for controlling the display device 200 according to user's needs.
In some embodiments, as shown in fig. 2, a mobile terminal 300 or other intelligent electronic device may function similarly to the control apparatus 100 after installing an application for manipulating the display device 200. Such as: the user may implement the functions of the physical keys of the control apparatus 100 by installing various function keys or virtual buttons of a graphical user interface available on the mobile terminal 300 or other intelligent electronic device.
The controller 110 includes a processor 112 and RAM 113 and ROM 114, a communication interface 130, and a communication bus. The controller is used to control the operation and operation of the control device 100, as well as the communication collaboration among the internal components and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display device 200. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touchpad 142, a sensor 143, keys 144, and other input interfaces. Such as: the user can implement a user instruction input function through actions such as voice, touch, gesture, press, and the like, and the input interface converts a received analog signal into a digital signal and converts the digital signal into a corresponding instruction signal, and sends the corresponding instruction signal to the display device 200.
The output interface includes an interface that transmits the received user instruction to the display device 200. In some embodiments, it may be an infrared interface or a radio frequency interface. For example, with an infrared signal interface, the user input instruction must be converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, with a radio frequency signal interface, the user input instruction is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then sent to the display device 200 through the radio frequency sending terminal.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input-output interface 140. The control device 100 is provided with a communication interface 130, such as: the WiFi, bluetooth, NFC, etc. modules may send the user input instruction to the display device 200 through a WiFi protocol, or a bluetooth protocol, or an NFC protocol code.
A memory 190 for storing various operation programs, data and applications for driving and controlling the control device 100 under the control of the controller. The memory 190 may store various control signal instructions input by a user.
And a power supply 180 for providing operation power support for each element of the control device 100 under the control of the controller. May be a battery and associated control circuitry.
In some embodiments, the system may include a kernel, a command parser (shell), a file system, and applications. The kernel, shell, and file system together form the basic operating system structure that allows users to manage files, run programs, and use the system. After power-up, the kernel is started, the kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are run and maintained. After the kernel has started, the shell and user applications are then loaded. Once started, an application is compiled into machine code, forming a process.
Referring to fig. 5, in some embodiments the system is divided into four layers, from top to bottom: an application layer (referred to as the "application layer"), an application framework layer (Application Framework layer, referred to as the "framework layer"), an Android Runtime and system library layer (referred to as the "system runtime layer"), and a kernel layer.
In some embodiments, at least one application program is running in the application program layer, and these application programs may be a Window (Window) program of an operating system, a system setting program, a clock program, a camera application, and the like; and may be an application program developed by a third party developer, such as a hi-see program, a K-song program, a magic mirror program, etc. In particular implementations, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which the embodiments of the present application do not limit.
The framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer. The application framework layer includes a number of predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application can, during execution, access the resources in the system and obtain the services of the system.
As shown in fig. 5, in the embodiment of the present application, the application framework layer includes managers (Managers), a Content Provider, a View System, and the like, where the managers include at least one of the following modules: an Activity Manager, used to interact with all activities running in the system; a Location Manager, used to provide system services or applications with access to the system location service; a Package Manager, used to retrieve various information about the application packages currently installed on the device; a Notification Manager, used to control the display and clearing of notification messages; and a Window Manager, used to manage the icons, windows, toolbars, wallpaper, and desktop components on the user interface.
In some embodiments, the activity manager is to: the lifecycle of each application program is managed, as well as the usual navigation rollback functions, such as controlling the exit of the application program (including switching the currently displayed user interface in the display window to the system desktop), opening, backing (including switching the currently displayed user interface in the display window to the previous user interface of the currently displayed user interface), etc.
In some embodiments, the window manager is used to manage all window programs, for example obtaining the display size, determining whether there is a status bar, locking the screen, taking screenshots, and controlling display window changes (e.g., shrinking the display window, or producing shake or distortion effects on the display).
In some embodiments, the system runtime layer provides support for the upper layer, the framework layer, and when the framework layer is accessed, the android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 5, the kernel layer contains at least one of the following drivers: audio drive, display drive, bluetooth drive, camera drive, WIFI drive, USB drive, HDMI drive, sensor drive (e.g., fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and the like.
In some embodiments, the kernel layer further includes a power driver module for power management.
In some embodiments, the software programs and/or modules corresponding to the software architecture in fig. 5 are stored in the first memory or the second memory shown in fig. 3 or fig. 4.
In some embodiments, taking the magic mirror application (a photographing application) as an example: when the remote-control receiving device receives an input operation from the remote control, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including the value of the input operation, its timestamp, etc.) and stores it. The application framework layer obtains the raw input event from the kernel layer, identifies the control corresponding to the event according to the current position of the focus, and treats the input operation as a confirmation operation. The control corresponding to the confirmation operation being the icon of the magic mirror application, the magic mirror application calls the interface of the application framework layer to start itself, and then starts the camera driver by calling the kernel layer, capturing a still image or video through the camera.
In some embodiments, for a display device with a touch function, taking a split screen operation as an example, the display device receives an input operation (such as a split screen operation) acted on a display by a user, and the kernel layer may generate a corresponding input event according to the input operation and report the event to the application framework layer. The window mode (e.g., multi-window mode) and window position and size corresponding to the input operation are set by the activity manager of the application framework layer. And window management of the application framework layer draws a window according to the setting of the activity manager, then the drawn window data is sent to a display driver of the kernel layer, and the display driver displays application interfaces corresponding to the window data in different display areas of the display.
In some embodiments, as shown in fig. 6, the application layer contains at least one icon control that the application can display in the display, such as: a live television application icon control, a Video On Demand (VOD) application icon control, a media center application icon control, an application center icon control, a game application icon control, and the like.
In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may access input provided television signals from a cable television, a wireless broadcast, a satellite service, or other type of live television service. And, the live television application may display video of the live television signal on the display device 200.
In some embodiments, the video on demand application may provide video from different storage sources. Unlike live television applications, video-on-demand provides video displays from some storage sources. For example, video-on-demand may come from the server side of cloud storage, from a local hard disk storage containing stored video programs.
In some embodiments, the media center application may provide various multimedia content playing applications. For example, a media center may be a different service than live television or video on demand, and a user may access various images or audio through a media center application.
In some embodiments, an application center may be provided to store various applications. The application may be a game, an application, or some other application associated with a computer system or other device but which may be run in a smart television. The application center may obtain these applications from different sources, store them in local storage, and then be run on the display device 200.
The above embodiments provide the hardware/software architecture and functional implementation of the smart TV. In some embodiments, the display device should further have an input interface for receiving a target question input by the user and sending it to the controller 250, which executes and controls the question-answering process. The input interface can be a sound collection interface that collects questions posed by the user's voice, in which case the target question is corpus information in voice form; alternatively, the input interface may be the user interface 265, which receives a target question manually input by the user through a remote controller, keyboard, or the like, in which case the target question is embodied as text information. This embodiment does not limit the input method of the target question. Other types of intelligent devices comprise at least an input interface and a controller that executes the question-answering processing, with the open-domain QA system provided in the controller; the other hardware/software structures they include are not limited and depend on the actual application.
Referring to fig. 1, it can be seen that although the existing interactive matching network model can bridge the semantic gap, when two questions are highly similar overall and differ in only a single word or a few individual words, it cannot give an accurately matched answer and easily produces a wrong one. To solve this technical problem, an embodiment of the present application provides a question-answering processing method, shown in fig. 7, executed by the controller 250 (that is, the controller 250 is the execution subject of the method). The method includes:
and step S10, responding to a target question inputted by a user, and acquiring a plurality of candidate questions similar to the target question and answer information thereof.
After the user inputs the target question, candidate questions similar to the target question must be retrieved, the matching candidate question with the highest similarity to the target question selected from among them, and the answer information corresponding to that matching candidate question output, thereby realizing one-question-one-answer between the user and the intelligent device. In some embodiments, the candidate questions and their corresponding answer information may be stored in advance in a designated database, or may be queried through a search engine such as Elasticsearch. The controller 250 generates a query instruction from the target question and sends it to the database or search engine.
In some embodiments, many question-answer pairs with varying degrees of similarity to the target question can be retrieved through the database/search engine. These necessarily include some pairs whose similarity to the target question is relatively low and whose match with it is therefore poor, so further screening is required: if M question-answer pairs similar to the target question are retrieved, they are ranked by question similarity, and the N pairs ranked highest are selected as the candidate questions and their answer information. For example, if the pairs are ranked in descending order of question similarity, the top N pairs are selected; if they are ranked in ascending order, the last N pairs are selected, where M ≥ N. This embodiment filters out low-similarity question-answer pairs through a quantity constraint; in other embodiments, the pairs whose question similarity exceeds a threshold may be selected through a similarity threshold constraint. The screening rule for candidate questions is not limited. A sketch of this retrieve-and-filter step follows.
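A minimal Python sketch of the quantity-constraint screening, assuming the retrieval step (database or Elasticsearch) returns each question-answer pair together with a similarity score; the function and parameter names are illustrative.

```python
from typing import List, Tuple

def top_n_candidates(
    retrieved: List[Tuple[str, str, float]],  # M (question, answer, similarity) triples
    n: int,
) -> List[Tuple[str, str]]:
    # Rank the M retrieved pairs by question similarity, descending, and keep
    # the N highest-ranked ones (M >= N). A threshold constraint would instead
    # keep every pair whose similarity exceeds a chosen threshold.
    ranked = sorted(retrieved, key=lambda t: t[2], reverse=True)
    return [(q, a) for q, a, _ in ranked[:n]]
```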
Step S20: computing the core words contained in each question of a first set, the first set comprising the target question and the plurality of candidate questions.
In the application, the target question originally posed by the user and the acquired candidate questions form the first set; that is, the first set contains N+1 questions, and step S20 must extract the core words of these N+1 questions. Core words are the key terms that carry the semantics of a question and directly affect the accuracy of question-answer matching; for example, in fig. 1 the target question is "is it good to drink milk tea in the morning", whose core words include "morning" and "milk tea". Various core word extraction algorithms may be employed; in some embodiments, the present application uses a core word extraction method based on the ELMO model (Embeddings from Language Models, word vectors obtained from a language model) and the SIF (Smooth Inverse Frequency) model.
The ELMO model is a language model based on multi-layer bidirectional LSTMs (Long Short-Term Memory networks), pre-trained on a large amount of text corpus; the lower LSTM layers represent relatively simple syntactic information, while the upper layers capture semantic information. A sentence is input into the language model, which processes it and outputs context-dependent word vector representations. The ELMO model generally includes an input layer (equivalent to an embedding layer) and bidirectional LSTM layers, used to obtain the different features of words (syntax and semantics) and to resolve polysemy; with ordinary static word vectors, by contrast, words and vectors correspond one-to-one and do not change with part of speech or meaning. For details of the ELMO model, refer to the description in the related art; they are not repeated here.
The SIF model is a weighted bag-of-words model: it considers only the weight of each word, without considering the context between words, as if all words were put into a bag with each word independent, and the weighting coefficient of each word in the bag is computed by the smooth inverse frequency algorithm. Each word vector of a sentence is input into the SIF model, and after the SIF computation a vector representation of the sentence, here called the sentence vector, is obtained. For details of the SIF model, refer to the description in the related prior art; they are not repeated here. A sketch of the SIF computation appears below.
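The following sketch follows the published SIF algorithm (weights a/(a + p(w)) followed by removal of the common component); the parameter names and the default a = 1e-3 are illustrative assumptions, not values stated in this patent, and the static word-vector lookup stands in for the context-dependent ELMO outputs.

```python
import numpy as np
from typing import Dict, List

def sif_sentence_vectors(
    sentences: List[List[str]],        # tokenized questions
    word_vec: Dict[str, np.ndarray],   # word -> vector (stand-in for ELMO outputs)
    word_prob: Dict[str, float],       # unigram probability p(w) from a corpus
    a: float = 1e-3,                   # SIF smoothing parameter (assumed default)
) -> np.ndarray:
    # Weighted average: weight(w) = a / (a + p(w)), so frequent words count less.
    v = np.stack([
        np.mean([a / (a + word_prob.get(w, 0.0)) * word_vec[w] for w in s], axis=0)
        for s in sentences
    ])
    # Remove the projection onto the first principal component (the common
    # discourse component), as prescribed by the SIF algorithm.
    _, _, vt = np.linalg.svd(v, full_matrices=False)
    u = vt[0]
    return v - np.outer(v @ u, u)
```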
The ELMO model is used to compute the word vectors of the words contained in each question of the first set, and the SIF model is used to compute the sentence vector of each question in the first set; the core words of a question are then obtained from the similarity between its word vectors and its sentence vector. In some embodiments, the Euclidean distance between each word vector and the sentence vector of the same question is computed: the smaller the Euclidean distance, the closer the corresponding word is to the core of the question. A threshold constraint can therefore be set: when the Euclidean distance is smaller than the threshold, the corresponding word is determined to be a core word of the question. In this way the core words of the target question and of the candidate questions are computed, and information such as the weight of each core word within its question can also be computed through the SIF model. A sketch of this selection step is shown below.
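A minimal sketch of the distance-threshold selection; the threshold value is an assumption to be tuned, and in practice the word vectors would come from the ELMO model and the sentence vector from the SIF computation above.

```python
import numpy as np
from typing import List

def select_core_words(
    tokens: List[str],
    word_vecs: np.ndarray,   # shape (len(tokens), d): one vector per word
    sent_vec: np.ndarray,    # shape (d,): the question's sentence vector
    threshold: float = 1.0,  # assumed cutoff; tuned in practice
) -> List[str]:
    # A word is a core word when its vector lies within `threshold`
    # Euclidean distance of the sentence vector; original order is kept.
    dists = np.linalg.norm(word_vecs - sent_vec, axis=1)
    return [w for w, d in zip(tokens, dists) if d < threshold]
```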
Step S30: each question in the first set is combined with its core words to form a new sentence, respectively, obtaining a second set.
Wherein new sentence = question + core word sequence; the core word sequence is obtained by deleting the characters of the non-core words from the question while keeping the core words in their original order within the question.
After the core words of each question in the first set are computed, the question and its core words are combined into a new sentence. For example, for the question "is it good to drink milk tea in the morning", suppose the core words extracted in step S20 are "morning", "drink", and "milk tea". The characters of the non-core words are deleted, and the core words keep the order in which they appear in the question, giving the core word sequence "morning drink milk tea"; the new sentence is then "is it good to drink milk tea in the morning + morning drink milk tea". This not only packages the original question together with its core words, but also raises the weight of the core words within the new sentence, improving the accuracy of question-answer matching. Each question in the first set (both the target question and the candidate questions) generates a new sentence in this way, and the second set is thereby obtained.
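A short sketch of this construction, under the assumption that the question has already been segmented into words (the segmentation and token order shown follow the original Chinese word order and are illustrative, as is the "+" separator):

```python
def build_new_sentence(tokens: list[str], core_words: set[str]) -> str:
    # keep only core words, preserving their original order in the question
    core_sequence = " ".join(t for t in tokens if t in core_words)
    question = " ".join(tokens)
    return question + " + " + core_sequence

# segmented question "is it good to drink milk tea in the morning"
new_sentence = build_new_sentence(
    ["morning", "drink", "milk tea", "good or not"],
    {"morning", "drink", "milk tea"},
)
# -> "morning drink milk tea good or not + morning drink milk tea"
```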
Step S40: according to the second set, the matching candidate question with the highest probability score is calculated, and the answer information corresponding to the matching candidate question is output.
Step S40 is computed by means of a question-answer matching algorithm. In some embodiments, the present application uses an interactive matching model such as ESIM (Enhanced Sequential Inference Model), an interactive matching model based on the attention mechanism. In the ESIM model architecture shown in fig. 8, the left side represents the ESIM model and the right side represents a network model that incorporates syntactic parsing information in a Tree-LSTM. The processing flow of the ESIM model is configured as follows:
(A) The bottom layer is the input encoding (Input Encoding) layer: each new sentence in the second set is input into the coding layer, and the coding layer produces the core word encoding and the question (original sentence) encoding of that new sentence. In the ESIM model, two questions are input, each connected to an Embedding layer and a BiLSTM (i.e., a bidirectional LSTM network); one of the two is set as the premise (Premise) and the other as the hypothesis (Hypothesis). Encoding the premise and the hypothesis respectively with BiLSTM yields:

$$\bar{a}_i = \text{BiLSTM}(a, i), \quad \forall i \in [1, \ldots, l_a]$$

$$\bar{b}_j = \text{BiLSTM}(b, j), \quad \forall j \in [1, \ldots, l_b]$$

In the above, $\bar{a}_i$ is the encoded premise question and $\bar{b}_j$ is the encoded hypothesis question; $a$ denotes the premise question, $b$ denotes the hypothesis question, $i$ indexes a word in the premise question, $j$ indexes a word in the hypothesis question, $l_a$ is the number of words in the premise question (equivalent to its sentence length), and $l_b$ is the number of words in the hypothesis question (equivalent to its sentence length). Using BiLSTM, the model learns how to represent a word in a sentence together with its context; this can also be understood as re-encoding the word vector in the current context, obtaining a new embedding vector. In the present application, the premise question is the target question and the hypothesis question is a candidate question.
(B) The local inference layer implements local inference modeling (Local Inference Modeling). Before local inference modeling, the two questions are aligned, and the similarity between the words of the two questions is then computed. The alignment mechanism here is implemented as an attention mechanism, whose process is as follows: the word sequence of the premise (or hypothesis) is treated as a bag of embedding vectors, and the "alignment" between the questions is computed so that each word of the premise is matched against the semantically relevant words of the hypothesis. From the core word encoding and the question encoding obtained by the coding layer, the core inter-word similarity matrix matrix_key and the inter-question similarity matrix matrix_seq are computed by dot product. Taking matrix_seq as an example, its entries can be calculated by the following formula:

$$\text{matrix\_seq}_{ij} = e_{ij} = \bar{a}_i^{\top} \bar{b}_j$$
The core inter-word similarity matrix matrix_key can be calculated in a similar manner, using the encoded core word vectors of the target question and the encoded core word vectors of the candidate question, and is not described in detail here. Both the core inter-word similarity matrix matrix_key and the inter-question similarity matrix matrix_seq are two-dimensional. For example, if the premise is "China is very beautiful" and the hypothesis is "I am Chinese", the resulting matrix_seq is as shown in table 1:
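As a sketch of this dot-product computation, assuming a_bar and b_bar stand in for the BiLSTM encodings of the two questions (the same routine applies to the core word encodings):

```python
import torch

def similarity_matrix(a_bar: torch.Tensor, b_bar: torch.Tensor) -> torch.Tensor:
    # matrix[i, j] = a_bar[i] . b_bar[j]  (dot-product attention scores)
    return a_bar @ b_bar.transpose(0, 1)   # shape (la, lb)

# dummy encodings standing in for the outputs of the coding layer
a_bar, b_bar = torch.randn(4, 256), torch.randn(3, 256)
matrix_seq = similarity_matrix(a_bar, b_bar)   # inter-question similarity
# matrix_key is computed the same way from the core word encodings
```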
TABLE 1

              China    Very    Beautiful
I             0.5      0.2     0.1
Am            0.1      0.3     0.1
China         1        0.2     0.4
Person        0.4      0.2     0.1

(Rows are the words of the hypothesis "I am Chinese"; columns are the words of the premise "China is very beautiful".)
Then local inference begins: using the previously obtained core inter-word similarity matrix matrix_key and inter-question similarity matrix matrix_seq, the combined representations of the target question and the candidate question are calculated respectively.
Combined representation of the target question: [Query; Query'; Query_keyword'];

Combined representation of the candidate question: [Candidate; Candidate'; Candidate_keyword'].
Here Query denotes the original sentence of the target question; Query' is the target question represented through the candidate question during the interactive processing of the ESIM model; and Query_keyword' is the core words of the target question represented through the core words of the candidate question during that interactive processing. Candidate denotes the original sentence of the candidate question; Candidate' is the candidate question represented through the target question during the interactive processing of the ESIM model; and Candidate_keyword' is the core words of the candidate question represented through the core words of the target question during that interactive processing.
Taking the calculation of Query' and Candidate' as an example: this amounts to combining the target question and the candidate question so that each generates a sentence representation of the other weighted by their mutual similarity, with the dimensions kept unchanged. The calculation formulas are as follows:

$$\tilde{a}_i = \sum_{j=1}^{l_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_b} \exp(e_{ik})} \bar{b}_j, \quad \forall i \in [1, \ldots, l_a]$$

$$\tilde{b}_j = \sum_{i=1}^{l_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_a} \exp(e_{kj})} \bar{a}_i, \quad \forall j \in [1, \ldots, l_b]$$

As can be seen from the above, $\tilde{a}_i$ is the result of a weighted summation over the $\bar{b}_j$; that is, $\tilde{a}_i$ represents the degree of correlation between $\bar{a}_i$ and every word of $\bar{b}$. Similarly, following the calculation principle of Query' and Candidate', Query_keyword' and Candidate_keyword' can be calculated from the core word encodings, which this embodiment does not describe in detail.
After Query' and Query_keyword' are calculated, they are combined with Query, so that the combined representation of the target question can be spliced together; after Candidate' and Candidate_keyword' are calculated, they are combined with Candidate, so that the combined representation of the candidate question can likewise be spliced together, thereby obtaining the local inference information.
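The splicing step might be sketched as below, assuming the three parts share the same sequence length (e.g., by encoding over the new sentence); the application does not pin down the splicing axis, and standard ESIM additionally concatenates difference and element-wise product features, whereas only the three components named in the text are shown here.

```python
import torch

def combine(query: torch.Tensor, query_aligned: torch.Tensor,
            query_keyword_aligned: torch.Tensor) -> torch.Tensor:
    # [Query; Query'; Query_keyword'] spliced along the feature dimension
    return torch.cat([query, query_aligned, query_keyword_aligned], dim=-1)

q = torch.randn(4, 256)
combined_query = combine(q, torch.randn(4, 256), torch.randn(4, 256))  # (4, 768)
# the candidate side [Candidate; Candidate'; Candidate_keyword'] is built the same way
```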
(C) The inference composition (Inference Composition) layer computes a context representation vector of the local inference information from the combined representations of the target question and the candidate question. The local inference information can first be passed once more through a BiLSTM for feature extraction; when computing the context representation vector, average pooling (Average Pooling) and max pooling (Max Pooling) can be adopted, and all the pooled quantities are concatenated to finally form a fixed-length feature vector V, which is input into the prediction (Prediction) layer.
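A sketch of the pooling step, assuming composed stands for the BiLSTM output over one combined representation; average and max pooling are concatenated into a vector whose length is independent of the sentence length.

```python
import torch

def pool(composed: torch.Tensor) -> torch.Tensor:
    avg = composed.mean(dim=0)           # average pooling over words
    mx, _ = composed.max(dim=0)          # max pooling over words
    return torch.cat([avg, mx], dim=-1)  # fixed length 2*d

composed = torch.randn(7, 512)
v_part = pool(composed)  # the feature vector V concatenates such parts
                         # for both the target and candidate questions
```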
(D) The prediction layer predicts and ranks the probability scores of the candidate questions according to the feature vector V. The prediction layer may consist of two fully connected layers, and dropout may be added between the pooling layer and the fully connected layers to prevent overfitting. Of the fully connected layers, the first layer may use the ReLU activation function, and the second uses softmax as the prediction output. Softmax automatically computes the probability score of each candidate question; the probability score measures the degree of question-answer matching of that candidate question, i.e., it is equivalent to the probability of that candidate question being selected.
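The prediction head might be sketched as follows; the layer sizes and dropout rate are assumptions, and in a training setup the softmax would typically be deferred to the loss computed over the logits.

```python
import torch
import torch.nn as nn

prediction_head = nn.Sequential(
    nn.Dropout(p=0.5),           # added before the fully connected layers
    nn.Linear(2048, 256),
    nn.ReLU(),                   # first fully connected layer: ReLU
    nn.Linear(256, 2),           # second layer: match / no-match logits
    nn.Softmax(dim=-1),          # probability score per candidate
)

V = torch.randn(1, 2048)          # fixed-length feature vector from step (C)
score = prediction_head(V)[0, 1]  # probability that the pair matches
```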
After the probability scores of the N candidate questions are calculated, they are ranked by probability score, the candidate question with the highest probability score is determined to be the matching candidate question, and the answer information corresponding to that candidate question is output. In some embodiments, if the user inputs the target question by voice, the answer information can be played through a voice playing device such as a speaker; if the user inputs the target question manually as text, i.e., the text of the target question is displayed on the display 275 of the display device, the matched answer information is likewise displayed on the display 275. Note that the question-answer presentation form is not limited to that described in this embodiment.
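A minimal sketch of this final selection, assuming each candidate question has been scored against the target question as above:

```python
def best_answer(candidates: list[str], answers: list[str], scores: list[float]) -> str:
    # index of the candidate question with the highest probability score
    best = max(range(len(scores)), key=lambda k: scores[k])
    return answers[best]  # answer information of the matching candidate question
```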
The ESIM model is configured in the controller. On the basis of the processing logic of the conventional ESIM model, the present application adds attention interaction between core words; this core word attention interaction is analogous to the interaction process between the questions themselves, and the remaining details of the ESIM model, adapted accordingly, may be found in the description of the related prior art, so this embodiment does not repeat them.
As shown in fig. 9, when answer processing is performed according to the technical scheme of the present application, the user asks "is it good to drink milk tea in the morning", and the smart device displays answer information along the lines of "drinking milk tea in the morning is only so-so for the body"; the user then asks "is it good to drink milk tea in the evening", and the smart device displays answer information along the lines of "drinking milk tea in the evening is bad for the body: it easily causes weight gain and increases the gastrointestinal burden". Thus, even when the two target questions differ only in "morning" versus "evening", the scheme can still give accurate answers, improving answer accuracy.
In natural language, the core degree (also called the weight) of a word differs from sentence to sentence. By combining the ELMO model with the unsupervised SIF model, the weight of a word in a sentence is obtained dynamically, which gives the importance of the word within the complete sentence, and the one or more most important core words of each sentence can be obtained from these weights. Performing attention interaction separately on the questions and on their core words therefore yields two different similarity matrices; from these two matrices, the similarity at the question level and at the core word level can be obtained respectively, producing richer matching features between sentences, after which a series of processing steps such as alignment, combination and splicing, and prediction yields a more accurate matching score.
According to the above technical scheme, the attention matching network fused with dynamic core word interaction can, even as current open-domain question-answering systems rapidly expand in data volume and in the range of question domains, compute the similarity of questions in semantic space by combining dynamically extracted core words with a deep interactive matching network based on the attention mechanism, thereby guaranteeing both the accuracy and the generalization capability of the open-domain question-answering system's replies.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. In a specific implementation, the present application also provides a computer storage medium which may store a program; when the storage medium is located in the smart device, the program, when executed, may include all the program steps involved in the question answering processing method configured in the controller. The computer storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
In this description, the same and similar parts between the display device embodiment and the method embodiment may be referred to each other, and the relevant contents are not repeated.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.

Claims (8)

1. A method of processing questions, comprising:
responding to a target question input by a user, and acquiring a plurality of candidate questions similar to the target question and answer information thereof;
calculating core words included in each question in a first set, wherein the first set comprises a target question and a plurality of candidate questions;
respectively forming new sentences by each question sentence and the core words in the first set to obtain a second set;
acquiring a core word code and a question code of each new sentence in the second set;
calculating a core word similarity matrix and a question similarity matrix according to the core word codes and the question codes;
According to the core inter-word similarity matrix and the inter-question similarity matrix, respectively calculating the combined representations of the target question and the candidate question; the combined representation of the target question is expressed as [Query; Query'; Query_keyword'], and the combined representation of the candidate question is expressed as [Candidate; Candidate'; Candidate_keyword'], wherein Query represents the original sentence of the target question, Query' is the target question represented through the candidate question during interactive processing in the interactive matching model, and Query_keyword' is the core words of the target question represented through the core words of the candidate question during interactive processing in the interactive matching model; Candidate represents the original sentence of the candidate question, Candidate' is the candidate question represented through the target question during interactive processing in the interactive matching model, and Candidate_keyword' is the core words of the candidate question represented through the core words of the target question during interactive processing in the interactive matching model;
and predicting the probability score of each candidate question according to the combined representations of the target question and the candidate question, and outputting the answer information corresponding to the matching candidate question with the highest probability score.
2. The method of claim 1, wherein the new sentence in the second set is represented as:
new sentence=question+core word sequence
The core word sequence is obtained by deleting the characters of non-core words in the question and keeping the original sequence of the core words in the question.
3. The method of claim 1, wherein the computing core words included in each question in the first set comprises:
calculating word vectors corresponding to words included in question sentences in the first set by using an ELMO model;
calculating sentence vectors of the questions in the first set by using a smooth inverse frequency (SIF) model;
calculating the Euclidean distance between the word vector and the sentence vector;
and when the Euclidean distance is smaller than a threshold value, determining that the word corresponding to the word vector is a core word in the question.
4. The method according to claim 1, wherein the obtaining a plurality of candidate questions similar to the target question and answer information thereof includes:
retrieving M question-answer information similar to the target question in a database;
sorting the M question-answer information according to the similarity of the questions;
and selecting the N question-answer information entries ranked highest in question similarity as the candidate questions and their answer information.
5. An intelligent device, comprising:
the input interface is used for receiving a target question input by a user and sending the target question to the controller;
the controller is configured to perform:
responding to a target question input by a user, and acquiring a plurality of candidate questions similar to the target question and answer information thereof;
calculating core words included in each question in a first set, wherein the first set comprises a target question and a plurality of candidate questions;
respectively forming new sentences by each question sentence and the core words in the first set to obtain a second set;
acquiring a core word code and a question code of each new sentence in the second set;
calculating a core word similarity matrix and a question similarity matrix according to the core word codes and the question codes;
according to the core inter-word similarity matrix and the inter-question similarity matrix, respectively calculating the combined representations of the target question and the candidate question; the combined representation of the target question is expressed as [Query; Query'; Query_keyword'], and the combined representation of the candidate question is expressed as [Candidate; Candidate'; Candidate_keyword'], wherein Query represents the original sentence of the target question, Query' is the target question represented through the candidate question during interactive processing in the interactive matching model, and Query_keyword' is the core words of the target question represented through the core words of the candidate question during interactive processing in the interactive matching model; Candidate represents the original sentence of the candidate question, Candidate' is the candidate question represented through the target question during interactive processing in the interactive matching model, and Candidate_keyword' is the core words of the candidate question represented through the core words of the target question during interactive processing in the interactive matching model;
And predicting the probability score of each candidate question according to the combined representations of the target question and the candidate question, and outputting the answer information corresponding to the matching candidate question with the highest probability score.
6. The smart device of claim 5, wherein the new statement in the second set is represented as:
new sentence=question+core word sequence
The core word sequence is obtained by deleting the characters of non-core words in the question and keeping the original sequence of the core words in the question.
7. The smart device of claim 5, wherein the controller calculates core words included in each question in the first set, comprising:
calculating word vectors corresponding to words included in question sentences in the first set by using an ELMO model;
calculating sentence vectors of the questions in the first set by using a smooth inverse frequency (SIF) model;
calculating the Euclidean distance between the word vector and the sentence vector;
and when the Euclidean distance is smaller than a threshold value, determining that the word corresponding to the word vector is a core word in the question.
8. The smart device of claim 5, wherein the controller obtains a number of candidate questions and answer information thereof that are similar to the target question, comprising:
Retrieving M question-answer information similar to the target question in a database;
sorting the M question-answer information according to the similarity of the questions;
and selecting the N question-answer information entries ranked highest in question similarity as the candidate questions and their answer information.