CN111858856A - Multi-round search type chatting method and display equipment - Google Patents


Info

Publication number
CN111858856A
CN111858856A (application CN202010717270.XA)
Authority
CN
China
Prior art keywords
sentence
chat
determining
preset
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010717270.XA
Other languages
Chinese (zh)
Inventor
连欢
柳志德
朱飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology Wuhan Co ltd
Original Assignee
Hisense Electronic Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology Wuhan Co ltd filed Critical Hisense Electronic Technology Wuhan Co ltd
Priority to CN202010717270.XA
Publication of CN111858856A
Legal status: Pending

Classifications

    • G06F 16/3329 — Natural language query formulation or dialogue systems
    • G06F 16/3344 — Query execution using natural language analysis
    • G06F 16/35 — Clustering; Classification
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 — Combinations of networks
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 — Learning methods

Abstract

The application discloses a multi-round retrieval chat method and a display device, which can improve the user experience in multi-round chat. The method includes: obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not less than a first preset value, determining the preset topic classification corresponding to the highest confidence as a candidate topic classification; when the highest confidence is less than the first preset value but not less than a second preset value, determining the preset topic classifications whose confidences rank within a preset range as candidate topic classifications; screening candidate replies from a multi-round corpus database according to the candidate topic classifications and a preset personality attribute; selecting one reply from the candidate replies as the reply sentence, using an interactive matching network model, according to the historical chat records and the candidate replies; and, when the highest confidence is less than the second preset value, determining a generic chat sentence as the reply sentence.

Description

Multi-round search type chatting method and display equipment
Technical Field
The application relates to the technical field of intelligent chat, and in particular to a multi-round retrieval chat method and a display device.
Background
Currently, a chat robot can carry out a single round of conversation with a user. For example — User: I hurt my foot playing football earlier; it really hurts. Robot: Then rest well for a couple of days.
However, when the user wishes to hold multiple rounds of conversation with the chat robot, the effect is often poor. For example — User: I hurt my foot playing football earlier; it really hurts. Robot: Then rest well for a couple of days. User: But I still want to exercise. Robot: Go play football. As this dialogue shows, the robot's reply in the second round contradicts the fact, stated by the user in the first round, that the user's foot is injured, which greatly degrades the chat experience. The poor effect of multi-round conversation is therefore a problem that those skilled in the art need to solve.
Disclosure of Invention
The embodiments of the application provide a multi-round retrieval chat method and a display device, which are used to improve the user experience in multi-round chat.
In a first aspect, there is provided a display device comprising:
a display;
a controller configured to perform: obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not less than a first preset value, determining the preset topic classification corresponding to the highest confidence as a candidate topic classification; when the highest confidence is less than the first preset value but not less than a second preset value, determining the preset topic classifications whose confidences rank within a preset range as candidate topic classifications; screening candidate replies from a multi-round corpus database according to the candidate topic classifications and a preset personality attribute; and selecting one reply from the candidate replies as the reply sentence, using an interactive matching network model, according to the historical chat records and the candidate replies;
and, when the highest confidence is less than the second preset value, determining a generic chat sentence as the reply sentence.
In some embodiments, the controller is configured to determine the confidence corresponding to each preset topic classification of the chat sentence as follows: determining, for the chat sentence, the confidence corresponding to each preset topic classification using a convolutional neural network text classification model.
In some embodiments, the determination of the multi-round corpus database includes:
acquiring an original multi-round corpus;
performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences;
splitting the qualified sentences to obtain the multi-round corpora of the multi-round corpus database;
determining the topic classification and personality attribute of each sentence in the multi-round corpora;
and storing the multi-round corpora, together with the topic classification and personality attribute corresponding to each sentence, in the multi-round corpus database.
In some embodiments, performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences includes:
segmenting the sentence to obtain chat word segments;
determining the sentiment words among the chat word segments, and the degree words and/or negation words corresponding to the sentiment words;
determining a score for the sentence using a weighting algorithm, according to the degree words and/or negation words and the sentiment words;
and, if the score reaches a preset score, determining that the sentence is a qualified sentence.
In some embodiments, the controller is further configured to perform: updating the historical chat records with the chat sentence and with the reply sentence selected from the candidate replies.
In some embodiments, the determination of the original multi-round corpus includes:
obtaining a candidate sentence randomly selected by a first chat robot, and determining whether the candidate sentence contains sensitive words and/or negative words;
if the candidate sentence contains sensitive words and/or negative words, repeating the step of obtaining a candidate sentence randomly selected by the first chat robot;
if the candidate sentence contains no sensitive words and/or negative words, determining that the candidate sentence is one sentence of the original multi-round corpus, transmitting the candidate sentence to a second chat robot so that the second chat robot outputs a corresponding reply sentence for the candidate sentence, taking that reply sentence as the new candidate sentence, and repeating the step of determining whether the candidate sentence contains sensitive words and/or negative words.
In a second aspect, a multi-round retrieval chat method is provided, the method comprising:
obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not less than a first preset value, determining the preset topic classification corresponding to the highest confidence as a candidate topic classification; when the highest confidence is less than the first preset value but not less than a second preset value, determining the preset topic classifications whose confidences rank within a preset range as candidate topic classifications; screening candidate replies from a multi-round corpus database according to the candidate topic classifications and a preset personality attribute; selecting one reply from the candidate replies as the reply sentence, using an interactive matching network model, according to the historical chat records and the candidate replies;
and, when the highest confidence is less than the second preset value, determining a generic chat sentence as the reply sentence.
In some embodiments, determining the confidence corresponding to each preset topic classification of the chat sentence includes: determining, for the chat sentence, the confidence corresponding to each preset topic classification using a convolutional neural network text classification model.
In some embodiments, the determination of the multi-round corpus database includes:
acquiring an original multi-round corpus;
performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences;
splitting the qualified sentences to obtain the multi-round corpora of the multi-round corpus database;
determining the topic classification and personality attribute of each sentence in the multi-round corpora;
and storing the multi-round corpora, together with the topic classification and personality attribute corresponding to each sentence, in the multi-round corpus database.
In some embodiments, performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences includes:
segmenting the sentence to obtain chat word segments;
determining the sentiment words among the chat word segments, and the degree words and/or negation words corresponding to the sentiment words;
determining a score for the sentence using a weighting algorithm, according to the degree words and/or negation words and the sentiment words;
and, if the score reaches a preset score, determining that the sentence is a qualified sentence.
In the embodiments, the multi-round retrieval chat method and display device improve the user experience in multi-round chat. The method includes: obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not less than a first preset value, determining the preset topic classification corresponding to the highest confidence as a candidate topic classification; when the highest confidence is less than the first preset value but not less than a second preset value, determining the preset topic classifications whose confidences rank within a preset range as candidate topic classifications; screening candidate replies from a multi-round corpus database according to the candidate topic classifications and a preset personality attribute; selecting one reply from the candidate replies as the reply sentence, using an interactive matching network model, according to the historical chat records and the candidate replies; and, when the highest confidence is less than the second preset value, determining a generic chat sentence as the reply sentence.
Drawings
Fig. 1A is a schematic diagram illustrating an operation scenario between a display device and a control apparatus;
fig. 1B is a block diagram schematically illustrating a configuration of the control apparatus 100 in fig. 1A;
fig. 1C is a block diagram schematically illustrating a configuration of the display device 200 in fig. 1A;
FIG. 1D is a block diagram illustrating an architectural configuration of an operating system in memory of display device 200;
a user interface diagram according to some embodiments is illustrated in fig. 2;
FIG. 3 is a flow chart illustrating a method of multi-round retrieval chat;
FIG. 4 is a flow chart illustrating another multi-round retrieval chat method;
FIG. 5 is a flow chart illustrating yet another method of multi-round retrieval chat;
a flow chart of yet another multi-round retrieval chat method is illustrated in fig. 6.
Detailed Description
To make the purpose, technical solutions and advantages of the exemplary embodiments of the present application clearer, the technical solutions in the exemplary embodiments of the present application will be clearly and completely described below with reference to the drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all embodiments.
All other embodiments, which can be derived by a person skilled in the art from the exemplary embodiments shown in the present application without inventive step, are within the scope of protection of the present application. In addition, while the disclosure herein has been presented in terms of one or more exemplary embodiments, it should be appreciated that individual aspects of the disclosure can be utilized independently and separately from one another in a fully enabling fashion.
As used in this application, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The term "gesture" as used in this application refers to a user behavior that conveys an intended idea, action, purpose, or result through a change in hand shape or a hand movement.
Fig. 1A is a schematic diagram illustrating an operation scenario between the display device 200 and the control apparatus 100. As shown in fig. 1A, the control apparatus 100 and the display device 200 may communicate with each other in a wired or wireless manner.
The control apparatus 100 is configured to control the display device 200: it receives operation instructions input by the user and converts them into instructions that the display device 200 can recognize and respond to, serving as an intermediary between the user and the display device 200. For example, when the user operates the channel up/down keys on the control apparatus 100, the display device 200 responds with the channel up/down operation.
The control device 100 may be a remote controller 100A, which controls the display apparatus 200 through infrared protocol communication, Bluetooth protocol communication, other short-distance communication methods, or a wired connection. The user may input user instructions through keys on the remote controller, voice input, control panel input, etc., to control the display apparatus 200. For example, the user can input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote controller to control the display device 200.
The control device 100 may also be an intelligent device, such as a mobile terminal 100B, a tablet computer, or a notebook computer. For example, the display device 200 is controlled using an application program running on the smart device. Through configuration, the application program may provide the user with various controls via an intuitive user interface (UI) on the smart device's screen.
For example, the mobile terminal 100B may install a software application that connects and communicates with the display device 200 through a network communication protocol, for the purpose of one-to-one control operation and data communication. For instance, the mobile terminal 100B can establish a control instruction protocol with the display device 200, so that operating the various function keys or virtual buttons of the user interface provided on the mobile terminal 100B implements the functions of the physical keys arranged on the remote control 100A. Audio and video content displayed on the mobile terminal 100B may also be transmitted to the display device 200 to implement a synchronized display function.
The display apparatus 200 may be implemented as a television that provides a smart network television function on top of the broadcast-receiving television function, together with computer support functions. Examples of the display device include a digital television, a web television, a smart television, an Internet Protocol television (IPTV), and the like.
The display device 200 may be a liquid crystal display, an organic light emitting display, a projection display device. The specific display device type, size, resolution, etc. are not limited.
The display apparatus 200 also performs data communication with the server 300 through various communication means. The display apparatus 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 300 may provide various contents and interactions to the display apparatus 200. For example, the display device 200 may send and receive information, such as receiving electronic program guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library. The server 300 may be one group or multiple groups of servers, and may be of one or more types. The server 300 also provides other web service contents such as video on demand and advertisement services.
Fig. 1B is a block diagram illustrating the configuration of the control device 100. As shown in fig. 1B, the control device 100 includes a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160.
The controller 110 includes a Random Access Memory (RAM) 111, a Read Only Memory (ROM) 112, a processor 113, a power-on interface, and a communication bus. The controller 110 is used to control the operation of the control device 100, the communication and cooperation among its internal components, and external and internal data processing functions.
Illustratively, when an interaction is detected in which the user presses a key on the remote controller 100A or touches the touch panel on the remote controller 100A, the controller 110 may generate a signal corresponding to the detected interaction and transmit the signal to the display device 200.
The memory 120 stores various operating programs, data, and applications for driving and controlling the control apparatus 100, under the control of the controller 110. The memory 120 may also store various control signal commands input by the user.
The communicator 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. For example, the control apparatus 100 transmits a control signal (e.g., a touch signal or a button signal) to the display device 200 via the communicator 130, and the control apparatus 100 may receive signals transmitted by the display device 200 via the communicator 130. The communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132. For example, when the infrared signal interface is used, the user input instruction is converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the radio frequency signal interface is used, the user input instruction is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then transmitted to the display device 200 through the radio frequency transmitting terminal.
The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, a key 144, and the like, so that a user can input a user instruction regarding controlling the display apparatus 200 to the control apparatus 100 through voice, touch, gesture, press, and the like.
The output interface 150 outputs a user instruction received by the user input interface 140 to the display apparatus 200, or outputs an image or voice signal received by the display apparatus 200. Here, the output interface 150 may include an LED interface 151, a vibration interface 152 generating vibration, a sound output interface 153 outputting sound, a display 154 outputting an image, and the like. For example, the remote controller 100A may receive an output signal such as audio, video, or data from the output interface 150, and display the output signal in the form of an image on the display 154, an audio on the sound output interface 153, or a vibration on the vibration interface 152.
The power supply 160 provides operating power support for the elements of the control device 100 under the control of the controller 110, and may take the form of a battery and associated control circuitry.
A hardware configuration block diagram of the display device 200 is exemplarily illustrated in fig. 1C. As shown in fig. 1C, the display apparatus 200 may further include a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, an audio processor 280, an audio input interface 285, and a power supply 290.
The tuner demodulator 210 receives the broadcast television signal in a wired or wireless manner, may perform modulation and demodulation processing such as amplification, mixing, and resonance, and is configured to demodulate, from a plurality of wireless or wired broadcast television signals, an audio/video signal carried in a frequency of a television channel selected by a user, and additional information (e.g., EPG data).
The tuner demodulator 210 responds to the frequency of the television channel selected by the user and the television signal carried by that frequency, according to the user's selection and under the control of the controller 250.
The tuner demodulator 210 can receive a television signal in various ways according to the broadcasting system of the television signal, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and can demodulate the analog signal and the digital signal according to the different kinds of the received television signals.
In other exemplary embodiments, the tuner demodulator 210 may also be located in an external device, such as an external set-top box. In this case, the set-top box outputs a television signal after modulation and demodulation, which is input into the display apparatus 200 through the external device interface 240.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the display apparatus 200 may transmit content data to an external apparatus connected via the communicator 220, or browse and download content data from an external apparatus connected via the communicator 220. The communicator 220 may include a network communication protocol module or a near field communication protocol module, such as a WIFI module 221, a bluetooth module 222, and a wired ethernet module 223, so that the communicator 220 may receive a control signal of the control device 100 according to the control of the controller 250 and implement the control signal as a WIFI signal, a bluetooth signal, a radio frequency signal, and the like.
The detector 230 is the component the display apparatus 200 uses to collect signals from the external environment or from interaction with the outside. The detector 230 may include an image collector 231, such as a camera or video camera, which may be used to collect the external environment scene so as to adaptively adjust the display parameters of the display device 200, and to capture user attributes or user gestures so as to realize interaction between the display device and the user. A light receiver 232 may also be included to collect the ambient light intensity so that the display parameters of the display device 200 can adapt to changes, and so on.
In some other exemplary embodiments, the detector 230 may further include a temperature sensor, such as by sensing an ambient temperature, and the display device 200 may adaptively adjust a display color temperature of the image. For example, when the temperature is higher, the display apparatus 200 may be adjusted to display a color temperature of an image that is cooler; when the temperature is lower, the display device 200 can be adjusted to display the warm tone of the image.
In some other exemplary embodiments, the detector 230, which may further include a sound collector, such as a microphone, may be configured to receive a sound of a user, such as a voice signal of a control instruction of the user to control the display device 200; alternatively, ambient sounds may be collected that identify the type of ambient scene, enabling the display device 200 to adapt to ambient noise.
The external device interface 240 is a component that enables data transmission between the display apparatus 200 and external devices under the control of the controller 250. The external device interface 240 may be connected with external apparatuses such as a set-top box, a game device, or a notebook computer in a wired/wireless manner, and may receive data from the external apparatus such as a video signal (e.g., moving images), an audio signal (e.g., music), and additional information (e.g., EPG data).
The external device interface 240 may include: a High Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a red, green, and blue (RGB) terminal (not shown), and the like.
The controller 250 controls the operation of the display apparatus 200 and responds to the operation of the user by running various software control programs (e.g., an operating system and various application programs) stored on the memory 260.
As shown in FIG. 1C, the controller 250 includes a Random Access Memory (RAM) 251, a Read Only Memory (ROM) 252, a graphics processor 253, a processor 254, a power-on interface 255, and a communication bus 256. The RAM 251, the ROM 252, the graphics processor 253, the processor 254, and the power-on interface 255 are connected by the communication bus 256.
The ROM252 stores various system boot instructions. When the power-on signal is received, the display apparatus 200 starts to be powered on, and the processor 254 executes the system boot instruction in the ROM252 and copies the operating system stored in the memory 260 to the RAM251 to start running the boot operating system. After the start of the operating system is completed, the processor 254 copies the various applications in the memory 260 to the RAM251 and then starts running the various applications.
The graphics processor 253 generates screen images of various graphic objects, such as icons, images, and operation menus. The graphics processor 253 may include an operator, which performs operations by receiving the various interactive instructions input by the user and then displays the various objects according to their display attributes, and a renderer, which generates the various objects based on the operator and displays the rendered result on the display 275.
The processor 254 executes the operating system and application program instructions stored in the memory 260, and executes the processing of various applications, data, and contents according to received user input instructions, so as to finally display and play various audio and video contents.
In some exemplary embodiments, the processor 254 may include a plurality of processors, for example one main processor and one or more sub-processors. The main processor performs some initialization operations of the display apparatus 200 in the preload mode and/or the operations of displaying the screen in the normal mode. The sub-processor or sub-processors perform operations in the standby mode of the display device and the like.
The power-on interface 255 may include a first interface through an nth interface. These interfaces may be network interfaces connected to external devices via a network.
The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user input command.
Where the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, an operation of displaying a link to a hyperlink page, document, image, or the like, or an operation of executing a program corresponding to an icon. The user input command for selecting the GUI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch panel, etc.) connected to the display apparatus 200 or a voice command corresponding to a user's spoken voice.
The memory 260 stores various types of data, software programs, and applications for driving and controlling the operation of the display device 200. The memory 260 may include volatile and/or nonvolatile memory, and the term "memory" covers the memory 260, the RAM 251 and ROM 252 of the controller 250, and a memory card in the display device 200.
In some embodiments, the memory 260 is specifically used for storing an operation program for driving the controller 250 of the display device 200; storing various application programs built in the display apparatus 200 and downloaded by a user from an external apparatus; data such as visual effect images for configuring various GUIs provided by the display 275, various objects related to the GUIs, and selectors for selecting GUI objects are stored.
In some embodiments, the memory 260 is specifically configured to store drivers and related data for the tuner demodulator 210, the communicator 220, the detector 230, the external device interface 240, the video processor 270, the display 275, the audio processor 280, and the like, external data (e.g., audio-visual data) received from the external device interface, or user data (e.g., key information, voice information, touch information, and the like) received from the user interface.
In some embodiments, memory 260 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. Illustratively, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., the middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.
A block diagram of the architectural configuration of the operating system in the memory of the display device 200 is illustrated in fig. 1D. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.
The application layer: applications built into the system, as well as non-system-level applications, belong to the application layer and are responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a NETFLIX application, a settings application, a media center application, and the like. These applications may be implemented as Web applications that execute on a WebKit engine, and in particular may be developed and executed based on HTML, Cascading Style Sheets (CSS), and JavaScript.
Here, HTML, which is called HyperText Markup Language (HyperText Markup Language), is a standard Markup Language for creating web pages, and describes the web pages by Markup tags, where the HTML tags are used to describe characters, graphics, animation, sound, tables, links, etc., and a browser reads an HTML document, interprets the content of the tags in the document, and displays the content in the form of web pages.
CSS, known as Cascading Style Sheets (Cascading Style Sheets), is a computer language used to represent the Style of HTML documents, and may be used to define Style structures, such as fonts, colors, locations, etc. The CSS style can be directly stored in the HTML webpage or a separate style file to realize the control of the style in the webpage.
JavaScript is a language applied to Web page programming that can be inserted into an HTML page and interpreted and executed by the browser. The interaction logic of a Web application is realized in JavaScript. Through the browser, JavaScript can wrap a JavaScript extension interface to realize communication with the kernel layer.
the middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as multimedia and hypermedia information coding experts group (MHEG) middleware related to data broadcasting, DLNA middleware which is middleware related to communication with an external device, middleware which provides a browser environment in which each application program in the display device operates, and the like.
The kernel layer provides core system services, such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, providing device driver services for the various hardware, such as a display driver for the display, a camera driver for the camera, a key driver for the remote controller, a Wi-Fi driver for the WIFI module, an audio driver for the audio output interface, a power management driver for the power management (PM) module, and the like.
A user interface 265 receives various user interactions. Specifically, it is used to transmit an input signal of a user to the controller 250 or transmit an output signal from the controller 250 to the user. For example, the remote controller 100A may transmit an input signal input by a user, such as a power switch signal, a channel selection signal, a volume adjustment signal, etc., to the user interface 265, and then the input signal is forwarded to the controller 250 through the user interface 265; alternatively, the remote controller 100A may receive an output signal such as audio, video, or data output from the user interface 265 via the controller 250, and display the received output signal or output the received output signal in audio or vibration form.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user interface 265 receives the user input commands through the GUI. Specifically, the user interface 265 may receive user input commands for controlling the position of a selector in the GUI to select different objects or items.
Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user interface 265 receives the user input command by recognizing the sound or gesture through the sensor.
The video processor 270 is configured to receive an external video signal, and perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a video signal that is directly displayed or played on the display 275.
Illustratively, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is configured to demultiplex the input audio/video data stream; for example, for an input MPEG-2 stream (a compression standard for digital storage of moving images and audio), the demultiplexing module demultiplexes it into a video signal and an audio signal.
And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like.
The image synthesis module is used to superimpose and mix the GUI signal, generated by the graphics generator according to user input, with the scaled video image, so as to generate an image signal for display.
The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60 Hz input video to a frame rate of 120 Hz or 240 Hz, commonly implemented using frame interpolation.
The display formatting module converts the signal output by the frame rate conversion module into a signal conforming to the display format of the display, for example converting the output of the frame rate conversion module into RGB data signals.
The display 275 receives the image signal output by the video processor 270 and displays video, images, and the menu manipulation interface. For example, the display may display video from a broadcast signal received by the tuner demodulator 210, video input from the communicator 220 or the external device interface 240, and images stored in the memory 260. The display 275 also displays the user manipulation interface (UI) generated in the display apparatus 200 and used to control the display apparatus 200.
The display 275 may include a display screen assembly for presenting the picture and a driving assembly for driving the display of images. Alternatively, if the display 275 is a projection display, it may include a projection device and a projection screen.
The audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played by the speaker 286.
Illustratively, audio processor 280 may support various audio formats. Such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), high efficiency AAC (HE-AAC), and the like.
The audio output interface 285 receives the audio signal output by the audio processor 280. For example, the audio output interface may output audio from a broadcast signal received via the tuner demodulator 210, audio input via the communicator 220 or the external device interface 240, and audio stored in the memory 260. The audio output interface 285 may include a speaker 286, or an external audio output terminal 287, such as an earphone output terminal, for output to a sound-producing device of an external apparatus.
In other exemplary embodiments, video processor 270 may comprise one or more chips. Audio processor 280 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated with the controller 250 in one or more chips.
The power supply 290 provides power support for the display apparatus 200 from the power input from an external power source, under the control of the controller 250. The power supply 290 may be a built-in power supply circuit installed inside the display apparatus 200, or a power supply installed outside the display apparatus 200.
Currently, a chat robot can carry out a single round of conversation with a user. For example — User: I hurt my foot playing football earlier; it really hurts. Robot: Then rest well for a couple of days.
However, when the user wishes to hold multiple rounds of conversation with the chat robot, the effect is often poor. For example — User: I hurt my foot playing football earlier; it really hurts. Robot: Then rest well for a couple of days. User: But I still want to exercise. Robot: Go play football. As this dialogue shows, the robot's reply in the second round contradicts the fact, stated by the user in the first round, that the user's foot is injured, which greatly degrades the chat experience. The poor effect of multi-round conversation is therefore a problem that those skilled in the art need to solve.
Fig. 2 illustrates a schematic diagram of a GUI provided by the display apparatus 200.
As shown in fig. 2, the display device may provide a GUI 400 to the display. The GUI 400 includes a chat interface between the display device and the user, which contains dialog boxes between the two. Illustratively, dialog boxes 41 and 43 are the user's dialog boxes, and dialog boxes 42 and 44 are the display device's dialog boxes. A user identifier or display device identifier is also displayed at the position corresponding to each dialog box on the interactive interface; the identifiers can be represented by characters, pictures, or other content.
In order to avoid the problem of poor multi-round conversation, an embodiment of the present application provides a multi-round retrieval chat method. As shown in fig. 3, the method includes:
S101, obtaining the chat sentence sent by the user. S102, determining the confidence corresponding to each preset topic classification of the chat sentence. Specifically, the confidence represents the degree to which a preset topic classification corresponds to the chat sentence: the higher the confidence, the higher the probability that the preset topic classification matches the chat sentence, and the lower the confidence, the lower that probability.
In some embodiments, determining the confidence corresponding to each preset topic classification of the chat sentence includes: determining, for the chat sentence, the confidence corresponding to each preset topic classification using a convolutional neural network (CNN) text classification model. Illustratively, the number of preset topic classifications used for training is 40; in this embodiment of the application, the confidences corresponding to the 40 preset topic classifications are determined by the convolutional neural network text classification model. The model is obtained by training on a plurality of sentences and their corresponding preset topic classifications.
The confidences corresponding to the multiple preset topic classifications are then sorted by size; illustratively, if the number of preset topic classifications is 40, the 40 confidences are sorted.
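As a minimal sketch of this classification-and-sorting step, the following PyTorch code (an illustrative assumption, not the patent's implementation; the vocabulary size, layer sizes, and names are placeholders) builds a TextCNN over the 40 preset topic classifications and sorts the resulting confidences:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, num_classes=40):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Parallel convolutions over 2-, 3-, and 4-gram windows.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 64, kernel_size=k) for k in (2, 3, 4)])
        self.fc = nn.Linear(64 * 3, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed, seq)
        # Max-pool each convolution over time, then concatenate.
        feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))    # (batch, num_classes)

model = TextCNN()
logits = model(torch.randint(0, 5000, (1, 12)))    # one tokenized chat sentence
confidences = F.softmax(logits, dim=1).squeeze(0)  # one confidence per topic
ranked = torch.argsort(confidences, descending=True)  # sorted by size (S102)
```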
S103, when the highest confidence is not less than the first preset value, determining the preset topic classification corresponding to the highest confidence as the candidate topic classification. Illustratively, the first preset value may be 0.9; if the preset topic classification corresponding to the highest confidence is the sports topic classification, the sports topic classification is determined as the candidate topic classification.
S104, when the highest confidence is less than the first preset value but not less than the second preset value, determining the preset topic classifications whose confidences rank within the preset range as candidate topic classifications. For example, the first preset value may be 0.9, the second preset value may be 0.5, and the preset range is the top three. If the top three preset topic classifications by confidence are, say, the sports topic classification, the music topic classification, and the exercise topic classification, these three are determined as the candidate topic classifications.
S105, when the highest confidence is less than the second preset value, determining the generic chat sentence as the reply sentence. The generic chat sentence is preset reply content; for example, it may be "I don't seem to have caught that" or "I'm still learning, please say that again", and the like.
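The branching of S103–S105 can be sketched as follows, using the example values above (first preset value 0.9, second preset value 0.5, preset range = top three); the function name and structure are illustrative:

```python
FIRST_PRESET, SECOND_PRESET, TOP_RANK = 0.9, 0.5, 3

def pick_candidate_topics(confidences, topics):
    best = max(confidences)
    if best >= FIRST_PRESET:                # S103: one clear candidate topic
        return [topics[confidences.index(best)]]
    if best >= SECOND_PRESET:               # S104: topics ranked in the top three
        order = sorted(range(len(topics)), key=lambda i: -confidences[i])
        return [topics[i] for i in order[:TOP_RANK]]
    return []                               # S105: fall back to a generic reply
```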
S106, screening candidate replies from the multi-round corpus database according to the candidate topic classifications and the preset personality attribute. The preset personality attribute is preset content; illustratively, it may be a personality attribute such as cute or funny, so as to meet the user's expectations for the reply sentence. The number of candidate replies is a preset number, for example 10. Using the candidate topic classifications and the preset personality attribute, the corresponding 10 sentences are found in the multi-round corpus database by ES retrieval (distributed full-text search) matching and serve as the 10 candidate replies.
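A hedged sketch of this retrieval step, assuming an Elasticsearch index whose name and field names ("multi_round_corpus", "topic", "personality", "reply") are invented for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def retrieve_candidates(chat_sentence, candidate_topics, personality, size=10):
    query = {
        "bool": {
            "must": [{"match": {"reply": chat_sentence}}],  # full-text match
            "filter": [
                {"terms": {"topic": candidate_topics}},     # candidate topics
                {"term": {"personality": personality}},     # preset attribute
            ],
        }
    }
    resp = es.search(index="multi_round_corpus", query=query, size=size)
    return [hit["_source"]["reply"] for hit in resp["hits"]["hits"]]
```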
In some embodiments, the determination process of the multi-round corpus database, as shown in fig. 4, includes:
S601, acquiring the original multi-round corpus. In some embodiments, the original multi-round corpus includes at least one of: a public chat data set, chat data obtained by crawling website content, and multi-round corpora obtained from a dialogue between two chat robots. It should be noted that the two-chat-robot dialogue is added in this embodiment of the application in order to enrich the original multi-round corpus. Of course, the embodiments of the application are not limited to these three methods of obtaining the original multi-round corpus; any method may be used without departing from the purpose of the application.
The process of obtaining multi-round corpora through a dialogue between two chat robots, as shown in fig. 5, includes:
S6011, obtaining a candidate sentence randomly selected by the first chat robot. S6012, determining whether the candidate sentence contains sensitive words and/or negative words. In this embodiment of the application, at least one of the two chat robots needs to support multi-round chat, which ensures that the conversation between the two robots stays coherent.
S6013, if the candidate sentence contains sensitive words and/or negative words, returning to step S6011 to obtain another candidate sentence randomly selected by the first chat robot. Specifically, to ensure that the output reply sentences are positive, when the candidate sentence selected by the chat robot contains sensitive words and/or negative words, the candidate sentence is discarded and a new candidate sentence is selected.
S6013, if the candidate sentence contains no sensitive words and/or negative words, determining that the candidate sentence is one sentence of the original multi-round corpus, transmitting the candidate sentence to the second chat robot so that the second chat robot outputs a corresponding reply sentence, taking that reply sentence as the new candidate sentence, and returning to S6012 to determine whether the candidate sentence contains sensitive words and/or negative words.
It should be noted that if the candidate sentence contains no sensitive words and/or negative words, the chat between the first chat robot and the second chat robot proceeds from that candidate sentence, and each reply sentence of the second chat robot is in turn treated as a candidate sentence for which the sensitive-word and/or negative-word check is repeated. In this way, through multiple rounds of chat, the first and second chat robots turn the chat content between them into original multi-round corpora.
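A sketch of this generation loop, under the assumption that each robot exposes a random_sentence/reply interface and that sensitive-word and negative-word lists are available (all names are placeholders, not the patent's API):

```python
def segment(sentence):
    # Placeholder tokenizer; a real system would use Chinese word segmentation.
    return sentence.split()

def generate_dialogue(robot_a, robot_b, sensitive_words, negative_words,
                      max_turns=6):
    corpus = []
    sentence = robot_a.random_sentence()             # S6011
    while len(corpus) < max_turns:
        if set(segment(sentence)) & (sensitive_words | negative_words):
            if not corpus:                           # discard and reselect
                sentence = robot_a.random_sentence()
                continue
            break    # a failing reply ends the dialogue (an assumption)
        corpus.append(sentence)                      # accepted into the corpus
        sentence = robot_b.reply(sentence)           # pass to the peer robot
        robot_a, robot_b = robot_b, robot_a          # alternate speakers
    return corpus
```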
Because corpora that do not meet the requirements may exist in the original multi-round corpus, the original multi-round corpus is processed. The processing steps include:
S602, performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences.
Performing sentiment analysis on each sentence in the original multi-round corpus to obtain qualified sentences includes the following steps.
Segment the sentence to obtain chat word segments. In this step, each word obtained by segmentation may be compared with a segmentation dictionary; if it matches a word in the dictionary, it is taken as a chat word segment. Punctuation and abnormal symbols are excluded from the chat word segments.
Determine the sentiment words among the chat word segments, and the degree words and/or negation words corresponding to the sentiment words.
Determine the score of the sentence using a weighting algorithm, according to the degree words and/or negation words and the sentiment words.
In the embodiments of the application, degree words may include "very", "extremely", and the like, and negation words may include "not", and the like. The degree words and negation words act as weights when computing the score: degree words contribute positive factors, and negation words negative ones. Sentiment words may themselves be positive or negative; for example, "happy" is positive and "sad" is negative.
If the score reaches the preset score, the sentence is determined to be a qualified sentence. For example, the preset score may be 0, and a sentence whose score is greater than or equal to 0 is determined to be qualified. When judging whether a degree word modifies a given sentiment word, the position information of the sentiment word can be used.
If the score does not reach the preset score, the sentence is determined to be unqualified.
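A sketch of the weighted scoring described above; the word lists, the two-token modifier window, and the weights are illustrative assumptions:

```python
DEGREE = {"very": 2.0, "extremely": 3.0}   # degree words scale the weight
NEGATION = {"not", "no"}                   # negation words flip its sign
SENTIMENT = {"happy": 1.0, "sad": -1.0}    # positive / negative polarity

def sentence_score(chat_segments, preset_score=0.0):
    score = 0.0
    for i, word in enumerate(chat_segments):
        if word not in SENTIMENT:
            continue
        weight = SENTIMENT[word]
        # The sentiment word's position locates its modifiers: scan the
        # tokens just before it for degree and negation words.
        for prev in chat_segments[max(0, i - 2):i]:
            if prev in DEGREE:
                weight *= DEGREE[prev]
            if prev in NEGATION:
                weight = -weight
        score += weight
    return score, score >= preset_score    # (score, qualified?)

# sentence_score(["I", "am", "very", "happy"]) -> (2.0, True)
```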
In addition, sentiment analysis can itself be regarded as a text classification process. Because the BiLSTM (bidirectional long short-term memory) structure is time-consuming, methods such as CuDNN-accelerated LSTM implementations and parameter tuning are adopted to reduce the computation time, speed up reply prediction, and thereby improve the user experience. This likewise yields the qualified sentences.
S603, splitting the qualified sentences to obtain the multi-round corpora of the multi-round corpus database. Illustratively, suppose the qualified sentences comprise the consecutively output qualified sentences A, B, C, D, E, and F. Multiple groups of multi-round corpora can then be obtained: the pairs (A, B), (B, C), (C, D), (D, E), and (E, F); the quadruples (A, B, C, D), (B, C, D, E), and (C, D, E, F); and the full sequence (A, B, C, D, E, F). It should be noted that processing the original multi-round corpora can be completed offline. If the qualified sentences were not output consecutively, that is, the question-and-answer dialogue is interrupted by unqualified sentences, the unqualified sentences between qualified ones can either be revised into qualified sentences before splitting, or the qualified sentences adjoining the unqualified ones can be abandoned before splitting. Illustratively, if the sentences of the original multi-round corpus are r, s, t, u, v, and w, where r, t, u, v, and w are qualified and s is not, one can either discard s (treating r on its own and splitting t, u, v, and w directly), or revise s into a qualified sentence and then split r, s, t, u, v, and w together.
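A sketch of this splitting scheme, reproducing the A–F example (that the windows are the even-length contiguous runs is an inference from that example):

```python
def split_rounds(sentences):
    n = len(sentences)
    corpora = []
    for width in range(2, n + 1, 2):        # window sizes 2, 4, 6, ...
        for start in range(n - width + 1):
            corpora.append(sentences[start:start + width])
    return corpora

# split_rounds(list("ABCDEF")) yields the five pairs (A,B)...(E,F), the three
# quadruples (A..D), (B..E), (C..F), and the full six-sentence dialogue.
```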
S604, determining the topic classification and the character attribute of each sentence in the multi-round corpora. Illustratively, the topic classifications and character attributes of the qualified sentences A, B, C, D, E, and F are determined.
S605, storing the multi-round corpora and the topic classification and character attribute corresponding to each sentence in the multi-round corpus database. Following the above example, the multi-round corpus database includes the qualified sentences A, B, C, D, E, and F and their corresponding topic classifications and character attributes.
S107, screening one reply from the candidate replies as the reply sentence by using an interactive matching network model according to the historical chat records and the candidate replies. The historical chat records are the records of the multiple rounds of conversation between the user and the display device. For example, user: "hello"; display device: "hello"; user: "I want to go out to play". All of the above belong to the historical chat records of the current chat. For the user's "I want to go out to play", a reply sentence needs to be screened out from the candidate replies. To avoid a poor multi-round conversation effect between the user and the display device, the embodiments of the present application use the historical chat records and the candidate replies, and select one reply from the candidate replies through the interactive matching network model as the reply sentence, so that the user's "I want to go out to play" receives an appropriate response. It should be noted that the chat sentences sent by the user are stored in the historical chat records, and the reply sentences screened from the candidate replies are also stored in the historical chat records.
In the embodiment of the application, the interactive matching network model can accurately screen one reply from a plurality of candidate replies to be used as a reply sentence, so that high-quality multi-turn chatting is realized.
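At a high level, this selection step amounts to scoring each candidate against the history and returning the best match. The minimal Python sketch below assumes a hypothetical match_score(history, candidate) function standing in for the trained interactive matching network detailed next.

```python
# A minimal sketch of step S107; match_score is a hypothetical stand-in
# for the trained interactive matching network model described below.

def select_reply(history, candidates, match_score):
    """Score every candidate reply against the multi-round history and
    return the highest-scoring one as the reply sentence."""
    scores = [match_score(history, candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]

history = ["hello", "hello", "i want to go out to play"]
# candidates would typically hold about 10 replies screened in the previous
# step; the chosen reply and the user's sentence are both appended to history.
```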
The principle of the interactive matching network algorithm is as follows, as shown in fig. 6:
1. Word representation layer: for each sentence in the historical chat records and for the user's current chat sentence, general word vectors, word vectors trained on the task-specific training set, and character-vector features are determined and combined as the word-vector representation. This reduces the OOV (out-of-vocabulary, i.e., unregistered word) problem.
The word-vector characterization of the k-th sentence in the historical chat records (the context) can be expressed as follows:

U_k^0 = [u_{k,1}^0, u_{k,2}^0, ..., u_{k,l_{u_k}}^0]

where u_{k,i}^0 denotes the i-th word in the k-th sentence, and l_{u_k} is the number of words in the k-th sentence. Illustratively, for the sentence "accompany me to chat (bar)", u_{k,1}^0 is the word vector of "accompany", u_{k,2}^0 is the word vector of "me", u_{k,3}^0 is the word vector of "chat", and u_{k,4}^0 is the word vector of the final particle "bar".
The candidate replies are characterized as follows:

R^0 = [r_1^0, r_2^0, ..., r_{l_r}^0]

where r_j^0 denotes the j-th word in the reply sentence. Illustratively, if the historical chat records are "good morning" and "what's for breakfast", and a candidate reply is "soybean milk and fried bread sticks", then r_1^0 is the word vector of "soybean milk" and r_2^0 is the word vector of "fried bread sticks".
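A minimal Python (PyTorch) sketch of combining the three feature sources is given below; the embedding dimensions, vocabulary handles, and the max-pooling over characters are assumptions for illustration.

```python
# A minimal sketch of the word representation layer; dimensions are
# illustrative, and vocabularies are presumed to be built in advance.
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    def __init__(self, vocab_size, char_vocab_size):
        super().__init__()
        self.general_emb = nn.Embedding(vocab_size, 200)   # general word vectors
        self.task_emb = nn.Embedding(vocab_size, 100)      # task-specific vectors
        self.char_emb = nn.Embedding(char_vocab_size, 50)  # character features

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars).
        # Pooling character embeddings per word mitigates the OOV problem:
        # an unregistered word still receives a character-level feature.
        char_feat = self.char_emb(char_ids).max(dim=2).values
        return torch.cat(
            [self.general_emb(word_ids), self.task_emb(word_ids), char_feat],
            dim=-1,
        )   # (batch, seq_len, 350)
```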
2. Sentence encoding layer: a bidirectional LSTM is adopted as the basic block to extract the semantic features of each sentence; the outputs of the blocks are weighted and summed as the encoding features, and the final feature representation is obtained based on an attention mechanism. The role of attention is to emphasize the important information in the interaction and reduce the influence of irrelevant information. A new sentence encoding scheme, AHRE (attentive hierarchical recurrent encoding), is adopted. Like ELMo (embeddings from language models), the encoder module synthesizes the internal states of a multi-layer RNN (recurrent neural network): an L-layer Bi-LSTM is used, and the internal states of all layers are combined as the final comprehensive characterization. In the embodiments of the present application, the information of every layer of the L-layer bidirectional LSTM is added, unlike related models that use only the last layer. Illustratively, with first-layer information L1, second-layer information L2, ..., and n-th-layer information Ln, these are weighted and summed, for example with a first-layer weight of 0.13, a second-layer weight of 0.12, ..., and an n-th-layer weight of 0.21, giving 0.13·L1 + 0.12·L2 + ... + 0.21·Ln. These weights are learned during training of the deep learning model, yielding the most suitable weighting. Bidirectional LSTMs (BiLSTMs) serve as the basic units; in the L-layer RNN, the output of the (l-1)-th layer is used as the input of the l-th layer. The calculation is as follows:
U_k^l = BiLSTM(U_k^{l-1}),  l ∈ {1, ..., L}
R^l = BiLSTM(R^{l-1}),  l ∈ {1, ..., L}
u_{k,i}^{enc} = Σ_{l=1}^{L} w_l · u_{k,i}^l
r_j^{enc} = Σ_{l=1}^{L} w_l · r_j^l

where w_l are the softmax-normalized layer weights learned during training. The final comprehensive characterizations are as follows:

U_k^{enc} = [u_{k,1}^{enc}, u_{k,2}^{enc}, ..., u_{k,l_{u_k}}^{enc}]
R^{enc} = [r_1^{enc}, r_2^{enc}, ..., r_{l_r}^{enc}]

where i indexes the words in a sentence, U_k^{enc} is the representation of the k-th contextual dialogue sentence, R^{enc} is the representation of the candidate reply, l_{u_k} denotes the number of words in u_k, and l_r denotes the number of words in R.
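A minimal PyTorch sketch of this ELMo-style layer weighting is shown below; the layer count and hidden size are illustrative assumptions.

```python
# A minimal sketch of the AHRE-style encoder: L stacked BiLSTMs, where
# layer l consumes the output of layer l-1, and the per-layer outputs
# are combined with softmax-normalized learned weights (as in ELMo).
import torch
import torch.nn as nn

class AHREEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = input_dim
        for _ in range(num_layers):
            self.layers.append(
                nn.LSTM(dim, hidden_dim, batch_first=True, bidirectional=True)
            )
            dim = 2 * hidden_dim   # the next layer reads this layer's output
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        outputs = []
        for lstm in self.layers:
            x, _ = lstm(x)
            outputs.append(x)
        # Learned combination, e.g. 0.13*L1 + 0.12*L2 + ... + 0.21*Ln.
        w = torch.softmax(self.layer_weights, dim=0)
        return sum(w[l] * out for l, out in enumerate(outputs))
```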
3. Matching layer: the influence of the historical chat records on the choice of the reply sentence is taken into account by matching the historical chat records, treated as a single sequence, against the candidate reply. First, the final feature representations from the semantic encoding are spliced at word granularity, and a cross-attention operation with the reply sentence yields the local correlation features between the historical chat records and the candidate reply; difference and dot-product operations are then performed to obtain the enhanced, augmented features of the historical-chat-record and reply-sentence pair.
1) The comprehensive characterizations of the n sentences in the historical chat records are concatenated. The representation of the context is

C^{enc} = [c_i^{enc}]_{i=1}^{l_c} = U_1^{enc} ⊕ U_2^{enc} ⊕ ... ⊕ U_n^{enc}

where l_c = Σ_{k=1}^{n} l_{u_k} is the total number of words in the dialogue collection, and ⊕ denotes concatenation. Illustratively, splicing [1 0] with [1 2] yields [1 0 1 2].
2) Attention-based alignment is used to gather information between the two sequences, i.e., the relevance of each pair of representation tuples (c_i^{enc}, r_j^{enc}) is computed as follows:

e_{ij} = (c_i^{enc})^T · r_j^{enc}

Furthermore, the calculated attention weights e_{ij} capture the two-way correlation between the context and the response.
The attention calculation thus measures the relevance between the historical chat records and the candidate replies, and related words in the historical chat records and the candidate replies are given higher weights. For example, between "today / weather / good / you / like / what / sports" and "yes / I / most / like / badminton", "badminton" is a related word and deserves particular attention.
3) The sentences and the candidate replies interact through the matching-degree information, reconstruct each other's features, and thereby fuse each other's matching information. For a word in the context, its reply-aware representation is computed from the reply sentence and the attention weights e_{ij}:

c̄_i^{enc} = Σ_{j=1}^{l_r} ( exp(e_{ij}) / Σ_{k=1}^{l_r} exp(e_{ik}) ) · r_j^{enc},  i ∈ {1, ..., l_c}

where c̄_i^{enc} is a weighted sum of {r_j^{enc}}. Intuitively, the contents of R^{enc} that are related to c_i^{enc} are selected to form c̄_i^{enc}. Every word in the response (reply sentence) is computed in the same way:

r̄_j^{enc} = Σ_{i=1}^{l_c} ( exp(e_{ij}) / Σ_{k=1}^{l_c} exp(e_{kj}) ) · c_i^{enc},  j ∈ {1, ..., l_r}
4) The matching information is fused. To strengthen the obtained information, difference and element-wise product calculations are performed separately on C^{enc} and C̄^{enc} (and likewise on R^{enc} and R̄^{enc}), and the resulting difference and element product are concatenated with the original vectors to obtain the enhanced representations, as follows:

C^{mat} = [C^{enc}; C̄^{enc}; C^{enc} - C̄^{enc}; C^{enc} ⊙ C̄^{enc}]
R^{mat} = [R^{enc}; R̄^{enc}; R^{enc} - R̄^{enc}; R^{enc} ⊙ R̄^{enc}]

The formula fuses the pre-interaction representation C^{enc}, the post-interaction representation C̄^{enc}, their differences (C^{enc} - C̄^{enc}), and their commonalities (C^{enc} ⊙ C̄^{enc}).
5) The comprehensive matching information is split back into per-sentence granularity: C^{mat} is separated into the per-sentence representations [U_1^{mat}, U_2^{mat}, ..., U_n^{mat}], so that the matching layer finally outputs the granularity of each sentence.
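The cross-attention and fusion steps of the matching layer can be condensed into a few tensor operations. The minimal PyTorch sketch below follows the formulas above; batched tensors and shapes are illustrative assumptions.

```python
# A minimal sketch of the matching layer: cross-attention between the
# concatenated context C^enc and the reply R^enc, then the enhancement
# [x; x_bar; x - x_bar; x * x_bar].
import torch

def matching_layer(c_enc, r_enc):
    # c_enc: (batch, l_c, d); r_enc: (batch, l_r, d)
    e = torch.bmm(c_enc, r_enc.transpose(1, 2))              # e_ij = c_i . r_j
    c_bar = torch.softmax(e, dim=2) @ r_enc                  # reply-aware context
    r_bar = torch.softmax(e, dim=1).transpose(1, 2) @ c_enc  # context-aware reply
    c_mat = torch.cat([c_enc, c_bar, c_enc - c_bar, c_enc * c_bar], dim=-1)
    r_mat = torch.cat([r_enc, r_bar, r_enc - r_bar, r_enc * r_bar], dim=-1)
    return c_mat, r_mat   # C^mat is subsequently split back per sentence
```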
4. Aggregation layer: the features of the reply sentence are extracted from the augmented matrix as the final reply feature representation r; the augmented context matrix is passed through a bidirectional LSTM (long short-term memory) and pooling operations to obtain the representation c; and r and c are combined to obtain the final matching feature vector m.
1) The information after matching is further captured by a BiLSTM (bidirectional long short-term memory network):

u_{k,i} = BiLSTM(U_k^{mat}, i),  i ∈ {1, ..., l_{u_k}}
r_j = BiLSTM(R^{mat}, j),  j ∈ {1, ..., l_r}
2) Combined characterization: max pooling over the words is concatenated with the last hidden state:

u_k^{agr} = [max_i u_{k,i}; u_{k,l_{u_k}}]
r^{agr} = [max_j r_j; r_{l_r}]
3) A BiLSTM is applied again, this time over the sequence of sentence embeddings:

c_k = BiLSTM(U^{agr}, k),  k ∈ {1, ..., n}
4) The context information is combined with the response again:

c^{agr} = [max_k c_k; c_n]
m = [c^{agr}; r^{agr}]
The temporal relations are captured through the LSTM, which further extracts more meaningful, higher-level information. Each pass through the LSTM obtains richer high-level information from the hidden states, which is then further enriched, in preparation for solving the binary classification problem.
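A minimal PyTorch sketch of the pooling-and-combination steps is given below. For brevity it pools the sentence embeddings directly instead of running the second BiLSTM that produces c_k in the full model; that shortcut and the input shapes are illustrative assumptions.

```python
# A minimal sketch of the aggregation pooling (max pooling plus last
# hidden state, per the formulas above); the second BiLSTM is elided.
import torch

def aggregate(u_tilde_list, r_tilde):
    # u_tilde_list: n tensors of shape (l_uk, d); r_tilde: (l_r, d)
    u_agr = torch.stack(
        [torch.cat([u.max(dim=0).values, u[-1]]) for u in u_tilde_list]
    )                                                            # (n, 2d)
    r_agr = torch.cat([r_tilde.max(dim=0).values, r_tilde[-1]])  # (2d,)
    # In the full model u_agr would pass through another BiLSTM to give c_k;
    # here u_agr is pooled directly as a stand-in for that step.
    c_agr = torch.cat([u_agr.max(dim=0).values, u_agr[-1]])      # (4d,)
    return torch.cat([c_agr, r_agr])             # matching feature vector m
```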
5. Prediction layer: the combined information is input into a classifier (not limited to a multi-layer perceptron, a CNN, or the like), which outputs a score s representing the matching degree. That is, the question of whether a candidate reply matches the historical chat records is converted into a binary classification problem with a score ranging between 0 and 1: a score of 0 represents a mismatch and a score of 1 represents a perfect match, so the higher the score, the higher the matching degree between the candidate reply and the historical chat records. In the training stage, whether s is consistent with the annotated label (matched or not matched) is evaluated by supervised learning; in the chat application stage, the candidate reply corresponding to the highest score s_i (where i indexes the candidate replies) is output as the reply sentence. Generally, about 10 candidate replies work well, i.e., the reply sentence that best matches the historical chat records is selected from 10 replies. By using the interactive matching network model, the embodiments of the present application perform deep two-way matching between the candidate replies and the historical chat records, which guarantees the quality of the reply sentence and produces a better multi-round chat effect.
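The prediction layer itself is small; the sketch below assumes a multi-layer perceptron classifier and an illustrative input dimension, trained with matched/unmatched labels as described above.

```python
# A minimal sketch of the prediction layer: an MLP maps the matching
# vector m to a score s in (0, 1); at application time the candidate
# with the highest s_i is returned. The dimension 1024 is an assumption.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(1024, 256),
    nn.Tanh(),
    nn.Linear(256, 1),
    nn.Sigmoid(),   # 0 = mismatch, 1 = perfect match
)

def predict_reply(m_vectors, candidates):
    """m_vectors: one matching feature vector per candidate reply."""
    scores = torch.cat([classifier(m) for m in m_vectors])
    return candidates[int(scores.argmax())]   # best of the ~10 candidates
```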
In the embodiments described above, the multi-round search type chatting method and the display device improve the user experience of multi-round chatting. The method comprises: obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not smaller than a first preset value, determining the preset topic classification corresponding to the highest confidence as an alternative topic classification; when the highest confidence is smaller than the first preset value and not smaller than a second preset value, determining the preset topic classifications whose confidences rank within a preset order as alternative topic classifications; screening candidate replies from the multi-round corpus database according to the alternative topic classifications and the preset character attribute; screening, according to the historical chat records and the candidate replies, one reply from the candidate replies as the reply sentence by using the interactive matching network model; and when the highest confidence is smaller than the second preset value, determining a universal chat sentence as the reply sentence.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, comprising:
a display;
a controller configured to perform: obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not smaller than a first preset value, determining the preset topic classification corresponding to the highest confidence as an alternative topic classification; when the highest confidence is smaller than the first preset value and not smaller than a second preset value, determining the preset topic classifications whose confidences rank within a preset order as alternative topic classifications; screening candidate replies from a multi-round corpus database according to the alternative topic classification and a preset character attribute; and screening, according to historical chat records and the candidate replies, one reply from the candidate replies as a reply sentence by using an interactive matching network model;
and when the highest confidence is smaller than the second preset value, determining a universal chat sentence as the reply sentence.
2. The display device of claim 1, wherein the controller is configured to determine the confidence corresponding to each preset topic classification of the chat sentence according to the following step: determining, by using a convolutional neural network text classification model, the confidence of the chat sentence corresponding to each preset topic classification.
3. The display device according to claim 1, wherein the determining process of the multi-round corpus database comprises:
acquiring original multi-round corpora;
performing emotion analysis on each sentence in the original multi-round corpora to obtain qualified sentences;
splitting the qualified sentences to obtain the multi-round corpora in the multi-round corpus database;
determining the topic classification and the character attribute of each sentence in the multi-round corpora; and
storing the multi-round corpora and the topic classification and character attribute corresponding to each sentence in the multi-round corpus database.
4. The display device according to claim 3, wherein performing emotion analysis on each sentence in the original multi-round corpora to obtain the qualified sentences comprises:
performing word segmentation on the sentence to obtain chat word segments;
determining the emotion words in the chat word segments and the degree words and/or negation words corresponding to the emotion words; and
determining a score corresponding to the sentence by using a weighting algorithm according to the degree words and/or negation words and the emotion words, and if the score reaches a preset score, determining that the sentence is a qualified sentence.
5. The display device according to claim 1, wherein the controller is further configured to perform: updating the historical chat records by using the chat sentences and the reply sentences screened from the candidate replies.
6. The display device according to claim 3, wherein the determining process of the original multi-round corpora comprises:
obtaining a candidate sentence randomly selected by a first chat robot, and determining whether the candidate sentence has sensitive words and/or negative words;
if the candidate sentence has sensitive words and/or negative words, repeating the step of obtaining a candidate sentence randomly selected by the first chat robot; and
if the candidate sentence has no sensitive words and/or negative words, determining that the candidate sentence is one sentence in the original multi-round corpora, transmitting the candidate sentence to a second chat robot so that the second chat robot outputs a corresponding reply sentence according to the candidate sentence, taking the reply sentence as the candidate sentence, and repeating the step of determining whether the candidate sentence has sensitive words and/or negative words.
7. A method for multi-round retrieval chat, the method comprising:
obtaining a chat sentence sent by a user, and determining the confidence corresponding to each preset topic classification for the chat sentence; when the highest confidence is not smaller than a first preset value, determining the preset topic classification corresponding to the highest confidence as an alternative topic classification; when the highest confidence is smaller than the first preset value and not smaller than a second preset value, determining the preset topic classifications whose confidences rank within a preset order as alternative topic classifications; screening candidate replies from a multi-round corpus database according to the alternative topic classification and a preset character attribute; and screening, according to historical chat records and the candidate replies, one reply from the candidate replies as a reply sentence by using an interactive matching network model;
and when the highest confidence is smaller than the second preset value, determining a universal chat sentence as the reply sentence.
8. The method of claim 7, wherein determining the confidence corresponding to each preset topic classification of the chat sentence comprises: determining, by using a convolutional neural network text classification model, the confidence of the chat sentence corresponding to each preset topic classification.
9. The method according to claim 7, wherein the determining process of the multi-round corpus database comprises:
acquiring original multi-round corpora;
performing emotion analysis on each sentence in the original multi-round corpora to obtain qualified sentences;
splitting the qualified sentences to obtain the multi-round corpora in the multi-round corpus database;
determining the topic classification and the character attribute of each sentence in the multi-round corpora; and
storing the multi-round corpora and the topic classification and character attribute corresponding to each sentence in the multi-round corpus database.
10. The method of claim 9, wherein performing emotion analysis on each sentence in the original multi-round corpora to obtain the qualified sentences comprises:
performing word segmentation on the sentence to obtain chat word segments;
determining the emotion words in the chat word segments and the degree words and/or negation words corresponding to the emotion words; and
determining a score corresponding to the sentence by using a weighting algorithm according to the degree words and/or negation words and the emotion words, and if the score reaches a preset score, determining that the sentence is a qualified sentence.