CN112883144A - Information interaction method

Info

Publication number
CN112883144A
Authority
CN
China
Prior art keywords
reply text
candidate reply
video
candidate
score
Prior art date
Legal status
Pending
Application number
CN201911205823.7A
Other languages
Chinese (zh)
Inventor
连欢
Current Assignee
Hisense Electronic Technology Wuhan Co ltd
Original Assignee
Hisense Electronic Technology Wuhan Co ltd
Priority date
Filing date
Publication date
Application filed by Hisense Electronic Technology Wuhan Co ltd
Priority to CN201911205823.7A
Publication of CN112883144A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying
    • G06F 16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867 Retrieval characterised by using manually generated metadata, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application discloses an information interaction method, which includes the following steps: receiving a query instruction from a smart television, and acquiring attribute information of a target audio/video according to a keyword carried in the query instruction; generating, based on a reply text generation model and using the attribute information of the target audio/video, at least two candidate reply texts containing evaluation information; determining a score for each candidate reply text according to the evaluation information it contains and the scores corresponding to the evaluation words; selecting one reply text from the candidate reply texts according to their scores; and sending the selected reply text so that the user receives and is shown a reply containing evaluation information. The method can improve the user experience.

Description

Information interaction method
Technical Field
The present application relates to communications technologies, and in particular, to an information interaction method.
Background
Smart devices (such as smart televisions, smart speakers, and smartphones) can interact with users and serve user requests in a way that meets their needs, so they are being used ever more widely.
Taking a smart television as an example, a user may issue a spoken query requesting playback of a movie (for example, "I want to watch ABC", where "ABC" stands for a movie name). The smart television recognizes and parses the speech, runs a query using the parsed movie name as a keyword, displays the playable resources found for the movie on its interface, and replies by voice (for example, "Found videos about ABC for you, please select one to play").
Currently, when a smart device replies to a user's query request, the reply content is produced from a manually preset template. In the example above, the preset template might be "Found videos about {movie name} for you, please select one to play", where the movie name inside { } changes with the query result and every other part is fixed. Replies produced from such manually preset templates are therefore uniform and stiff in expression, cannot create positive emotional resonance with the user, and degrade the user experience.
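For illustration only, the snippet below sketches this fixed-template behavior; the template string and function name are assumptions, not taken from the patent.

```python
# Minimal sketch of the fixed-template reply described above.
# The template text and field name are illustrative assumptions.
REPLY_TEMPLATE = "Found videos about {movie_name} for you, please select one to play"

def templated_reply(movie_name: str) -> str:
    # Only the movie name changes; every other word is fixed,
    # which is why such replies always sound identical and stiff.
    return REPLY_TEMPLATE.format(movie_name=movie_name)

print(templated_reply("ABC"))
```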
Therefore, a language generation method that improves how users feel is needed to optimize the information interaction process.
Disclosure of Invention
The application provides an information interaction method that, when responding to a user's query request for audio/video content during an interaction, builds a matching response around evaluation words, thereby improving the user experience.
An information interaction method provided by an embodiment of the application includes:
receiving a query instruction from a smart television, and acquiring attribute information of a target audio/video according to a keyword carried in the query instruction;
generating, based on a reply text generation model and using the attribute information of the target audio/video, at least two candidate reply texts containing evaluation information, where the reply text generation model generates reply texts from a reply text generation template obtained through training; because the audio/video attribute information used to train the model includes evaluation information, the trained reply text generation template also contains evaluation information;
determining a score for each candidate reply text according to the evaluation information in the candidate reply text and the scores corresponding to the evaluation words;

selecting one reply text from the candidate reply texts according to their scores;

and sending the selected reply text so that the smart television displays it and/or broadcasts it by voice.
Optionally, the evaluation information includes emotion words expressing a subjective feeling about the target audio/video. Determining the score of each candidate reply text according to its evaluation information and the corresponding scores then specifically includes: looking up the score corresponding to each emotion word in the candidate reply text; and, for each candidate reply text, accumulating the scores of its emotion words to obtain the score of that candidate reply text.
Optionally, the evaluation information further includes negative words. In that case, accumulating the scores of the emotion words in a candidate reply text specifically includes: determining which emotion words in the candidate reply text are modified by a negative word; multiplying the score of each such emotion word by the coefficient corresponding to the negative word to obtain that emotion word's score; and accumulating the scores of the emotion words to obtain the score of the candidate reply text.

Optionally, the evaluation information further includes degree words. In that case, accumulating the scores of the emotion words in a candidate reply text specifically includes: determining which emotion words in the candidate reply text are modified by a degree word; multiplying the score of each such emotion word by the coefficient corresponding to the degree word's tone grade to obtain that emotion word's score; and accumulating the scores of the emotion words to obtain the score of the candidate reply text.
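As a concrete illustration of the scoring described in the three paragraphs above, the sketch below accumulates emotion-word scores and applies negation and degree coefficients to modified emotion words. All word lists, scores, and coefficients are invented for illustration; the patent does not publish them, and it treats modification more generally than the adjacent-token heuristic used here.

```python
# Illustrative candidate-reply scoring; every score and coefficient
# below is an assumed value, not taken from the patent.
EMOTION_SCORES = {"wonderful": 2.0, "touching": 2.0, "boring": -2.0}
NEGATION_COEF = {"not": -1.0}               # negation flips an emotion word's score
DEGREE_COEF = {"very": 1.5, "super": 2.0}   # stronger tone grade, larger coefficient

def score_reply(tokens: list[str]) -> float:
    """Sum the scores of all emotion words; an emotion word's base score
    is multiplied by the coefficient of the negation or degree word that
    immediately precedes (modifies) it."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in EMOTION_SCORES:
            continue
        score = EMOTION_SCORES[tok]
        prev = tokens[i - 1] if i > 0 else None
        if prev in NEGATION_COEF:
            score *= NEGATION_COEF[prev]
        elif prev in DEGREE_COEF:
            score *= DEGREE_COEF[prev]
        total += score
    return total

def pick_reply(candidates: list[list[str]]) -> list[str]:
    # Select the candidate with the highest accumulated score,
    # i.e. the reply expressing the most positive evaluation.
    return max(candidates, key=score_reply)

candidates = [
    ["this", "movie", "is", "very", "wonderful"],  # 2.0 * 1.5 = 3.0
    ["this", "movie", "is", "not", "boring"],      # -2.0 * -1.0 = 2.0
]
print(pick_reply(candidates))  # the "very wonderful" candidate wins
```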
An embodiment of the present application further provides a computer storage medium storing computer program instructions that, when run on a computer, cause the computer to execute the above information interaction method.
An embodiment of the present application provides an information interaction device, including: a user input interface configured to receive input from a user; an audio output configured to output an audio signal; and a controller coupled to the user input interface and the audio output, the controller being configured to perform the above information interaction method.
In the embodiments of the application, when an audio/video query request from the user is answered, on the one hand at least two candidate reply texts containing evaluation information are generated by the reply text generation model from the attribute information of the target audio/video, so the generated replies are more vivid and richer in content than those produced by a fixed reply template; on the other hand, each candidate reply text is scored according to its evaluation information and the corresponding scores, and one reply text is selected by score, so the selected reply reflects a positive evaluation of the target audio/video, can resonate with the user, and improves how the user feels.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for that description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment;
fig. 2 is a block diagram exemplarily showing a hardware configuration of a display device 200 according to an embodiment;
fig. 3 is a block diagram exemplarily showing a hardware configuration of the control apparatus 100 according to the embodiment;
fig. 4 is a diagram exemplarily showing a functional configuration of the display device 200 according to the embodiment;
fig. 5a schematically shows a software configuration in the display device 200 according to an embodiment;
fig. 5b schematically shows a configuration of an application in the display device 200 according to an embodiment;
fig. 6 is a schematic diagram illustrating a user interface in the display device 200 according to an embodiment;
fig. 7 is a diagram exemplarily illustrating a flow of generating an evaluation word dictionary in the display device 200 according to the embodiment;
fig. 8 schematically shows a flow of training a reply text generation model in the display device 200 according to an embodiment;
fig. 9 schematically shows a flow chart of an interaction method of the display device 200 according to an embodiment;
fig. 10 is a schematic diagram illustrating a user interface of the display device 200 after responding to a user audio-video query request according to an embodiment.
Detailed Description
The embodiments of the application provide a method by which an information interaction device carries out information interaction: in a human-machine interaction scenario, when replying to a user's audio/video query request, the device can provide reply text carrying positive emotion, improving the user experience.
For example, the information interaction device in the embodiments of the present application may be a display device with a voice interaction function, such as a smart television, which can respond to a user's audio/video query request, display the resource information of the queried target audio/video on its display, and broadcast the reply text by voice. The information interaction device may also be a playback device with a voice interaction function, such as a smart speaker, which can respond to a user's audio query request and broadcast the reply text by voice.
To make the objects, technical solutions, and advantages of the exemplary embodiments of the present application clearer, the technical solutions in those embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only a part, not all, of the embodiments of the present application.

All other embodiments obtained by a person skilled in the art from the exemplary embodiments shown in the present application without inventive effort fall within the protection scope of the present application. Moreover, while the disclosure is presented through one or more exemplary examples, it should be understood that each aspect of the disclosed solutions can also be practiced separately from the others.
It should be understood that the terms "first", "second", "third", and the like in the description, the claims, and the drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so that the embodiments described here can, for example, be implemented in orders other than those illustrated or described.
Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The term "remote control" as used in this application refers to a component of an electronic device, such as the display device disclosed in this application, that is typically wirelessly controllable over a short range of distances. Typically using infrared and/or Radio Frequency (RF) signals and/or bluetooth to connect with the electronic device, and may also include WiFi, wireless USB, bluetooth, motion sensor, etc. For example: the hand-held touch remote controller replaces most of the physical built-in hard keys in the common remote control device with the user interface in the touch screen.
The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, or result.
Embodiments of the present application are described in detail below with reference to the accompanying drawings. In some embodiments, the information interaction device is described using a display device as an example.
Fig. 1 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100.
The control apparatus 100 may control the display device 200 wirelessly or by wire, for example through a remote controller using infrared protocol communication, Bluetooth protocol communication, or other short-range communication. The user may control the display device 200 by entering user commands through keys on the remote controller, voice input, control panel input, and so on; for example, the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote controller can all issue corresponding control commands to the display device 200.
In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device. The application, through configuration, may provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.
For example, the mobile terminal 300 may install a software application paired with the display device 200 and communicate with it through a network communication protocol, achieving one-to-one control operation and data communication. For instance, a control instruction protocol can be established between the mobile terminal 300 and the display device 200 so that the remote-control keyboard is synchronized onto the mobile terminal 300 and the display device 200 is controlled through the user interface on the mobile terminal 300; the audio/video content shown on the mobile terminal 300 can also be transmitted to the display device 200 for synchronized display.

As also shown in fig. 1, the display device 200 performs data communication with the server 400 through various communication means; it may connect through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various content and interactions to the display device 200. Illustratively, by sending and receiving information, the display device 200 receives software program updates, performs Electronic Program Guide (EPG) interactions, or accesses a remotely stored digital media library. The server 400 may be one group or several groups of servers, and of one or more types; it also provides other network services such as video on demand and advertising.
The display device 200 may be a smart television or a smart speaker. The specific smart product type, device model, etc. are not limited, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.
The display device 200 may additionally provide a smart network TV function offering computer support in addition to the broadcast-receiving TV function, for example as a web TV, a smart TV, or an Internet Protocol TV (IPTV).
A hardware configuration block diagram of a display device 200 according to an exemplary embodiment is shown in fig. 2. As shown in fig. 2, the display device 200 includes a controller 210, a tuner demodulator 220, a communication interface 230, a detector 240, an input/output interface 250, a video processor 260-1, an audio processor 260-2, a display 280, an audio output 270, a memory 290, a power supply, and an infrared receiver.
The display 280 receives image signals from the video processor 260-1 and displays video content, images, and the menu manipulation interface. The display 280 includes a display screen assembly for presenting pictures and a driving assembly for driving image display. The displayed video content may come from broadcast television content or from broadcast signals received via wired or wireless communication protocols; alternatively, various kinds of image content received over network communication protocols from a network server can be displayed.
Meanwhile, the display 280 displays a user manipulation UI interface generated in the display apparatus 200 and used to control the display apparatus 200.
The driving assembly drives the display according to the type of the display 280. If the display 280 is a projection display, it may also include a projection device and a projection screen.
The communication interface 230 is a component for communicating with an external device or an external server according to various communication protocol types. For example: the communication interface 230 may be a Wifi chip 231, a bluetooth communication protocol chip 232, a wired ethernet communication protocol chip 233, or other network communication protocol chips or near field communication protocol chips, and an infrared receiver (not shown).
Through the communication interface 230, the display device 200 may establish transmission and reception of control signals and data signals with an external control apparatus or content providing apparatus. The infrared receiver is an interface device for receiving infrared control signals from the control apparatus 100 (e.g., an infrared remote controller).
The detector 240 is the component the display device 200 uses to collect signals from the external environment or from interaction with the outside. The detector 240 includes a light receiver 242, a sensor that collects the intensity of ambient light so that display parameters can be adapted to ambient-light changes.

The detector 240 also includes an image collector 241, such as a camera, which may be used to collect the external environment scene, collect user attributes, or recognize user gestures, so as to adaptively change display parameters and implement interaction with the user.
In some other exemplary embodiments, the detector 240 may include a temperature sensor: by sensing the ambient temperature, the display device 200 can adaptively adjust the display color temperature of images, for example toward a cooler tone in a high-temperature environment and toward a warmer tone in a low-temperature environment.

In other exemplary embodiments, the detector 240 may include a sound collector, such as a microphone, which may be used to receive the user's voice (including voice signals carrying the user's control commands for the display device 200) or to collect ambient sounds for identifying the type of the surrounding scene, so that the display device 200 can adapt to ambient noise.
The input/output interface 250, under the control of the controller 210, transfers data between the display device 200 and other external devices, for example receiving video and audio signals or command instructions from an external device.
Input/output interface 250 may include, but is not limited to, the following: any one or more of high definition multimedia interface HDMI interface 251, analog or data high definition component input interface 253, composite video input interface 252, USB input interface 254, RGB ports (not shown in the figures), etc.
In some other exemplary embodiments, the input/output interface 250 may also form a composite input/output interface with the above-mentioned plurality of interfaces.
The tuning demodulator 220 receives the broadcast television signals in a wired or wireless receiving manner, may perform modulation and demodulation processing such as amplification, frequency mixing, resonance, and the like, and demodulates the television audio/video signals carried in the television channel frequency selected by the user and the EPG data signals from the plurality of wireless or wired broadcast television signals.
The tuner demodulator 220, as controlled by the controller 210, responds to the television signal frequency selected by the user and the television signal carried on that frequency.
The tuner-demodulator 220 may receive signals in various ways according to the broadcasting system of the television signal, such as: terrestrial broadcast, cable broadcast, satellite broadcast, or internet broadcast signals, etc.; and according to different modulation types, the modulation mode can be digital modulation or analog modulation. Depending on the type of television signal received, both analog and digital signals are possible.
In other exemplary embodiments, the tuner/demodulator 220 may be in an external device, such as an external set-top box. In this way, the set-top box outputs television audio/video signals after modulation and demodulation, and the television audio/video signals are input into the display device 200 through the input/output interface 250.
The video processor 260-1 is configured to receive an external video signal, and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the display device 200.
Illustratively, the video processor 260-1 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module demultiplexes the input audio/video data stream; for example, an input MPEG-2 stream is demultiplexed into a video signal and an audio signal.
And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like.
The image synthesis module, through a graphics generator, superimposes and mixes the GUI signal (input by the user or generated internally) with the scaled video image to generate an image signal for display.

The frame rate conversion module converts the frame rate of the input video, for example from 60 Hz to 120 Hz or 240 Hz, usually by frame interpolation.

The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, for example an RGB data signal.
The audio processor 260-2 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, amplification processing, and the like to obtain an audio signal that can be played in the speaker.
In other exemplary embodiments, video processor 260-1 may comprise one or more chips. The audio processor 260-2 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 260-1 and the audio processor 260-2 may be separate chips or may be integrated together with the controller 210 in one or more chips.
The audio output 270, under the control of the controller 210, receives the sound signal output by the audio processor 260-2. Besides the speaker 272 carried by the display device 200 itself, it includes an external sound output terminal 274 that can output to a sound-producing device of an external apparatus, such as an external sound interface or an earphone interface.

The power supply, under the control of the controller 210, provides the display device 200 with power from an external power source; it may be a built-in power supply circuit installed inside the display device 200 or a power interface connecting an external power supply to the display device 200.

The user input interface receives a user's input signal and forwards it to the controller 210. The user input signal may be a remote controller signal received through the infrared receiver, and various other user control signals may be received through the network communication module.

For example, the user inputs a user command through the remote controller 100 or the mobile terminal 300; the user input interface passes the input to the controller 210, and the display device 200 responds to it.
In some embodiments, a user may enter a user command on a Graphical User Interface (GUI) displayed on the display 280, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
The controller 210 controls the operation of the display apparatus 200 and responds to the user's operation through various software control programs stored in the memory 290.
As shown in fig. 2, the controller 210 includes a RAM 213 and a ROM 214, a graphics processor 216, a CPU processor 212, a communication interface 218 (a first interface 218-1 through an nth interface 218-n), and a communication bus. The RAM 213, the ROM 214, the graphics processor 216, the CPU processor 212, and the communication interface 218 are connected via the bus.

The ROM 214 stores instructions for various system boots. When the power-on signal is received and the display device 200 starts up, the CPU processor 212 executes the system boot instructions in the ROM and copies the operating system stored in the memory 290 into the RAM 213 to start running it. After the operating system has started, the CPU processor 212 copies the various applications in the memory 290 into the RAM 213 and then launches them.
The graphics processor 216 generates various graphics objects, such as icons, operation menus, and graphics displaying user-input instructions. It includes an arithmetic unit, which performs operations on the interactive instructions received from the user and derives the display attributes of the various objects, and a renderer, which generates the objects produced by the arithmetic unit and displays the rendered result on the display 280.
A CPU processor 212 for executing operating system and application program instructions stored in memory 290. And executing various application programs, data and contents according to various interactive instructions received from the outside so as to finally display and play various audio and video contents.
In some exemplary embodiments, the CPU processor 212 may include a plurality of processors, for example one main processor and one or more sub-processors. The main processor performs some operations of the display device 200 in the pre-power-up mode and/or displays the screen in normal mode; the sub-processors handle operations in standby mode and the like.
The controller 210 may control the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object displayed on the display 280, the controller 210 may perform the operation related to the object selected by that command.
Wherein the object may be any one of selectable objects, such as a hyperlink or an icon. Operations related to the selected object, such as: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon. The user command for selecting the UI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the display apparatus 200 or a voice command corresponding to a voice spoken by the user.
The memory 290 includes a memory for storing various software modules for driving the display device 200. Such as: various software modules stored in memory 290, including: the system comprises a basic module, a detection module, a communication module, a display control module, a browser module, various service modules and the like.
The basic module is a bottom-layer software module that handles signal communication among the hardware components of the display device 200 and sends processing and control signals to upper-layer modules. The detection module collects various kinds of information from sensors or the user input interface and performs digital-to-analog conversion, analysis, and management.
For example: the voice recognition module comprises a voice analysis module and a voice instruction database module. The display control module is a module for controlling the display 280 to display image content, and may be used to play information such as multimedia image content and UI interface. And the communication module is used for carrying out control and data communication with external equipment. And the browser module is used for executing a module for data communication between browsing servers. And the service module is used for providing various services and modules including various application programs.
Meanwhile, the memory 290 is also used to store visual effect maps and the like for receiving external data and user data, images of respective items in various user interfaces, and a focus object.
A block diagram of the configuration of the control apparatus 100 according to an exemplary embodiment is exemplarily shown in fig. 3. As shown in fig. 3, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory 190, and a power supply 180.
The control device 100 is configured to control the display device 200 and may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200. Such as: the user responds to the channel up and down operation by operating the channel up and down keys on the control device 100.
In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications that control the display apparatus 200 according to user demands.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or another intelligent electronic device can take on a role similar to that of the control device 100 after installing an application that manipulates the display device 200; for example, the user can reproduce the functions of the physical keys of the control device 100 through function keys or virtual buttons of a graphical user interface available on the mobile terminal 300 or the other device.

The controller 110 includes a processor 112, a RAM 113 and a ROM 114, a communication interface, and a communication bus. The controller 110 controls the running of the control device 100, the communication and coordination among its internal components, and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip, a bluetooth module, an NFC module, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. Such as: the user can realize a user instruction input function through actions such as voice, touch, gesture, pressing, and the like, and the input interface converts the received analog signal into a digital signal and converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the display device 200.
The output interface includes an interface that transmits the received user instruction to the display device 200. In some embodiments, it may be an infrared interface or a radio frequency interface. For example, when the infrared signal interface is used, a user input instruction is converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared sending module. For another example, when the radio frequency signal interface is used, a user input instruction is converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then sent to the display device 200 through the radio frequency transmitting terminal.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an output interface. The control device 100 is provided with a communication interface 130, such as: the WiFi, bluetooth, NFC, etc. modules may transmit the user input command to the display device 200 through the WiFi protocol, or the bluetooth protocol, or the NFC protocol code.
The memory 190 stores, under the control of the controller 110, the various operating programs, data, and applications that drive and control the control device 100. The memory 190 may store the various control signal commands input by the user.

The power supply 180 provides operating power support for the components of the control device 100 under the control of the controller 110, for example a battery with its associated control circuitry.
Fig. 4 is a diagram schematically illustrating a functional configuration of the display device 200 according to an exemplary embodiment. As shown in fig. 4, the memory 290 is used to store an operating system, an application program, contents, user data, and the like, and performs system operations for driving the display device 200 and various operations in response to a user under the control of the controller 210. The memory 290 may include volatile and/or nonvolatile memory.
The memory 290 is specifically configured to store an operating program for driving the controller 210 in the display device 200, and to store various application programs installed in the display device 200, various application programs downloaded by a user from an external device, various graphical user interfaces related to the applications, various objects related to the graphical user interfaces, user data information, and internal data of various supported applications. The memory 290 is used to store system software such as an OS kernel, middleware, and applications, and to store input video data and audio data, and other user data.
Memory 290 is specifically configured to store the drivers and associated data for the video processor 260-1 and audio processor 260-2, the display 280, the communication interface 230, the tuner demodulator 220, the detector 240, the input/output interfaces, and the like.
In some embodiments, memory 290 may store software and/or programs, software programs for representing an Operating System (OS) including, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. For example, the kernel may control or manage system resources, or functions implemented by other programs (e.g., the middleware, APIs, or applications), and the kernel may provide interfaces to allow the middleware and APIs, or applications, to access the controller to implement controlling or managing system resources.
The memory 290, for example, includes a broadcast receiving module 2901, a channel control module 2902, a volume control module 2903, an image control module 2904, a display control module 2905, an audio control module 2906, an external instruction recognition module 2907, a communication control module 2908, a light receiving module 2909, a power control module 2910, an operating system 2911, and other application programs 2912, a browser module, and the like. The controller 210 performs functions such as: a broadcast television signal reception demodulation function, a television channel selection control function, a volume selection control function, an image control function, a display control function, an audio control function, an external instruction recognition function, a communication control function, an optical signal reception function, an electric power control function, a software control platform supporting various functions, a browser function, and the like.
A block diagram of a configuration of a software system in a display device 200 according to an exemplary embodiment is exemplarily shown in fig. 5 a.
As shown in fig. 5a, the operating system 2911 includes operating software that handles various basic system services and performs hardware-related tasks, acting as an intermediary for data processing between application programs and hardware components. In some embodiments, parts of the operating system kernel may comprise a series of software components for managing the display device's hardware resources and providing services to other programs or software code.
In other embodiments, portions of the operating system kernel may include one or more device drivers, which may be a set of software code in the operating system that assists in operating or controlling the devices or hardware associated with the display device. The drivers may contain code that operates the video, audio, and/or other multimedia components. Examples include a display screen, a camera, Flash, WiFi, and audio drivers.
The accessibility module 2911-1 is configured to modify or access the application program to achieve accessibility and operability of the application program for displaying content.
A communication module 2911-2 for connection to other peripherals via associated communication interfaces and a communication network.
The user interface module 2911-3 is configured to provide an object for displaying a user interface, so that each application program can access the object, and user operability can be achieved.
Control applications 2911-4 for controllable process management, including runtime applications and the like.
The event transmission system 2914 may be implemented within the operating system 2911 or within the application program 2912; in some embodiments it is implemented partly in each. It listens for various user input events and, according to the recognition result of each type of event or sub-event, invokes the handlers that perform the corresponding predefined set or sets of operations.
The event monitoring module 2914-1 is configured to monitor an event or a sub-event input by the user input interface.
The event recognition module 2914-2 holds the definitions of the various types of events for the various user input interfaces, recognizes incoming events or sub-events, and dispatches them to the processes that execute the corresponding one or more sets of handlers.

An event or sub-event refers to an input detected by one or more sensors in the display device 200 or an input from an external control device (e.g., the control apparatus 100), such as various sub-events of voice input, gesture input recognized through gesture recognition, or key-command input from the remote control. Illustratively, the sub-events from the remote control take a variety of forms, including but not limited to pressing the up/down/left/right keys or the OK key, pressing and holding keys, and non-physical key operations such as move, hold, and release.

The interface layout manager 2913 receives, directly or indirectly, the user input events or sub-events monitored by the event transmission system 2914 and updates the layout of the user interface accordingly, including but not limited to the position of each control or child control in the interface and the size, position, and level of containers, along with other operations related to the interface layout.
As shown in fig. 5b, the application layer 2912 contains various applications that may also be executed at the display device 200. The application may include, but is not limited to, one or more applications such as: live television applications, video-on-demand applications, media center applications, application centers, gaming applications, and the like.
The live television application program can provide live television through different signal sources. For example, a live television application may provide television signals using input from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.
A video-on-demand application may provide video from different storage sources. Unlike live television applications, video on demand provides a video display from some storage source. For example, the video on demand may come from a server side of the cloud storage, from a local hard disk storage containing stored video programs.
The media center application can provide various applications for playing multimedia content. For example, a media center, distinct from live television or video on demand, may provide services through which a user can access various image or audio content via the media center application.

The application center can provide and store various applications. These may be games, utilities, or other applications that are associated with a computer system or other device but can run on the smart television. The application center may obtain them from different sources, store them in local storage, and make them runnable on the display device 200.
A schematic diagram of a user interface in a display device 200 according to an exemplary embodiment is illustrated in fig. 6. As shown in FIG. 6, the user interface includes a plurality of view display areas, illustratively a first view display area 201 and a second view display area 202, each of which includes a layout of one or more different items. And a selector in the user interface indicating that any one of the items is selected, the position of the selector being movable by user input to change the selection of a different item.
It should be noted that the boundaries between the plurality of view display areas may be visible or invisible. For example, different view display areas can be distinguished by different background colors, by visible marks such as boundary lines, or by invisible boundaries. A boundary may even be absent altogether, with only the related items in a certain area of the screen sharing the same change properties in size and/or arrangement, that area then being regarded as delimited by the same view partition; for example, the items in the view display area 201 are zoomed in or out simultaneously, while the view display area 202 changes differently.

In some embodiments, the first view display area 201 is a scalable view display. "Scalable" may mean that the first view display area 201 can be scaled in size or proportion on the screen, or that the items in the first view display area 201 can be scaled in size or proportion on the screen. The first view display area 201 may also be a scrolling view display area, in which the set of items shown on the screen can be scrolled and updated by user input.
The "item" refers to a visual object displayed in each view display area of the user interface in the display apparatus 200 to represent corresponding content such as an icon, a thumbnail, a video clip, and the like. For example: the items may represent movies, image content or video clips of a television show, audio content of music, applications, or other user access content history information.
In some embodiments, an "item" may display an image thumbnail. For example, when the item is a movie or a TV show, it may be displayed as the poster of the movie or show; if the item is music, as the poster of the music album; if the item is an application, as the application's icon or as a screenshot of its content captured when it last ran; if the item is the user's access history, as a screenshot of the most recently accessed content. An "item" may also be displayed as a video clip, for example a dynamic clip from the trailer of a TV program or series.
Further, the item may represent an interface or a collection of interfaces on which the display device 200 is connected to an external device, or may represent a name of an external device connected to the display device, or the like. Such as: a signal source input interface set, or an HDMI interface, a USB interface, a PC terminal interface, etc.
In the embodiments of the present application, the information interaction device can extract the evaluation information in a reply text using a generated evaluation word dictionary, determine the score of each candidate reply text according to the scores corresponding to that evaluation information, and then select the reply text with the best score. The evaluation word dictionary contains evaluation words describing user psychology and the score corresponding to each evaluation word.

Optionally, the evaluation information includes emotion words expressing the user's psychological feelings, and may further include negative words or degree words that modify the emotion words. The emotion words include words or phrases describing the user's subjective feelings about or evaluation of the audio/video; they may be positive (e.g., "wonderful", "impressive", "unusual") or negative (e.g., "rotten", "uninteresting"). The degree words include words or phrases describing the strength of the user's subjective feeling about the audio/video, such as "most", "very", "super", and "too".
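To make this concrete, the sketch below shows one possible in-memory layout of such a dictionary; every word, score, and coefficient is an illustrative assumption, since the patent does not publish its score table.

```python
# One possible layout of the evaluation word dictionary. All words,
# scores, and coefficients below are illustrative assumptions.
EVALUATION_DICT = {
    "emotion":  {"wonderful": 2.0, "touching": 1.5, "rotten": -2.0},
    "degree":   {"very": 1.5, "super": 2.0, "too": 1.8},
    "negation": {"not": -1.0, "no": -1.0},
}
```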
Fig. 7 exemplarily shows a flow of generating the evaluation word dictionary in the display device 200 according to an embodiment. As shown in fig. 7, the flow includes the following steps:

Step 701: collect corpora such as introductions, reviews, and promotional material of the audio/video works in the audio/video resource library, and/or resource information such as external vocabularies (e.g., knowledge graphs, aliases, synonyms, and near-synonyms).

Step 702: perform data cleaning on the collected resource information, such as word segmentation, removal of abnormal symbols (e.g., "#"), and script conversion (e.g., traditional to simplified Chinese, or other scripts into Chinese), and extract the words or phrases to be used as evaluation information, which may include emotion words, degree words, negative words, and the like.
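A minimal sketch of this cleaning step follows, assuming jieba for Chinese word segmentation and OpenCC for traditional-to-simplified conversion; the patent does not name specific tools, so both libraries are assumptions.

```python
import re

import jieba               # assumed word-segmentation tool; not named in the patent
from opencc import OpenCC  # assumed traditional-to-simplified converter

t2s = OpenCC("t2s")

def clean_and_segment(raw: str) -> list[str]:
    text = t2s.convert(raw)              # script conversion (traditional -> simplified)
    text = re.sub(r"[#@&*]+", "", text)  # remove abnormal symbols such as '#'
    return [w for w in jieba.lcut(text) if w.strip()]  # word segmentation
```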
Optionally, the words or phrases used as evaluation information may be extracted along several dimensions:

Plot: for example, "war", "exciting", "wonderful", "stirring", "realistic", "inspiring", etc.;

Emotion: for example, "tears amid laughter", "enjoyable", "worth revisiting", "heart-warming", "nostalgic", "worth savoring", etc.;

Type of work: for example, "funny", "suspenseful", "revenge", etc.;

Acting: for example, "explosive acting", "great chemistry between leads", "exquisite", "acting on point", "rotten", etc.;

Picture: for example, "carefully crafted", "beautiful", "immersive", "poor", etc.;

Rhythm: for example, "progressive", "compact", "fluent", "soothing", etc.;

Special effects: for example, "stunning special effects", "cool and dazzling", "on point", "great", "rubbish", etc.;

Roles: for example, "resourceful", "vividly drawn characters", "charismatic characters", etc.;

Score: for example, "pleasant to hear", "moving", "warm", "wonderful background music", etc.;

Attitude (negation): for example, "no", "not", etc.;

Degree: for example, "especially", "very", "quite", "simply", "super", etc.
The following example illustrates how evaluation words are extracted for a movie:

Example 1: the reviews of movie A include "reliving the journey of the motherland, so good to watch", "very moving", "nice", "super great", "rarely this moving", and "the plot is not reasonable". From these reviews, "reliving", "good to watch", "moving", "great", "rarely", and "reasonable" can be extracted as the movie's emotion words; "very" and "super" can be extracted as degree words modifying the emotion words; and "not" can be extracted as a negative word modifying an emotion word.
Step 703: according to the different levels at which the emotion words and degree words express psychological feelings, label the corresponding emotion words and degree words with corresponding scores, and generate the evaluation word dictionary.

Step 704: judge whether the emotion words in the evaluation word dictionary appear in the samples used for training the reply text generation model; if so, select those emotion words from the evaluation word dictionary; if not, select positive emotion words through sentiment analysis. The selected words form the evaluation word dictionary used to extract emotion words from candidate reply texts.

In this step, the evaluation word dictionary used to extract emotion words from candidate reply texts is generated from the selected emotion words. Because this dictionary's vocabulary is small, extracting the emotion words from the reply texts output by the model is efficient, reducing resource overhead and processing latency.
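The following sketch shows one way step 704 could be realized, under the assumptions that "appearing in the samples" is a substring test over the training corpus and that "positive" means a score greater than zero; neither detail is specified in the patent.

```python
# Sketch of step 704: keep emotion words seen in the training samples;
# for unseen words, keep only the positively scored ones. The membership
# test and the positivity test (score > 0) are illustrative assumptions.
def build_runtime_dictionary(emotion_scores: dict[str, float],
                             training_samples: list[str]) -> dict[str, float]:
    corpus = " ".join(training_samples)
    runtime = {}
    for word, score in emotion_scores.items():
        if word in corpus:
            runtime[word] = score   # word occurs in the training samples: keep it
        elif score > 0:
            runtime[word] = score   # unseen word: keep it only if it is positive
    return runtime
```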
In the embodiment of the application, the information interaction device may dynamically generate reply text templates using a sequence-to-sequence language generation model: the attribute information and labeled corpus of the audios/videos in the audio/video resource library serve as the model's input, and the model's output is the reply text templates. Subsequently, the structured data (attribute information) of the audio/video a user requests can be fed into the model, so that the model generates the corresponding reply text according to the reply text templates obtained through training.
The attribute information of an audio/video may include its inherent attribute information. Inherent attribute information, which may also be called basic information, describes the basic attributes of the audio/video and is objectively existing information. Taking a film work as an example, its inherent attribute information may include the film title, lead actor names, director name, film genre (such as drama, romance, or disaster), distribution region, release time, and so on. As another example, the inherent attribute information of a musical work may include the title, lyricist, composer, release time, musical style, theme, and so on.
Taking a film work as an example, the attribute information (inherent attribute information) may further include one or any combination of the following:
-plot keywords, which can be extracted from the plot synopsis using a keyword extraction algorithm such as TF-IDF (see the sketch after this list);
-classic lines;
-theme song information, such as the theme song title;
-character names.
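As a sketch of the plot-keyword extraction mentioned in the first item above, the fragment below uses scikit-learn's TfidfVectorizer on two invented toy synopses; any keyword extraction algorithm could stand in for TF-IDF.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Two invented toy synopses standing in for real plot summaries
synopses = [
    "three partners compete for a lost treasure hidden in hong kong",
    "a war hero returns home and struggles to adjust to peacetime life",
]
vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(synopses)
terms = vec.get_feature_names_out()

# Top-3 TF-IDF keywords for the first synopsis
row = tfidf[0].toarray().ravel()
print([terms[i] for i in np.argsort(row)[::-1][:3]])
```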
The inherent attribute information of audios/videos can be obtained in various ways, for example from a film information database, a knowledge base about film works, or related websites.
In the embodiment of the application, the inherent attribute information of the audio/video is converted into a box file through data format conversion and stored there. Table 1 exemplarily shows the inherent attribute information of one film work in the embodiment of the present application.
TABLE 1 Inherent attribute information of a film work (inherent attribute information of an audio/video)

Information field      Field content
Title                  Spider-Man: Far From Home
Actor                  Tom Holland
Actor                  Zendaya
Actor                  Jake Gyllenhaal
Director               Jon Watts
Genre                  Action
Genre                  Science fiction
Region                 USA
Time                   Recent
……                     ……
The attribute information of the audio/video may also include auxiliary information, which at least includes the collected evaluation information of the target audio/video. The evaluation information is obtained through the sent file of labeled training-sample sentences. Table 2 exemplarily shows the evaluation information of one film work in the embodiment of the present application.
TABLE 2 Evaluation information of a film work (auxiliary information of an audio/video)

Information field      Field content
Emotion word           Enjoy together
Emotion word           Appreciate
Emotion word           Wonderful
Degree word            Very
Negative word          None
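As a sketch of the data format conversion into box files mentioned above: the field-per-line layout below is an assumption (the patent does not specify the exact box/sent layout), and the file name `movie.box` is illustrative.

```python
# Combined inherent attribute and evaluation fields for one work (Table 1/2 style, values illustrative)
record = {
    "title": "Qunlong Duobao",
    "actor_1": "Liu Dehua",
    "type_1": "drama",
    "emotion_word_1": "wonderful",
    "degree_word_1": "very",
}

# One "field:value" token per field; spaces inside values are replaced for easy splitting
box_line = "\t".join(f"{k}:{v.replace(' ', '_')}" for k, v in record.items())

with open("movie.box", "w", encoding="utf-8") as f:
    f.write(box_line + "\n")
# The paired .sent file would hold the labeled reply sentence for this record.
```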
Based on such audio/video attribute information, both the inherent attribute information and the evaluation information of the audios/videos are used when training the reply text generation model and when generating a reply text for a user's audio/video query request. The templates obtained by model training therefore contain richer information, and the reply texts generated from those templates are livelier, more vivid, and more emotionally resonant with the user, which enhances the appeal to the user and improves the user experience.
Fig. 8 is a flow diagram illustrating the training of the reply text generation model in the display device 200 according to an embodiment; reply text templates may be generated through this flow.
The reply text generation model may be a neural-network-based seq2seq model. For example, the reply text generation model is an attention-based sequence-to-sequence model, involving a Long Short-Term Memory network (LSTM), a hidden semi-Markov model, encoding and decoding, an attention mechanism, a copy mechanism, a coverage mechanism, and the like.
As shown in Fig. 8, the flow performs the following operations:
S801: extract the attribute information of the audios/videos in the audio/video resource library, together with corresponding information such as film reviews and after-viewing impressions, as the basic information for sentence labeling.
S802: label the expected generated sentences according to the basic information; then perform data format conversion on the attribute information and the corresponding labeled sentences, converting the attribute information into box files and the labeled expected sentences into sent files, which serve as the input data of the text generation training model.
S803: encode the input data with the attention-based sequence-to-sequence language model.
Here, the input data are a series of variable-length sequences, which are encoded through nonlinear transformations into a vector of specified length (the intermediate semantic representation);
S804: train the language generation model on the vectors of specified length to obtain the trained language generation model file and template file.
Here, an attention mechanism is used to align the generated reply sentences with the structured data of the audio/video.
S805: extract and display the reply text templates.
The format and number of the reply text templates can be adjusted through manual intervention, removing inferior templates and reducing redundant ones.
S806: decode the vector of specified length (the intermediate semantic representation) with the attention-based sequence-to-sequence language model, generating a variable-length reply text sequence corresponding to the target audio/video query request.
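To make S803-S806 concrete, the following PyTorch fragment is a minimal sketch of an attention-based encoder-decoder, not the patent's implementation: the layer sizes, the single-layer LSTMs, and the bilinear attention scoring are illustrative assumptions, and the copy and coverage mechanisms mentioned above are omitted for brevity.

```python
import torch
import torch.nn as nn

class AttnSeq2Seq(nn.Module):
    """Minimal attention-based seq2seq sketch; all dimensions are illustrative."""
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim, hid_dim, bias=False)  # bilinear attention score
        self.out = nn.Linear(hid_dim * 2, vocab_size)

    def forward(self, src, tgt):
        # S803: encode the variable-length attribute sequence into hidden states
        enc_out, state = self.encoder(self.embed(src))        # (B, S, H)
        dec_emb = self.embed(tgt)                             # (B, T, E)
        ctx = enc_out.new_zeros(src.size(0), enc_out.size(-1))
        logits = []
        for t in range(dec_emb.size(1)):
            step = torch.cat([dec_emb[:, t], ctx], dim=-1).unsqueeze(1)
            dec_out, state = self.decoder(step, state)
            q = dec_out.squeeze(1)                            # (B, H)
            # S804: attention aligns each generated token with the structured attributes
            scores = torch.bmm(enc_out, self.attn(q).unsqueeze(-1)).squeeze(-1)
            weights = torch.softmax(scores, dim=-1)           # (B, S)
            ctx = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)
            # S806: decode to a distribution over the output vocabulary
            logits.append(self.out(torch.cat([q, ctx], dim=-1)))
        return torch.stack(logits, dim=1)                     # (B, T, vocab)

# Toy forward pass: batch of 2, source length 6 (attribute tokens), target length 5
model = AttnSeq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 6))
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 1000])
```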
Taking the display device 200 as a smart device (such as a smart television or a smart speaker) as an example, Fig. 9 exemplarily shows a flowchart of an interaction method of the smart device according to an embodiment. As shown, the flow may include:
Step 901: the smart device receives an audio/video query request input by a user.
In this step, taking a smart television as an example, the user can operate the user interface through a remote controller to input an audio/video query request in text form, or input the query request by voice; the audio/video query request carries keywords for querying the target audio/video.
Step 902: the smart device sends a query instruction to the server according to the audio/video query request, the query instruction carrying the keywords.
In this step, the smart television can denoise the audio/video query request (including removing echo and environmental noise) to obtain a clean query request text, and obtain the resource information of the target audio/video according to the query request text.
In this step, a request can be sent to the network side (server) for the target audio/video requested by the user, to obtain the resource information of the target audio/video. The resource information of the target audio/video may include information such as its Uniform Resource Identifier (URI).
Optionally, if the resource information of the target audio/video is cached locally on the device (for example, on the smart television or the smart speaker), the resource information may be obtained locally from the device.
Step 903: the server acquires the attribute information of the target audio/video according to the keywords in the query instruction.
In this step, the audio/video query request input by the user contains query keywords, such as the title of the audio/video to be queried, and the audio/video resource library (which stores the attribute information of audios/videos) can be queried with these keywords to obtain the attribute information of the target audio/video.
This step is illustrated with the following example:
The user says: "I want to see Liu Dehua's movie Qunlong Duobao";
conversion to text: I want to see Liu Dehua's movie Qunlong Duobao;
word segmentation: I / want to see / Liu Dehua / 's / movie / Qunlong Duobao;
semantic parsing: actor (actor): Liu Dehua, title (name): Qunlong Duobao;
result query: title (name): Qunlong Duobao, actor (actor): Liu Dehua, actor (actor): Guan Zhilin, actor (actor): Yilian, director (director): Yuan Xiang, type (type): drama, region: Hong Kong, time (time): 1988.
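The parsing chain above can be mimicked with a toy slot filler. The regular expression and slot names below are invented for illustration; a real system would run a trained semantic parser over the segmented query.

```python
import re

def parse_query(text: str) -> dict:
    # Toy pattern for queries of the form "I want to see <actor>'s movie <title>"
    m = re.search(r"I want to see (?P<actor>.+?)'s movie (?P<title>.+)", text)
    return m.groupdict() if m else {}

print(parse_query("I want to see Liu Dehua's movie Qunlong Duobao"))
# {'actor': 'Liu Dehua', 'title': 'Qunlong Duobao'}
```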
Step 904: the server generates at least two candidate reply texts containing evaluation information based on the reply text generation model, using the attribute information of the target audio/video. The reply text generation model generates reply texts based on the reply text generation templates obtained through training; because the training corpus contains both attribute information and evaluation information, the trained reply text generation templates contain both as well.
Continuing the above example, the generated candidate reply texts include:
Y1: Let me take you back to revisit the Hong Kong action film "Qunlong Duobao" and see Liu Dehua's extraordinary moves.
Y2: Come and enjoy with me the old Hong Kong action film "Qunlong Duobao"; Liu Dehua's performance is very wonderful.
Y3: Director Yuan Xiang's action film "Qunlong Duobao" depicts a war to compete for treasure; come and savor it with me!
Y4: Do you want to see Liu Dehua and Guan Zhilin go head-to-head in opposite scenes? Then watch the action film "Qunlong Duobao".
Y5: This is a game of competing for the treasure; enjoy with me the action film "Qunlong Duobao" starring Liu Dehua.
The sentence patterns of the candidate reply texts are dynamically generated by the trained model, and the number of candidates can be adjusted as required.
Step 905: the server determines the score of each candidate reply text according to the evaluation information in the candidate reply text and the scores corresponding to that evaluation information.
According to the different parts of speech contained in the evaluation words, three methods can be used to calculate the score of a candidate reply text:
the method comprises the following steps: inquiring a score corresponding to the emotion word according to the emotion word in the candidate reply text; and accumulating the scores of the emotion words in the candidate reply texts to obtain the scores corresponding to the candidate reply texts for each candidate reply text.
The method 1 determines the score of the candidate reply text only according to the emotional words of the target audio/video subjective feelings included in the evaluation information in the candidate reply text, wherein the more positive the subjective feelings expressed by the emotional words are, the higher the score corresponding to the emotional words is, for example: the score for the emotional word "wonderful" is 3.5, and the score for the emotional word "contends" is-1.5.
Method 2: determine the emotion words modified by negative words in the candidate reply text; multiply the coefficient corresponding to the negative word by the score corresponding to the emotion word it modifies to obtain the score of that emotion word; then accumulate the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text.
Method 2 determines the score of a candidate reply text from the emotion words expressing subjective feelings about the target audio/video and the negative words modifying those emotion words, both included in the candidate reply text's evaluation information.
As an example, the coefficient corresponding to a negative word is -1; correcting the scores of emotion words with the negative words contained in the reply text makes the reply text's score more accurate and reasonable. As another example, the coefficient corresponding to a negative word may also be set to 0.
Method 3: determine the emotion words modified by degree words in the candidate reply text; according to the intensity level of the degree word, multiply the coefficient corresponding to that intensity level by the score corresponding to the emotion word it modifies to obtain the score of that emotion word; then accumulate the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text.
Method 3 determines the score of a candidate reply text from the emotion words expressing subjective feelings about the target audio/video and the degree words modifying those emotion words, both included in the evaluation information. The higher the intensity level of a degree word, the larger its coefficient; for example, the coefficient of the degree word "very" is 2.0, while the coefficient of the degree word "quite" is 1.1. Correcting the scores of emotion words with the degree words contained in the reply text makes the reply text's score more accurate and reasonable.
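The three methods can be combined into one scoring routine. The tables below are hypothetical, mirroring the example values in the text; a degree or negative word immediately preceding an emotion word rescales it (methods 3 and 2), and the rescaled scores are accumulated (method 1). The worked example that follows applies the same values.

```python
# Hypothetical tables drawn from the evaluation word dictionary
EMOTION = {"wonderful": 3.5, "enjoy": 2.1, "compete": -1.5, "war": -5.1, "old": 0.1}
DEGREE = {"very": 2.0, "quite": 1.1}   # method 3 coefficients
NEG_COEF = -1.0                        # method 2 coefficient for negative words

def score_reply(tokens):
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in EMOTION:
            continue
        coef = 1.0
        if i > 0 and tokens[i - 1] in DEGREE:           # method 3: degree word modifier
            coef = DEGREE[tokens[i - 1]]
        elif i > 0 and tokens[i - 1] in {"not", "no"}:  # method 2: negation modifier
            coef = NEG_COEF
        total += coef * EMOTION[tok]                    # method 1: accumulate scores
    return total

print(score_reply(["old", "very", "wonderful", "enjoy"]))  # 0.1 + 2.0*3.5 + 2.1 = 9.2
```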
Continuing the above example, word segmentation is performed on the candidate reply texts Y1-Y5, and the evaluation information in each reply text, together with its corresponding scores, is determined from the evaluation word dictionary:
Y1: extraordinary moves -> 3.3, revisit -> 0.5;
Y2: old -> 0.1, wonderful -> 3.5, enjoy -> 2.1, very -> 2.0;
Y3: compete -> -1.5, war -> -5.1, savor -> 1.2;
Y4: head-to-head -> 0.5, opposite scenes -> 0.2;
Y5: compete -> -1.5, game -> 0.9, enjoy -> 2.1.
Thus, the scores of the candidate reply texts Y1-Y5 are determined as follows:
Y1 score: 3.3 + 0.5 = 3.8;
Y2 score: 0.1 + 2.0 × 3.5 + 2.1 = 9.2;
Y3 score: -1.5 - 5.1 + 1.2 = -5.4;
Y4 score: 0.5 + 0.2 = 0.7;
Y5 score: -1.5 + 0.9 + 2.1 = 1.5.
Step 906: according to the score of each candidate reply text, the server selects the optimal reply text and sends it together with the target audio/video resource information, where the target audio/video resource information is acquired according to the query instruction.
Step 907: the smart device displays the optimal reply text and the target audio/video resource information, plays them by voice, or both displays and plays them.
In the embodiment of the present application, the timing sequence of each step of the above-described flow is not strictly limited, for example, step 904 may be executed before step 903, or step 903 and step 904 may be executed in parallel.
Continuing the above example, the candidate reply text Y2 has the best score, so Y2 is used as the finally output reply text and is converted into speech for broadcasting.
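In step 906 the selection then reduces to an argmax over the candidate scores; a sketch using the values from the worked example above:

```python
scores = {"Y1": 3.8, "Y2": 9.2, "Y3": -5.4, "Y4": 0.7, "Y5": 1.5}
best = max(scores, key=scores.get)  # candidate with the highest score
print(best)  # Y2
```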
Fig. 10 illustrates a user interface of the display device 200 after responding to an audio/video query request input by a user, according to an embodiment. As shown in the figure, after the user inputs "I want to watch the movie Qunlong Duobao" by voice, the attribute information of the movie is queried according to the keywords "movie" and "Qunlong Duobao":
Title: Qunlong Duobao, actor: Liu Dehua, actor: Guan Zhilin, actor: Yilian, director: Yuan Xiang, type: drama, type: action, region: Hong Kong, time: 1988.
The display device 200 generates multiple candidate reply texts based on the reply text generation model using the attribute information of the movie, determines the score of each candidate reply text according to the evaluation information in each candidate and the scores corresponding to that evaluation information, and selects the reply text with the highest score, whose content is "Come and enjoy with me the old Hong Kong action film "Qunlong Duobao"; Liu Dehua's performance is very wonderful".
The display device 200 also sends a query request to the server according to the above keywords to query the video resources (URLs) of the movie "Qunlong Duobao", and receives the query result returned by the server.
The display device broadcasts the reply text by voice on the one hand, and displays the user interface 1000 on the display screen on the other. The user interface 1000 includes a first view display area 1001, in which the content of the reply text is displayed as text: "Come and enjoy with me the old Hong Kong action film "Qunlong Duobao"; Liu Dehua's performance is very wonderful". The user interface 1000 further includes a second view display area 1002, in which the queried video resources (URLs) of the movie are displayed; each video resource is presented as a view control showing a movie poster thumbnail or video thumbnail, and when the user selects the area of a thumbnail with the remote controller, the corresponding control is triggered and playback of the corresponding video resource starts.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (5)

1. An information interaction method, comprising:
receiving a query instruction from a smart television, and acquiring attribute information of a target audio/video according to keywords carried in the query instruction;
generating at least two candidate reply texts containing evaluation information based on a reply text generation model, using the attribute information of the target audio/video; wherein the reply text generation model generates reply texts based on reply text generation templates obtained through training, and the attribute information of the audios/videos used to train the reply text generation model includes evaluation information, so that the trained reply text generation templates contain evaluation information;
determining the score of each candidate reply text according to the evaluation information in the candidate reply text and the score corresponding to the evaluation information;
selecting one reply text from the candidate reply texts according to the score of each candidate reply text;
and sending the selected reply text, so that the smart television displays and/or voice-broadcasts the reply text.
2. The method of claim 1, wherein the evaluation information includes emotion words expressing subjective feelings about the target audio/video;
and determining the score of each candidate reply text according to the evaluation information in the candidate reply text and the score corresponding to the evaluation information specifically comprises:
querying the score corresponding to each emotion word in the candidate reply text;
and, for each candidate reply text, accumulating the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text.
3. The method of claim 2, wherein the evaluation information further includes negative words;
and, for each candidate reply text, accumulating the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text specifically comprises:
determining the emotion words modified by negative words in the candidate reply text;
multiplying the coefficient corresponding to the negative word by the score corresponding to the emotion word it modifies to obtain the score of that emotion word;
and accumulating the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text.
4. The method of claim 2, wherein the evaluation information further includes degree words;
and, for each candidate reply text, accumulating the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text specifically comprises:
determining the emotion words modified by degree words in the candidate reply text;
according to the intensity level of the degree word, multiplying the coefficient corresponding to that intensity level by the score corresponding to the emotion word it modifies to obtain the score of that emotion word;
and accumulating the scores of the emotion words in the candidate reply text to obtain the score corresponding to the candidate reply text.
5. A computer storage medium having computer program instructions stored therein which, when run on a computer, cause the computer to perform the method of any one of claims 1-4.