CN110149549B - Information display method and device

Information display method and device

Info

Publication number
CN110149549B
Authority
CN
China
Prior art keywords
target
information
picture
semantic information
video
Prior art date
Legal status
Active
Application number
CN201910141138.6A
Other languages
Chinese (zh)
Other versions
CN110149549A (en)
Inventor
陈姿
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910141138.6A priority Critical patent/CN110149549B/en
Publication of CN110149549A publication Critical patent/CN110149549A/en
Application granted granted Critical
Publication of CN110149549B publication Critical patent/CN110149549B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/439: Processing of audio elementary streams
    • H04N 21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/47: End-user applications
    • H04N 21/488: Data services, e.g. news ticker
    • H04N 21/4882: Data services, e.g. news ticker, for displaying messages, e.g. warnings, reminders

Abstract

The invention discloses an information display method and device. The method includes: performing semantic recognition on received target voice information in the process of playing a first video through a client, to obtain target semantic information corresponding to the target voice information; acquiring a target picture corresponding to the target semantic information from semantic information and first pictures having a correspondence, where the semantic information is the semantic information corresponding to the voice information received when the first picture was acquired from a second video; and displaying the target picture and the target semantic information on the client. The invention solves the technical problem of poor interactivity in the process of playing a video.

Description

Information display method and device
Technical Field
The invention relates to the field of computers, in particular to a method and a device for displaying information.
Background
Currently, in the process of playing a video at a client, a user may interact with the video, for example through comments, bullet screens (danmaku), and the like. However, the current forms of interaction are limited, and users are not provided with an experience that is richer in form and more convenient to operate.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for displaying information, which at least solve the technical problem of poor interactivity in the process of playing a video.
According to an aspect of an embodiment of the present invention, there is provided a method for displaying information, including:
performing semantic recognition on received target voice information in the process of playing a first video through a client to obtain target semantic information corresponding to the target voice information;
acquiring a target picture corresponding to the target semantic information from the semantic information and the first picture having the correspondence, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture was acquired from a second video;
and displaying the target picture and the target semantic information on the client.
According to another aspect of the embodiments of the present invention, there is also provided an information display apparatus including:
the recognition module is used for performing semantic recognition on the received target voice information in the process of playing the first video through the client to obtain target semantic information corresponding to the target voice information;
the acquisition module is used for acquiring a target picture corresponding to the target semantic information from the semantic information and the first picture having the correspondence, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture was acquired from a second video;
and the display module is used for displaying the target picture and the target semantic information on the client.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium storing a computer program, where the computer program is configured to perform any one of the methods described above when executed.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device including a memory and a processor, where the memory stores a computer program and the processor is configured to perform any one of the methods described above through the computer program.
In the embodiment of the invention, in the process of playing the first video through the client, semantic recognition is performed on the received target voice information to obtain target semantic information corresponding to the target voice information; a target picture corresponding to the target semantic information is acquired from the semantic information and first pictures having a correspondence, where the semantic information is the semantic information corresponding to the voice information received when the first picture was acquired from a second video; and the target picture and the target semantic information are displayed on the client. In other words, during playing of the first video, the target semantic information corresponding to the received target voice information is recognized, the target picture corresponding to the target semantic information is obtained from the stored correspondence, and the target picture and the recognized target semantic information are displayed on the client for the user to select and send. Interaction with the user in rich forms is thereby realized during video playing, achieving the technical effect of improving interactivity in the video playing process and solving the technical problem of poor interactivity in the video playing process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative method of displaying information in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application environment of an alternative information display method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative method of displaying information in accordance with an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of another alternative method of displaying information in accordance with an alternative embodiment of the present invention;
FIG. 5 is a schematic view of an alternative information display device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a first application scenario of an alternative information display method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a second application scenario of an alternative information display method according to an embodiment of the present invention; and
FIG. 8 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present invention, there is provided a method for displaying information, as shown in fig. 1, the method including:
s102, performing semantic recognition on the received target voice information in the process of playing the first video through the client to obtain target semantic information corresponding to the target voice information;
s104, acquiring a target picture corresponding to the target semantic information from the semantic information with the corresponding relation and the first picture, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture is acquired from the second video;
and S106, displaying the target picture and the target semantic information on the client.
Alternatively, in this embodiment, the information display method may be applied to a hardware environment formed by the client 202 shown in fig. 2. As shown in fig. 2, in the process of playing the first video through the client 202, the client 202 performs semantic recognition on the received target voice information to obtain target semantic information corresponding to the target voice information; the client 202 acquires a target picture corresponding to the target semantic information from the semantic information and the first picture with the corresponding relationship, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture is acquired from the second video; the target picture and the target semantic information are displayed on the client 202.
Optionally, in this embodiment, the information display method may be applied, but is not limited, to a scenario in which information is displayed on a client. The client may be, but is not limited to, any of various types of applications, such as an online education application, an instant messaging application, a community space application, a game application, a shopping application, a browser application, a financial application, a multimedia application, a live-streaming application, and the like. In particular, the method may be applied, but is not limited, to a scenario in which information is displayed in the multimedia application, or to a scenario in which information is displayed in the live-streaming application, so as to improve the interactivity of video playing. The above is only an example, and this is not limited in this embodiment.
Optionally, in this embodiment, the client may include, but is not limited to, an application installed on an electronic device. The electronic device may include, but is not limited to: a mobile terminal, a TV end, a set-top box (STB), smart home appliances, smart wearable devices, smart home devices, a tablet computer, a PC, and the like.
Optionally, in this embodiment, the first video may include, but is not limited to: video files, live video streams, and the like. The second video may include, but is not limited to: video files, live video streams, and the like. The categories of the first video and the second video may be the same or different. For example: the first video is a video file, and the second video can be a video file or a live video stream.
Optionally, in this embodiment, the client performs semantic recognition on the received target voice information to obtain target semantic information, or the client sends the target voice information to the server, and the server recognizes the target voice information as the target semantic information and returns the target semantic information to the client.
Optionally, in this embodiment, the semantic information and the first picture having the correspondence may be, but are not limited to being, stored in the client, and the first picture obtained from the second video may include, but is not limited to, a picture captured from the second video while it is playing, or a picture that the user is interested in extracted from the second video.
Optionally, in this embodiment, the user may control the client to receive the target voice information by triggering a voice instruction or a key instruction. For example: in the process of playing the first video through the client, the user may speak a trigger word such as 'bullet screen', 'comment', or 'roast' to the voice acquisition device of the client, or operate a key that triggers the comment or bullet-screen function. After receiving the voice information, the client detects that it is a trigger instruction and then listens for the target voice information. At this point the user may speak a phrase to the voice acquisition device; the client performs semantic recognition on it as the target voice information, and obtains the corresponding target picture according to the recognition result.
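By way of non-limiting illustration, this trigger-then-recognize flow can be sketched as follows. Speech recognition is stubbed out with plain strings, and all names (handle_utterances, the contents of TRIGGER_WORDS, the library entries) are illustrative assumptions rather than part of the embodiment:

```python
# Minimal, self-contained sketch of the trigger flow, with ASR stubbed out:
# the first utterance is checked against trigger words, and the second is
# used as the target semantic information for the key-value lookup.

TRIGGER_WORDS = {"bullet screen", "comment", "roast"}

# Expression library: semantic information (key) -> picture path (value).
expression_library = {"so funny": "pic3.png"}

def handle_utterances(trigger_text: str, target_text: str):
    """Return (picture, text) to display, or None if not triggered."""
    if trigger_text not in TRIGGER_WORDS:
        return None                       # not a trigger instruction; ignore
    picture = expression_library.get(target_text)  # key-value lookup
    return picture, target_text           # both are shown for the user to pick

print(handle_utterances("bullet screen", "so funny"))  # ('pic3.png', 'so funny')
```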
In an optional embodiment, as shown in fig. 3, in the process of playing movie A through a video APP, semantic recognition is performed on received target voice information to obtain the target semantic information 'so funny' corresponding to the target voice information, and a target picture corresponding to the target semantic information is obtained from an expression library (i.e., the semantic information and first pictures having a correspondence), where the semantic information is the semantic information corresponding to the voice information received when the plurality of first pictures (picture 1, picture 2, picture 3 … picture N) were obtained from a plurality of second videos (movie B, TV series C, variety show D … live stream N). Picture 3 and the target semantic information 'so funny' are then displayed on the video.
Therefore, through the above steps, in the process of playing the first video, the target semantic information corresponding to the received target voice information is recognized, the target picture corresponding to the target semantic information is obtained from the semantic information and first pictures having a correspondence, and the target picture and the recognized target semantic information are displayed on the client for the user to select and send. Interaction with the user in rich forms is thus realized during video playing, achieving the technical effect of improving interactivity in the video playing process and solving the technical problem of poor interactivity in the video playing process.
As an optional scheme, before acquiring the target picture corresponding to the target semantic information from the semantic information and the first picture having the correspondence, the method further includes:
s1, receiving indication information in the process of playing the second video through the client, wherein the indication information is used for indicating that the first picture is obtained from the second video;
s2, responding the indication information to convert the received voice information into semantic information and acquiring a first picture from the second video;
and S3, establishing a corresponding relation between the semantic information and the first picture, and storing the semantic information and the first picture with the corresponding relation.
Optionally, in this embodiment, the first picture may be, but is not limited to, a picture acquired from the second video, and may be, but is not limited to being, referred to as an emoticon picture, which may be made by the user logged in to the client while watching the second video. For example: in the process of playing the second video, the user may pause the video at any time and trigger the emoticon-making function by voice or by key. When the client detects that the function is triggered, it captures the frame at which the second video is paused as the first picture and listens for the voice information spoken by the user. The user may then speak a word or sentence as the voice information corresponding to the first picture. The client converts the voice information into semantic information, uses the semantic information as the key of the corresponding first picture and the first picture as the value of that key, and stores the semantic information and the first picture having the correspondence in key-value form.
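A minimal sketch of this key-value storage step follows; persisting the expression library as a JSON file is purely an illustrative assumption, since the embodiment does not prescribe a storage format, and the file and picture paths are hypothetical:

```python
# Store the recognized semantic information as the key and the captured
# screenshot as the value, in key-value form.
import json

def store_expression(library_path: str, semantic_info: str, picture_path: str) -> None:
    try:
        with open(library_path, encoding="utf-8") as f:
            library = json.load(f)
    except FileNotFoundError:
        library = {}                       # first emoticon: create the library
    library[semantic_info] = picture_path  # key -> value correspondence
    with open(library_path, "w", encoding="utf-8") as f:
        json.dump(library, f, ensure_ascii=False)

store_expression("expressions.json", "so funny", "variety_d_frame.png")
```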
Optionally, in this embodiment, the client or the server may also obtain, from the second video, first pictures that can be made into emoticon pictures and display them to the user. The user inputs voice information for each first picture, the client or the server recognizes the voice information as semantic information, the semantic information is used as the key of the corresponding first picture and the first picture as the value of that key, and the semantic information and the first picture having the correspondence are stored in key-value form.
Optionally, in this embodiment, the semantic information and the first picture having the corresponding relationship may also be associated with a login account of the user, that is, each user corresponds to an expression library created by the user. The user may also share their corresponding emoticons to friends or to a social space.
Optionally, in this embodiment, the user may make an emoticon picture at any time and store the emoticon picture in the corresponding emoticon library during the process of watching the video.
Optionally, in this embodiment, the manner of obtaining the first picture from the second video may include, but is not limited to, capturing a screenshot from the played second video, or may also include, but is not limited to, extracting from a video frame of the second video.
In an alternative embodiment, as shown in fig. 4, in the process of playing variety show D through the video APP, it is detected that the user has paused the playing of variety show D and has spoken the instruction information 'make an emoticon' to the voice detection device. In response to the instruction information, the received voice information is converted into the semantic information 'so funny', and picture 3 is obtained from variety show D; a correspondence between the semantic information 'so funny' and picture 3 is established, and the semantic information 'so funny' and picture 3 having the correspondence are stored. The process of establishing correspondences between the other semantic information and first pictures is similar and is not repeated here. An expression library is thereby obtained (movie B, TV series C, variety show D … live stream N, with pictures 1, 2, 3 … N having the correspondences). Then, in the process of playing movie A through the video APP, it is detected that the user speaks target voice information to the voice detection device; semantic recognition is performed on the received target voice information to obtain the target semantic information 'so funny'; the target picture corresponding to the target semantic information, picture 3, is obtained from the expression library; and picture 3 and the target semantic information 'so funny' are displayed on the video APP.
As an alternative, the converting the received voice information into semantic information in response to the indication information, and obtaining the first picture from the second video includes:
s1, responding to the indication information to receive voice information, and capturing a playing picture from the second video to obtain a screenshot picture;
and S2, determining the screenshot picture as a first picture.
Optionally, in this embodiment, the first picture may include, but is not limited to, a screenshot picture taken from the second video.
Optionally, in this embodiment, the user may control the second video to pause playing, and then capture the screenshot picture from the second video. Alternatively, the screenshot picture may be taken from a second video that is playing.
As an alternative, the converting the received voice information into semantic information in response to the indication information, and obtaining the first picture from the second video includes:
s1, responding to the indication information to receive voice information, and capturing a playing picture from a second video to obtain a screenshot picture;
s2, adding semantic information to the screenshot picture to obtain a map picture;
s3, determining the map picture as a first picture; or determining the screenshot picture and the map picture as the first picture.
Optionally, in this embodiment, after the screenshot is performed on the second video, semantic information may also be added to the screenshot picture to obtain a map picture. The map picture may be only used as the first picture corresponding to the semantic information, or the map picture and the screenshot picture may be used together as the first picture corresponding to the semantic information.
Optionally, in this embodiment, in the process of adding the semantic information to the screenshot picture, some processing may be performed on the semantic information, such as rendering it as art lettering or adding an animation effect, so that the obtained first picture is richer in presentation.
For example: in the above alternative embodiment, picture 3 may include, but is not limited to, a screenshot picture captured from variety show D, and/or a map picture with 'so funny' added to the screenshot.
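The map-picture step can be sketched as follows, using the Pillow imaging library purely for illustration; the embodiment names no image library, and the file names are examples:

```python
# Draw the recognized semantic information onto the screenshot to obtain a
# "map picture" (a captioned screenshot). Font, position, and colour are
# arbitrary choices; art lettering and animation effects are omitted.
from PIL import Image, ImageDraw, ImageFont

def make_map_picture(screenshot_path: str, semantic_info: str, out_path: str) -> None:
    img = Image.open(screenshot_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Caption near the bottom edge, in the style of an emoticon picture.
    draw.text((10, img.height - 20), semantic_info, fill="white", font=font)
    img.save(out_path)

make_map_picture("variety_d_frame.png", "so funny", "variety_d_map.png")
```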
As an optional scheme, after establishing the correspondence between the semantic information and the first picture, the method further includes:
and S1, sending the semantic information and the first picture having the correspondence to a server, wherein the semantic information and the first picture having the correspondence are used to instruct the server to extract tag information from the semantic information, establish a correspondence between the tag information and the first picture, and store the tag information and the first picture having the correspondence according to the tag information.
Optionally, in this embodiment, the client may send the expression library, that is, the semantic information and the first picture having the corresponding relationship to the server. In order to save storage space and improve search efficiency, the server may extract tag information from the semantic information, where the tag information may be, but is not limited to, a keyword in the semantic information, and the tag information and the first picture having a corresponding relationship are stored according to the tag information.
For example: the server receives 'so funny' and picture 3 having the correspondence, and extracts the keyword 'funny' from the semantic information 'so funny' as the tag information. If an expression library tagged 'funny' is already stored on the server, picture 3 is added to that library; if no expression library tagged 'funny' is stored on the server, an expression library tagged 'funny' may be newly created and picture 3 stored in it.
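A server-side sketch of the tag extraction and grouping follows. Reducing keyword extraction to a substring check against a fixed list is an illustrative stand-in for real keyword extraction or category recognition:

```python
# Group incoming (semantic information, picture) pairs into per-tag
# expression libraries: extract a known keyword as the tag, add the picture
# to that tag's library, and create the library if it does not exist yet.
from collections import defaultdict

KNOWN_TAGS = ["funny", "laugh", "sad", "angry"]   # assumed keyword list

tag_libraries: dict[str, list[str]] = defaultdict(list)

def store_by_tag(semantic_info: str, picture: str) -> str:
    for tag in KNOWN_TAGS:
        if tag in semantic_info:              # extract the keyword as the tag
            tag_libraries[tag].append(picture)
            return tag
    tag_libraries[semantic_info].append(picture)  # no keyword: use full text
    return semantic_info

store_by_tag("so funny", "pic3.png")
print(dict(tag_libraries))                    # {'funny': ['pic3.png']}
```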
As an optional scheme, in a case that a target picture is not acquired from semantic information and pictures having a correspondence relationship, the method further includes:
s1, sending the target voice information to a server, wherein the target voice information is used for instructing the server to convert the target voice information into target semantic information, extracting target label information from the target semantic information, and acquiring a target picture corresponding to the target label information from the label information and the second picture which have corresponding relations;
and S2, receiving the target picture sent by the server in response to the target voice message.
Optionally, in this embodiment, if the target picture is not acquired from the semantic information and pictures having the correspondence, a picture corresponding to the semantic information may be searched for on the server. The server takes the picture corresponding to the tag information, acquired from the tag information and second pictures having a correspondence, as the target picture and sends it to the client.
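On the client side, the fallback can be sketched as follows; the endpoint URL and the response field are assumptions, since the embodiment describes only the data flow between client and server:

```python
# If the local expression library misses, upload the raw voice to the
# server, which converts it to semantics, extracts the tag, and returns
# the matched picture. The URL and JSON shape are hypothetical.
import requests

def find_picture(local_library: dict, semantic_info: str, voice_bytes: bytes):
    picture = local_library.get(semantic_info)
    if picture is not None:
        return picture                        # hit in the local library
    resp = requests.post(
        "https://example.com/emoticon/match", # hypothetical endpoint
        files={"voice": voice_bytes},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json().get("picture")         # assumed response field
```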
As an optional scheme, after the target picture and the target semantic information are displayed on the client, at least one of the following is further included:
s1, under the condition that a first instruction for indicating selection of a target picture is received, the target picture is sent to a first video as a bullet screen;
and S2, in the case of receiving a second instruction for indicating the selection of the target semantic information, sending the target semantic information to the first video as a bullet screen.
Optionally, in this embodiment, the user may select between the target picture and the target semantic information to send the bullet screen. Therefore, rich and novel interaction modes between the user and the video are realized.
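A minimal sketch of this selection step follows; send_danmaku and the instruction names are illustrative stand-ins for whatever mechanism the player actually uses to post a bullet screen:

```python
# The first instruction sends the target picture as a bullet screen, the
# second sends the target semantic information as text.

def send_danmaku(payload: str) -> None:
    print(f"danmaku -> {payload}")            # placeholder for the real sender

def on_selection(instruction: str, picture: str, semantic_info: str) -> None:
    if instruction == "select_picture":       # first instruction
        send_danmaku(picture)
    elif instruction == "select_text":        # second instruction
        send_danmaku(semantic_info)

on_selection("select_picture", "pic3.png", "so funny")
```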
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided an information display apparatus for implementing the above information display method, as shown in fig. 5, the apparatus including:
the recognition module 52 is configured to perform semantic recognition on the received target voice information during the process of playing the first video through the client, so as to obtain target semantic information corresponding to the target voice information;
an obtaining module 54, configured to obtain a target picture corresponding to the target semantic information from the semantic information having the corresponding relationship and the first picture, where the semantic information is semantic information corresponding to voice information received when the first picture is obtained from the second video;
and a display module 56 for displaying the target picture and the target semantic information on the client.
Alternatively, in the present embodiment, the information display device may be applied, but is not limited, to a scenario in which information is displayed on a client. The client may be, but is not limited to, any of various types of applications, such as an online education application, an instant messaging application, a community space application, a game application, a shopping application, a browser application, a financial application, a multimedia application, a live-streaming application, and the like. In particular, the device may be applied, but is not limited, to a scenario in which information is displayed in the multimedia application, or to a scenario in which information is displayed in the live-streaming application, so as to improve the interactivity of video playing. The above is only an example, and this is not limited in this embodiment.
Optionally, in this embodiment, the client may include, but is not limited to, an application installed on an electronic device. The electronic device may include, but is not limited to: a mobile terminal, a TV end, a set-top box (STB), smart home appliances, smart wearable devices, smart home devices, a tablet computer, a PC, and the like.
Optionally, in this embodiment, the first video may include, but is not limited to: video files, live video streams, and the like. The second video may include, but is not limited to: video files, live video streams, and the like. The categories of the first video and the second video may be the same or different. For example: the first video is a video file, and the second video can be a video file or a live video stream.
Optionally, in this embodiment, the client performs semantic recognition on the received target voice information to obtain target semantic information, or the client sends the target voice information to the server, and the server recognizes the target voice information as the target semantic information and returns the target semantic information to the client.
Optionally, in this embodiment, the semantic information and the first picture having the corresponding relationship may be, but are not limited to, stored in the client, and the first picture obtained from the second video may include, but is not limited to, a picture obtained by capturing the second video when playing the second video, or a picture extracted from the second video and interested by the user.
Optionally, in this embodiment, the user may control the client to receive the target voice information by triggering a voice instruction or a key instruction. For example: in the process of playing the first video through the client, the user may speak a trigger word such as 'bullet screen', 'comment', or 'roast' to the voice acquisition device of the client, or operate a key that triggers the comment or bullet-screen function. After receiving the voice information, the client detects that it is a trigger instruction and then listens for the target voice information. At this point the user may speak a phrase to the voice acquisition device; the client performs semantic recognition on it as the target voice information, and obtains the target picture corresponding to the target voice information according to the recognition result.
Therefore, with the above device, in the process of playing the first video, the target semantic information corresponding to the received target voice information is recognized, the target picture corresponding to the target semantic information is obtained from the semantic information and first pictures having a correspondence, and the target picture and the recognized target semantic information are displayed on the client for the user to select and send. Interaction with the user in rich forms is thus realized during video playing, achieving the technical effect of improving interactivity in the video playing process and solving the technical problem of poor interactivity in the video playing process.
As an optional solution, the apparatus further includes:
the first receiving module is used for receiving indication information in the process of playing a second video through a client, wherein the indication information is used for indicating that a first picture is obtained from the second video;
the first processing module is used for responding the indication information, converting the received voice information into semantic information and acquiring a first picture from a second video;
and the second processing module is used for establishing a corresponding relation between the semantic information and the first picture and storing the semantic information and the first picture with the corresponding relation.
As an alternative, the first processing module includes:
the first processing unit is used for responding to the indication information, receiving voice information and intercepting a playing picture from a second video to obtain a screenshot picture;
and the first determining unit is used for determining the screenshot picture as a first picture.
As an alternative, the first processing module includes:
the second processing unit is used for responding to the indication information, receiving the voice information and intercepting a playing picture from a second video to obtain a screenshot picture;
the adding unit is used for adding the semantic information to the screenshot picture to obtain a mapping picture;
the second determining unit is used for determining the map picture as the first picture; or, the screenshot picture and the map picture are determined as the first picture.
As an optional solution, the apparatus further includes:
the first sending module is used for sending the semantic information and the first picture with the corresponding relation to the server, wherein the semantic information and the first picture with the corresponding relation are used for indicating the server to extract the label information from the semantic information, establishing the corresponding relation between the label information and the first picture, and storing the label information and the first picture with the corresponding relation according to the label information.
As an optional solution, the apparatus further includes:
the second sending module is used for sending the target voice information to the server under the condition that the target picture is not obtained from the semantic information and the picture with the corresponding relationship, wherein the target voice information is used for indicating the server to convert the target voice information into the target semantic information, extracting the target label information from the target semantic information, and obtaining the target picture corresponding to the target label information from the label information and the second picture with the corresponding relationship;
and the second receiving module is used for receiving the target picture sent by the server in response to the target voice information.
As an optional solution, the apparatus further includes at least one of:
the third sending module is used for sending the target picture to the first video as a barrage under the condition of receiving a first instruction for indicating the selection of the target picture;
and the fourth sending module is used for sending the target semantic information to the first video as a bullet screen under the condition of receiving a second instruction for indicating the selection of the target semantic information.
The application environment of the embodiments of the present invention may refer to the application environment in the above embodiments, and is not described herein again. An embodiment of the present invention further provides an optional specific application example for implementing the above information display method.
As an alternative embodiment, the above information display method may be applied, but is not limited, to the scenario of playing a video on a TV end as shown in fig. 6 and fig. 7. It is difficult to send bullet screens or comment text on a TV end, and the user experience in that scenario is unfriendly; at present the TV end is used only for watching and has no convenient input means. In this scenario, the user can define an emoticon through screen capture and voice input while watching on the TV: for example, the user captures a picture of a movie star with an embarrassed smile and at that moment inputs 'embarrassed' by voice, so that the screenshot picture carries exactly the meaning of 'embarrassed'. When watching this video or other videos, the user can then directly output the picture by voice, for example by speaking to send the embarrassed picture to the screen as a bullet screen or to post it as a comment, which adds fun to commenting on the TV end.
In this scenario, a process for making a voice emoticon is provided. As shown in fig. 6, the process includes the following steps (a minimal code sketch of the confirmation loop in steps 4 to 7 follows the list):
step 1, the user opens a video app and watches video A;
step 2, the user pauses the video and inputs a specific voice instruction to the client by voice, such as 'make an emoticon';
step 3, the client transmits the voice instruction to the server; through voice recognition and semantic understanding, the server returns a prompt to the client so that the background prompts the user for the next operation, and the client prompts the user by TTS voice: 'please input the emoticon label';
step 4, on receiving the prompt, the user inputs the label by voice, for example "can't help secretly laughing";
step 5, the client recognizes the semantics through ASR, recognizing the voice as the semantic information "can't help secretly laughing";
step 6, the recognized text "can't help secretly laughing" is returned to the display interface of the client;
step 7, the client receives the text recognized by the background, displays it, and asks for secondary confirmation that this is what the user meant; for example, the front end shows confirm and cancel buttons with the prompt "Did you just say: can't help secretly laughing?"; if the user presses cancel, go back to step 4;
step 8, the client captures the currently paused frame, and stores the screenshot picture, the semantic label of the picture, namely "can't help secretly laughing", and the correspondence between the screenshot picture and the semantic label in a storage space;
step 9, the client may also upload the screenshot picture, the semantic label "can't help secretly laughing", and the correspondence between the screenshot picture and the semantic label to a server;
step 10, the server receives the information, and extracts the keyword 'laugh' from the semantic label "can't help secretly laughing" or identifies the category of the semantic label as 'funny';
and step 11, the server stores the screenshot picture in a storage location tagged 'laugh' or categorized as 'funny'.
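As referenced above, the confirmation loop in steps 4 to 7 can be sketched as follows; real ASR and TTS calls are replaced by a scripted list of recognition attempts, since the process does not prescribe concrete APIs:

```python
# Loop of steps 4-7: recognize the spoken label, echo it back for secondary
# confirmation, and on cancel go back and record the label again. Each
# attempt is a (recognized_label, user_confirmed) pair.

def capture_label(attempts):
    """Return the first label the user confirms, or None if all cancelled."""
    for recognized_label, user_confirmed in attempts:
        # Step 7: e.g. "Did you just say: <recognized_label>?"
        if user_confirmed:
            return recognized_label       # go on to step 8 (store screenshot)
        # Cancel pressed: back to step 4, record the label again.
    return None

# First recognition is rejected, the retry is confirmed.
print(capture_label([("can't help laughing", False),
                     ("can't help secretly laughing", True)]))
```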
When the user inputs a comment by voice, the client can check according to the voice whether a voice tag of an emoticon is hit. As shown in fig. 7, the specific process includes the following steps (a minimal sketch of the label matching follows the list):
step a, the user plays video B in the video app of the TV end, and clicks to input a comment in the comment area of video B;
step b, the client prompts the user to perform the comment operation, and the user inputs a voice comment through a voice remote controller;
step c, the client recognizes the semantics through ASR, and matches the semantics against the semantic labels of the emoticons;
step d, if the client matches an emoticon by its semantic label, it returns the matched emoticon picture together with the text recognized from the voice to the display interface of the client; if no emoticon with a matching semantic label is found, the voice file is sent to the server;
step e, the server recognizes the semantics through ASR, extracts the keywords or identifies the category they belong to, finds the picture corresponding to the keywords or the category, and returns it to the client;
step f, the client displays the picture and the text of the user's voice input, and asks the user to choose whether to use the picture or the recognized text;
step g, if the user confirms using the picture, go to step h; if the user does not want to send the picture and selects the text instead, go to step i, and the comment is posted using the recognized text returned by the background; if the user presses cancel, go back to step b;
step h, the client sends the picture to the comment area;
and step i, the client sends the text to the comment area.
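The label matching in steps c to e can be sketched as follows; the word-overlap scoring is an illustrative assumption, and a real implementation could use any keyword or category matching:

```python
# Match the recognized comment against stored emoticon labels: first try an
# exact hit on a voice tag, then fall back to the label sharing the most
# words with the comment. None signals step d's miss (send voice to server).

def match_label(comment: str, labels: list[str]):
    if comment in labels:
        return comment                        # exact hit on a voice tag
    best, best_overlap = None, 0
    comment_words = set(comment.split())
    for label in labels:
        overlap = len(comment_words & set(label.split()))
        if overlap > best_overlap:
            best, best_overlap = label, overlap
    return best

print(match_label("that is so funny", ["so funny", "so sad"]))  # so funny
```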
Through the above process, emoticons can be made on the TV end by voice, and users' voice comments and picture sending are supported; throughout the process, no text needs to be entered with the remote controller, which avoids the drawback that text input on the TV side is troublesome.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above information display method. As shown in fig. 8, the electronic device includes: one or more processors 802 (only one of which is shown), a memory 804 in which a computer program is stored, a sensor 806, an encoder 808, and a transmission device 810, where the processor is configured to execute the steps in any one of the above method embodiments through the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, performing semantic recognition on the received target voice information in the process of playing the first video through the client to obtain target semantic information corresponding to the target voice information;
s2, acquiring a target picture corresponding to the target semantic information from the semantic information and the first picture which have the corresponding relation, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture is acquired from the second video;
and S3, displaying the target picture and the target semantic information on the client.
Alternatively, it can be understood by those skilled in the art that the structure shown in FIG. 8 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, or a Mobile Internet Device (MID), a PAD, etc. FIG. 8 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 8, or have a configuration different from that shown in FIG. 8.
The memory 804 may be used to store software programs and modules, such as the program instructions/modules corresponding to the information display method and device in the embodiments of the present invention, and the processor 802 executes various functional applications and data processing by running the software programs and modules stored in the memory 804, that is, implements the above information display method. The memory 804 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 804 can further include memory located remotely from the processor 802, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 810 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 810 includes a network adapter (NIC) that can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 810 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The memory 804 is used, among other things, for storing application programs.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, performing semantic recognition on the received target voice information in the process of playing the first video through the client to obtain target semantic information corresponding to the target voice information;
s2, acquiring a target picture corresponding to the target semantic information from the semantic information and the first picture which have the corresponding relation, wherein the semantic information is the semantic information corresponding to the voice information received when the first picture is acquired from the second video;
and S3, displaying the target picture and the target semantic information on the client.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the method in the foregoing embodiment, which is not described in detail in this embodiment.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, read-only memories (ROMs), random access memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A method of displaying information, comprising:
receiving indication information of a control operation triggered on each second video played in a client, respectively capturing a playing picture from each second video according to the indication information to obtain a plurality of screenshot pictures, and respectively converting each piece of received voice information into semantic information in response to the indication information;
processing each piece of semantic information, and adding the processed semantic information to the corresponding screenshot picture to obtain a plurality of first pictures, wherein the processing comprises: adding an animation effect to the semantic information or rendering the semantic information as art lettering;
performing semantic recognition on the received target voice information in the process of playing the first video through the client to obtain target semantic information corresponding to the target voice information;
acquiring a target picture corresponding to the target semantic information from the semantic information with the corresponding relation and the first picture;
and displaying the target picture and the target semantic information on the client.
2. The method according to claim 1, wherein after adding the processed semantic information to the corresponding screenshot picture to obtain a plurality of first pictures, the method comprises:
and establishing respective corresponding relations between the semantic information and the first pictures, and storing the semantic information and the first pictures with the corresponding relations.
3. The method according to claim 2, wherein after establishing respective correspondences between respective ones of the semantic information and respective ones of the first pictures, the method further comprises:
and sending the semantic information with the corresponding relation and the first picture to a server, wherein the semantic information with the corresponding relation and the first picture are used for indicating the server to extract label information from the semantic information, establishing the corresponding relation between the label information and the first picture, and storing the label information with the corresponding relation and the first picture according to the label information.
4. The method according to claim 1, wherein in a case where the target picture is not acquired from the semantic information having the correspondence and the first picture, the method further comprises:
sending the target voice information to a server, wherein the target voice information is used for instructing the server to convert the target voice information into the target semantic information, extracting target label information from the target semantic information, and acquiring a target picture corresponding to the target label information from label information and a second picture which have a corresponding relationship;
and receiving the target picture sent by the server in response to the target voice information.
5. The method according to any one of claims 1 to 4, wherein, after the target picture and the target semantic information are displayed on the client, the method further comprises at least one of:
sending the target picture to the first video as a bullet-screen (danmaku) comment in a case where a first instruction indicating selection of the target picture is received; and
sending the target semantic information to the first video as a bullet-screen comment in a case where a second instruction indicating selection of the target semantic information is received.
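Claim 5 only says the selected item is sent "as a bullet screen"; the message shape below is a guess at the minimum a danmaku channel would need (video id, playback position, payload), not a real protocol.

```python
from dataclasses import dataclass

@dataclass
class BulletScreenMessage:
    video_id: str
    position_ms: int        # playback position the comment is pinned to
    payload: bytes | str    # target picture bytes or target semantic text

def send_as_bullet_screen(video_id: str, position_ms: int,
                          selection: bytes | str) -> BulletScreenMessage:
    """Wrap the user's selection; a real client would enqueue this message
    on the first video's danmaku channel."""
    return BulletScreenMessage(video_id, position_ms, selection)
```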
6. An apparatus for displaying information, the apparatus being configured to:
receive indication information of a control operation triggered for each second video played in a client, capture a playback frame from each second video according to the indication information to obtain a plurality of screenshot pictures, and convert each received voice message into semantic information in response to the indication information; and
process each piece of semantic information and add the processed semantic information to the corresponding screenshot picture to obtain a plurality of first pictures, wherein the processing comprises: adding an animation effect to the semantic information or rendering the semantic information as stylized text;
the apparatus comprising:
a recognition module, configured to perform semantic recognition on received target voice information while a first video is played through the client, to obtain target semantic information corresponding to the target voice information;
an acquisition module, configured to acquire, from the semantic information and the first pictures having the established correspondences, a target picture corresponding to the target semantic information, wherein the semantic information is the semantic information corresponding to the voice information received when the first pictures were captured from the second videos; and
a display module, configured to display the target picture and the target semantic information on the client.
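Read as software structure, claim 6's module split might be wired together as in this sketch; the class and method names are invented and the bodies are placeholders.

```python
class RecognitionModule:
    def recognize(self, voice_message: bytes) -> str:
        # Placeholder for real semantic recognition of the voice input.
        return voice_message.decode("utf-8", errors="ignore")

class AcquisitionModule:
    def __init__(self, first_pictures: dict[str, bytes]):
        self.first_pictures = first_pictures

    def acquire(self, target_semantic: str) -> bytes | None:
        return self.first_pictures.get(target_semantic)

class DisplayModule:
    def show(self, picture: bytes | None, semantic: str) -> None:
        # Placeholder for actual on-screen rendering in the client.
        print(f"display {semantic!r} (picture found: {picture is not None})")

class InformationDisplayApparatus:
    """Wires the claim-6 modules together for one target-voice event."""
    def __init__(self, first_pictures: dict[str, bytes]):
        self.recognition = RecognitionModule()
        self.acquisition = AcquisitionModule(first_pictures)
        self.display = DisplayModule()

    def handle_target_voice(self, voice_message: bytes) -> None:
        semantic = self.recognition.recognize(voice_message)
        self.display.show(self.acquisition.acquire(semantic), semantic)
```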
7. The apparatus of claim 6, further comprising:
a second processing module, configured to establish a correspondence between each piece of semantic information and the corresponding first picture, and to store the semantic information and the first pictures together with the correspondences.
8. The apparatus of claim 6, further comprising:
a second sending module, configured to send the target voice information to a server in a case where the target picture is not acquired from the semantic information and the first pictures having the correspondences, wherein the target voice information is used for instructing the server to convert the target voice information into the target semantic information, extract target label information from the target semantic information, and acquire the target picture corresponding to the target label information from label information and second pictures having a correspondence; and
a second receiving module, configured to receive the target picture sent by the server in response to the target voice information.
9. The apparatus of any one of claims 6 to 8, further comprising at least one of:
a third sending module, configured to send the target picture to the first video as a bullet-screen comment in a case where a first instruction indicating selection of the target picture is received; and
a fourth sending module, configured to send the target semantic information to the first video as a bullet-screen comment in a case where a second instruction indicating selection of the target semantic information is received.
10. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any one of claims 1 to 5 when executed.
11. An electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is arranged to execute the method of any one of claims 1 to 5 by means of the computer program.
CN201910141138.6A 2019-02-26 2019-02-26 Information display method and device Active CN110149549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910141138.6A CN110149549B (en) 2019-02-26 2019-02-26 Information display method and device

Publications (2)

Publication Number Publication Date
CN110149549A CN110149549A (en) 2019-08-20
CN110149549B (en) 2022-09-13

Family

ID=67588109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910141138.6A Active CN110149549B (en) 2019-02-26 2019-02-26 Information display method and device

Country Status (1)

Country Link
CN (1) CN110149549B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110881134B (en) * 2019-11-01 2020-12-11 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN112817553A (en) * 2019-11-15 2021-05-18 阿里巴巴集团控股有限公司 Voice interaction method, device and system
CN110853650A (en) * 2019-11-28 2020-02-28 京东方科技集团股份有限公司 Image information acquisition method and image information acquisition device
CN111526382B (en) * 2020-04-20 2022-04-29 广东小天才科技有限公司 Live video text generation method, device, equipment and storage medium
CN111586469B (en) * 2020-05-12 2021-10-26 腾讯科技(深圳)有限公司 Bullet screen display method and device and electronic equipment
CN112752134B (en) * 2020-07-17 2023-09-22 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic device
CN113568551A (en) * 2021-07-26 2021-10-29 北京达佳互联信息技术有限公司 Picture saving method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016095602A1 (en) * 2014-12-19 2016-06-23 百度时代网络技术(北京)有限公司 Method and apparatus for generating live comments overlaid on playing object
CN106658079A (en) * 2017-01-05 2017-05-10 腾讯科技(深圳)有限公司 Customized expression image generation method and device
CN107172485A * 2017-04-25 2017-09-15 北京百度网讯科技有限公司 Method and apparatus for generating short videos
WO2018108049A1 (en) * 2016-12-13 2018-06-21 腾讯科技(深圳)有限公司 Information processing method and terminal and computer storage medium
WO2019003222A1 (en) * 2017-06-26 2019-01-03 Maimon Uzi Captured content sharing interface

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104661096B (en) * 2013-11-21 2018-06-19 深圳市快播科技有限公司 Video barrage adding method and device and video broadcasting method and video player
CN104010222A (en) * 2014-04-28 2014-08-27 小米科技有限责任公司 Method, device and system for displaying comment information
CN106372059B (en) * 2016-08-30 2018-09-11 北京百度网讯科技有限公司 Data inputting method and device
CN106604089A (en) * 2016-10-25 2017-04-26 北京小米移动软件有限公司 Intelligent television screenshot sharing method and device
CN106910514A (en) * 2017-04-30 2017-06-30 上海爱优威软件开发有限公司 Method of speech processing and system
CN107613400B (en) * 2017-09-21 2021-03-26 北京奇艺世纪科技有限公司 Method and device for realizing voice barrage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
表情包文化的传播学分析 [A Communication Studies Analysis of Meme Culture]; 姜若雪 (Jiang Ruoxue); 《西部广播电视》 [West China Broadcasting TV]; 2016-12-25 (Issue 24); full text *

Also Published As

Publication number Publication date
CN110149549A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110149549B (en) Information display method and device
US10945035B2 (en) Method and apparatus for augmenting media content
CN106658200B (en) Live video sharing and acquiring method and device and terminal equipment thereof
TWI744368B (en) Play processing method, device and equipment
CN109525851B (en) Live broadcast method, device and storage medium
CN106570100B (en) Information search method and device
CN102831537B Method and device for obtaining network advertisement information
US20080088735A1 (en) Social media platform and method
CN109829064B (en) Media resource sharing and playing method and device, storage medium and electronic device
CN108366278A Method and device for implementing user interaction during video playing
CN111444415B (en) Barrage processing method, server, client, electronic equipment and storage medium
CN105916051A (en) Content recommendation method and device
CN109361954B (en) Video resource recording method and device, storage medium and electronic device
CN108574878B (en) Data interaction method and device
CN113573090A (en) Content display method, device and system in game live broadcast and storage medium
CN107547934A (en) Information transferring method and device based on video
CN112752134B (en) Video processing method and device, storage medium and electronic device
US20140012792A1 (en) Systems and methods for building a virtual social network
CN113656133A (en) Picture display method and device, storage medium and electronic equipment
CN111770388B (en) Content processing method, device, equipment and storage medium
CN104049833A (en) Terminal screen image displaying method based on individual biological characteristics and terminal screen image displaying device based on individual biological characteristics
CN114501103B (en) Live video-based interaction method, device, equipment and storage medium
CN112073738B (en) Information processing method and device
US20140129633A1 (en) Interaction system and investigation method
CN115103054B (en) Information processing method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant