CN109286848B - Terminal video information interaction method and device and storage medium - Google Patents

Terminal video information interaction method and device and storage medium

Info

Publication number
CN109286848B
CN109286848B (application CN201811167565.3A)
Authority
CN
China
Prior art keywords
information
server
pause
picture
pause picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811167565.3A
Other languages
Chinese (zh)
Other versions
CN109286848A (en)
Inventor
邓朔 (Deng Shuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811167565.3A priority Critical patent/CN109286848B/en
Publication of CN109286848A publication Critical patent/CN109286848A/en
Application granted granted Critical
Publication of CN109286848B publication Critical patent/CN109286848B/en
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8126Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H04N21/8133Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program

Abstract

The invention relates to the technical field of video processing, and discloses a method, an apparatus, and a storage medium for terminal video information interaction, which improve the user's experience when watching a video and raise the rate of video information interaction between the terminal and the user. The method comprises the following steps: when a pause command from a user is received, pausing the video currently played by the terminal and obtaining a pause picture; acquiring, from among the elements constituting the video, element information related to the pause picture; sending the element information to a server for recognition and obtaining a recognition result fed back by the server; and associating the recognition result with the pause picture.

Description

Terminal video information interaction method and device and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to an interaction method, an interaction device, and a storage medium for terminal video information.
Background
With the growth of internet resources and the diversification of video playing software, more and more users watch video files on terminals such as tablet computers or mobile devices. While watching, if a user wants to learn about a person, object, or other content appearing in the video, the user has to leave the current video player and open a search engine on the terminal to look up the related information, which greatly degrades the viewing experience.
Therefore, how to carry out video information interaction during video playback, so as to improve the user's viewing experience, is a technical problem that needs to be considered.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, and a storage medium for terminal video information interaction, which improve the user's experience when watching a video and raise the rate of video information interaction between the terminal and the user.
In one aspect, an embodiment of the present invention provides an interaction method for terminal video information, including:
when a pause command from a user is received, pausing the video currently played by the terminal and obtaining a pause picture;
acquiring, from among the elements constituting the video, element information related to the pause picture;
sending the element information to a server for recognition, and obtaining a recognition result fed back by the server; and
associating the recognition result with the pause picture.
In the embodiment of the invention, when the terminal receives a pause command from the user, it pauses the video currently being played. The terminal can then acquire, from among the elements constituting the video, element information related to the pause picture, such as information about the images that form the video picture, or information about the audio that forms the video's sound or about the subtitles. The terminal sends the acquired element information to the server for recognition and obtains the recognition result fed back by the server. Finally, the terminal associates the recognition result with the pause picture shown on its display interface, so that the user can interact with the video information played on the terminal device. This improves the user's experience when watching the video and also raises the rate of video information interaction between the terminal and the user.
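The four steps just described can be sketched as a minimal client-side handler. This is an illustrative sketch, not the patent's implementation: `PauseFrame`, `RecognitionServer`, and `on_pause` are hypothetical names, and the server round-trip is stubbed with a canned result.

```python
from dataclasses import dataclass, field

@dataclass
class PauseFrame:
    image: bytes                       # raw pixels of the pause picture
    audio: bytes = b""                 # audio near the pause point, if any
    annotations: dict = field(default_factory=dict)

class RecognitionServer:
    """Stand-in for the backend that matches element information
    against its database and feeds back a recognition result."""
    def identify(self, element_info: bytes) -> dict:
        return {"label": "person", "attributes": {"name": "unknown"}}

def on_pause(frame: PauseFrame, server: RecognitionServer) -> PauseFrame:
    # Steps 1-2: the video is already paused; gather element information
    # related to the pause picture (image and, optionally, audio).
    element_info = frame.image + frame.audio
    # Step 3: send it to the server and obtain the recognition result.
    result = server.identify(element_info)
    # Step 4: associate the result with the pause picture.
    frame.annotations["recognition"] = result
    return frame
```

In practice the `identify` call would be a network request, and the annotation would drive the display interface rather than a plain dict.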
Optionally, the acquiring, from among the elements constituting the video, of element information related to the pause picture specifically includes: performing image recognition on the pause picture to acquire image information of the pause picture;
the sending of the element information to a server for recognition specifically includes:
sending the image information to the server for recognition; and
the obtaining of the recognition result fed back by the server specifically includes:
acquiring attribute information of persons and/or objects fed back by the server, where the attribute information is obtained by the server by matching image feature information of persons and/or objects extracted from the image information against image feature information of persons and/or objects stored in the server's database.
Optionally, the acquiring, from among the elements constituting the video, of element information related to the pause picture specifically includes:
performing image recognition on the pause picture, acquiring image information of the pause picture, and extracting image feature information of persons and/or objects from the image information;
the sending of the element information to a server for recognition specifically includes:
sending the image feature information of the persons and/or objects to the server for recognition; and
the obtaining of the recognition result fed back by the server specifically includes:
acquiring attribute information of the persons and/or objects fed back by the server, where the attribute information is obtained by the server by matching the received image feature information of the persons and/or objects against image feature information of persons and/or objects stored in the server's database.
Optionally, before performing image recognition on the pause picture, the method further includes:
evaluating the pause picture and determining whether the pause picture meets a preset condition for identification.
Optionally, the evaluating of the pause picture to determine whether it meets the preset condition for recognition specifically includes:
performing face detection on the pause picture to determine whether the pause picture contains a face;
when the pause picture contains a face, determining whether the ratio of the face area to the area of the pause picture is greater than a ratio threshold; and
when the ratio of the face area to the area of the pause picture is smaller than or equal to the ratio threshold, determining that the pause picture does not meet the preset condition.
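The face-area check can be expressed as a one-line predicate. This is a sketch under assumptions: the function name and the default ratio threshold of 0.05 are illustrative, since the patent does not fix a threshold value.

```python
def face_meets_ratio(face_w: int, face_h: int,
                     frame_w: int, frame_h: int,
                     ratio_threshold: float = 0.05) -> bool:
    """True when the detected face occupies a large enough share of the
    pause picture to be worth sending for recognition."""
    face_area = face_w * face_h
    frame_area = frame_w * frame_h
    return face_area / frame_area > ratio_threshold
```

For example, a 400x500 face in a 1920x1080 frame covers roughly 9.7% of the picture and passes, while a 100x100 face covers under 0.5% and is rejected.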
Optionally, the evaluating the pause picture and determining whether the pause picture meets a preset condition for identification specifically includes:
carrying out edge detection on the pause picture to obtain the edge density of the pause picture;
Determining whether the edge density is greater than a density threshold;
when the edge density is greater than the density threshold, determining that the pause picture meets the preset condition, and when the edge density is less than or equal to the density threshold, determining that the pause picture does not meet the preset condition.
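A rough illustration of the edge-density test, assuming a plain gradient count stands in for a real edge detector (e.g. Canny) and an illustrative density threshold of 0.1:

```python
def edge_density(gray, threshold=30):
    """Fraction of pixels whose horizontal or vertical gradient exceeds
    `threshold`, over a grayscale image given as a list of rows."""
    h, w = len(gray), len(gray[0])
    edges = 0
    for y in range(h):
        for x in range(w):
            gx = abs(gray[y][x] - gray[y][x - 1]) if x else 0
            gy = abs(gray[y][x] - gray[y - 1][x]) if y else 0
            if max(gx, gy) > threshold:
                edges += 1
    return edges / (h * w)

def frame_is_rich_enough(gray, density_threshold=0.1):
    """A nearly flat frame has too little detail to be worth recognizing."""
    return edge_density(gray) > density_threshold
```

A uniform frame yields density 0 and fails the condition, while a frame with strong alternating contrast passes easily.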
Optionally, the recognition result fed back by the server is attribute information representing a person and/or an object, and associating the recognition result with the pause picture specifically includes:
displaying the attribute information at a preset position in the pause picture; or
when the attribute information includes position information of the person or object in the pause picture, establishing a human-computer interaction component at the identified position of the person and/or object in the pause picture according to the position information, and associating the component with the attribute information, so that the attribute information is displayed when a command of the user operating the component is received.
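The position-based association can be sketched as hit-testable "hotspots" laid over the pause picture. All names and the `bbox` result format here are assumptions for illustration; the patent does not specify the component API.

```python
from dataclasses import dataclass

@dataclass
class HotSpot:
    x: int
    y: int
    w: int
    h: int
    attributes: dict

def build_hotspots(results):
    """One tappable component per recognized person/object, placed at the
    position the server reported and linked to its attribute information."""
    return [HotSpot(*r["bbox"], r["attributes"]) for r in results]

def on_tap(spots, px, py):
    """Return the attribute info of the component under the tap, if any."""
    for s in spots:
        if s.x <= px < s.x + s.w and s.y <= py < s.y + s.h:
            return s.attributes
    return None
```

Tapping inside a hotspot reveals its attribute information; tapping elsewhere does nothing, matching the on-demand display the claim describes.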
Optionally, the obtaining of element information related to a paused picture in the elements forming the video specifically includes:
acquiring audio information within a preset time length of a time point containing the pause picture in the video;
the sending the element information to a server for identification specifically includes: and sending the audio information to a server for identification.
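A minimal sketch of extracting the audio within a preset time length around the time point of the pause picture, assuming decoded samples and an illustrative 5-second default window:

```python
def audio_window(samples, sample_rate, pause_time_s, window_s=5.0):
    """Slice the decoded audio samples within `window_s` seconds centred
    on the time point of the pause picture."""
    start = max(0, int((pause_time_s - window_s / 2) * sample_rate))
    end = min(len(samples), int((pause_time_s + window_s / 2) * sample_rate))
    return samples[start:end]
```

The clamping at both ends keeps the window valid when the pause falls near the start or end of the video.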
Optionally, the sending the audio information to a server for identification specifically includes:
determining an audio category in the audio information;
and when the audio type is a voice signal or a music signal, sending the audio information to the server so that the server can identify the audio content of the audio information.
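The category gate can be sketched as follows. This is a deliberately crude stand-in: the RMS silence threshold is an assumption, and a real system would use a trained voice/music discriminator to separate speech and music from noise, as the claim implies.

```python
def classify_audio(samples, silence_rms=0.01):
    """Very rough category guess over normalized samples: near-silent
    clips are 'silence'; anything else is treated as speech/music."""
    if not samples:
        return "silence"
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return "silence" if rms < silence_rms else "speech_or_music"

def should_send_to_server(samples):
    # Only speech or music is forwarded for audio content recognition;
    # silence (and, in a real system, noise) is skipped.
    return classify_audio(samples) != "silence"
```

Gating on the client avoids sending useless audio to the server, in line with the bandwidth-saving motivation of the design.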
Optionally, the recognition result fed back by the server is attribute information representing the audio content, and associating the recognition result with the pause picture specifically includes:
displaying the attribute information at a preset position in the pause picture; or
establishing a human-computer interaction component at a preset position in the pause picture and associating the attribute information with the component, so that the attribute information is displayed when a command of the user operating the component is received.
On the other hand, an embodiment of the present invention provides an interactive apparatus for terminal video information, including:
the pause unit is used for pausing the video currently played by the terminal and obtaining a pause picture when a pause command of a user is received;
the acquisition unit is used for acquiring element information related to the pause picture from elements forming the video, sending the element information to a server for identification, and acquiring an identification result fed back by the server;
and the association unit is used for associating the recognition result with the pause picture.
Optionally, the obtaining unit is further configured to:
carrying out image recognition on the pause picture to acquire image information of the pause picture;
sending the image information to the server for identification; and
and acquiring the attribute information of the people and/or the objects fed back by the server, wherein the attribute information of the people and/or the objects is acquired by matching the image characteristic information of the people and/or the objects extracted from the image information by the server with the image characteristic information of the people and/or the objects stored in the server database.
Optionally, the obtaining unit is further configured to:
carrying out image recognition on the pause picture, acquiring image information of the pause picture, and extracting image characteristic information of people and/or objects from the image information;
sending the image characteristic information of the people and/or the objects to a server for identification; and
and acquiring the attribute information of the people and/or the objects fed back by the server, wherein the attribute information of the people and/or the objects is acquired by matching the image characteristic information of the people and/or the objects extracted from the image information by the server with the image characteristic information of the people and/or the objects stored in the server database.
Optionally, the obtaining unit is further configured to: evaluating the pause picture and determining whether the pause picture meets a preset condition for identification.
Optionally, the obtaining unit is further configured to:
performing face detection on the pause picture to determine whether the pause picture comprises a face;
when the pause picture comprises a face, determining whether the ratio of the face area to the pause picture area is greater than a proportional threshold;
when the ratio of the face area to the pause picture area is smaller than or equal to the ratio threshold, the pause picture is determined not to meet the preset condition.
Optionally, the obtaining unit is further configured to:
carrying out edge detection on the pause picture to obtain the edge density of the pause picture;
determining whether the edge density is greater than a density threshold;
when the edge density is greater than the density threshold, determining that the pause picture meets the preset condition, and when the edge density is less than or equal to the density threshold, determining that the pause picture does not meet the preset condition.
Optionally, the associating unit is further configured to:
displaying the attribute information at a preset position in the pause picture; or
when the attribute information includes position information of a person or an object in the pause picture, establishing a human-computer interaction component at the identified position of the person and/or object in the pause picture according to the position information, and associating the component with the attribute information, so that the attribute information is displayed when a command of the user operating the component is received.
Optionally, the obtaining unit is further configured to:
acquiring audio information within a preset time length of a time point containing the pause picture in the video;
the sending the element information to a server for identification specifically includes: and sending the audio information to a server for identification.
Optionally, the obtaining unit is further configured to:
determining an audio category in the audio information;
and when the audio type is a voice signal or a music signal, sending the audio information to the server so that the server can identify the audio content of the audio information.
Optionally, the associating unit is further configured to:
displaying the attribute information at a preset position in the pause picture; or
establishing a human-computer interaction component at a preset position in the pause picture and associating the attribute information with the component, so that the attribute information is displayed when a command of the user operating the component is received.
In another aspect, an embodiment of the present invention provides an information processing apparatus, including at least one processor and at least one memory, where the memory stores a computer program, and when the program is executed by the processor, the processor is caused to execute the steps of the above method for interacting with terminal video information.
In another aspect, an embodiment of the present invention provides a storage medium, where the storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to perform the steps of the method for interacting with video information of a terminal as described above.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
fig. 2 is an interaction flowchart of terminal video information according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for evaluating a pause picture according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for evaluating a pause picture according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating image recognition of a paused image according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating another exemplary method for image recognition of a paused image according to an embodiment of the present invention;
fig. 7 is a schematic diagram of identifying a person and an object in an image according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another method for identifying people and objects in an image according to an embodiment of the present invention;
fig. 9 is a schematic diagram of a further method for identifying a person and an object in an image according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a further method for identifying a person or an object in an image according to an embodiment of the present invention;
FIG. 11 is a flowchart illustrating an exemplary embodiment of the present invention for identifying audio of a paused screen;
fig. 12 is a schematic diagram of an interactive device for terminal video information according to an embodiment of the present invention;
fig. 13 is a schematic diagram of another interactive apparatus for terminal video information according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments described herein without inventive effort fall within the protection scope of the present invention.
In the prior art, if a user wants to learn about a person, object, or other content appearing in a video while it is playing, the user has to leave the current video player and open a search engine on the terminal to look up the related information. This degrades the user experience, and the slow retrieval of the related information degrades it further.
To this end, the embodiment of the present invention provides an interaction method for terminal video information. When a pause command from the user is received, the video currently played by the terminal is paused, and element information related to the pause picture is acquired from among the elements constituting the video. The acquired element information may be information about the image that constitutes the pause picture, information about the audio synchronized with the pause picture, or both. The terminal sends the acquired element information to a server, which recognizes the image content of the pause picture and/or the audio content synchronized with it. The recognized content is then associated with the pause picture, allowing the user to interact, on the terminal device, with the image content of the pause picture and/or the audio content synchronized with it. This improves the user's experience when watching the video and also raises the rate of video information interaction between the terminal and the user.
The interaction method for terminal video information in the embodiment of the present invention may be applied to the application scenario shown in fig. 1, which includes a terminal 10 and a server 11. The terminal 10 is any intelligent electronic device with a video playing function that can run programs and automatically process large amounts of data at high speed, such as a smartphone or a tablet computer. The server 11 may be a single server, a server cluster formed by multiple servers, or a cloud computing center. The terminal 10 communicates with the server 11 through a network, which may be any communication network such as a local area network, a wide area network, or the mobile internet. In this scenario, the terminal 10 may have any type of video playing software installed and play videos through it. If a pause command from the user is received while a video is playing, the terminal video information interaction method provided by the embodiment of the present invention may be applied, as described in detail below.
It should be noted that the above-mentioned application scenarios are only presented to facilitate understanding of the spirit and principles of the present invention, and the present invention is not limited in this respect. Rather, embodiments of the present invention may be applied in any scenario where applicable.
The following describes an interaction method for terminal video information provided by an embodiment of the present invention with reference to an application scenario shown in fig. 1.
As shown in fig. 2, an interaction method for terminal video information provided in an embodiment of the present invention includes:
step 101: and when a pause command of a user is received, pausing the video currently played by the terminal.
In the embodiment of the present invention, the terminal may receive the user's pause command in several ways. For example, the pause command may be obtained by receiving a click on the video picture being played; by receiving a press of a button provided in the terminal specifically for pausing the currently played video; or by receiving the user's voice and recognizing the spoken command.
After receiving the user's pause command, the terminal pauses the video it is currently playing, and its display interface shows the pause picture. The pause picture is the frame of the video displayed on the terminal's display interface at the moment the terminal pauses the currently played video.
Step 102: element information related to a paused picture among elements constituting a video is acquired.
In the embodiment of the invention, the elements constituting the video are the images that form the video picture and the audio that forms the video sound. When the video is played, the images change at more than 24 frames per second to form a continuous video picture, and the audio is synchronized with the images so that the video's sound plays in step with the continuous picture.
Therefore, in the embodiment of the present invention, after pausing the currently played video, the terminal acquires, from among the elements constituting the video, the element information related to the pause picture. This element information may be information about the image that constitutes the pause picture, information about the audio synchronized with the pause picture, or both.
For example, when the paused screen has no synchronized audio, the terminal may acquire information related to the image related to the paused screen, and the element information in step 102 may be information related to the image related to the paused screen; when the paused screen has synchronized audio, the element information in step 102 may include both information related to an image related to the paused screen and information related to audio synchronized with the paused screen.
Step 103: sending the element information to a server for recognition and obtaining the recognition result fed back by the server.
In the embodiment of the invention, in order to reduce the hardware requirements on the terminal and speed up the video information interaction, after acquiring the information related to the image constituting the pause picture and/or the information related to the audio synchronized with the pause picture, the terminal may send that information to a background server (hereinafter simply the server) for recognition; through the server's feedback, the terminal obtains the recognition result for the image-related information and/or the recognition result for the audio-related information.
The two cases, namely the terminal sending the acquired image-related information of the pause picture to the server for recognition, and the terminal sending the acquired information related to the audio synchronized with the pause picture to the server for recognition, are described in detail below.
Step 104: associating the recognition result with the pause picture.
In the embodiment of the present invention, the terminal receives the recognition result fed back by the server. For example, when the terminal sends the image-related information of the pause picture to the server for recognition, the terminal correspondingly receives the recognition result of that image-related information; when the terminal sends the information related to the audio synchronized with the pause picture to the server for recognition, the terminal correspondingly receives the recognition result of the audio content.
In the embodiment of the present invention, the terminal associates the recognition result of the image-related information and the recognition result of the audio content fed back by the server with the pause picture it displays. The association may take multiple forms. For example, the recognition results may be displayed at a preset position of the displayed pause picture, where the preset position may be any position in the pause picture; alternatively, a virtual key may be displayed on the terminal and associated with the recognition result of the image-related information and the recognition result of the audio content, so that the associated recognition content is displayed when the user operates the virtual key.
Thus, with this method, when the terminal receives a user's pause command, it pauses the currently played video, acquires the element information related to the pause picture from the elements constituting the video, such as information about the image forming the video picture and information about the audio forming the video sound or the subtitles, sends the acquired element information to the server for recognition, and obtains the recognition result fed back by the server.
Alternatively, in the embodiment of the present invention, in order to more accurately recognize whether the user issued the command to pause the currently played video because the user wants to know about the people, things, or other content appearing in the pause picture, the pause picture may be evaluated before step 102 is executed, to determine whether it satisfies a preset condition for recognition.
In practical applications, the more information the pause picture contains, the greater the probability that the user issued the pause command because the user wants to know about the content in the pause picture. Therefore, in the embodiment of the present invention, after the terminal acquires the user's pause command and pauses the currently played video, it may first evaluate the pause picture (also referred to as value evaluation of the pause picture) to estimate the amount of information the pause picture contains, then use the evaluation result to estimate the probability that the pause command was issued out of interest in the picture's content, and finally decide whether to recognize the pause picture.
Thus, the preset condition may be set as follows: judge, from the evaluation result, whether the amount of information contained in the pause picture exceeds a threshold. If so, the probability that the user issued the pause command out of interest in the picture's content is considered high, and it is determined that the pause picture will be recognized; otherwise, that probability is considered low, and it is determined that the pause picture will not be recognized. Therefore, in the embodiment of the present invention, by checking this preset condition and executing step 102 only when the evaluation result satisfies it, the intent behind the user's command to pause the currently played video can be judged more accurately before recognition is performed to realize the video information interaction, and unnecessary consumption of terminal resources can also be reduced.
In the embodiment of the present invention, the manner of evaluating the pause picture can be set flexibly; two preferred manners are listed below:
one way, the flow shown in fig. 3, includes:
step 201: carrying out face detection on the pause picture;
step 202: determining whether the pause picture comprises a human face or not according to the detection result, if so, executing a step 203, and otherwise, ending the operation;
step 203: determining whether the ratio of the face area to the pause picture area is greater than a proportional threshold, if so, executing step 204, otherwise, executing step 205;
step 204: determining that the pause picture meets a preset condition for identification;
step 205: determining that the pause picture does not meet the preset condition for identification.
In this manner of the embodiment of the present invention, the evaluation is based on the idea that viewers are likely to be interested in the people who appear in a video: the more prominently people appear in the pause picture, the larger the amount of information and the higher the probability of user interest. A face detection method is therefore adopted to perform value evaluation on the pause picture.
Face detection technology searches any given image according to a certain strategy to determine whether it contains a face, and if so, returns the position, size, and posture of the face.
In the embodiment of the invention, after face detection is performed on the pause picture, whether the pause picture includes a face can be determined from the detection result. If no face is included, the pause picture is considered to contain little information and the operation can end; if the detection result shows that a face is included, it can further be determined whether the ratio of the face area in the pause picture to the pause picture area is greater than the proportional threshold, which further improves the accuracy of the value evaluation of the pause picture.
In the embodiment of the present invention, even when the pause picture includes a face, if the ratio of the face area to the pause picture area is too small, not only is the amount of information in the pause picture small, but the face may also be impossible to identify with face recognition technology. A proportional threshold may therefore be set in advance: when the ratio of the face area to the pause picture area is greater than the proportional threshold, the pause picture is determined to satisfy the condition for recognition and step 102 is executed; otherwise, the pause picture is determined not to satisfy the condition for recognition and the operation may end.
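Assuming a face detector has already returned bounding boxes (x, y, w, h), the decision of steps 201 to 205 can be sketched as follows; the 0.05 proportional threshold is an illustrative value, not one fixed by the embodiment:

```python
def passes_face_evaluation(face_boxes, frame_w, frame_h, ratio_threshold=0.05):
    """Steps 201-205: the pause picture qualifies for recognition only if it
    contains a face whose area exceeds ratio_threshold of the picture area."""
    if not face_boxes:   # step 202: no face -> little information, end operation
        return False
    frame_area = frame_w * frame_h
    # step 203: compare each face's area ratio against the proportional threshold
    return any((w * h) / frame_area > ratio_threshold for (x, y, w, h) in face_boxes)
```

A large, close-up face passes the check, while an empty frame or a tiny background face (too small to recognize reliably) does not.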
Another way, as shown in fig. 4, includes:
step 301: carrying out edge detection on the pause picture to obtain the edge density of the pause picture;
step 302: determining whether the edge density is greater than a density threshold, if so, executing step 303, otherwise, executing step 304;
step 303: determining that the pause picture meets a preset condition for identification;
step 304: determining that the pause picture does not meet the preset condition for identification.
In this manner of the embodiment of the present invention, the evaluation exploits the fact that, in imaging, the information of an image is often carried by objects with complex texture: images containing such objects tend to be rich in edges and high-frequency information. The higher the edge density of the pause picture, the richer its information content, so the value of the pause picture is evaluated using image edge detection technology.
An image edge is a discontinuity in the local features of an image. Edges exist widely between one target and another, between an object and the background, and between one region and another, so the higher the edge density of an image, the richer its information content.
In the embodiment of the invention, when image edge detection is applied to the pause picture, the edges of the pause picture are first extracted by edge detection, and morphological filtering is then applied to remove noise, yielding the binary edge map I_bin of the pause picture. The edge density of the pause picture is then obtained with formula (1):

D = (1 / (m × n)) × Σ_{i=1..m} Σ_{j=1..n} I_bin(i, j)    (1)

where m is the width of the image and n is the height of the image.
In the embodiment of the present invention, a density threshold may be preset for comparison with the computed edge density of the pause picture. When the edge density of the pause picture is greater than the density threshold, the pause picture is considered information-rich and to satisfy the condition for recognition, and the above step 102 is executed; otherwise, the pause picture is determined not to satisfy the condition for recognition and the operation may end. In the embodiment of the invention, extensive experiments have shown that the accuracy of the pause picture value evaluation is highest when the density threshold is set to about 0.3 (e.g., 0.3). Therefore, in the embodiment of the present invention, the density threshold may be set to 0.3.
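Given a binary edge map I_bin (for example, the output of edge detection followed by morphological filtering), formula (1) and the 0.3 threshold reduce to a few lines; plain nested lists stand in for the image here as a simplification:

```python
def edge_density(binary_edges):
    """Formula (1): fraction of edge pixels in the m x n binary edge map I_bin."""
    n = len(binary_edges)     # height (number of rows)
    m = len(binary_edges[0])  # width  (number of columns)
    return sum(sum(row) for row in binary_edges) / (m * n)

def qualifies_for_recognition(binary_edges, density_threshold=0.3):
    """The pause picture satisfies the recognition condition when its edge
    density exceeds the preset density threshold (0.3 per the experiments)."""
    return edge_density(binary_edges) > density_threshold
```

In practice the binary map would come from an edge detector plus morphological filtering, as described above; only the density computation and threshold test are shown.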
It should be noted that, in practice, the two value evaluation manners described above may be adopted simultaneously to evaluate the value of the pause picture, or may be combined with other value evaluation manners.
Alternatively, in this embodiment of the present invention, before step 102 is performed, it may further be determined whether a command from the user to recognize the pause picture has been received.
In the embodiment of the present invention, before step 102 is executed, besides the evaluation described above, in which the terminal automatically infers that the user issued the pause command out of interest in the people, things, or other content appearing in the pause picture, the terminal's recognition of the pause picture may also be triggered actively by determining whether a command from the user to recognize the pause picture has been received.
For example, when the terminal receives the user's pause command and pauses the currently played video, an operation key may be displayed in the pause picture. The operation key serves as the interface through which the user tells the terminal to recognize the pause picture: when the user operates the operation key, for example by clicking it, the terminal obtains the user's instruction to recognize the pause picture and may then execute the above step 102; otherwise, the pause picture is not recognized and the operation may end.
Alternatively, in this embodiment of the present invention, if the element information related to the pause picture that the terminal acquires from the elements constituting the video includes information related to the image constituting the pause picture, step 102 and step 103 in this embodiment may be executed specifically according to the flow shown in fig. 5, or according to the flow shown in fig. 6.
The process shown in fig. 5 includes:
step 401: carrying out image recognition on the pause picture to acquire image information of the pause picture;
step 402: sending the image information to a server for identification;
step 403: and acquiring the attribute information of the person and/or object, which is acquired by the server through matching according to the image characteristic information of the person and/or object extracted from the image information.
The process shown in fig. 6 includes:
step 404: carrying out image recognition on the pause picture to acquire image information of the pause picture;
step 405: extracting image characteristic information of people and/or objects from the image information;
step 406: sending the image characteristic information of people and/or objects to a server for identification;
step 407: and acquiring the attribute information of the person and/or the object, which is acquired by the server through matching according to the received image characteristic information of the person and/or the object.
In the embodiment of the present invention, after receiving the user's pause command and pausing the played video, the terminal may perform image recognition on the pause picture to obtain its image information, which may be the information of the frame of the video corresponding to the pause picture or a directly captured image of the pause picture. After obtaining the image information of the pause picture, the terminal may send it directly to the server, as in step 402 of fig. 5; after receiving the image information, the server identifies the image in it using image recognition technology so as to identify the people or objects it contains.
Assuming that the image in the image information is an image of Zhang San's concert, as shown in fig. 7, the server may perform color and saliency analysis on the image (saliency algorithm processing) and pixel-wise segmentation (image segmentation), then combine the saliency analysis information to obtain the foreground information of the image, and compare that foreground information with the information of people and objects stored in the server's image database, i.e., perform face recognition and person attribute determination, and object recognition and object attribute determination, to identify who the people in the image are and what the objects in the image are.
The image database in the server stores a large amount of face feature information of different people together with the corresponding person attribute information, and also stores a large amount of feature information of different objects together with each object's attribute information. Face feature information refers to descriptions of the shape of the facial organs and of the distances between them; the facial organs mainly comprise the eyes, nose, mouth, chin, and so on. The feature information of an object refers to a description of the object's shape, structure, and the like.
The server can extract a face in fig. 7, say face No. 1, perform face recognition on it, extract the feature information of face No. 1, and compare that feature information with the face feature information in the server's image database to identify whose face it is. Assuming face No. 1 is Zhang San, the server may further look up Zhang San's attribute information in the image database, which may include Zhang San's name, age, height, weight, major works, and so on. On the same principle, the server may identify the other faces in the image, such as face No. 2 in fig. 7, and obtain the attribute information of the corresponding person. The terminal then obtains, from the server's feedback, the attribute information of the persons corresponding to face No. 1 and face No. 2 in fig. 7.
Similarly, for the objects in the image, the server may extract object No. 3 and object No. 4 in fig. 7 for object recognition: it first extracts their feature information and compares it with the object feature information in its image database to recognize what object No. 3 and object No. 4 are. Having recognized that both objects are microphones, the server may further look up the microphones' attribute information in the image database, which may include their price, classification, a description of the microphone's operating principle, and so on. On the same principle, the server may identify other objects in the image and obtain their attribute information, and the terminal then obtains the attribute information of the objects in the image from the server's feedback.
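The matching step described above (comparing extracted feature information against the entries in the image database) can be sketched as a nearest-neighbor lookup; the feature vectors, the Euclidean distance metric, and the max_dist cutoff are all illustrative assumptions, since the embodiment does not fix a matching algorithm:

```python
def identify(feature, database, max_dist=0.6):
    """Return the attribute information of the database entry whose stored
    feature vector is closest to the query feature, or None if nothing is
    within max_dist (i.e., the face or object is not recognized)."""
    best, best_d = None, max_dist
    for stored_feature, attrs in database:
        # Euclidean distance between the query and a stored feature vector
        d = sum((a - b) ** 2 for a, b in zip(feature, stored_feature)) ** 0.5
        if d < best_d:
            best_d, best = d, attrs
    return best
```

With database entries for Zhang San and a microphone, a query vector close to Zhang San's stored vector returns Zhang San's attributes, mirroring the fig. 7 walkthrough; a query far from every entry returns nothing.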
It should be noted that, in practical applications, the image in the image information obtained in step 401 or step 404 may contain a face but no object of interest, or an object but no face. Correspondingly, when the image contains a face but no object, the face can be identified as described above and the attribute information of the corresponding person obtained; when the image contains an object but no face, the object can be identified as described above and the identified object's attribute information obtained.
In the flow shown in fig. 6, after acquiring the image information of the pause picture, the terminal may pre-process the image in the image information to extract the image feature information of the people and/or objects in it, send the extracted feature information of the people and the objects to the server for image recognition so as to identify who the people and what the objects in the image are, and obtain, from the server's feedback, the attribute information of the persons corresponding to the faces in the image and the attribute information of the objects.
In an alternative mode of the embodiment of the present invention, the terminal may associate the obtained attribute information of the people and the objects with the pause picture in at least the following two ways.
One way of association is:
displaying the attribute information at a preset position in the pause picture.
In the embodiment of the present invention, the terminal may preset a position in the pause picture for displaying the recognition result; the preset position may be any position in the displayed pause picture, and the terminal displays there the recognition result fed back by the server, i.e., the attribute information of the people and the attribute information of the objects.
For example, as shown in fig. 8, the preset position may be set beside the corresponding person or object in the pause picture, and the obtained attribute information can be displayed in that area. Assuming that the attribute information of the person corresponding to face No. 1 in fig. 8, Zhang San, includes Zhang San's name, height, and age, that the attribute information of the person corresponding to face No. 2, Li Si, includes Li Si's name, height, and age, and that the attribute information of the object is its name, e.g., the microphone in fig. 8, then Zhang San's name, height, and age, Li Si's name, height, and age, and the name of the microphone can be displayed beside the corresponding person or object in the pause picture shown in fig. 8, thereby realizing the interaction between the user and the video information played on the terminal device.
Another association way is as follows:
establishing a human-computer interaction component at the position of the identified person and/or object in the pause picture according to the position information of the identified person or object in the pause picture; and establishing association between the human-computer interaction component and the attribute information so as to display the associated attribute information when a command of operating the human-computer interaction component by a user is received.
In the embodiment of the present invention, the terminal may further establish a human-computer interaction component, such as a UI component, at the position of each identified person and at the position of each identified object in the pause picture, and associate the attribute information of each identified person or object with the component established at that person's or object's position. The user can then obtain the associated attribute information of a person or an object by operating the corresponding human-computer interaction component established in the pause picture.
For example, as shown in fig. 9, the terminal establishes a human-computer interaction component at the identified position of Zhang San, at the position of Li Si, and at the position of the microphone in the pause picture. The style of the established component can be set flexibly; in fig. 9 it is a semi-transparent ellipse. When the user clicks a human-computer interaction component in fig. 9, say the component over Zhang San, the terminal can display Zhang San's attribute information, i.e., Zhang San's name, height, and age, on the pause picture as shown in fig. 10, thereby realizing the interaction between the user and the video information played on the terminal device.
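Dispatching a tap on the pause picture to the right component reduces to hit-testing the tap point against each component's bounding box; the tuple layout below is an assumption for illustration, since the embodiment does not prescribe a component data structure:

```python
def component_at(components, x, y):
    """Return the attribute information associated with the human-computer
    interaction component whose bounding box contains the tap point (x, y),
    or None when the tap misses every component."""
    for (bx, by, bw, bh), attrs in components:
        if bx <= x < bx + bw and by <= y < by + bh:
            return attrs
    return None
```

A tap inside Zhang San's component returns Zhang San's attributes for display, as in fig. 10; a tap elsewhere on the pause picture returns nothing and the picture is left unchanged.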
Alternatively, in this embodiment of the present invention, if the element information related to the pause picture that the terminal acquires from the elements constituting the video includes information related to the audio synchronized with the pause picture, step 102 and step 103 in this embodiment may also be executed specifically according to the flow shown in fig. 11.
The process shown in fig. 11 includes:
step 501: acquiring audio information within a preset time length of a time point containing the pause picture in the video;
step 502: judging the audio type in the audio information, if the audio type is a voice signal or a music signal, executing step 503, otherwise, ending the process;
step 503: sending the audio information to the server so that the server identifies its audio content.
In the embodiment of the present invention, when the pause picture has synchronized audio, the terminal may further obtain information related to that audio. To identify the audio synchronized with the pause picture more accurately, the obtained information may specifically be the audio within a preset duration containing the time point of the pause picture, where the time point of the pause picture is the time node at which the video was playing when it was paused.
For example, when the total duration of the video is 1 hour and 20 minutes and the pause picture is played at the 50th minute, the time point of the pause picture is the 50th minute, and the audio information for a preset duration containing the 50th minute can be acquired. The preset duration can be set flexibly according to actual needs, for example to 3 minutes, in which case the audio corresponding to minutes 49 to 51, minutes 50 to 52, or minutes 48 to 50 may be acquired.
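The window selection above can be sketched as follows, using a 2-minute span centered on the pause point (so a pause at minute 50 yields the minutes 49 to 51 window of the example) and clamping the window to the video's duration; the span length and the centering are illustrative choices, since the embodiment leaves the preset duration flexible:

```python
def audio_window(pause_s, total_s, span_s=120):
    """Return (start, end) in seconds of a span_s-second audio window
    containing the pause time point, clamped to stay inside [0, total_s]."""
    start = max(0, min(pause_s - span_s // 2, total_s - span_s))
    return start, start + span_s
```

Pausing an 80-minute video at minute 50 yields the window from minute 49 to minute 51; pausing near the very start or end slides the window inward so it never leaves the video.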
In the embodiment of the present invention, taking the audio information for minutes 49 to 51 as an example: after obtaining it, the terminal may pre-process the audio information to identify its audio category. Generally, audio is divided into voice signals (e.g., human dialogue), music signals (e.g., singing, musical instruments), and noise, and the pre-processing may classify the audio information with a machine-learning-based speech recognition algorithm to identify its audio category.
If the audio category of the audio information is identified as a voice signal or a music signal, the terminal may send the audio information to the server, and the server identifies its audio content. For example, when the audio category is a music signal, the audio content identified by the server includes music-related attribute information, such as the name of the corresponding song, its lyrics, and its creator; when the audio category is a voice signal, the dialogue-related attribute information identified by the server may be the specific content of the dialogue. In practical applications the dialogue content may be presented as text, so the dialogue-related attribute information identified by the server may specifically be the text corresponding to the dialogue.
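A sketch of how the terminal might shape the server's feedback for display; the category labels and result field names (`song`, `creator`, `text`) are assumptions, since the embodiment does not fix a wire format:

```python
def format_audio_result(category, result):
    """Render the server's audio recognition result for the pause picture:
    a music signal yields song attributes, a voice signal the dialogue text."""
    if category == "music":
        return "{} - {}".format(result["song"], result["creator"])
    if category == "speech":
        return result["text"]
    # Noise never reaches this point: step 502 ends the flow before sending.
    raise ValueError("noise is filtered out before the audio reaches the server")
```

The resulting string is what would be placed at the preset position of the pause picture or behind the virtual key described below.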
Similarly, the server may send the identified audio content to the terminal, and the terminal may associate the identified audio content with the pause picture. As described above, the association may display the identified audio content at a preset position of the displayed pause picture, or may display a virtual key on the terminal, associate the virtual key with the identified audio content, and display that content when the user operates the key, thereby realizing the interaction between the user and the video information played on the terminal device; the description is not repeated here.
In practical application, the method for interacting the terminal video information in the embodiment of the present invention may be applied to any terminal device related to video playing, and when the terminal device plays a video, the interaction between the user and the video information played in the terminal device may be implemented according to the method for interacting the terminal video information provided in the embodiment of the present invention, so as to improve the video watching experience of the user.
In practical applications, a program or app dedicated to implementing the interaction between the user and the video information played on the terminal device may also be developed in a programming language such as C, C++, or Java, based on the method for interacting terminal video information provided in the embodiment of the present invention; when the terminal plays a video, the program or app is invoked to complete the interaction between the user and the video information.
Based on the same inventive concept, an embodiment of the present invention provides an apparatus for interacting terminal video information, where specific implementation of a method for interacting terminal video information of the apparatus may refer to the description of the foregoing method embodiment, and repeated parts are not repeated, and the apparatus, as shown in fig. 12, includes:
a pause unit 20, configured to pause a video currently played by the terminal when a pause command of the user is received;
an obtaining unit 21, configured to obtain element information related to a paused picture from elements constituting the video, send the element information to a server for recognition, and obtain a recognition result fed back by the server;
an associating unit 22, configured to associate the recognition result in the pause screen.
Optionally, the obtaining unit is further configured to:
carrying out image recognition on the pause picture to acquire image information of the pause picture;
sending the image information to a server for identification; and
acquiring the attribute information of the person and/or object, obtained by the server through matching according to the image feature information of the person and/or object extracted from the image information.
Optionally, the obtaining unit is further configured to:
performing image recognition on the pause picture, acquiring image information of the pause picture, and extracting image characteristic information of people and/or objects from the image information;
sending the image characteristic information of the people and/or objects to a server for identification; and
acquiring the attribute information of the people and/or objects, which is obtained by the server through matching based on the received image characteristic information of the people and/or objects.
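The two optional embodiments above differ in where feature extraction happens: in the first, the terminal sends the whole image and the server extracts features; in the second, the terminal extracts features and sends only those. A schematic contrast (all function names are hypothetical, and a toy length-based "feature" stands in for a real extractor):

```python
def extract_features(image_bytes):
    """Hypothetical stand-in for extracting person/object feature information."""
    return {"len": len(image_bytes)}  # toy 'feature' for illustration

def identify_server_side(image_bytes, server):
    # Variant 1: send the whole image; the server extracts features
    # and matches them against its database.
    return server(image_bytes)

def identify_client_side(image_bytes, server):
    # Variant 2: extract features on the terminal and send only the
    # (typically much smaller) feature information to the server.
    return server(extract_features(image_bytes))

server = lambda payload: {"matched": True, "payload": payload}
r1 = identify_server_side(b"jpegdata", server)
r2 = identify_client_side(b"jpegdata", server)
print(r2)  # {'matched': True, 'payload': {'len': 8}}
```

The trade-off sketched here is bandwidth versus terminal compute: variant 2 uploads less data but requires the terminal to run the extractor locally.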
Optionally, the obtaining unit is further configured to: evaluate the pause picture and determine that the pause picture meets the preset condition for identification.
Optionally, the obtaining unit is further configured to:
performing face detection on the pause picture to determine whether the pause picture comprises a face; and
when the pause picture comprises a face, determining that the ratio of the face area to the pause picture area is greater than a proportional threshold included in the preset condition.
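As a minimal sketch of this face-area check (not part of the patent text; the 0.05 threshold and the bounding-box format are assumed values, and face detection itself is taken as already done by some detector):

```python
def face_meets_condition(face_boxes, frame_width, frame_height, ratio_threshold=0.05):
    """Return True if any detected face occupies a large enough share of the
    pause picture, per the proportional threshold in the preset condition.

    face_boxes: list of (x, y, w, h) bounding boxes from a face detector.
    """
    if not face_boxes:
        return False  # no face: this criterion is not met
    frame_area = frame_width * frame_height
    for (x, y, w, h) in face_boxes:
        if (w * h) / frame_area > ratio_threshold:
            return True
    return False

# A 200x200 face in a 1280x720 frame covers ~4.3% of the area,
# which is below the assumed 5% threshold.
print(face_meets_condition([(100, 50, 200, 200)], 1280, 720))  # False
```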
Optionally, the obtaining unit is further configured to:
performing edge detection on the pause picture to obtain the edge density of the pause picture; and
determining that the edge density is greater than a density threshold included in the preset condition.
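A possible reading of this edge-density check, sketched for illustration only: count the fraction of pixels whose gradient magnitude exceeds a threshold. A simple finite-difference gradient stands in here for a full edge detector such as Canny, and both thresholds are assumed values, not ones given in the patent:

```python
import numpy as np

def edge_density(gray, grad_threshold=32):
    """Fraction of pixels of a grayscale pause picture (2-D uint8 array)
    whose gradient magnitude exceeds grad_threshold."""
    g = gray.astype(np.float32)
    gx = np.abs(np.diff(g, axis=1))  # horizontal gradient
    gy = np.abs(np.diff(g, axis=0))  # vertical gradient
    edges = (gx[:-1, :] + gy[:, :-1]) > grad_threshold
    return float(edges.mean())

def meets_density_condition(gray, density_threshold=0.05):
    return edge_density(gray) > density_threshold

# A flat frame has no edges; a checkerboard is edges everywhere.
flat = np.zeros((64, 64), dtype=np.uint8)
checker = ((np.indices((64, 64)).sum(axis=0) % 2) * 255).astype(np.uint8)
print(meets_density_condition(flat), meets_density_condition(checker))
```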
Optionally, the associating unit is further configured to:
displaying the attribute information at a preset position in the pause picture; or
when the attribute information comprises the position information of the identified person or object in the pause picture, establishing a man-machine interaction component at the position of the identified person and/or object in the pause picture according to the position information, and associating the component with the attribute information, so that the attribute information is displayed when a command of the user operating the component is received.
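A toy sketch (hypothetical class and names, not from the patent) of associating a tappable component at the identified position with its attribute information, so that operating the component displays the information:

```python
class Overlay:
    """Maps man-machine interaction hotspots on the pause picture to attribute info."""

    def __init__(self):
        self.components = []  # list of ((x, y, w, h), attribute_info)

    def add_component(self, position, attributes):
        # Establish a component at the identified person/object position
        # and associate it with the attribute information.
        self.components.append((position, attributes))

    def on_tap(self, tx, ty):
        """Return the attribute info of the component under the tap, if any."""
        for (x, y, w, h), attrs in self.components:
            if x <= tx < x + w and y <= ty < y + h:
                return attrs
        return None

overlay = Overlay()
overlay.add_component((100, 50, 200, 200), {"name": "actor A"})
print(overlay.on_tap(150, 100))  # tap inside the hotspot -> {'name': 'actor A'}
```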
Optionally, the obtaining unit is further configured to:
acquiring audio information within a preset time length around the time point of the pause picture in the video;
the sending the element information to a server for identification specifically includes: sending the audio information to the server for identification.
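Picking the audio samples "within a preset time length" of the pause point can be sketched as below; centring the window on the pause time is an assumption, since the patent does not fix the window's placement:

```python
def audio_window(pause_time_s, preset_len_s, sample_rate, total_samples):
    """Sample index range for audio around the paused frame's timestamp,
    clamped to the bounds of the video's audio track."""
    center = int(pause_time_s * sample_rate)
    half = int(preset_len_s * sample_rate / 2)
    start = max(0, center - half)
    end = min(total_samples, center + half)
    return start, end

# Pause at 12.5 s with an assumed 4 s window over 16 kHz audio:
print(audio_window(12.5, 4.0, 16000, 16000 * 3600))  # (168000, 232000)
```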
Optionally, the obtaining unit is further configured to:
determining an audio category in the audio information; and
when the audio category is a voice signal or a music signal, sending the audio information to the server so that the server identifies the audio content of the audio information.
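The patent does not specify how the audio category is determined. One rough, purely illustrative heuristic uses short-time energy and zero-crossing rate; the thresholds and the category labels here are assumptions, not the disclosed method:

```python
import numpy as np

def classify_audio(samples, energy_floor=1e-4):
    """Very rough category guess: 'silence', 'voice', or 'music'."""
    x = np.asarray(samples, dtype=np.float32)
    energy = float(np.mean(x ** 2))
    if energy < energy_floor:
        return "silence"  # nothing worth sending to the server
    zcr = float(np.mean(np.abs(np.diff(np.sign(x))) > 0))
    # Speech tends to show a higher, more irregular zero-crossing rate
    # than sustained musical tones at comparable energy.
    return "voice" if zcr > 0.2 else "music"

silence = np.zeros(16000)
tone = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # 220 Hz tone, 1 s
print(classify_audio(silence), classify_audio(tone))
```

Only when the result is "voice" or "music" would the audio be sent on for server-side content identification.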
Optionally, the associating unit is further configured to:
displaying the attribute information at a preset position in the pause picture; or
establishing a man-machine interaction component at a preset position in the pause picture, and associating the attribute information with the component, so that the attribute information is displayed when a command of the user operating the component is received.
Based on the same inventive concept, an embodiment of the present invention provides an information processing apparatus, as shown in fig. 13, including at least one processor 30 and at least one memory 31, where the memory 31 stores a computer program, and when the program is executed by the processor 30, the processor 30 is caused to execute the steps of the interaction method of terminal video information as described above.
Based on the same inventive concept, an embodiment of the present invention provides a storage medium, wherein the storage medium stores computer instructions, and when the computer instructions are executed on a computer, the computer is caused to execute the steps of the method for interacting with terminal video information as described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A method for interacting terminal video information, characterized by comprising the following steps:
when a pause command of a user is received, pausing a video played by a terminal at present and obtaining a pause picture;
evaluating the pause picture and determining that the pause picture meets a preset condition for identification; wherein the preset condition is used for indicating that the amount of information contained in the pause picture is greater than a threshold value, and the preset condition comprises at least one of the following two conditions or a combination thereof: the ratio of the face area to the pause picture area is greater than a proportional threshold, or the edge density of the pause picture is greater than a density threshold;
acquiring element information related to the pause picture from the elements constituting the video; wherein the element information comprises information related to audio synchronized with the pause picture;
sending the element information to a server for identification, and obtaining an identification result fed back by the server;
and associating the recognition result in the pause picture.
2. The method of claim 1, wherein:
the acquiring, from the elements constituting the video, element information related to the paused picture specifically includes: carrying out image recognition on the pause picture to acquire image information of the pause picture;
the sending the element information to a server for identification specifically includes:
sending the image information to a server for identification; and
the obtaining of the identification result fed back by the server specifically includes:
acquiring the attribute information of the people and/or the objects fed back by the server, wherein the attribute information of the people and/or the objects is obtained by the server through matching the image characteristic information of the people and/or the objects extracted from the image information against the image characteristic information of people and/or objects stored in the server database.
3. The method of claim 1, wherein:
the acquiring, from the elements constituting the video, element information related to the paused picture specifically includes:
carrying out image recognition on the pause picture, acquiring image information of the pause picture, and extracting image characteristic information of people and/or objects from the image information;
the sending the element information to a server for identification specifically includes:
sending the image characteristic information of the people and/or the objects to a server for identification; and
the obtaining of the identification result fed back by the server specifically includes:
acquiring the attribute information of the people and/or the objects fed back by the server, wherein the attribute information of the people and/or the objects is obtained by the server through matching the received image characteristic information of the people and/or the objects against the image characteristic information of people and/or objects stored in the server database.
4. The method according to claim 1, wherein the evaluating the pause picture and determining that the pause picture meets the preset condition for identification comprises:
performing face detection on the pause picture to determine whether the pause picture comprises a face;
when the pause picture comprises a face, determining whether the ratio of the face area to the pause picture area is greater than the proportional threshold; and
when the ratio of the face area to the pause picture area is smaller than or equal to the proportional threshold, determining that the pause picture does not meet the preset condition.
5. The method according to claim 1, wherein the evaluating the pause picture and determining that the pause picture meets the preset condition for identification comprises:
carrying out edge detection on the pause picture to obtain the edge density of the pause picture;
determining whether the edge density is greater than a density threshold;
when the edge density is greater than the density threshold, determining that the pause picture meets the preset condition, and when the edge density is less than or equal to the density threshold, determining that the pause picture does not meet the preset condition.
6. The method as claimed in claim 1, wherein the identification result fed back by the server is attribute information of a person and/or an object, and the associating the identification result in the pause picture specifically comprises:
displaying the attribute information at a preset position in the pause picture; or
when the attribute information comprises position information of the person or object in the pause picture, establishing a man-machine interaction component at the position of the person and/or object identified in the pause picture according to the position information, and associating the component with the attribute information, so that the attribute information is displayed when a command of the user operating the component is received.
7. The method according to claim 1, 2 or 3, wherein the acquiring of the element information related to the paused picture from the elements constituting the video specifically comprises:
acquiring audio information within a preset time length around the time point of the pause picture in the video;
the sending the element information to a server for identification specifically includes: sending the audio information to the server for identification.
8. The method of claim 7, wherein sending the audio information to a server for identification comprises:
determining an audio category in the audio information; and
when the audio category is a voice signal or a music signal, sending the audio information to the server so that the server identifies the audio content of the audio information.
9. The method according to claim 8, wherein the identification result fed back by the server is attribute information characterizing the audio content, and the associating the identification result in the pause picture specifically comprises:
displaying the attribute information at a preset position in the pause picture; or
establishing a man-machine interaction component at a preset position in the pause picture, and associating the attribute information with the component, so that the attribute information is displayed when a command of the user operating the component is received.
10. An interactive device for terminal video information, comprising:
the pause unit is used for pausing the video currently played by the terminal and obtaining a pause picture when a pause command of a user is received;
the acquisition unit is used for evaluating the pause picture, determining that the pause picture meets a preset condition for identification, acquiring element information related to the pause picture from the elements constituting the video, sending the element information to a server for identification, and acquiring an identification result fed back by the server; wherein the element information comprises information related to audio synchronized with the pause picture; the preset condition is used for indicating that the amount of information contained in the pause picture is greater than a threshold value, and the preset condition comprises at least one of the following two conditions or a combination thereof: the ratio of the face area to the pause picture area is greater than a proportional threshold, or the edge density of the pause picture is greater than a density threshold;
and the association unit is used for associating the identification result in the pause picture.
11. The apparatus of claim 10, wherein the obtaining unit is further configured to:
carrying out image recognition on the pause picture to acquire image information of the pause picture;
sending the image information to the server for identification; and
acquiring the attribute information of the people and/or the objects fed back by the server, wherein the attribute information of the people and/or the objects is obtained by the server through matching the image characteristic information of the people and/or the objects extracted from the image information against the image characteristic information of people and/or objects stored in the server database.
12. The apparatus of claim 10, wherein the obtaining unit is further configured to:
acquiring audio information within a preset time length around the time point of the pause picture in the video;
the sending the element information to a server for identification specifically includes: sending the audio information to the server for identification.
13. An information processing apparatus comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 9.
14. A storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the steps of the method of any one of claims 1 to 9.
CN201811167565.3A 2018-10-08 2018-10-08 Terminal video information interaction method and device and storage medium Active CN109286848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811167565.3A CN109286848B (en) 2018-10-08 2018-10-08 Terminal video information interaction method and device and storage medium


Publications (2)

Publication Number Publication Date
CN109286848A CN109286848A (en) 2019-01-29
CN109286848B true CN109286848B (en) 2020-08-04

Family

ID=65177131


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886258A (en) * 2019-02-19 2019-06-14 新华网(北京)科技有限公司 The method, apparatus and electronic equipment of the related information of multimedia messages are provided
CN110134830A (en) * 2019-04-15 2019-08-16 深圳壹账通智能科技有限公司 Video information data processing method, device, computer equipment and storage medium
CN110933508B (en) * 2019-12-09 2022-02-01 北京奇艺世纪科技有限公司 Video playing method and device and electronic equipment
CN112818719B (en) * 2020-12-30 2023-06-23 上海掌门科技有限公司 Method and equipment for identifying two-dimensional code

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103986845A (en) * 2013-02-07 2014-08-13 联想(北京)有限公司 Information processing method and information processing device
CN104113786A (en) * 2014-06-26 2014-10-22 小米科技有限责任公司 Information acquisition method and device
CN107203646A (en) * 2017-07-12 2017-09-26 三星电子(中国)研发中心 A kind of intelligent social sharing method and device
CN108471551A (en) * 2018-03-23 2018-08-31 上海哔哩哔哩科技有限公司 Video main information display methods, device, system and medium based on main body identification

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8587657B2 (en) * 2011-04-13 2013-11-19 Xerox Corporation Determining a number of objects in an IR image
CN107102786B (en) * 2016-02-19 2020-06-12 腾讯科技(北京)有限公司 Information processing method and client
CN106028160A (en) * 2016-06-03 2016-10-12 腾讯科技(深圳)有限公司 Image data processing method and device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant