WO2021136363A1

WO2021136363A1 - Video data processing and display methods and apparatuses, electronic device, and storage medium

Info

Publication number: WO2021136363A1
Application number: PCT/CN2020/141337
Authority: WO
Inventors: 王楚天
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2019-12-31
Filing date: 2020-12-30
Publication date: 2021-07-08
Also published as: CN113129045A

Abstract

Video data processing and display methods and apparatuses, an electronic device, and a storage medium. The video data processing method comprises: acquiring a video to be played back and link data, the video comprising information about preset keywords, and the link data corresponding to a target content object indicated by the preset keywords (S102); during playback of the video, performing detection of the information about the preset keywords on at least part of image frames and/or at least part of audio data of the video that is played back (S104); and if it is determined, according to the detection result, that the information about the preset keywords is detected, displaying the corresponding link data on the basis of the video that is played back (S106). By means of said method, the video is more interactive.

Description

Video data processing and display methods and apparatuses, electronic device, and storage medium

This application claims priority to Chinese patent application No. 201911412396.X, filed on December 31, 2019 and entitled "Video data processing, display methods, devices, electronic equipment and storage media", the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present invention relate to the field of computer technology, and in particular to a method, device, electronic device, and storage medium for processing and displaying video data.

Background art

With the development of Internet technology, people's daily lives increasingly depend on electronic devices: shopping, payment, socializing, and more can all be done through them. This has given rise to a new form of video interaction, interactive video, of which video advertising is an important type.

Video advertising introduces products in the form of videos. Existing video advertisements only push advertising information to the audience and cannot interact with them. In particular, when viewers want more detailed information, they can only search for it themselves on a device at hand, such as a mobile phone or computer, based on the information in the advertisement they are watching.

It can be seen that the existing video advertising method lacks interaction with the audience, and it is difficult to provide the audience with a way to understand detailed information.

Summary of the invention

In view of this, embodiments of the present invention provide a video data processing solution to solve some or all of the above-mentioned problems.

According to a first aspect of the embodiments of the present invention, there is provided a method for processing video data, including: acquiring a video to be played and link data, wherein the video includes information about preset keywords, and the link data corresponds to the target content object indicated by the preset keywords; during playback of the video, detecting the information about the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video; and, if it is determined according to the detection result that the information about the preset keywords is detected, displaying the corresponding link data based on the played video.

According to a second aspect of the embodiments of the present invention, a display method is provided, including: during video playback, when information about a preset keyword is detected, displaying, in a video playback interface, link data corresponding to the target content object indicated by the detected information about the preset keyword; obtaining a trigger operation on the link data corresponding to the target content object displayed in the video playback interface; and, according to the trigger operation, jumping from the video playback interface to the page that is linked by the link data and used to display the target content object.

According to a third aspect of the embodiments of the present invention, there is provided a method for processing video data, including: acquiring and playing a live video stream; during playback of the live video stream, performing content detection on image frames in the live video stream and/or on audio in the live video stream to obtain the content objects contained in the live video stream; looking up whether each content object has corresponding link data; and taking a content object that has corresponding link data as the target content object, and displaying the link data corresponding to the target content object in the playback interface of the live video stream.

According to a fourth aspect of the embodiments of the present invention, there is provided a video data processing device, including: a first acquisition module, configured to acquire a video to be played and link data, wherein the video includes information about preset keywords, and the link data corresponds to the target content object indicated by the preset keywords; a first detection module, configured to detect, during playback of the video, the information about the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video; and a first display module, configured to display, based on the played video, the link data corresponding to the target content object if it is determined according to the detection result that the information about the preset keywords is detected.

According to a fifth aspect of the embodiments of the present invention, there is provided a display device, including: a video playback module, configured to display, in a video playback interface during video playback, when information about a preset keyword is detected, link data corresponding to the target content object indicated by the detected information about the preset keyword; a trigger acquisition module, configured to obtain a trigger operation on the link data corresponding to the target content object displayed in the video playback interface; and an interface jump module, configured to jump, according to the trigger operation, from the video playback interface to the page that is linked by the link data and used to display the target content object.

According to a sixth aspect of the embodiments of the present invention, there is provided a video data processing device, including: a second acquisition module, configured to acquire and play a live video stream; a second detection module, configured to perform, during playback of the live video stream, content detection on image frames in the live video stream and/or on audio in the live video stream to obtain the content objects contained in the live video stream; a matching module, configured to look up whether each content object has corresponding link data; and a second display module, configured to take a content object that has corresponding link data as the target content object and display the link data corresponding to the target content object.

According to a seventh aspect of the embodiments of the present invention, there is provided an electronic device, including: a processor, a memory, a communication interface, and a communication bus. The processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the video data processing method described in the first aspect or the third aspect, or operations corresponding to the display method described in the second aspect.

According to an eighth aspect of the embodiments of the present invention, there is provided a computer storage medium having a computer program stored thereon; when the program is executed by a processor, it implements the video data processing method described in the first or third aspect, or the display method described in the second aspect.

According to the video data processing solution provided by the embodiments of the present invention, the video used to promote the target content object includes information about preset keywords. During playback of the video, when the information about the preset keywords is detected, the link data corresponding to the target content object indicated by that information is displayed based on the played video. In this way, the video can be used to draw traffic to the page corresponding to the link data, and it can also interact well with the audience and give them a way to learn detailed information about the target content object.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments of the present invention. Those of ordinary skill in the art can obtain other drawings based on these drawings.

Fig. 1a is a flowchart of the steps of a method for processing video data according to the first embodiment of the present invention;

Fig. 1b is a schematic diagram of interaction between a terminal device and a server in a usage scenario according to the first embodiment of the present invention;

Fig. 1c is a schematic diagram of interface changes in a terminal device in a usage scenario according to the first embodiment of the present invention;

Fig. 2a is a flowchart of the steps of a method for processing video data according to the second embodiment of the present invention;

Fig. 2b is a schematic diagram of interface changes in a usage scenario according to the second embodiment of the present invention;

Fig. 3 is a flowchart of the steps of a display method according to the third embodiment of the present invention;

Fig. 4a is a flowchart of the steps of a method for processing video data according to the fourth embodiment of the present invention;

Fig. 4b is a schematic diagram of interface changes in a usage scenario according to the fourth embodiment of the present invention;

Fig. 5 is a structural block diagram of a video data processing device according to the fifth embodiment of the present invention;

Fig. 6 is a structural block diagram of a display device according to the sixth embodiment of the present invention;

Fig. 7 is a structural block diagram of a video data processing device according to the seventh embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device according to the eighth embodiment of the present invention.

Detailed description

In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these technical solutions are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention should fall within the protection scope of the embodiments of the present invention.

The specific implementation of the embodiments of the present invention will be further described below in conjunction with the accompanying drawings of the embodiments of the present invention.

Taking video advertising as an example application scenario: in the prior art, one way of advertising through video is to play a pre-shot advertisement video on the interface of a webpage or application for the audience (that is, the people watching the advertisement video) to watch. This approach has two problems. On the one hand, a video must be shot in advance specifically for the product to be promoted, which leads to a long production time and high cost for the advertisement. On the other hand, advertising videos are usually short, so the product information that can be introduced is limited. If viewers want to know more details about the product, they can only search for it themselves based on the product name, model, and other information, resulting in poor interaction.

Embodiment 1

Referring to FIG. 1a, there is shown a flow chart of the steps of a method for processing video data according to the first embodiment of the present invention.

In this embodiment, the video data processing method is described taking execution by a terminal device as an example. Of course, in other embodiments, the method may also be executed by the server side (a server or the cloud), and this embodiment does not limit this.

The video data processing method includes the following steps:

Step S102: Obtain a video to be played and link data, where the video includes information about a preset keyword, and the link data corresponds to the target content object indicated by the preset keyword.

The video to be played may be a video used to explain or display the target content object, and the target content object may be any suitable object such as a commodity, a person, a location, and so on. Taking commodities as an example, they can be tangible commodities or intangible commodities (such as services, virtual commodities, etc.).

The video includes image frame sequence data and audio data, and in addition includes the information about the preset keywords. It should be noted that the image frame sequence data may include an image of the target content object, or may not include any image of the target content object at all.

The information about the preset keywords includes at least one of the following: a preset voice keyword and a preset text keyword.

Specifically, for example, the preset voice keywords and the preset text keywords may be the name, model, etc. of the target content object, or other preset keywords determined as needed, such as the category of the target content object. For example, the information of the preset keywords indicates that the voice keyword is "**mobile phone". For another example, the text keyword indicated in the information of the preset keyword is "** thermos cup" and so on.

In a video, the information of the preset keywords may indicate one or more preset keywords. For example, the preset keywords indicated by the information of the preset keywords include "**lipstick", "**pen", etc. Each preset keyword can correspond to one link data, and the link data corresponding to different preset keywords can be the same or different.

After a viewer of the target content object (hereinafter "the audience", that is, a person watching the video) performs a trigger operation on the link data, the interface jumps to the corresponding page, which achieves the purpose of directing traffic, through the video, to the page corresponding to the link data. The link data can be the URL or IP address of the page, and so on.

In one possible way, the link data can be determined by the application provider. For example, if the link data A corresponds to the product purchase page of the product A, the preset keyword corresponding to the link data A is the name of the product A. In the process of playing the video, if it is detected that the image of the product A is included in the video or the name of the product A is mentioned in the subtitle or audio, it can be determined that the link data A matches the video.
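The correspondence between preset keywords and link data described above can be illustrated with a minimal sketch. The `LinkData` class, the `match_link_data` function, and the `shop.example` URLs are hypothetical names introduced only for illustration; the detected names are assumed to come from whatever image, subtitle, or audio recognition is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkData:
    keyword: str  # preset keyword, e.g. the name of product A
    url: str      # page the link jumps to (hypothetical URL)


def match_link_data(detected_names, link_table):
    """Return the link data whose preset keyword is among the detected names."""
    detected = {name.strip().lower() for name in detected_names}
    return [link for link in link_table if link.keyword.lower() in detected]


link_table = [
    LinkData("product A", "https://shop.example/product-a"),
    LinkData("product B", "https://shop.example/product-b"),
]
# Names recognized in the video (image, subtitle, or audio); assumed input.
matches = match_link_data(["Product A", "table"], link_table)
```

Here only the link data for product A matches, because product B is never detected in the video.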

In this way, any video containing the preset keywords can serve as a promotion carrier for the link data and the target content object, so that any appropriate video can serve advertisers and meet their need to promote the target content object through video.

Step S104: During playback of the video, detect the information about the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video.

Those skilled in the art can use any appropriate method to detect image frames or audio data. It should be noted that the detection can be performed on the image frame and/or audio data currently being played, or performed in advance on the image frames and/or audio data that follow those being played at the current moment. This embodiment does not limit this.

In one specific detection method, if the preset keyword information is the name of commodity A, then image frames can be detected using an image recognition algorithm or a trained neural network model capable of recognizing commodity A (such as a convolutional neural network model), to determine whether any image frame in the video contains commodity A.

Alternatively, a neural network model capable of character recognition (such as a convolutional neural network model) can also be used to detect image frames in the video to determine whether any text in the image frame contains the name of commodity A.

Or, a speech recognition algorithm (such as ASR, Automatic Speech Recognition) can be used to detect the audio data, to determine whether the audio data contains an audio segment with the name of commodity A.
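Once text has been recovered from the image frames (character recognition) or from the audio (speech recognition), checking for the preset keywords reduces to a text match. The sketch below assumes the recognizers have already produced plain strings; `detect_keywords` and the sample strings are hypothetical, and the recognition models themselves are outside the scope of this example.

```python
def detect_keywords(recognized_texts, preset_keywords):
    """Return the preset keywords that occur in any recognized text snippet."""
    found = []
    for keyword in preset_keywords:
        needle = keyword.casefold()
        if any(needle in text.casefold() for text in recognized_texts):
            found.append(keyword)
    return found


# Assumed outputs of character recognition and speech recognition.
subtitle_text = ["Today we look at Product A in detail"]
transcript = ["it keeps drinks hot for twelve hours"]
hits = detect_keywords(subtitle_text + transcript, ["product A", "product B"])
```

A case-insensitive substring match is only one possible choice; a real system might need tokenization or fuzzy matching to tolerate recognition errors.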

Step S106: If it is determined according to the detection result that the information about the preset keywords is detected, display the corresponding link data based on the played video.

For example, the information of the preset keyword is the name of commodity A. If the name of commodity A is recognized in the caption contained in an image frame, it is determined that the information of the preset keyword is detected, and the corresponding link data is displayed.

In this way, for any video being played, the presence of the preset keyword information can be detected automatically, and the corresponding link data can be displayed automatically when that information is detected, that is, when the preset keyword is played. On the one hand, the link data improves interactivity with the audience, so that viewers can more easily learn about a target content object they are interested in. On the other hand, the link data is displayed more flexibly, so that it integrates better into the video and can be applied to any video.

Those skilled in the art can use any appropriate method to determine whether the preset keyword information is detected, and this embodiment does not limit this.

For example, according to the time information of the voice keyword in the audio data (that is, the playing time, such as 1 minute and 20 seconds, etc.), it is determined whether the preset voice keyword is played at the current moment.

Alternatively, it is determined whether the preset keyword information is detected through a voice detection algorithm, an image recognition algorithm, or a deep learning network model.
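The time-based check mentioned above can be sketched as follows, assuming each voice keyword has been annotated with a start and end playback time in seconds. The function name and the span values are hypothetical; 80.0 corresponds to the "1 minute and 20 seconds" example.

```python
def keywords_active_at(playback_seconds, keyword_spans):
    """Return keywords whose [start, end) time span covers the playback time."""
    return [
        keyword
        for keyword, start, end in keyword_spans
        if start <= playback_seconds < end
    ]


# (keyword, start time, end time) in seconds; illustrative values only.
spans = [("product A", 80.0, 82.5), ("product B", 140.0, 141.0)]
active_now = keywords_active_at(80.0, spans)
```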

When the preset keyword is played, a control corresponding to the link data (such as a floating window, mask, or pop-up window) can be displayed at an appropriate position in the video playback interface to show the link data, and the audience's clicks and other operations can be received through the control. In this way, the link data is not displayed in the video all the time; instead, it is displayed intelligently depending on whether the information of the preset keywords is detected during playback. The display of the link data is thus well integrated into the video and does not appear obtrusive or forcibly implanted, which enhances the promotion effect of the video while preserving the viewing experience.

Those skilled in the art can use any appropriate method to display the link data based on the played video, which is not limited in this embodiment. For example, when the video starts to play, the control corresponding to the link data is drawn in advance and its attribute is set to "hidden"; when the preset keyword information is detected, the attribute of the pre-drawn control is changed to "display", and so on.
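The "draw in advance, then reveal" approach can be illustrated with a small state sketch. `LinkControl` and `show_detected` are hypothetical names, and the `visible` flag stands in for whatever "hidden"/"display" attribute the actual UI toolkit provides.

```python
from dataclasses import dataclass


@dataclass
class LinkControl:
    url: str
    visible: bool = False  # drawn in advance, initially "hidden"


def show_detected(controls, detected_keywords):
    """Switch pre-drawn controls to visible for each newly detected keyword."""
    shown = []
    for keyword in detected_keywords:
        control = controls.get(keyword)
        if control is not None and not control.visible:
            control.visible = True
            shown.append(keyword)
    return shown


# One pre-drawn control per preset keyword (hypothetical URL).
controls = {"product A": LinkControl("https://shop.example/product-a")}
newly_shown = show_detected(controls, ["product A", "unknown item"])
```

Keywords without a pre-drawn control are ignored, and a control that is already visible is not reported again.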

Through the displayed link data, the interaction between the video and the audience can be realized, and the audience can jump to the corresponding page when the audience operates on the link data to provide the audience with further more detailed information, which is convenient for the audience to understand or make purchases.

In the following, in conjunction with Figure 1b and Figure 1c, a specific usage scenario is taken as an example to describe the implementation process of the video data processing method as follows:

In this usage scenario, as shown in Figure 1b, terminal devices (such as mobile phones, personal computers, personal mobile computers, etc.) are connected to the server side (a server or the cloud) through a network. When a viewer browses a webpage through a browser on a terminal device, the terminal device obtains webpage data from the server. The webpage data contains a video to be played (for example, the video may be played in the small window that commonly appears in the lower right corner of a webpage).

The video contains not only the image frame sequence data and audio data in the conventional video, but also the information of preset keywords.

In addition, link data to the target content object indicated by the preset keyword can also be obtained. The link data can be included in the video, or it can exist independently of the video.

When the video is played in a small window, the interface is as shown in interface 1 in Figure 1c. When the information of the preset keywords is detected during playback, as shown in interface 2 in Figure 1c, the corresponding link data is displayed in the video playback interface; the interface displaying the link data is shown as interface 3 in Figure 1c.

Suppose that the voice keyword or text keyword indicated by the information of the preset keyword is the name of product A (such as **cup), and the target content object is **cup. One way to detect the information of the preset keyword is to perform image recognition on the current image frame to determine the content objects it contains. If commodity A is among the content objects (for example, **cup is present), it is determined that the preset keyword information is detected, and the corresponding link data can be displayed.

Alternatively, another detection method is to perform voice recognition on the currently played audio data. If a voice keyword (such as the name of product A) is recognized, it is determined that the information of the preset keyword is detected, and the corresponding link data can be displayed. Of course, other methods can also be used for detection, which will not be described here.

In the subsequent process, if the audience clicks on the displayed link data, the browser jumps to the webpage corresponding to the link data for display, and the interface after the jump is shown as interface 4 in Figure 1c.

Through this embodiment, the video used to promote the target content object includes the information of the preset keywords. During playback of the video, when the information of the preset keywords is detected, the link data corresponding to the target content object indicated by that information is displayed based on the played video. In this way, the video can be used to draw traffic to the page corresponding to the link data, and it can also interact well with the audience and give them a way to learn detailed information about the target content object.

Embodiment 2

Referring to Fig. 2a, there is shown a flow chart of the steps of a method for processing video data according to the second embodiment of the present invention.

In this embodiment, the terminal device is still the main body of execution, and the display process of the link data in the video data processing method is mainly described. The video data processing method of this embodiment includes the following:

Step S100: Obtain the corresponding identifier of the target content object, and generate the preset keyword information corresponding to the target content object according to the identifier.

The target content object can be selected by the advertiser, or determined based on big data analysis. For example, the target content object may be physical goods, such as cups or mobile phones, or non-physical goods, such as cleaning services or virtual currency.

The identifier can be the name, model, category, or preset code of the target content object, or other words selected by the advertiser, and so on.

The generated preset keyword information may only indicate one preset keyword, or may be used to indicate more than one preset keyword.

For example, the information of the preset keywords includes "**cup", "**lipstick" and "**phone".
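Step S100 can be pictured as turning a list of identifiers into keyword information. The sketch below is one possible shape for that information, assuming the same strings are used as both voice and text keywords; `build_keyword_info` and the dictionary layout are hypothetical and not specified by this embodiment.

```python
def build_keyword_info(identifiers):
    """Build preset keyword information from target content object identifiers.

    Blank entries are dropped and duplicates (ignoring case) are removed,
    preserving the original order.
    """
    seen = set()
    keywords = []
    for identifier in identifiers:
        keyword = identifier.strip()
        if keyword and keyword.casefold() not in seen:
            seen.add(keyword.casefold())
            keywords.append(keyword)
    # Assumption: the same keywords are used for audio and for on-screen text.
    return {"voice_keywords": list(keywords), "text_keywords": list(keywords)}


keyword_info = build_keyword_info([" cup ", "Cup", "lipstick", ""])
```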

Step S102: Obtain the video to be played and the link data.

This step S102 can adopt the same implementation process as step S102 in the first embodiment, so it will not be repeated here.

Step S104: During playback of the video, detect the information about the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video.

The process of detecting the information of the preset keywords can adopt the implementation process in the first embodiment, so it will not be repeated here.

Step S106 can be implemented as in the first embodiment.

Or, in a feasible manner, step S106 can be implemented as: matching the copywriting data corresponding to the link data from the copywriting data input by the application provider of the target content object; generating the link data to be displayed based on the link data and the matched copywriting data; and displaying the link data to be displayed based on the video playback interface.

In this way, the application provider can input personalized copywriting data as needed, and automatically match the copywriting data with the link data, so that the corresponding copywriting data can be displayed at the same time when the link data is displayed, so as to increase the audience's interest in clicking the link data.
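A minimal sketch of the copywriting match follows, assuming both the link data and the provider's copy are keyed by preset keyword. `build_display_items`, the fallback text "View details", and the sample URLs are hypothetical.

```python
def build_display_items(links, copy_by_keyword, default_copy="View details"):
    """Combine each link with its matched copywriting text for display."""
    return [
        {
            "keyword": keyword,
            "url": url,
            # Fall back to a generic label when no copy was provided (assumption).
            "copy": copy_by_keyword.get(keyword, default_copy),
        }
        for keyword, url in links.items()
    ]


links = {
    "product A": "https://shop.example/product-a",
    "product B": "https://shop.example/product-b",
}
provider_copy = {"product A": "Limited-time offer on product A"}
items = build_display_items(links, provider_copy)
```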

Or, in another feasible way, in order to be able to adapt to the image change of the interface during the playback of the video, the display duration may be set for the link data, that is, the stay time in the playback interface of the video may be set.

To make the link data more convenient for the audience to operate, displaying the link data corresponding to the played video in step S106 may be implemented as: displaying the link data for a preset display duration, so that the audience of the target content object can operate on the link data within that duration. The preset display duration, that is, how long the link data stays on display, can be determined as needed, such as 20 seconds or 1 minute, which is not limited in this embodiment.

If in the process of displaying the link data, the trigger operation of the link data by the audience is not received until the preset display time is reached, the link data is hidden or destroyed.

If an operation of the audience on the link data is received during the display process, the method can also jump to the page corresponding to the link data.
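The display-duration behavior can be sketched as a small state object driven by the playback clock. `TimedLink` is a hypothetical name, times are in seconds, and the 20-second default mirrors the example duration above.

```python
class TimedLink:
    """Link data that stays visible for a preset display duration."""

    def __init__(self, url, shown_at, duration=20.0):
        self.url = url
        self.expires_at = shown_at + duration
        self.visible = True

    def tick(self, now):
        """Hide the link once the preset display duration has elapsed."""
        if now >= self.expires_at:
            self.visible = False
        return self.visible

    def trigger(self, now):
        """Return the page to jump to, or None if the link has already expired."""
        return self.url if self.tick(now) else None


link = TimedLink("https://shop.example/product-a", shown_at=0.0, duration=20.0)
target = link.trigger(5.0)  # triggered within the display duration
```

Whether an expired control is merely hidden or destroyed is left open here, as in the text above.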

Or, in this embodiment, displaying the corresponding link data based on the played video in step S106 can be done as follows: in the video playback interface, a display control is added to display the corresponding link data, where the display control includes at least one of the following: a floating window, a mask, and a pop-up window.

Since the display control can easily adjust the display position, it is convenient to typeset the video when the link data is displayed, so that the position of the content object in the image frame of the video can be adapted to ensure that the link data can be displayed in a more appropriate position.

Preferably, so that the display control integrates better into the video, is displayed more conspicuously so that the audience notices it more easily, and feels less forcibly implanted, adding a display control in the video playback interface to display the corresponding link data in step S106 includes the following sub-steps:

Sub-step S1061: Perform image recognition on a preset number of image frames after the current image frame based on the information of the preset keyword being played, and determine the position information of the content object in the image frame according to the recognition result.

Whether the display control blends well into the image directly affects the integration effect. Therefore, in this embodiment, a neural network model, foreground/background segmentation, or the like can be used to perform image recognition on a preset number of image frames after the current image frame, and the position information of the content objects in the image frames can be determined according to the recognition result. Based on the position information, a blank area in the image frame, or an area that does not block the main content object in the image frame, can subsequently be determined, and such areas are taken as suitable display positions for the display control.

Among them, the content objects in the image frame may be people, objects, buildings, texts, etc. in the image frame. The preset number can be determined according to needs, which is not limited in this embodiment. It should be noted that, for example, the preset number is 5, and the preset number of image frames after the current image frame may be 5 consecutive image frames after the current image frame, or may be 5 image frames at intervals. If it is 5 image frames at intervals, the number of image frames at intervals between two adjacent image frames can be determined as required.
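Selecting the frames to inspect, either consecutively or at intervals, can be sketched as below. The function name and parameters are hypothetical; `interval` is the number of frames skipped between two inspected frames.

```python
def frames_to_inspect(current_index, count=5, interval=0, total_frames=None):
    """Return indices of `count` frames after the current frame.

    interval=0 selects consecutive frames; interval=n skips n frames
    between two selected frames. Indices past the end of the video are dropped.
    """
    step = interval + 1
    indices = [current_index + step * k for k in range(1, count + 1)]
    if total_frames is not None:
        indices = [i for i in indices if i < total_frames]
    return indices


consecutive = frames_to_inspect(100)
spaced = frames_to_inspect(100, interval=2)
```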

Sub-step S1062: Determine the display position of the link data according to the position information of the content object in the image frame.

In a specific implementation, sub-step S1062 may be implemented as: determining the blank position in each image frame according to the position information of the content objects in that image frame; and determining the display position of the link data according to the blank positions in the image frames. In this way, occlusion of content objects can be reduced when the link data is displayed.

Specifically, in one case, the preset keyword starts to be played at the 20th second of the video. Image recognition is performed on the 5 image frames following the image frame at the 20th second, and the blank positions in each image frame are determined. On this basis, the blank position with the highest coincidence rate across these frames can be determined and used as the display position of the link data, so that the link data blocks the target content object in the image frames as little as possible when displayed, which improves integration.
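One way to read "the blank position with the highest coincidence rate" is to divide each frame into a coarse grid and pick the cell that is blank in the most frames. This grid-counting interpretation, the function names, and the normalized bounding-box format are assumptions made for illustration; the embodiment does not prescribe a specific computation.

```python
from collections import Counter


def blank_cells(boxes, grid=(3, 3)):
    """Return grid cells (row, col) that no object box overlaps.

    Boxes are (x0, y0, x1, y1) in normalized [0, 1] frame coordinates.
    """
    rows, cols = grid
    blank = set()
    for r in range(rows):
        for c in range(cols):
            cx0, cy0 = c / cols, r / rows
            cx1, cy1 = (c + 1) / cols, (r + 1) / rows
            overlaps = any(
                x0 < cx1 and x1 > cx0 and y0 < cy1 and y1 > cy0
                for x0, y0, x1, y1 in boxes
            )
            if not overlaps:
                blank.add((r, c))
    return blank


def best_display_cell(frames_boxes, grid=(3, 3)):
    """Pick the cell that is blank in the largest number of frames."""
    counts = Counter()
    for boxes in frames_boxes:
        counts.update(blank_cells(boxes, grid))
    if not counts:
        return None  # every cell is occupied in every frame
    # Ties are broken toward the smallest (row, col) for determinism.
    return max(sorted(counts), key=counts.__getitem__)


# Object boxes detected in three frames (illustrative values).
frames = [
    [(0.0, 0.0, 0.4, 0.4)],
    [(0.0, 0.0, 0.4, 0.4), (0.6, 0.0, 0.9, 0.4)],
    [(0.1, 0.6, 0.4, 0.9)],
]
cell = best_display_cell(frames, grid=(2, 2))
```

In this example the bottom-right cell is blank in all three frames, so it is chosen as the display position.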

Of course, other methods can also be used to determine the display position based on the recognition result. For example, according to the position information of the content objects and preset layout rules, the target content object corresponding to the link data is determined from among the content objects, and a position at a certain distance from the target content object is determined as a suitable display position.
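The preset layout rules are not detailed in this embodiment. As a minimal sketch of one conceivable rule, the display control could be placed a fixed gap to the right of the target content object, falling back to the left side when there is not enough room. The function `place_beside` and the `gap` value are assumptions for illustration.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def place_beside(target: Box, control_w: int, control_h: int,
                 frame_w: int, frame_h: int, gap: int = 16) -> Optional[Tuple[int, int]]:
    """Top-left corner for a control placed `gap` pixels beside the target object.

    Hypothetical layout rule: prefer the right side, then the left side;
    return None if the control fits on neither side.
    """
    left, top, right, _bottom = target
    # Keep the control vertically inside the frame.
    y = max(0, min(top, frame_h - control_h))
    if right + gap + control_w <= frame_w:
        return (right + gap, y)
    if left - gap - control_w >= 0:
        return (left - gap - control_w, y)
    return None


right_side = place_beside((100, 100, 300, 400), 200, 80, 1280, 720)
left_side = place_beside((1000, 100, 1200, 400), 200, 80, 1280, 720)
```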

Sub-step S1063: display the corresponding link data through the display control at the display position.

After the display position is determined, the display control can be displayed in different ways according to the different structure of the display control.

For example, the sub-step S1063 may be implemented as: displaying the display control in the display position, and displaying the first sub-control and the second sub-control in the display control.

The first sub-control is used to display the text and/or image information corresponding to the target content object indicated by the information of the preset keywords; the second sub-control includes a trigger control corresponding to the link data, which, when triggered, causes the video playback interface to jump to the page linked by the link data.

The text and/or image information corresponding to the target content object can be preset, or can be added by the business owner voluntarily.

In this way, the related information of the target content object and the trigger control of the link data can be displayed at the same time, so that the audience can readily understand the target content object and its corresponding trigger control. Multiple different display controls can also be displayed in one interface, where different display controls can have different functions and be displayed in different ways, thereby reducing implementation costs.

Of course, in other embodiments, if the display control includes only one control, the display control can be directly displayed at the display position, which is not limited in this embodiment.
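For illustration, the structure of a display control with a first sub-control and a second sub-control might be modeled as below. The class and field names are assumptions made for this sketch, and the example URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InfoSubControl:
    """First sub-control: text and/or image information of the target content object."""
    text: Optional[str] = None
    image_url: Optional[str] = None


@dataclass(frozen=True)
class TriggerSubControl:
    """Second sub-control: trigger control corresponding to the link data."""
    label: str
    link_data: str

    def on_trigger(self) -> str:
        # Return the page that the playback interface should jump to.
        return self.link_data


@dataclass(frozen=True)
class DisplayControl:
    kind: str                  # e.g. "floating_window", "mask", or "popup"
    position: Tuple[int, int]  # display position determined in sub-step S1062
    info: InfoSubControl
    trigger: TriggerSubControl


control = DisplayControl(
    kind="mask",
    position=(316, 100),
    info=InfoSubControl(text="XXX hand cream"),
    trigger=TriggerSubControl(label="View", link_data="https://shop.example.com/item/1001"),
)
target_page = control.trigger.on_trigger()
```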

In the process of displaying the display control, if the audience needs to further understand the target content object, they can operate the display control.

Optionally, in this embodiment, the method further includes:

Step S108: Receive an operation on the displayed link data, and jump from the playback interface of the video to the page linked by the link data according to the operation.

In this embodiment, receiving an operation on the displayed link data indicates that the audience wants to learn more about the target content object or view more information related to it. Therefore, according to this operation, the video playback interface jumps to the page linked by the link data to display more information corresponding to the target content object.

The video data processing method can be applied to any appropriate usage scenario. For example, it can be used in an e-commerce website to add a video playback window to the homepage, product display page, search results page, etc., and to display related link data during video playback, so that the audience can click on the link data to jump to the corresponding page and view it, thereby directing traffic to that page.

Of course, in addition to e-commerce websites, the method can also be applied to any other scenario in which videos can be played.

The following describes the processing method of video data in combination with a usage scenario of playing video in the web interface:

The terminal device obtains webpage data from the server (the server includes a server or a cloud) through the network (refer to Figure 1b for a schematic diagram of the connection between the terminal device and the server), where the webpage data includes a video, and the video includes image frame sequence data, audio data, and preset keyword information. Of course, the webpage data can also include link data.

The interface for playing the video in the web interface is as shown in interface 1 in Figure 2b. The video may be a video that introduces a certain target content object (such as a product).

As shown in interface 2 in Figure 2b, when the information of the preset keywords is detected (in this usage scenario, when the audio clip corresponding to the voice keyword is played), the text keyword is displayed in the subtitles in the interface, a semi-transparent mask is displayed in the interface, and semi-transparent first and second sub-controls are displayed in the mask. The first sub-control is used to display the name of target content object 1 (such as XXX hand cream), and the second sub-control is a trigger control for the link data (such as a trigger button or a trigger pop-up window).

As shown in interface 3 in Figure 2b, when the audience clicks on the second sub-control, the interface jumps to the product introduction interface of the target content object, which displays the detailed information of the target content object (taking hand cream as an example, the detailed information can be the appearance, volume, composition, etc. of the hand cream).

Optionally, in this embodiment, the aforementioned video may be automatically generated using a video generation tool. For example, the material video is obtained in advance, and the material video is analyzed and processed to obtain the target content object and preset keywords corresponding to the material video. When the business owner needs to generate a video for promoting the target content object, an algorithm automatically finds the material video that matches the search information input by the business owner, associates the material video with the link data provided by the business owner, and then generates the video from the material video and the link data.

The image frame sequence data included in the video can be the image frame sequence data in the material video, the audio data can be the audio data in the material video or audio data automatically generated from the material copy, and the link data is the link data provided by the business owner, so that advertising videos can be produced more accurately.
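The matching algorithm is not described in this embodiment. Purely as a sketch, and assuming that each material video has already been analyzed into a target content object and a set of preset keywords, the matching could be approximated by counting the keywords shared with the search information. The keyword-overlap scoring and the names `MaterialVideo`, `match_material`, and `generate_video` are assumptions, not the method of the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class MaterialVideo:
    video_id: str
    target_object: str
    keywords: FrozenSet[str]  # preset keywords obtained by analyzing the material video


def match_material(search: str, library: List[MaterialVideo]) -> Optional[MaterialVideo]:
    """Pick the material video sharing the most keywords with the search information."""
    tokens = set(search.casefold().split())
    best, best_score = None, 0
    for material in library:
        score = len(tokens & {k.casefold() for k in material.keywords})
        if score > best_score:
            best, best_score = material, score
    return best


def generate_video(search: str, link_data: str,
                   library: List[MaterialVideo]) -> Optional[Dict[str, object]]:
    """Associate the matched material video with the business owner's link data."""
    material = match_material(search, library)
    if material is None:
        return None
    return {
        "video_id": material.video_id,
        "preset_keywords": sorted(material.keywords),
        "link_data": link_data,
    }


library = [
    MaterialVideo("m1", "hand cream", frozenset({"hand", "cream"})),
    MaterialVideo("m2", "lipstick", frozenset({"lipstick"})),
]
video = generate_video("moisturizing hand cream", "https://shop.example.com/item/1001", library)
```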

By intelligently determining the display position of the display control, the intelligent display of the link data can be realized, and the display of the link data can be well integrated into the video.

In addition, videos can be automatically generated, so that business owners (such as advertisers) who are not familiar with video production tools or have no video production capabilities can also generate the videos they need. During the video playback process, the link data with better integration is displayed, so that the display position of the link data changes with the image changes in the video, and the intelligence and adaptability are better, which can directly increase the conversion rate of the video.

Example three

Referring to FIG. 3, there is shown a schematic flowchart of the steps of a display method according to the third embodiment of the present invention.

In this embodiment, taking the terminal device as the execution subject as an example, the display method is described as follows.

Wherein, the display method of this embodiment includes the following steps:

Step S300: During the video playing process, when the information of the preset keyword is detected, the link data corresponding to the target content object indicated by the detected information of the preset keyword is displayed in the video playing interface.

The process of displaying the link data can be the process described in the foregoing first embodiment or second embodiment, so it will not be repeated here.

Step S302: Acquire a trigger operation on the link data corresponding to the target content object displayed in the video playback interface.

The link data may indicate a page associated with the target content object, and the link data may be a URL (Uniform Resource Locator), an IP address, etc.

Specifically, in this embodiment, the link data is the link data that corresponds to the information of the preset keyword and whose display is triggered when playback of the video reaches the information of the preset keyword.

The target content object can be any suitable object such as commodities, characters, scenic spots, and so on. Commodities can be tangible or intangible.

The trigger operation may be a click operation, a long press operation, a sliding operation, a double-click operation, etc. on the link data (for example, a display control displaying the link data) by the audience.

Step S304: According to the trigger operation, jump from the video playback interface to the page linked by the link data for displaying the target content object.

In a feasible manner, according to the trigger operation, a request for accessing the page indicated by the link data is generated and sent to the corresponding server to obtain the data of the page corresponding to the link data for display.
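As a minimal sketch of generating such a request, the following uses the standard-library `urllib.request.Request`. The helper `build_page_request` is hypothetical, and defaulting to `http://` when the link data is a bare IP address or host is an assumption of this sketch. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_page_request(link_data: str) -> Request:
    """Build a GET request for the page indicated by the link data.

    The link data may be a full URL or a bare address; a bare address is
    assumed (for this sketch) to be reachable over plain HTTP.
    """
    link = link_data.strip()
    if not link:
        raise ValueError("link data is empty")
    if "://" not in link:
        link = "http://" + link
    return Request(link, method="GET")


url_request = build_page_request("https://shop.example.com/item/1001")
ip_request = build_page_request("192.168.1.10/item/1001")
```

In an actual terminal device, the constructed request would then be sent to the corresponding server and the returned page data rendered in place of the video playback interface.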

Through this embodiment, during playback of the video, when the information of the preset keyword is played, the corresponding link data is displayed for the audience to trigger; if a trigger operation is received, the page corresponding to the link data is displayed, thereby presenting the detailed information of the target content object (such as a product) for the audience to view.

Example four

Referring to FIG. 4a, there is shown a schematic flow chart of the steps of a method for processing video data according to the fourth embodiment of the present invention.

In this embodiment, the method for processing video data is described in conjunction with a live video sales scenario. In the live video sales scenario, the live broadcaster can recommend and introduce products to viewers through the live broadcast, and can also display the effects of trials in the live broadcast, thereby realizing online sales of the products. The video data processing method can be executed by a terminal device as a playback terminal.

The video data processing method of this embodiment includes:

Step S402: Acquire and play the live video stream.

The live video stream can be a real-time video obtained from the live broadcast server, or it can be a real-time video obtained directly from the live broadcast terminal.

The live video stream may be a video in which the live broadcaster introduces the product, but is not limited to this, and it may be a video of any other content.

Step S404: During the playback of the live video stream, perform content detection on the image frames in the live video stream, and/or perform content detection on the audio in the live video stream, to obtain the content objects contained in the live video stream.

The content object can be a person, object, building, etc. in the image; a person, object, building, etc. indicated by text in the image frame or by text keywords appearing in the subtitles; or a person, object, building, etc. indicated by voice keywords appearing in the audio.

In a specific implementation manner, performing content detection on the image frames in the live video stream to obtain the content objects contained in the live video stream may be implemented as follows: during the playback of the live video stream, perform image recognition at a preset position in the image frame in the live video stream, and obtain, according to the recognition result, the content object indicated by the text keyword in the image frame and/or the content object indicated by the image in the image frame.

For example, a neural network model with a corresponding content object recognition function is used to detect the preset position in the image frame to identify the content object contained in the preset position of the image frame.

The preset position may be a position configured by default, such as the entire image frame; or it may be part or all of the image frame selected by the live broadcast host through a selection box during the live broadcast. Detecting at a preset position in this way increases the flexibility of detection and thereby improves adaptability.

Performing content detection on the audio in the live video stream to obtain the content objects contained in the live video stream includes: during the playback of the live video stream, performing audio recognition on the audio in the live video stream, and obtaining the content object indicated by the voice keyword in the audio.

For example, voice keywords such as product names and product models mentioned in the audio can be detected to determine the content objects indicated by these voice keywords.
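Assuming that a speech recognizer (not shown, and not specified in this embodiment) has already converted an audio segment into a text transcript, the detection of voice keywords could be sketched as a case-insensitive substring match. The function `detect_voice_keywords` and the keyword table are hypothetical.

```python
from typing import Dict, List


def detect_voice_keywords(transcript: str, keyword_to_object: Dict[str, str]) -> List[str]:
    """Return the content objects whose voice keywords occur in the transcript.

    `keyword_to_object` maps a voice keyword (e.g. a product name or model)
    to the content object it indicates.
    """
    text = transcript.casefold()
    found: List[str] = []
    for keyword, content_object in keyword_to_object.items():
        if keyword.casefold() in text and content_object not in found:
            found.append(content_object)
    return found


keywords = {
    "XXX hand cream": "hand-cream-1001",
    "model A7": "phone-a7",
}
objects = detect_voice_keywords("Today I want to show you the xxx hand cream.", keywords)
```

Plain substring matching is the simplest choice; a practical system would likely need to tolerate recognition errors and word-boundary effects.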

By detecting the live video stream using image recognition and/or audio recognition, at least some of the content objects contained in the live video stream can be obtained. It can then be determined whether a target content object exists among these content objects, and thus whether link data needs to be displayed.

Step S406: Find out whether the content object has corresponding link data.

In a feasible manner, step S406 includes: searching a preset commodity database to determine whether the content object has corresponding link data.

The commodity database stores commodity identifiers (such as commodity names) and their corresponding link data (such as commodity purchase links). For a detected content object, the preset keywords corresponding to the content object can be matched against the commodity identifiers to determine whether corresponding link data exists in the commodity database. If it exists, the content object is a target content object, and the link data can be displayed in the playback interface for the viewer to operate as needed.
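The lookup can be sketched with an in-memory dictionary standing in for the commodity database. The function `find_link_data`, the sample entries, and the convention that commodity identifiers are stored in case-folded form are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional

# Hypothetical commodity database: commodity identifier (case-folded) -> purchase link.
COMMODITY_DB: Dict[str, str] = {
    "xxx hand cream": "https://shop.example.com/item/1001",
    "model a7": "https://shop.example.com/item/2002",
}


def find_link_data(preset_keywords: Iterable[str],
                   commodity_db: Dict[str, str]) -> Optional[str]:
    """Return the link data of the first keyword that matches a commodity identifier."""
    for keyword in preset_keywords:
        link = commodity_db.get(keyword.strip().casefold())
        if link is not None:
            return link
    return None  # no corresponding link data: not a target content object


link = find_link_data(["XXX Hand Cream"], COMMODITY_DB)
```

A content object for which `find_link_data` returns a link is treated as a target content object; otherwise nothing is displayed for it.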

In another feasible manner, the live video stream includes information of preset keywords indicating the target content object to be identified, and the corresponding link data. The information of the preset keywords of the target content object to be identified and the link data may be selected by the live broadcaster at the live broadcast terminal. For example, the live broadcast terminal provides a setting interface through which the live broadcast host can configure the information of the preset keywords to indicate the target content object, and configure the corresponding link data. This improves autonomy and enables the live broadcast host to control the displayed link data as needed.

In this case, step S406 can be implemented as: determining whether the detected content object includes a content object matching the target content object to be identified, and if it exists, determining that there is corresponding link data.

For example, by matching a preset keyword corresponding to the detected content object with a preset keyword of the target content object to be identified, it is determined whether there is a matching content object. If there is a matching content object, it is determined that there is corresponding link data.

Step S408: Use the content object with corresponding link data as the target content object, and display the link data corresponding to the target content object in the play interface of the live video stream.

When the target content object is detected, the corresponding link data is displayed, so that, while watching the live broadcast, the viewer can easily jump to the page corresponding to the link data by operating the displayed link data, in order to view the content or to perform operations such as purchasing the product corresponding to the target content object. As a result, the live broadcast function is enriched, and the viewer can conveniently view the information of the target content object.

In addition to being displayed on the playback side, the link data can also be displayed on the live broadcast side at the same time, so that the live broadcaster can know in a timely manner whether the link data is displayed and how it is displayed, and can thus monitor the live broadcast effect more easily.

In order to improve the viewing effect, when displaying the link data, an animation can be set to realize the effect of prompting the link data, so that the viewer can more easily notice the link data.

The following describes the live broadcast process in conjunction with a specific usage scenario:

During the live broadcast process, as shown in interface 1 of the live broadcast terminal in Figure 4b, the live broadcast host can configure at least one preset keyword and the corresponding link data through the live broadcast terminal. According to the configured preset keywords, information of the preset keywords indicating the target content object to be identified can be generated, and the live video stream, the preset keyword information, and the link data are sent to the playback terminal.

Interface 1 of the playback terminal in Figure 4b shows the playback terminal playing the live video stream. In the process of playing the live video stream, content recognition can be performed on the image frames and/or audio therein to determine the content objects contained in the live video stream. It is then determined whether the detected content objects have corresponding link data. If there is corresponding link data, the link data is displayed on the playback interface of the playback terminal and on the playback interface of the live broadcast terminal. Interface 2 of the playback terminal in Figure 4b is a schematic diagram of the interface displaying the link data.

The viewer can jump to the corresponding page by operating the link data to view the content on the page (as shown in interface 3 of the playback terminal in Figure 4b).

Example five

Referring to FIG. 5, there is shown a structural block diagram of a video data processing apparatus according to the fifth embodiment of the present invention.

The video data processing device of this embodiment includes: a first acquisition module 502, configured to acquire a video to be played and link data, where the video includes information of preset keywords, and the link data corresponds to the target content object indicated by the preset keywords; a first detection module 504, configured to detect, during the playback of the video, the information of the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video; and a first display module 506, configured to display the corresponding link data based on the played video if it is determined according to the detection result that the information of the preset keywords is detected.

Optionally, the information of the preset keyword includes at least one of the following: a preset voice keyword and a preset text keyword.

Optionally, the device further includes: an information generating module 500, configured to obtain an identifier corresponding to a target content object, and generate the information of the preset keyword corresponding to the target content object according to the identifier, wherein the identifier includes at least one of the following: the name, model, and category of the target content object.

Optionally, the device further includes: a receiving module 508, configured to receive an operation on the displayed link data, and jump from the playing interface of the video to the page linked by the link data according to the operation.

Optionally, the first display module 506 is configured to add a display control to display the corresponding link data on the video playback interface, where the display control includes at least one of the following: a floating window, a mask, and a pop-up window.

Optionally, the first display module 506 is configured to: perform image recognition on a preset number of image frames after the current image frame at which the information of the preset keyword is played, and determine the position information of the content object in the image frames according to the recognition result; determine the display position of the link data according to the position information of the content object in the image frames; and display the corresponding link data through the display control at the display position.

Optionally, when displaying the corresponding link data through the display control at the display position, the first display module 506 is configured to display the display control at the display position, and display the first sub-control and the second sub-control in the display control; wherein the first sub-control is used to display the text and/or image information corresponding to the target content object indicated by the information of the preset keywords, and the second sub-control includes a trigger control corresponding to the link data, which, when triggered, causes the video playback interface to jump to the page linked by the link data.

Optionally, when determining the display position of the link data according to the position information of the content object in the image frames, the first display module 506 is configured to: determine the blank position in each image frame according to the position information of the content object in each image frame indicated by the recognition result; and determine the display position of the link data according to the blank position in each image frame.

Optionally, the first display module 506 is configured to display the link data with a preset display duration when displaying the corresponding link data based on the played video, so that the audience of the target content object can operate on the link data within the preset display duration .

Optionally, when displaying the corresponding link data based on the played video, the first display module 506 is configured to: match copy data corresponding to the link data from the copy data input by the provider of the target content object; generate the link data to be displayed according to the link data and the matched copy data; and display the link data to be displayed on the video playback interface.

The video data processing apparatus of this embodiment is used to implement the corresponding video data processing methods in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here. In addition, the functional realization of each module in the video data processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and will not be repeated here.

Example Six

Referring to FIG. 6, there is shown a structural block diagram of a display device according to the sixth embodiment of the present invention.

The display device of this embodiment includes: a link data display module 600, configured to display, in the video playback interface, the link data corresponding to the target content object indicated by the detected information of the preset keyword when the information of the preset keyword is detected during video playback; a trigger obtaining module 602, configured to obtain a trigger operation on the link data corresponding to the target content object displayed in the video playback interface; and an interface jump module 604, configured to jump, according to the trigger operation, from the video playback interface to the page that is linked by the link data and used to display the target content object.

The display device of this embodiment is used to implement the corresponding display methods in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here. In addition, the functional realization of each module in the display device of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and will not be repeated here.

Example Seven

Referring to FIG. 7, there is shown a structural block diagram of a video data processing device according to the seventh embodiment of the present invention.

The device for processing video data in this embodiment includes: a second acquisition module 702, configured to acquire and play a live video stream; a second detection module 704, configured to perform, during the playback of the live video stream, content detection on the image frames in the live video stream, and/or content detection on the audio in the live video stream, to obtain the content objects contained in the live video stream; a matching module 706, configured to find out whether the content objects have corresponding link data; and a second display module 708, configured to use a content object that has corresponding link data as the target content object, and display the link data corresponding to the target content object in the playback interface of the live video stream.

Optionally, the matching module 706 is specifically configured to search a preset commodity database to determine whether the content object has corresponding link data.

Optionally, the live video stream includes information of preset keywords indicating the target content object to be identified, and the corresponding link data; the matching module 706 is specifically configured to determine whether the detected content objects include a content object matching the target content object to be identified, and if so, determine that corresponding link data exists.

Optionally, when performing content detection on the image frames in the live video stream during the playback of the live video stream to obtain the content objects contained in the live video stream, the second detection module 704 is specifically configured to: during the playback of the live video stream, perform image recognition at a preset position in the image frame in the live video stream, and obtain, according to the recognition result, the content object indicated by the text keyword in the image frame and/or the content object indicated by the image in the image frame.

Optionally, when performing content detection on the audio in the live video stream during the playback of the live video stream to obtain the content objects contained in the live video stream, the second detection module 704 is specifically configured to: during the playback of the live video stream, perform audio recognition on the audio in the live video stream, and obtain the content object indicated by the voice keyword in the audio.

Example eight

Referring to FIG. 8, there is shown a schematic structural diagram of an electronic device according to the eighth embodiment of the present invention. The specific embodiment of the present invention does not limit the specific implementation of the electronic device.

As shown in FIG. 8, the electronic device may include: a processor (processor) 802, a communication interface (Communications Interface) 804, a memory (memory) 806, and a communication bus 808.

Wherein:

The processor 802, the communication interface 804, and the memory 806 communicate with each other through the communication bus 808.

The communication interface 804 is used to communicate with other electronic devices such as terminal devices or servers.

The processor 802 is configured to execute a program 810, and specifically can execute related steps in the foregoing video data processing or display method embodiments.

Specifically, the program 810 may include program code, and the program code includes computer operation instructions.

The processor 802 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the electronic device may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.

The memory 806 is used to store the program 810. The memory 806 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory.

The program 810 can specifically be used to make the processor 802 perform the following operations: obtain the video to be played and link data, where the video includes information of preset keywords, and the link data corresponds to the target content object indicated by the preset keywords; during the playback of the video, detect the information of the preset keywords in at least part of the image frames and/or at least part of the audio data of the played video; and if it is determined according to the detection result that the information of the preset keywords is detected, display the corresponding link data based on the played video.

In an optional implementation manner, the preset keyword information includes at least one of the following: a preset voice keyword and a preset text keyword.

In an optional implementation manner, the program 810 is further configured to enable the processor 802 to obtain an identifier corresponding to the target content object, and generate the information of the preset keywords corresponding to the target content object according to the identifier, wherein the identifier includes at least one of the following: the name, model, and category of the target content object.

In an optional implementation manner, the program 810 is further configured to enable the processor 802 to receive an operation on the displayed link data, and according to the operation, jump from the playing interface of the video to the page linked by the link data.

In an optional implementation manner, the program 810 is further configured to enable the processor 802 to add a display control to display the corresponding link data on the video playback interface when the processor 802 displays the corresponding link data based on the played video, where: The display control includes at least one of the following: floating window, mask, and pop-up window.

In an optional implementation manner, the program 810 is also used to enable the processor 802 to add a display control to display the corresponding link data on the video playback interface, based on the information of the preset keywords being played, to compare the current Perform image recognition on a preset number of image frames after the image frame, determine the position information of the content object in the image frame according to the recognition result, and determine the display position of the link data according to the position information of the content object in the image frame; pass at the display position The display control displays the corresponding link data.

In an optional implementation manner, the program 810 is further configured to cause the processor 802 to display the display control in the display position when displaying the corresponding link data through the display control in the display position, and display the first child in the display control. Control and a second sub-control; wherein, the first sub-control is used to display the text and/or image information corresponding to the target content object indicated by the information of the preset keywords; the second sub-control includes a trigger control corresponding to the link data for When triggered, the video playback interface jumps to the page linked by the link data.

In an optional implementation manner, the program 810 is further configured to make the processor 802 determine the display position of the link data according to the position information of the content object in the image frame, according to each image indicated by the recognition result. The position information of the content object in the frame determines the blank position in each image frame; according to the blank position in each image frame, the display position of the link data is determined.

In an optional implementation manner, the program 810 is further configured to cause the processor 802, when displaying the corresponding link data based on the played video, to display the link data for a preset display duration, so that an audience of the target content object can operate on the link data within the preset display duration.
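
As a non-limiting illustration, the preset display duration could be tracked against the playback time as sketched below. The class `LinkOverlay` and its fields are hypothetical names introduced only for this example, and measuring the duration in seconds of playback time is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LinkOverlay:
    """Hypothetical record of link data shown over the played video."""

    url: str
    shown_at: float  # playback time (seconds) at which the link data appeared
    duration: float  # preset display duration (seconds)

    def is_visible(self, playback_time: float) -> bool:
        """True while the playback time is inside the preset display window."""
        return self.shown_at <= playback_time < self.shown_at + self.duration
```

A player could call `is_visible` on each rendered frame and remove the display control once it returns `False`.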

In an optional implementation manner, the program 810 is further configured to cause the processor 802, when displaying the corresponding link data based on the played video, to: match copywriting data corresponding to the link data from the copywriting data input by the application provider of the target content object; generate link data to be displayed according to the link data and the matched copywriting data; and display the link data to be displayed on the basis of the video playback interface.
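
As a non-limiting illustration, the matching of copywriting data and the generation of the link data to be displayed could be sketched as follows. The sketch assumes that the copywriting data input by the application provider is available as a mapping keyed by link; `DisplayLink` and `build_display_link` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DisplayLink:
    """Hypothetical link data to be displayed: the link plus its matched copy."""

    url: str
    copy_text: Optional[str]


def build_display_link(link_url: str, provider_copy: Dict[str, str]) -> DisplayLink:
    """Match the copy entered for this link, if any, and combine the two."""
    # dict.get returns None when the provider entered no copy for the link.
    return DisplayLink(url=link_url, copy_text=provider_copy.get(link_url))
```

When no copywriting data matches, the sketch still produces displayable link data with an empty copy field; how the embodiments handle that case is not specified.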

Alternatively, the program 810 may specifically be used to cause the processor 802 to perform the following operations: during video playback, when preset keyword information is detected, display, in the video playback interface, the link data corresponding to the target content object indicated by the detected preset keyword information; obtain a trigger operation on the link data corresponding to the target content object displayed in the video playback interface; and, according to the trigger operation, jump from the video playback interface to the page linked by the link data for displaying the target content object.

Alternatively, the program 810 may specifically be used to cause the processor 802 to perform the following operations: obtain and play a live video stream; during the playback of the live video stream, perform content detection on the image frames in the live video stream, and/or perform content detection on the audio in the live video stream, to obtain the content objects contained in the live video stream; search whether each content object has corresponding link data; and take a content object for which corresponding link data exists as the target content object, and display the link data corresponding to the target content object in the playback interface of the live video stream.

In an optional implementation manner, the program 810 is further configured to cause the processor 802, when searching whether the content object has corresponding link data, to search a preset commodity database so as to determine whether the content object has corresponding link data.
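
As a non-limiting illustration, the lookup of detected content objects in the preset commodity database could be sketched as follows. An in-memory dictionary stands in for the database, and `find_target_objects` is a hypothetical name; a real deployment would presumably query an actual data store.

```python
from typing import Dict, Iterable, List, Tuple


def find_target_objects(
    detected: Iterable[str], commodity_db: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Keep only the detected content objects that have link data.

    `commodity_db` maps a content object name to its link data. Each
    returned pair is (target content object, link data).
    """
    return [(obj, commodity_db[obj]) for obj in detected if obj in commodity_db]
```

Content objects without an entry are simply skipped, so only objects with corresponding link data become target content objects.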

In an optional implementation manner, the live video stream includes information of preset keywords used to indicate the target content object to be recognized, and corresponding link data; the program 810 is further configured to cause the processor 802, when searching whether the content object has corresponding link data, to determine whether the detected content objects include a content object matching the target content object to be recognized, and if so, to determine that corresponding link data exists.

In an optional implementation manner, the program 810 is further configured to cause the processor 802, when performing content detection on the image frames in the live video stream during the playback of the live video stream so as to obtain the content objects contained in the live video stream, to perform image recognition on a preset position in the image frames in the live video stream during the playback, and to obtain, according to the recognition result, the content object indicated by the text keyword in the image frame and/or the content object indicated by the image in the image frame.

In an optional implementation manner, the program 810 is further configured to cause the processor 802, when performing content detection on the audio in the live video stream during the playback of the live video stream so as to obtain the content objects contained in the live video stream, to perform audio recognition on the audio in the live video stream during the playback, and to obtain the content object indicated by the voice keyword in the audio.
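
As a non-limiting illustration, obtaining content objects from voice keywords could be sketched as follows. The sketch assumes that a separate speech recognition step has already converted the audio into a text transcript, and it uses plain case-insensitive substring matching; `objects_from_transcript` and the keyword mapping are hypothetical.

```python
from typing import Dict, List


def objects_from_transcript(
    transcript: str, keyword_to_object: Dict[str, str]
) -> List[str]:
    """Return the content objects whose voice keyword occurs in the transcript."""
    text = transcript.lower()
    found: List[str] = []
    for keyword, obj in keyword_to_object.items():
        # Case-insensitive substring match; each object is reported once.
        if keyword.lower() in text and obj not in found:
            found.append(obj)
    return found
```

Substring matching is the simplest possible choice here and can produce false positives on short keywords; the embodiments do not prescribe a particular matching technique.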

For the specific implementation of each step in the program 810, reference may be made to the corresponding description of the corresponding steps and units in the above-mentioned video data processing or display method embodiment, which will not be repeated here. Those skilled in the art can clearly understand that, for convenience and concise description, the specific working process of the devices and modules described above can be referred to the corresponding process description in the foregoing method embodiment, which will not be repeated here.

It should be pointed out that, according to the needs of implementation, each component/step described in the embodiments of the present invention can be split into more components/steps, or two or more components/steps or partial operations of components/steps can be combined into new components/steps, to achieve the purpose of the embodiments of the present invention.

The above method according to the embodiments of the present invention can be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that is downloaded over a network, originally stored in a remote recording medium or a non-transitory machine-readable medium, and to be stored in a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It can be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the video data processing or display method described herein is implemented. In addition, when a general-purpose computer accesses the code used to implement the video data processing or display method shown here, the execution of the code converts the general-purpose computer into a special-purpose computer for executing the video data processing or display method shown here.

A person of ordinary skill in the art may be aware that the units and method steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of the embodiments of the present invention.

The above implementations are only used to illustrate the embodiments of the present invention, and are not intended to limit the embodiments of the present invention. Those of ordinary skill in the relevant technical field can also make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention. Therefore, all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims

A method for processing video data, including:

Acquiring a video to be played and link data, where the video includes information about a preset keyword, and the link data corresponds to a target content object indicated by the preset keyword;

During the playing process of the video, perform detection of the preset keyword information on at least part of the image frames and/or at least part of the audio data of the played video;

If it is determined according to the detection result that the information of the preset keyword is detected, the corresponding link data is displayed based on the played video.
The method according to claim 1, wherein the information of the preset keyword includes at least one of the following: a preset voice keyword and a preset text keyword.
The method according to claim 2, wherein the method further comprises: obtaining a corresponding identification of the target content object, and generating information of the preset keyword corresponding to the target content object according to the identification, wherein the identification includes at least one of the following: the name, model, and category of the target content object.
The method according to any one of claims 1-3, wherein the method further comprises:

Receive an operation on the displayed link data, and jump from the playback interface of the video to the page linked by the link data according to the operation.
The method according to any one of claims 1 to 3, wherein the displaying the corresponding link data based on the played video comprises:

On the video playback interface, a display control is added to display the corresponding link data, wherein the display control includes at least one of the following: a floating window, a mask, and a pop-up window.
The method according to claim 5, wherein the adding a display control to display the corresponding link data on the playing interface of the video comprises:

Performing image recognition on a preset number of image frames following the current image frame based on the information of the preset keyword being played, and determining the position information of the content object in the image frame according to the recognition result;

Determine the display position of the link data according to the position information of the content object in the image frame;

The corresponding link data is displayed in the display position through the display control.
The method according to claim 6, wherein the displaying the corresponding link data through the display control at the display position comprises:

Displaying the display control in the display position, and displaying a first child control and a second child control in the display control;

Wherein, the first sub-control is used to display the text and/or image information corresponding to the target content object indicated by the information of the preset keyword; the second sub-control includes a trigger control corresponding to the link data which, when triggered, causes a jump from the playback interface of the video to the page linked by the link data.
The method according to claim 6, wherein the determining the display position of the link data according to the position information of the content object in the image frame comprises:

Determine the blank position in each image frame according to the position information of the content object in the image frame;

Determine the display position of the link data according to the blank position in each of the image frames.
The method according to claim 1, wherein the displaying the corresponding link data based on the played video comprises:

The link data is displayed with a preset display duration, so that an audience of the target content object can operate on the link data within the preset display duration.
The method according to claim 1, wherein the displaying the corresponding link data based on the played video comprises:

Match the copy data corresponding to the link data from the copy data input by the application provider of the target content object;

Generating link data to be displayed according to the link data and the matched copywriting data;

Based on the playing interface of the video, the link data to be displayed is displayed.
A display method including:

During the video playback process, when the information of the preset keyword is detected, the link data corresponding to the target content object indicated by the information of the detected preset keyword is displayed in the video playback interface;

Acquire a trigger operation on the link data corresponding to the target content object displayed in the video playback interface;

According to the trigger operation, jump from the video playback interface to the page linked by the link data for displaying the target content object.
A method for processing video data, including:

Obtain and play live video streams;

During the playback of the live video stream, perform content detection on the image frames in the live video stream, and/or perform content detection on the audio in the live video stream, to obtain the content objects contained in the live video stream;

Searching whether the content object has corresponding link data;

The content object with corresponding link data is taken as the target content object, and the link data corresponding to the target content object is displayed in the playback interface of the live video stream.
The method according to claim 12, wherein said searching whether the content object has corresponding link data comprises:

Look up a preset commodity database to determine whether the content object has corresponding link data.
The method according to claim 12, wherein the live video stream includes information of preset keywords used to indicate the target content object to be recognized, and corresponding link data;

The searching whether the content object has corresponding link data includes:

It is determined whether the detected content objects include a content object that matches the target content object to be recognized, and if so, it is determined that there is corresponding link data.
The method according to claim 12, wherein the performing content detection on the image frames in the live video stream during the playback of the live video stream, to obtain the content objects contained in the live video stream, comprises:

During the playback of the live video stream, image recognition is performed on the preset position in the image frame in the live video stream, and the content object indicated by the text keyword in the image frame and/or the content object indicated by the image in the image frame is obtained according to the recognition result.
The method according to claim 12, wherein the performing content detection on the audio in the live video stream during the playback of the live video stream, to obtain the content objects contained in the live video stream, comprises:

During the playing process of the live video stream, audio recognition is performed on the audio in the live video stream, and the content object indicated by the voice keyword in the audio is obtained.
A video data processing device, including:

The first obtaining module is configured to obtain a video to be played and link data, wherein the video includes information about a preset keyword, and the link data corresponds to a target content object indicated by the preset keyword;

The first detection module is configured to detect the information of the preset keywords on at least part of the image frames and/or at least part of the audio data of the played video during the playback process of the video;

The first display module is configured to display the corresponding link data of the target content object based on the played video if it is determined that the information of the preset keyword is detected according to the detection result.
A display device includes:

The video playback module is used to display the link data corresponding to the target content object indicated by the detected preset keyword information in the video playback interface when the preset keyword information is detected during the video playback process;

The trigger acquisition module is used to acquire the trigger operation of the link data corresponding to the target content object displayed in the video playback interface;

The interface jump module is configured to jump from the video playback interface to the page linked by the link data for displaying the target content object according to the trigger operation.
A video data processing device, including:

The second acquisition module is used to acquire and play the live video stream;

The second detection module is configured to perform content detection on the image frames in the live video stream during the playback process of the live video stream, and/or perform content detection on the audio in the live video stream, to acquire content objects included in the live video stream;

The matching module is used to find whether the content object has corresponding link data;

The second display module is configured to use the content object with corresponding link data as the target content object, and display the link data corresponding to the target content object in the play interface of the live video stream.
An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus;

The memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video data processing method according to any one of claims 1-10, or execute the operation corresponding to the display method according to claim 11, or execute the operation corresponding to the video data processing method according to any one of claims 12-16.
A computer storage medium, on which a computer program is stored, wherein when the program is executed by a processor, the video data processing method according to any one of claims 1-10 is realized, or the display method according to claim 11 is realized, or the video data processing method according to any one of claims 12-16 is realized.