CN103970906B

CN103970906B - The method for building up and device of video tab, the display methods of video content and device

Info

Publication number: CN103970906B
Application number: CN201410228398.4A
Authority: CN
Inventors: 高浩渊
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2014-05-27
Filing date: 2014-05-27
Publication date: 2017-07-04
Anticipated expiration: 2034-05-27
Also published as: CN103970906A

Abstract

Method for building up and device, the display methods of video content and device the invention discloses a kind of video tab.Methods described includes：Searching in video includes at least one matching picture frame of object to be matched, and obtains match information corresponding with matching picture frame, wherein, match information at least includes：Matching picture frame timing node in video；According to match information, in video for object to be matched sets up video labeling label.The present invention is by being in video object to be matched sets up video labeling label and is shown in video display process markup information corresponding with video labeling label being added into video image technological means, optimize existing Web Video Service, improve the validity that product manufacturer promotes product in video, ensure that video user can be timely, accurate and effective obtains product interested in video, enormously simplify process of purchase of the video user to product of interest.

Description

Video label establishing method and device and video content display method and device

Technical Field

The embodiment of the invention relates to a video image processing technology, in particular to a method and a device for establishing a video label and a method and a device for displaying video content.

Background

With the continuous development of internet technology, at the present that network bandwidth is continuously increased, network videos attract vast users with convenient access experience, diversified film sources and real-time updating speed, so that the network videos become an indispensable important component in the network life of the users. Accordingly, more and more manufacturers implant their products into the network video, and hope to carry out benign popularization on the related products through the heat and popularity of the video.

The network video refers to a video and audio file which is provided by a network video service provider (e.g., a Baidu Eschka art), is in a streaming media playing format, and can be live or on demand online. Network Video generally requires a separate player, and the file format is mainly based on an FLV (Flash Video) format which occupies less client resources based on a P2P (Peer-to-Peer) technology.

However, with the continuous development of terminal technology and video website design technology, the requirement of people for network video becomes higher, and the traditional network video service provider cannot meet the increasingly enhanced personalized and convenient requirements of people in the process of watching network video.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for establishing a video tag, and a method and an apparatus for displaying video content, so as to enrich the information content in a video and improve the personalization and convenience of a video service.

In a first aspect, an embodiment of the present invention provides a method for establishing a video tag, including:

at least one matching image frame comprising an object to be matched is searched in a video, and matching information corresponding to the matching image frame is obtained, wherein the matching information at least comprises: a temporal node of the matching image frame in the video;

and establishing a video annotation label for the object to be matched in the video according to the matching information.

In a second aspect, an embodiment of the present invention provides a method for displaying video content, including:

playing the video;

and displaying the labeling information corresponding to the video labeling label in a video image according to the matching information of the video labeling label in the video.

In a third aspect, an embodiment of the present invention provides an apparatus for creating a video tag, including:

the matching information acquisition unit is used for searching at least one matching image frame including an object to be matched in a video and acquiring matching information corresponding to the matching image frame, wherein the matching information at least comprises: a temporal node of the matching image frame in the video;

and the label establishing unit is used for establishing a video label for the object to be matched in the video according to the matching information.

In a fourth aspect, an embodiment of the present invention provides a display apparatus for video content, including:

the video playing unit is used for playing videos;

and the annotation information display unit is used for displaying the annotation information corresponding to the video annotation label in the video image according to the matching information of the video annotation label in the video.

According to the invention, through the technical means of establishing the video label for the object to be matched in the video and adding the label information corresponding to the video label into the video image for display in the video playing process, the existing network video service is optimized, the information content in the video is enriched, the effectiveness of product manufacturers in promoting products in the video is improved, the video user can be ensured to timely, accurately and effectively obtain the product of interest in the video, the purchasing process of the product of interest by the video user is greatly simplified, and the individuation and the convenience of the video service are improved.

Drawings

Fig. 1 is a flowchart of a method for creating a video tag according to a first embodiment of the present invention;

FIG. 2 is a diagram illustrating a first embodiment of the present invention for determining a specific position of an object to be matched in a matching image frame;

fig. 3 is a flowchart of a method for creating a video tag according to a second embodiment of the present invention;

fig. 4 is a flowchart of a method for creating a video tag according to a third embodiment of the present invention;

fig. 5 is a flowchart of a method for determining whether an associated image frame includes an object to be matched according to a third embodiment of the present invention;

FIG. 6 is a diagram illustrating a third embodiment of determining candidate regions within a relevance region;

fig. 7 is a flowchart of a method for displaying video content according to a fourth embodiment of the present invention;

FIG. 8 is a diagram illustrating a fourth embodiment of the present invention for displaying annotation information when video is paused;

FIG. 9 is a diagram illustrating a fourth embodiment of the present invention for displaying annotation information during video playing;

FIG. 10 is a schematic diagram of an application scenario of an embodiment of the present invention;

fig. 11 is a block diagram of a video tag creation apparatus according to a fifth embodiment of the present invention;

fig. 12 is a block diagram of a display device of video contents according to a sixth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.

First, the implementation concept of each embodiment of the present invention is briefly described as follows: by the method of each embodiment of the invention, a product manufacturer or a product distributor of a third party can provide products appearing in the video and corresponding product introduction to a network video service provider or a video processor of the third party, and the introduced products can be any content which the user may be interested in or hopes to recommend to the user, such as clothes of a video character, a car driven by the video character, a building appearing in the video or scenic spots appearing in the video; searching the position of the product to be matched in the video by a network video service provider or a third-party video processing provider, and adding a corresponding video annotation label; when a video user logs in a video website provided by a network video service provider to watch videos, if the product appears in the videos, the network video service provider can push the detailed information of the product to the video user.

First embodiment

Fig. 1 is a flowchart of a method for creating a video tag according to a first embodiment of the present invention, where the method of this embodiment may be performed by a device for creating a video tag, the device may be implemented by hardware and/or software, and may be generally integrated in a server, for example, a server capable of providing video services controlled by a network video service provider or a video processor of a third party. The method of the embodiment specifically includes the following operations:

110. at least one matching image frame comprising an object to be matched is searched in a video, and matching information corresponding to the matching image frame is obtained, wherein the matching information at least comprises: a temporal node of the matching image frame in the video.

In this embodiment, the server searches at least one matching image frame including an object to be matched in the video, and acquires matching information corresponding to the matching image frame.

In this embodiment, the object to be matched may specifically be an entity object such as a person, an animal, a garment, an automobile, an electronic product, furniture, or a sight spot appearing in the video. Of course, the object to be matched may also be other entity objects appearing in the video, which is not limited to this, and is generally an entity object that needs to recommend information to the user.

The video is formed by continuously playing a series of still pictures (video frames). In this embodiment, the matching image frame specifically means that a video frame including the object to be matched is included in one video.

It can be understood that at least one matching image frame including an object to be matched can be searched in a video by a manual matching manner, but the searching manner is time-consuming and labor-consuming, and especially the searching efficiency is very poor and the searching is easy to miss when the number of the objects to be matched is large or the video time is long.

In the embodiment, at least one matching image frame including the object to be matched is searched in the video in a server matching manner. The server may obtain feature information (for example, grayscale feature information or image feature information) of an object to be matched, perform image matching on the feature information and each video frame of the video, use a video frame that passes through matching as a matching image frame, and obtain a time node of the matching image frame in the video as matching information.

Wherein, the time node of the matched image frame in the video refers to the specific playing position of the matched image frame in the video: the frame number of the matching image frame or the timestamp of the matching image frame in the video can be obtained as the time node of the matching image frame in the video.

For example, the object to be matched is a certain model of automobile, the automobile appears in the video a, the server matches the feature information of the automobile with each video frame in the video a, and if the video frame (or the video frame numbered 10004) played by the video a at the 3 rd minute 40 second includes the feature information of the automobile, the time scale value of 3 rd minute 40 second (or the frame numbered value 10004) is taken as the time node of the matched image frame in the video.

Of course, those skilled in the art may understand that the matching information may include, in addition to the time node of the matching image frame in the video, position information of the object to be matched in the matching image frame or matching coefficients obtained by performing matching operation on the object to be matched and the matching video frame, and this is not limited. The matching coefficient specifically refers to a correlation weight obtained by performing correlation operation on the object to be matched and the matching video frame, wherein the larger the matching coefficient is, the larger the probability representing that the object to be matched appears in the matching video frame is. In practical application, when a plurality of consecutive video frames are all matching video frames, a corresponding video annotation tag can be inserted into the matching video frame with the largest matching coefficient.

The position information of the object to be matched in the matching image frame specifically refers to position coordinates of the object to be matched in the matching image frame. Fig. 2 shows a schematic diagram of the position coordinates of an object to be matched in a matching image frame. As shown in fig. 2, the matching video frame 200 is a picture in 1024 × 768 format, and the object to be matched matches the region 21 in the matching video frame 200, four endpoint coordinates (256 ), (384, 256), (256, 736), and (384, 736) of the region 21 can be acquired as the position coordinates of the object to be matched in the matching image frame.

120. And establishing a video annotation label for the object to be matched in the video according to the matching information.

In this embodiment, the server establishes a video annotation tag for the object to be matched in the video according to the matching information.

The corresponding relation between the time node and the labeling information of the object to be matched can be used as a video labeling label; the corresponding relationship between the time node, the position information and the labeling information of the object to be matched can also be used as a video labeling label. A structure table of a video annotation tag is shown in table 1.

The labeling information of the object to be matched may specifically be description information of the object to be matched, for example: when the object to be matched is clothing, the labeling information can be information such as a brand name, a style name, a price, a purchase link and the like of the clothing; when the object to be matched is a sight spot, the labeling information may be information such as a sight spot name, a location where the sight spot is located, and a link of sight spot description.

In this embodiment, the video annotation tag may be stored in correspondence with the video.

TABLE 1

According to the technical scheme of the embodiment of the invention, the detailed information of the recommended content can be obtained without manual search of the user, and no requirement is made on the manual search capability of the user, so that the method and the device are suitable for any user group. Meanwhile, the user can immediately know the information of the recommended content in the process of watching the video, so that the purchasing desire or the consumption plan of the user cannot be weakened along with the delay of the product information acquisition time, and the method is beneficial to product manufacturers and video users, and is beneficial to product manufacturers to popularize products and video users to timely, accurately and effectively acquire the information of interested products. The method optimizes the existing network video service, enriches the information content in the video, improves the effectiveness of product manufacturers in promoting products in the video, ensures that the video users can timely, accurately and effectively acquire interesting products in the video, greatly simplifies the purchasing process of the interesting products by the video users, and improves the individuation and the convenience of the video service.

It can be understood that, when video searching is performed on different objects to be matched, matching accuracy and matching efficiency are different. For example, the face recognition technology is mature at present, and many emerging algorithms are based on face recognition, and the matching speed and the matching accuracy are high. The identification technology of objects such as clothes is relatively cold, and only the conventional image identification technology can be adopted, and the matching speed and the matching accuracy are far lower than those of human face identification.

In order to solve the above problem, in the embodiment of the present invention, before searching for some objects to be matched, which are not easy to perform image recognition, in a video, first, an associated matching object associated with the object to be matched is searched for in the video, and the object to be matched is searched for in an associated image frame including the associated matching object. The selection principle of the associated matching object is as follows: video searching of the associated matching object is relatively easy (the identification technology is relatively mature or the identification process is relatively simple, etc.) and the probability that the associated matching object and the object to be matched appear in the same video frame is greater than a predetermined threshold value.

For example, if the object to be matched is a garment, an associated image frame set including a face of a video person wearing the garment may be first searched in the video, and then the garment may be further searched in the associated image frame set; or, if the object to be matched is a car, the associated image frame set including the road may be first searched in the video, and then the car may be further searched in the associated image frame set.

Second embodiment

Fig. 3 is a flowchart of a method for creating a video tag according to a second embodiment of the present invention, where the present embodiment is optimized based on the above embodiment, in the present embodiment, it is preferable that an operation is performed according to the associated matching object, at least one matching image frame including an object to be matched is searched in a video, and corresponding matching information is obtained and optimized as follows: acquiring a related matching object of the object to be matched; searching an associated image frame set comprising the associated matching object in the video; and searching at least one matching image frame comprising the object to be matched in the associated image frame set, and acquiring corresponding matching information.

Correspondingly, the method of the embodiment specifically includes the following operations:

310. and acquiring the associated matching object of the object to be matched.

In a typical example, the object to be matched is a dress, and the obtained associated matching object is a human face, wherein the human face corresponds to a video character wearing the dress in the video.

320. And searching the relevant image frame set comprising the relevant matching object in the video.

330. And searching at least one matching image frame comprising the object to be matched in the associated image frame set, and acquiring corresponding matching information.

340. And establishing a video annotation label for the object to be matched in the video according to the matching information.

According to the embodiment of the invention, by searching the relevant image frame set comprising the relevant matching object in the video, searching at least one matching image frame comprising the object to be matched in the relevant image frame set, and acquiring the corresponding matching information, for the object to be matched (such as clothes) with poor matching effect in the field of image recognition, the matching is not directly performed in each video frame in the video, but the relevant matching object (such as a human face) which is relevant to the object to be matched and has better matching effect is matched in the video, and the corresponding relevant video frame set is acquired, at the moment, only the object to be matched is required to be searched in the relevant video frame set, so that the searching difficulty of the matching video frame is greatly simplified, and the matching speed and the matching accuracy of the object to be matched are greatly improved.

On the basis of the above embodiments, before searching in the video, the method further includes:

and acquiring a key frame set in the video as the video to be searched, wherein the key frames in the key frame set are arranged according to the time sequence in the video.

Wherein the acquiring the key frame set in the video specifically includes:

and sequentially sampling the video according to a preset sampling frequency, and taking a set of sequentially acquired video frames as the key frame set.

For example, 100000 video frames are included in a video, the video frame rate is 16HZ, that is, 16 video frames are continuously displayed per second, and considering that the scene change speed in the video content is not very severe in the actual application, the image content in many continuous video frames is very similar, in addition, the main purpose of the embodiment of the present invention is to push annotation information corresponding to an object to be matched to a user watching the video, and theoretically, as long as it can be ensured that a matching object can be matched and searched in each scene, the actual requirement can be met. Therefore, images of each frame do not need to be searched, preferably, one video frame can be extracted from the video at intervals (for example, 2s) to serve as a key frame, and the set of key frames serves as the video to be searched.

Third embodiment

Fig. 4 is a flowchart of a method for creating a video tag according to a third embodiment of the present invention, where the present embodiment is optimized based on the above embodiment, in the present embodiment, it is preferable that an operation is optimized to search at least one matching image frame including an object to be matched in a video according to the associated matching object, and obtain corresponding matching information, where the operation is: acquiring a related matching object of the object to be matched; sequentially searching relevant image frames comprising the relevant matching objects in the video; if the associated image frame is determined to comprise the object to be matched, taking the associated image frame as the matched image frame and acquiring corresponding matching information; and searching at least one matching image frame comprising the object to be matched in the video from the video position of the associated image frame by using a video tracking algorithm, and acquiring corresponding matching information.

410. and acquiring the associated matching object of the object to be matched.

420. And sequentially searching an associated image frame comprising the associated matching object in the video.

430. Judging whether the associated image frame comprises the object to be matched: if so, perform operation 440; otherwise, operation 420 is returned.

In this embodiment, matching may be directly performed in all image regions of the associated image frame to determine whether the associated image frame includes the object to be matched, or one or more candidate image regions may be selected according to the position information of the associated matching object in the associated image frame, and matching may be performed in the candidate image regions to determine whether the associated image frame includes the object to be matched, which is not limited thereto. Obviously, the matching speed of the object to be matched can be further improved by selecting the candidate image area for matching, and the matching efficiency is improved.

For example, the object to be matched is clothing, the associated matching image is a face, the server can acquire position information of the face in the associated image frame to serve as an associated area, a candidate area is determined below the associated area, and whether the clothing is included in the candidate area is determined; the object to be matched is an automobile, the associated matched image is a road, the server can acquire position information of the road in the associated image frame to serve as an associated area, a candidate area is determined above the associated area, and whether the automobile is included in the candidate area or not is determined.

Fig. 5 shows a flowchart of a method for determining whether an associated image frame includes an object to be matched, and as shown in fig. 5, the method specifically includes the following operations:

4301. and acquiring a correlation area matched with the correlation matching object in the correlation image frame.

4302. And determining at least one candidate region corresponding to the object to be matched according to the associated region.

In a specific example, the object to be matched is a garment, and the associated matching object is an avatar of a video person wearing the garment. A schematic illustration of determining candidate regions within a relevance region is shown in fig. 6. As shown in fig. 6, the server first obtains the associated area 61 matched with the head portrait in the video frame 600, and considering that the clothing is located under the associated area 61 where the head portrait is located with a high probability, the server may determine the location of the candidate area 62 according to the associated area 61, and match the object to be matched in the candidate area 62.

Of course, in order to avoid missing search as much as possible, a plurality of candidate regions may be selected for matching, which is not limited.

4303. And performing correlation matching calculation on the object characteristic information of the object to be matched and the characteristic information of the candidate region to obtain a corresponding correlation threshold.

In a specific example, the server performs correlation matching calculation on the object feature information of the object to be matched and the feature information of the candidate region by using a feature matching method, specifically: and respectively extracting feature point sets of the object to be matched and the candidate region, performing correlation matching calculation by using the two extracted feature point sets, and obtaining corresponding correlation threshold values.

Of course, the server may also perform correlation matching calculation on the object feature information of the object to be matched and the feature information of the candidate region by using other methods, for example, a model classification method, which is not limited herein.

4304. Judging whether the correlation threshold is larger than a preset threshold or not: if yes, go to operation 4305; otherwise, operation 4306 is performed.

4305. And determining that the associated image frame comprises the object to be matched.

4306. Determining that the object to be matched is not included in the associated image frame.

440. And taking the associated image frame as the matched image frame and acquiring corresponding matching information.

450. And searching at least one matching image frame comprising the object to be matched in the video from the video position of the associated image frame by using a video tracking algorithm, and acquiring corresponding matching information.

Considering that in practical application, the scene change speed in the video content is not very severe, and the correlation between two frames in the video is generally large, it can be reasonably estimated that when an object to be matched is included in a matching video, the object to be matched is also included in one or more subsequent video frames. Therefore, the video tracking algorithm can be used for estimating the possible area of the object to be matched in the next video frame in advance, and searching the object to be matched in the possible position.

460. Judging whether the last matching image frame searched currently is the last video frame of the video: if so, perform operation 460; otherwise, operation 420 is returned.

470. And establishing a video annotation label for the object to be matched in the video according to the matching information.

According to the embodiment of the invention, after the matched video frame where the object to be matched is located is found according to the associated matched object, the technical means that the object to be matched is continuously found at the video position where the matched video frame is located by using the video tracking algorithm is adopted, so that the finding efficiency of the matched video frame is further improved, and the matching speed of the object to be matched is further improved.

Fourth embodiment

Fig. 7 is a flowchart illustrating a method for displaying video content according to a fourth embodiment of the present invention. The method of the present embodiment may be performed by a display device of video content, which may be implemented by means of hardware and/or software, and may be generally integrated in a terminal device. The method of the embodiment specifically includes the following operations:

710. and playing the video.

In this embodiment, the terminal device acquires and plays the video by logging in a video website provided by a network video service provider.

720. And displaying the labeling information corresponding to the video labeling label in a video image according to the matching information of the video labeling label in the video.

In this embodiment, the terminal device displays, in a video image, annotation information corresponding to a video annotation tag according to matching information of the video annotation tag in the video.

The matching information of the video annotation tag comprises: the matching method includes a time node of a matching image frame of an object to be matched in the video, and may further include position information of the object to be matched in the matching image frame or information such as a matching coefficient obtained by performing matching operation on the object to be matched and the matching video frame, which is not limited to this.

The annotation information corresponding to the video annotation tag is specifically description information of an object to be matched, for example: when the object to be matched is clothing, the labeling information can be information such as a brand name, a style name, a price, a purchase link and the like of the clothing; when the object to be matched is a sight spot, the labeling information may be information such as a name of the sight spot, a location where the sight spot is located, and a website link describing the sight spot.

In this embodiment, the displaying, by the server, the annotation information corresponding to the video annotation tag in the video image according to the matching information of the video annotation tag in the video may specifically include the following four ways:

and in the mode 1, if the corresponding video label exists in the current time node of the played video, adding label information corresponding to the video label into a video image which is adaptive to the position information in the matching information for displaying.

And 2, if the pause playing time point is positioned in the time interval corresponding to the time node when the video is paused, adding the marking information of the video marking label into the video image which is adaptive to the position information in the matching information for displaying. Fig. 8 is a schematic diagram showing the display of annotation information when the video is paused in the usage mode 2.

And 3, if the corresponding video annotation label exists in the current time node of the played video, adding annotation information corresponding to the video annotation label to the boundary position of the currently played video image for displaying.

And 4, if the pause playing time point is positioned outside the time interval corresponding to the time node when the video is paused, adding the marking information of the video marking label to the boundary position of the currently played video image for displaying. Wherein the boundary position comprises a top region and/or a bottom region of the video image. Fig. 9 is a schematic diagram illustrating that the annotation information is displayed in the video playing process by using the method 4.

Fig. 10 is a schematic diagram illustrating an application scenario according to an embodiment of the present invention. As shown in fig. 10, the present application scenario includes: a product manufacturer or third party distributor terminal 101, a network video server or third party server 102, and a user terminal 103. Assuming that a product manufacturer or a third-party distributor terminal 101 wants to popularize all the clothes (assumed to be 6) through which the hot broadcast drama from you of stars centralized in the 06 th golden virtuous actor, the pictures of the 6 clothes to be popularized and the corresponding 6 pieces of annotation information (brand, model, purchase link, etc.) can be sent to the network video server or the third-party server 102, so as to realize that the corresponding video annotation tags are added in the 06 th centralized manner in the network video from you of stars.

After receiving 6 clothes to be promoted sent by a product manufacturer or a third-party distributor terminal 101, a network video server or a third-party server 102 may first establish 6 different clothes models according to pictures of the 6 clothes and a mismatch model not belonging to a set of the 6 clothes models, then search for a Jinxiu person's face in a video of a 06 th set from Samsung, and after finding out an associated matching video frame including the Jinxiu person's face, respectively match the 7 models established before with the associated matching video frame, and establish a corresponding video tagging label for the corresponding clothes of the model with the highest matching degree. And after the matching of the videos is completed, correspondingly storing each established video labeling label and the video in a network video server.

When the user terminal 103 obtains and plays the video through the video website provided by the network video server, the network video server displays the annotation information corresponding to the video annotation tag in the video image.

Fifth embodiment

Fig. 11 is a block diagram of a video tag creation apparatus according to a fifth embodiment of the present invention. As shown in fig. 11, the apparatus includes:

a matching information obtaining unit 111, configured to search at least one matching image frame including an object to be matched in a video, and obtain matching information corresponding to the matching image frame, where the matching information at least includes: a temporal node of the matching image frame in the video.

And an annotation tag establishing unit 112, configured to establish a video annotation tag for the object to be matched in the video according to the matching information.

On the basis of the foregoing embodiments, the matching information obtaining unit may be further configured to obtain position information of the object to be matched in the matching image frame as matching information corresponding to the matching image frame.

On the basis of the foregoing embodiments, the matching information obtaining unit may be specifically configured to:

acquiring a related matching object of the object to be matched;

searching an associated image frame set comprising the associated matching object in the video;

and searching at least one matching image frame comprising the object to be matched in the associated image frame set, and acquiring corresponding matching information.

On the basis of the foregoing embodiments, the matching information obtaining unit may specifically include:

the associated matched object obtaining subunit is used for obtaining an associated matched object of the object to be matched;

the relevant image frame acquisition subunit is used for sequentially searching for relevant image frames comprising the relevant matching objects in the video;

the matching subunit is used for taking the associated image frame as the matching image frame and acquiring corresponding matching information if the associated image frame is determined to comprise the object to be matched;

and the matching information acquisition subunit is used for searching at least one matching image frame comprising the object to be matched in the video from the video position where the associated image frame is located by using a video tracking algorithm and acquiring corresponding matching information.

On the basis of the above embodiments, the object to be matched may be a garment, and the associated matching object may be a face, where the face corresponds to a video person wearing the garment in the video.

On the basis of the foregoing embodiments, the matching subunit may specifically be configured to:

acquiring a correlation area matched with the correlation matching object in the correlation image frame;

determining at least one candidate region corresponding to the object to be matched according to the associated region;

carrying out correlation matching calculation on the object characteristic information of the object to be matched and the characteristic information of the candidate region to obtain a corresponding correlation threshold;

and if the correlation threshold is larger than a preset threshold, determining that the associated image frame comprises the object to be matched.

The video tag establishing device provided by the embodiment of the invention can be used for executing the video tag establishing method provided by any embodiment of the invention, has corresponding functional modules and realizes the same beneficial effects.

Sixth embodiment

Fig. 12 is a diagram showing a configuration of a display device for video content according to a sixth embodiment of the present invention. As shown in fig. 12, the apparatus includes:

and a video playing unit 121 for playing the video.

And the annotation information display unit 122 is configured to display, according to the matching information of the video annotation tag in the video, annotation information corresponding to the video annotation tag in a video image.

On the basis of the foregoing embodiments, the annotation information display unit may be specifically configured to:

if the corresponding video annotation label exists in the current time node of the played video, adding annotation information corresponding to the video annotation label into a video image which is adaptive to the position information in the matching information for displaying;

if the pause playing time point is positioned in the time interval corresponding to the time node when the video is paused, adding the marking information of the video marking label into the video image which is adaptive to the position information in the matching information for displaying;

if the corresponding video annotation label exists in the current time node of the played video, adding annotation information corresponding to the video annotation label to the boundary position of the currently played video image for displaying;

if the pause playing time point is positioned outside the time interval corresponding to the time node when the video is paused, adding the marking information of the video marking label to the boundary position of the currently played video image for displaying;

wherein the boundary position comprises a top region and/or a bottom region of the video image.

The video content display device provided by the embodiment of the invention can be used for executing the video content display method provided by any embodiment of the invention, has corresponding functional modules and realizes the same beneficial effects.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a server as described above. Alternatively, the embodiments of the present invention may be implemented by programs executable by a computer device, so that they can be stored in a storage device and executed by a processor, where the programs may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.; or separately as individual integrated circuit modules, or as a single integrated circuit module from a plurality of modules or steps within them. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for establishing a video tag is characterized by comprising the following steps:

according to the matching information, establishing a video annotation label for the object to be matched in the video;

the searching at least one matching image frame including an object to be matched in the video and acquiring corresponding matching information specifically includes:

acquiring a related matching object of the object to be matched;

at least one matching image frame comprising the object to be matched is searched in the associated image frame set, and corresponding matching information is obtained;

or,

acquiring a related matching object of the object to be matched;

sequentially searching relevant image frames comprising the relevant matching objects in the video;

if the associated image frame is determined to comprise the object to be matched, taking the associated image frame as the matched image frame and acquiring corresponding matching information;

and searching at least one matching image frame comprising the object to be matched in the video from the video position of the associated image frame by using a video tracking algorithm, and acquiring corresponding matching information.

2. The method of claim 1, wherein: the matching information also comprises position information of the object to be matched in the matched image frame.

3. The method according to claim 1, wherein the object to be matched is clothing, and the associated matching object is a human face, wherein the human face corresponds to a video character wearing the clothing in the video.

4. The method of claim 1, further comprising, prior to searching in the video:

acquiring a key frame set in the video as a video to be searched, wherein all key frames in the key frame set are arranged according to a time sequence in the video;

wherein the acquiring the key frame set in the video specifically includes:

5. The method according to claim 1, wherein if it is determined that the associated image frame includes the object to be matched, taking the associated image frame as the matching image frame and acquiring corresponding matching information specifically includes:

6. A method for displaying video content, comprising:

playing the video;

displaying annotation information corresponding to the video annotation tag in a video image according to matching information of the video annotation tag in the video, wherein the video annotation tag is established by the method of any one of claims 1 to 5.

7. The method of claim 6, wherein displaying annotation information corresponding to the video annotation tag in a video image according to matching information of the video annotation tag in the video comprises:

if the corresponding video annotation label exists in the current time node of the played video, adding annotation information corresponding to the video annotation label into a video image corresponding to position information in the matching information for displaying, or adding annotation information corresponding to the video annotation label into the boundary position of the currently played video image for displaying;

8. An apparatus for creating a video tag, comprising:

the annotation label establishing unit is used for establishing a video annotation label for the object to be matched in the video according to the matching information;

the matching information obtaining unit is specifically configured to:

acquiring a related matching object of the object to be matched;

at least one matching image frame comprising the object to be matched is searched in the associated image frame set, and corresponding matching information is obtained,

or, the matching information obtaining unit specifically includes:

9. The apparatus of claim 8, wherein: the matching information obtaining unit is further configured to obtain position information of the object to be matched in the matching image frame as matching information corresponding to the matching image frame.

10. The apparatus according to claim 8, wherein the object to be matched is a garment, and the associated matching object is a human face, wherein the human face corresponds to a video person wearing the garment in the video.

11. The apparatus according to claim 8, wherein the matching subunit is specifically configured to:

12. A display device for video content, comprising:

the video playing unit is used for playing videos;

and the annotation information display unit is used for displaying annotation information corresponding to the video annotation label in the video image according to the matching information of the video annotation label in the video, wherein the video annotation label is established by the device of any one of claims 8 to 11.