CN111382620A - Video tag adding method, computer storage medium and electronic device - Google Patents

Info

Publication number: CN111382620A; granted as CN111382620B
Application number: CN201811628075.9A
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 杨忠伟 (Yang Zhongwei)
Applicant and current assignee: Alibaba Group Holding Ltd
Legal status: Active (granted)
Prior art keywords: video, sample, vector, feature, key frame

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/23: Clustering techniques
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147: Distances to closest patterns, e.g. nearest neighbour classification

Abstract

Embodiments of the invention disclose a video tag adding method, a video playing method, a video searching method, a video pushing method, a server, a terminal device, a computer storage medium and an electronic device. The video tag adding method comprises the following steps: acquiring a feature vector of a key frame picture of each video in a video set, where the feature vector of each key frame picture serves as a sample vector and all the sample vectors form a sample vector set; acquiring the feature vector of a sample picture corresponding to a label as a target vector; finding the sample vectors similar to the target vector in the sample vector set; and adding the label to the videos corresponding to the found sample vectors.

Description

Video tag adding method, computer storage medium and electronic device
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video tag adding method, a video playing method, a video searching method, a video pushing method, a server, a terminal device, a computer storage medium, and an electronic device.
Background
With the development of internet technology, more and more users obtain the content they are interested in through video, and tagging videos has become an important means of video distribution. Tags identify the content features of a video and help users choose what to watch.
In the prior art, tags are added to videos using a fixed tag library: several tags are preset in the library, and every video in the video library is checked against these tags to determine which ones apply. Specifically, the existing approach is based on deep learning: a recognition model is trained on the sample object pictures of all tags, and once the model is trained, the videos to be tagged are fed into it one by one, so that each video is labeled with one or more tags. Whenever the tag library is extended with a new tag, sample object pictures for that tag must be added and the recognition model retrained; after retraining, the whole tagging pass over every video has to be repeated with the new model. Video libraries are usually massive, sometimes reaching hundreds of millions of videos, so a single tagging pass over the entire library can take months. The operation is therefore very inflexible and computationally wasteful: after a new tag is added, the prior-art technique must spend a large amount of time re-labeling the existing videos, which hinders the expansion of new tags.
Therefore, a new video tagging scheme that can tag videos more quickly is needed to solve the problems in the prior art.
Disclosure of Invention
The embodiments of the invention aim to provide a new video tag adding scheme that adds tags to videos more quickly.
According to a first aspect of the present invention, there is provided a video tag adding method, including the steps of:
acquiring a feature vector of a key frame picture of a video in a video set, wherein the feature vector of the key frame picture of the video is used as a sample vector, and the sample vector forms a sample vector set;
acquiring a feature vector of a sample picture corresponding to the label as a target vector;
finding a sample vector similar to the target vector from the sample vector set;
adding the label to the video corresponding to the found sample vector.
Optionally or preferably, the finding of the sample vector similar to the target vector from the sample vector set comprises:
finding a sample vector similar to the target vector from the sample vector set based on a nearest neighbor search algorithm.
Optionally or preferably, the obtaining the feature vector of the key frame picture of the video includes:
performing shot segmentation on the video to obtain at least one shot;
taking the first frame picture of each shot as a key frame picture;
and extracting a feature vector of each key frame picture.
Optionally or preferably, the obtaining the feature vector of the key frame picture of the video includes:
extracting a high-dimensional feature vector from the key frame picture; and
performing dimensionality reduction on the high-dimensional feature vector of the key frame picture.
Optionally or preferably, the obtaining the feature vector of the key frame picture of the video includes:
extracting the feature vector of the key frame picture by using a neural network model, or by using a bag-of-words model based on clustering scale-invariant feature transform (SIFT) features.
Optionally or preferably, the obtaining a feature vector of a sample picture corresponding to the tag includes:
extracting a high-dimensional feature vector from the sample picture; and
performing dimensionality reduction on the high-dimensional feature vector of the sample picture.
Optionally or preferably, the obtaining a feature vector of a sample picture corresponding to the tag includes:
extracting the feature vector of the sample picture corresponding to the label by using a neural network model, or by using a bag-of-words model based on clustering scale-invariant feature transform (SIFT) features.
Optionally or preferably, the nearest neighbor retrieval algorithm comprises: a K-d tree based nearest neighbor search algorithm or a product quantization based nearest neighbor search algorithm.
Optionally or preferably, the finding a sample vector similar to the target vector from the sample vector set based on a nearest neighbor search algorithm comprises:
establishing a data index of the sample vector set by performing product quantization processing on sample vectors in the sample vector set;
quantizing the target vector into the data index by performing product quantization processing on the target vector to obtain a distance between a sample vector in the sample vector set and the target vector;
and determining the sample vector with the distance from the target vector less than a preset threshold value as the sample vector similar to the target vector.
According to a second aspect of the present invention, there is provided a video playing method, comprising the steps of:
receiving a playing request of a first video sent by terminal equipment;
sending the first video and the label thereof to the terminal equipment;
the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with the labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are found from a second feature vector set, the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library, and the first video belongs to the video library.
According to a third aspect of the present invention, there is provided a video playing method, implemented at a terminal device, comprising the steps of:
sending a playing request of a first video to a server;
receiving the first video from a server, the first video having a tag attached thereto;
the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with the labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are found from a second feature vector set, the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library, and the first video belongs to the video library.
According to a fourth aspect of the present invention, there is provided a video search method comprising the steps of:
receiving a video search request sent by terminal equipment, wherein the video search request comprises a tag of a video to be searched;
searching in a video library according to the label, and issuing the searched video to the terminal equipment;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with the labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from a second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library.
According to a fifth aspect of the present invention, there is provided a video search method, implemented at a terminal device, comprising the steps of:
sending a video search request to a server, wherein the video search request comprises a label of a video to be searched;
receiving videos searched by a server in a video library according to the tags;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with the labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from a second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library.
According to a sixth aspect of the present invention, there is provided a video push method, including the steps of:
acquiring a label of a video browsed by a user;
searching in a video library according to the label, and pushing the searched video to the terminal equipment;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with the labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from a second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library.
According to a seventh aspect of the present invention, there is provided a storage medium storing executable instructions which, when executed by a processor, implement the method of any one of the above.
According to an eighth aspect of the present invention, there is provided an electronic apparatus comprising:
a memory storing executable instructions,
a processor, the executable instructions when executed by the processor implementing the method of any of the above.
According to a ninth aspect of the present invention, there is provided a server comprising:
a memory storing executable instructions,
a processor, the executable instructions when executed by the processor implementing the above method.
According to a tenth aspect of the present invention, there is provided a terminal device comprising:
a memory storing executable instructions,
a processor, the executable instructions when executed by the processor implementing the above method.
The video tag adding method of these embodiments converts tag addition into the problem of retrieving sample vectors similar to a target vector, and finds the similar sample vectors through retrieval, so that video tags are added quickly. In particular, when the tag library is extended with a new tag, the method can quickly add the new tag to the corresponding videos, a very significant advantage in processing speed over the prior art.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 shows a schematic structural diagram of a video tag adding system provided by an embodiment of the present disclosure.
Fig. 2 shows a flowchart of a video tag adding method provided by the embodiment of the present disclosure.
Fig. 3 shows a flowchart of a method for obtaining feature vectors of key frame pictures of a video according to an embodiment of the present disclosure.
Fig. 4 shows a flowchart of a method for obtaining feature vectors of key frame pictures of a video according to an embodiment of the present disclosure.
Fig. 5 shows a flowchart of a method for finding a sample vector similar to a target vector according to an embodiment of the present disclosure.
Fig. 6 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< Video Tag Adding System, Method, Computer Storage Medium, Electronic Device, Server >
< Video Tag Adding System >
As shown in fig. 1, the video tag adding system 1000 provided by the embodiment of the present disclosure includes a video library server 1100-1 and a video management server 1100-2. The video library server 1100-1 and the video management server 1100-2 each have, for example, the structure of the server 1100 as shown in the figure.
The server 1100 described above may be a unitary server or distributed servers spanning multiple computers or computer data centers. The server 1100 may be a blade server or the like. Its processor may be a dedicated server processor, or a desktop or mobile processor that meets the performance requirements; no limitation is imposed here. The servers may be of various types, such as, but not limited to, news servers, mail servers, message servers, advertisement servers, file servers, application servers, interaction servers, database servers, or proxy servers. In some embodiments, each server may include hardware, software, or embedded logic components, or a combination of two or more such components, for performing the appropriate functions supported or implemented by the server.
In one example, the server 1100 can be a computer. The server 1100 may be as shown in FIG. 1, including a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160. Although the server 1100 may also include a speaker, a microphone, and the like, these components are not relevant to the present invention and are omitted here. The processor 1110 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 1120 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1130 includes, for example, a serial bus interface, a parallel bus interface, a USB interface, and the like. The communication device 1140 is capable of wired or wireless communication, for example. The display device 1150 is, for example, a liquid crystal display panel. Input devices 1160 may include, for example, a touch screen, a keyboard, and the like.
Although a number of devices of server 1100 are shown in fig. 1, embodiments of the present invention may refer to only some of the devices.
The video library server 1100-1 serves to store a video library, i.e., a video set, and can transmit video data to the video management server 1100-2. The video management server 1100-2 may receive video data transmitted by the video library server 1100-1 and may tag the video.
In another embodiment, the video library server 1100-1 and the video management server 1100-2 included in the video tagging system 1000 may be integrated into a single integrated server.
The video tagging system 1000 shown in fig. 1 is merely illustrative and is in no way intended to limit the present invention, its application, or uses.
< Video Tag Adding Method >
The embodiment discloses a video tag adding method, which can be implemented by the video management server 1100-2. As shown in fig. 2, the video tag adding method of the present embodiment includes the following steps S2100 to S2400:
step S2100, a feature vector of a key frame picture of each video in the video set is obtained, the feature vector of the key frame picture of the video is used as a sample vector, and all the sample vectors form a sample vector set.
The video set is a set composed of a plurality of videos to be tagged. For a video library, all videos in the video library may constitute a video set. The number of videos in a video collection is typically huge, for example up to hundreds of millions of videos.
A key frame picture of a video is representative content that reflects the video. By extracting key frames, the video data is converted into image data that is easier to process while the main content of the video is preserved.
The feature vector of the key frame picture is composed of feature parameters of the key frame picture and can reflect the features of the key frame picture. By extracting the feature vectors, the image can be converted into data that can be processed by a computer.
The feature vector of a key frame picture may be extracted with a deep learning neural network model or with an autoencoder; in a traditional machine learning scheme, it may also be obtained with a bag-of-words model after clustering scale-invariant feature transform (SIFT) features.
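As a concrete illustration of the traditional route, here is a minimal sketch of SIFT-plus-bag-of-words feature extraction, assuming OpenCV (version 4.4 or later, with SIFT available) and scikit-learn; the function name and the 128-word codebook size are illustrative choices, not prescribed by this disclosure.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bow_feature(corpus_paths, query_path, n_words=128):
    """Bag-of-visual-words feature: cluster SIFT descriptors from a picture
    corpus into a codebook, then histogram the query picture's descriptors
    over that codebook to obtain its feature vector."""
    sift = cv2.SIFT_create()
    # Collect SIFT descriptors from the corpus to train the codebook.
    descs = []
    for p in corpus_paths:
        gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, d = sift.detectAndCompute(gray, None)
        if d is not None:
            descs.append(d)
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descs))
    # Quantize the query picture's descriptors and build a normalized histogram.
    gray = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    _, d = sift.detectAndCompute(gray, None)
    words = codebook.predict(d)
    hist, _ = np.histogram(words, bins=n_words, range=(0, n_words))
    return hist / max(hist.sum(), 1)
```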
The feature vectors of the key frame pictures of the videos are taken as sample vectors, and all the sample vectors form a sample vector set. The number of vectors in the sample vector set is usually huge: even a short 10-minute video often has more than 200 key frame pictures, each corresponding to a multi-dimensional feature vector, so hundreds of millions of videos in a video set produce massive feature vector data.
Step S2200 is to acquire a feature vector of the sample picture corresponding to the label as a target vector.
In this embodiment, the tags are in text form. A tag may have multiple pictures whose content matches it, referred to herein as sample pictures. Feature vectors can likewise be extracted from the sample pictures.
In one example, the sample pictures corresponding to a tag may be collected by a person or by a computer, for example by a computer using a trained recognition model.
Step S2300, finding out a sample vector similar to the target vector from the sample vector set.
In one embodiment, a sample vector similar to the target vector may be found from the sample vector set based on a nearest neighbor search algorithm. In other embodiments, other search algorithms may be used to find a sample vector from the sample vector set that is similar to the target vector.
In the embodiment of the disclosure, in order to improve the retrieval efficiency in the face of massive feature vector data of a sample vector set, a nearest neighbor retrieval algorithm is adopted to find out a sample vector similar to a target vector from the sample vector set.
Nearest neighbor search finds, in a database, the items similar to the target data according to data similarity. Similarity is usually quantified as the distance between data points in space: the closer two points are, the more similar they are considered to be.
The nearest neighbor search algorithm in this embodiment may be a classical nearest neighbor (NN) search algorithm, or an approximate nearest neighbor (ANN) search algorithm. The core idea of approximate nearest neighbor retrieval is to search for items that are likely to be neighbors rather than return only the most likely ones, improving retrieval efficiency at the cost of an acceptable loss of accuracy.
Nearest neighbor search methods fall into two families. One improves efficiency through the search structure and is mostly tree-based, such as the classical K-d tree, the R-tree, and the M-tree algorithms. The other improves efficiency mainly by processing the data itself, and includes hashing and vector quantization methods, of which product quantization is the representative.
Based on the nearest neighbor search algorithm, the sample vector similar to the target vector can be searched out from the sample vector set more quickly.
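For reference, exhaustive nearest neighbor search is a few lines of NumPy; this sketch (array shapes and threshold are illustrative) makes the distance-based notion of similarity concrete, while the product quantization index described later replaces the exhaustive scan at scale.

```python
import numpy as np

def nearest_neighbors(samples, target, threshold):
    """Exact nearest-neighbor search: compute the Euclidean distance from
    every sample vector to the target vector and keep those closer than
    the threshold. Feasible only for small sample vector sets."""
    dists = np.linalg.norm(samples - target, axis=1)  # shape (N,)
    return np.nonzero(dists < threshold)[0]           # indices of similar sample vectors

# samples: (N, 128) sample vector set; target: (128,) target vector
```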
Step S2400: adding the label to the videos corresponding to the found sample vectors.
Since the found sample vectors are similar to the target vector, the videos they come from contain content similar to the label's sample picture, so the label can be added to those videos. A minimal bookkeeping sketch follows.
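Assuming each sample vector records which video its key frame came from (a hypothetical `vector_owner` mapping, not named in this disclosure), the bookkeeping is a few lines:

```python
def tag_videos(matched_indices, vector_owner, video_tags, label):
    """Step S2400 sketch: every matched sample vector tags its source video.
    vector_owner: sample-vector index -> video id (hypothetical bookkeeping
    recorded when the sample vector set was assembled)."""
    for i in matched_indices:
        video_tags.setdefault(vector_owner[i], set()).add(label)
    return video_tags
```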
The video tag adding method of this embodiment thus converts tag addition into the problem of retrieving sample vectors similar to a target vector, and quickly finds those vectors by retrieval, in particular by means of a nearest neighbor search algorithm, which speeds up tag addition. When the tag library is extended with a new tag, the method can quickly add the new tag to the corresponding videos, a very significant advantage in processing speed over the prior art.
Alternatively or preferably, referring to fig. 3, the process of acquiring feature vectors of key frame pictures of a video in this embodiment includes the following steps S3100 to S3300:
and step S3100, segmenting the video to obtain at least one lens.
A shot is a basic unit constituting a visual language. It is the basis for narratives and ideograms. In the prior shooting of a movie or television work, a shot refers to the sum of a segment of pictures that are captured by a camera without interruption from startup to standstill. In post-editing, a shot is a set of frames between two clip points. In a finished shot, a shot refers to the complete segment between the previous optical transition and the next optical transition.
For the purposes of this embodiment, a shot is a set of inherently related consecutive frames captured continuously by a camera, typically representing a spatially and temporally continuous action.
A frame is a single still picture, the smallest unit of a moving video image, corresponding to a single exposure on motion picture film. Consecutive frames form a moving image, such as a television picture.
Shot segmentation is to detect the boundary of a shot and then separate the video from the detected boundary to form individual shots. There is typically a very clear boundary between shots, called a boundary frame. The main task of shot slicing is to detect these boundary frames from all the frames constituting the video file, for example, by using a computer to sequentially detect each frame of the video file to determine whether it is a shot boundary frame, which is also called shot boundary detection.
One or more shots are obtained after a video is shot cut.
The video shot segmentation method of the disclosed embodiments may be based on scene segmentation (for example, histogram-based), on motion, or on contours; it may also be based on clustering or decision trees. In this embodiment, shot segmentation is performed with the PySceneDetect open-source software.
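A minimal sketch with PySceneDetect's high-level interface (assuming the v0.6-style detect/ContentDetector API; the file name is illustrative):

```python
from scenedetect import detect, ContentDetector

def shot_start_frames(video_path):
    """Content-based shot boundary detection with PySceneDetect.
    Returns the frame index at which each detected shot begins."""
    scene_list = detect(video_path, ContentDetector())  # list of (start, end) timecodes
    return [start.get_frames() for start, end in scene_list]

starts = shot_start_frames("example_video.mp4")  # hypothetical input file
```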
In step S3200, the first frame picture of each shot is taken as a key frame picture.
Each shot contains key frame pictures that can represent it. Among all the frames of a shot, a key frame picture is one that best represents the shot's main content; that is, a shot can be compactly expressed by its key frame pictures. In the field of animation, for example, a key frame corresponds to the original drawing: the frame that holds the key pose in a character's or object's motion or change.
In the embodiments of the present invention, key frame pictures are extracted to express the main content of the shots, and thus of the videos, and the feature vectors extracted from them (e.g., color, texture, and shape features) provide the basis for deciding whether to tag the videos.
A shot may have one or more key frame pictures depending on the complexity of the shot content.
In the present embodiment, the first frame picture of each shot may be taken as a key frame picture. Generally, when a new shot starts, the first frame of the shot is relatively more capable of representing the main content of the shot.
In the disclosed embodiments, a frame-averaging algorithm may be used to determine the key frame picture, either pixel frame averaging or histogram frame averaging. The two share the same basic idea and differ only in the feature that is averaged. Pixel frame averaging takes the average of the pixel values at given positions over all frames in the shot as the standard, and selects the frame whose pixel values at those positions are closest to the average as the shot's key frame picture. Histogram frame averaging takes the average histogram of all frames in the shot as the standard, and selects the frame whose histogram is closest to that average. The computation is simple and the selected frame is maximally close to the average, but only one key frame picture is chosen per shot, which cannot comprehensively describe the shot's content, especially for shots with large content changes.
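An OpenCV sketch of the histogram frame averaging just described; grayscale histograms and L2 distance are illustrative choices, since this disclosure does not fix those details:

```python
import cv2
import numpy as np

def keyframe_by_histogram_average(frames):
    """Histogram frame averaging: average the gray-level histograms of all
    frames in the shot, then pick the frame whose histogram is closest to
    that average as the shot's key frame picture."""
    hists = [cv2.calcHist([cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)],
                          [0], None, [256], [0, 256]).ravel()
             for f in frames]
    mean_hist = np.mean(hists, axis=0)
    best = int(np.argmin([np.linalg.norm(h - mean_hist) for h in hists]))
    return frames[best]
```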
In other embodiments, key frame extraction may be performed dynamically according to the length of the current shot and the intensity of its changes, rather than uniformly: when the shot changes drastically, more key frames should be extracted even if the shot is short; conversely, even for a long shot, fewer key frames should be extracted if the scene barely changes. When multiple key frame pictures are selected, the criterion is to prioritize dissimilarity among them: with inter-frame similarity as the measure, each new key frame is chosen to have minimal similarity to those already selected, so that the set of key frames carries the maximum amount of information.
In another embodiment, the video key frames may be extracted using Python and OpenCV, as sketched below.
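The sketch below implements step S3200 with OpenCV's VideoCapture, assuming the shot boundaries from the segmentation step have been converted to frame indices (for example by the PySceneDetect sketch above):

```python
import cv2

def first_frames(video_path, shot_start_frames):
    """Step S3200 sketch: seek to the first frame of every shot and decode
    it as that shot's key frame picture."""
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    for idx in shot_start_frames:                 # frame index where each shot begins
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            keyframes.append(frame)
    cap.release()
    return keyframes
```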
Through this process, key frame pictures representing the main content of the video are obtained, facilitating the subsequent steps.
Step S3300: extracting the feature vector of each key frame picture.
Through the above process the video is preprocessed: the video file is converted into vector data that a computer can process, facilitating the subsequent steps.
Alternatively or preferably, as shown with reference to fig. 4, step S3300 further includes the following steps S4100 and S4200:
step S4100 extracts a high-dimensional feature vector for the key frame picture.
And step S4200, performing dimension reduction processing on the high-dimensional feature vector of the key frame picture.
Extracting feature vectors from key frame pictures is a feature extraction problem, and feature extraction plays a decisive role in image recognition and retrieval. Feature extraction in image processing and computer vision spans multiple levels and feature forms, and can be broadly divided into low-level and high-level feature extraction. Low-level feature extraction aims to describe the main content structure of an image; high-level feature extraction is concerned with mining the algebraic features implicit in the image by various methods.
High-level feature extraction falls into two categories: methods based on signal processing and methods based on learning. Signal-processing-based algorithms use classical transforms such as the Fourier transform and the wavelet transform. Learning-based algorithms mainly reduce the dimensionality of the data: the original data is mapped, linearly or nonlinearly, from a high-dimensional space to a low-dimensional feature space, and the reduced data largely preserves the essential characteristics of the original. Typical learning algorithms include principal component analysis (PCA), linear discriminant analysis, locality preserving projections, and kernel PCA.
This embodiment adopts a learning-based feature extraction algorithm. Specifically, for each key frame picture, a high-dimensional feature vector is extracted from the penultimate layer of an existing VGG16 deep network; the dimensionality of this vector can reach the order of ten thousand. PCA is then applied to reduce the high-dimensional feature vector to a low-dimensional one, for example 128 dimensions. Using the low-dimensional feature vectors in the subsequent step S2300 reduces the amount of computation and speeds up retrieval.
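A Keras/scikit-learn sketch of this pipeline follows. One assumption is flagged: in the stock Keras VGG16 the penultimate layer ('fc2') is 4096-dimensional, so the ten-thousand-dimensional figure above would correspond to an earlier layer such as the flattened convolutional output; the layer choice and variable names here are illustrative.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA

base = VGG16(weights="imagenet")
# Penultimate fully connected layer ('fc2'; 4096-d in the stock Keras model).
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def high_dim_features(images):
    """images: (N, 224, 224, 3) array of RGB key frame pictures."""
    return extractor.predict(preprocess_input(images.astype("float32")))

# Reduce to 128 dimensions with PCA fitted on the sample-vector corpus; the
# same projection is reused for the label's sample pictures (target vectors).
pca = PCA(n_components=128)
sample_vectors = pca.fit_transform(high_dim_features(keyframe_batch))     # keyframe_batch: hypothetical
target_vectors = pca.transform(high_dim_features(sample_picture_batch))   # sample_picture_batch: hypothetical
```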
It should be noted that the above method for extracting feature vectors from key frame pictures can also be applied to extracting feature vectors from sample pictures.
Referring to fig. 5, in step S2300 a product quantization (PQ) based nearest neighbor search algorithm is used to find the sample vectors similar to the target vector in the sample vector set, comprising the following steps S5100 to S5300:
step S5100 establishes a data index of the sample vector set by performing a product quantization process on the sample vectors in the sample vector set.
Product quantization is a typical vector quantization method. Its main idea is to split a high-dimensional feature vector into several lower-dimensional sub-vectors, quantize each sub-vector within its own subspace, obtain the quantization result (codebook) of the original vector space as the Cartesian product of the subspace quantization results, and finally represent each original vector by its codes in that codebook.
In one example, the dimension of the sample vector obtained after step S4200 is 128 dimensions, and the number of such sample vectors is huge, for example, one hundred million sample vectors, and the one hundred million sample vectors constitute a sample vector set.
According to the product quantization method, the 128-dimensional sample vector space is partitioned into several subspaces of equal dimension, for example 8 subspaces of 16 dimensions each, denoted X1, X2, ..., X8. A sample vector A in the sample vector set is correspondingly split into 8 sub-vectors A1, A2, ..., A8 of 16 dimensions, with A1, A2, ..., A8 corresponding one-to-one to the subspaces X1, X2, ..., X8. The other sample vectors in the set are split in the same way, and the sub-vectors falling in each subspace form that subspace's sub-vector set. Quantizing the sub-vector set of subspace X1 yields the codebook of X1 and the code, in that codebook, of each sub-vector of the set. The same processing is applied to the other subspaces, finally yielding each subspace's codebook and the codes of its sub-vectors.
The Cartesian product of the codebooks of X1, X2, ..., X8 forms the codebook of the sample vector space, and the codes of the sub-vectors A1, A2, ..., A8 in the 8 subspaces together constitute the code of sample vector A. The data index of the sample vector set can then be established from the codebook of the sample vector space and the code of each sample vector.
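One possible realization of this index construction uses the Faiss library (an assumption; this disclosure does not name a library). IndexPQ(128, 8, 8) matches the worked example: 8 sub-vectors of 16 dimensions, each quantized with a 256-entry codebook.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, m, nbits = 128, 8, 8                            # 8 subspaces of 16 dims, 2^8-entry codebooks
index = faiss.IndexPQ(d, m, nbits)

xb = np.asarray(sample_vectors, dtype="float32")   # the sample vector set, shape (N, 128)
index.train(xb)                                    # step S5100: learn the per-subspace codebooks
index.add(xb)                                      # encode every sample vector into the data index
```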
Step S5200: quantizing the target vector into the data index by performing product quantization on it, so as to obtain the distances between the sample vectors in the sample vector set and the target vector.
Continuing the example of step S5100, the target vector is subjected to the same product quantization process, finally yielding its code in the same codebook; that is, the target vector is quantized into the data index.
After the target vector is quantized into the data index, the distance between the target vector and the sample vector can be conveniently determined according to the data index.
Step S5300 determines a sample vector having a distance from the target vector smaller than a preset threshold as a sample vector similar to the target vector.
In this embodiment, the distance between the target vector and the sample vector is used as an evaluation index of the similarity, and the sample vector whose distance from the target vector is smaller than a preset threshold is determined as the sample vector similar to the target vector, so as to find the sample vector similar to the target vector.
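Continuing the Faiss sketch above, steps S5200 and S5300 become a search followed by a threshold filter; the candidate pool size and threshold value are illustrative assumptions, and Faiss reports approximate (squared-L2) distances computed from the PQ codes:

```python
xq = np.asarray(target_vectors, dtype="float32")   # target vectors, shape (M, 128)
dist, idx = index.search(xq, 100)                  # step S5200: top-100 candidates per target (assumption)

threshold = 0.8                                    # preset distance threshold (assumption)
similar = idx[0][dist[0] < threshold]              # step S5300: samples similar to the first target
```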
Searching for the sample vectors similar to the target vector with the product quantization method greatly increases retrieval speed and reduces memory consumption.
< Computer Storage Medium >
The present embodiments provide a computer storage medium storing executable instructions that, when executed, implement a method comprising:
acquiring a feature vector of a key frame picture of each video in a video set, wherein the feature vector of the key frame picture of the video is used as a sample vector, and all the sample vectors form a sample vector set;
acquiring a feature vector of a sample picture corresponding to the label as a target vector;
finding out a sample vector similar to the target vector from the sample vector set;
and adding a label to the video corresponding to the found sample vector.
Specifically, the foregoing embodiment of the video tag adding method can be used to explain a computer storage medium in this embodiment, and details are not repeated in this embodiment of the computer storage medium.
< Electronic Apparatus >
The present embodiment provides an electronic device, as shown in fig. 6, an electronic device 6100 includes:
a memory 6120, the memory 6120 storing executable instructions,
the processor 6110, when the executable instructions are executed by the processor 6110, implements the following method, including:
acquiring a feature vector of a key frame picture of each video in a video set, wherein the feature vector of the key frame picture of the video is used as a sample vector, and all the sample vectors form a sample vector set;
acquiring a feature vector of a sample picture corresponding to the label as a target vector;
finding out a sample vector similar to the target vector from the sample vector set;
and adding a label to the video corresponding to the found sample vector.
Specifically, the embodiment of the video tag adding method can be used to explain the electronic device in this embodiment, and details are not repeated in this embodiment of the electronic device.
The electronic device 6100 may also include an interface device 6130, a communication device 6140, a display device 6150, and an input device 6160.
< Server >
The embodiment of the invention provides a server for adding a label to a video, which comprises the following components: a memory storing executable instructions; and the processor is used for realizing the video label adding method of any one of the above items when the executable instructions are executed by the processor.
A server that may be used to tag videos is, for example, the video management server 1100-2 described above.
The embodiment of the video tag adding method can be used to explain the server in the embodiment, and details are not repeated in the embodiment of the server.
< Video Playing Method, Server, Terminal Device, Computer Storage Medium, Electronic Device >
The embodiment of the invention provides a video playing method. The video playing method according to the embodiment of the present invention may be implemented by the video management server 1100-2.
The video playing method provided by the embodiment of the invention comprises the following steps:
receiving a playing request of a first video sent by terminal equipment;
sending the first video and the label thereof to the terminal equipment;
the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and videos corresponding to the first feature vector set are provided with labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the labels, which are found from the second feature vector set, the second feature vector set is a set of feature vectors of key frame pictures of videos in the video library, and the first video belongs to the video library.
As can be seen from the above, the tag of the first video may be added by any of the video tag adding methods described above.
An embodiment of the present invention provides a server, including: a memory storing executable instructions; and the processor is used for realizing the video playing method when the executable instructions are executed by the processor.
The embodiment of the invention provides a computer storage medium, which stores executable instructions, and when the executable instructions are executed by a processor, the video playing method is realized.
An embodiment of the present invention provides an electronic device, including: a memory storing executable instructions; and the processor is used for realizing the video playing method when the executable instructions are executed by the processor.
The embodiment of the invention provides a video playing method which can be implemented by terminal equipment and comprises the following steps:
sending a playing request of a first video to a server;
receiving a first video from a server, the first video having a tag attached thereto;
the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and videos corresponding to the first feature vector set are provided with labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the labels, which are found from the second feature vector set, the second feature vector set is a set of feature vectors of key frame pictures of videos in a video library, and the first video belongs to the video library.
As can be seen from the above, the tag of the first video may be added by any of the video tag adding methods described above.
An embodiment of the present invention provides a terminal device, including: a memory storing executable instructions; and the processor is used for realizing the video playing method when the executable instructions are executed by the processor. The terminal device may be, for example, a mobile phone, a desktop, a tablet, a notebook, etc.
< Video Search Method, Server, Terminal Device, Computer Storage Medium, Electronic Device >
The embodiment of the invention provides a video searching method. The video search method according to the embodiment of the present invention may be implemented by the video management server 1100-2.
The video searching method of the embodiment of the invention comprises the following steps:
receiving a video search request sent by terminal equipment, wherein the video search request comprises a tag of a video to be searched;
searching in a video library according to the label, and issuing the searched video to the terminal equipment;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from the second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of the videos in the video library.
As can be seen from the above, the tags of the videos may be added by any one of the video tag adding methods described above.
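Once the tag adding method has populated a video-to-tags mapping (such as the hypothetical video_tags built earlier), the server-side core of this search is a simple filter; transport, paging, and ranking are omitted from this sketch:

```python
def search_by_tag(video_tags, requested_tag):
    """Server-side core of the video search method: return the ids of all
    videos whose tag set contains the requested tag."""
    return [vid for vid, tags in video_tags.items() if requested_tag in tags]
```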
An embodiment of the present invention provides a server, including: a memory storing executable instructions; and the processor is used for realizing the video searching method when the executable instructions are executed by the processor.
The embodiment of the invention provides a computer storage medium which stores executable instructions, and when the executable instructions are executed by a processor, the video searching method is realized.
An embodiment of the present invention provides an electronic device, including: a memory storing executable instructions; and the processor is used for realizing the video searching method when the executable instructions are executed by the processor.
The embodiment of the invention provides a video searching method which can be implemented by terminal equipment and comprises the following steps:
sending a video search request to a server, wherein the video search request comprises a label of a video to be searched;
receiving videos searched by a server in a video library according to the labels;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from the second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of the videos in the video library.
As can be seen from the above, the tags of the videos may be added by any one of the video tag adding methods described above.
An embodiment of the present invention provides a terminal device, including: a memory storing executable instructions; and the processor is used for realizing the video searching method when the executable instructions are executed by the processor. The terminal device may be, for example, a mobile phone, a desktop, a tablet, a notebook, etc.
< Video Push Method, Server, Computer Storage Medium, Electronic Device >
The embodiment of the invention provides a video pushing method. The video push method according to the embodiment of the present invention may be implemented by the video management server 1100-2.
The video pushing method provided by the embodiment of the invention comprises the following steps:
acquiring a label of a video browsed by a user;
searching in a video library according to the label, and pushing the searched video to the terminal equipment;
the feature vectors of the key frame pictures of the searched videos belong to a first feature vector set, and the videos corresponding to the first feature vector set are provided with labels;
the first feature vector set is a set of feature vectors similar to the feature vectors of the sample pictures corresponding to the tags, which are searched from the second feature vector set, and the second feature vector set is a set of feature vectors of key frame pictures of the videos in the video library.
As can be seen from the above, the tags of the videos may be added by any one of the video tag adding methods described above.
An embodiment of the present invention provides a server, including: a memory storing executable instructions; and the processor is used for realizing the video pushing method when the executable instructions are executed by the processor.
The embodiment of the invention provides a computer storage medium, which stores executable instructions, and when the executable instructions are executed by a processor, the video pushing method is realized.
An embodiment of the present invention provides an electronic device, including: a memory storing executable instructions; and the processor is used for realizing the video pushing method when the executable instructions are executed by the processor.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
The embodiments of the present invention having been described, the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or the technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (18)

1. A video tag adding method, comprising the following steps:
acquiring a feature vector of a key frame picture of each video in a video set, wherein the feature vector of a key frame picture of a video serves as a sample vector, and the sample vectors form a sample vector set;
acquiring a feature vector of a sample picture corresponding to a tag as a target vector;
finding, from the sample vector set, a sample vector similar to the target vector; and
adding the tag to the video corresponding to the found sample vector.
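By way of illustration, a minimal Python sketch of the claimed steps using a brute-force Euclidean match; the helper extract_feature, the video/key-frame structures, and the threshold are assumptions, not the patented implementation:

    import numpy as np

    def add_tags(videos, sample_picture, tag, extract_feature, threshold=0.5):
        # Sample vector set: one feature vector per key frame picture.
        sample_vectors, owners = [], []
        for video in videos:
            for frame in video["keyframes"]:
                sample_vectors.append(extract_feature(frame))
                owners.append(video)
        X = np.stack(sample_vectors)                  # (n_samples, dim)

        # Target vector: feature vector of the tag's sample picture.
        target = extract_feature(sample_picture)

        # Find sample vectors similar to the target and tag their videos.
        dists = np.linalg.norm(X - target, axis=1)
        for i in np.flatnonzero(dists < threshold):
            owners[i].setdefault("tags", set()).add(tag)

Claim 2 below replaces this brute-force scan with a nearest neighbor search algorithm, which is what makes the approach tractable at library scale.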
2. The method of claim 1, wherein the finding, from the sample vector set, a sample vector similar to the target vector comprises:
finding a sample vector similar to the target vector from the sample vector set based on a nearest neighbor search algorithm.
3. The method of claim 1, wherein the acquiring a feature vector of a key frame picture of the video comprises:
performing shot segmentation on the video to obtain at least one shot;
taking the first frame picture of each shot as a key frame picture; and
extracting a feature vector of each key frame picture.
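A minimal sketch of claim 3 with OpenCV; the colour-histogram shot detector and its threshold are illustrative assumptions, since the claim does not fix a segmentation technique:

    import cv2

    def keyframes_by_shot(path, threshold=0.5):
        cap = cv2.VideoCapture(path)
        keyframes, prev_hist = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            # A sharp histogram change marks a new shot; its first frame
            # becomes the key frame picture.
            if prev_hist is None or cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                keyframes.append(frame)
            prev_hist = hist
        cap.release()
        return keyframes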
4. The method of claim 1, wherein the acquiring a feature vector of a key frame picture of the video comprises:
extracting a high-dimensional feature vector from the key frame picture; and
performing dimensionality reduction on the high-dimensional feature vector of the key frame picture.
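One plausible reading of claim 4, with PCA standing in for the unspecified dimensionality-reduction step:

    import numpy as np
    from sklearn.decomposition import PCA

    high_dim = np.random.rand(1000, 2048).astype("float32")  # e.g. CNN features
    pca = PCA(n_components=128)
    low_dim = pca.fit_transform(high_dim)   # (1000, 128) reduced sample vectors

Reducing 2048-dimensional descriptors to 128 dimensions keeps the nearest neighbor index small and fast while preserving most of the variance.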
5. The method of claim 1, wherein the acquiring a feature vector of a key frame picture of the video comprises:
extracting the feature vector of the key frame picture by using a neural network model, or by using a bag-of-words model based on a scale-invariant feature transform (SIFT) clustering algorithm.
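A sketch of the neural-network branch of claim 5; the pretrained ResNet-50 backbone is an assumption, not a model mandated by the patent:

    import torch
    from torchvision import models, transforms

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = torch.nn.Identity()      # expose the 2048-d penultimate features
    model.eval()

    preprocess = transforms.Compose([
        transforms.ToPILImage(), transforms.Resize(256),
        transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def feature_vector(frame_rgb):
        # frame_rgb: HxWx3 uint8 array, e.g. a key frame picture
        with torch.no_grad():
            return model(preprocess(frame_rgb).unsqueeze(0)).squeeze(0).numpy()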
6. The method of claim 1, wherein the acquiring a feature vector of the sample picture corresponding to the tag comprises:
extracting a high-dimensional feature vector from the sample picture; and
performing dimensionality reduction on the high-dimensional feature vector of the sample picture.
7. The method of claim 1, wherein the acquiring a feature vector of the sample picture corresponding to the tag comprises:
extracting the feature vector of the sample picture corresponding to the tag by using a neural network model, or by using a bag-of-words model based on a scale-invariant feature transform (SIFT) clustering algorithm.
8. The method of claim 2, wherein the nearest neighbor search algorithm comprises: a K-d-tree-based nearest neighbor search algorithm or a product-quantization-based nearest neighbor search algorithm.
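A K-d tree sketch for the first alternative in claim 8, via SciPy; it suits the reduced-dimension vectors of claim 4, since K-d trees degrade in very high dimensions:

    import numpy as np
    from scipy.spatial import cKDTree

    X = np.random.rand(10000, 128).astype("float32")   # sample vector set
    tree = cKDTree(X)                                  # build once
    target = np.random.rand(128).astype("float32")     # target vector
    dists, idx = tree.query(target, k=10)              # 10 nearest sample vectors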
9. The method of claim 2, wherein the finding a sample vector similar to the target vector from the sample vector set based on a nearest neighbor search algorithm comprises:
establishing a data index for the sample vector set by performing product quantization on the sample vectors in the sample vector set;
quantizing the target vector into the data index by performing product quantization on the target vector, so as to obtain the distance between each sample vector in the sample vector set and the target vector; and
determining a sample vector whose distance from the target vector is less than a preset threshold as a sample vector similar to the target vector.
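A sketch of claim 9 with the faiss library; the parameters (8 sub-quantizers, 8 bits each) and the distance threshold are assumptions to be tuned per feature space:

    import numpy as np
    import faiss

    d = 128
    X = np.random.rand(100000, d).astype("float32")    # sample vector set

    index = faiss.IndexPQ(d, 8, 8)     # product quantization: 8 sub-vectors, 8-bit codes
    index.train(X)                     # learn the PQ codebooks (the data index)
    index.add(X)                       # quantize and store the sample vectors

    target = np.random.rand(1, d).astype("float32")    # quantized at query time
    dists, ids = index.search(target, 100)             # approximate distances
    similar = ids[0][dists[0] < 0.5]   # sample vectors under the preset threshold

Product quantization stores each sample vector as a few bytes of codebook indices, so the index for a whole video library fits in memory and distance computation reduces to table lookups.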
10. A video playing method, comprising the following steps:
receiving a playing request for a first video sent by a terminal device; and
sending the first video and its tag to the terminal device;
wherein the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and the videos corresponding to the first feature vector set carry the tag; and
the first feature vector set is a set of feature vectors, found from a second feature vector set, that are similar to the feature vector of the sample picture corresponding to the tag; the second feature vector set is the set of feature vectors of the key frame pictures of the videos in a video library, and the first video belongs to the video library.
11. A video playing method, comprising the following steps:
sending a playing request for a first video to a server; and
receiving the first video, with its tag attached, from the server;
wherein the feature vectors of the key frame pictures of the first video belong to a first feature vector set, and the videos corresponding to the first feature vector set carry the tag; and
the first feature vector set is a set of feature vectors, found from a second feature vector set, that are similar to the feature vector of the sample picture corresponding to the tag; the second feature vector set is the set of feature vectors of the key frame pictures of the videos in a video library, and the first video belongs to the video library.
12. A video search method, comprising the following steps:
receiving a video search request sent by a terminal device, wherein the video search request comprises a tag of the video to be searched for; and
searching a video library according to the tag, and delivering the found videos to the terminal device;
wherein the feature vectors of the key frame pictures of the found videos belong to a first feature vector set, and the videos corresponding to the first feature vector set carry the tag; and
the first feature vector set is a set of feature vectors, found from a second feature vector set, that are similar to the feature vector of the sample picture corresponding to the tag, and the second feature vector set is the set of feature vectors of the key frame pictures of the videos in the video library.
13. A video search method, comprising the following steps:
sending a video search request to a server, wherein the video search request comprises a tag of the video to be searched for; and
receiving the videos found by the server in a video library according to the tag;
wherein the feature vectors of the key frame pictures of the found videos belong to a first feature vector set, and the videos corresponding to the first feature vector set carry the tag; and
the first feature vector set is a set of feature vectors, found from a second feature vector set, that are similar to the feature vector of the sample picture corresponding to the tag, and the second feature vector set is the set of feature vectors of the key frame pictures of the videos in the video library.
14. A video push method, comprising the following steps:
acquiring a tag of a video browsed by a user; and
searching a video library according to the tag, and pushing the found videos to the user's terminal device;
wherein the feature vectors of the key frame pictures of the found videos belong to a first feature vector set, and the videos corresponding to the first feature vector set carry the tag; and
the first feature vector set is a set of feature vectors, found from a second feature vector set, that are similar to the feature vector of the sample picture corresponding to the tag, and the second feature vector set is the set of feature vectors of the key frame pictures of the videos in the video library.
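Once tags are attached, the search and push methods of claims 12-14 reduce to an inverted-index lookup from tag to videos; a minimal sketch, where the index structure is an assumption rather than part of the claims:

    from collections import defaultdict

    tag_index = defaultdict(set)        # tag -> ids of videos carrying it

    def on_video_tagged(video_id, tag):
        tag_index[tag].add(video_id)

    def videos_for_tag(tag):
        # Videos whose key frame features matched the tag's sample picture,
        # i.e. the videos behind the first feature vector set.
        return sorted(tag_index.get(tag, ()))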
15. A computer storage medium storing executable instructions that, when executed by a processor, implement the method of any one of claims 1-14.
16. An electronic device, comprising:
a memory storing executable instructions; and
a processor, wherein the executable instructions, when executed by the processor, implement the method of any one of claims 1-14.
17. A server, comprising:
a memory storing executable instructions; and
a processor, wherein the executable instructions, when executed by the processor, implement the method of any one of claims 1-10, 12, or 14.
18. A terminal device, comprising:
a memory storing executable instructions; and
a processor, wherein the executable instructions, when executed by the processor, implement the method of claim 11 or 13.
CN201811628075.9A 2018-12-28 2018-12-28 Video tag adding method, computer storage medium and electronic device Active CN111382620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811628075.9A CN111382620B (en) 2018-12-28 2018-12-28 Video tag adding method, computer storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN111382620A (en) 2020-07-07
CN111382620B (en) 2023-06-09

Family

ID=71216397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811628075.9A Active CN111382620B (en) 2018-12-28 2018-12-28 Video tag adding method, computer storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN111382620B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130114902A1 (en) * 2011-11-04 2013-05-09 Google Inc. High-Confidence Labeling of Video Volumes in a Video Sharing Service
CN103617233A (en) * 2013-11-26 2014-03-05 烟台中科网络技术研究所 Method and device for detecting repeated video based on semantic content multilayer expression
CN105100894A (en) * 2014-08-26 2015-11-25 Tcl集团股份有限公司 Automatic face annotation method and system
WO2017114388A1 (en) * 2015-12-30 2017-07-06 腾讯科技(深圳)有限公司 Video search method and device
CN106919652A (en) * 2017-01-20 2017-07-04 东北石油大学 Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
CN107704525A (en) * 2017-09-04 2018-02-16 优酷网络技术(北京)有限公司 Video searching method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JI ZHONG et al.: "Video Summarization Based on a Hypergraph Ranking Algorithm" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113395584A (en) * 2020-10-10 2021-09-14 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN113395584B (en) * 2020-10-10 2024-03-22 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and medium
CN112163122A (en) * 2020-10-30 2021-01-01 腾讯科技(深圳)有限公司 Method and device for determining label of target video, computing equipment and storage medium
CN112163122B (en) * 2020-10-30 2024-02-06 腾讯科技(深圳)有限公司 Method, device, computing equipment and storage medium for determining label of target video
CN113301448A (en) * 2020-11-09 2021-08-24 众源科技(广东)股份有限公司 Intelligent AR gateway
CN113613065A (en) * 2021-08-02 2021-11-05 北京百度网讯科技有限公司 Video editing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111382620B (en) 2023-06-09

Similar Documents

Publication Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
US10885100B2 (en) Thumbnail-based image sharing method and terminal
US11132555B2 (en) Video detection method, server and storage medium
US20190108242A1 (en) Search method and processing device
CN111382620B (en) Video tag adding method, computer storage medium and electronic device
CN106649890B (en) Data storage method and device
US9355330B2 (en) In-video product annotation with web information mining
CN110083729B (en) Image searching method and system
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
US10796203B2 (en) Out-of-sample generating few-shot classification networks
CN113434716B (en) Cross-modal information retrieval method and device
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN113010703A (en) Information recommendation method and device, electronic equipment and storage medium
US20170091240A1 (en) Fast orthogonal projection
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
JP2014197412A (en) System and method for similarity search of images
Chen et al. A hybrid mobile visual search system with compact global signatures
CN113806588A (en) Method and device for searching video
CN114898266B (en) Training method, image processing device, electronic equipment and storage medium
Kumar et al. Key-lectures: keyframes extraction in video lectures
CN111178409B (en) Image matching and recognition system based on big data matrix stability analysis
KR20150101846A (en) Image classification service system based on a sketch user equipment, service equipment, service method based on sketch and computer readable medium having computer program recorded therefor
Komali et al. An efficient content based image retrieval system for color and shape using optimized K-Means algorithm
Vadivukarassi et al. A framework of keyword based image retrieval using proposed Hog_Sift feature extraction method from Twitter Dataset
Zhang et al. Transmitting informative components of fisher codes for mobile visual search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant