CN113365102A - Video processing method and device and label processing method and device

Info

Publication number: CN113365102A (application CN202010143035.6A; granted as CN113365102B)
Authority: CN (China)
Prior art keywords: label, data, video, tag, feature
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 毛超杰, 唐铭谦
Current assignee: Alibaba Damo Academy Beijing Technology Co., Ltd.
Original assignee / applicant: Alibaba Group Holding Ltd.

Classifications

    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G06F 16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
    • H04N 21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N 21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44222 Analytics of user selections, e.g. selection of programs or purchase activity
    • H04N 21/84 Generation or processing of descriptive data, e.g. content descriptors

Abstract

The application provides a video processing method and device and a label processing method and device, wherein the video processing method comprises the following steps: acquiring a video to be processed; obtaining intermediate video data of at least one data dimension based on the video to be processed; encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension; and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.

Description

Video processing method and device and label processing method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a video processing method and apparatus, and a tag processing method and apparatus.
Background
With the rapid development of network and multimedia technologies, resources of many kinds, such as pictures, text, sound and video, have emerged on the network. These resources enrich users' lives, but they also make it difficult for users to choose what to browse and to quickly locate the resources they want among so many. The resources therefore need to be labeled: by analyzing the resources propagated on the network, tags for the persons, backgrounds and other elements they contain are extracted, and the extracted tags can then be used for resource search.
Disclosure of Invention
In view of the foregoing, the present application provides a video processing method, a video processing apparatus, a tag processing method, a tag processing apparatus, two computing devices, and two computer-readable storage media.
The application provides a video processing method, which comprises the following steps:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, where the tag registration includes:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
Optionally, the obtaining intermediate video data of at least one data dimension based on the video to be processed includes:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
Optionally, after the step of retrieving in the tag database according to the video features and obtaining the video tag of the video to be processed is executed, the method further includes:
determining and recommending target videos and/or target objects recommended to a user based on the video tags of the videos to be processed;
and/or,
analyzing the video browsing behavior of a user based on the video tag of the video to be processed, and determining the video browsing characteristic data of the user; the video to be processed is a historical video browsed by a user.
Optionally, the video to be processed includes an interactive video; the intermediate video data includes interactive data included in the interactive video.
The application provides a video processing apparatus, including:
the acquisition module is configured to acquire a video to be processed;
a determination module configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module is configured to retrieve in a tag database according to the video characteristics to obtain the video tag of the video to be processed.
The application provides a label processing method, which comprises the following steps:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
Optionally, after the step of obtaining one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of obtaining the tag feature of the tag to be registered is executed, the method includes:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the step of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label features of the label to be registered.
Optionally, after the step of obtaining one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of obtaining the tag feature of the tag to be registered is executed, the method includes:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if clustering yields multiple label data sets, removing the label data belonging to the smaller sets from the one or more label data.
Optionally, the data dimension includes a text dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
Optionally, the data dimension includes an image dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting from the label database label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive and negative samples;
and constructing a parameter vector according to the training parameters obtained by training, and taking the parameter vector as the target feature.
Optionally, the tag processing method further includes:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed with image dimensionality, text data to be processed with text dimensionality and/or sound data to be processed with sound dimensionality;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
Optionally, the retrieving in the tag database according to the image feature, the text feature and/or the sound feature includes:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
The application provides a label processing apparatus, includes:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the feature coding module is configured to perform feature coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label feature of the label to be registered;
the feature aggregation module is configured to aggregate the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
The present application further provides a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
The present application further provides a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
The present application further provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the video processing method.
The present application also provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the tag processing method.
According to the video processing method, in the process of labeling the video to be processed, more comprehensive and flexible label retrieval is carried out in the label database by acquiring the intermediate video data of the video to be processed in at least one data dimension and utilizing the video characteristics which are obtained by encoding the intermediate video data and correspond to the data dimension, so that the labeling processing of the video to be processed is realized, and more comprehensive and accurate video content expression is realized.
The label processing method provided by the application supports registering labels from multiple data dimensions during label registration: the label data of each data dimension is feature-coded by a coding model set in advance for that dimension, and the label features of the data dimensions are combined and aggregated into the target feature of the label to be registered. This enriches the ways labels can be defined, improves the flexibility of label registration, and at the same time improves the accuracy of feature coding, giving the label registration process greater accuracy and flexibility.
Drawings
Fig. 1 is a processing flow chart of a video processing method provided by an embodiment of the present application;
fig. 2 is a schematic view of a video processing scene provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a video processing apparatus according to an embodiment of the present application;
fig. 4 is a processing flow chart of a tag processing method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a tag registration scenario provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a label processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure;
fig. 8 is a block diagram of another computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, "first" information may also be referred to as "second" information and, similarly, "second" information may be referred to as "first" information without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
An embodiment of the application provides a video processing method, a video processing device, a label processing method, a label processing device, two kinds of computing equipment and two kinds of computer readable storage media. The following detailed description and the explanation of the steps of the method are individually made with reference to the drawings of the embodiments provided in the present specification.
The embodiment of the video processing method provided by the application is as follows:
referring to fig. 1, a processing flow chart of a video processing method provided by the embodiment is shown, and referring to fig. 2, a schematic diagram of a video processing scene provided by the embodiment is shown.
And step S102, acquiring a video to be processed.
In practical applications, when a video platform or video website provides video browsing to users, video content is identified through video understanding technology and tagged, so that users obtain a more efficient and faster browsing experience; with video tags, users can search and browse videos more quickly and effectively. The video processing method provided by the application improves the comprehensiveness of the tags in the tag database by opening tag registration in the tag database to users, so that tagging of the video to be processed based on the tag database is more comprehensive and flexible. Meanwhile, during tagging, the video to be processed is decomposed by data dimension, and on that basis a more comprehensive and flexible tag retrieval is performed in the tag database whose tags users have openly registered, thereby realizing the tagging of the video to be processed.
The video to be processed in the embodiments of the present application may be a video clip or a complete video composed of video frames; for example, it may be a segment of a film (a 5-minute movie clip) or a complete episode of a television series, and it may also be a complete interactive video or a clip from one. Accordingly, the video tag obtained by tagging the video to be processed in this embodiment may be a video tag for a video frame or a video tag for a video clip.
And step S104, acquiring intermediate video data of at least one data dimension based on the video to be processed.
Optionally, obtaining intermediate video data of at least one data dimension based on the video to be processed includes: decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension. For example, an acquired video to be processed is decomposed according to the data types of the data it contains, yielding image data of the video in the image dimension, text data in the text dimension and sound data in the sound dimension. As another example, in a scenario where the video to be processed is an interactive video, the intermediate video data obtained by parsing the interactive video is the bullet-screen (danmaku) comment data or the interaction data (somatosensory interaction data or A/B-choice interaction data) in the interactive video; a sketch of the decomposition step is given below.
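To make the decomposition step concrete, the following is a minimal Python sketch assuming OpenCV for frame extraction; the patent names no libraries, and the function and parameter names here are illustrative only.

```python
# Minimal sketch of decomposing a video into per-dimension intermediate data.
# OpenCV and the frame_stride parameter are assumptions; text (subtitles) and
# sound (audio track) extraction would be analogous demuxing steps.
import cv2

def decompose_video(path, frame_stride=30):
    """Sample image-dimension frames from the video at the given stride."""
    frames = []
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of stream
            break
        if idx % frame_stride == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return {"image": frames}   # text and sound dimensions omitted in this sketch
```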
Step S106, encoding the intermediate video data of the at least one data dimension to obtain the video characteristics corresponding to the data dimension.
And S108, retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, and the tag registration performed in the tag database is specifically implemented in the following manner:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
It should be noted that the tag registration process provided in this embodiment is similar to the tag registration process described in the tag processing method embodiment below; refer to that embodiment for the specific implementation of tag registration.
For example, in the process in which user A tags a video, first, the video to be processed is read;
then, decomposing the data according to the data type of the data contained in the video, and obtaining image data of the video in an image dimension, text data in a text dimension and sound data in a sound dimension after decomposition;
secondly, performing feature coding on the image data, the text data and the sound data obtained by decomposition: specifically, the text data is encoded into text feature vectors using a word2vec-style text feature coding method, the image data is encoded into image feature vectors using an image feature coding algorithm, and the sound data is encoded into sound feature vectors using a sound feature coding algorithm;
thirdly, searching in the label database according to the text feature vector, the image feature vector and the sound feature vector, wherein the searching process specifically comprises the steps of calculating cosine distances between the text feature vector, the image feature vector and the sound feature vector and feature vectors of labels in the label database to measure the similarity between the text feature vector, the image feature vector and the sound feature vector of the video and the feature vectors of the labels in the label database;
and finally, if the label to which the feature vector with the highest similarity to the text feature vector, the image feature vector and the sound feature vector of the video belongs in the label database is a football game label, marking the video as the football game label.
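The retrieval loop of this example can be sketched as follows; the in-memory tag_db layout, the vector shapes and all function names are illustrative assumptions, since the patent only specifies comparing cosine distances against the feature vectors of registered tags.

```python
# Sketch of tag retrieval: compare each per-dimension video feature vector
# against registered tag vectors by cosine similarity, return the best tag.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_tag(video_features, tag_db):
    """video_features: dict mapping dimension -> vector, e.g. {"text": ...};
    tag_db: iterable of (tag_name, dimension, vector) written at registration."""
    best_tag, best_sim = None, -1.0
    for tag_name, dim, vec in tag_db:
        feat = video_features.get(dim)
        if feat is None:
            continue
        sim = cosine(feat, vec)
        if sim > best_sim:
            best_tag, best_sim = tag_name, sim
    return best_tag, best_sim

# In the example above, retrieve_tag({"text": t, "image": i, "sound": s}, db)
# would return ("football game", <highest similarity>).
```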
In practical applications, after the video tag of the video to be processed is determined, further recommendation and analysis may be performed based on it. For example, based on the video tags of the film and television dramas a user has browsed, similar dramas can be recommended to the user, or the characters' clothing or props in those dramas can be recommended to the user.
In addition, the video browsing behavior of the user can be analyzed based on the video tags of the historical videos browsed by the user, and the video browsing characteristic data of the user can be determined. For example, video browsing preferences in the process of browsing videos by the user are analyzed based on historical videos browsed by the user, and more accurate video recommendation can be performed on the basis of the video browsing preferences of the user.
In summary, in the video processing method, in the process of performing tagging processing on the video to be processed, the video to be processed is decomposed according to the data dimension, and more comprehensive and flexible tag retrieval is performed in the tag database on the basis that the tag database opens tag registration to the outside, so that the tagging processing of the video to be processed is realized, and more comprehensive and accurate video content expression is realized.
An embodiment of a video processing apparatus provided in this specification is as follows:
in the above embodiments, a video processing method is provided, and a video processing apparatus is provided, which is described below with reference to the accompanying drawings.
Referring to fig. 3, a schematic diagram of a video processing apparatus provided in this embodiment is shown.
Since the device embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions may refer to the corresponding description of the method embodiments provided above. The device embodiments described below are merely illustrative.
The application provides a video processing apparatus, including:
an obtaining module 302 configured to obtain a video to be processed;
a determining module 304 configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module 306 configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module 308 is configured to retrieve from the tag database according to the video features to obtain a video tag of the video to be processed.
Optionally, a tag written by tag registration is recorded in the tag database, and the tag registration is implemented by operating the following modules:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the feature coding module is configured to perform feature coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label feature of the label to be registered;
the feature aggregation module is configured to aggregate the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into the tag database.
Optionally, the determining module 304 is specifically configured to decompose the video to be processed to obtain intermediate video data of the at least one data dimension.
Optionally, the video processing apparatus further includes:
the recommendation module is configured to determine and recommend a target video and/or a target object to a user based on the video tag of the video to be processed;
and/or,
the analysis module is configured to analyze the video browsing behavior of the user based on the video tag of the video to be processed and determine the video browsing characteristic data of the user; the video to be processed is a historical video browsed by a user.
Optionally, the video to be processed includes an interactive video; the intermediate video data includes interactive data included in the interactive video.
The embodiment of the tag processing method provided by the application is as follows:
referring to fig. 4, it shows a processing flow chart of a tag processing method provided in this embodiment, and referring to fig. 5, it shows a schematic diagram of a tag registration scenario provided in this embodiment.
Step S402, one or more label data of at least one data dimension of the label to be registered is obtained.
In practical applications, when a content platform or resource website provides users with access to data resources, the resources are analyzed and tagged so that users can access them more quickly and efficiently through tags. The difficulty, however, lies in tagging large-scale data resources: different service scenarios require different tags, so customizing suitable tags for each scenario is time-consuming and labor-intensive. The tag processing method provided by the application allows users to register tags for data resources and supports tag definition through tag data of multiple data dimensions. As shown in fig. 5, a user can define the tag to be registered through tag data of one or more data dimensions, which enriches the ways users can define tags and improves the flexibility of tag registration; defining tags through multiple data dimensions also improves the accuracy of tag registration.
In the tag registration process of this embodiment, one or more tag data of at least one data dimension of the tag to be registered are first acquired. A data dimension is the data type of the data the user employs in customizing the tag; common data dimensions include the text dimension, the image dimension and the sound dimension. Taking a user registering a "pet dog" tag as an example, defining the tag through tag data of multiple data dimensions means, for instance, that the user defines the "pet dog" tag with one or more images of pet dogs, one or more text descriptions or one or more pieces of sound information, or with one or more text descriptions together with one or more pet dog images.
In practical applications, tags are mostly used to label (tag) data resources: for example, a video resource may be labeled with person tags from the perspective of the persons it contains, an image resource with scene tags from the perspective of its scenes, or a text resource with semantic tags from the perspective of its semantics.
In a specific implementation, to improve the validity of tag registration and prevent users from registering invalid tags that would impair the accuracy of the tag database, an optional implementation of this embodiment maps the tag to be registered as follows:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, the similarity between the tag currently being registered and the existing tags in the tag database is low, and the following step S404 is executed;
if not, the similarity between the tag currently being registered and an existing tag in the tag database is high, that is, the tag to be registered is likely to be an existing tag in the tag database. In this case, the tag to be registered is mapped onto the existing tag whose similarity exceeds the preset similarity threshold: specifically, the tag data submitted for the tag to be registered is added to the tag data of the existing tag, so that the tag data of the existing tag is updated according to the tag data of the tag to be registered.
For example, in the process of registering a "pet dog" tag, a user submits a text description of the tag and 5 images of pet dogs of different breeds. First, the text description is encoded into a text feature vector using a word2vec-style text feature coding method, and the 5 pet dog images are each encoded into image feature vectors using an image feature coding algorithm. Then, the feature similarity between the text feature vector and the feature vectors of the tags in the tag database is calculated, and the feature similarities between the 5 image feature vectors and the feature vectors of the tags in the tag database are calculated respectively. Next, it is judged whether each calculated feature similarity is smaller than the feature similarity threshold. If the similarity between the text feature vector and the text feature vector of an existing "pet dog" tag in the tag database is greater than or equal to the threshold, and the similarity between at least one of the 5 image feature vectors and the image feature vectors of that "pet dog" tag is also greater than or equal to the threshold, the tag the user is currently registering is highly similar to the existing "pet dog" tag, and the two are mapped together: the text description submitted by the user when registering is added to the text tag data of the "pet dog" tag, and each image feature vector whose similarity to the "pet dog" tag's image feature vectors reaches the threshold is added to the image tag data of the "pet dog" tag. A sketch of this check is given below.
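The following is a minimal sketch of this pre-registration similarity check, assuming an in-memory tag database and cosine similarity; the threshold value 0.85 is an assumption, since the patent only speaks of a preset similarity threshold.

```python
# Sketch: if a submitted tag is close enough to an existing tag, merge its
# data into that tag instead of registering a duplicate.
import numpy as np

SIM_THRESHOLD = 0.85  # assumed value; the patent only requires a preset threshold

def register_or_merge(label_vecs, tag_db, new_tag_name):
    """label_vecs: vectors encoded from the submitted tag data;
    tag_db: dict mapping tag name -> list of reference vectors."""
    for name, ref_vecs in tag_db.items():
        for v in label_vecs:
            for r in ref_vecs:
                sim = float(np.dot(v, r) / (np.linalg.norm(v) * np.linalg.norm(r)))
                if sim >= SIM_THRESHOLD:
                    tag_db[name].append(v)   # map onto the existing tag, e.g. "pet dog"
                    return name
    tag_db[new_tag_name] = list(label_vecs)  # low similarity: register as a new tag
    return new_tag_name
```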
In addition, in practice the tag data submitted during tag registration may deviate in accuracy: for example, the text description submitted by the user may not accurately describe the features of the tag to be registered, or an image submitted by the user may not accurately represent those features. To improve the accuracy and validity of the tag data submitted at registration, an optional implementation of this embodiment cleans the tag data of the tag to be registered as follows:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if clustering yields multiple label data sets, removing the label data belonging to the smaller sets from the one or more label data.
Following the above example, the 5 pet dog images submitted by the user during registration of the "pet dog" tag are clustered using a clustering algorithm. If all 5 pet dog images fall into one cluster, the images share certain common features, and no further processing is performed. If the clustering result places 4 pet dog images in one cluster and the remaining image in another, the remaining image has low similarity to the other 4 and may be an inappropriate upload; that image is therefore removed from the 5 pet dog images of the "pet dog" tag.
If the "pet dog" tag to be registered by the user is denoted L, and the tag data submitted by the user for tag L is denoted D, the tag and its data can be written as a tag tuple [L, D]; after the tag data D is cleaned, a normalized tag tuple [Lstd, D] is obtained.
It should be noted that, when processing the tag to be registered, the mapping of the tag to be registered described above may be combined with the cleaning of its tag data to further improve the accuracy and validity of tag registration; a sketch of the cleaning step is given below.
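The following is a minimal sketch of the cleaning step, using scikit-learn's KMeans with two clusters as an assumed stand-in for the unspecified clustering algorithm; the 30% outlier heuristic is likewise an assumption.

```python
# Sketch: cluster one dimension's tag data vectors and drop small clusters
# that likely correspond to inappropriate submissions.
import numpy as np
from sklearn.cluster import KMeans

def clean_label_data(vectors, outlier_ratio=0.3):
    X = np.asarray(vectors)
    if len(X) < 3:
        return X  # too little data to judge outliers
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    counts = np.bincount(labels)
    # If the minority cluster is small (assumed: under 30% of the data),
    # treat it as outliers and keep only the majority cluster.
    if counts.min() / len(X) < outlier_ratio:
        return X[labels == np.argmax(counts)]  # e.g. keep 4 of 5 pet dog images
    return X
```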
Step S404, according to the coding model corresponding to the data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain the label feature of the label to be registered.
In this embodiment, to improve the efficiency and accuracy of feature coding for tag data of different data dimensions, a corresponding coding model is configured for each data dimension. Specifically, in an optional implementation provided by this embodiment, a text coding model is provided for tag data of the text dimension, and feature coding is performed on at least one of the one or more tag data of the text dimension using the text coding model, obtaining the text features of the tag to be registered in the text dimension.
Similarly, in another optional implementation manner provided by this embodiment, a corresponding image coding model is further provided for the tag data of the image dimension, and the image feature of the tag to be registered in the image dimension is obtained by performing image feature coding on at least one of one or more tag images of the image dimension by using the image coding model.
For example, for the text description and the 5 pet dog images of different breeds submitted by the user during registration of the "pet dog" tag, the text description is input into a neural network model such as BERT (Bidirectional Encoder Representations from Transformers) for text feature coding, which outputs the text feature vector of the description; the 5 pet dog images are each input into a deep convolutional network model for image feature coding, which outputs the image feature vectors of the 5 images.
If the "pet dog" tag to be registered by the user is denoted L, and the tag data submitted by the user for tag L is denoted D, they can be written as a tag tuple [L, D]; on this basis, the text feature vector and the image feature vectors can further be denoted V, giving the tag tuple [L, D, V]. A sketch of such per-dimension encoders is given below.
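The encoders named in this example can be sketched as follows, assuming Hugging Face Transformers for BERT and a torchvision ResNet-50 as the deep convolutional network; the checkpoint name, the use of the [CLS] vector and the pooled CNN output are all assumptions.

```python
# Sketch of per-dimension encoders: BERT for the text description and a CNN
# backbone for the images. Checkpoints and pooling choices are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from torchvision import models, transforms

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
text_model = AutoModel.from_pretrained("bert-base-chinese")
# Drop the classification head of ResNet-50, keeping the pooled 2048-d output.
image_model = torch.nn.Sequential(
    *list(models.resnet50(weights="DEFAULT").children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def encode_text(description):
    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = text_model(**inputs)
    return out.last_hidden_state[:, 0]  # [CLS] vector as the text feature

def encode_image(pil_image):
    with torch.no_grad():
        feat = image_model(preprocess(pil_image).unsqueeze(0))
    return feat.flatten(1)              # 2048-d image feature vector
```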
Step S406, according to a feature aggregation mode corresponding to the feature number of the label features, aggregating the label features to obtain target features.
In this embodiment, from the feature number of the tag features, the tag features of the tags to be registered are aggregated by adopting different feature aggregation modes for different numbers, so as to improve the accuracy and the effectiveness of the target features obtained after feature aggregation.
Specifically, according to the feature number of the tag feature of the tag to be registered, the embodiment performs feature aggregation processing on the tag feature in the following 3 feature aggregation manners:
(1) the number of features of the tag feature is less than or equal to a first feature number threshold:
The value of the first feature number threshold is preset, and is generally set to 1, but it may also be set according to the service requirements of the actual scenario. If the feature number of the tag features of the tag to be registered is less than or equal to the first feature number threshold, few tag features were obtained after feature coding; in this case the feature aggregation mode is a no-op, that is, the tag features are unchanged by the aggregation and are used directly as the target feature.
(2) The number of features of the tag feature is greater than a first feature number threshold and less than or equal to a second feature number threshold:
The value of the second feature number threshold is greater than that of the first feature number threshold and is likewise preset. When the feature number of the tag features is greater than the first feature number threshold and less than or equal to the second, an aggregation algorithm is used to aggregate the tag features into the target feature.
For example, multiple tag features may be aggregated into one tag feature vector by averaging, with that vector used to represent the tag to be registered; as another example, the multiple tag features of the tag to be registered may be input into an LSTM (Long Short-Term Memory) network for aggregation, with the aggregated feature output by the LSTM model used as the target feature. A sketch of both options is given below.
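The following is a minimal sketch of both mid-range aggregation options: mean pooling, and an LSTM whose final hidden state serves as the target feature. The hidden size and the use of the last hidden state are assumptions.

```python
# Sketch: aggregate n tag feature vectors of dimension d into one target feature.
import torch
import torch.nn as nn

def aggregate_mean(features):
    """features: (n, d) tensor -> (d,) mean-pooled target feature."""
    return features.mean(dim=0)

class LSTMAggregator(nn.Module):
    def __init__(self, dim, hidden=256):  # hidden size is an assumed choice
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, features):
        """features: (n, d) tensor -> (hidden,) target feature."""
        _, (h_n, _) = self.lstm(features.unsqueeze(0))  # add batch dimension
        return h_n[-1, 0]  # final hidden state as the aggregated target feature
```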
(3) The number of features of the tag feature is greater than a second number of features threshold.
If the number of tag features of the tag to be registered is greater than the second feature number threshold, the tag to be registered has many tag features. In this case, so that the finally aggregated target feature can represent the feature information carried by these tag features more comprehensively and accurately, that information is extracted through model training, specifically as follows (a sketch is given after these steps):
taking the tag features as positive samples, and selecting from the tag database tag features whose tag type differs from that of the tag to be registered as negative samples;
performing binary classification training based on the positive and negative samples;
and constructing a parameter vector according to the training parameters obtained by training, and taking the parameter vector as the target feature.
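The following sketches this training-based aggregation, with logistic regression as an assumed choice of two-class model; the patent does not name a classifier, only that a parameter vector obtained from the training is used as the target feature.

```python
# Sketch: train a binary classifier with the tag's own features as positives
# and other tags' features as negatives; use the learned weights as the
# target feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

def target_feature_by_training(pos_feats, neg_feats):
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_.ravel()  # parameter vector used as the target feature
```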
Step S408, registering the one or more tag data and the target feature in a manner of writing the one or more tag data and the target feature into a tag database.
As described above, tags are mostly used to label data resources: for example, a video resource may be labeled with person tags from the perspective of the persons it contains, an image resource with scene tags from the perspective of its scenes, or a text resource with semantic tags from the perspective of its semantics. Specifically, in this embodiment, the registration of the tag to be registered is realized by writing the tag data of the tag to be registered and the target feature into the tag database.
In practical applications, the tags in the tag database are used to mark data resources during tagging. This embodiment takes a video resource as an example to describe the tagging of data resources based on the tag database; the tagging of text, image or sound resources is similar to that of video resources, and reference may be made to the video tagging process below, which is not repeated for each resource type.
The tagging processing of the video resource provided by this embodiment is implemented by the following method:
1) acquiring a video to be processed;
2) decomposing the video to be processed to obtain image data to be processed with image dimensionality, text data to be processed with text dimensionality and/or sound data to be processed with sound dimensionality;
3) carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
4) retrieving in the tag database according to the image features, the text features and/or the sound features;
optionally, in the process of retrieving in the tag database, first calculating feature similarity between the image feature, the text feature and/or the sound feature and a feature vector in the tag database, and then selecting a feature vector with the highest feature similarity as the target feature;
5) and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
For example, in the process in which user A tags a video, first, the video to be processed is read;
then, decomposing the data according to the data type of the data contained in the video, and obtaining image data of the video in an image dimension, text data in a text dimension and sound data in a sound dimension after decomposition;
secondly, performing feature coding on the image data, the text data and the sound data obtained by decomposition: specifically, the text data is encoded into text feature vectors using a word2vec-style text feature coding method, the image data is encoded into image feature vectors using an image feature coding algorithm, and the sound data is encoded into sound feature vectors using a sound feature coding algorithm;
thirdly, searching in the label database according to the text feature vector, the image feature vector and the sound feature vector, wherein the searching process specifically comprises the steps of calculating cosine distances between the text feature vector, the image feature vector and the sound feature vector and feature vectors of labels in the label database to measure the similarity between the text feature vector, the image feature vector and the sound feature vector of the video and the feature vectors of the labels in the label database;
and finally, if the label to which the feature vector with the highest similarity to the text feature vector, the image feature vector and the sound feature vector of the video belongs in the label database is a football game label, marking the video as the football game label.
To sum up, the label processing method provided by the application supports registering labels from multiple data dimensions during label registration: the label data of each data dimension is feature-coded by a coding model set in advance for that dimension, and the label features of the data dimensions are combined and aggregated into the target feature of the label to be registered. This enriches the ways labels can be defined, improves the flexibility of label registration, and at the same time improves the accuracy of feature coding, giving the label registration process greater accuracy and flexibility.
The embodiment of the label processing device provided by the application is as follows:
in the above embodiments, a label processing method is provided, and a label processing apparatus is provided, which is described below with reference to the accompanying drawings.
Referring to fig. 6, a schematic diagram of a label processing apparatus provided in this embodiment is shown.
Since the device embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions may refer to the corresponding description of the method embodiments provided above. The device embodiments described below are merely illustrative.
The application provides a label processing apparatus, includes:
a tag data obtaining module 602 configured to obtain one or more tag data of at least one data dimension of a tag to be registered;
a feature encoding module 604, configured to perform feature encoding on at least one of one or more tag data of a data dimension according to an encoding model corresponding to the data dimension, to obtain a tag feature of the tag to be registered;
a feature aggregation module 606 configured to aggregate the tag features according to a feature aggregation manner corresponding to the feature number of the tag features to obtain target features;
a tag registration module 608 configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
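Read together, the four modules above implement the registration flow end to end. A minimal Python sketch follows, assuming dict-based stand-ins for the encoders, the aggregation step and the tag database; none of these names or structures are prescribed by this application.

```python
import numpy as np

def register_label(name, data_by_dim, encoders, aggregate, label_db):
    """Mirrors modules 602-608: collect label data per data dimension,
    encode each item with that dimension's model, aggregate the resulting
    features into a target feature, and write data plus feature into the
    database."""
    features = [encoders[dim](item)               # feature encoding (604)
                for dim, items in data_by_dim.items()
                for item in items]
    target = aggregate(features)                  # feature aggregation (606)
    label_db[name] = {"data": data_by_dim,        # registration (608)
                      "feature": target}

# Usage with toy encoders that simply pass vectors through:
label_db = {}
encoders = {"text": np.asarray, "image": np.asarray}
register_label("football game",
               {"text": [[0.9, 0.1, 0.0]], "image": [[0.8, 0.2, 0.1]]},
               encoders, lambda fs: np.mean(fs, axis=0), label_db)
```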
Optionally, the tag processing apparatus further includes:
a tag data encoding module configured to encode the one or more tag data to obtain a tag vector;
a similarity calculation module configured to calculate a similarity of the tag vector with a reference tag vector in the tag database;
the similarity judging module is configured to judge whether the similarity is smaller than a preset similarity threshold value; if so, the feature encoding module 604 is run.
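Taken together, these three optional modules implement a duplicate check before registration, sketched below under the same assumptions as the retrieval example above; the 0.9 threshold is an illustrative value, since the application only requires a preset threshold.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9   # assumed value; the application says "preset"

def is_new_label(label_vec: np.ndarray, label_db: dict) -> bool:
    """Run feature encoding (module 604) only if no registered reference
    vector is too similar to the candidate label's vector."""
    for entry in label_db.values():
        ref = entry["feature"]
        sim = float(np.dot(label_vec, ref) /
                    (np.linalg.norm(label_vec) * np.linalg.norm(ref)))
        if sim >= SIMILARITY_THRESHOLD:
            return False     # a near-duplicate label is already registered
    return True
```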
Optionally, the tag processing apparatus further includes:
the tag data clustering module is configured to cluster one or more tag data of any data dimension by adopting a clustering algorithm;
and the label data removing module is configured to, if multiple label data sets are obtained by clustering, remove the label data sets containing fewer label data from the one or more label data.
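A sketch of this cleanup step, assuming scikit-learn's KMeans as the (unnamed) clustering algorithm and "keep only the largest cluster" as the removal rule; both choices are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def drop_minor_clusters(vectors: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Cluster one dimension's label data (rows of `vectors`); if the data
    splits into several sets, keep the largest set and drop the rest as
    likely noise."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    counts = np.bincount(assignments, minlength=n_clusters)
    if np.count_nonzero(counts) <= 1:    # everything fell into one cluster
        return vectors
    return vectors[assignments == counts.argmax()]
```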
Optionally, the feature encoding module 604 is specifically configured to perform text feature encoding on at least one of one or more tag texts of the text dimension according to a text encoding model corresponding to the text dimension, so as to obtain a text feature of the tag to be registered.
Optionally, the feature encoding module 604 is specifically configured to perform image feature encoding on at least one of the one or more tag images in the image dimension according to the image encoding model corresponding to the image dimension, so as to obtain the image feature of the tag to be registered.
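The two specific configurations above each wrap a per-dimension encoder. The sketch below shows one plausible pairing, with an averaged word-vector lookup standing in for the text coding model and a torchvision ResNet-18 with its classification head removed standing in for the image coding model; both model choices are assumptions, as this application names no concrete encoders.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

def encode_text(tokens, word_vectors):
    """Text feature: average of the word vectors of the known tokens.
    `word_vectors` maps token -> np.ndarray, e.g. a word2vec table."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

# Image feature: activations of the penultimate ResNet layer.
resnet = models.resnet18(weights=None)
resnet.fc = torch.nn.Identity()          # drop the classification head
resnet.eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def encode_image(pil_image):
    """Image feature: 512-d embedding of an RGB PIL image."""
    with torch.no_grad():
        return resnet(preprocess(pil_image).unsqueeze(0)).squeeze(0).numpy()
```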
Optionally, the feature aggregation module 606 is specifically configured to, if the feature number of the tag feature is smaller than or equal to a first feature number threshold, take the tag feature as the target feature.
Optionally, the feature aggregation module 606 is specifically configured to, if the feature number of the tag feature is greater than a first feature number threshold and less than or equal to a second feature number threshold, aggregate the tag feature into the target feature by using an aggregation algorithm.
Optionally, the feature aggregation module 606 includes:
the sample determining submodule is configured to, if the feature number of the tag features is greater than a second feature number threshold, take the tag features as positive samples and select, from the tag database, tag features whose tag type differs from that of the tag to be registered as negative samples;
a training submodule configured to perform binary classification training based on the positive samples and the negative samples;
and the target feature determining submodule is configured to construct a parameter vector from the training parameters obtained by the training and take the parameter vector as the target feature.
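The three configurations of the feature aggregation module 606 amount to a switch on the feature count, sketched below with illustrative thresholds (1 and 10), mean pooling as the aggregation algorithm, and scikit-learn's LogisticRegression standing in for the unspecified binary classifier; the classifier's learned weight vector serves as the parameter vector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FIRST_THRESHOLD, SECOND_THRESHOLD = 1, 10   # assumed feature-count thresholds

def aggregate(features, negatives):
    """`features`: this label's feature vectors; `negatives`: feature vectors
    of labels of a different type drawn from the database (used only by the
    third branch, and assumed non-empty there)."""
    n = len(features)
    if n <= FIRST_THRESHOLD:                # few features: use them directly
        return features[0]
    if n <= SECOND_THRESHOLD:               # moderate count: pool into one
        return np.mean(features, axis=0)
    # Many features: train positives vs. negatives; the learned parameter
    # vector becomes the target feature.
    X = np.vstack(features + negatives)
    y = np.array([1] * n + [0] * len(negatives))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_[0]
```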
Optionally, the tag processing apparatus further includes:
the video processing device comprises a to-be-processed video acquisition module, a to-be-processed video acquisition module and a to-be-processed video acquisition module, wherein the to-be-processed video acquisition module is configured to acquire a to-be-processed video;
the video decomposition module is configured to decompose the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
the video data coding module is configured to perform image feature coding on the image data to be processed, perform text feature coding on the text data to be processed and/or perform sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
a feature retrieval module configured to retrieve in the tag database according to the image features, the text features, and/or the sound features;
and the video tag determining module is configured to determine a target tag corresponding to the target feature as a video tag of the video to be processed according to the target feature obtained by retrieval.
Optionally, the feature retrieving module includes:
the feature similarity calculation submodule is configured to calculate the feature similarity between the image features, the text features and/or the sound features and the feature vectors in the tag database;
and the target feature selection submodule is configured to select the feature vector with the highest feature similarity as the target feature.
The embodiment of the computing device provided by the application is as follows:
FIG. 7 is a block diagram illustrating a configuration of a computing device 700 provided according to one embodiment of the present application. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of the computing device 700 and other components not shown in fig. 7 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The present application provides another computing device comprising a memory 710, a processor 720, and computer instructions stored on the memory and executable on the processor, the processor 720 being configured to execute the following computer-executable instructions:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, where the tag registration includes:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
Optionally, the obtaining intermediate video data of at least one data dimension based on the video to be processed includes:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
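A sketch of the decomposition, assuming OpenCV for frame sampling (the image dimension) and the ffmpeg command-line tool for extracting the audio track (the sound dimension); text-dimension extraction, e.g. from subtitles or OCR, is left out, and all file names are illustrative.

```python
import subprocess
import cv2

def decompose(video_path: str, frame_stride: int = 30):
    """Split a video into sampled frames (image data) and an audio file
    (sound data)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:     # sample one frame per stride
            frames.append(frame)
        idx += 1
    cap.release()
    # Extract the audio track to a wav file as the sound-dimension data.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "audio.wav"],
                   check=True)
    return frames, "audio.wav"
```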
Another embodiment of a computing device provided by the present application is as follows:
FIG. 8 is a block diagram illustrating a configuration of a computing device 800 provided according to one embodiment of the present application. The components of the computing device 800 include, but are not limited to, memory 810 and a processor 820. The processor 820 is coupled to the memory 810 via a bus 830, and the database 850 is used to store data.
Computing device 800 also includes an access device 840 that enables computing device 800 to communicate via one or more networks 860. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 840 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of the computing device 800 and other components not shown in fig. 8 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 8 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 800 may also be a mobile or stationary server.
The present application provides a computing device comprising a memory 810, a processor 820, and computer instructions stored on the memory and executable on the processor, the processor 820 being configured to execute the following computer-executable instructions:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
Optionally, after the instruction of acquiring one or more label data of at least one data dimension of the label to be registered is executed, and before the instruction of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label feature of the label to be registered is executed, the processor 820 is further configured to execute the following computer-executable instructions:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the coding model corresponding to the data dimension, and performing feature coding on at least one of one or more label data of the data dimension to obtain the label features of the label to be registered.
Optionally, after the instruction of acquiring one or more label data of at least one data dimension of the label to be registered is executed, and before the instruction of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label feature of the label to be registered is executed, the processor 820 is further configured to execute the following computer-executable instructions:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if multiple label data sets are obtained by clustering, removing the label data sets containing fewer label data from the one or more label data.
Optionally, the data dimension includes a text dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
Optionally, the data dimension includes an image dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting, from the label database, label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive samples and the negative samples;
and constructing a parameter vector from the training parameters obtained by the training, and taking the parameter vector as the target feature.
Optionally, the processor 820 is further configured to execute the following computer-executable instructions:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
Optionally, the retrieving in the tag database according to the image feature, the text feature and/or the sound feature includes:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
The embodiment of a computer-readable storage medium provided by the application is as follows:
one embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the video processing method.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the video processing method described above; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the video processing method.
Another embodiment of a computer-readable storage medium provided by the present application is as follows:
one embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the tag processing method.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the label processing method described above; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the label processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity, the foregoing method embodiments are described as a series of acts; however, those skilled in the art should understand that the present embodiment is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily all required by an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (21)

1. A video processing method, comprising:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
2. The video processing method according to claim 1, wherein tags written by tag registration are recorded in the tag database, and the tag registration comprises:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
3. The video processing method according to claim 1, wherein said obtaining intermediate video data of at least one data dimension based on the video to be processed comprises:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
4. The video processing method according to claim 1, wherein after the step of retrieving from the tag database according to the video features and obtaining the video tag of the video to be processed is executed, the method further comprises:
determining and recommending target videos and/or target objects recommended to a user based on the video tags of the videos to be processed;
and/or,
analyzing the video browsing behavior of a user based on the video tag of the video to be processed, and determining video browsing characteristic data of the user, wherein the video to be processed is a historical video browsed by the user.
5. The video processing method according to claim 1, wherein the video to be processed comprises an interactive video;
and the intermediate video data comprises interactive data contained in the interactive video.
6. A video processing apparatus comprising:
the acquisition module is configured to acquire a video to be processed;
a determination module configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module is configured to retrieve in a tag database according to the video characteristics to obtain the video tag of the video to be processed.
7. A label processing method, comprising:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
8. The tag processing method according to claim 7, wherein after the step of acquiring one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of performing feature coding on at least one of the one or more tag data of the data dimension according to the coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered is executed, the method further comprises:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the coding model corresponding to the data dimension, and performing feature coding on at least one of one or more label data of the data dimension to obtain the label features of the label to be registered.
9. The tag processing method according to claim 7, wherein after the step of acquiring one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of performing feature coding on at least one of the one or more tag data of the data dimension according to the coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered is executed, the method further comprises:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if multiple label data sets are obtained by clustering, removing the label data sets containing fewer label data from the one or more label data.
10. The tag processing method according to claim 7, wherein the data dimension includes a text dimension, and the obtaining the tag feature of the tag to be registered by feature coding at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension includes:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
11. The tag processing method according to claim 7, wherein the data dimension includes an image dimension, and the obtaining the tag feature of the tag to be registered by feature coding at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension includes:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
12. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
13. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
14. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting, from the label database, label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive samples and the negative samples;
and constructing a parameter vector from the training parameters obtained by the training, and taking the parameter vector as the target feature.
15. The label processing method of claim 7, further comprising:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
16. The tag processing method according to claim 15, wherein the retrieving in the tag database according to the image feature, the text feature and/or the sound feature comprises:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
17. A label processing apparatus comprising:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the characteristic coding module is configured to perform characteristic coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label characteristic of the label to be registered;
the characteristic aggregation module is configured to aggregate the label characteristics according to a characteristic aggregation mode corresponding to the characteristic number of the label characteristics to obtain target characteristics;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
18. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
19. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
20. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the video processing method of any of claims 1 to 5.
21. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the label processing method of claim 7 or 16.
CN202010143035.6A 2020-03-04 2020-03-04 Video processing method and device and label processing method and device Active CN113365102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010143035.6A CN113365102B (en) 2020-03-04 2020-03-04 Video processing method and device and label processing method and device

Publications (2)

Publication Number Publication Date
CN113365102A true CN113365102A (en) 2021-09-07
CN113365102B CN113365102B (en) 2022-08-16

Family

ID=77523363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010143035.6A Active CN113365102B (en) 2020-03-04 2020-03-04 Video processing method and device and label processing method and device

Country Status (1)

Country Link
CN (1) CN113365102B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130283307A1 (en) * 2012-04-18 2013-10-24 Narb Avedissian System and methods for providing user generated video reviews
CN106878632A (en) * 2017-02-28 2017-06-20 北京知慧教育科技有限公司 A kind for the treatment of method and apparatus of video data
CN109190482A (en) * 2018-08-06 2019-01-11 北京奇艺世纪科技有限公司 Multi-tag video classification methods and system, systematic training method and device
CN109325148A (en) * 2018-08-03 2019-02-12 百度在线网络技术(北京)有限公司 The method and apparatus for generating information
CN109684506A (en) * 2018-11-22 2019-04-26 北京奇虎科技有限公司 A kind of labeling processing method of video, device and calculate equipment
CN109710800A (en) * 2018-11-08 2019-05-03 北京奇艺世纪科技有限公司 Model generating method, video classification methods, device, terminal and storage medium
CN110059225A (en) * 2019-03-11 2019-07-26 北京奇艺世纪科技有限公司 Video classification methods, device, terminal device and storage medium
CN110263217A (en) * 2019-06-28 2019-09-20 北京奇艺世纪科技有限公司 A kind of video clip label identification method and device
CN110348362A (en) * 2019-07-05 2019-10-18 北京达佳互联信息技术有限公司 Label generation, method for processing video frequency, device, electronic equipment and storage medium
CN110781347A (en) * 2019-10-23 2020-02-11 腾讯科技(深圳)有限公司 Video processing method, device, equipment and readable storage medium
CN110837579A (en) * 2019-11-05 2020-02-25 腾讯科技(深圳)有限公司 Video classification method, device, computer and readable storage medium

Similar Documents

Publication Publication Date Title
Ma et al. dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs
CN108319723B (en) Picture sharing method and device, terminal and storage medium
CN106777318B (en) Matrix decomposition cross-modal Hash retrieval method based on collaborative training
CN111026914B (en) Training method of video abstract model, video abstract generation method and device
CN110083729B (en) Image searching method and system
CN107239564B (en) Text label recommendation method based on supervision topic model
CN113434716B (en) Cross-modal information retrieval method and device
CN111783712A (en) Video processing method, device, equipment and medium
CN113869420B (en) Text recommendation method and related equipment based on contrast learning
CN111090763A (en) Automatic picture labeling method and device
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
CN110399547B (en) Method, apparatus, device and storage medium for updating model parameters
CN111046203A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
EP3166022A1 (en) Method and apparatus for image search using sparsifying analysis operators
Chandrakala et al. Application of artificial bee colony optimization algorithm for image classification using color and texture feature similarity fusion
CN113365102B (en) Video processing method and device and label processing method and device
CN115687676B (en) Information retrieval method, terminal and computer-readable storage medium
CN114510564A (en) Video knowledge graph generation method and device
CN110351183B (en) Resource collection method and device in instant messaging
CN116304184A (en) Video classification model, training method, classification method, apparatus, and storage medium
CN115269998A (en) Information recommendation method and device, electronic equipment and storage medium
CN115129902A (en) Media data processing method, device, equipment and storage medium
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN111222011B (en) Video vector determining method and device
CN112364682A (en) Case searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231113

Address after: Room 2801, 28th Floor, Building 9, Zone 4, Wangjing Dongyuan, Chaoyang District, Beijing

Patentee after: Alibaba Damo Academy (Beijing) Technology Co.,Ltd.

Address before: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.