CN113365102A - Video processing method and device and label processing method and device

Info

Publication number: CN113365102A (application CN202010143035.6A; granted as CN113365102B)
Authority: CN (China)
Prior art keywords: label, data, video, tag, feature
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 毛超杰, 唐铭谦
Current assignee: Alibaba Damo Academy Beijing Technology Co., Ltd.
Original assignee / applicant: Alibaba Group Holding Ltd.

Classifications

    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G06F 16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
    • H04N 21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N 21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44222 Analytics of user selections, e.g. selection of programs or purchase activity
    • H04N 21/84 Generation or processing of descriptive data, e.g. content descriptors

Abstract

The application provides a video processing method and device and a label processing method and device, wherein the video processing method comprises the following steps: acquiring a video to be processed; obtaining intermediate video data of at least one data dimension based on the video to be processed; encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension; and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.

Description

Video processing method and device and label processing method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a video processing method and apparatus, and a tag processing method and apparatus.
Background
With the rapid development of network and multimedia technologies, resources of many kinds, such as pictures, text, sound and video, have emerged on the network. These resources enrich users' lives, but they also make it difficult for users to choose what to browse and to quickly locate the resources they want among so many. The resources therefore need to be labeled: by analyzing the resources propagated on the network, tags for the persons, backgrounds and other elements they contain are extracted, and the extracted tags can then be used for resource search.
Disclosure of Invention
In view of the foregoing, the present application provides a video processing method, a video processing apparatus, a tag processing method, a tag processing apparatus, two computing devices, and two computer-readable storage media.
The application provides a video processing method, which comprises the following steps:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, where the tag registration includes:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
Optionally, the obtaining intermediate video data of at least one data dimension based on the video to be processed includes:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
Optionally, after the step of retrieving in the tag database according to the video features and obtaining the video tag of the video to be processed is executed, the method further includes:
determining and recommending target videos and/or target objects recommended to a user based on the video tags of the videos to be processed;
and/or,
analyzing the video browsing behavior of a user based on the video tag of the video to be processed, and determining the video browsing characteristic data of the user; the video to be processed is a historical video browsed by a user.
Optionally, the video to be processed includes an interactive video; the intermediate video data includes interactive data included in the interactive video.
The application provides a video processing apparatus, including:
the acquisition module is configured to acquire a video to be processed;
a determination module configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module is configured to retrieve in a tag database according to the video characteristics to obtain the video tag of the video to be processed.
The application provides a label processing method, which comprises the following steps:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
Optionally, after the step of obtaining one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of obtaining the tag feature of the tag to be registered is executed, the method includes:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the step of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label features of the label to be registered.
Optionally, after the step of obtaining one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of obtaining the tag feature of the tag to be registered is executed, the method includes:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if clustering yields multiple label data sets, removing the label data belonging to the smaller sets from the one or more label data.
Optionally, the data dimension includes a text dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
Optionally, the data dimension includes an image dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting from the label database label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive and negative samples;
and constructing a parameter vector according to the training parameters obtained by training, and taking the parameter vector as the target feature.
Optionally, the tag processing method further includes:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed with image dimensionality, text data to be processed with text dimensionality and/or sound data to be processed with sound dimensionality;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
Optionally, the retrieving in the tag database according to the image feature, the text feature and/or the sound feature includes:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
The application provides a label processing apparatus, includes:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the feature coding module is configured to perform feature coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label feature of the label to be registered;
the feature aggregation module is configured to aggregate the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
The present application further provides a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
The present application further provides a computing device comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
The present application further provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the video processing method.
The present application also provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the tag processing method.
According to the video processing method, in the process of labeling the video to be processed, more comprehensive and flexible label retrieval is carried out in the label database by acquiring the intermediate video data of the video to be processed in at least one data dimension and utilizing the video characteristics which are obtained by encoding the intermediate video data and correspond to the data dimension, so that the labeling processing of the video to be processed is realized, and more comprehensive and accurate video content expression is realized.
The label processing method provided by the application supports registering labels from multiple data dimensions during label registration: the label data of each data dimension is feature-coded by a coding model set in advance for that dimension, and the label features of the data dimensions are combined and aggregated into the target feature of the label to be registered. This enriches the ways labels can be defined, improves the flexibility of label registration, and at the same time improves the accuracy of feature coding, giving the label registration process greater accuracy and flexibility.
Drawings
Fig. 1 is a processing flow chart of a video processing method provided by an embodiment of the present application;
fig. 2 is a schematic view of a video processing scene provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a video processing apparatus according to an embodiment of the present application;
fig. 4 is a processing flow chart of a tag processing method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a tag registration scenario provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a label processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure;
fig. 8 is a block diagram of another computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, "first" information may also be referred to as "second" information and, similarly, "second" information may be referred to as "first" information without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
An embodiment of the application provides a video processing method, a video processing device, a label processing method, a label processing device, two kinds of computing equipment and two kinds of computer readable storage media. The following detailed description and the explanation of the steps of the method are individually made with reference to the drawings of the embodiments provided in the present specification.
The embodiment of the video processing method provided by the application is as follows:
referring to fig. 1, a processing flow chart of a video processing method provided by the embodiment is shown, and referring to fig. 2, a schematic diagram of a video processing scene provided by the embodiment is shown.
And step S102, acquiring a video to be processed.
In practical applications, when a video platform or video website provides video browsing to users, video content is identified through video understanding technology and tagged, so that users obtain a more efficient and faster browsing experience; with video tags, users can search and browse videos more quickly and effectively. The video processing method provided by the application improves the comprehensiveness of the tags in the tag database by opening tag registration in the tag database to users, so that tagging of the video to be processed based on the tag database is more comprehensive and flexible. Meanwhile, during tagging, the video to be processed is decomposed by data dimension, and on that basis a more comprehensive and flexible tag retrieval is performed in the tag database whose tags users have openly registered, thereby realizing the tagging of the video to be processed.
The video to be processed in the embodiments of the present application may be a video clip or a complete video composed of video frames; for example, it may be a segment of a film (a 5-minute movie clip) or a complete episode of a television series, and it may also be a complete interactive video or a clip from one. Accordingly, the video tag obtained by tagging the video to be processed in this embodiment may be a video tag for a video frame or a video tag for a video clip.
And step S104, acquiring intermediate video data of at least one data dimension based on the video to be processed.
Optionally, obtaining intermediate video data of at least one data dimension based on the video to be processed includes: decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension. For example, an acquired video to be processed is decomposed according to the data types of the data it contains, yielding image data of the video in the image dimension, text data in the text dimension and sound data in the sound dimension. As another example, in a scenario where the video to be processed is an interactive video, the intermediate video data obtained by parsing the interactive video is the bullet-screen (danmaku) comment data or the interaction data (somatosensory interaction data or A/B-choice interaction data) in the interactive video; a sketch of the decomposition step is given below.
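To make the decomposition step concrete, the following is a minimal Python sketch assuming OpenCV for frame extraction; the patent names no libraries, and the function and parameter names here are illustrative only.

```python
# Minimal sketch of decomposing a video into per-dimension intermediate data.
# OpenCV and the frame_stride parameter are assumptions; text (subtitles) and
# sound (audio track) extraction would be analogous demuxing steps.
import cv2

def decompose_video(path, frame_stride=30):
    """Sample image-dimension frames from the video at the given stride."""
    frames = []
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of stream
            break
        if idx % frame_stride == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return {"image": frames}   # text and sound dimensions omitted in this sketch
```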
Step S106, encoding the intermediate video data of the at least one data dimension to obtain the video characteristics corresponding to the data dimension.
And S108, retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, and the tag registration performed in the tag database is specifically implemented in the following manner:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
It should be noted that the tag registration process provided in this embodiment is similar to the tag registration process described in the tag processing method embodiment below; refer to that embodiment for the specific implementation of tag registration.
For example, in the process in which user A tags a video, first, the video to be processed is read;
then, decomposing the data according to the data type of the data contained in the video, and obtaining image data of the video in an image dimension, text data in a text dimension and sound data in a sound dimension after decomposition;
secondly, performing feature coding on the image data, the text data and the sound data obtained by decomposition: specifically, the text data is encoded into text feature vectors using a word2vec-style text feature coding method, the image data is encoded into image feature vectors using an image feature coding algorithm, and the sound data is encoded into sound feature vectors using a sound feature coding algorithm;
thirdly, searching in the label database according to the text feature vector, the image feature vector and the sound feature vector, wherein the searching process specifically comprises the steps of calculating cosine distances between the text feature vector, the image feature vector and the sound feature vector and feature vectors of labels in the label database to measure the similarity between the text feature vector, the image feature vector and the sound feature vector of the video and the feature vectors of the labels in the label database;
and finally, if the label to which the feature vector with the highest similarity to the text feature vector, the image feature vector and the sound feature vector of the video belongs in the label database is a football game label, marking the video as the football game label.
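The retrieval loop of this example can be sketched as follows; the in-memory tag_db layout, the vector shapes and all function names are illustrative assumptions, since the patent only specifies comparing cosine distances against the feature vectors of registered tags.

```python
# Sketch of tag retrieval: compare each per-dimension video feature vector
# against registered tag vectors by cosine similarity, return the best tag.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_tag(video_features, tag_db):
    """video_features: dict mapping dimension -> vector, e.g. {"text": ...};
    tag_db: iterable of (tag_name, dimension, vector) written at registration."""
    best_tag, best_sim = None, -1.0
    for tag_name, dim, vec in tag_db:
        feat = video_features.get(dim)
        if feat is None:
            continue
        sim = cosine(feat, vec)
        if sim > best_sim:
            best_tag, best_sim = tag_name, sim
    return best_tag, best_sim

# In the example above, retrieve_tag({"text": t, "image": i, "sound": s}, db)
# would return ("football game", <highest similarity>).
```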
In practical applications, after the video tag of the video to be processed is determined, further recommendation and analysis may be performed based on it. For example, based on the video tags of the film and television dramas a user has browsed, similar dramas can be recommended to the user, or the characters' clothing or props in those dramas can be recommended to the user.
In addition, the video browsing behavior of the user can be analyzed based on the video tags of the historical videos browsed by the user, and the video browsing characteristic data of the user can be determined. For example, video browsing preferences in the process of browsing videos by the user are analyzed based on historical videos browsed by the user, and more accurate video recommendation can be performed on the basis of the video browsing preferences of the user.
In summary, in the video processing method, in the process of performing tagging processing on the video to be processed, the video to be processed is decomposed according to the data dimension, and more comprehensive and flexible tag retrieval is performed in the tag database on the basis that the tag database opens tag registration to the outside, so that the tagging processing of the video to be processed is realized, and more comprehensive and accurate video content expression is realized.
An embodiment of a video processing apparatus provided in this specification is as follows:
in the above embodiments, a video processing method is provided, and a video processing apparatus is provided, which is described below with reference to the accompanying drawings.
Referring to fig. 3, a schematic diagram of a video processing apparatus provided in this embodiment is shown.
Since the device embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions may refer to the corresponding description of the method embodiments provided above. The device embodiments described below are merely illustrative.
The application provides a video processing apparatus, including:
an obtaining module 302 configured to obtain a video to be processed;
a determining module 304 configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module 306 configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module 308 is configured to retrieve from the tag database according to the video features to obtain a video tag of the video to be processed.
Optionally, a tag written by tag registration is recorded in the tag database, and the tag registration is implemented by operating the following modules:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the feature coding module is configured to perform feature coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label feature of the label to be registered;
the feature aggregation module is configured to aggregate the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into the tag database.
Optionally, the determining module 304 is specifically configured to decompose the video to be processed to obtain intermediate video data of the at least one data dimension.
Optionally, the video processing apparatus further includes:
the recommendation module is configured to determine and recommend a target video and/or a target object to a user based on the video tag of the video to be processed;
and/or,
the analysis module is configured to analyze the video browsing behavior of the user based on the video tag of the video to be processed and determine the video browsing characteristic data of the user; the video to be processed is a historical video browsed by a user.
Optionally, the video to be processed includes an interactive video; the intermediate video data includes interactive data included in the interactive video.
The embodiment of the tag processing method provided by the application is as follows:
referring to fig. 4, it shows a processing flow chart of a tag processing method provided in this embodiment, and referring to fig. 5, it shows a schematic diagram of a tag registration scenario provided in this embodiment.
Step S402, one or more label data of at least one data dimension of the label to be registered is obtained.
In practical applications, when a content platform or resource website provides users with access to data resources, the resources are analyzed and tagged so that users can access them more quickly and efficiently through tags. The difficulty, however, lies in tagging large-scale data resources: different service scenarios require different tags, so customizing suitable tags for each scenario is time-consuming and labor-intensive. The tag processing method provided by the application allows users to register tags for data resources and supports tag definition through tag data of multiple data dimensions. As shown in fig. 5, a user can define the tag to be registered through tag data of one or more data dimensions, which enriches the ways users can define tags and improves the flexibility of tag registration; defining tags through multiple data dimensions also improves the accuracy of tag registration.
In the tag registration process of this embodiment, one or more tag data of at least one data dimension of the tag to be registered are first acquired. A data dimension is the data type of the data the user employs in customizing the tag; common data dimensions include the text dimension, the image dimension and the sound dimension. Taking a user registering a "pet dog" tag as an example, defining the tag through tag data of multiple data dimensions means, for instance, that the user defines the "pet dog" tag with one or more images of pet dogs, one or more text descriptions or one or more pieces of sound information, or with one or more text descriptions together with one or more pet dog images.
In practical applications, tags are mostly used to label (tag) data resources: for example, a video resource may be labeled with person tags from the perspective of the persons it contains, an image resource with scene tags from the perspective of its scenes, or a text resource with semantic tags from the perspective of its semantics.
In a specific implementation, to improve the validity of tag registration and prevent users from registering invalid tags that would impair the accuracy of the tag database, an optional implementation of this embodiment maps the tag to be registered as follows:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, the similarity between the tag currently being registered and the existing tags in the tag database is low, and the following step S404 is executed;
if not, the similarity between the tag currently being registered and an existing tag in the tag database is high, that is, the tag to be registered is likely to be an existing tag in the tag database. In this case, the tag to be registered is mapped onto the existing tag whose similarity exceeds the preset similarity threshold: specifically, the tag data submitted for the tag to be registered is added to the tag data of the existing tag, so that the tag data of the existing tag is updated according to the tag data of the tag to be registered.
For example, in the process of registering a "pet dog" tag, a user submits a text description of the tag and 5 images of pet dogs of different breeds. First, the text description is encoded into a text feature vector using a word2vec-style text feature coding method, and the 5 pet dog images are each encoded into image feature vectors using an image feature coding algorithm. Then, the feature similarity between the text feature vector and the feature vectors of the tags in the tag database is calculated, and the feature similarities between the 5 image feature vectors and the feature vectors of the tags in the tag database are calculated respectively. Next, it is judged whether each calculated feature similarity is smaller than the feature similarity threshold. If the similarity between the text feature vector and the text feature vector of an existing "pet dog" tag in the tag database is greater than or equal to the threshold, and the similarity between at least one of the 5 image feature vectors and the image feature vectors of that "pet dog" tag is also greater than or equal to the threshold, the tag the user is currently registering is highly similar to the existing "pet dog" tag, and the two are mapped together: the text description submitted by the user when registering is added to the text tag data of the "pet dog" tag, and each image feature vector whose similarity to the "pet dog" tag's image feature vectors reaches the threshold is added to the image tag data of the "pet dog" tag. A sketch of this check is given below.
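The following is a minimal sketch of this pre-registration similarity check, assuming an in-memory tag database and cosine similarity; the threshold value 0.85 is an assumption, since the patent only speaks of a preset similarity threshold.

```python
# Sketch: if a submitted tag is close enough to an existing tag, merge its
# data into that tag instead of registering a duplicate.
import numpy as np

SIM_THRESHOLD = 0.85  # assumed value; the patent only requires a preset threshold

def register_or_merge(label_vecs, tag_db, new_tag_name):
    """label_vecs: vectors encoded from the submitted tag data;
    tag_db: dict mapping tag name -> list of reference vectors."""
    for name, ref_vecs in tag_db.items():
        for v in label_vecs:
            for r in ref_vecs:
                sim = float(np.dot(v, r) / (np.linalg.norm(v) * np.linalg.norm(r)))
                if sim >= SIM_THRESHOLD:
                    tag_db[name].append(v)   # map onto the existing tag, e.g. "pet dog"
                    return name
    tag_db[new_tag_name] = list(label_vecs)  # low similarity: register as a new tag
    return new_tag_name
```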
In addition, in practice the tag data submitted during tag registration may deviate in accuracy: for example, the text description submitted by the user may not accurately describe the features of the tag to be registered, or an image submitted by the user may not accurately represent those features. To improve the accuracy and validity of the tag data submitted at registration, an optional implementation of this embodiment cleans the tag data of the tag to be registered as follows:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if clustering yields multiple label data sets, removing the label data belonging to the smaller sets from the one or more label data.
Following the above example, the 5 pet dog images submitted by the user during registration of the "pet dog" tag are clustered using a clustering algorithm. If all 5 pet dog images fall into one cluster, the images share certain common features, and no further processing is performed. If the clustering result places 4 pet dog images in one cluster and the remaining image in another, the remaining image has low similarity to the other 4 and may be an inappropriate upload; that image is therefore removed from the 5 pet dog images of the "pet dog" tag.
If the "pet dog" tag to be registered by the user is denoted L, and the tag data submitted by the user for tag L is denoted D, the tag and its data can be written as a tag tuple [L, D]; after the tag data D is cleaned, a normalized tag tuple [Lstd, D] is obtained.
It should be noted that, when processing the tag to be registered, the mapping of the tag to be registered described above may be combined with the cleaning of its tag data to further improve the accuracy and validity of tag registration; a sketch of the cleaning step is given below.
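The following is a minimal sketch of the cleaning step, using scikit-learn's KMeans with two clusters as an assumed stand-in for the unspecified clustering algorithm; the 30% outlier heuristic is likewise an assumption.

```python
# Sketch: cluster one dimension's tag data vectors and drop small clusters
# that likely correspond to inappropriate submissions.
import numpy as np
from sklearn.cluster import KMeans

def clean_label_data(vectors, outlier_ratio=0.3):
    X = np.asarray(vectors)
    if len(X) < 3:
        return X  # too little data to judge outliers
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    counts = np.bincount(labels)
    # If the minority cluster is small (assumed: under 30% of the data),
    # treat it as outliers and keep only the majority cluster.
    if counts.min() / len(X) < outlier_ratio:
        return X[labels == np.argmax(counts)]  # e.g. keep 4 of 5 pet dog images
    return X
```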
Step S404, according to the coding model corresponding to the data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain the label feature of the label to be registered.
In this embodiment, to improve the efficiency and accuracy of feature coding for tag data of different data dimensions, a corresponding coding model is configured for each data dimension. Specifically, in an optional implementation provided by this embodiment, a text coding model is provided for tag data of the text dimension, and feature coding is performed on at least one of the one or more tag data of the text dimension using the text coding model, obtaining the text features of the tag to be registered in the text dimension.
Similarly, in another optional implementation manner provided by this embodiment, a corresponding image coding model is further provided for the tag data of the image dimension, and the image feature of the tag to be registered in the image dimension is obtained by performing image feature coding on at least one of one or more tag images of the image dimension by using the image coding model.
For example, for the text description and the 5 pet dog images of different breeds submitted by the user during registration of the "pet dog" tag, the text description is input into a neural network model such as BERT (Bidirectional Encoder Representations from Transformers) for text feature coding, which outputs the text feature vector of the description; the 5 pet dog images are each input into a deep convolutional network model for image feature coding, which outputs the image feature vectors of the 5 images.
If the "pet dog" tag to be registered by the user is denoted L, and the tag data submitted by the user for tag L is denoted D, they can be written as a tag tuple [L, D]; on this basis, the text feature vector and the image feature vectors can further be denoted V, giving the tag tuple [L, D, V]. A sketch of such per-dimension encoders is given below.
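The encoders named in this example can be sketched as follows, assuming Hugging Face Transformers for BERT and a torchvision ResNet-50 as the deep convolutional network; the checkpoint name, the use of the [CLS] vector and the pooled CNN output are all assumptions.

```python
# Sketch of per-dimension encoders: BERT for the text description and a CNN
# backbone for the images. Checkpoints and pooling choices are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from torchvision import models, transforms

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
text_model = AutoModel.from_pretrained("bert-base-chinese")
# Drop the classification head of ResNet-50, keeping the pooled 2048-d output.
image_model = torch.nn.Sequential(
    *list(models.resnet50(weights="DEFAULT").children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def encode_text(description):
    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = text_model(**inputs)
    return out.last_hidden_state[:, 0]  # [CLS] vector as the text feature

def encode_image(pil_image):
    with torch.no_grad():
        feat = image_model(preprocess(pil_image).unsqueeze(0))
    return feat.flatten(1)              # 2048-d image feature vector
```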
Step S406, according to a feature aggregation mode corresponding to the feature number of the label features, aggregating the label features to obtain target features.
In this embodiment, from the feature number of the tag features, the tag features of the tags to be registered are aggregated by adopting different feature aggregation modes for different numbers, so as to improve the accuracy and the effectiveness of the target features obtained after feature aggregation.
Specifically, according to the feature number of the tag feature of the tag to be registered, the embodiment performs feature aggregation processing on the tag feature in the following 3 feature aggregation manners:
(1) the number of features of the tag feature is less than or equal to a first feature number threshold:
The value of the first feature number threshold is preset, and is generally set to 1, but it may also be set according to the service requirements of the actual scenario. If the feature number of the tag features of the tag to be registered is less than or equal to the first feature number threshold, few tag features were obtained after feature coding; in this case the feature aggregation mode is a no-op, that is, the tag features are unchanged by the aggregation and are used directly as the target feature.
(2) The number of features of the tag feature is greater than a first feature number threshold and less than or equal to a second feature number threshold:
The value of the second feature number threshold is greater than that of the first feature number threshold and is likewise preset. When the feature number of the tag features is greater than the first feature number threshold and less than or equal to the second, an aggregation algorithm is used to aggregate the tag features into the target feature.
For example, multiple tag features may be aggregated into one tag feature vector by averaging, with that vector used to represent the tag to be registered; as another example, the multiple tag features of the tag to be registered may be input into an LSTM (Long Short-Term Memory) network for aggregation, with the aggregated feature output by the LSTM model used as the target feature. A sketch of both options is given below.
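The following is a minimal sketch of both mid-range aggregation options: mean pooling, and an LSTM whose final hidden state serves as the target feature. The hidden size and the use of the last hidden state are assumptions.

```python
# Sketch: aggregate n tag feature vectors of dimension d into one target feature.
import torch
import torch.nn as nn

def aggregate_mean(features):
    """features: (n, d) tensor -> (d,) mean-pooled target feature."""
    return features.mean(dim=0)

class LSTMAggregator(nn.Module):
    def __init__(self, dim, hidden=256):  # hidden size is an assumed choice
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, features):
        """features: (n, d) tensor -> (hidden,) target feature."""
        _, (h_n, _) = self.lstm(features.unsqueeze(0))  # add batch dimension
        return h_n[-1, 0]  # final hidden state as the aggregated target feature
```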
(3) The number of features of the tag feature is greater than a second number of features threshold.
If the number of tag features of the tag to be registered is greater than the second feature number threshold, the tag to be registered has many tag features. In this case, so that the finally aggregated target feature can represent the feature information carried by these tag features more comprehensively and accurately, that information is extracted through model training, specifically as follows (a sketch is given after these steps):
taking the tag features as positive samples, and selecting from the tag database tag features whose tag type differs from that of the tag to be registered as negative samples;
performing binary classification training based on the positive and negative samples;
and constructing a parameter vector according to the training parameters obtained by training, and taking the parameter vector as the target feature.
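The following sketches this training-based aggregation, with logistic regression as an assumed choice of two-class model; the patent does not name a classifier, only that a parameter vector obtained from the training is used as the target feature.

```python
# Sketch: train a binary classifier with the tag's own features as positives
# and other tags' features as negatives; use the learned weights as the
# target feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

def target_feature_by_training(pos_feats, neg_feats):
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_.ravel()  # parameter vector used as the target feature
```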
Step S408, registering the one or more tag data and the target feature in a manner of writing the one or more tag data and the target feature into a tag database.
As described above, tags are mostly used to label data resources: for example, a video resource may be labeled with person tags from the perspective of the persons it contains, an image resource with scene tags from the perspective of its scenes, or a text resource with semantic tags from the perspective of its semantics. Specifically, in this embodiment, the registration of the tag to be registered is realized by writing the tag data of the tag to be registered and the target feature into the tag database.
In practical applications, the tags in the tag database are used to mark data resources during tagging. This embodiment takes a video resource as an example to describe the tagging of data resources based on the tag database; the tagging of text, image or sound resources is similar to that of video resources, and reference may be made to the video tagging process below, which is not repeated for each resource type.
The tagging processing of the video resource provided by this embodiment is implemented by the following method:
1) acquiring a video to be processed;
2) decomposing the video to be processed to obtain image data to be processed with image dimensionality, text data to be processed with text dimensionality and/or sound data to be processed with sound dimensionality;
3) carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
4) retrieving in the tag database according to the image features, the text features and/or the sound features;
optionally, in the process of retrieving in the tag database, first calculating feature similarity between the image feature, the text feature and/or the sound feature and a feature vector in the tag database, and then selecting a feature vector with the highest feature similarity as the target feature;
5) and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
For example, in the process in which user A tags a video, first, the video to be processed is read;
then, decomposing the data according to the data type of the data contained in the video, and obtaining image data of the video in an image dimension, text data in a text dimension and sound data in a sound dimension after decomposition;
secondly, performing feature coding on the image data, the text data and the sound data obtained by decomposition: specifically, the text data is encoded into text feature vectors using a word2vec-style text feature coding method, the image data is encoded into image feature vectors using an image feature coding algorithm, and the sound data is encoded into sound feature vectors using a sound feature coding algorithm;
thirdly, searching in the label database according to the text feature vector, the image feature vector and the sound feature vector, wherein the searching process specifically comprises the steps of calculating cosine distances between the text feature vector, the image feature vector and the sound feature vector and feature vectors of labels in the label database to measure the similarity between the text feature vector, the image feature vector and the sound feature vector of the video and the feature vectors of the labels in the label database;
and finally, if the label to which the feature vector with the highest similarity to the text feature vector, the image feature vector and the sound feature vector of the video belongs in the label database is a football game label, marking the video as the football game label.
To sum up, the label processing method provided by the application supports registering labels from multiple data dimensions during label registration: the label data of each data dimension is feature-coded by a coding model set in advance for that dimension, and the label features of the data dimensions are combined and aggregated into the target feature of the label to be registered. This enriches the ways labels can be defined, improves the flexibility of label registration, and at the same time improves the accuracy of feature coding, giving the label registration process greater accuracy and flexibility.
The embodiment of the label processing device provided by the application is as follows:
in the above embodiments, a label processing method is provided, and a label processing apparatus is provided, which is described below with reference to the accompanying drawings.
Referring to fig. 6, a schematic diagram of a label processing apparatus provided in this embodiment is shown.
Since the device embodiments correspond to the method embodiments, the description is relatively simple, and the relevant portions may refer to the corresponding description of the method embodiments provided above. The device embodiments described below are merely illustrative.
The application provides a label processing apparatus, includes:
a tag data obtaining module 602 configured to obtain one or more tag data of at least one data dimension of a tag to be registered;
a feature encoding module 604, configured to perform feature encoding on at least one of one or more tag data of a data dimension according to an encoding model corresponding to the data dimension, to obtain a tag feature of the tag to be registered;
a feature aggregation module 606 configured to aggregate the tag features according to a feature aggregation manner corresponding to the feature number of the tag features to obtain target features;
a tag registration module 608 configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
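Read together, the four modules above implement the registration flow end to end. A minimal Python sketch follows, assuming dict-based stand-ins for the encoders, the aggregation step and the tag database; none of these names or structures are prescribed by this application.

```python
import numpy as np

def register_label(name, data_by_dim, encoders, aggregate, label_db):
    """Mirrors modules 602-608: collect label data per data dimension,
    encode each item with that dimension's model, aggregate the resulting
    features into a target feature, and write data plus feature into the
    database."""
    features = [encoders[dim](item)               # feature encoding (604)
                for dim, items in data_by_dim.items()
                for item in items]
    target = aggregate(features)                  # feature aggregation (606)
    label_db[name] = {"data": data_by_dim,        # registration (608)
                      "feature": target}

# Usage with toy encoders that simply pass vectors through:
label_db = {}
encoders = {"text": np.asarray, "image": np.asarray}
register_label("football game",
               {"text": [[0.9, 0.1, 0.0]], "image": [[0.8, 0.2, 0.1]]},
               encoders, lambda fs: np.mean(fs, axis=0), label_db)
```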
Optionally, the tag processing apparatus further includes:
a tag data encoding module configured to encode the one or more tag data to obtain a tag vector;
a similarity calculation module configured to calculate a similarity of the tag vector with a reference tag vector in the tag database;
the similarity judging module is configured to judge whether the similarity is smaller than a preset similarity threshold value; if so, the feature encoding module 604 is run.
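Taken together, these three optional modules implement a duplicate check before registration, sketched below under the same assumptions as the retrieval example above; the 0.9 threshold is an illustrative value, since the application only requires a preset threshold.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9   # assumed value; the application says "preset"

def is_new_label(label_vec: np.ndarray, label_db: dict) -> bool:
    """Run feature encoding (module 604) only if no registered reference
    vector is too similar to the candidate label's vector."""
    for entry in label_db.values():
        ref = entry["feature"]
        sim = float(np.dot(label_vec, ref) /
                    (np.linalg.norm(label_vec) * np.linalg.norm(ref)))
        if sim >= SIMILARITY_THRESHOLD:
            return False     # a near-duplicate label is already registered
    return True
```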
Optionally, the tag processing apparatus further includes:
the tag data clustering module is configured to cluster one or more tag data of any data dimension by adopting a clustering algorithm;
and the label data removing module is configured to, if multiple label data sets are obtained by clustering, remove the label data sets containing fewer label data from the one or more label data.
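A sketch of this cleanup step, assuming scikit-learn's KMeans as the (unnamed) clustering algorithm and "keep only the largest cluster" as the removal rule; both choices are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def drop_minor_clusters(vectors: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Cluster one dimension's label data (rows of `vectors`); if the data
    splits into several sets, keep the largest set and drop the rest as
    likely noise."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    counts = np.bincount(assignments, minlength=n_clusters)
    if np.count_nonzero(counts) <= 1:    # everything fell into one cluster
        return vectors
    return vectors[assignments == counts.argmax()]
```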
Optionally, the feature encoding module 604 is specifically configured to perform text feature encoding on at least one of one or more tag texts of the text dimension according to a text encoding model corresponding to the text dimension, so as to obtain a text feature of the tag to be registered.
Optionally, the feature encoding module 604 is specifically configured to perform image feature encoding on at least one of the one or more tag images in the image dimension according to the image encoding model corresponding to the image dimension, so as to obtain the image feature of the tag to be registered.
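The two specific configurations above each wrap a per-dimension encoder. The sketch below shows one plausible pairing, with an averaged word-vector lookup standing in for the text coding model and a torchvision ResNet-18 with its classification head removed standing in for the image coding model; both model choices are assumptions, as this application names no concrete encoders.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

def encode_text(tokens, word_vectors):
    """Text feature: average of the word vectors of the known tokens.
    `word_vectors` maps token -> np.ndarray, e.g. a word2vec table."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

# Image feature: activations of the penultimate ResNet layer.
resnet = models.resnet18(weights=None)
resnet.fc = torch.nn.Identity()          # drop the classification head
resnet.eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def encode_image(pil_image):
    """Image feature: 512-d embedding of an RGB PIL image."""
    with torch.no_grad():
        return resnet(preprocess(pil_image).unsqueeze(0)).squeeze(0).numpy()
```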
Optionally, the feature aggregation module 606 is specifically configured to, if the feature number of the tag feature is smaller than or equal to a first feature number threshold, take the tag feature as the target feature.
Optionally, the feature aggregation module 606 is specifically configured to, if the feature number of the tag feature is greater than a first feature number threshold and less than or equal to a second feature number threshold, aggregate the tag feature into the target feature by using an aggregation algorithm.
Optionally, the feature aggregation module 606 includes:
the sample determining submodule is configured to, if the feature number of the tag features is greater than a second feature number threshold, take the tag features as positive samples and select, from the tag database, tag features whose tag type differs from that of the tag to be registered as negative samples;
a training submodule configured to perform binary classification training based on the positive samples and the negative samples;
and the target feature determining submodule is configured to construct a parameter vector from the training parameters obtained by the training and take the parameter vector as the target feature.
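The three configurations of the feature aggregation module 606 amount to a switch on the feature count, sketched below with illustrative thresholds (1 and 10), mean pooling as the aggregation algorithm, and scikit-learn's LogisticRegression standing in for the unspecified binary classifier; the classifier's learned weight vector serves as the parameter vector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FIRST_THRESHOLD, SECOND_THRESHOLD = 1, 10   # assumed feature-count thresholds

def aggregate(features, negatives):
    """`features`: this label's feature vectors; `negatives`: feature vectors
    of labels of a different type drawn from the database (used only by the
    third branch, and assumed non-empty there)."""
    n = len(features)
    if n <= FIRST_THRESHOLD:                # few features: use them directly
        return features[0]
    if n <= SECOND_THRESHOLD:               # moderate count: pool into one
        return np.mean(features, axis=0)
    # Many features: train positives vs. negatives; the learned parameter
    # vector becomes the target feature.
    X = np.vstack(features + negatives)
    y = np.array([1] * n + [0] * len(negatives))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_[0]
```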
Optionally, the tag processing apparatus further includes:
the video processing device comprises a to-be-processed video acquisition module, a to-be-processed video acquisition module and a to-be-processed video acquisition module, wherein the to-be-processed video acquisition module is configured to acquire a to-be-processed video;
the video decomposition module is configured to decompose the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
the video data coding module is configured to perform image feature coding on the image data to be processed, perform text feature coding on the text data to be processed and/or perform sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
a feature retrieval module configured to retrieve in the tag database according to the image features, the text features, and/or the sound features;
and the video tag determining module is configured to determine a target tag corresponding to the target feature as a video tag of the video to be processed according to the target feature obtained by retrieval.
Optionally, the feature retrieving module includes:
the feature similarity calculation submodule is configured to calculate the feature similarity between the image features, the text features and/or the sound features and the feature vectors in the tag database;
and the target feature selection submodule is configured to select the feature vector with the highest feature similarity as the target feature.
The embodiment of the computing device provided by the application is as follows:
FIG. 7 is a block diagram illustrating a configuration of a computing device 700 provided according to one embodiment of the present application. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of the computing device 700 and other components not shown in fig. 7 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The present application provides another computing device comprising a memory 710, a processor 720, and computer instructions stored on the memory and executable on the processor, the processor 720 being configured to execute the following computer-executable instructions:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
Optionally, the tag database records tags written by tag registration, where the tag registration includes:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
Optionally, the obtaining intermediate video data of at least one data dimension based on the video to be processed includes:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
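A sketch of the decomposition, assuming OpenCV for frame sampling (the image dimension) and the ffmpeg command-line tool for extracting the audio track (the sound dimension); text-dimension extraction, e.g. from subtitles or OCR, is left out, and all file names are illustrative.

```python
import subprocess
import cv2

def decompose(video_path: str, frame_stride: int = 30):
    """Split a video into sampled frames (image data) and an audio file
    (sound data)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:     # sample one frame per stride
            frames.append(frame)
        idx += 1
    cap.release()
    # Extract the audio track to a wav file as the sound-dimension data.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "audio.wav"],
                   check=True)
    return frames, "audio.wav"
```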
Another embodiment of a computing device provided by the present application is as follows:
FIG. 8 is a block diagram illustrating a configuration of a computing device 800 provided according to one embodiment of the present application. The components of the computing device 800 include, but are not limited to, memory 810 and a processor 820. The processor 820 is coupled to the memory 810 via a bus 830, and the database 850 is used to store data.
Computing device 800 also includes an access device 840 that enables computing device 800 to communicate via one or more networks 860. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 840 may include one or more of any type of network interface, wired or wireless, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of the computing device 800 and other components not shown in fig. 8 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 8 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 800 may also be a mobile or stationary server.
The present application provides a computing device comprising a memory 810, a processor 820, and computer instructions stored on the memory and executable on the processor, the processor 820 being configured to execute the following computer-executable instructions:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
Optionally, after the instruction of acquiring one or more label data of at least one data dimension of the label to be registered is executed, and before the instruction of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label feature of the label to be registered is executed, the processor 820 is further configured to execute the following computer-executable instructions:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the coding model corresponding to the data dimension, and performing feature coding on at least one of one or more label data of the data dimension to obtain the label features of the label to be registered.
Optionally, after the instruction of acquiring one or more label data of at least one data dimension of the label to be registered is executed, and before the instruction of performing feature coding on at least one of the one or more label data of the data dimension according to the coding model corresponding to the data dimension to obtain the label feature of the label to be registered is executed, the processor 820 is further configured to execute the following computer-executable instructions:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if multiple label data sets are obtained by clustering, removing the label data sets containing fewer label data from the one or more label data.
Optionally, the data dimension includes a text dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
Optionally, the data dimension includes an image dimension, and the feature coding is performed on at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered, including:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
Optionally, the aggregating the tag features according to the feature aggregation manner corresponding to the feature number of the tag features to obtain the target features includes:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting, from the label database, label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive samples and the negative samples;
and constructing a parameter vector from the training parameters obtained by the training, and taking the parameter vector as the target feature.
Optionally, the processor 820 is further configured to execute the following computer-executable instructions:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
Optionally, the retrieving in the tag database according to the image feature, the text feature and/or the sound feature includes:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
The embodiment of a computer-readable storage medium provided by the application is as follows:
one embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the video processing method.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the video processing method described above; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the video processing method.
Another embodiment of a computer-readable storage medium provided by the present application is as follows:
one embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the tag processing method.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the label processing method described above; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the label processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity, the foregoing method embodiments are described as a series of acts; however, those skilled in the art should understand that the present embodiment is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily all required by an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (21)

1. A video processing method, comprising:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
2. The video processing method according to claim 1, wherein tags written by tag registration are recorded in the tag database, and the tag registration comprises:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the one or more label data and the target characteristics in a mode of writing the one or more label data and the target characteristics into the label database.
3. The video processing method according to claim 1, wherein said obtaining intermediate video data of at least one data dimension based on the video to be processed comprises:
and decomposing the video to be processed to obtain the intermediate video data of the at least one data dimension.
4. The video processing method according to claim 1, wherein after the step of retrieving from the tag database according to the video features and obtaining the video tag of the video to be processed is executed, the method further comprises:
determining and recommending target videos and/or target objects recommended to a user based on the video tags of the videos to be processed;
and/or,
analyzing the video browsing behavior of a user based on the video tag of the video to be processed, and determining video browsing characteristic data of the user, wherein the video to be processed is a historical video browsed by the user.
5. The video processing method according to claim 1, wherein the video to be processed comprises an interactive video;
and the intermediate video data comprises interactive data contained in the interactive video.
6. A video processing apparatus comprising:
the acquisition module is configured to acquire a video to be processed;
a determination module configured to obtain intermediate video data of at least one data dimension based on the video to be processed;
an encoding module configured to encode the intermediate video data of the at least one data dimension to obtain video features corresponding to the data dimension;
and the retrieval module is configured to retrieve in a tag database according to the video characteristics to obtain the video tag of the video to be processed.
7. A label processing method, comprising:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
8. The tag processing method according to claim 7, wherein after the step of acquiring one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of performing feature coding on at least one of the one or more tag data of the data dimension according to the coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered is executed, the method further comprises:
encoding the one or more label data to obtain a label vector;
calculating the similarity between the label vector and a reference label vector in the label database;
judging whether the similarity is smaller than a preset similarity threshold value or not;
if so, executing the coding model corresponding to the data dimension, and performing feature coding on at least one of one or more label data of the data dimension to obtain the label features of the label to be registered.
9. The tag processing method according to claim 7, wherein after the step of acquiring one or more tag data of at least one data dimension of the tag to be registered is executed, and before the step of performing feature coding on at least one of the one or more tag data of the data dimension according to the coding model corresponding to the data dimension to obtain the tag feature of the tag to be registered is executed, the method further comprises:
aiming at one or more label data of any data dimension, clustering the one or more label data by adopting a clustering algorithm;
and if multiple label data sets are obtained by clustering, removing the label data sets containing fewer label data from the one or more label data.
10. The tag processing method according to claim 7, wherein the data dimension includes a text dimension, and the obtaining the tag feature of the tag to be registered by feature coding at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension includes:
and according to the text coding model corresponding to the text dimension, performing text feature coding on at least one of one or more label texts of the text dimension to obtain the text feature of the label to be registered.
11. The tag processing method according to claim 7, wherein the data dimension includes an image dimension, and the obtaining the tag feature of the tag to be registered by feature coding at least one of one or more tag data of the data dimension according to a coding model corresponding to the data dimension includes:
and according to the image coding model corresponding to the image dimension, carrying out image feature coding on at least one of one or more label images of the image dimension to obtain the image feature of the label to be registered.
12. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
and if the feature number of the label feature is smaller than or equal to a first feature number threshold value, taking the label feature as the target feature.
13. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
and if the feature number of the label features is larger than a first feature number threshold and smaller than or equal to a second feature number threshold, adopting an aggregation algorithm to aggregate the label features into the target features.
14. The label processing method according to claim 7, wherein the aggregating the label features according to the feature aggregation manner corresponding to the feature number of the label features to obtain the target features comprises:
if the feature number of the label features is greater than a second feature number threshold, taking the label features as positive samples, and selecting, from the label database, label features whose label type differs from that of the label to be registered as negative samples;
performing binary classification training based on the positive samples and the negative samples;
and constructing a parameter vector from the training parameters obtained by the training, and taking the parameter vector as the target feature.
15. The label processing method of claim 7, further comprising:
acquiring a video to be processed;
decomposing the video to be processed to obtain image data to be processed in an image dimension, text data to be processed in a text dimension and/or sound data to be processed in a sound dimension;
carrying out image feature coding on the image data to be processed, carrying out text feature coding on the text data to be processed and/or carrying out sound feature coding on the sound data to be processed to obtain image features, text features and/or sound features of the video to be processed;
retrieving in the tag database according to the image features, the text features and/or the sound features;
and determining that the target label corresponding to the target feature is the video label of the video to be processed according to the target feature obtained by retrieval.
16. The tag processing method according to claim 15, wherein the retrieving in the tag database according to the image feature, the text feature and/or the sound feature comprises:
calculating feature similarity of the image features, the text features and/or the sound features and feature vectors in the tag database;
and selecting the feature vector with the highest feature similarity as the target feature.
17. A label processing apparatus comprising:
the system comprises a tag data acquisition module, a registration module and a data processing module, wherein the tag data acquisition module is configured to acquire one or more tag data of at least one data dimension of a tag to be registered;
the characteristic coding module is configured to perform characteristic coding on at least one of one or more label data of the data dimension according to a coding model corresponding to the data dimension to obtain a label characteristic of the label to be registered;
the characteristic aggregation module is configured to aggregate the label characteristics according to a characteristic aggregation mode corresponding to the characteristic number of the label characteristics to obtain target characteristics;
and the tag registration module is configured to register the tag to be registered by writing the one or more tag data and the target feature into a tag database.
18. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring a video to be processed;
obtaining intermediate video data of at least one data dimension based on the video to be processed;
encoding the intermediate video data of the at least one data dimension to obtain video characteristics corresponding to the data dimension;
and retrieving in a label database according to the video characteristics to obtain the video label of the video to be processed.
19. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
acquiring one or more label data of at least one data dimension of a label to be registered;
according to a coding model corresponding to a data dimension, performing feature coding on at least one of one or more label data of the data dimension to obtain a label feature of the label to be registered;
aggregating the label features according to a feature aggregation mode corresponding to the feature number of the label features to obtain target features;
and registering the tag to be registered by writing the one or more tag data and the target characteristics into a tag database.
20. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the video processing method of any of claims 1 to 5.
21. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the label processing method of claim 7 or 16.
CN202010143035.6A 2020-03-04 2020-03-04 Video processing method and device and label processing method and device Active CN113365102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010143035.6A CN113365102B (en) 2020-03-04 2020-03-04 Video processing method and device and label processing method and device

Publications (2)

Publication Number Publication Date
CN113365102A true CN113365102A (en) 2021-09-07
CN113365102B CN113365102B (en) 2022-08-16

Family

ID=77523363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010143035.6A Active CN113365102B (en) 2020-03-04 2020-03-04 Video processing method and device and label processing method and device

Country Status (1)

Country Link
CN (1) CN113365102B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130283307A1 (en) * 2012-04-18 2013-10-24 Narb Avedissian System and methods for providing user generated video reviews
CN106878632A (en) * 2017-02-28 2017-06-20 北京知慧教育科技有限公司 A kind for the treatment of method and apparatus of video data
CN109190482A (en) * 2018-08-06 2019-01-11 北京奇艺世纪科技有限公司 Multi-tag video classification methods and system, systematic training method and device
CN109325148A (en) * 2018-08-03 2019-02-12 百度在线网络技术(北京)有限公司 The method and apparatus for generating information
CN109684506A (en) * 2018-11-22 2019-04-26 北京奇虎科技有限公司 A kind of labeling processing method of video, device and calculate equipment
CN109710800A (en) * 2018-11-08 2019-05-03 北京奇艺世纪科技有限公司 Model generating method, video classification methods, device, terminal and storage medium
CN110059225A (en) * 2019-03-11 2019-07-26 北京奇艺世纪科技有限公司 Video classification methods, device, terminal device and storage medium
CN110263217A (en) * 2019-06-28 2019-09-20 北京奇艺世纪科技有限公司 A kind of video clip label identification method and device
CN110348362A (en) * 2019-07-05 2019-10-18 北京达佳互联信息技术有限公司 Label generation, method for processing video frequency, device, electronic equipment and storage medium
CN110781347A (en) * 2019-10-23 2020-02-11 腾讯科技(深圳)有限公司 Video processing method, device, equipment and readable storage medium
CN110837579A (en) * 2019-11-05 2020-02-25 腾讯科技(深圳)有限公司 Video classification method, device, computer and readable storage medium

Similar Documents

Publication Publication Date Title
Ma et al. dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs
CN108319723B (en) Picture sharing method and device, terminal and storage medium
CN106777318B (en) Matrix decomposition cross-modal Hash retrieval method based on collaborative training
CN111026914B (en) Training method of video abstract model, video abstract generation method and device
CN110083729B (en) Image searching method and system
CN107239564B (en) Text label recommendation method based on supervision topic model
CN113434716B (en) Cross-modal information retrieval method and device
CN111783712A (en) Video processing method, device, equipment and medium
CN113869420B (en) Text recommendation method and related equipment based on contrast learning
CN111090763A (en) Automatic picture labeling method and device
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
CN110399547B (en) Method, apparatus, device and storage medium for updating model parameters
CN111046203A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
EP3166022A1 (en) Method and apparatus for image search using sparsifying analysis operators
Chandrakala et al. Application of artificial bee colony optimization algorithm for image classification using color and texture feature similarity fusion
CN113365102B (en) Video processing method and device and label processing method and device
CN115687676B (en) Information retrieval method, terminal and computer-readable storage medium
CN114510564A (en) Video knowledge graph generation method and device
CN110351183B (en) Resource collection method and device in instant messaging
CN116304184A (en) Video classification model, training method, classification method, apparatus, and storage medium
CN115269998A (en) Information recommendation method and device, electronic equipment and storage medium
CN115129902A (en) Media data processing method, device, equipment and storage medium
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN111222011B (en) Video vector determining method and device
CN112364682A (en) Case searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231113

Address after: Room 2801, 28th Floor, Building 9, Zone 4, Wangjing Dongyuan, Chaoyang District, Beijing

Patentee after: Alibaba Damo Academy (Beijing) Technology Co.,Ltd.

Address before: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.