CN113094552A - Video template searching method and device, server and readable storage medium

Publication number: CN113094552A
Application number: CN202110296795.5A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventor: 仲召来
Assignee (current and original): Beijing Dajia Internet Information Technology Co Ltd
Prior art keywords: video, template, label, video template, similarity

Classifications

    • G06F16/7867 - Information retrieval of video data; retrieval characterised by metadata, using manually generated information, e.g. tags, keywords, comments, title and artist information
    • G06F16/738 - Information retrieval of video data; querying; presentation of query results
    • G06F16/7834 - Information retrieval of video data; retrieval using metadata automatically derived from the content, using audio features
    • G06F16/7837 - Information retrieval of video data; retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/7844 - Information retrieval of video data; retrieval using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F18/22 - Pattern recognition; analysing; matching criteria, e.g. proximity measures

Abstract

The disclosure provides a video template searching method, apparatus, server, and readable storage medium, belonging to the field of internet technology. The searching method comprises: acquiring uploaded video material; extracting content features of the video material to obtain a label of the video material, wherein the label is used for representing the content features of the video material; for a plurality of video templates in a video template library, respectively calculating the similarity between the label of each video template and the label of the video material; and outputting a search result according to the similarity between the label of each video template and the label of the video material, the search result being the video templates, among the plurality of video templates, whose similarity satisfies a preset condition. The disclosed method, apparatus, server, and readable storage medium at least solve the problem in the related art that search results are insufficiently accurate when a video template is searched using search terms.

Description

Video template searching method and device, server and readable storage medium
Technical Field
The disclosure relates to the field of internet technology, and in particular to a method, an apparatus, a server, and a readable storage medium for searching for a video template.
Background
With the development of internet technology, users can play and make videos on electronic devices such as mobile phones and tablet computers. One way of making a video is for the user to download a video template provided by a video platform and add or modify the material content that the template allows the user to customize. A video template may include resources of several media types, such as pictures, text, audio, and video, some or all of which may be customizable by the user.
In the related art, a user who wants such a resource may enter a search keyword in the search box that the video platform provides for video templates, and a background server pushes video templates related to that search term to the user. However, the inventors have found that video templates pushed on the basis of search terms are not accurate enough, because search terms may not accurately represent the content characteristics of the video material the user will use to make the video.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a method, an apparatus, a server, and a readable storage medium for searching for a video template, so as to at least solve the problem in the related art that search results are insufficiently accurate when a video template is searched using search terms.
The technical solution of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method for searching for a video template, which may include:
acquiring uploaded video material;
extracting content features of the video material to obtain a label of the video material, wherein the label is used for representing the content features of the video material;
for a plurality of video templates in a video template library, respectively calculating the similarity between the label of each video template and the label of the video material;
and outputting a search result according to the similarity between the label of each video template and the label of the video material, wherein the search result is the video template, among the plurality of video templates, whose similarity satisfies a preset condition.
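For concreteness, the four steps above could be sketched as follows. This is a minimal illustration rather than the patent's implementation: the extractor functions, the representation of a label as a single feature vector, and the 0.8 threshold are all assumptions.

```python
from typing import Callable, Dict, List

import numpy as np


def search_video_templates(
    material: bytes,
    media_type: str,
    template_library: List[dict],
    extractors: Dict[str, Callable[[bytes], np.ndarray]],
    threshold: float = 0.8,  # hypothetical preset condition
) -> List[dict]:
    """Return the templates whose label similarity to the material meets the threshold."""
    # Steps 1-2: acquire the uploaded material and extract its content
    # features; here the label is taken to be a single feature vector.
    material_label = extractors[media_type](material)

    # Step 3: similarity between each template's label and the material's label.
    scored = []
    for template in template_library:
        template_label = template["label"]  # precomputed feature vector
        cos = float(
            np.dot(material_label, template_label)
            / (np.linalg.norm(material_label) * np.linalg.norm(template_label))
        )
        scored.append((cos, template))

    # Step 4: output the templates whose similarity satisfies the preset condition.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [template for cos, template in scored if cos >= threshold]
```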
In one embodiment, before calculating the similarity between the label of each video template and the label of the video material, the method further includes:
extracting resources of different media types from each video template, and/or extracting resources of different media types from the associated videos of each video template; wherein each associated video comprises an example video and/or user works generated based on the corresponding video template;
extracting content features from the extracted resources of the corresponding media types through the feature extraction model corresponding to each media type, so as to generate a feature vector for each video template; wherein the feature vector of each video template comprises content features extracted from resources of different media types of the corresponding video template and/or content features extracted from resources of different media types in the associated videos of the corresponding video template;
and generating the label of the corresponding video template according to the feature vector of each video template.
Based on this, in one embodiment, extracting content features from the extracted resources of the corresponding media types through the feature extraction model corresponding to each media type, to generate a feature vector of each video template, includes:
extracting semantic features and/or emotional features from resources of the audio media type through a speech recognition feature extraction model to obtain a first feature vector; and/or,
extracting character features and/or object features from resources of the image media type through an image recognition feature extraction model to obtain a second feature vector.
In another embodiment, extracting the content features of the video material to obtain the tags of the video material includes:
determining the media type of the uploaded video material;
selecting a feature extraction model corresponding to the media type of the video material from feature extraction models corresponding to different media types, and extracting content features of the video material to generate feature vectors of the video material;
and generating a label of the video material according to the feature vector of the video material.
In another embodiment, the calculating the similarity between the label of each video template and the label of the video material includes:
for each video template, respectively calculating the vector cosine similarity between each feature vector of the video template and the feature vector of the video material, to obtain a plurality of vector cosine similarities corresponding to the video template;
and for each video template, generating the similarity between the label of the video material and the label of the corresponding video template according to the corresponding plurality of vector cosine similarities.
Based on this, in one embodiment, generating the similarity between the label of the video material and the label of the corresponding video template according to the cosine similarity of the corresponding vectors includes:
acquiring the influence weight of each feature vector on the similarity;
and according to the influence weight of each feature vector, carrying out weighted calculation on the cosine similarity of the vectors to obtain the similarity between the label of the video material and the label of the corresponding video template.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for searching for a video template, the apparatus may include:
the first acquisition module is configured to acquire the uploaded video material;
the first extraction module is configured to extract content features of the video material to obtain a label of the video material; wherein the tag is used for representing the content characteristics of the video material;
the calculation module is configured to calculate, for a plurality of video templates in a video template library, the similarity between the label of each video template and the label of the video material;
and the output module is configured to output a search result according to the similarity between the label of each video template and the label of the video material, wherein the search result is a video template of which the similarity meets a preset condition in the plurality of video templates.
In one embodiment, the apparatus further comprises:
a second extraction module configured to extract resources of different media types in each of the video templates and/or extract resources of different media types in associated videos of each of the video templates before calculating similarity of the tags of each of the video templates and the tags of the video materials, respectively; wherein each of the associated videos includes an example video and/or user work generated based on a corresponding video template;
a third extraction module configured to extract content features for the extracted resources of the corresponding media types through a feature extraction model corresponding to each media type to generate a feature vector of each video template; the feature vector of each video template comprises content features extracted aiming at resources of different media types of the corresponding video template and/or content features extracted aiming at resources of different media types in the associated video of the corresponding video template;
and the first generation module is configured to generate a label corresponding to the video template according to the feature vector of each video template.
Based on this, in one embodiment, the third extraction module comprises:
the first extraction unit is configured to extract semantic features and/or emotional features from resources of the audio media type through a speech recognition feature extraction model to obtain a first feature vector; and/or,
the second extraction unit is configured to extract character features and/or object features from resources of the image media type through an image recognition feature extraction model to obtain a second feature vector.
In another embodiment, the first extraction module comprises:
a determination unit configured to determine a media type of the uploaded video material;
the third extraction unit is configured to select a feature extraction model corresponding to the media type of the video material from feature extraction models corresponding to different media types, and extract content features of the video material to generate feature vectors of the video material;
a first generating unit configured to generate a label of the video material according to the feature vector of the video material.
In another embodiment thereof, the calculation module comprises:
the calculating unit is configured to calculate, for each video template, the vector cosine similarity between each feature vector of the video template and the feature vector of the video material, to obtain a plurality of vector cosine similarities corresponding to the video template;
and the second generation unit is configured to generate the similarity between the label of the video material and the label of the corresponding video template according to the corresponding cosine similarity of the plurality of vectors for each video template.
Based on this, in one embodiment, the second generating unit includes:
an obtaining subunit, configured to obtain an influence weight of each feature vector on the similarity;
and the calculating subunit is configured to perform weighted calculation on the cosine similarity of the plurality of vectors according to the influence weight of each feature vector to obtain the similarity between the label of the video material and the label of the corresponding video template.
According to a third aspect of embodiments of the present disclosure, there is provided a server, which may include:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement a method of searching for a video template as shown in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a readable storage medium, wherein instructions, when executed by a processor of a server, enable the server to perform a method of searching for a video template as in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a method of searching for a video template as in any of the embodiments of the first aspect.
The technical solution provided by the embodiments of the disclosure brings at least the following beneficial effects:
when searching for a video template, the label of content features extracted from the uploaded video material can be compared for similarity with the labels extracted from the content features of the video templates, and the search result is output according to that similarity, which improves the accuracy of the search result. Compared with searching for video templates through search terms, the user searches directly by uploading video material, which spares the user the steps of devising and trying out search terms and improves the user experience; moreover, searching based on content feature labels draws on more dimensions than searching based on search terms, and can therefore yield more accurate search results.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an architecture diagram illustrating the searching of a video template according to an exemplary embodiment;
FIG. 2 is a first interaction diagram illustrating a method of searching for video templates in accordance with an exemplary embodiment;
FIG. 3 is a second interaction diagram illustrating a method of searching for video templates in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating a method of searching for a video template in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating the structure of a video template search apparatus according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating the structure of a server in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The video template searching method provided by the present disclosure may be applied to the architecture shown in fig. 1, which is described in detail below.
FIG. 1 is an architectural diagram illustrating a search of a video template according to an exemplary embodiment.
As shown in fig. 1, the architecture includes an electronic device 11, an electronic device 12, and a server 10. The server 10 may establish connections with the electronic devices 11 and 12 through a network protocol, such as Hypertext Transfer Protocol over Secure Socket Layer (HTTPS), and exchange information with them. The electronic devices 11 and 12 may be devices with a communication function, such as mobile phones, tablet computers, and all-in-one machines, or devices simulated by a virtual machine or an emulator. The server 10 may be a device with storage and computing functions, such as a cloud server or a server cluster.
Based on this architecture, when the user accesses an application program on the electronic device 11, or accesses certain websites through the electronic device 11, the application program or website may provide the user with an interface for searching for video templates; fig. 2 shows an example of such an interface, in which the user may upload video material by clicking the upload control 203, the entry through which a user uploads video material. The media type of the video material may include one or more of text, pictures, video, and audio. After receiving the material uploaded by the user for making a video, the server 10 extracts the content features of the video material to generate a label, and calculates its similarity with the labels of the video templates in the video template library, thereby providing search results; an exemplary search result interface is shown in fig. 3.
The following describes in detail a method for searching a video template provided in an embodiment of the present disclosure with reference to the architecture provided in fig. 1, and as shown in fig. 4, the method for searching a video template may include the following steps:
step 401, obtaining the uploaded video material.
Step 402, extracting content characteristics of the video material to obtain a label of the video material; wherein the tags are used to represent content characteristics of the video material.
Step 403, respectively calculating the similarity between the label of each video template and the label of the video material for a plurality of video templates in the video template library.
And step 404, outputting a search result according to the similarity between the label of each video template and the label of the video material, wherein the search result is a video template of which the similarity meets a preset condition in the plurality of video templates.
In step 401, video material uploaded by a user for making a video may be acquired. Video material is material to be used in the video the user intends to produce; its media type may include one or more of text, pictures, video, audio, and the like. If the video material uploaded by the user is a video, that video may comprise resource content of the audio media type and resource content of the image media type.
In one example, a user may log in to an application or website through the electronic device 11 (or the electronic device 12) in fig. 1 and, through the search box 202 of the search interface 201 shown in fig. 2, enter text-type video material to be placed into the video the user wants to make; for example, the user may input the text content: "a drop in life". In addition, the user may upload video material of picture, video, audio, and other types by clicking the upload control 203. In this way, the server 10 may exchange information with the electronic device 11 (or the electronic device 12) through the network protocol, receive the video material uploaded by the user, and then search the video template library for related video templates according to the content of that material.
In step 402, since the content of the video material provided by the user indicates features of the video template the user intends to find, the embodiments of the present disclosure may, after acquiring the video material, extract its content features, generate a label, and then calculate the similarity between the label of the video material and the labels of the video templates. Generating the label with the same algorithm used to extract the feature vectors of the video templates, described below, keeps the labels comparable, so that searching by video material yields better-matched results that are more in line with user expectations.
The video template in the embodiments of the present disclosure may be a combination of multiple media (multimedia) and may include resources of several media types, such as text, sound, and images; a video template combines resources of at least two media types into a playable piece of media. For example, a video template may present animation, film footage, human-machine interaction provided by a program, or dynamic display effects.
A video template is a template from which a video can be made. It may include replaceable resource content and non-replaceable resource content, and the user can replace the replaceable resources through functions provided by the platform (a website or client backed by the server) to make his or her own video.
For example, the replaceable resource content may include pictures, audio used as background music, text content, and the like. The non-replaceable resource content may include the pictures other than the replaceable ones, the dynamic display effects and sound effects used when switching between pictures, the fixed text other than the replaceable text content, text display effects, and the like. The user may replace the replaceable resource content in the video template with uploaded video material; the platform's server then combines the resource content uploaded by the user with the non-replaceable resource content in the video template to obtain the video effect the user wants, thereby simplifying the user's video-making work.
It should be noted that the above examples of a video template and of its replaceable and non-replaceable resources merely illustrate what a video template is, and the media types of the resource content are likewise illustrative; they do not limit the media types of a video template or of the resource content it may include.
The video templates in the video template library may be made by users on the network or provided by a service provider, which the present disclosure does not limit. When any user searches for a video template, the search results provided by the embodiments of the present disclosure may be some or all of the video templates in the library that the user has permission to view, download, or store.
Before step 403, the labels of the video templates are obtained. The label of a video template may represent content features extracted from its resources of different media types and/or content features extracted from the resources of different media types of its associated videos.
The labels of the video templates may be generated in a variety of ways. As an example, a resource may be processed by a feature extraction model that extracts its content features into a feature vector, from which the label is then generated.
Specifically, the resources processed by the feature extraction models may include resources of different media types in the video template itself, as well as resources of different media types in its associated videos. An associated video may be an example video and/or a user work generated based on the corresponding video template. An example video generated from a video template may include all the resources in the template, or may have some of the replaceable resources already replaced by new material. The file formats of the example video and the video template may differ: the example video is a single video file, while the video template may comprise a plurality of files, each being a resource of one media type. A user work is a video uploaded by any user of the platform.
The association between an associated video and a video template can be established in various ways. In one example implementation, example videos and user works are uploaded to the platform's video library and associated with a video template at upload time. Alternatively, in another example, the platform provides video-making functionality, so that a user makes a video on the platform after selecting a video template; once the user confirms the upload of the finished work, the platform marks the work as associated with the corresponding video template.
There may be a plurality of feature extraction models, each extracting content features from resources of one media type. Illustratively, the content features of the resources of each media type can be extracted through the feature extraction model corresponding to that media type, generating corresponding feature vectors, and the label of the video template is then generated from all the feature vectors produced for that template. Computing feature vectors separately for the resource content of each media type lets each vector express the content features of that media type more accurately, so that the generated label expresses the content features of the video template more accurately.
For example, an image feature extraction model can extract feature vectors of multiple dimensions, including feature vectors for recognizing the characters and objects appearing in an image; recognized objects may include living objects such as people, animals, and plants, and non-living objects such as cars, buildings, mobile phones, tables, and chairs. Alternatively, the image feature extraction model may extract feature vectors of other dimensions, such as classification information along a given dimension: an image may be classified as, for example, lyrical, cheerful, or inspirational in the emotion dimension, or as wedding, study, work, sports, or travel in the scene dimension, and such classification information in different dimensions can be obtained by classifying the image with the feature extraction model.
The feature extraction model may be a neural network model obtained by pre-training: starting from a state-of-the-art (SOTA) neural network model in the field, its parameters are adjusted according to the required feature-extraction performance to obtain the trained model. After such a neural network extracts feature vectors from resource content of different media types, the resulting feature vector may be expressed using the information of a particular layer of the network; for example, the feature vector may be the matrix of the last layer of the feature extraction model.
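As an illustration of taking a pretrained network's penultimate layer as the feature vector, the following sketch uses PyTorch with an off-the-shelf ResNet-18. The patent does not name a specific model, so the choice of network, weights, and preprocessing here is an assumption.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained CNN with the classification head removed, so the output of the
# penultimate (pooling) layer serves as the feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_feature_vector(path: str) -> torch.Tensor:
    """512-dimensional embedding of one image resource."""
    batch = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return feature_extractor(batch).flatten()
```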
After the feature vectors carried in the content features of resources of different media types are obtained, the feature vectors extracted from the resource content of the different media types can be structured to obtain structured labels. Structuring converts the information into data expressed through a preset structure, yielding a structured label that conforms to that structure.
A label may be expressed as Chinese characters, strings of English letters, pure numbers, or in other forms.
Illustratively, there may be a plurality of labels, each expressing, in structured form, the feature vector of the resource content of one media type; alternatively, there may be a single label that gathers the feature vectors of the resource content of all media types.
Taking the multi-label case as an example, each label may be a feature vector output by the feature extraction model; where the extracted feature vector is the matrix of a particular layer, the structuring corresponds to the processing from that layer to the output of the model. Each feature vector may include a plurality of elements, each element representing the feature information that the resource content of the corresponding media type carries along one dimension.
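A sketch of one possible structured label under the multi-label variant just described, with one feature vector per media type; the class and field names are illustrative, not the patent's.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StructuredLabel:
    """Structured label for one video template or one piece of video material.

    `vectors` maps a media type to the feature vector extracted from the
    resources of that type; each vector element carries the feature
    information of one dimension.
    """
    item_id: str
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, media_type: str, vector: List[float]) -> None:
        self.vectors[media_type] = vector


# Usage: one label gathering the audio and image feature vectors of a template.
label = StructuredLabel(item_id="template-42")  # hypothetical template id
label.add("audio", [0.12, 0.87, 0.05])          # toy vectors for illustration
label.add("image", [0.44, 0.10, 0.91])
```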
Therefore, according to the above label-generation method, a corresponding label can be generated for each video template and for the video material the user uploads for making a video, and the search result can then be determined according to the similarity between the labels of the video templates and the label of the video material.
One exemplary approach is to generate the label of a video template when the template is obtained from its source, store the template together with its label in the video template library, and read the labels from the library when a search request is received; this saves the computing resources of the server 10.
Another exemplary approach is to generate the labels at search time, which suits cases where the server 10 has ample computing power, and saves storage space.
In the embodiments of the present disclosure, when content features are extracted from the acquired video material to generate its label, the same algorithm used to generate the labels of the video templates may be used. Correspondingly, when extracting the content features of the video material, the media type of the material can be determined, the content features extracted through the feature extraction model corresponding to that media type, and the label of the video material then generated.
Generating the feature vector with the same algorithm used to extract the feature vectors of the video templates, and then generating the label, yields better-matched results when searching for video templates by video material; the search results are more in line with user expectations, and their accuracy is improved.
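The dispatch from media type to feature extraction model might look like the sketch below; determining the type from the file extension, and the stub extractors, are assumptions for illustration.

```python
from typing import Callable, Dict

import numpy as np

# Stubs standing in for the trained per-media-type models.
def extract_audio_features(data: bytes) -> np.ndarray:
    return np.zeros(128)  # placeholder for the speech recognition model output

def extract_image_features(data: bytes) -> np.ndarray:
    return np.zeros(512)  # placeholder for the image recognition model output

def extract_text_features(data: bytes) -> np.ndarray:
    return np.zeros(256)  # placeholder for the NLP model output

EXTRACTORS: Dict[str, Callable[[bytes], np.ndarray]] = {
    "audio": extract_audio_features,
    "image": extract_image_features,
    "text": extract_text_features,
}

SUFFIX_TO_TYPE = {
    ".mp3": "audio", ".wav": "audio",
    ".jpg": "image", ".png": "image",
    ".txt": "text",
}

def label_for_material(filename: str, data: bytes) -> np.ndarray:
    """Determine the media type, pick the matching model, extract the label."""
    suffix = filename[filename.rfind("."):].lower()
    media_type = SUFFIX_TO_TYPE[suffix]   # determine the media type
    return EXTRACTORS[media_type](data)   # extract with the matching model
```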
In step 403, the similarity between the obtained tags of the video material and the tags of each video template is calculated. Similarity is used to indicate how similar a video material is to a video template.
In one example, the tags are represented by words, and then, when calculating the similarity of the tags, they can be converted into word vectors by an embedding algorithm, and the similarity is represented by a vector cosine value between the word vectors.
In another example, the labels may be represented directly by feature vectors; then, when calculating label similarity, the similarity between the feature vectors of the user-uploaded video material and the feature vectors of the video template can be computed directly. Where there are multiple feature vectors, the vector cosine similarity between each feature vector of the video material and each feature vector of the video template can be calculated separately, and the similarity between the material and the template generated from the resulting plurality of vector cosine similarities. In this way, the labels of the material's different media types are compared with the template's labels for the corresponding media types. Computing the similarities of the different media types separately yields a more specific and accurate similarity for each type; the total similarity computed from all of them then takes every media type into account, so the final result expresses the similarity between the video material and the video template more accurately.
For example, take calculating the similarity between the label of a target video template and the label of the video material: the template's label includes three feature vectors describing resource content of three media types (audio, image, and text), and the material's label includes the feature vector of image material. The vector cosine similarity between the material's feature vector and each feature vector of the target template can be calculated separately, giving three vector cosine similarities; a weighted calculation over these three then yields the similarity between the target video template and the video material.
The weights of the weighting formula may be pre-specified: the similarity is obtained by weighting the plurality of vector cosine similarities according to the influence weight of each feature vector on the similarity. In this way, the weight parameters can be set based on the characteristics of each media type, adjusting the value of the finally computed similarity. For example, since content features extracted from text tend to reflect the user's intention more accurately, the vector cosine similarity of the feature vector for the text media type can be given a higher weight, so that the content features of the text media type contribute more to the total similarity.
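A minimal numpy sketch of this weighted combination, pairing the vectors by media type; the weight values (text highest, as suggested above) are illustrative assumptions.

```python
from typing import Dict

import numpy as np

# Pre-specified influence weights per media type (illustrative values; text is
# weighted highest because it tends to reflect user intent most accurately).
WEIGHTS = {"text": 0.5, "image": 0.3, "audio": 0.2}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_similarity(
    material_vecs: Dict[str, np.ndarray],
    template_vecs: Dict[str, np.ndarray],
) -> float:
    """Total similarity as a weighted sum of per-media-type cosine similarities."""
    total, weight_sum = 0.0, 0.0
    for media_type, w in WEIGHTS.items():
        if media_type in material_vecs and media_type in template_vecs:
            total += w * cosine(material_vecs[media_type], template_vecs[media_type])
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```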
Compared with solutions that depend on manually annotated labels, the method and apparatus of the present disclosure, through deep understanding and learning of the content of the video templates, carry more accurate information describing that content. When a video template is searched, the content of the received video material for making a video is analysed directly rather than matched against user-defined search terms, so the hit rate of video templates matching the content of the video material can be improved, raising the accuracy of the search results.
After obtaining the similarities, the server 10 may return the video templates whose similarity is higher than the preset threshold to the user's electronic device 11 for display as the search result.
For example, the returned search results may also be sorted by similarity value, providing the user with video template search results ranked by relevance.
An exemplary display interface of search results on the electronic device 11 is shown in FIG. 3. The search results may be sorted by the size of the similarity, or the similarity-based ranking may be adjusted in combination with other information, for example attribute information of a video template such as its title, upload time, duration, size, the author's number of fans, or the author's tags, or user feedback information such as play amount, comment amount, collection amount, and praise amount. On the basis of sorting by relevance, the ranking is adjusted according to a preset adjustment algorithm combining one or more of these pieces of information, so that the ordering of the search results also reflects them. The specific algorithm for adjusting the ranking in combination with attribute information and/or user feedback information may be designed as needed and is not described further here.
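One way such an adjustment algorithm could blend similarity with user feedback is sketched below; the blend factor and the log-scaled play amount are assumptions, since the patent leaves the concrete algorithm open to design.

```python
import math
from typing import Dict, List, Tuple

def rerank(
    results: List[Tuple[float, Dict]],
    alpha: float = 0.9,  # hypothetical blend: 90% similarity, 10% popularity
) -> List[Dict]:
    """Re-rank (similarity, template) pairs with a popularity signal mixed in."""
    def score(pair: Tuple[float, Dict]) -> float:
        similarity, template = pair
        plays = template.get("play_amount", 0)
        popularity = math.log1p(plays) / 20.0  # rough normalisation toward [0, 1]
        return alpha * similarity + (1.0 - alpha) * popularity

    ranked = sorted(results, key=score, reverse=True)
    return [template for _, template in ranked]
```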
The feature extraction model for extracting feature vectors may be based on a computer vision (CV) algorithm, a speech recognition algorithm, a natural language processing (NLP) algorithm, and the like; when a neural network model is used, it may specifically be a convolutional neural network (CNN) model, a multi-neuron self-encoding feature extraction model, or the like. With a neural network as the feature extraction model, multiple iterations of deep learning let the feature vectors express the content features in the video template more accurately.
For example, for resource content of the audio media type in a video template, semantic features and/or emotional features are extracted through the speech recognition feature extraction model; semantic and emotional features fit the kind of content that audio resources carry. For resource content of the image media type, character features and/or object features in the image are extracted through an image recognition feature extraction model (such as a CV algorithm model); character and object features fit the kind of content carried in images.
In addition, an NLP algorithm model can be used to understand the semantic content of text, whether text appearing in images and audio or resource content of the text media type in video templates, and to generate feature vectors representing that semantics. The feature vector may be matrix data obtained from an intermediate layer of the feature extraction model and may be information expressed in machine language.
In step 404, the search result may be the video templates, among the plurality of video templates, whose similarity satisfies a preset condition. One example of meeting the preset condition: after the similarity between each video template and the video material is calculated, the video templates are sorted by similarity, and the templates whose similarity exceeds a preset threshold are taken as hits and returned to the electronic device 11 shown in fig. 1 for display. Another example: after the similarities are calculated, the video templates are sorted by the size of the similarity, and the top-n templates are used as the search result.
Illustratively, the ranking of the hit video templates may be based on similarity alone, or combined with other attribute information of the video templates, such as user feedback information. Specifically, in an exemplary embodiment, in step 404, the plurality of video templates may further be sorted according to similarity to obtain a ranking result; the ranking result is then adjusted in combination with the attribute information of each video template and/or user feedback information; and the search result is determined according to the adjusted ranking.
The user feedback information may be feedback on a video template from users, such as its play amount, comment amount, collection amount, and praise amount. Such feedback reflects the popularity of the video template: the higher these amounts, the higher its popularity.
After the search result is obtained, the search result is returned to the electronic device 11, and the electronic device 11 may display the search result according to the search result provided by the server 10.
When the video template library contains a large number of video templates, the search range can be narrowed by presetting screening conditions: part of the video templates are screened out of the library, and similarity is calculated only between the selected templates and the video material.
One illustrative example: within the video template library, video templates whose popularity parameter is higher than a preset popularity threshold are selected to determine the plurality of video templates. That is, before step 403 is executed, the video templates may be screened in the library by this condition.
The popularity parameter can be determined according to the feedback of many users on a video template in the library; for example, it may be computed, by a preset formula with a weight for each kind of user feedback, from the template's play amount, comment amount, collection amount, and praise amount. The popularity parameter can be stored as associated information of each video template in the library and refreshed automatically at intervals; when it is needed, the popularity parameter of each template is read from the library and the templates whose parameter exceeds the preset threshold are screened out, which improves the hit rate of more popular video templates.
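A sketch of this pre-screening, assuming the popularity parameter is a weighted sum of the feedback counts named above; the weights and threshold are illustrative, not values from the patent.

```python
from typing import Dict, List

# Illustrative weights for each kind of user feedback.
HEAT_WEIGHTS = {
    "play_amount": 1.0,
    "comment_amount": 3.0,
    "collection_amount": 4.0,
    "praise_amount": 2.0,
}
HEAT_THRESHOLD = 10_000.0  # hypothetical preset popularity threshold

def popularity(template: Dict) -> float:
    """Popularity parameter: weighted sum of the template's feedback counts."""
    return sum(w * template.get(key, 0) for key, w in HEAT_WEIGHTS.items())

def prefilter(library: List[Dict]) -> List[Dict]:
    """Screen the library to popular templates before step 403's similarity pass."""
    return [t for t in library if popularity(t) > HEAT_THRESHOLD]
```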
It should be noted that the application scenarios described in the embodiment of the present disclosure are for more clearly illustrating the technical solutions of the embodiment of the present disclosure, and do not constitute a limitation on the technical solutions provided in the embodiment of the present disclosure, and as a new application scenario appears, a person skilled in the art may know that the technical solutions provided in the embodiment of the present disclosure are also applicable to similar technical problems.
Based on the same inventive concept, the disclosure also provides a searching device of the video template. The details are described with reference to fig. 5.
Fig. 5 is a schematic structural diagram of a video template search apparatus 50 according to an exemplary embodiment.
As shown in fig. 5, the searching apparatus 50 of the video template may specifically include a first obtaining module 501, a first extracting module 502, a calculating module 503 and an outputting module 504. Wherein:
a first obtaining module 501 configured to obtain an uploaded video material;
a first extraction module 502 configured to extract content features of the video material to obtain a tag of the video material; wherein the tag is used for representing the content characteristics of the video material;
a calculating module 503 configured to calculate, for a plurality of video templates in a video template library, similarity between a label of each video template and a label of the video material;
an output module 504, configured to output a search result according to a similarity between the tag of each video template and the tag of the video material, where the search result is a video template of which the similarity satisfies a preset condition among the plurality of video templates.
In one embodiment, the apparatus further comprises:
a second extraction module configured to extract resources of different media types in each of the video templates and/or extract resources of different media types in associated videos of each of the video templates before calculating similarity of the tags of each of the video templates and the tags of the video materials, respectively; wherein each of the associated videos includes an example video and/or user work generated based on a corresponding video template;
a third extraction module configured to extract content features for the extracted resources of the corresponding media types through a feature extraction model corresponding to each media type to generate a feature vector of each video template; the feature vector of each video template comprises content features extracted aiming at resources of different media types of the corresponding video template and/or content features extracted aiming at resources of different media types in the associated video of the corresponding video template;
and the first generation module is configured to generate a label corresponding to the video template according to the feature vector of each video template.
Based on this, in one embodiment, the third extraction module comprises:
the first extraction unit is configured to extract semantic features and/or emotional features from resources of the audio media type through a speech recognition feature extraction model to obtain a first feature vector; and/or,
the second extraction unit is configured to extract character features and/or object features from resources of the image media type through an image recognition feature extraction model to obtain a second feature vector.
In another embodiment, the first extraction module comprises:
a determination unit configured to determine a media type of the uploaded video material;
the third extraction unit is configured to select a feature extraction model corresponding to the media type of the video material from feature extraction models corresponding to different media types, and extract content features of the video material to generate feature vectors of the video material;
a first generating unit configured to generate a label of the video material according to the feature vector of the video material.
In another embodiment thereof, the calculation module comprises:
the calculating unit is configured to calculate, for each video template, the vector cosine similarity between each feature vector of the video template and the feature vector of the video material, to obtain a plurality of vector cosine similarities corresponding to the video template;
and the second generation unit is configured to generate the similarity between the label of the video material and the label of the corresponding video template according to the corresponding cosine similarity of the plurality of vectors for each video template.
Based on this, in one embodiment, the second generating unit includes:
an obtaining subunit, configured to obtain an influence weight of each feature vector on the similarity;
and the calculating subunit is configured to perform weighted calculation on the cosine similarity of the plurality of vectors according to the influence weight of each feature vector to obtain the similarity between the label of the video material and the label of the corresponding video template.
Thus, when searching for a video template, the label of content features extracted from the uploaded video material can be compared for similarity with the labels extracted from the content features of the video templates, and the search result is output according to that similarity, which improves the accuracy of the search result. Compared with searching for video templates through search terms, the user searches directly by uploading video material, which spares the user the steps of devising and trying out search terms and improves the user experience; moreover, searching based on content feature labels draws on more dimensions than searching based on search terms, and can therefore yield more accurate search results.
Based on the same inventive concept, the embodiment of the present disclosure further provides a server, which is specifically described in detail with reference to fig. 6.
FIG. 6 is a block diagram illustrating the structure of a server in accordance with an exemplary embodiment.
As shown in fig. 6, the server can implement the search method of the video template according to the embodiment of the present disclosure.
The server may include a processor 1201 and a memory 1202 storing computer instructions.
Specifically, the processor 1201 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 1202 may include mass storage for information or instructions. By way of example and not limitation, the memory 1202 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 1202 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the integrated gateway device, where appropriate. In a particular embodiment, the memory 1202 is non-volatile solid-state memory. In certain embodiments, the memory 1202 comprises Read-Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 1201, by reading and executing the computer instructions stored in the memory 1202, performs the following steps:
acquiring an uploaded video material;
extracting content characteristics of the video material to obtain a label of the video material; wherein, the label is used for representing the content characteristics of the video material;
for a plurality of video templates in a video template library, respectively calculating the similarity between the label of each video template and the label of the video material;
and outputting a search result according to the similarity between the label of each video template and the label of the video material.
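Purely as an illustrative sketch of these four steps, and not the claimed implementation, the server-side flow could read as follows; the threshold-style preset condition and the label_similarity callable are assumptions:

    def search_video_templates(material_label, template_index, label_similarity, threshold=0.5):
        # template_index: (template_id, template_label) pairs prepared
        # offline for the plurality of video templates in the library.
        results = []
        for template_id, template_label in template_index:
            score = label_similarity(material_label, template_label)
            if score >= threshold:  # the preset condition on the similarity
                results.append((template_id, score))
        # Output the search result ordered by descending similarity.
        return sorted(results, key=lambda item: item[1], reverse=True)

Under these assumptions, label_similarity could simply wrap the weighted cosine aggregation sketched earlier.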
In one embodiment, the processor 1201, by reading and executing the computer instructions stored in the memory 1202, may further perform the following steps before separately calculating the similarity between the label of each video template and the label of the video material:
extracting resources of different media types in each video template, and/or extracting resources of different media types in the associated video of each video template; wherein each associated video comprises an example video and/or user work generated based on a corresponding video template;
extracting, through the feature extraction model corresponding to each media type, content features from the extracted resources of that media type, so as to generate a feature vector of each video template; the feature vector of each video template comprises content features extracted from resources of different media types of the corresponding video template and/or content features extracted from resources of different media types in the associated video of the corresponding video template;
and generating, according to the feature vector of each video template, a label of the corresponding video template.
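A hedged sketch of this offline label generation for the template library follows; the resources_by_media_type accessor and the extractors mapping are hypothetical names, not part of the disclosure:

    def build_template_index(templates, extractors):
        # extractors: mapping from media type to a feature extraction
        # model, e.g. {"audio": ..., "image": ...} (assumed layout).
        index = []
        for template in templates:
            # Resources of different media types from the template itself
            # and/or its associated videos (example videos, user works).
            resources = template.resources_by_media_type()
            vectors = {media_type: extractors[media_type](resource)
                       for media_type, resource in resources.items()}
            index.append((template.id, vectors))  # vectors serve as the label
        return index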
Based on this, in one embodiment, when the processor 1201, by reading and executing the computer instructions stored in the memory 1202, extracts content features from the extracted resources of the corresponding media types through the feature extraction model corresponding to each media type so as to generate the feature vector of each video template, the following steps may be performed:
extracting semantic features and/or emotional features in resources of the audio media type through a voice recognition feature extraction model to obtain a first feature vector; and/or,
and extracting character features and/or object features in the resources of the image media type through the image recognition feature extraction model to obtain a second feature vector.
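As an illustration of the two branches just described, where asr_model and vision_model are placeholders for the voice recognition and image recognition feature extraction models, and the mean-pooling over frames is an assumption:

    import numpy as np

    def extract_feature_vectors(audio=None, frames=None, asr_model=None, vision_model=None):
        vectors = {}
        if audio is not None:
            # First feature vector: semantic and/or emotional features
            # extracted from the audio media type resource.
            vectors["audio"] = asr_model.encode(audio)
        if frames is not None:
            # Second feature vector: character and/or object features of
            # the image media type resources, mean-pooled across frames.
            vectors["image"] = np.mean([vision_model.encode(f) for f in frames], axis=0)
        return vectors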
In another embodiment, extracting the content features of the video material to obtain the label of the video material includes:
determining the media type of the uploaded video material;
selecting a feature extraction model corresponding to the media type of the video material from feature extraction models corresponding to different media types, and extracting content features of the video material to generate feature vectors of the video material;
and generating a label of the video material according to the feature vector of the video material.
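Again only as a sketch, with detect_media_types and the extractors mapping as assumed helpers, the model selection for the uploaded material might read:

    def extract_material_label(material, extractors, detect_media_types):
        # Select, from the feature extraction models corresponding to
        # different media types, the one(s) matching the uploaded material.
        media_types = detect_media_types(material)  # e.g. {"audio", "image"}
        return {mt: extractors[mt](material) for mt in media_types}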
In another embodiment, when the processor 1201, by reading and executing the computer instructions stored in the memory 1202, executes the step of separately calculating the similarity between the label of each video template and the label of the video material, the following steps may be included:
for each video template, respectively calculating the vector cosine similarity between each feature vector of the video template and the feature vector of the video material, to obtain a plurality of vector cosine similarities corresponding to that video template;
and for each video template, generating the similarity between the label of the video material and the label of the corresponding video template according to the plurality of vector cosine similarities.
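Reusing the cosine_similarity helper from the earlier sketch, the plurality of vector cosine similarities for one template could be gathered per feature-vector type; the dict-of-vectors layout is an assumption:

    def template_cosine_similarities(material_vectors, template_vectors):
        # One vector cosine similarity per feature-vector type shared by
        # the material and the template (e.g. audio- and image-derived).
        shared = material_vectors.keys() & template_vectors.keys()
        return {vtype: cosine_similarity(material_vectors[vtype],
                                         template_vectors[vtype])
                for vtype in shared}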
Based on this, in one embodiment, when the processor 1201, by reading and executing the computer instructions stored in the memory 1202, generates the similarity between the label of the video material and the label of the corresponding video template according to the plurality of vector cosine similarities, the following steps may be performed:
acquiring the influence weight of each feature vector on the similarity;
and according to the influence weight of each feature vector, performing a weighted calculation on the plurality of vector cosine similarities to obtain the similarity between the label of the video material and the label of the corresponding video template.
The server provided by the embodiment of the disclosure can, when a video template is searched for, extract content-feature labels from the uploaded video material, calculate their similarity with the labels extracted from the content features of the video templates, and output the search result according to that similarity, improving the accuracy of the search result. Compared with searching for a video template through search terms, having the user search directly by uploading video material saves the user the steps of devising and trying out search terms, improving the user experience; moreover, searching based on content feature labels covers more dimensions than searching based on search terms, and therefore yields more accurate search results.
In one example, the server can also include a transceiver 1203 and a bus 1204. As shown in fig. 6, the processor 1201, the memory 1202 and the transceiver 1203 are connected via a bus 1204 to complete communication therebetween.
The bus 1204 includes hardware, software, or both. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 1204 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable buses or interconnects are contemplated.
The embodiment of the present disclosure also provides a readable storage medium; when the instructions in the readable storage medium are executed by a processor of a server, the server is enabled to execute the method for searching a video template provided by the embodiments of the present disclosure.
In some possible embodiments, various aspects of the methods provided by the present disclosure may also be implemented in the form of a computer program product comprising instructions/program code; when the program product is run on a computer device and the instructions are executed by a processor, the steps in the methods according to the various exemplary embodiments of the present disclosure are implemented; for example, the computer device may execute the method for searching for a video template recited in the embodiments of the present disclosure.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to the present disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (10)

1. A method for searching a video template, comprising:
acquiring an uploaded video material;
extracting content features of the video material to obtain a label of the video material; wherein the label is used for representing the content features of the video material;
for a plurality of video templates in a video template library, respectively calculating the similarity between the label of each video template and the label of the video material;
and outputting a search result according to the similarity between the label of each video template and the label of the video material, wherein the search result is the video template of which the similarity meets the preset condition in the plurality of video templates.
2. The method of claim 1, further comprising, prior to separately calculating the similarity of the label of each video template to the label of the video material:
extracting resources of different media types in each video template, and/or extracting resources of different media types in the associated video of each video template; wherein each of the associated videos includes an example video and/or user work generated based on a corresponding video template;
extracting, through a feature extraction model corresponding to each media type, content features from the extracted resources of that media type so as to generate a feature vector of each video template; the feature vector of each video template comprises content features extracted from resources of different media types of the corresponding video template and/or content features extracted from resources of different media types in the associated video of the corresponding video template;
and generating, according to the feature vector of each video template, a label of the corresponding video template.
3. The method according to claim 2, wherein extracting, through the feature extraction model corresponding to each media type, content features from the extracted resources of the corresponding media types to generate the feature vector of each video template comprises:
extracting semantic features and/or emotional features in resources of the audio media type through a voice recognition feature extraction model to obtain a first feature vector; and/or,
and extracting character features and/or object features in the resources of the image media type through the image recognition feature extraction model to obtain a second feature vector.
4. The method of claim 2, wherein extracting the content features of the video material to obtain the label of the video material comprises:
determining the media type of the uploaded video material;
selecting a feature extraction model corresponding to the media type of the video material from feature extraction models corresponding to different media types, and extracting content features of the video material to generate feature vectors of the video material;
and generating a label of the video material according to the feature vector of the video material.
5. The method of claim 2, wherein the separately calculating the similarity of the label of each video template to the label of the video material comprises:
for each video template, respectively calculating the vector cosine similarity between each feature vector of the video template and the feature vector of the video material, to obtain a plurality of vector cosine similarities corresponding to that video template;
and for each video template, generating the similarity between the label of the video material and the label of the corresponding video template according to the plurality of vector cosine similarities.
6. The method of claim 5, wherein generating the similarity between the label of the video material and the label of the corresponding video template according to the plurality of vector cosine similarities comprises:
obtaining the influence weight of each feature vector on the similarity;
and according to the influence weight of each feature vector, performing a weighted calculation on the plurality of vector cosine similarities to obtain the similarity between the label of the video material and the label of the corresponding video template.
7. An apparatus for searching a video template, comprising:
the first acquisition module is configured to acquire the uploaded video material;
the first extraction module is configured to extract content features of the video material to obtain a label of the video material; wherein the label is used for representing the content features of the video material;
the calculation module is configured to, for a plurality of video templates in a video template library, respectively calculate the similarity between the label of each video template and the label of the video material;
and the output module is configured to output a search result according to the similarity between the label of each video template and the label of the video material, wherein the search result is a video template of which the similarity meets a preset condition in the plurality of video templates.
8. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of searching for a video template of any of claims 1 to 6.
9. A readable storage medium, wherein instructions in the readable storage medium, when executed by a processor of a server, enable the server to perform the method of searching for a video template of any of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that said computer instructions, when executed by a processor, implement the method of searching for a video template according to any of claims 1 to 6.
CN202110296795.5A 2021-03-19 2021-03-19 Video template searching method and device, server and readable storage medium Pending CN113094552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110296795.5A CN113094552A (en) 2021-03-19 2021-03-19 Video template searching method and device, server and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110296795.5A CN113094552A (en) 2021-03-19 2021-03-19 Video template searching method and device, server and readable storage medium

Publications (1)

Publication Number Publication Date
CN113094552A true CN113094552A (en) 2021-07-09

Family

ID=76668812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110296795.5A Pending CN113094552A (en) 2021-03-19 2021-03-19 Video template searching method and device, server and readable storage medium

Country Status (1)

Country Link
CN (1) CN113094552A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110324676A (en) * 2018-03-28 2019-10-11 腾讯科技(深圳)有限公司 Data processing method, media content put-on method, device and storage medium
CN110163115A (en) * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency, device and computer readable storage medium
CN110446063A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Generation method, device and the electronic equipment of video cover
CN110866184A (en) * 2019-11-11 2020-03-06 湖南大学 Short video data label recommendation method and device, computer equipment and storage medium
CN112015949A (en) * 2020-08-26 2020-12-01 腾讯科技(上海)有限公司 Video generation method and device, storage medium and electronic equipment

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507640B (en) * 2021-07-12 2023-08-18 北京有竹居网络技术有限公司 Video sharing method and device, electronic equipment and storage medium
CN113507640A (en) * 2021-07-12 2021-10-15 北京有竹居网络技术有限公司 Screen recording video sharing method and device, electronic equipment and storage medium
CN113836350A (en) * 2021-09-23 2021-12-24 深圳绿米联创科技有限公司 Video retrieval method, system, device, storage medium and electronic equipment
CN113836350B (en) * 2021-09-23 2024-02-27 深圳绿米联创科技有限公司 Video retrieval method, system, device, storage medium and electronic equipment
CN113691836A (en) * 2021-10-26 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Video template generation method, video generation method and device and electronic equipment
CN114928753A (en) * 2022-04-12 2022-08-19 广州阿凡提电子科技有限公司 Video splitting processing method, system and device
CN114885212B (en) * 2022-05-16 2024-02-23 北京三快在线科技有限公司 Video generation method and device, storage medium and electronic equipment
CN114885212A (en) * 2022-05-16 2022-08-09 北京三快在线科技有限公司 Video generation method and device, storage medium and electronic equipment
WO2024012289A1 (en) * 2022-07-14 2024-01-18 维沃移动通信有限公司 Video generation method and apparatus, electronic device and medium
CN116887010A (en) * 2023-05-08 2023-10-13 武汉精阅数字传媒科技有限公司 Self-media short video material processing control system
CN116887010B (en) * 2023-05-08 2024-02-02 杭州元媒科技有限公司 Self-media short video material processing control system
CN116915925B (en) * 2023-06-20 2024-02-23 天翼爱音乐文化科技有限公司 Video generation method, system, electronic equipment and medium based on video template
CN116915925A (en) * 2023-06-20 2023-10-20 天翼爱音乐文化科技有限公司 Video generation method, system, electronic equipment and medium based on video template

Similar Documents

Publication Publication Date Title
CN113094552A (en) Video template searching method and device, server and readable storage medium
CN108009228B (en) Method and device for setting content label and storage medium
CN108287858B (en) Semantic extraction method and device for natural language
CN109214386B (en) Method and apparatus for generating image recognition model
CN111444326B (en) Text data processing method, device, equipment and storage medium
CN112533051B (en) Barrage information display method, barrage information display device, computer equipment and storage medium
KR101754473B1 (en) Method and system for automatically summarizing documents to images and providing the image-based contents
CN107704495A (en) Training method, device and the computer-readable recording medium of subject classification device
CN109271518B (en) Method and equipment for classified display of microblog information
CN107391760A (en) User interest recognition methods, device and computer-readable recording medium
CN107066464A (en) Semantic Natural Language Vector Space
CN112749326B (en) Information processing method, information processing device, computer equipment and storage medium
CN110083729B (en) Image searching method and system
KR20200007969A (en) Information processing methods, terminals, and computer storage media
CN110134792B (en) Text recognition method and device, electronic equipment and storage medium
CN113722438B (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
CN113469298B (en) Model training method and resource recommendation method
CN109582825B (en) Method and apparatus for generating information
CN109299277A (en) The analysis of public opinion method, server and computer readable storage medium
CN112015928A (en) Information extraction method and device of multimedia resource, electronic equipment and storage medium
CN113806588A (en) Method and device for searching video
CN115659008A (en) Information pushing system and method for big data information feedback, electronic device and medium
CN113407775B (en) Video searching method and device and electronic equipment
CN116911286A (en) Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination