CN116304163B - Image retrieval method, device, computer equipment and medium

Info

Publication number
CN116304163B
Authority
CN
China
Prior art keywords
image
images
sub
frame
dynamic
Prior art date
Legal status
Active
Application number
CN202310529432.0A
Other languages
Chinese (zh)
Other versions
CN116304163A (en)
Inventor
黄婷婷
杨金祥
何理达
Current Assignee
Shenzhen Rabbit Exhibition Intelligent Technology Co ltd
Original Assignee
Shenzhen Rabbit Exhibition Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Rabbit Exhibition Intelligent Technology Co ltd
Priority to CN202310529432.0A
Publication of CN116304163A
Application granted
Publication of CN116304163B
Status: Active
Anticipated expiration


Classifications

    • G06F16/583: Information retrieval of still image data, characterised by using metadata automatically derived from the content
    • G06N3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/40: Extraction of image or video features
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an image retrieval method, an image retrieval device, computer equipment and a medium. The method comprises: when the image to be retrieved input by the user is determined to be a dynamic image, inputting the image to be retrieved into a target recognition model for dynamic special effect recognition to obtain a dynamic special effect category; extracting image features from the image to be retrieved; determining the similarity between the image to be retrieved and each historical dynamic image based on the standard features of the historical dynamic images, the image features and the dynamic special effect category, to obtain a plurality of similarity values; and determining retrieval data for the image to be retrieved among the plurality of historical dynamic images based on the plurality of similarity values, and sending the retrieval data to a terminal device. During image retrieval, the method and the device not only extract the image features of the image to be retrieved but also extract its dynamic special effect category, so accurate original-image information can be mined from multiple dimensions, thereby improving the accuracy of image retrieval.

Description

Image retrieval method, device, computer equipment and medium
Technical Field
The present invention relates to the field of image retrieval technologies, and in particular, to an image retrieval method, an image retrieval device, a computer device, and a medium.
Background
With the rapid development of the internet and mobile terminals, the number of images on the internet has grown rapidly, and screening a desired picture out of massive picture data has become a problem people frequently encounter in daily life. Search-by-image retrieves a target image from a specified image provided by the user; the user does not need to compose keywords or design a retrieval strategy, the corresponding image is retrieved directly and quickly from the picture itself, and the time the user spends retrieving images is reduced.
In existing search-by-image technology, the feature similarity between the user-specified image and each historical dynamic image in a database is generally calculated in order to retrieve the target image the user needs. However, this approach lacks accuracy: important information in the user-specified image is easily ignored, making the image retrieval result inaccurate.
Disclosure of Invention
The invention provides an image retrieval method, an image retrieval device, computer equipment and a medium, to solve the problem that existing image retrieval methods easily ignore important information in the user-specified image, making image retrieval results inaccurate.
There is provided an image retrieval method including:
Acquiring an image to be searched which is input by a user through terminal equipment, and determining whether the image to be searched is a dynamic image or not;
if the image to be searched is a dynamic image, inputting the image to be searched into a target recognition model for dynamic special effect recognition to obtain a dynamic special effect category of the image to be searched, wherein the target recognition model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
extracting image features of the image to be searched to obtain the image features of the image to be searched;
acquiring a plurality of historical dynamic images and standard features of each historical dynamic image in a historical dynamic image library, and determining the similarity of the image to be searched and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories to obtain a plurality of similarity values;
and determining retrieval data of the images to be retrieved in the historical dynamic images based on the similarity values, and sending the retrieval data to the terminal equipment.
Optionally, the standard feature is a standard fusion feature obtained by fusing a standard dynamic special effect feature and a standard image feature of the historical dynamic image, and determining similarity between the image to be retrieved and each historical dynamic image based on the standard feature, the image feature and the dynamic special effect category to obtain a plurality of similarity values, including:
Determining dynamic special effect characteristics of the image to be retrieved according to the dynamic special effect category;
fusing the image features of the image to be retrieved and the dynamic special effect features to obtain fusion features of the image to be retrieved;
performing similarity calculation on fusion features of the images to be retrieved and standard fusion features of each historical dynamic image to obtain a similarity value corresponding to each historical dynamic image;
and sorting the plurality of historical dynamic images in a descending order according to the similarity value to obtain retrieval data of the images to be retrieved.
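As a concrete illustration of this fusion-based variant, the following Python sketch assumes concatenation as the fusion operator and cosine similarity as the metric; the patent fixes neither, so the names and gallery layout are illustrative only.

```python
import numpy as np

def rank_by_fused_similarity(query_image_feat, query_effect_feat, gallery):
    """Rank historical dynamic images by similarity to the query.

    `gallery` is a list of (image_id, standard_fusion_feature) pairs,
    where each standard fusion feature was built offline by fusing the
    standard dynamic special effect feature and standard image feature.
    """
    query_fused = np.concatenate([query_image_feat, query_effect_feat])
    query_fused = query_fused / np.linalg.norm(query_fused)

    scored = []
    for image_id, std_fused in gallery:
        std = std_fused / np.linalg.norm(std_fused)
        scored.append((image_id, float(query_fused @ std)))

    # Descending sort: the most similar historical image comes first.
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```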
Optionally, the standard feature is a standard image feature of a historical dynamic image, each historical dynamic image corresponds to a dynamic special effect tag, and based on the standard feature, the image feature and the dynamic special effect category, the similarity between the image to be retrieved and each historical dynamic image is determined, so as to obtain a plurality of similarity values, including:
matching the dynamic special effect category with the dynamic special effect label of each historical dynamic image, and marking the successfully matched historical dynamic image as an image to be confirmed;
performing similarity calculation on the image characteristics of the images to be retrieved and the image characteristics of each image to be confirmed to obtain a similarity value corresponding to each image to be confirmed;
And recording the image to be confirmed with the similarity value larger than the first preset value as retrieval data of the image to be retrieved.
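A sketch of this tag-filtering variant follows; cosine similarity and the value of the first preset threshold are illustrative assumptions, as the patent specifies neither.

```python
import numpy as np

FIRST_PRESET_VALUE = 0.8  # illustrative; the patent leaves the threshold open

def retrieve_by_tag_and_threshold(query_feat, query_effect_category, gallery):
    """`gallery` is a list of (image_id, effect_tag, standard_image_feature).

    Step 1: keep only historical images whose dynamic special effect tag
    matches the query category (the "images to be confirmed").
    Step 2: keep those whose similarity exceeds the first preset value.
    """
    q = query_feat / np.linalg.norm(query_feat)
    results = []
    for image_id, tag, std_feat in gallery:
        if tag != query_effect_category:
            continue  # tag match failed: not an image to be confirmed
        sim = float(q @ (std_feat / np.linalg.norm(std_feat)))
        if sim > FIRST_PRESET_VALUE:
            results.append((image_id, sim))
    return results
```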
Optionally, extracting image features of the image to be retrieved to obtain image features of the image to be retrieved, including:
carrying out framing treatment on the image to be searched to obtain multi-frame sub-images of the image to be searched, and carrying out element detection on each frame of sub-images to obtain component element data of each frame of sub-images;
extracting the characteristics of the component element data of each frame of sub-image to obtain the image characteristics of each frame of sub-image;
and splicing the image features of each frame of sub-image according to the dynamic display sequence of each sub-image to obtain the image features of the image to be retrieved.
Optionally, feature extraction is performed on component element data of each frame of sub-image to obtain an image feature of each frame of sub-image, including:
extracting image characteristics of the component element data of each frame of sub-image to obtain complete image characteristics of each frame of sub-image;
carrying out the de-coloring treatment on each frame of sub-image to obtain a de-coloring image of each frame of sub-image, and carrying out feature extraction on the de-coloring image of each frame of sub-image to obtain basic image features of each frame of sub-image;
calculating the similarity between the basic image features of each frame of sub-image and the corresponding complete image features to obtain a similarity matrix of each frame of sub-image;
Performing weight activation on the similarity matrix of each frame of sub-image to obtain weight data of each frame of sub-image;
and carrying out feature enhancement on the complete image features of the corresponding sub-images based on the weight data of each frame of sub-image to obtain the image features of each frame of sub-image.
Optionally, the constituent element data includes text data, pattern data and symbol data, and the feature extraction is performed on the constituent element data of each frame of sub-image to obtain a complete image feature of each frame of sub-image, including:
extracting text features of the text data of each frame of sub-image to obtain the text features of each frame of sub-image;
extracting the characteristics of the pattern data of each frame of sub-image to obtain the pattern characteristics of each frame of sub-image;
extracting the symbol characteristics of the symbol data of each frame of sub-image to obtain the symbol characteristics of each frame of sub-image;
and merging the text features, the pattern features and the symbol features to obtain the complete image features of each frame of sub-image.
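Since the text, pattern and symbol extractors are not pinned down by the text, the following sketch treats them as interchangeable callables and assumes concatenation as the merge operator:

```python
import numpy as np

def complete_image_feature(text_data, pattern_data, symbol_data,
                           text_encoder, pattern_encoder, symbol_encoder):
    """Build the complete image feature of one sub-image frame by merging
    text, pattern and symbol features; the three encoders are placeholders
    for whatever feature extractors an implementation provides."""
    text_feat = text_encoder(text_data)
    pattern_feat = pattern_encoder(pattern_data)
    symbol_feat = symbol_encoder(symbol_data)
    return np.concatenate([text_feat, pattern_feat, symbol_feat])
```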
Optionally, before inputting the image to be retrieved into the target recognition model for dynamic special effect recognition, the method further comprises:
determining whether the image to be searched contains dynamic special effect data of the pre-buried point or not;
if the image to be searched does not contain the dynamic special effect data of the pre-buried point, the image to be searched is input into the target recognition model for dynamic special effect recognition, and the dynamic special effect category of the image to be searched is obtained.
Optionally, after determining whether the image to be retrieved contains dynamic special effect data of the pre-buried point, the method further includes:
if the image to be retrieved contains pre-buried dynamic special effect data, parsing the dynamic special effect data to obtain the dynamic special effect category of the image to be retrieved, wherein the dynamic special effect labels comprise animation collage, dynamic typesetting, glitch art, isometric shape, texture effect, deformation transition and liquid movement.
There is provided an image retrieval apparatus including:
the first determining module is used for acquiring an image to be searched which is input by a user through the terminal equipment and determining whether the image to be searched is a dynamic image or not;
the special effect identification module is used for inputting the image to be searched into the target identification model for dynamic special effect identification if the image to be searched is a dynamic image, so as to obtain the dynamic special effect category of the image to be searched, wherein the target identification model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
the feature extraction module is used for extracting image features of the image to be searched to obtain image features of the image to be searched;
the second determining module is used for acquiring a plurality of historical dynamic images in the historical dynamic image library and standard features of each historical dynamic image, determining the similarity between the image to be retrieved and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories, and obtaining a plurality of similarity values;
And the third determining module is used for determining the retrieval data of the image to be retrieved in the historical dynamic images based on the similarity values and sending the retrieval data to the terminal equipment.
There is provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor executing the steps of the image retrieval method described above.
There is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above image retrieval method.
In one technical solution provided by the image retrieval method, device, computer equipment and medium, the image to be retrieved input by a user through the terminal device is acquired, and whether it is a dynamic image is determined; if it is a dynamic image, the image to be retrieved is input into a target recognition model for dynamic special effect recognition to obtain its dynamic special effect category, wherein the target recognition model is a neural network model obtained by deep learning based on the dynamic special effect labels of a plurality of historical dynamic images; image features are extracted from the image to be retrieved; the plurality of historical dynamic images and the standard features of each historical dynamic image are acquired from a historical dynamic image library, and the similarity between the image to be retrieved and each historical dynamic image is determined based on the standard features, the image features and the dynamic special effect category, to obtain a plurality of similarity values; and retrieval data of the image to be retrieved are determined among the plurality of historical dynamic images based on the plurality of similarity values and sent to the terminal device. When performing image retrieval, this solution not only extracts the image features of the image to be retrieved but also extracts its dynamic special effect category, and then retrieves historical dynamic images based on both, so that accurate original-image information can be mined from multiple dimensions and the accuracy of image retrieval is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of an image retrieval method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an image retrieval method according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of an image retrieval method according to an embodiment of the invention;
FIG. 4 is a flowchart illustrating an implementation of step S30 in FIG. 2;
FIG. 5 is a flowchart illustrating an implementation of step S40 in FIG. 2;
FIG. 6 is a schematic diagram of an image retrieval device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The image retrieval method provided by the embodiment of the invention can be applied to the application scene shown in fig. 1, in which the terminal device communicates with the server via a network. When a user needs to retrieve an image, the user inputs an image to be retrieved through the terminal device; the server then acquires the image to be retrieved and determines whether it is a dynamic image. If the image to be retrieved is a dynamic image, it is input into a target recognition model for dynamic special effect recognition to obtain its dynamic special effect category, the target recognition model being a neural network model obtained by deep learning based on a plurality of historical dynamic images and their dynamic special effect labels. Image features are extracted from the image to be retrieved; the plurality of historical dynamic images and the standard features of each historical dynamic image are acquired from a historical dynamic image library, and the similarity between the image to be retrieved and each historical dynamic image is determined based on the standard features, the image features and the dynamic special effect category, giving a plurality of similarity values. Retrieval data of the image to be retrieved are then determined among the historical dynamic images based on the similarity values and sent to the terminal device. In this embodiment, during image retrieval, not only are the image features of the image to be retrieved extracted, but its dynamic special effect category is also extracted, and the historical dynamic images are retrieved based on both. This takes the accuracy of the retrieval information into account, mines accurate original-image information from multiple dimensions, and effectively obtains images with the dynamic special effect the user intends, thereby improving the accuracy of image retrieval, better meeting the user's dynamic special effect requirements, and improving user experience. In addition, in this embodiment, the historical dynamic images are screened by dynamic special effect category, which reduces the amount of data processed during dynamic image retrieval.
The terminal device can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or other device; the server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, an image retrieval method is provided, and the method is applied to the server in fig. 1, and includes the following steps:
s10: and acquiring an image to be searched which is input by a user through the terminal equipment, and determining whether the image to be searched is a dynamic image or not.
When designing and editing a digital product, a user may need to search for images through the terminal device. For example, when a user edits an H5 page, a poster or a dynamic page on an editing platform and needs to add or replace a picture, the user's existing image may be of low quality or otherwise unsatisfactory; taking that existing image as input, the user needs to retrieve a high-quality original of it or an image similar to it, and can do so with the search engine on the editing platform. The user inputs the image to be retrieved into the search engine through the terminal device, and the terminal device sends it to the server of the editing platform.
Then, the server acquires an image to be searched, which is sent by the user through the terminal equipment, and the image to be searched needs to be subjected to data analysis so as to determine whether the image to be searched is a dynamic image or not. The database of the editing platform stores a plurality of historical images, wherein the plurality of historical images comprise a plurality of dynamic images and a plurality of static images.
After determining whether the image to be retrieved is a dynamic image: if analysis determines it is a static image, conventional search-by-image technology can be used. That is, image features are first extracted from the image to be retrieved; similarity is then calculated against the image features of each historical static image in the database, giving the similarity between the image to be retrieved and each historical static image; and the historical static images whose similarity meets the requirement are selected as the retrieval data of the image to be retrieved and sent to the terminal device.
S20: if the image to be searched is a dynamic image, inputting the image to be searched into a target recognition model for dynamic special effect recognition, and obtaining the dynamic special effect category of the image to be searched.
After determining whether the image to be searched is a dynamic image, if the image to be searched is determined to be the dynamic image by analysis, the server needs to acquire a target recognition model obtained by training in advance, and then inputs the image to be searched into the target recognition model to perform dynamic special effect recognition, so as to obtain the dynamic special effect category of the image to be searched.
The target recognition model is a neural network model obtained by deep learning based on a plurality of historical dynamic images and dynamic special effect labels of the historical dynamic images. The target recognition model is obtained by the following steps:
acquiring a plurality of historical images containing dynamic special effects, i.e., a plurality of historical dynamic images, and performing dynamic special effect identification on each historical dynamic image to obtain its dynamic special effect label (i.e., its dynamic special effect category); then performing dynamic special effect recognition on one of the historical dynamic images using a classification model with initial parameters to obtain a dynamic special effect recognition result for that image; calculating a total loss value based on the dynamic special effect recognition result and the dynamic special effect label of the historical dynamic image, and judging from the total loss value whether the classification model meets the convergence condition. When the total loss value is greater than the preset loss value, the classification model is determined not to have reached the convergence condition, and its parameters continue to be iteratively updated on other historical dynamic images until the total loss value is less than or equal to the preset loss value, or until the number of model iterations reaches a preset number (such as 1000), at which point the classification model is determined to have reached the convergence condition and the converged model is output as the target recognition model. The dynamic special effect labels comprise animation collage, dynamic typesetting, glitch art, isometric shape, texture effect, deformation transition and liquid movement.
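The description above fixes only the convergence criteria (a preset loss value, or a preset number of iterations such as 1000), not the architecture, loss or optimizer. A minimal PyTorch sketch, with cross-entropy loss and Adam chosen purely for illustration:

```python
import torch
import torch.nn as nn

LOSS_THRESHOLD = 0.05   # illustrative "preset loss value"
MAX_ITERATIONS = 1000   # the "preset times" named in the text

def train_target_recognition_model(model, dataset):
    """`dataset` yields (frames_tensor, label_index) pairs built from the
    historical dynamic images and their dynamic special effect labels."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for step, (frames, label) in enumerate(dataset):
        logits = model(frames.unsqueeze(0))           # add a batch axis
        loss = criterion(logits, torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Convergence: total loss at or below the preset value, or the
        # preset number of iterations reached.
        if loss.item() <= LOSS_THRESHOLD or step + 1 >= MAX_ITERATIONS:
            break
    return model
```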
In this embodiment, by using a plurality of historical dynamic images as training samples, a target recognition model with higher accuracy can be obtained through training, and further dynamic special effect recognition processing is performed on an image to be searched by using the target recognition model, so that the dynamic special effect category of the image to be searched can be accurately recognized, and an accurate data basis is provided for subsequent similarity calculation and image search.
S30: and extracting image features of the image to be searched to obtain the image features of the image to be searched.
Then, the server also needs to extract image features from the image to be retrieved. Specifically, constituent-element detection is first performed on the image to be retrieved to obtain its constituent element data, feature extraction is then performed on the constituent element data, and the image features of the image to be retrieved are finally obtained. Identifying the constituent elements of the image to be retrieved and extracting image features from the constituent element data improves the accuracy of the image features.
For example, when the image to be searched is determined to be a dynamic image, it is indicated that a plurality of frames of images exist in the image to be searched, after the dynamic special effect category of the image to be searched is identified, framing processing is needed to be performed on the image to be searched to obtain a plurality of frames of sub-images of the image to be searched, then image feature extraction is performed on each frame of sub-images to obtain image features of each frame of sub-images, and then image features of each frame of sub-images are spliced to obtain image features of the image to be searched. When the image to be searched is determined to be not a dynamic image, namely a static image, the image to be searched is indicated to have no multi-frame image, and dynamic special effect identification is not needed, so that the image feature extraction can be directly carried out on the image to be searched to obtain the image feature of the image to be searched. In this embodiment, the image features of the image to be retrieved also include feature information such as element features, color features and/or position features of each component element, and by increasing diversification of special effect information, accuracy of the image features is improved, and an accurate data basis is provided for subsequent calculation. In other embodiments, the feature extraction model can be directly used for extracting the pattern features and the color features of the image to be searched, and then the pattern features and the color features are fused to obtain the image features of the image to be searched, so that the method is simple and efficient.
S40: and determining the similarity between the image to be retrieved and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories of each historical dynamic image to obtain a plurality of similarity values.
After extracting the image features of the image to be retrieved, the server also needs to acquire a plurality of historical dynamic images in the historical dynamic image library and standard features of each historical dynamic image. Then, the server calculates the similarity between the image to be searched and each historical dynamic image based on the standard feature of each historical dynamic image, the image feature of the image to be searched and the dynamic special effect category, and a plurality of similarity values are obtained.
The standard features are standard image features obtained by extracting image features of historical dynamic images, and each historical dynamic image corresponds to a dynamic special effect tag. In step S40, that is, based on the standard feature, the image feature and the dynamic special effect category, the similarity between the image to be retrieved and each historical dynamic image is determined, so as to obtain a plurality of similarity values, which specifically includes: matching the dynamic special effect category with the dynamic special effect label of each historical dynamic image, and marking the successfully matched historical dynamic image as an image to be confirmed; and carrying out similarity calculation on the image characteristics of the image to be retrieved and the image characteristics of each image to be confirmed to obtain a similarity value corresponding to each image to be confirmed.
S50: and determining retrieval data of the images to be retrieved in the historical dynamic images based on the similarity values, and sending the retrieval data to the terminal equipment.
After the similarity between the image to be retrieved and each historical dynamic image is determined and a plurality of similarity values are obtained, retrieval data of the image to be retrieved are determined among the historical dynamic images based on the plurality of similarity values and sent to the terminal device, so that the user can browse the retrieval data in time and select a satisfactory image. In this embodiment, the historical dynamic images are screened by dynamic special effect category, which reduces the amount of data processed during dynamic image retrieval.
In this embodiment, the plurality of historical dynamic images may be ordered in a descending order according to the corresponding similarity value, so as to obtain the retrieval data of the images to be retrieved, and then sent to the terminal device to be pushed to the user, so that the user can timely see the historical dynamic image with the maximum similarity, and user experience is improved. In addition, the image to be confirmed with the similarity value larger than the preset value can be recorded as the retrieval data of the image to be retrieved, and then the retrieval data are sent to the terminal equipment to be pushed to the user, so that the data transmission quantity is reduced on the basis that the accuracy of the recommended data is ensured, the feedback speed of the retrieval data is further improved, and the user experience is improved.
In this embodiment, when a user needs to perform image retrieval, an image to be retrieved is input through the terminal device; the server then acquires the image to be retrieved and determines whether it is a dynamic image. If it is a dynamic image, it is input into the target recognition model for dynamic special effect recognition to obtain its dynamic special effect category, the target recognition model being a neural network model obtained by deep learning based on a plurality of historical dynamic images and their dynamic special effect labels. Image features are extracted from the image to be retrieved; the plurality of historical dynamic images and the standard features of each historical dynamic image are acquired from the historical dynamic image library, and the similarity between the image to be retrieved and each historical dynamic image is determined based on the standard features, the image features and the dynamic special effect category, giving a plurality of similarity values. Retrieval data of the image to be retrieved are determined among the historical dynamic images based on the similarity values and sent to the terminal device. During image retrieval, not only are the image features of the image to be retrieved extracted, but its dynamic special effect category is also extracted, and the historical dynamic images are retrieved based on both; this takes the accuracy of the retrieval information into account, mines accurate original-image information from multiple dimensions, and effectively obtains images with the dynamic special effect the user intends, thereby improving the accuracy of image retrieval, better meeting the user's dynamic special effect requirements, and improving user experience.
In one embodiment, as shown in fig. 3, before step S20, i.e. before inputting the image to be retrieved into the object recognition model for dynamic special effect recognition, the method further specifically includes the following steps:
s01: and determining whether the image to be searched contains dynamic special effect data of the pre-buried point.
After the image to be retrieved is determined to be a dynamic image, it may first be determined whether the image contains pre-buried dynamic special effect data, and the judgment result then decides whether dynamic special effect recognition with the target recognition model is needed, rather than directly inputting the image to be retrieved into the target recognition model to obtain its dynamic special effect category.
The dynamic special effect data are mark data pre-buried in the image to indicate its dynamic special effect category. Before the editing platform stores a plurality of historical images into the database, it must identify whether each historical image is dynamic. If a historical image is a static image, its image type is marked as static and it is stored into the database; if it is a dynamic image, its dynamic special effect category is further identified, the dynamic special effect mark (dynamic special effect data) corresponding to that category is buried into the historical image, and the image type of the buried historical image is finally marked as dynamic and stored into the database. Different dynamic special effect categories correspond to different dynamic special effect marks, so the dynamic special effect category can be determined by quickly reading the pre-buried dynamic special effect data in the image to be retrieved; furthermore, since the buried-point data is simply a mark, the data amount of the buried point is small. In other embodiments, the specific dynamic special effect category itself can be buried into the historical image as the dynamic special effect data and simply read out later, which is simple and intuitive.
For example, the dynamic special effect categories include animation collage, dynamic typesetting, glitch art, isometric shape, texture effect, deformation transition, liquid movement and so on, and the corresponding marks may be T1, T2, T3, T4, T5, T6 and T7. The categories listed in this embodiment are merely exemplary, and the corresponding marks may be other simple marks, which are not described herein.
S02: if the image to be searched does not contain the dynamic special effect data of the pre-buried point, the image to be searched is input into the target recognition model for dynamic special effect recognition, and the dynamic special effect category of the image to be searched is obtained.
After determining whether the image to be retrieved contains pre-buried dynamic special effect data: if it does not, which indicates the image may not be one stored by the editing platform server and its dynamic special effect category cannot be determined from pre-buried mark data, step S20 is executed, and the image to be retrieved is input into the target recognition model for dynamic special effect recognition to obtain its dynamic special effect category.
By determining whether the image to be retrieved contains pre-buried dynamic special effect data before model inference, and inputting it into the target recognition model for dynamic special effect recognition only when it does not, the computation the server spends on the data-heavy target recognition model is reduced, and the server load is lowered.
S03: if the image to be searched contains the dynamic special effect data of the pre-buried point, analyzing the dynamic special effect data to obtain the dynamic special effect category of the image to be searched.
After determining whether the image to be searched contains the dynamic special effect data of the pre-buried point or not, if the image to be searched contains the dynamic special effect data of the pre-buried point, the image to be searched is possibly an image stored in the editing platform server, the dynamic special effect type of the image can be determined through the mark data of the pre-buried point, which is used for indicating the dynamic special effect type of the image, and at the moment, the dynamic special effect data can be directly analyzed to obtain the dynamic special effect type of the image to be searched. By burying points for the historical images in the database, when the images to be searched input by the user are images stored by the editing platform, the dynamic special effect type of the images to be searched can be obtained quickly by analyzing the dynamic special effect data of the pre-buried points in the images to be searched, the target recognition model is not used, the calculated amount of calling the target recognition model by the server is reduced, the load of the server is reduced, and the search response speed is also improved.
For example, the dynamic special effect categories include animation collage, dynamic typesetting, glitch art, isometric shape, texture effect, deformation transition and liquid movement, with corresponding marks T1, T2, T3, T4, T5, T6 and T7. If the pre-buried dynamic special effect data contained in the image to be retrieved is T5, parsing the dynamic special effect data yields the dynamic special effect category of the image to be retrieved: texture effect.
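A small sketch of this fast path, assuming the buried point is exposed under a hypothetical `effect_mark` metadata key (where the mark actually lives in the image container is implementation-specific):

```python
EFFECT_MARKS = {
    "T1": "animation collage",
    "T2": "dynamic typesetting",
    "T3": "glitch art",
    "T4": "isometric shape",
    "T5": "texture effect",
    "T6": "deformation transition",
    "T7": "liquid movement",
}

def effect_category(image_metadata, recognition_model):
    """Return the dynamic special effect category of a query image."""
    mark = image_metadata.get("effect_mark")
    if mark in EFFECT_MARKS:
        # Fast path: parse the buried point; no model call is needed.
        return EFFECT_MARKS[mark]
    # Slow path: fall back to the target recognition model.
    return recognition_model(image_metadata["image"])
```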
In this embodiment, after determining that an image to be retrieved is a dynamic image, it is determined whether the image contains pre-buried dynamic special effect data. If it does not, the image to be retrieved is input into the target recognition model for dynamic special effect recognition to obtain its dynamic special effect category; if it does, the dynamic special effect data are parsed to obtain the category directly. By burying points in the historical images in the database, when the image to be retrieved input by the user is one stored by the editing platform, its dynamic special effect category can be obtained quickly by parsing the pre-buried dynamic special effect data, without using the target recognition model; this reduces the computation of calling the target recognition model, lowers the server load, and also improves the retrieval response speed.
In one embodiment, as shown in fig. 4, in step S30, image feature extraction is performed on the image to be retrieved to obtain its image features, which specifically includes the following steps:
s31: carrying out framing treatment on the image to be searched to obtain multi-frame sub-images of the image to be searched, and carrying out element detection on each frame of sub-images to obtain component element data of each frame of sub-images;
s32: extracting the characteristics of the component element data of each frame of sub-image to obtain the image characteristics of each frame of sub-image;
s33: and splicing the image features of each frame of sub-image according to the dynamic display sequence of each sub-image to obtain the image features of the image to be retrieved.
After the dynamic special effect category of the image to be searched is identified, framing treatment is needed to be carried out on the image to be searched, multi-frame sub-images of the image to be searched are obtained, then element detection is carried out on each frame of sub-images, and component element data of each frame of sub-images are obtained. And extracting the characteristics of the component element data of each frame of sub-image to obtain the image characteristics of each frame of sub-image, and splicing the image characteristics of each frame of sub-image according to the dynamic display sequence of each sub-image to obtain the image characteristics of the image to be searched. And splicing the image features of each frame of sub-image according to the dynamic display sequence to obtain the image features of the image to be retrieved, thereby further ensuring the accuracy of the image features.
The composition element data comprises information such as each composition element, the position of the composition element, the element and the like, and the composition element comprises elements such as figures, patterns, symbols and the like. Correspondingly, the image features of the image to be retrieved also comprise element features, color features, position features and the like of each component element. The component elements, the colors and the positions of the component elements are identified, so that the accuracy of the component element data can be improved, the accuracy of image features is further improved, and an accurate data basis is provided for subsequent calculation.
In this embodiment, the multi-frame sub-image of the image to be searched is obtained by performing frame processing on the image to be searched, element detection is performed on each frame of sub-image to obtain component element data of each frame of sub-image, then feature extraction is performed on the component element data of each frame of sub-image to obtain image features of each frame of sub-image, and then the image features of each frame of sub-image are spliced according to the dynamic display sequence of each sub-image to obtain the image features of the image to be searched. The process refines the specific steps of extracting the image features of the image to be searched to obtain the image features of the image to be searched, and identifies the constituent elements, the colors and the positions of the constituent elements of each frame of image, so that the accuracy of the data of the constituent elements can be improved, the accuracy of the image features is further improved, and an accurate data basis is provided for subsequent calculation.
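A sketch of steps S31-S33 for a GIF-style dynamic image, using Pillow for the framing step; the element detector and per-frame encoder are placeholders, and stitching is assumed to be concatenation in display order:

```python
import numpy as np
from PIL import Image, ImageSequence

def image_feature_of_dynamic_image(gif_path, detect_elements, frame_encoder):
    """Split a dynamic image into frames, encode each frame from its
    constituent element data, and stitch the per-frame features in the
    dynamic display order."""
    gif = Image.open(gif_path)
    frame_feats = []
    for frame in ImageSequence.Iterator(gif):   # frames in display order
        elements = detect_elements(frame.convert("RGB"))
        frame_feats.append(frame_encoder(elements))
    # Per-frame features are assumed to be 1-D vectors here.
    return np.concatenate(frame_feats)
```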
In one embodiment, in step S32, feature extraction is performed on component element data of each sub-image frame to obtain an image feature of each sub-image frame, which specifically includes the following steps:
s321: and extracting image characteristics of the component element data of each frame of sub-image to obtain the complete image characteristics of each frame of sub-image.
After element detection is carried out on each frame of sub-image to obtain component element data of each frame of sub-image, the server firstly carries out feature extraction on the component element data of each frame of sub-image to obtain complete image features of each frame of sub-image. The complete image features comprise each component element in each frame of sub-image, and information such as the position and the color of each component element.
S322: and carrying out the de-coloring treatment on each frame of sub-image to obtain a de-coloring image of each frame of sub-image, and carrying out the feature extraction on the de-coloring image of each frame of sub-image to obtain the basic image feature of each frame of sub-image.
Meanwhile, the server performs de-coloring processing on each frame of sub-image, that is, removes the colour from each frame of sub-image and cleans it to obtain its grayscale image, i.e., the de-colored image of each frame of sub-image. Feature extraction is then performed on the de-colored image of each frame of sub-image to obtain its basic image features; the extraction process is as described above, the difference being that the basic image features do not include the colour information of the constituent elements.
S323: and calculating the similarity between the basic image features of each frame of sub-image and the corresponding complete image features to obtain a similarity matrix of each frame of sub-image.
After the complete image features and the basic image features of each frame of sub-image are obtained, the similarity between the basic image features and the corresponding complete image features of each frame of sub-image needs to be calculated, and a similarity matrix of each frame of sub-image is obtained. For example, calculating the similarity between the basic image features and the corresponding complete image features of each frame of sub-image by adopting a covariance algorithm, and taking the obtained covariance matrix as a similarity matrix of each frame of sub-image; through a covariance algorithm, the similarity between every two sub-features (namely every two pixel points) in the two groups of features can be calculated, and the accuracy of a similarity matrix is improved.
S324: performing weight activation on the similarity matrix of each frame of sub-image to obtain weight data of each frame of sub-image;
after the similarity matrix of each frame of sub-image is obtained, the similarity matrix of each frame of sub-image needs to be subjected to weight activation by adopting a preset activation function, so that weight data of each frame of sub-image is obtained. And activating each sub-feature in each frame of sub-image by adopting a preset activation function to obtain a weight value of each sub-feature in each frame of sub-image, so as to gather and obtain weight data of each frame of sub-image.
The preset activation function can be a nonlinear function, preferably a sigmoid function. Using the sigmoid function for the weight calculation squashes the input into the range from 0 to 1, consistent with the value range of a probability, and can also speed up the calculation.
S325: and carrying out feature enhancement on the complete image features of the corresponding sub-images based on the weight data of each frame of sub-image to obtain the image features of each frame of sub-image.
After the weight data of each frame of sub-image is obtained, the complete image features of the corresponding sub-image are subjected to feature enhancement based on the weight data of each frame of sub-image, and the image features of each frame of sub-image are obtained. The method comprises the steps of giving a weight value of each sub-feature in weight data of each frame of sub-image to each sub-feature (pixel point) in the complete image features of the corresponding sub-image, and realizing global self-adaptive enhancement of the complete image features, so that the image features after the enhancement of each frame of sub-image are obtained.
In the embodiment, the complete image characteristics of each sub-image are obtained by extracting the image characteristics of the component element data of each sub-image; carrying out the de-coloring treatment on each frame of sub-image to obtain a de-coloring image of each frame of sub-image, and carrying out feature extraction on the de-coloring image of each frame of sub-image to obtain basic image features of each frame of sub-image; and then calculating the similarity between the basic image features of each frame of sub-image and the corresponding complete image features to obtain a similarity matrix of each frame of sub-image, performing weight activation on the similarity matrix of each frame of sub-image to obtain weight data of each frame of sub-image, and finally performing feature enhancement on the complete image features of the corresponding sub-image based on the weight data of each frame of sub-image to obtain the image features of each frame of sub-image. The specific process of extracting the characteristic of the component element data of each frame of sub-image to obtain the image characteristic of each frame of sub-image is defined, the basic image characteristic of each frame of sub-image is used as the auxiliary of the corresponding complete image characteristic, the similarity relation of the two groups of characteristics is converted into a weight value and is endowed to each characteristic point of the complete image characteristic, the global self-adaptive weight activation of the complete image characteristic is realized, the characteristic enhancement is carried out for each characteristic point, and the image information expression capability of the image characteristic is improved.
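A compact numpy sketch of steps S323-S325 for one frame; the (C, H, W) map shapes and the row mean used to collapse the similarity matrix into one weight per feature point are assumptions, since the text does not fix them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def enhance_frame_feature(complete_feat, base_feat):
    """complete_feat: feature map of the colour frame; base_feat: feature
    map of its de-colored copy. Both are (C, H, W) arrays here."""
    c, h, w = complete_feat.shape
    full = complete_feat.reshape(c, h * w).T    # (H*W, C) feature points
    gray = base_feat.reshape(c, h * w).T

    # S323: covariance of every feature-point pair across channels.
    sim = (full - full.mean(1, keepdims=True)) @ \
          (gray - gray.mean(1, keepdims=True)).T / c

    # S324: sigmoid weight activation squashes similarities into (0, 1).
    weights = sigmoid(sim).mean(axis=1)         # one weight per point

    # S325: global adaptive enhancement of the complete image feature.
    return (full * weights[:, None]).T.reshape(c, h, w)
```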
In an embodiment, in the process of calculating the similarity between the basic image features and the corresponding complete image features of each frame of sub-image, the basic image features and the corresponding complete image features need to be compressed, which reduces the number of network parameters, facilitates calculation and improves data processing efficiency. Specifically, calculating the similarity between the basic image features and the corresponding complete image features of each frame of sub-image to obtain the similarity matrix of each frame of sub-image comprises the following steps:
s301: and performing matrix dimension conversion on the complete image features of each frame of sub-image, and performing nonlinear activation on the complete image features of each frame of sub-image after the matrix dimension conversion to obtain first features.
After obtaining the complete image features and the basic image features of each frame of sub-image, the server first performs matrix dimension conversion on the complete image features of each frame of sub-image, and performs nonlinear activation on the converted features to obtain the first feature. A nonlinear activation function such as softmax, softplus, sigmoid, tanh or ReLU may be used; applying a nonlinear activation to the dimension-converted complete image features enhances their feature expression capability.
Before the matrix dimension conversion is performed on the complete image features of each frame of sub-image, dimension compression parameters may be applied to compress the complete image features, reducing the subsequent processing load. The matrix dimension conversion then acts as a further dimension reduction, so the data amount of the first feature, and hence the subsequent computation, is further reduced.
S302: performing global average pooling on the basic image features of each frame of sub-image, and performing nonlinear activation on the pooled basic image features to obtain a second feature.
Meanwhile, the server performs global average pooling on the basic image features of each frame of sub-image to reduce their dimensionality, and thus the subsequent processing load, and then applies nonlinear activation to the pooled features to obtain the second feature, enhancing feature expression capability. The nonlinear activation function used here may be the same as or different from the one used for the complete image features, as determined by actual needs.
In addition, to ensure that the dimensions of the second feature are consistent with those of the first feature obtained above, the same dimension compression parameters may be applied to compress the basic image features of each frame of sub-image before the global average pooling, which likewise reduces the subsequent processing load.
S303: performing a covariance matrix calculation on the first feature and the second feature by matrix multiplication to obtain a similarity matrix between the complete image features and the basic image features of each frame of sub-image.
After the first feature and the second feature are obtained, a covariance matrix calculation is performed on them by matrix multiplication, yielding the similarity matrix between the complete image features and the basic image features of each frame of sub-image, i.e., the similarity between each point in the first feature and each point in the second feature.
In this embodiment, the first feature is obtained by performing matrix dimension conversion on the complete image features of each frame of sub-image and applying nonlinear activation to the converted features; the second feature is obtained by performing global average pooling on the basic image features of each frame of sub-image and applying nonlinear activation to the pooled features; and a covariance matrix calculation by matrix multiplication on the first and second features yields the similarity matrix between the complete and basic image features of each frame of sub-image. This defines the specific process of calculating the similarity matrix between the basic image features and the corresponding complete image features. Reducing the dimensionality of both types of image features before the similarity calculation greatly reduces the number of network parameters and improves data processing efficiency, and applying nonlinear activation to the pooled features facilitates the subsequent calculation.
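Under stated assumptions, the following sketch illustrates S301 to S303: 1x1 convolutions stand in for the dimension compression parameters, softmax and ReLU are example choices of nonlinear activation, and the compression ratio r is arbitrary. None of these choices is fixed by the method itself.

```python
# Illustrative sketch of S301-S303; shapes, ratio r and activations are assumptions.
import torch
import torch.nn as nn

class SimilarityMatrix(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        # Dimension compression parameters (1x1 convolutions), reducing C -> C//r
        self.compress_complete = nn.Conv2d(channels, channels // r, kernel_size=1)
        self.compress_basic = nn.Conv2d(channels, channels // r, kernel_size=1)

    def forward(self, complete_feat: torch.Tensor, base_feat: torch.Tensor):
        # S301: compress, flatten spatial dims (matrix dimension conversion),
        # then apply a nonlinear activation to obtain the first feature
        a = self.compress_complete(complete_feat)              # (N, C/r, H, W)
        n, c, h, w = a.shape
        first = torch.softmax(a.reshape(n, c, h * w), dim=-1)  # (N, C/r, HW)

        # S302: compress, global average pool, nonlinear activation -> second feature
        b = self.compress_basic(base_feat)                     # (N, C/r, H', W')
        second = torch.relu(b.mean(dim=(2, 3)))                # (N, C/r)

        # S303: covariance-style similarity via matrix multiplication:
        # (N, HW, C/r) @ (N, C/r, 1) -> (N, HW, 1), one value per position
        sim = torch.bmm(first.transpose(1, 2), second.unsqueeze(-1))
        return sim.reshape(n, 1, h, w)                         # similarity matrix
```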
In one embodiment, the constituent element data includes text data, pattern data and symbol data. In step S40, performing feature extraction on the constituent element data of each frame of sub-image to obtain the complete image features of each frame of sub-image specifically includes the following steps:
S3211: performing text feature extraction on the text data of each frame of sub-image to obtain the text features of each frame of sub-image;
S3212: performing feature extraction on the pattern data of each frame of sub-image to obtain the pattern features of each frame of sub-image;
S3213: performing symbol feature extraction on the symbol data of each frame of sub-image to obtain the symbol features of each frame of sub-image;
S3214: fusing the text features, the pattern features and the symbol features to obtain the complete image features of each frame of sub-image.
In this embodiment, the constituent elements of the image to be retrieved include text, patterns and symbols (notes, punctuation marks, etc.), so the constituent element data of each frame of sub-image correspondingly includes text data, pattern data and symbol data.
After the constituent element data of each frame of sub-image, namely the text data, pattern data and symbol data, is obtained, text feature extraction is performed on the text data to obtain the text features of each frame of sub-image; feature extraction is performed on the pattern data to obtain the pattern features; and symbol feature extraction is performed on the symbol data to obtain the symbol features. The text features, pattern features and symbol features are then fused to obtain the complete image features of each frame of sub-image.
For example, the text features, pattern features and symbol features may be directly concatenated to obtain the complete image features of each frame of sub-image, which is simple, convenient, and loses no image information. Alternatively, each of the text, pattern and symbol features may be treated as a node and connected into a graph according to the position information of each feature within the frame, yielding a graph feature that captures the spatial relationships among the features; using this graph feature as the complete image feature allows each frame of sub-image to be expressed more accurately.
In other embodiments, if the constituent element data includes only one or two of the text data, pattern data and symbol data, feature extraction is performed only on the data types present, and the extracted features are fused to obtain the complete image features of each frame of sub-image; the detailed process is as described above and is not repeated here.
In this embodiment, text feature extraction is performed on the text data of each frame of sub-image to obtain text features, feature extraction is performed on the pattern data to obtain pattern features, and symbol feature extraction is performed on the symbol data to obtain symbol features; finally, the text, pattern and symbol features are fused to obtain the complete image features of each frame of sub-image. This clarifies the specific steps of extracting the complete image features from the constituent element data of each frame. Subdividing the constituent elements of each frame yields more accurate constituent element features and thus improves the accuracy of the complete image features of each frame of sub-image.
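By way of non-limiting illustration, a minimal sketch of the direct-splicing variant of S3214 follows; the feature dimensions are arbitrary assumptions, and the graph-based variant described above would instead build adjacency from the element positions.

```python
# Minimal sketch of S3214 (direct splicing); all dimensions are assumptions.
import torch

def fuse_element_features(text_feat, pattern_feat, symbol_feat):
    # Concatenation keeps every element feature, so no image information is lost
    return torch.cat([text_feat, pattern_feat, symbol_feat], dim=0)

# Example: 128-d text + 256-d pattern + 64-d symbol -> 448-d complete image feature
complete_feat = fuse_element_features(torch.randn(128), torch.randn(256), torch.randn(64))
```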
In one embodiment, the standard feature is a standard fusion feature obtained by fusing the standard dynamic special effect feature and the standard image feature of the historical dynamic image. As shown in fig. 5, step S40, namely determining the similarity between the image to be retrieved and each historical dynamic image based on the standard features, the image features and the dynamic special effect category to obtain a plurality of similarity values, specifically includes the following steps:
S41: determining the dynamic special effect features of the image to be retrieved according to the dynamic special effect category.
In this embodiment, the dynamic special effect feature may be a semantic feature of the dynamic special effect category, i.e., a feature obtained by semantic extraction from the special effect description text of that category. Different dynamic special effect categories have different text descriptions, so different categories yield entirely different semantic features.
For example, the dynamic special effect categories may include animation collage, dynamic typesetting, glitch art, isometric shapes, texture effects, morph transitions, liquid motion, and the like; semantic extraction is performed on the special effect description text of each of these categories to obtain the dynamic special effect feature corresponding to each category.
After the dynamic special effect category of the image to be retrieved is obtained, the dynamic special effect feature of the image to be retrieved is determined from that category: the special effect description text corresponding to the category is pulled from the database as the target description text, and semantic feature extraction is performed on the target description text to obtain the dynamic special effect feature. A feature obtained by semantic extraction from the special effect description text is more accurate than a feature derived from the category subject alone (e.g., the phrase "animation collage"), which improves the accuracy of the dynamic special effect feature and, in turn, of the features computed subsequently.
In other embodiments, to reduce semantic feature extraction time, semantic extraction may be performed in advance on the special effect description text of every dynamic special effect category, and each category may be bound to its dynamic special effect feature and stored in the database. After the dynamic special effect category of the image to be retrieved is obtained, the server directly pulls the corresponding dynamic special effect feature from the database as the dynamic special effect feature of the image to be retrieved, which is simple and fast.
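A sketch of this pre-binding variant is given below. Here encode_text is a deterministic stand-in for a real semantic text encoder, the in-memory dict stands in for the database, and the truncated descriptions are placeholders for the full description texts listed after this sketch.

```python
# Sketch of the pre-binding variant; encoder, storage and texts are assumptions.
import torch

def encode_text(description: str) -> torch.Tensor:
    # Stand-in for semantic feature extraction from a description text
    g = torch.Generator().manual_seed(abs(hash(description)) % (2**31))
    return torch.randn(256, generator=g)

EFFECT_DESCRIPTIONS = {
    "animation collage": "characters, drawings, photos ... move in collage form",
    "dynamic typesetting": "text rotates, waves, deforms and mirrors ...",
    "glitch art": "colors and images are distorted, broken and dislocated ...",
    # ... one entry per dynamic special effect category (full texts below)
}

# Extract once, bind category -> feature, and store the binding
effect_feature_db = {cat: encode_text(text) for cat, text in EFFECT_DESCRIPTIONS.items()}

def dynamic_effect_feature(category: str) -> torch.Tensor:
    # At query time the server pulls the bound feature directly
    return effect_feature_db[category]
```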
The special effect description text of animation collage is: characters, drawings, photos, prints, textures, patterns, images and the like are converted into digital photographs, which are then placed on the image frame in collage form so that these elements move. The special effect description text of dynamic typesetting is: the text in the image is designed with effects such as rotation, waves, deformation and mirroring, so that the whole picture takes on dynamic effects, combining typesetting and animation. The special effect description text of glitch art is: colors and images are distorted, broken and dislocated to form new digital glitch art, processing the layout artistically and creating a distinctive aesthetic. The special effect description text of isometric shapes is: two-dimensional graphics are drawn as isometric three-dimensional images, displaying more graphics within one frame; isometric design reduces visual clutter and frees more space for useful elements. The special effect description text of texture effects is: textures are added to elements, which can be designed by hand drawing or from materials to produce different textures. The special effect description text of morph transitions is: one image changes into another through a seamless transition without stalling, conveying a complete streamlined flow and keeping the effect smooth. The special effect description text of liquid motion is: a liquid design effect is added to the image subject, ranging from ripples to waves to rising and falling tides, and elements such as stretching, smearing and vortices can be added.
In other embodiments, the dynamic special effect categories may further include other categories with corresponding special effect description texts. For example, for a retro style, the special effect description text may be: the textures, colors or constituent elements of the image are given an antique design, or period elements are added, so that the image conveys a sense of age.
S42: fusing the image features of the image to be retrieved with the dynamic special effect features to obtain the fusion features of the image to be retrieved.
After the dynamic special effect features of the image to be retrieved are determined from the dynamic special effect category, they are fused with the image features of the image to be retrieved to obtain the fusion features. For example, the image features and the dynamic special effect features may be concatenated directly, which is simple and convenient; the resulting fusion features contain both the image features and the dynamic special effect features, describe more information, and are therefore more accurate, which in turn improves the accuracy of the subsequent similarity values.
S43: performing similarity calculation on the fusion features of the image to be retrieved and the standard fusion features of each historical dynamic image to obtain a similarity value corresponding to each historical dynamic image.
After the fusion features of the image to be retrieved are obtained, the server performs a similarity calculation between them and the standard fusion features of each historical dynamic image, obtaining one similarity value per historical dynamic image and thus a plurality of similarity values.
Because the standard features are standard fusion features obtained in advance by fusing the standard dynamic special effect features and standard image features of the historical dynamic images, they describe the historical dynamic images more accurately and are guaranteed to match the fusion features of the image to be retrieved, which improves the accuracy of the similarity values. The standard dynamic special effect features and standard fusion features of the historical dynamic images are obtained in the same way as the dynamic special effect features and fusion features of the image to be retrieved, respectively, and this is not repeated here.
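As a non-limiting illustration of S43, the sketch below assumes cosine similarity as the similarity measure; the method itself does not fix a particular measure.

```python
# Sketch of the S43 similarity calculation, assuming cosine similarity.
import torch
import torch.nn.functional as F

def similarity_values(query_fusion: torch.Tensor,
                      standard_fusions: torch.Tensor) -> torch.Tensor:
    """query_fusion: (D,) fusion feature of the image to be retrieved.
    standard_fusions: (M, D) standard fusion features of M historical images.
    Returns (M,) similarity values, one per historical dynamic image."""
    return F.cosine_similarity(query_fusion.unsqueeze(0), standard_fusions, dim=1)
```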
S44: sorting the plurality of historical dynamic images in descending order of similarity value to obtain the retrieval data of the image to be retrieved.
After the similarity value corresponding to each historical dynamic image is obtained, the plurality of historical dynamic images are sorted in descending order of similarity value to obtain the retrieval data of the image to be retrieved. In other embodiments, the historical dynamic images whose similarity values exceed a preset value may instead be selected and recorded as the retrieval data.
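For illustration, a minimal sketch of S44 and the thresholding variant just mentioned; the preset threshold value is an assumption.

```python
# Sketch of S44: descending sort, with an optional preset-value filter.
def retrieval_data(historical_images, similarity_values, preset=None):
    # Pair each historical dynamic image with its similarity value, sort descending
    ranked = sorted(zip(historical_images, similarity_values),
                    key=lambda pair: pair[1], reverse=True)
    if preset is not None:
        # Variant: keep only images whose similarity exceeds the preset value
        ranked = [(img, s) for img, s in ranked if s > preset]
    return ranked
```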
In this embodiment, the standard feature is a standard fusion feature obtained by fusing the standard dynamic special effect feature and standard image feature of the historical dynamic image. The dynamic special effect features of the image to be retrieved are determined from the dynamic special effect category; the image features and dynamic special effect features of the image to be retrieved are fused into fusion features; a similarity calculation between these fusion features and the standard fusion features of each historical dynamic image yields a similarity value for each historical dynamic image; and the historical dynamic images are sorted in descending order of similarity value to obtain the retrieval data. This defines the specific steps of determining the similarity between the image to be retrieved and each historical dynamic image based on the standard features, image features and dynamic special effect category. Fusing the image features with the dynamic special effect features improves feature accuracy, and performing the similarity calculation on the fusion features improves the accuracy of the similarity values, so the historical dynamic images most similar to the image to be retrieved are found and the accuracy of the retrieval data is ensured.
In an embodiment, the image features of the image to be retrieved include the image features of each sub-image in the image to be retrieved. In step S42, fusing the image features of the image to be retrieved with the dynamic special effect features to obtain the fusion features of the image to be retrieved specifically includes the following steps:
S421: performing weight activation with a preset activation function based on the dynamic special effect features to obtain activation weight data.
For example, the preset activation function may be a linear activation function: the dynamic special effect features are linearly activated to obtain linear activation data, which is then processed by an attention mechanism to obtain activation weight data containing a plurality of weight values. The number of weight values in the activation weight data matches the dimension of the image features to be fused, which facilitates the subsequent feature fusion.
S422: fusing the image features of each frame of sub-image based on the activation weight data to obtain the activation features of each frame of sub-image.
After the activation weight data is obtained, the image features of each frame of sub-image are fused with it to obtain the activation features of each frame of sub-image; that is, each weight value in the activation weight data is assigned to a feature point (i.e., pixel position) of the image features. When fusing the image features of each frame of sub-image, every sub-feature needs to be fused in order to improve the accuracy of the activation features.
S423: splicing the activation features of each frame of sub-image according to the dynamic display order of the sub-images in the image to be retrieved to obtain the fusion features of the image to be retrieved.
Finally, the activation features of each frame of sub-image are spliced according to the dynamic display order of the sub-images in the image to be retrieved, yielding the fusion features of the image to be retrieved.
In this embodiment, weight activation with a preset activation function is performed on the dynamic special effect features to obtain activation weight data containing a plurality of weight values; the image features of the image to be retrieved are then fused based on this weight data, i.e., each weight value is assigned to a feature point (pixel position) of the image features, yielding the fusion features of the image to be retrieved. Converting the dynamic special effect features into weight data and applying it to the image features of each frame of sub-image effectively fuses the two kinds of features and further improves the information description precision of the fusion features.
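An illustrative sketch of S421 to S423 follows. The linear layer plus softmax stands in for the "preset activation function" and attention mechanism, and all dimensions are assumptions rather than values fixed by the method.

```python
# Illustrative sketch of S421-S423; layers and dimensions are assumptions.
import torch
import torch.nn as nn

class EffectGuidedFusion(nn.Module):
    def __init__(self, effect_dim: int, feat_dim: int):
        super().__init__()
        self.linear = nn.Linear(effect_dim, feat_dim)  # linear activation (S421)

    def forward(self, frame_feats: torch.Tensor, effect_feat: torch.Tensor):
        """frame_feats: (T, D) image features, one row per frame in display order;
        effect_feat: (E,) dynamic special effect feature."""
        # S421: linear activation + attention-style normalization -> weight data
        weights = torch.softmax(self.linear(effect_feat), dim=0)  # (D,)
        # S422: assign each weight value to each feature point of every frame
        activated = frame_feats * weights.unsqueeze(0)            # (T, D)
        # S423: splice per-frame activation features in dynamic display order
        return activated.reshape(-1)                              # (T*D,)

fusion = EffectGuidedFusion(effect_dim=256, feat_dim=448)
fused = fusion(torch.randn(5, 448), torch.randn(256))  # e.g. 5 frames -> (2240,)
```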
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not limit the implementation of the embodiments of the present invention.
In one embodiment, an image retrieval device is provided, which corresponds to the image retrieval method in the above embodiment one by one. As shown in fig. 6, the image retrieval apparatus includes a first determination module 601, a special effect identification module 602, a feature extraction module 603, a second determination module 604, and a third determination module 605. The functional modules are described in detail as follows:
a first determining module 601, configured to obtain an image to be retrieved input by a user through a terminal device, and determine whether the image to be retrieved is a dynamic image;
the special effect identification module 602 is configured to input the image to be retrieved into the target identification model to perform dynamic special effect identification if the image to be retrieved is a dynamic image, so as to obtain a dynamic special effect category of the image to be retrieved, where the target identification model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
the feature extraction module 603 is configured to perform image feature extraction on an image to be retrieved, so as to obtain image features of the image to be retrieved;
a second determining module 604, configured to obtain a plurality of historical dynamic images in the historical dynamic image library and standard features of each historical dynamic image, and determine a similarity between the image to be retrieved and each historical dynamic image based on the standard features, the image features and the dynamic special effect category, so as to obtain a plurality of similarity values;
The third determining module 605 is configured to determine the retrieval data of the image to be retrieved from the plurality of historical dynamic images based on the plurality of similarity values, and send the retrieval data to the terminal device.
Optionally, the standard feature is a standard fusion feature obtained by fusing a standard dynamic special effect feature and a standard image feature of the historical dynamic image, and the second determining module 604 is specifically configured to:
determining dynamic special effect characteristics of the image to be retrieved according to the dynamic special effect category;
fusing the image features of the image to be retrieved and the dynamic special effect features to obtain fusion features of the image to be retrieved;
performing similarity calculation on fusion features of the images to be retrieved and standard fusion features of each historical dynamic image to obtain a similarity value corresponding to each historical dynamic image;
and sorting the plurality of historical dynamic images in a descending order according to the similarity value to obtain retrieval data of the images to be retrieved.
Optionally, the feature extraction module 603 is specifically configured to:
carrying out framing treatment on the image to be searched to obtain multi-frame sub-images of the image to be searched, and carrying out element detection on each frame of sub-images to obtain component element data of each frame of sub-images;
extracting the characteristics of the component element data of each frame of sub-image to obtain the image characteristics of each frame of sub-image;
And splicing the image features of each frame of sub-image according to the dynamic display sequence of each sub-image to obtain the image features of the image to be retrieved.
Optionally, the feature extraction module 603 is specifically further configured to:
extracting image characteristics of the component element data of each frame of sub-image to obtain complete image characteristics of each frame of sub-image;
carrying out the de-coloring treatment on each frame of sub-image to obtain a de-coloring image of each frame of sub-image, and carrying out feature extraction on the de-coloring image of each frame of sub-image to obtain basic image features of each frame of sub-image;
calculating the similarity between the basic image features of each frame of sub-image and the corresponding complete image features to obtain a similarity matrix of each frame of sub-image;
performing weight activation on the similarity matrix of each frame of sub-image to obtain weight data of each frame of sub-image;
and carrying out feature enhancement on the complete image features of the corresponding sub-images based on the weight data of each frame of sub-image to obtain the image features of each frame of sub-image.
Optionally, the constituent element data includes text data, pattern data, and symbol data, and the feature extraction module 603 is specifically further configured to:
extracting text features of the text data of each frame of sub-image to obtain the text features of each frame of sub-image;
Extracting the characteristics of the pattern data of each frame of sub-image to obtain the pattern characteristics of each frame of sub-image;
extracting the symbol characteristics of the symbol data of each frame of sub-image to obtain the symbol characteristics of each frame of sub-image;
and merging the text features, the pattern features and the symbol features to obtain the complete image features of each frame of sub-image.
Optionally, before the image to be retrieved is input to the target recognition model for dynamic special effect recognition, the first determining module 601 is further configured to determine whether the image to be retrieved includes dynamic special effect data of a pre-buried point;
if the image to be searched does not contain the dynamic special effect data of the pre-buried point, the special effect identification module 602 inputs the image to be searched into the target identification model for dynamic special effect identification, and the dynamic special effect category of the image to be searched is obtained.
If the image to be searched contains the dynamic special effect data of the pre-buried point, the special effect identification module 602 analyzes the dynamic special effect data to obtain the dynamic special effect category of the image to be searched.
For specific limitations of the image retrieval device, reference may be made to the above limitations of the image retrieval method, which are not repeated here. Each module in the above image retrieval device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer equipment is used for storing data used and generated by the image retrieval method, and the data comprise a plurality of historical dynamic images, a target recognition model, a plurality of similarity values and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image retrieval method.
In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of when executing the computer program:
Acquiring an image to be searched which is input by a user through terminal equipment, and determining whether the image to be searched is a dynamic image or not;
if the image to be searched is a dynamic image, inputting the image to be searched into a target recognition model for dynamic special effect recognition to obtain a dynamic special effect category of the image to be searched, wherein the target recognition model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
extracting image features of the image to be searched to obtain the image features of the image to be searched;
acquiring a plurality of historical dynamic images and standard features of each historical dynamic image in a historical dynamic image library, and determining the similarity of the image to be searched and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories to obtain a plurality of similarity values;
and determining retrieval data of the images to be retrieved in the historical dynamic images based on the similarity values, and sending the retrieval data to the terminal equipment.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an image to be searched which is input by a user through terminal equipment, and determining whether the image to be searched is a dynamic image or not;
If the image to be searched is a dynamic image, inputting the image to be searched into a target recognition model for dynamic special effect recognition to obtain a dynamic special effect category of the image to be searched, wherein the target recognition model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
extracting image features of the image to be searched to obtain the image features of the image to be searched;
acquiring a plurality of historical dynamic images and standard features of each historical dynamic image in a historical dynamic image library, and determining the similarity of the image to be searched and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories to obtain a plurality of similarity values;
and determining retrieval data of the images to be retrieved in the historical dynamic images based on the similarity values, and sending the retrieval data to the terminal equipment.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-volatile computer readable storage medium which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of the functional units and modules is illustrated as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. An image retrieval method, comprising:
acquiring an image to be searched which is input by a user through terminal equipment, and determining whether the image to be searched is a dynamic image or not;
If the image to be searched is the dynamic image, inputting the image to be searched into a target recognition model for dynamic special effect recognition to obtain a dynamic special effect category of the image to be searched, wherein the target recognition model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
extracting image features of the image to be searched to obtain the image features of the image to be searched;
acquiring a plurality of historical dynamic images and standard features of each historical dynamic image in a historical dynamic image library, and determining the similarity of the image to be searched and each historical dynamic image based on the standard features, the image features and the dynamic special effect categories to obtain a plurality of similarity values;
determining retrieval data of the image to be retrieved in a plurality of historical dynamic images based on a plurality of similarity values, and sending the retrieval data to the terminal equipment;
the image feature extraction is performed on the image to be searched to obtain the image feature of the image to be searched, which comprises the following steps:
carrying out framing treatment on the image to be searched to obtain multi-frame sub-images of the image to be searched, and carrying out element detection on each frame of sub-images to obtain component element data of each frame of sub-images;
Extracting the characteristics of the component element data of the sub-images of each frame to obtain the image characteristics of the sub-images of each frame;
splicing the image features of the sub-images of each frame according to the dynamic display sequence of the sub-images to obtain the image features of the images to be searched;
the feature extraction of the component element data of the sub-images of each frame is carried out to obtain the image features of the sub-images of each frame, and the feature extraction comprises the following steps:
extracting image characteristics of the component element data of the sub-images of each frame to obtain complete image characteristics of the sub-images of each frame;
carrying out de-coloring treatment on the sub-images of each frame to obtain de-coloring images of the sub-images of each frame, and carrying out feature extraction on the de-coloring images of the sub-images of each frame to obtain basic image features of the sub-images of each frame;
calculating the similarity between the basic image characteristics of the sub-images of each frame and the corresponding complete image characteristics to obtain a similarity matrix of the sub-images of each frame;
performing weight activation on the similarity matrix of the sub-images of each frame to obtain weight data of the sub-images of each frame;
and carrying out feature enhancement on the complete image features corresponding to the sub-images based on the weight data of the sub-images of each frame to obtain the image features of the sub-images of each frame.
2. The image retrieval method according to claim 1, wherein the standard feature is a standard fusion feature obtained by fusing a standard dynamic special effect feature and a standard image feature of the historical dynamic image, the determining similarity between the image to be retrieved and each of the historical dynamic images based on the standard feature, the image feature and the dynamic special effect category, and obtaining a plurality of similarity values, includes:
determining the dynamic special effect characteristics of the image to be retrieved according to the dynamic special effect category;
fusing the image features of the image to be searched and the dynamic special effect features to obtain fusion features of the image to be searched;
performing similarity calculation on the fusion characteristics of the images to be searched and the standard fusion characteristics of each historical dynamic image to obtain a similarity value corresponding to each historical dynamic image;
and sorting the plurality of historical dynamic images in a descending order according to the similarity value to obtain retrieval data of the images to be retrieved.
3. The image retrieval method as claimed in claim 1, wherein the constituent element data includes text data, pattern data and symbol data, and the feature extraction is performed on the constituent element data of the sub-images of each frame to obtain complete image features of the sub-images of each frame, including:
Extracting text features of the text data of the sub-images of each frame to obtain the text features of the sub-images of each frame;
extracting features of the pattern data of the sub-images of each frame to obtain pattern features of the sub-images of each frame;
extracting the symbol characteristics of the symbol data of the sub-images of each frame to obtain the symbol characteristics of the sub-images of each frame;
and fusing the text features, the pattern features and the symbol features to obtain complete image features of the sub-images of each frame.
4. A method of image retrieval according to any one of claims 1 to 3, wherein before said inputting said image to be retrieved into a target recognition model for dynamic effect recognition, said method further comprises:
determining whether the image to be searched contains dynamic special effect data of a pre-buried point or not;
and if the image to be searched does not contain the dynamic special effect data of the pre-buried point, inputting the image to be searched into the target recognition model for dynamic special effect recognition to obtain the dynamic special effect category of the image to be searched.
5. The image retrieval method according to claim 4, wherein after said determining whether the image to be retrieved contains dynamic special effect data of a pre-buried point, the method further comprises:
And if the image to be searched contains the dynamic special effect data of the pre-buried point, analyzing the dynamic special effect data to obtain the dynamic special effect category of the image to be searched.
6. An image retrieval apparatus, comprising:
the first determining module is used for acquiring an image to be searched which is input by a user through the terminal equipment and determining whether the image to be searched is a dynamic image or not;
the special effect identification module is used for inputting the image to be searched into a target identification model for dynamic special effect identification if the image to be searched is the dynamic image, so as to obtain the dynamic special effect category of the image to be searched, wherein the target identification model is a neural network model obtained by performing deep learning based on dynamic special effect labels of a plurality of historical dynamic images;
the feature extraction module is used for extracting image features of the image to be searched to obtain image features of the image to be searched;
the second determining module is used for acquiring a plurality of historical dynamic images in a historical dynamic image library and standard characteristics of each historical dynamic image, determining the similarity of the image to be searched and each historical dynamic image based on the standard characteristics, the image characteristics and the dynamic special effect category, and obtaining a plurality of similarity values;
A third determining module, configured to determine, based on a plurality of the similarity values, search data of the image to be searched in a plurality of the historical dynamic images, and send the search data to the terminal device;
the image feature extraction is performed on the image to be searched to obtain the image feature of the image to be searched, which comprises the following steps:
carrying out framing treatment on the image to be searched to obtain multi-frame sub-images of the image to be searched, and carrying out element detection on each frame of sub-images to obtain component element data of each frame of sub-images;
extracting the characteristics of the component element data of the sub-images of each frame to obtain the image characteristics of the sub-images of each frame;
splicing the image features of the sub-images of each frame according to the dynamic display sequence of the sub-images to obtain the image features of the images to be searched;
the feature extraction of the component element data of the sub-images of each frame is carried out to obtain the image features of the sub-images of each frame, and the feature extraction comprises the following steps:
extracting image characteristics of the component element data of the sub-images of each frame to obtain complete image characteristics of the sub-images of each frame;
carrying out de-coloring treatment on the sub-images of each frame to obtain de-coloring images of the sub-images of each frame, and carrying out feature extraction on the de-coloring images of the sub-images of each frame to obtain basic image features of the sub-images of each frame;
Calculating the similarity between the basic image characteristics of the sub-images of each frame and the corresponding complete image characteristics to obtain a similarity matrix of the sub-images of each frame;
performing weight activation on the similarity matrix of the sub-images of each frame to obtain weight data of the sub-images of each frame;
and carrying out feature enhancement on the complete image features corresponding to the sub-images based on the weight data of the sub-images of each frame to obtain the image features of the sub-images of each frame.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image retrieval method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the image retrieval method according to any one of claims 1 to 5.
CN202310529432.0A 2023-05-11 2023-05-11 Image retrieval method, device, computer equipment and medium Active CN116304163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310529432.0A CN116304163B (en) 2023-05-11 2023-05-11 Image retrieval method, device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN116304163A CN116304163A (en) 2023-06-23
CN116304163B (en) 2023-07-25

Family

ID=86781795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310529432.0A Active CN116304163B (en) 2023-05-11 2023-05-11 Image retrieval method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN116304163B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07319886A (en) * 1994-05-30 1995-12-08 Nippon Telegr & Teleph Corp <Ntt> Retrieving device for drawing image interlinked with dynamic image and retrieving method for drawing image interlinked with dynamic image
JP2009060413A (en) * 2007-08-31 2009-03-19 Kddi Corp Method and system for extracting feature of moving image, and method and system for retrieving moving image
CN103247036A (en) * 2012-02-10 2013-08-14 株式会社理光 Multiple-exposure image fusion method and device
CN107454405A (en) * 2016-05-31 2017-12-08 三星显示有限公司 Method for displaying image including method for encoding images and picture decoding method
CN114372169A (en) * 2021-12-07 2022-04-19 厦门市美亚柏科信息股份有限公司 Method, device and storage medium for searching homologous videos
WO2022142855A1 (en) * 2020-12-31 2022-07-07 深圳市优必选科技股份有限公司 Loop closure detection method and apparatus, terminal device, and readable storage medium
CN116049490A (en) * 2023-02-07 2023-05-02 北京字跳网络技术有限公司 Material searching method and device and electronic equipment

Also Published As

Publication number Publication date
CN116304163A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US11657602B2 (en) Font identification from imagery
US9501724B1 (en) Font recognition and font similarity learning using a deep neural network
WO2019075130A1 (en) Image processing method and processing device
EP3843004A1 (en) Portrait segmentation method, model training method and electronic device
CN109726712A (en) Character recognition method, device and storage medium, server
CN111488873B (en) Character level scene text detection method and device based on weak supervision learning
CN113822116A (en) Text recognition method and device, computer equipment and storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN114596566A (en) Text recognition method and related device
CN110597965A (en) Sentiment polarity analysis method and device of article, electronic equipment and storage medium
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN116610304B (en) Page code generation method, device, equipment and storage medium
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
Rakesh et al. Sign language recognition using convolutional neural network
CN116304163B (en) Image retrieval method, device, computer equipment and medium
WO2023284670A1 (en) Construction method and apparatus for graphic code extraction model, identification method and apparatus, and device and medium
CN113837157B (en) Topic type identification method, system and storage medium
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN115034177A (en) Presentation file conversion method, device, equipment and storage medium
CN114969544A (en) Hot data-based recommended content generation method, device, equipment and medium
CN114647361A (en) Touch screen object positioning method and device based on artificial intelligence
CN111062207B (en) Expression image processing method and device, computer storage medium and electronic equipment
CN111914850B (en) Picture feature extraction method, device, server and medium
CN114283422A (en) Handwritten font generation method and device, electronic equipment and storage medium
CN112597328B (en) Labeling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant