WO2023092975A1 - Image processing method and apparatus, electronic device, storage medium, and computer program product


Info

Publication number
WO2023092975A1
WO2023092975A1 (PCT/CN2022/096004, CN2022096004W)
Authority
WO
WIPO (PCT)
Prior art keywords
image, similarity, text, feature, processed
Prior art date
Application number
PCT/CN2022/096004
Other languages
French (fr)
Chinese (zh)
Inventor
郭彤
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023092975A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, an electronic device, a storage medium, and a computer program product.
  • In related technologies, the image comparison method extracts features from images, performs similarity calculation based on the image features, and then obtains a comparison result.
  • However, when text content is included in an image, two or more images may have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but different text content (for example, screenshots of the chat interface of social software, news screenshots, etc.).
  • Embodiments of the present disclosure at least provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a computer program product.
  • An image processing method is provided, including: performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • In this way, image features and text features can be obtained for comprehensive comparison. Because the text contained in the image is taken into account, the probability of false positives is reduced when two images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, and the accuracy of the matching result is improved.
  • In some embodiments, the method further includes: determining, according to the image matching results between the image to be processed and at least two reference images, a target image in the at least two reference images that matches the image to be processed.
  • In some embodiments, respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, includes: respectively determining the image similarity between the first image feature and the second image features of at least two candidate images; determining a reference image from the at least two candidate images according to the image similarity; and determining the text similarity between the first text feature and the second text feature of the reference image.
  • In some embodiments, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, and determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity includes one of the following: determining the product of the image similarity and the text similarity as the comprehensive similarity; or determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In this way, the weight information of the image similarity and the text similarity can be determined by the type of the image to be processed or the area ratio of the text region, which improves the accuracy of the weight information and thereby the accuracy of the matching results.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the first text feature includes at least one of the following: semantic feature, format feature, font feature, size feature, typesetting feature, and language feature.
  • An image processing apparatus is also provided, including: a feature extraction part configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; a similarity determining part configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and a matching part configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • An electronic device is also provided, including: a processor; and a memory configured to store instructions executable by the processor, where the processor is configured to call the instructions stored in the memory to implement the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • A computer program product is also provided, which includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, the above-mentioned method is implemented.
  • FIG. 1 shows a flowchart of an image processing method according to an embodiment of the disclosure
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure
  • Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • In the present disclosure, "at least one" means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set formed by A, B, and C.
  • FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure, the method is executed by an electronic device, and will be described with reference to the steps shown in FIG. 1 .
  • In this way, image features and text features can be obtained for comprehensive comparison. Because the text contained in the image is taken into account, the probability of false positives is reduced when two images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, which improves the accuracy of the matching results.
  • In related image comparison methods, feature extraction processing is usually performed on the two images, for example through a deep learning neural network, to obtain the image features of each image, and the similarity between the image features of the two images is then determined; if the similarity is higher than a threshold, the two images can be considered to match.
  • This method can be used in face recognition, object recognition and other fields.
  • The image features extracted by this method can usually describe pattern-level feature information such as the color, texture, light and shade, layout, style, and feature point positions of an image, but it is difficult for them to represent the text information when the image contains text.
  • In the embodiments of the present disclosure, the image features and text features of the images participating in the comparison can be obtained separately, and whether the images participating in the comparison match can be comprehensively judged based on the image similarity between the image features and the text similarity between the text features.
  • the text features can describe the semantics, format, font, size, typesetting (including the position of the text in the image), language and other feature information of the text.
  • the present disclosure does not limit the feature information described by the text features.
  • the image to be processed may be an image including text information, for example, the image includes one or more characters.
  • feature extraction processing may be performed on the image to be processed to obtain the first image feature and the first text feature of the image to be processed.
  • the above features can be obtained by processing the image to be processed through a deep learning neural network.
  • the first image feature of the image to be processed may be extracted through a convolutional neural network.
  • The text information in the image (for example, the content of the text, the position of the text region, the shape of the characters, etc.) can be obtained through Optical Character Recognition (OCR) technology, and the first text feature can be obtained through a recurrent neural network.
  • the present disclosure does not limit the acquisition manners of the first image feature and the first text feature.
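  • As an illustrative, non-authoritative sketch of one such acquisition manner, the snippet below extracts a first image feature with a convolutional backbone and a first text feature with OCR followed by a recurrent network. The specific choices (a ResNet-18 backbone from a recent torchvision, pytesseract for OCR, a character-level GRU, the feature dimensions) are assumptions for illustration only, not requirements of the disclosure.
```python
# Illustrative sketch only: the disclosure does not fix specific networks,
# so ResNet-18, pytesseract OCR and a character-level GRU are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pytesseract

image_backbone = models.resnet18(weights=None)
image_backbone.fc = nn.Identity()          # expose the 512-d pooled feature
image_backbone.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

char_embedding = nn.Embedding(num_embeddings=65536, embedding_dim=64)
text_encoder = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

def extract_features(path: str):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        # First image feature: pattern-level information (color, texture, layout, ...)
        image_feature = image_backbone(preprocess(img).unsqueeze(0)).squeeze(0)

        # Text information via OCR, then a recurrent network over the characters
        text = pytesseract.image_to_string(img)
        if text.strip():
            ids = torch.tensor([[min(ord(c), 65535) for c in text]])
            _, hidden = text_encoder(char_embedding(ids))
            text_feature = hidden.squeeze(0).squeeze(0)   # first text feature
        else:
            text_feature = torch.zeros(128)
    return image_feature, text_feature
```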
  • the first text features may include at least one of feature information such as semantic features, format features, font features, size features, typesetting features, and language features of the text.
  • feature information such as semantic features, format features, font features, size features, typesetting features, and language features of the text.
  • Each feature can be obtained separately through a neural network, and weighted average processing is performed to obtain the first text feature.
  • the weight may be determined based on information such as the type of the image and the number of characters. For example, if the image to be processed is a calligraphy image with only a few characters, the weight of semantic features can be made lower, and the weights of format features, font features, and size features can be higher. For another example, if the image to be processed is a screenshot of a chat interface or a news screenshot that includes a lot of text, the weight of the semantic feature can be made higher, and the weight of other features can be made lower.
  • the above features may also be fused in other ways to obtain the first text feature.
  • the above-mentioned features can be expressed in the form of a feature matrix or a feature vector, and the above-mentioned features can be multiplied to obtain the first text feature.
  • the present disclosure does not limit the manner of determining the first text feature.
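  • As a hedged sketch of the weighted-average fusion described above; the aspect names, dimensions, and weight values here are hypothetical, and in practice the weights would come from information such as the image type or the number of characters.
```python
import numpy as np

# Hypothetical per-aspect text features (semantic, format, font, size,
# typesetting, language), each already encoded as a vector of the same dimension.
sub_features = {
    "semantic":    np.random.rand(128),
    "format":      np.random.rand(128),
    "font":        np.random.rand(128),
    "size":        np.random.rand(128),
    "typesetting": np.random.rand(128),
    "language":    np.random.rand(128),
}

# Example weights for a text-heavy image such as a chat screenshot:
# semantics dominates, style-related aspects contribute less.
weights = {"semantic": 0.5, "format": 0.1, "font": 0.1,
           "size": 0.1, "typesetting": 0.1, "language": 0.1}

first_text_feature = sum(weights[k] * v for k, v in sub_features.items())
```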
  • In some embodiments, the reference image is an image to be compared with the image to be processed. For example, the reference image and the image to be processed are images of the same type, e.g., both are calligraphy images or advertising images containing little text, or both are screenshots of chat interfaces or news pages containing a lot of text.
  • the reference image can also be any image in the image library, not necessarily the same type as the image to be processed. The present disclosure does not limit the types of the reference image and the image to be processed.
  • the acquisition manner of the second image feature and the second text feature of the reference image may be the same as the acquisition manner of the first image feature and the first text feature of the image to be processed respectively.
  • the second image feature and the second text feature of the reference image can be obtained in advance and stored in the feature library corresponding to the above image library.
  • the image similarity between the first image feature and the second image feature, and the text similarity between the first text feature and the second text feature may be determined respectively.
  • the above features may be feature information in the form of a feature matrix or feature vector, and the above similarity may be determined by determining parameters such as cosine similarity, Jaccard similarity coefficient, Pearson correlation coefficient, and relative entropy.
  • the present disclosure does not limit the calculation methods of image similarity and text similarity.
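  • A minimal sketch of one of the listed measures (cosine similarity) applied to feature vectors; the 128-dimensional random vectors only stand in for the first and second image and text features and are assumptions for illustration.
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two feature vectors (feature matrices can be flattened first).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Stand-in feature vectors; in practice these come from the feature extraction step.
first_image_feature, second_image_feature = np.random.rand(128), np.random.rand(128)
first_text_feature, second_text_feature = np.random.rand(128), np.random.rand(128)

image_similarity = cosine_similarity(first_image_feature, second_image_feature)
text_similarity = cosine_similarity(first_text_feature, second_text_feature)
```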
  • The image similarity and the text similarity can be comprehensively calculated to determine the image matching result between the image to be processed and the reference image, and the matching result may include the obtained comprehensive similarity.
  • the reference image can be matched one-to-one with the image to be processed, and when the comprehensive similarity is greater than or equal to the similarity threshold, it is determined that the two match.
  • multiple reference images in the image library may be matched with the image to be processed, and the target image with the highest matching degree with the image to be processed may be determined in the image library.
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 2, based on FIG. 1, the method further includes:
  • the image library may include multiple candidate images, and the image similarity between the first image feature of the image to be processed and the second image feature of each candidate image may be determined respectively , and determine the text similarity between the first text feature of the image to be processed and the second text feature of each candidate image, and then determine the comprehensive similarity between the image to be processed and each candidate image.
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S102 can be implemented through S301 to S303 , which will be described in conjunction with the steps shown in FIG. 3 .
  • In these embodiments, the image similarity is used as a screening condition for the reference image: the image similarity between the image to be processed and each candidate image is determined, and then, according to the image similarity, reference images with higher image similarity can be screened out from the multiple candidate images. For example, images with an image similarity higher than a threshold can be screened out as reference images, or the n images (n is a positive integer) with the highest image similarity can be used as reference images.
  • The screened reference images have high similarity with the image to be processed at the pattern level, so the image with the highest matching degree is likely to come from these reference images. Therefore, when determining the text similarity, it is only necessary to determine the text similarity between the screened reference images and the image to be processed, and then the comprehensive similarity between these reference images and the image to be processed, which saves computation and improves processing efficiency.
  • The text similarity can also be used as a screening condition (for example, when the image to be processed mainly includes text content and contains little pattern information): first, the reference images with the highest text similarity with the image to be processed are found among the multiple candidate images, then the image similarity between these reference images and the image to be processed is determined, and then the comprehensive similarity between them is determined.
  • the present disclosure does not limit the screening conditions.
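  • One possible sketch of the two-stage screening described above (image similarity first, text similarity only for the screened candidates); the helper names and the choice of cosine similarity are assumptions, not requirements of the disclosure.
```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def two_stage_compare(query_img_feat, query_txt_feat, candidates, n=5):
    """candidates: list of (name, image_feature, text_feature) tuples from the library."""
    # Stage 1: screen n reference images by image similarity only.
    scored = [(cosine_similarity(query_img_feat, img), name, txt)
              for name, img, txt in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    screened = scored[:n]

    # Stage 2: compute the text similarity only for the screened reference images,
    # which saves computation compared with scoring the whole library.
    return [(name, img_sim, cosine_similarity(query_txt_feat, txt))
            for img_sim, name, txt in screened]
```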
  • Fig. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S103 in FIG. 1 can be implemented by at least one of S1031 and S1032, which will be described in conjunction with the steps shown in FIG. 4.
  • Both the image similarity and the text similarity can be numeric values in the form of percentages; for example, if the image similarity is 98% and the text similarity is 95%, the two can be multiplied and the product determined as the comprehensive similarity.
  • the weighted average of the two can also be calculated.
  • When calculating the weighted average, the two similarities can be considered to have the same importance; therefore, the weights of both can be set to 1 and the average of the two calculated directly.
  • the weights of the two may be determined first, and then weighted average processing is performed. There are many ways to determine the weight. For example, it can be based on the number of characters included in the text. When the number is large, the weight of the text similarity can be made higher, otherwise, the weight of the text similarity can be lowered. The weight value can also be determined through the characteristics of the image to be processed.
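  • The two combination rules above can be sketched as follows; the default text weight of 0.5 corresponds to treating both similarities as equally important, and all names are illustrative.
```python
def comprehensive_similarity(image_sim: float, text_sim: float,
                             mode: str = "weighted_average",
                             text_weight: float = 0.5) -> float:
    """Combine the image similarity and the text similarity into a comprehensive similarity."""
    if mode == "product":
        return image_sim * text_sim
    # Weighted average; text_weight = 0.5 gives a plain average of the two.
    return (1.0 - text_weight) * image_sim + text_weight * text_sim
```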
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may be implemented in the following manner: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • The type of the image to be processed may be determined according to the source of the image. For example, if the source of the image to be processed is a screenshot of a communication tool interaction page or a news website page, its type may be "screenshot of a communication tool interaction page" or "screenshot of a news website page"; if the source of the image to be processed is a street camera, its type may be "street view image"; or, if the source of the image to be processed is an access control camera, its type may be "face image", and so on.
  • the type of the image to be processed can also be determined according to the classification mark of the image.
  • The classification mark of the image can be added manually, or added automatically when the image to be processed is generated, such as the above "screenshot of a communication tool interaction page".
  • the type of the image to be processed can also be determined based on the first image feature.
  • The above-mentioned first image feature is feature information in the form of a matrix or vector, and the first image feature can be processed by deconvolution, activation, and other operations to obtain the type of the image to be processed; for example, it may be determined that the image to be processed is a face image, a street view image, a news screenshot, a calligraphy image, and the like.
  • the type of the image to be processed may also be defined and set by the user, and the present disclosure does not limit the manner of determining the type of the image to be processed.
  • The weight information may be determined based on the type of the image to be processed. For example, when the type of the image to be processed indicates that the image is mainly characterized by image features (such as face images or landscape images), the weight of the image similarity can be made higher and the weight of the text similarity lower. For another example, when the type indicates that the image is mainly characterized by text features (such as news screenshots or web page screenshots), the weight of the image similarity can be made lower and the weight of the text similarity higher.
  • corresponding weights may be set for each type of image in advance, and weight information may also be calculated based on the type.
  • The type of the image to be processed can be expressed in the form of probabilities.
  • For example, the probability that the type of the image to be processed is a news screenshot is 95%, and the probability that it is another type is 5%.
  • The weight information can be calculated based on these data.
  • various types of probabilities may be used as elements of a vector, and the vector may be activated to obtain the above weight information. The present disclosure does not limit the method of calculating the weight.
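  • A hedged sketch of the last idea, in which a hypothetical type-probability vector is activated into weight information; using softmax as the activation here, and mapping the "news screenshot" probability to the text-similarity weight, are illustrative assumptions rather than choices fixed by the disclosure.
```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical classifier output: 95% news screenshot, 5% other types.
type_probabilities = np.array([0.95, 0.05])

# Activate the probability vector into weights; a text-heavy type such as a
# news screenshot pushes more weight towards the text similarity.
text_weight, image_weight = softmax(type_probabilities)
```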
  • S1032. Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the area proportion of the region where the text is located may be determined based on the first image feature.
  • the first image feature can be used to represent the layout of the image, and the area ratio of the area where the text is located can be calculated based on the first image feature.
  • the weight information may be determined based on the area ratio.
  • the area proportion of the region where the text is located can be determined as the weight of the text similarity, and then the weight of the image similarity can be calculated.
  • the area ratio of the area where the text is located can be activated to obtain the above weight information.
  • the present disclosure does not limit the specific method for calculating the weight.
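  • A minimal sketch of the area-ratio weighting described above; the example values are illustrative only.
```python
def weights_from_text_area_ratio(area_ratio: float):
    """area_ratio: fraction of the image occupied by the text region, in [0, 1]."""
    text_weight = area_ratio          # weight of the text similarity
    image_weight = 1.0 - area_ratio   # weight of the image similarity
    return image_weight, text_weight

# e.g. text occupies 60% of a screenshot -> image weight 0.4, text weight 0.6
image_weight, text_weight = weights_from_text_area_ratio(0.6)
```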
  • The type of the image to be processed and the area ratio of the region where the text is located can also be obtained in other ways; for example, the type of the image to be processed can be determined through manual labeling, and the area proportion of the region where the text is located can likewise be determined manually.
  • The area ratio of the region where the text is located can also be determined based on attributes such as color and shape. For example, in images such as screenshots of news websites, the font is usually black, and the area of the region where the text is located can be determined from the proportion of black pixels.
  • the area where the font is located in the above screenshot is a neat row or column, and the area ratio of the area where the text is located can be determined based on the area of the rectangle presented by the row or column.
  • The present disclosure does not limit the manner of determining the type of the image to be processed or the area ratio of the region where the text is located.
  • the weight may be positively correlated with the area ratio, for example, the larger the area ratio of the text area, the lower the weight of the image similarity, and the higher the weight of the text similarity.
  • the method of determining the area ratio of the area where the text is located can also be used in scenarios where the text itself is also an image.
  • the text is a word art, and the word art itself is both text and an image.
  • the area ratio of the region where it is located is 100%, and the area ratio of the region where the image is located is also 100%, so the weights of the two can be made equal.
  • The number of words in the image can also be counted, and the interval to which the number of words belongs can be associated with a weight value: the more words, the higher the weight of the text similarity. For example, if the number of words is greater than or equal to 100, the weight of the text similarity is 0.8; if the number of words is greater than or equal to 50 and less than 100, the weight is 0.5; if the number of words is less than 50, the weight is 0.3. The present disclosure does not limit the correspondence between the number of words and the weight.
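  • The interval-to-weight correspondence in the example above can be sketched as follows; the cut-offs 100 and 50 and the weights 0.8, 0.5, and 0.3 are taken from the example and are not fixed by the disclosure.
```python
def text_weight_from_word_count(word_count: int) -> float:
    # More words -> higher weight for the text similarity.
    if word_count >= 100:
        return 0.8
    if word_count >= 50:
        return 0.5
    return 0.3
```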
  • In this way, the weight information of the image similarity and the text similarity can be determined by the type of the image to be processed or the area ratio of the text region, which improves the accuracy of the weight information and thereby the accuracy of the matching results.
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the characteristics of the image to be processed can be determined by the first image feature representing the feature information of the pattern level of the image to be processed.
  • The type of the image to be processed can be determined according to the first image feature in the above manner, or features such as the area ratio of the region where the text is located in the image to be processed can be determined, and the weight can then be determined based on these features.
  • the weight information may also be obtained directly through a trained network model according to the first image feature, which is not limited in the present disclosure.
  • weighted average processing may be performed on the image similarity and the text similarity based on the weight information to obtain a comprehensive similarity between the image to be processed and the reference image. Furthermore, a matching result can be obtained based on the comprehensive similarity. For example, when a one-to-one comparison is performed between the image to be processed and the reference image, it may be determined that the image to be processed matches the reference image when the comprehensive similarity is higher than a threshold. For another example, when the target image matching the image to be processed is determined in the image library, the reference image with the highest comprehensive similarity with the image to be processed may be determined as the target image matching the image to be processed.
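  • A hedged sketch of the two matching decisions described above; the threshold value is an assumption, as the disclosure does not fix one.
```python
SIMILARITY_THRESHOLD = 0.9  # assumed value for the one-to-one case

def one_to_one_match(comprehensive_sim: float) -> bool:
    # One-to-one comparison: matched when the comprehensive similarity reaches a threshold.
    return comprehensive_sim >= SIMILARITY_THRESHOLD

def best_match(comprehensive_sims: dict) -> str:
    # Library search: the reference image with the highest comprehensive similarity
    # is taken as the target image matching the image to be processed.
    return max(comprehensive_sims, key=comprehensive_sims.get)
```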
  • In this way, image features and text features can be obtained for comprehensive comparison. The text contained in the image is taken into account, and the type of the image to be processed or the area ratio of the region where the text is located is used to determine the weight information of the image similarity and the text similarity, from which the matching result is obtained. This reduces the probability of false positives when images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, and improves the accuracy of the matching result.
  • Fig. 5 shows a schematic diagram of the application of the image processing method according to an embodiment of the present disclosure.
  • the image to be processed is an image including text content
  • For the image to be processed, the image similarity between the first image feature of the image to be processed and the second image feature of the reference image, and the text similarity between the first text feature of the image to be processed and the second text feature of the reference image, can be comprehensively considered.
  • the second image feature and the second text feature of each reference image in the image library may be acquired in advance, and stored in the feature library for comparison with the features of the image to be processed.
  • the first image feature and the first text feature of the image to be processed may be extracted.
  • The first image feature of the image to be processed can be extracted through a convolutional neural network, the text information in the image can be obtained through OCR technology, and the first text feature of the image to be processed can be obtained through a recurrent neural network.
  • The first image feature can be used for screening: from the images in the image library, the n images with the highest image similarity can be selected as reference images, and the text similarity between the first text feature and the second text features of these reference images can then be determined. Furthermore, the comprehensive similarity between the image to be processed and these reference images can be determined through weighted average processing.
  • the weights of the text similarity and the image similarity may be determined, for example, the weight of the text similarity may be determined according to the area proportion of the region where the text is located in the image to be processed.
  • The area proportion of the region where the text is located can be determined based on the first image feature of the image to be processed; the area proportion is then determined as the weight x of the text similarity, and 1-x is determined as the weight of the image similarity.
  • weighted average processing can be performed on the image similarity and text similarity based on the above weights, so as to respectively determine the comprehensive similarity between the image to be processed and each reference image, And the reference image with the highest comprehensive similarity is determined as the target image matching the image to be processed.
  • The image processing method can be used in fields such as network supervision. For example, for a website with strong anti-crawling measures, it is difficult to directly supervise the text content it publishes. Screenshots of the content published on the website can be taken and compared with preset images containing specific words or sentences to determine whether those words or sentences appear in the screenshots, and thus whether the content of the website includes them. In this way, publishers of website content can be effectively supervised.
  • the present disclosure does not limit the application scenarios of the image processing method.
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
  • Fig. 7 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • First, S701 is performed to obtain a retrieval picture; then, S7021 (image feature comparison) and S7023 (text feature comparison) are performed on the retrieval picture respectively; then, through the similarity calculation of S7022 and the similarity calculation of S7024, the similarity results between the retrieval picture and A_i, B_i, C_i, and D_i in the image feature library are obtained, as well as the similarity results between the retrieval picture and A_w, B_w, C_w, and D_w in the text feature library.
  • Table 1 summarizes the similarity results:

| Library image | Image feature similarity | Text feature similarity |
| --- | --- | --- |
| A (A_i / A_w) | 94% | 98% |
| B (B_i / B_w) | 96% | 90% |
| C (C_i / C_w) | 91% | 85% |
| D (D_i / D_w) | 80% | 60% |
  • The comparison logic of S703 in the related art is: compare the retrieval picture with the image feature library, and determine the image corresponding to the maximum similarity result as the image most similar to the retrieval picture. As shown in Table 1, the maximum similarity result obtained by comparing the retrieval picture with the image features in the image feature library is 96%, and the image corresponding to this maximum similarity result is image B; therefore, image B is determined to be the image most similar to the retrieval picture.
  • The comparison logic of S704 in the present disclosure is: compare the retrieval picture with the image features in the image feature library and the text features in the text feature library respectively, add together the two similarity results of the image corresponding to each pair of image and text features (for example, the results for A_i and A_w) to obtain a comprehensive similarity, and determine the image corresponding to the maximum comprehensive similarity as the image most similar to the retrieval picture. It can be seen from Table 1 that the comprehensive similarity result of the retrieval picture and image A is 192%, that with image B is 186%, that with image C is 176%, and that with image D is 140%. Therefore, the comprehensive similarity between the retrieval picture and image A is the largest, and image A is determined to be the image most similar to the retrieval picture.
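  • The two comparison logics of S703 and S704 can be reproduced from the Table 1 values with a few lines; the percentages are kept as integers, and the names A to D denote the library images.
```python
# Similarity results from Table 1 (in percent).
image_sim = {"A": 94, "B": 96, "C": 91, "D": 80}
text_sim = {"A": 98, "B": 90, "C": 85, "D": 60}

# S703: image features only -> B (96%) is judged most similar.
best_by_image = max(image_sim, key=image_sim.get)

# S704: add the two similarity results -> A: 192, B: 186, C: 176, D: 140,
# so A is judged most similar to the retrieval picture.
comprehensive = {k: image_sim[k] + text_sim[k] for k in image_sim}
best_by_comprehensive = max(comprehensive, key=comprehensive.get)
```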
  • a related image comparison method is to extract depth features (corresponding to the image features in the above-mentioned embodiments) of the image, and perform similarity calculation based on the image depth features, and then obtain a comparison result.
  • In such cases, the image features are often relatively similar. If the comparison is performed only through traditional image comparison methods, there may be more false positive data, that is, images that are relatively similar but whose text content is completely different.
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in FIG. 8, the device includes: a feature extraction part 801 configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; a similarity determining part 802 configured to respectively determine the image similarity between the first image feature and the second image feature of a reference image, and the text similarity between the first text feature and the second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and a matching part 803 configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • In some embodiments, the apparatus further includes: a target image determining part configured to determine, according to the image matching results between the image to be processed and at least two reference images, a target image in the at least two reference images that matches the image to be processed.
  • In some embodiments, the similarity determining part 802 is further configured to: respectively determine the image similarity between the first image feature and the second image features of at least two candidate images; determine a reference image among the at least two candidate images according to the image similarity; and determine the text similarity between the first text feature and the second text feature of the reference image.
  • the image matching result includes a comprehensive similarity between the reference image and the image to be processed
  • The matching part 803 is further configured to: determine the product of the image similarity and the text similarity as the comprehensive similarity; or determine the weighted average of the image similarity and the text similarity as the comprehensive similarity.
  • In some embodiments, the matching part 803 is further configured to: determine the type of the image to be processed; determine weight information according to the type of the image to be processed; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In some embodiments, the matching part 803 is further configured to: determine the area ratio of the region where the text is located in the image to be processed; determine the weight information according to the area ratio; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the matching part 803 is further configured to: determine weight information according to the first image feature; compare the image similarity and the text similarity according to the weight information Perform weighted average processing to obtain the comprehensive similarity.
  • the first text features include at least one of semantic features, format features, font features, size features, typesetting features, and language features.
  • the functions or parts included in the apparatus provided by the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the implementation manner may refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • the computer readable storage medium may be a non-transitory computer readable storage medium.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure also provide another computer program product configured to store computer-readable instructions, and when the instructions are executed, the computer executes the operations of the image processing method provided by any of the above embodiments.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • FIG. 9 shows a block diagram of an electronic device 900 according to an embodiment of the present disclosure.
  • the electronic device 900 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • The electronic device 900 may include one or more of the following components: a processing component 901, a memory 902, a power supply component 903, a multimedia component 904, an audio component 905, an input/output (I/O) interface 906, a sensor component 907, and a communication component 908.
  • The processing component 901 generally controls the overall operations of the electronic device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • The processing component 901 may include one or more processors 909 to execute instructions to complete all or part of the steps of the above method. Additionally, the processing component 901 may include one or more modules to facilitate interaction between the processing component 901 and other components. For example, the processing component 901 may include a multimedia module to facilitate interaction between the multimedia component 904 and the processing component 901.
  • the memory 902 is configured to store various types of data to support operations at the electronic device 900 . Examples of such data include instructions for any application or method operating on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and the like.
  • The memory 902 can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the power supply component 903 provides power to various components of the electronic device 900 .
  • Power supply components 903 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 900 .
  • the multimedia component 904 includes a screen providing an output interface between the electronic device 900 and the user.
  • the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Panel, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense an edge of a touch or slide action, but also detect a duration and pressure associated with the touch or slide operation.
  • the multimedia component 904 includes at least one of a front camera and a rear camera.
  • At least one of the front camera and the rear camera can receive external multimedia data.
  • Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 905 is configured to at least one of output and input an audio signal.
  • the audio component 905 includes a microphone (Microphone, MIC), which is configured to receive external audio signals when the electronic device 900 is in an operation mode, such as a calling mode, a recording mode and a voice recognition mode. Received audio signals may be further stored in memory 902 or sent via communication component 908 .
  • the audio component 905 also includes a speaker configured to output audio signals.
  • the I/O interface 906 provides an interface between the processing component 901 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 907 includes one or more sensors configured to provide various aspects of status assessment for electronic device 900 .
  • The sensor component 907 can detect the open/closed state of the electronic device 900 and the relative positioning of components (for example, the display and the keypad of the electronic device 900). The sensor component 907 can also detect changes in the position of the electronic device 900 or one of its components, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and temperature changes of the electronic device 900.
  • the sensor assembly 907 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor assembly 907 may also include an optical sensor, such as a CMOS image sensor (Complementary Metal-Oxide-Semiconductor, CMOS) or a CCD image sensor (Charge Coupled Device, CCD), configured to be used in imaging applications.
  • the sensor component 907 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 908 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices.
  • the electronic device 900 can access a wireless network based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 908 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 908 further includes a near field communication (Near Field Communication, NFC) module to facilitate short-range communication.
  • The NFC module can be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • The electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, configured to execute the above method.
  • a non-volatile computer-readable storage medium such as a memory 902 including computer program instructions, which can be executed by the processor 909 of the electronic device 900 to implement the above method.
  • FIG. 10 shows a block diagram of an electronic device 1000 according to an embodiment of the present disclosure.
  • the electronic device 1000 may be provided as a server.
  • electronic device 1000 includes processing component 1001 , which also includes one or more processors, and a memory resource represented by memory 1002 configured to store instructions executable by processing component 1001 , such as application programs.
  • the application program stored in memory 1002 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1001 is configured to execute instructions to perform the above method.
  • the electronic device 1000 may also include a power supply component 1003 configured to perform power management of the electronic device 1000 , a wired or wireless network interface 1004 configured to connect the electronic device 1000 to a network, and an I/O interface 1005 .
  • the electronic device 1000 can operate based on an operating system stored in the memory 1002, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
  • a non-volatile computer-readable storage medium such as the memory 1002 including computer program instructions, which can be executed by the processing component 1001 of the electronic device 1000 to implement the above method.
  • the present disclosure may be at least one of a system, method and computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, Random-Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM) or flash memory, Static Random-Access Memory (SRAM), portable Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as at least one of the Internet, a local area network, a wide area network, and a wireless network .
  • the network may include at least one of copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Electronic circuits, such as programmable logic circuits, Field Programmable Gate Arrays (FPGA), or Programmable Logic Arrays (PLA), can execute computer-readable program instructions, thereby implementing various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that these instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of at least one of the flowcharts and the block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause at least one of computers, programmable data processing devices, and other devices to work in a specific way, so that the computer-readable medium storing the instructions then includes an article of manufacture, including instructions for implementing various aspects of the functions/actions specified in one or more blocks of at least one of the flowcharts and the block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of at least one of the block diagrams and flowcharts, and combinations of blocks of at least one of the block diagrams and flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • the computer program product may be embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • the image processing method of the embodiment of the present disclosure can obtain image features and text features for comprehensive comparison.
  • the text contained in the image is considered during the comparison, and the weight information of the image similarity and the text similarity is determined according to the type of the image to be processed or the area ratio of the region where the text is located, so as to obtain the matching result.
  • the determined text similarity can be used to reduce the probability of false positives in image matching, which improves the accuracy of the matching results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: performing feature extraction on an image to be processed, to obtain a first image feature and a first text feature; respectively determining the image similarity between the first image feature and a second image feature of a reference image and the text similarity between the first text feature and a second text feature of the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.

Description

Image processing method and device, electronic device, storage medium and computer program product
Cross References to Related Applications
The present disclosure claims priority to Chinese patent application No. 202111435625.7, filed on November 29, 2021 by the applicant Shenzhen Shangtang Technology Co., Ltd. and entitled "Image processing method and device, electronic equipment and storage medium", which is incorporated by reference in its entirety into the present disclosure.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an image processing method and device, electronic equipment, storage media and computer program products.
Background
In related technologies, the image comparison method extracts features from images, performs similarity calculation based on the image features, and then obtains a comparison result. However, when text content is included in an image, there may be two or more images with high similarity in color, texture, light and shade, layout, style, feature point positions and the like, but different text content (for example, screenshots of the chat interface of social software, news screenshots, etc.).
Summary of the Invention
Embodiments of the present disclosure provide at least an image processing method, apparatus, device, storage medium, and computer program product.
The technical solutions of the embodiments of the present disclosure are implemented as follows:
According to an aspect of the embodiments of the present disclosure, an image processing method is provided, including: performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, which reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
In a possible implementation, the method further includes: determining, according to image matching results between the image to be processed and at least two reference images, a target image matching the image to be processed among the at least two reference images.
In a possible implementation, respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, includes: respectively determining image similarities between the first image feature and second image features of at least two candidate images; determining the reference image from the at least two candidate images according to the image similarities; and determining the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, where determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity includes one of the following: determining the product of the image similarity and the text similarity as the comprehensive similarity; or determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In this way, the weight information of the image similarity and the text similarity can be determined according to the type of the image to be processed or the area ratio of the region where the text is located, which can improve the accuracy of the weight information and thereby improve the accuracy of the matching result.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the first text feature includes at least one of the following: a semantic feature, a format feature, a font feature, a size feature, a typesetting feature, and a language feature.
According to an aspect of the embodiments of the present disclosure, an image processing device is provided, including: a feature extraction part configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; a similarity determining part configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and a matching part configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to an aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory configured to store processor-executable instructions, where the processor is configured to call the instructions stored in the memory to implement the above method.
According to an aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
According to an aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, the above method is implemented.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification. These drawings show embodiments consistent with the present disclosure, and are used together with the specification to explain the technical solutions of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description of the Embodiments
Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
The term "at least one" in the present disclosure means any one of multiple items or any combination of at least two of the multiple items. For example, including at least one of A, B, and C may mean including any one or more elements selected from the set formed by A, B, and C.
In addition, in order to better illustrate the present disclosure, numerous details are given in the following specific embodiments. Those skilled in the art should understand that the present disclosure can also be implemented without certain details. In some embodiments, methods, means, components, and circuits well known to those skilled in the art are not described in detail in order to highlight the gist of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The method is executed by an electronic device and will be described with reference to the steps shown in FIG. 1.
S101. Perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed;
S102. Respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image;
S103. Determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, which reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
In related technologies, in the process of determining whether two images match, feature extraction processing is usually performed on the two images, for example, through a deep learning neural network, to obtain the image features of the two images respectively, and the similarity between the image features of the two images is determined; if the similarity is higher than a threshold, the two images can be considered to match. This method can be used in fields such as face recognition and object recognition. The image features extracted in this way can usually describe pattern-level feature information such as the color, texture, light and shade, layout, style, and feature point positions of an image, but when an image includes text information, that text information is difficult to recognize. For example, there may be cases where two images have similar patterns but different text information (for example, screenshots of the chat interface of social software); in such cases, although the image features obtained from the two images have a high similarity, the meanings expressed by the two images are very different, so it is difficult to determine whether such images match based on image features alone.
In a possible implementation, to address the above problem, the image features and text features of the images participating in the comparison can be obtained separately, and whether the images participating in the comparison match can be judged comprehensively based on the image similarity between the image features and the text similarity between the text features. The text features can describe feature information such as the semantics, format, font, size, typesetting (including the position of the text in the image), and language of the text. The present disclosure does not limit the feature information described by the text features.
In a possible implementation, in S101, the image to be processed may be an image that includes text information, for example, an image that includes one or more characters. When determining whether the image to be processed matches the reference image, the judgment can be made comprehensively based on the image similarity and the text similarity between the two.
In a possible implementation, feature extraction processing may be performed on the image to be processed to obtain the first image feature and the first text feature of the image to be processed. The image to be processed can be processed through a deep learning neural network to obtain the above features. For example, the first image feature of the image to be processed may be extracted through a convolutional neural network. The text information in the image (for example, the content of the text, the position of the region where the text is located, the shape of the characters, etc.) can be obtained through Optical Character Recognition (OCR) technology, and the first text feature can be obtained through a recurrent neural network. The present disclosure does not limit the manner in which the first image feature and the first text feature are acquired.
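As an illustration of this extraction step, the following is a minimal sketch. The specific models and the OCR call (a torchvision ResNet backbone, pytesseract for OCR, and a GRU text encoder) are assumptions chosen for illustration; the embodiments do not prescribe particular networks.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pytesseract

# Assumed backbone: a ResNet-50 truncated before the classifier, used as a
# generic convolutional extractor for the first image feature.
_backbone = torch.nn.Sequential(*list(models.resnet50(weights="DEFAULT").children())[:-1])
_backbone.eval()
_preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def extract_image_feature(img: Image.Image) -> torch.Tensor:
    """First image feature: a pattern-level embedding produced by a CNN."""
    with torch.no_grad():
        x = _preprocess(img.convert("RGB")).unsqueeze(0)
        return _backbone(x).flatten(1).squeeze(0)   # e.g. a 2048-dimensional vector

def extract_text(img: Image.Image) -> str:
    """Text content recovered by OCR; positions and character shapes are also
    available from the OCR engine but are omitted in this sketch."""
    return pytesseract.image_to_string(img)

class TextEncoder(torch.nn.Module):
    """Assumed recurrent encoder that turns tokenized OCR output into the first text feature."""
    def __init__(self, vocab_size: int = 10000, dim: int = 256):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.rnn = torch.nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (1, sequence_length) integer tensor from an assumed tokenizer.
        _, h = self.rnn(self.embed(token_ids))
        return h.squeeze(0).squeeze(0)              # a 256-dimensional text embedding
```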
In a possible implementation, the first text feature may include at least one of feature information such as the semantic feature, format feature, font feature, size feature, typesetting feature, and language feature of the text. Each feature can be obtained separately, for example through a neural network, and weighted average processing is performed to obtain the first text feature. When determining the weights, the weights may be determined based on information such as the type of the image and the number of characters. For example, if the image to be processed is a calligraphy image with only a few characters, the weight of the semantic feature can be made lower, and the weights of the format feature, font feature, size feature, etc. can be made higher. For another example, if the image to be processed is an image such as a chat interface screenshot or a news screenshot that includes a lot of text, the weight of the semantic feature can be made higher and the weights of the other features lower.
In a possible implementation, the above features may also be fused in other ways to obtain the first text feature. For example, the above features can be expressed in the form of feature matrices or feature vectors, and the above features can be multiplied or otherwise combined to obtain the first text feature. The present disclosure does not limit the manner in which the first text feature is determined.
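The weighted-average fusion of the text sub-features described above can be sketched as follows; the sub-feature names and weights are illustrative assumptions rather than prescribed values.

```python
import numpy as np

def fuse_text_features(sub_features: dict, weights: dict) -> np.ndarray:
    """Weighted average of text sub-features (semantic, format, font, size,
    typesetting, language), each given as a vector of the same dimension."""
    total = sum(weights[k] for k in sub_features)
    return sum(weights[k] * sub_features[k] for k in sub_features) / total

# For example, for a calligraphy image with only a few characters, the semantic
# feature could be down-weighted relative to font and size:
# fused = fuse_text_features(
#     {"semantic": f_semantic, "font": f_font, "size": f_size},
#     {"semantic": 0.2, "font": 0.4, "size": 0.4})
```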
In a possible implementation, in S102, the reference image is an image to be compared with the image to be processed. For example, the reference image and the image to be processed may be images of the same type, for example, both calligraphy images or advertising images containing little text, or both chat interface screenshots or news screenshots containing a lot of text. The reference image may also be any image in an image library, and does not have to be of the same type as the image to be processed. The present disclosure does not limit the types of the reference image and the image to be processed.
In a possible implementation, the second image feature and the second text feature of the reference image may be acquired in the same manner as the first image feature and the first text feature of the image to be processed, respectively. In an example, the second image feature and the second text feature of the reference image can be acquired in advance and stored in a feature library corresponding to the above image library, so that when matching is performed, there is no need to perform feature extraction on the reference image again, which improves the matching efficiency.
In a possible implementation, the image similarity between the first image feature and the second image feature, and the text similarity between the first text feature and the second text feature, may be determined respectively. In an example, the above features may be feature information in the form of feature matrices or feature vectors, and the above similarities may be determined by computing parameters such as cosine similarity, the Jaccard similarity coefficient, the Pearson correlation coefficient, or relative entropy. The present disclosure does not limit the calculation methods of the image similarity and the text similarity.
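For feature vectors, the cosine similarity mentioned above could be computed as in the sketch below; any of the other listed measures could be substituted. The commented-out feature variables are assumed to come from the extraction step described earlier.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# image_similarity = cosine_similarity(first_image_feature, second_image_feature)
# text_similarity = cosine_similarity(first_text_feature, second_text_feature)
```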
In a possible implementation, in S103, the image similarity and the text similarity can be comprehensively calculated to determine the image matching result between the image to be processed and the reference image, and the matching result may include a comprehensive similarity obtained through the above comprehensive calculation. The reference image can be matched one-to-one with the image to be processed, and when the comprehensive similarity is greater than or equal to a similarity threshold, the two are determined to match. Alternatively, multiple reference images in the image library may each be matched with the image to be processed, and the target image with the highest matching degree with the image to be processed is determined in the image library.
FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 2, based on FIG. 1, the method further includes:
S201. Determine, according to image matching results between the image to be processed and at least two reference images, a target image matching the image to be processed among the at least two reference images.
In a possible implementation, the image library may include multiple candidate images. The image similarity between the first image feature of the image to be processed and the second image feature of each candidate image may be determined respectively, the text similarity between the first text feature of the image to be processed and the second text feature of each candidate image may be determined, and then the comprehensive similarity between the image to be processed and each candidate image may be determined.
In a possible implementation, if the number of candidate images in the image library is large, calculating the above two similarities for all images one by one requires a large amount of calculation and is inefficient. To solve this problem, some images can first be determined as reference images from the large number of candidate images, and then the comprehensive similarity between the image to be processed and the reference images can be calculated.
FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 3, based on FIG. 2, S102 can be implemented through S301 to S303, which will be described with reference to the steps shown in FIG. 3.
S301. Respectively determine image similarities between the first image feature and second image features of at least two candidate images;
S302. Determine a reference image from the at least two candidate images according to the image similarities;
S303. Determine the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation, the image similarity is first used as a screening condition for the reference images, and the image similarity between the image to be processed and each candidate image is determined. Then, according to the image similarities, reference images with higher image similarity can be selected from the multiple candidate images. For example, images whose image similarity is higher than a threshold can be selected as reference images, or the n (n is a positive integer) images with the highest image similarity can be selected as reference images. The selected reference images have a high similarity to the image to be processed at the pattern level; therefore, the image with the highest matching degree with the image to be processed is likely to come from these reference images. Thus, when determining the text similarity, it is only necessary to determine the text similarity between the selected reference images and the image to be processed, and then the comprehensive similarity between the selected reference images and the image to be processed, which saves calculation and improves processing efficiency.
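A minimal sketch of this two-stage screening is given below (top-n candidates by image similarity, then text similarity only for the selected reference images); the data layout and helper names are assumptions.

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def screen_and_compare(query_img_feat, query_txt_feat, candidates, n=10):
    """candidates: list of (image_id, second_image_feature, second_text_feature)."""
    # Stage 1: rank all candidates by image similarity and keep the top n as reference images.
    scored = [(_cos(query_img_feat, img_f), image_id, txt_f)
              for image_id, img_f, txt_f in candidates]
    reference = sorted(scored, key=lambda t: t[0], reverse=True)[:n]
    # Stage 2: compute text similarity only for the selected reference images.
    return [(image_id, img_sim, _cos(query_txt_feat, txt_f))
            for img_sim, image_id, txt_f in reference]
```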
Of course, the text similarity can also be used as the screening condition (for example, when the image to be processed mainly includes text content and contains little pattern information, the text similarity can be used as the screening condition). In this case, the reference images with the highest text similarity to the image to be processed are first selected from the multiple candidate images, then the image similarity between the reference images and the image to be processed is determined, and then the comprehensive similarity between the reference images and the image to be processed is determined. The present disclosure does not limit the screening condition.
FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 4, based on FIG. 1, S103 in FIG. 1 can be implemented through at least one of S1031 and S1032, which will be described with reference to the steps shown in FIG. 4.
S1031. Determine the product of the image similarity and the text similarity as the comprehensive similarity;
S1032. Determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
In a possible implementation, both the image similarity and the text similarity may be numerical values in percentage form, for example, an image similarity of 98% and a text similarity of 95%. The two can be multiplied, and the product is determined as the comprehensive similarity.
In another possible implementation, a weighted average of the two can also be calculated. For example, the two similarities can be considered equally important; therefore, the weights of both can be set to 1 and the average of the two calculated directly.
Alternatively, the weights of the two may be determined first, and then weighted average processing is performed. The weights can be determined in many ways. For example, they can be based on the number of characters included in the text: when the number is large, the weight of the text similarity can be made higher; otherwise, the weight of the text similarity is made lower. The weights can also be determined through the characteristics of the image to be processed itself.
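The two combination rules of S1031 and S1032 amount to the following sketch; the default weights are placeholders to be replaced by weight information determined as described below.

```python
def comprehensive_similarity_product(image_sim: float, text_sim: float) -> float:
    """S1031: product of the two similarities, e.g. 0.98 * 0.95 = 0.931."""
    return image_sim * text_sim

def comprehensive_similarity_weighted(image_sim: float, text_sim: float,
                                      w_image: float = 1.0, w_text: float = 1.0) -> float:
    """S1032: weighted average; with equal weights this reduces to the plain average."""
    return (w_image * image_sim + w_text * text_sim) / (w_image + w_text)
```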
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can be implemented in the following manner: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the type of the image to be processed may be determined according to the source of the image. For example, if the source of the image to be processed is a screenshot of a communication tool interaction page or a news website page, its type may be "communication tool interaction page screenshot" or "news website page screenshot"; if the source of the image to be processed is a street camera, its type may be "street view image"; and if the source of the image to be processed is an access control camera, its type may be "face image", and so on.
In a possible implementation, the type of the image to be processed can also be determined according to a classification mark of the image. The classification mark of the image can be added manually or automatically when the image to be processed is generated, such as the above "communication tool interaction page screenshot".
In a possible implementation, the type of the image to be processed can also be determined based on the first image feature. For example, when the above first image feature is feature information in the form of a matrix or vector, the first image feature can be processed by deconvolution, activation, etc. to obtain the type of the image to be processed. For example, it may be determined that the image to be processed is a face image, a street view image, a news screenshot, a calligraphy image, or another type. The type of the image to be processed may also be defined and set by the user. The present disclosure does not limit the manner of determining the type of the image to be processed.
In a possible implementation, the weight information can be determined based on the type of the image to be processed. For example, when the type of the image to be processed indicates that the image is mainly characterized by image features (for example, a face image or a landscape image), the weight of the image similarity can be made higher and the weight of the text similarity lower. For another example, when the type of the image to be processed indicates that the image is mainly characterized by text features (for example, a news screenshot or a web page screenshot), the weight of the image similarity can be made lower and the weight of the text similarity higher.
In a possible implementation, corresponding weights may be set in advance for each type of image, or the weight information may be calculated based on the type. For example, the type of the image to be processed can be expressed in the form of probabilities; for example, if the probability that the type of the image to be processed is a news screenshot is 95% and the probability that it is another type is 5%, the weight information can be calculated from this data. For example, the probabilities of the various types may be used as elements of a vector, and the vector may be activated to obtain the above weight information. The present disclosure does not limit the method of calculating the weights.
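One possible sketch of the type-based weighting is given below: a preset per-type weight table, plus one reading of the probability-vector activation mentioned above (softmax is assumed as the activation, and the table values are illustrative only).

```python
import numpy as np

# Illustrative presets: (weight of image similarity, weight of text similarity) per type.
TYPE_WEIGHTS = {
    "face image": (0.9, 0.1),
    "street view image": (0.8, 0.2),
    "news screenshot": (0.2, 0.8),
    "communication tool interaction page screenshot": (0.2, 0.8),
}

def weights_from_type(image_type: str):
    """Look up preset weights for a known image type; fall back to equal weights."""
    return TYPE_WEIGHTS.get(image_type, (0.5, 0.5))

def weights_from_type_probs(type_probs: np.ndarray):
    """Alternative reading: treat per-type probabilities (ordered as in TYPE_WEIGHTS)
    as a vector, activate it, and blend the per-type text weights accordingly."""
    acts = np.exp(type_probs) / np.exp(type_probs).sum()     # softmax activation
    text_weights = np.array([w for _, w in TYPE_WEIGHTS.values()])
    w_text = float(acts @ text_weights)
    return 1.0 - w_text, w_text
```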
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can also be implemented in the following manner: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the area ratio of the region where the text is located can be determined based on the first image feature. The first image feature can be used to represent the layout of the image, and the area ratio of the region where the text is located can be calculated based on the first image feature.
In a possible implementation, the weight information can be determined based on the area ratio. For example, the area ratio of the region where the text is located can be determined as the weight of the text similarity, from which the weight of the image similarity can then be calculated. For another example, the area ratio of the region where the text is located can be activated to obtain the above weight information. The present disclosure does not limit the specific method of calculating the weights.
In a possible implementation, the type of the image to be processed and the area ratio of the region where the text is located can also be obtained in other ways. For example, the type of the image to be processed can be determined by manual annotation, and the area ratio of the region where the text is located can be determined by manual measurement. In an example, the area ratio of the region where the text is located can also be determined based on attributes such as color and shape. For example, in images such as news website screenshots, the font is usually black, and the area ratio of the region where the text is located can be determined by the proportion of black; alternatively, the region where the font is located in such a screenshot forms neat rows or columns, and the area ratio of the region where the text is located can be determined based on the area of the rectangles formed by the rows or columns. The present disclosure does not limit the manner of determining the type of the image to be processed or the area ratio of the region where the text is located.
In a possible implementation, the weight may be positively correlated with the area ratio. For example, the larger the area ratio of the region where the text is located, the lower the weight of the image similarity and the higher the weight of the text similarity.
In a possible implementation, the manner of determining the area ratio of the region where the text is located can also be applied to scenarios where the text itself is also an image. For example, when the text is word art, the word art itself is both text and image; it can be determined that the area ratio of the region where the text is located is 100% and the area ratio of the region where the image is located is also 100%, so the two weights can be made equal.
In a possible implementation, the number of characters in the image can also be counted. For example, the interval to which the number of characters belongs can be mapped to a weight: the more characters, the higher the weight of the text similarity. For example, if the number of characters is greater than or equal to 100, the weight of the text similarity is 0.8; if the number of characters is greater than or equal to 50 and less than 100, the weight of the text similarity is 0.5; and if the number of characters is less than 50, the weight of the text similarity is 0.3. The present disclosure does not limit the correspondence between the number of characters and the weight.
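The two heuristics above can be sketched directly; the area-ratio rule and the character-count thresholds follow the values given in the text.

```python
def weights_from_area_ratio(text_area_ratio: float):
    """Weight of text similarity = area ratio of the text region; image similarity gets the rest."""
    return 1.0 - text_area_ratio, text_area_ratio

def text_weight_from_char_count(num_chars: int) -> float:
    """Character-count intervals mapped to the weight of text similarity."""
    if num_chars >= 100:
        return 0.8
    if num_chars >= 50:
        return 0.5
    return 0.3
```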
In this way, the weight information of the image similarity and the text similarity can be determined according to the type of the image to be processed or the area ratio of the region where the text is located, which can improve the accuracy of the weight information and thereby improve the accuracy of the matching result.
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can also be implemented in the following manner: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity. The characteristics of the image to be processed itself can be determined through the first image feature, which represents the pattern-level feature information of the image to be processed. For example, the type of the image to be processed can be determined from the first image feature as described above, or characteristics such as the area ratio of the region where the text is located in the image to be processed can be determined, and the weights are determined based on these characteristics. Alternatively, the weight information can also be obtained directly from the first image feature through a trained network model, which is not limited in the present disclosure.
In a possible implementation, after the weight information is determined, weighted average processing may be performed on the image similarity and the text similarity based on the weight information to obtain the comprehensive similarity between the image to be processed and the reference image, and a matching result can then be obtained based on the comprehensive similarity. For example, when a one-to-one comparison is performed between the image to be processed and the reference image, it may be determined that the image to be processed matches the reference image when the comprehensive similarity is higher than a threshold. For another example, when a target image matching the image to be processed is to be determined in the image library, the reference image with the highest comprehensive similarity to the image to be processed may be determined as the target image matching the image to be processed.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, and the weight information of the image similarity and the text similarity is determined according to the type of the image to be processed or the area ratio of the region where the text is located, so as to obtain the matching result. This reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 5, the image to be processed is an image including text content. When determining the matching result between such an image to be processed and the reference images in the image library, the image similarity between the first image feature of the image to be processed and the second image feature of the reference image, and the text similarity between the first text feature of the image to be processed and the second text feature of the reference image, can be comprehensively considered.
In a possible implementation, the second image feature and the second text feature of each reference image in the image library can be acquired in advance and stored in a feature library for comparison with the features of the image to be processed.
In a possible implementation, the first image feature and the first text feature of the image to be processed can be extracted. For example, the first image feature of the image to be processed can be extracted through a convolutional neural network, the text information in the image can be obtained through OCR technology, and the first text feature of the image to be processed can be obtained through a recurrent neural network.
In a possible implementation, screening can first be performed based on the image features: from the images in the image library, the n images with the highest image similarity are selected as reference images, and then the text similarity between the first text feature and the second text features of these reference images is determined. The comprehensive similarity between the image to be processed and these reference images can then be determined through weighted average processing.
In a possible implementation, the weights of the text similarity and the image similarity can be determined. For example, the weight of the text similarity can be determined from the area ratio of the region where the text is located in the image to be processed. In an example, the area ratio of the region where the text is located can be determined based on the first image feature of the image to be processed, the area ratio is then determined as the weight x of the text similarity, and 1-x is determined as the weight of the image similarity.
In a possible implementation, after the above weights are determined, weighted average processing can be performed on the image similarity and the text similarity based on the above weights, so as to respectively determine the comprehensive similarity between the image to be processed and each reference image, and the reference image with the highest comprehensive similarity is determined as the target image matching the image to be processed.
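The FIG. 5 flow can be put together as the following sketch (area-ratio weighting, weighted average, then picking the reference image with the highest comprehensive similarity); the data layout is an assumption.

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve_best_match(query_img_feat, query_txt_feat, text_area_ratio, reference_features):
    """reference_features: list of (image_id, second_image_feature, second_text_feature)."""
    w_image, w_text = 1.0 - text_area_ratio, text_area_ratio
    best_id, best_score = None, -1.0
    for image_id, img_f, txt_f in reference_features:
        img_sim = _cos(query_img_feat, img_f)
        txt_sim = _cos(query_txt_feat, txt_f)
        score = w_image * img_sim + w_text * txt_sim   # weighted average (weights sum to 1)
        if score > best_score:
            best_id, best_score = image_id, score
    return best_id, best_score
```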
In a possible implementation, the image processing method can be used in fields such as network supervision. For example, for a website with strong anti-crawling measures, it is difficult to directly supervise the text content published by the website. Screenshots of the content published on the website can be taken and compared with preset images including specific words or sentences to determine whether the specific words or sentences exist in the screenshots, and thus whether the content of the website includes the specific words or sentences, so that publishers of website content can be effectively supervised. The present disclosure does not limit the application scenarios of the image processing method.
FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
S601. Obtain a picture with text content (referred to as the retrieval picture, corresponding to the image to be processed in the above embodiments);
S602. Compare the image feature corresponding to the retrieval picture (corresponding to the first image feature in the above embodiments) with the image feature library corresponding to the pictures in the search base library (the search base library corresponds to the image library, and its pictures correspond to the reference images in the above embodiments), and compare the text feature corresponding to the retrieval picture (corresponding to the first text feature in the above embodiments) with the text feature library corresponding to the pictures in the search base library;
S6021. Extract the image feature of the retrieval picture based on a deep learning algorithm;
S6022. Perform a 1:N similarity calculation between the image feature of the retrieval picture and the image features in the image feature library;
S6023. Obtain the result of each image feature similarity calculation;
S6024. Recognize and extract the text content of the retrieval picture through an OCR algorithm;
S6025. Further extract a text feature (corresponding to the first text feature in the above embodiments) from the extracted text content;
S6026. Perform a 1:N similarity calculation between the text feature of the retrieval picture and the text features in the text feature library;
S6027. Obtain the result of each text feature similarity calculation;
S603. Perform fusion calculation to obtain a comprehensive ranking result;
S6031. Accumulate the image feature similarity and the text feature similarity corresponding to each picture in the search base library to obtain comprehensive similarity results;
S6032. Sort the comprehensive similarity results from high to low, and determine the image corresponding to the comprehensive similarity result with the largest value as the image most similar to the retrieval picture.
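S6031 and S6032 amount to summing the two per-picture similarities and sorting; a sketch with the values from Table 1 (floating-point results shown approximately):

```python
def fuse_and_rank(image_sims: dict, text_sims: dict):
    """image_sims / text_sims: picture id -> similarity from the 1:N comparisons."""
    combined = {pid: image_sims[pid] + text_sims[pid] for pid in image_sims}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# fuse_and_rank({"A": 0.94, "B": 0.96, "C": 0.91, "D": 0.80},
#               {"A": 0.98, "B": 0.90, "C": 0.85, "D": 0.60})
# -> [("A", 1.92), ("B", 1.86), ("C", 1.76), ("D", 1.40)]   # image A ranks first
```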
FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 7, first, S701 is performed to obtain the retrieval picture; then, the image feature comparison of S7021 and the text feature comparison of S7023 are performed on the retrieval picture respectively; then, through the similarity calculation of S7022 and the similarity calculation of S7024, the similarity results between the retrieval picture and A_i, B_i, C_i, and D_i in the image feature library, and the similarity results between the retrieval picture and A_w, B_w, C_w, and D_w in the text feature library, are obtained. As shown in Table 1, the similarity result between the retrieval picture and A_i is 94%, with B_i is 96%, with C_i is 91%, with D_i is 80%, with A_w is 98%, with B_w is 90%, with C_w is 85%, and with D_w is 60%.
Table 1

| Image | Image feature similarity | Text feature similarity | Comprehensive similarity |
| --- | --- | --- | --- |
| A | 94% (A_i) | 98% (A_w) | 192% |
| B | 96% (B_i) | 90% (B_w) | 186% |
| C | 91% (C_i) | 85% (C_w) | 176% |
| D | 80% (D_i) | 60% (D_w) | 140% |
根据上述检索图片与图像特征库和文字特征库比对的结果,S703的相关比对逻辑是:将检索图片与图像特征库进行比对,将最大值的相似度结果对应的图像确定为和检索图片最相似的图像,如表1中所示的,检索图像与图像特征库的图像特征进行比对得到的相似度结果最大值为96%,该最大值相似度结果对应的图像为B图像,从而将B图像确定为与检索图片最相似的图像。According to the result of comparing the retrieved picture with the image feature database and the text feature database, the relevant comparison logic of S703 is: compare the retrieved picture with the image feature database, and determine the image corresponding to the maximum similarity result as the one with the retrieved image. The image most similar to the picture, as shown in Table 1, the maximum value of the similarity result obtained by comparing the retrieved image with the image features of the image feature library is 96%, and the image corresponding to the maximum similarity result is a B image. Thus, the B image is determined to be the image most similar to the retrieved picture.
S704的本公开比对逻辑是:将检索图片与图像特征库中的图像特征和文字特征库中的文字特征分别进行比对,将图像特征库中图像特征和文字特征库中文字特征对应的图像的相似度结果相加,得到综合相似度,将最大值综合相似度对应的图像确定为和检索图片最相似的图像,由表1可知,检索图片与A图像的综合相似度结果为192%,与B图像的综合相似度结果为186%,与C图像的综合相似度结果为176%,与D图像的综合相似度结果为140%,所以,检索图片与A图像的综合相似度结果数值最大,从而将A图像确定为与检索图片最相似的图像。The disclosed comparison logic of S704 is: compare the retrieved picture with the image features in the image feature database and the text features in the text feature database respectively, and compare the images corresponding to the image features in the image feature database and the text features in the text feature database The similarity results of A and A are added together to obtain a comprehensive similarity, and the image corresponding to the maximum comprehensive similarity is determined to be the most similar image to the retrieved image. It can be seen from Table 1 that the composite similarity result of the retrieved image and A image is 192%. The comprehensive similarity result with image B is 186%, the comprehensive similarity result with image C is 176%, and the comprehensive similarity result with image D is 140%. Therefore, the comprehensive similarity result value between the retrieved image and image A is the largest , so that the A image is determined to be the most similar image to the retrieved image.
A related image comparison method extracts deep features of the images (corresponding to the image features in the above embodiments) and performs similarity calculation based on these deep features to obtain a comparison result. However, in image comparison scenarios where text content accounts for a large proportion of the image, such as screenshots of social applications (Weibo, Facebook, Twitter, etc.), the image features of different images are often quite similar. If only such a traditional image comparison method is used, a large amount of false-positive data may occur, that is, cases where the image pictures are similar but the text content is completely different.
In order to reduce as far as possible the problem of frequent false positives in this subdivided scenario (image comparison where text content accounts for a large proportion of the image), in this example, in addition to extracting deep features from the image based on the related image comparison logic, the text content of the image is also recognized and extracted through OCR technology, text features are extracted from the text content, and similarity calculation is performed based on the text features to obtain a text feature comparison result. Then, the image feature comparison result and the text feature comparison result are fused, for example by adding the two corresponding comparison results, to obtain the result with the highest comprehensive similarity. According to the image processing method of this example, only images whose image picture and text content are both highly similar are output, which reduces false positives in this scenario and improves comparison accuracy.
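The fused pipeline described above can be sketched as follows. This is a sketch under stated assumptions: extract_image_feature, ocr_text, extract_text_feature and cosine are placeholder callables supplied by the caller (for example a CNN backbone, an OCR engine, a text encoder and cosine similarity), not components named by the disclosure.

```python
def fused_similarity(query, reference,
                     extract_image_feature, ocr_text, extract_text_feature, cosine):
    # Deep (image) features of both pictures and their similarity.
    image_sim = cosine(extract_image_feature(query), extract_image_feature(reference))

    # Text recognised by OCR, encoded into text features, and their similarity.
    text_sim = cosine(extract_text_feature(ocr_text(query)),
                      extract_text_feature(ocr_text(reference)))

    # Simple additive fusion, as in the example above; product or weighted
    # average are alternative fusions described elsewhere in this disclosure.
    return image_sim + text_sim
```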
It can be understood that the above method embodiments mentioned in the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principles and logic. Those skilled in the art will understand that, in the methods of the above specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure further provides an image processing apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any image processing method provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding records in the method section. Fig. 8 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 8, the apparatus includes: a feature extraction part 801, configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; a similarity determination part 802, configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and a matching part 803, configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
In a possible implementation manner, the apparatus further includes: a target image determination part, configured to determine, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
In a possible implementation manner, the similarity determination part 802 is further configured to: respectively determine image similarities between the first image feature and second image features of at least two candidate images; determine a reference image among the at least two candidate images according to the image similarities; and determine the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation manner, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, and the matching part 803 is further configured to: determine the product of the image similarity and the text similarity as the comprehensive similarity; or, determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
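A brief sketch of the two fusion options just described is given below; the equal default weights are an assumption, since the disclosure does not fix particular values.

```python
def combined_similarity(image_sim, text_sim, mode="weighted_average",
                        image_weight=0.5, text_weight=0.5):
    # Option 1: product of the two similarities.
    if mode == "product":
        return image_sim * text_sim
    # Option 2: weighted average of the two similarities.
    return (image_weight * image_sim + text_weight * text_sim) / (image_weight + text_weight)
```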
In a possible implementation manner, the matching part 803 is further configured to: determine a type of the image to be processed; determine weight information according to the type of the image to be processed; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
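One way the type-based weighting above might be realised is a lookup table from image type to weights, as sketched below; the concrete types and values are hypothetical and are not taken from the disclosure.

```python
# Hypothetical mapping from image type to (image_weight, text_weight).
TYPE_WEIGHTS = {
    "chat_screenshot": (0.3, 0.7),  # text-heavy images: text similarity weighs more
    "news_screenshot": (0.4, 0.6),
    "photograph":      (0.8, 0.2),  # little text: image similarity weighs more
}

def weights_for_type(image_type, default=(0.5, 0.5)):
    # Fall back to equal weights for unknown types.
    return TYPE_WEIGHTS.get(image_type, default)
```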
In a possible implementation manner, the matching part 803 is further configured to: determine an area proportion of a region where text is located in the image to be processed; determine the weight information according to the area proportion; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
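The area-proportion variant above could, for instance, map the proportion of the image covered by text regions directly to the text weight, as in the sketch below; the linear mapping is an assumption, since the disclosure only states that the weights are determined from the area proportion.

```python
def weights_from_text_area(text_boxes, image_width, image_height):
    # text_boxes: detected text regions as (x, y, w, h) tuples.
    text_area = sum(w * h for (_x, _y, w, h) in text_boxes)
    ratio = min(1.0, text_area / float(image_width * image_height))
    # The larger the text region, the more weight the text similarity receives.
    return 1.0 - ratio, ratio  # (image_weight, text_weight)
```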
In a possible implementation manner, the matching part 803 is further configured to: determine weight information according to the first image feature; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
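For the feature-based variant above, the disclosure does not specify how the weight information is derived from the first image feature; a hypothetical realisation is a small learned head, sketched below with assumed, pre-trained parameters w and b.

```python
import numpy as np

def predict_text_weight(image_feature, w, b):
    # image_feature: 1-D feature vector; w, b: assumed learned parameters.
    logit = float(np.dot(w, image_feature) + b)
    text_weight = 1.0 / (1.0 + np.exp(-logit))   # sigmoid keeps the weight in (0, 1)
    return 1.0 - text_weight, text_weight        # (image_weight, text_weight)
```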
In a possible implementation manner, the first text feature includes at least one of a semantic feature, a format feature, a font feature, a size feature, a typesetting feature and a language feature.
In some embodiments, the functions of, or the parts included in, the apparatus provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments; for their implementation, reference may be made to the descriptions of the above method embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored, and the above method is implemented when the computer program instructions are executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store processor-executable instructions; where the processor is configured to invoke the instructions stored in the memory to execute the above method.
An embodiment of the present disclosure further provides another computer program product configured to store computer-readable instructions which, when executed, cause a computer to perform the operations of the image processing method provided by any of the above embodiments.
The electronic device may be provided as a terminal, a server or a device in another form.
Fig. 9 shows a block diagram of an electronic device 900 according to an embodiment of the present disclosure. For example, the electronic device 900 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device or a personal digital assistant.
Referring to Fig. 9, the electronic device 900 may include one or more of the following components: a processing component 901, a memory 902, a power supply component 903, a multimedia component 904, an audio component 905, an input/output (I/O) interface 906, a sensor component 907 and a communication component 908.
The processing component 901 generally controls the overall operation of the electronic device 900, such as operations associated with display, telephone calls, data communication, camera operations and recording operations. The processing component 901 may include one or more processors 909 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 901 may include one or more modules to facilitate interaction between the processing component 901 and other components. For example, the processing component 901 may include a multimedia part to facilitate interaction between the multimedia component 904 and the processing component 901.
The memory 902 is configured to store various types of data to support operations on the electronic device 900. Examples of such data include instructions for any application or method operated on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 902 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc.
The power supply component 903 provides power for the various components of the electronic device 900. The power supply component 903 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 900.
The multimedia component 904 includes a screen that provides an output interface between the electronic device 900 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 904 includes at least one of a front camera and a rear camera. When the electronic device 900 is in an operation mode, such as a shooting mode or a video mode, at least one of the front camera and the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 905 is configured to output and/or input audio signals. For example, the audio component 905 includes a microphone (MIC), and the microphone is configured to receive external audio signals when the electronic device 900 is in an operation mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 902 or sent via the communication component 908. In some embodiments, the audio component 905 further includes a speaker configured to output audio signals.
The I/O interface 906 provides an interface between the processing component 901 and a peripheral interface module, and the peripheral interface module may be a keyboard, a click wheel, buttons or the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.
The sensor component 907 includes one or more sensors configured to provide state assessments of various aspects of the electronic device 900. For example, the sensor component 907 can detect the on/off state of the electronic device 900 and the relative positioning of components, for example, the components being the display and the keypad of the electronic device 900; the sensor component 907 can also detect a change in position of the electronic device 900 or of a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a temperature change of the electronic device 900. The sensor component 907 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 907 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor, configured for use in imaging applications. In some embodiments, the sensor component 907 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 908 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 908 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 908 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements, configured to execute the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the memory 902 including computer program instructions, which can be executed by the processor 909 of the electronic device 900 to implement the above method.
Fig. 10 shows a block diagram of an electronic device 1000 according to an embodiment of the present disclosure. For example, the electronic device 1000 may be provided as a server. Referring to Fig. 10, the electronic device 1000 includes a processing component 1001, which further includes one or more processors, and a memory resource represented by a memory 1002, configured to store instructions executable by the processing component 1001, such as an application program. The application program stored in the memory 1002 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1001 is configured to execute the instructions to perform the above method.
The electronic device 1000 may further include a power supply component 1003 configured to perform power management of the electronic device 1000, a wired or wireless network interface 1004 configured to connect the electronic device 1000 to a network, and an I/O interface 1005. The electronic device 1000 can operate based on an operating system stored in the memory 1002, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the memory 1002 including computer program instructions, which can be executed by the processing component 1001 of the electronic device 1000 to implement the above method.
The present disclosure may be at least one of a system, a method and a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from the computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as at least one of the Internet, a local area network, a wide area network and a wireless network. The network may include at least one of copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is personalized and customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to at least one of flowcharts and block diagrams of the methods, apparatuses (systems) and computer program products according to the embodiments of the present disclosure. It should be understood that each block of at least one of the flowcharts and block diagrams, and combinations of blocks in at least one of the flowcharts and block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, thereby producing a machine, such that these instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause at least one of a computer, a programmable data processing apparatus and other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operation steps are executed on the computer, the other programmable data processing apparatus or the other device to produce a computer-implemented process, so that the instructions executed on the computer, the other programmable data processing apparatus or the other device implement the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of an instruction that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the accompanying drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of at least one of the block diagrams and flowcharts, and combinations of blocks of at least one of the block diagrams and flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The computer program product may be specifically implemented by means of hardware, software or a combination thereof. In an optional embodiment, the computer program product is specifically embodied as a computer storage medium; in another optional embodiment, the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to explain the principles of the embodiments, their practical applications or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Industrial Applicability
The image processing method of the embodiments of the present disclosure can obtain image features and text features for a comprehensive comparison. The text contained in the image is taken into account in the comparison process, and the weight information of the image similarity and the text similarity is determined by, for example, the type of the image to be processed or the area proportion of the region where the text is located, so as to obtain the matching result. In this way, even when the images have a high similarity in color, texture, light and shade, layout, style, feature point positions and so on but inconsistent text content, the determined text similarity can reduce the probability of false positives in image matching and improve the accuracy of the matching result.

Claims (18)

  1. An image processing method, the method being executed by an electronic device, the method comprising:
    performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, wherein the first text feature is feature information of text included in the image to be processed;
    respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, wherein the second text feature is feature information of text included in the reference image; and
    determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  2. The method according to claim 1, wherein the method further comprises:
    determining, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
  3. The method according to claim 2, wherein respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, comprises:
    respectively determining image similarities between the first image feature and second image features of at least two candidate images;
    determining the reference image among the at least two candidate images according to the image similarities; and
    determining the text similarity between the first text feature and the second text feature of the reference image.
  4. The method according to any one of claims 1 to 3, wherein the image matching result comprises a comprehensive similarity between the reference image and the image to be processed,
    wherein determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity comprises one of the following:
    determining the product of the image similarity and the text similarity as the comprehensive similarity; and
    determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  5. The method according to claim 4, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining a type of the image to be processed;
    determining weight information according to the type of the image to be processed; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  6. The method according to claim 4, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining an area proportion of a region where text is located in the image to be processed;
    determining weight information according to the area proportion; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  7. The method according to any one of claims 4 to 6, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining weight information according to the first image feature; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  8. The method according to any one of claims 1 to 7, wherein the first text feature comprises at least one of the following: a semantic feature, a format feature, a font feature, a size feature, a typesetting feature and a language feature.
  9. An image processing apparatus, comprising:
    a feature extraction part, configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, wherein the first text feature is feature information of text included in the image to be processed;
    a similarity determination part, configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, wherein the second text feature is feature information of text included in the reference image; and
    a matching part, configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  10. The apparatus according to claim 9, wherein the apparatus further comprises:
    a target image determination part, configured to determine, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
  11. The apparatus according to claim 10, wherein the similarity determination part is further configured to:
    respectively determine image similarities between the first image feature and second image features of at least two candidate images;
    determine the reference image among the at least two candidate images according to the image similarities; and
    determine the text similarity between the first text feature and the second text feature of the reference image.
  12. The apparatus according to any one of claims 9 to 11, wherein the image matching result comprises a comprehensive similarity between the reference image and the image to be processed, and the matching part is further configured to:
    determine the product of the image similarity and the text similarity as the comprehensive similarity;
    or, determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  13. The apparatus according to claim 12, wherein the matching part is further configured to:
    determine a type of the image to be processed;
    determine weight information according to the type of the image to be processed; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  14. The apparatus according to claim 12, wherein the matching part is further configured to:
    determine an area proportion of a region where text is located in the image to be processed;
    determine the weight information according to the area proportion; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  15. The apparatus according to any one of claims 12 to 14, wherein the matching part is further configured to:
    determine weight information according to the first image feature; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  16. An electronic device, comprising:
    a processor; and
    a memory configured to store processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1 to 8 when invoking the instructions stored in the memory.
  17. A computer-readable storage medium on which computer program instructions are stored, wherein the method according to any one of claims 1 to 8 is implemented when the computer program instructions are executed by a processor.
  18. A computer program product, the computer program product comprising a non-transitory computer-readable storage medium storing a computer program, wherein the method according to any one of claims 1 to 8 is implemented when the computer program is read and executed by a computer.
PCT/CN2022/096004 2021-11-29 2022-05-30 Image processing method and apparatus, electronic device, storage medium, and computer program product WO2023092975A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111435625.7A CN114118278A (en) 2021-11-29 2021-11-29 Image processing method and device, electronic equipment and storage medium
CN202111435625.7 2021-11-29

Publications (1)

Publication Number Publication Date
WO2023092975A1 true WO2023092975A1 (en) 2023-06-01

Family

ID=80371521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096004 WO2023092975A1 (en) 2021-11-29 2022-05-30 Image processing method and apparatus, electronic device, storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN114118278A (en)
WO (1) WO2023092975A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118278A (en) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210124976A1 (en) * 2019-10-28 2021-04-29 Samsung Sds Co., Ltd. Apparatus and method for calculating similarity of images
CN111694978A (en) * 2020-05-20 2020-09-22 Oppo(重庆)智能科技有限公司 Image similarity detection method and device, storage medium and electronic equipment
CN112990376A (en) * 2021-04-29 2021-06-18 北京世纪好未来教育科技有限公司 Text image similarity evaluation method and device and computing equipment
CN113111154A (en) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 Similarity evaluation method, answer search method, device, equipment and medium
CN114118278A (en) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114118278A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
CN113538519B (en) Target tracking method and device, electronic equipment and storage medium
EP3173948A1 (en) Method and apparatus for recommendation of reference documents
WO2020029966A1 (en) Method and device for video processing, electronic device, and storage medium
CN111783756B (en) Text recognition method and device, electronic equipment and storage medium
CN107102746B (en) Candidate word generation method and device and candidate word generation device
WO2021056621A1 (en) Text sequence recognition method and apparatus, electronic device, and storage medium
CN111581488B (en) Data processing method and device, electronic equipment and storage medium
WO2021031645A1 (en) Image processing method and apparatus, electronic device and storage medium
WO2021027343A1 (en) Human face image recognition method and apparatus, electronic device, and storage medium
CN111259967B (en) Image classification and neural network training method, device, equipment and storage medium
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
WO2021208666A1 (en) Character recognition method and apparatus, electronic device, and storage medium
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN110659690B (en) Neural network construction method and device, electronic equipment and storage medium
CN110391966B (en) Message processing method and device and message processing device
EP3734472A1 (en) Method and device for text processing
CN111222316B (en) Text detection method, device and storage medium
WO2023078414A1 (en) Related article search method and apparatus, electronic device, and storage medium
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN110633715B (en) Image processing method, network training method and device and electronic equipment
WO2023092975A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
CN110232181B (en) Comment analysis method and device
CN114168798A (en) Text storage management and retrieval method and device
CN110070046B (en) Face image recognition method and device, electronic equipment and storage medium