WO2023092975A1 - Image processing method and apparatus, electronic device, storage medium, and computer program product


Info

Publication number
WO2023092975A1
WO2023092975A1 (PCT/CN2022/096004, CN2022096004W)
Authority
WO
WIPO (PCT)
Prior art keywords
image, similarity, text, feature, processed
Prior art date
Application number
PCT/CN2022/096004
Other languages
French (fr)
Chinese (zh)
Inventor
郭彤
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023092975A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, an electronic device, a storage medium, and a computer program product.
  • In related technologies, the image comparison method extracts features from images, performs similarity calculation based on the image features, and then obtains a comparison result.
  • However, when text content is included in an image, two or more images may have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but different text content (for example, screenshots of the chat interface of social software, news screenshots, etc.).
  • Embodiments of the present disclosure at least provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a computer program product.
  • An image processing method is provided, including: performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • In this way, image features and text features can be obtained for comprehensive comparison. Because the text contained in the image is taken into account, the probability of false positives is reduced when two images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, and the accuracy of the matching result is improved.
  • In some embodiments, the method further includes: determining, according to the image matching results between the image to be processed and at least two reference images, a target image in the at least two reference images that matches the image to be processed.
  • In some embodiments, respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, includes: respectively determining the image similarity between the first image feature and the second image features of at least two candidate images; determining a reference image from the at least two candidate images according to the image similarity; and determining the text similarity between the first text feature and the second text feature of the reference image.
  • In some embodiments, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, and determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity includes one of the following: determining the product of the image similarity and the text similarity as the comprehensive similarity; or determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In this way, the weight information of the image similarity and the text similarity can be determined by the type of the image to be processed or the area ratio of the text region, which improves the accuracy of the weight information and thereby the accuracy of the matching results.
  • In some embodiments, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the first text feature includes at least one of the following: semantic feature, format feature, font feature, size feature, typesetting feature, and language feature.
  • An image processing apparatus is also provided, including: a feature extraction part configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; a similarity determining part configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and a matching part configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • An electronic device is also provided, including: a processor; and a memory configured to store instructions executable by the processor, where the processor is configured to call the instructions stored in the memory to implement the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • A computer program product is also provided, which includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, the above-mentioned method is implemented.
  • FIG. 1 shows a flowchart of an image processing method according to an embodiment of the disclosure
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure
  • Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • In the present disclosure, "at least one" means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set formed by A, B, and C.
  • FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure, the method is executed by an electronic device, and will be described with reference to the steps shown in FIG. 1 .
  • In this way, image features and text features can be obtained for comprehensive comparison. Because the text contained in the image is taken into account, the probability of false positives is reduced when two images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, which improves the accuracy of the matching results.
  • In related image comparison methods, feature extraction processing is usually performed on the two images, for example through a deep learning neural network, to obtain the image features of each image, and the similarity between the image features of the two images is then determined; if the similarity is higher than a threshold, the two images can be considered to match.
  • This method can be used in face recognition, object recognition and other fields.
  • The image features extracted by this method can usually describe pattern-level feature information such as the color, texture, light and shade, layout, style, and feature point positions of an image, but it is difficult for them to represent the text information when the image contains text.
  • In the embodiments of the present disclosure, the image features and text features of the images participating in the comparison can be obtained separately, and whether the images participating in the comparison match can be comprehensively judged based on the image similarity between the image features and the text similarity between the text features.
  • the text features can describe the semantics, format, font, size, typesetting (including the position of the text in the image), language and other feature information of the text.
  • the present disclosure does not limit the feature information described by the text features.
  • the image to be processed may be an image including text information, for example, the image includes one or more characters.
  • feature extraction processing may be performed on the image to be processed to obtain the first image feature and the first text feature of the image to be processed.
  • the above features can be obtained by processing the image to be processed through a deep learning neural network.
  • the first image feature of the image to be processed may be extracted through a convolutional neural network.
  • The text information in the image (for example, the content of the text, the position of the text region, the shape of the characters, etc.) can be obtained through Optical Character Recognition (OCR) technology, and the first text feature can be obtained through a recurrent neural network.
  • the present disclosure does not limit the acquisition manners of the first image feature and the first text feature.
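  • As an illustrative, non-authoritative sketch of one such acquisition manner, the snippet below extracts a first image feature with a convolutional backbone and a first text feature with OCR followed by a recurrent network. The specific choices (a ResNet-18 backbone from a recent torchvision, pytesseract for OCR, a character-level GRU, the feature dimensions) are assumptions for illustration only, not requirements of the disclosure.
```python
# Illustrative sketch only: the disclosure does not fix specific networks,
# so ResNet-18, pytesseract OCR and a character-level GRU are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pytesseract

image_backbone = models.resnet18(weights=None)
image_backbone.fc = nn.Identity()          # expose the 512-d pooled feature
image_backbone.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

char_embedding = nn.Embedding(num_embeddings=65536, embedding_dim=64)
text_encoder = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

def extract_features(path: str):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        # First image feature: pattern-level information (color, texture, layout, ...)
        image_feature = image_backbone(preprocess(img).unsqueeze(0)).squeeze(0)

        # Text information via OCR, then a recurrent network over the characters
        text = pytesseract.image_to_string(img)
        if text.strip():
            ids = torch.tensor([[min(ord(c), 65535) for c in text]])
            _, hidden = text_encoder(char_embedding(ids))
            text_feature = hidden.squeeze(0).squeeze(0)   # first text feature
        else:
            text_feature = torch.zeros(128)
    return image_feature, text_feature
```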
  • the first text features may include at least one of feature information such as semantic features, format features, font features, size features, typesetting features, and language features of the text.
  • feature information such as semantic features, format features, font features, size features, typesetting features, and language features of the text.
  • Each feature can be obtained separately through a neural network, and weighted average processing is performed to obtain the first text feature.
  • the weight may be determined based on information such as the type of the image and the number of characters. For example, if the image to be processed is a calligraphy image with only a few characters, the weight of semantic features can be made lower, and the weights of format features, font features, and size features can be higher. For another example, if the image to be processed is a screenshot of a chat interface or a news screenshot that includes a lot of text, the weight of the semantic feature can be made higher, and the weight of other features can be made lower.
  • the above features may also be fused in other ways to obtain the first text feature.
  • the above-mentioned features can be expressed in the form of a feature matrix or a feature vector, and the above-mentioned features can be multiplied to obtain the first text feature.
  • the present disclosure does not limit the manner of determining the first text feature.
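  • As a hedged sketch of the weighted-average fusion described above; the aspect names, dimensions, and weight values here are hypothetical, and in practice the weights would come from information such as the image type or the number of characters.
```python
import numpy as np

# Hypothetical per-aspect text features (semantic, format, font, size,
# typesetting, language), each already encoded as a vector of the same dimension.
sub_features = {
    "semantic":    np.random.rand(128),
    "format":      np.random.rand(128),
    "font":        np.random.rand(128),
    "size":        np.random.rand(128),
    "typesetting": np.random.rand(128),
    "language":    np.random.rand(128),
}

# Example weights for a text-heavy image such as a chat screenshot:
# semantics dominates, style-related aspects contribute less.
weights = {"semantic": 0.5, "format": 0.1, "font": 0.1,
           "size": 0.1, "typesetting": 0.1, "language": 0.1}

first_text_feature = sum(weights[k] * v for k, v in sub_features.items())
```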
  • In some embodiments, the reference image is an image to be compared with the image to be processed. For example, the reference image and the image to be processed are images of the same type, e.g., both are calligraphy images or advertising images containing little text, or both are screenshots of chat interfaces or news pages containing a lot of text.
  • the reference image can also be any image in the image library, not necessarily the same type as the image to be processed. The present disclosure does not limit the types of the reference image and the image to be processed.
  • the acquisition manner of the second image feature and the second text feature of the reference image may be the same as the acquisition manner of the first image feature and the first text feature of the image to be processed respectively.
  • the second image feature and the second text feature of the reference image can be obtained in advance and stored in the feature library corresponding to the above image library.
  • the image similarity between the first image feature and the second image feature, and the text similarity between the first text feature and the second text feature may be determined respectively.
  • the above features may be feature information in the form of a feature matrix or feature vector, and the above similarity may be determined by determining parameters such as cosine similarity, Jaccard similarity coefficient, Pearson correlation coefficient, and relative entropy.
  • the present disclosure does not limit the calculation methods of image similarity and text similarity.
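  • A minimal sketch of one of the listed measures (cosine similarity) applied to feature vectors; the 128-dimensional random vectors only stand in for the first and second image and text features and are assumptions for illustration.
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two feature vectors (feature matrices can be flattened first).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Stand-in feature vectors; in practice these come from the feature extraction step.
first_image_feature, second_image_feature = np.random.rand(128), np.random.rand(128)
first_text_feature, second_text_feature = np.random.rand(128), np.random.rand(128)

image_similarity = cosine_similarity(first_image_feature, second_image_feature)
text_similarity = cosine_similarity(first_text_feature, second_text_feature)
```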
  • The image similarity and the text similarity can be comprehensively calculated to determine the image matching result between the image to be processed and the reference image, and the matching result may include the obtained comprehensive similarity.
  • the reference image can be matched one-to-one with the image to be processed, and when the comprehensive similarity is greater than or equal to the similarity threshold, it is determined that the two match.
  • multiple reference images in the image library may be matched with the image to be processed, and the target image with the highest matching degree with the image to be processed may be determined in the image library.
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 2, based on FIG. 1, the method further includes:
  • the image library may include multiple candidate images, and the image similarity between the first image feature of the image to be processed and the second image feature of each candidate image may be determined respectively , and determine the text similarity between the first text feature of the image to be processed and the second text feature of each candidate image, and then determine the comprehensive similarity between the image to be processed and each candidate image.
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S102 can be implemented through S301 to S303 , which will be described in conjunction with the steps shown in FIG. 3 .
  • In these embodiments, the image similarity is used as a screening condition for the reference image: the image similarity between the image to be processed and each candidate image is determined, and then, according to the image similarity, reference images with higher image similarity can be screened out from the multiple candidate images. For example, images with an image similarity higher than a threshold can be screened out as reference images, or the n images (n is a positive integer) with the highest image similarity can be used as reference images.
  • The screened reference images have high similarity with the image to be processed at the pattern level, so the image with the highest matching degree is likely to come from these reference images. Therefore, when determining the text similarity, it is only necessary to determine the text similarity between the screened reference images and the image to be processed, and then the comprehensive similarity between these reference images and the image to be processed, which saves computation and improves processing efficiency.
  • The text similarity can also be used as a screening condition (for example, when the image to be processed mainly includes text content and contains little pattern information): first, the reference images with the highest text similarity with the image to be processed are found among the multiple candidate images, then the image similarity between these reference images and the image to be processed is determined, and then the comprehensive similarity between them is determined.
  • the present disclosure does not limit the screening conditions.
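  • One possible sketch of the two-stage screening described above (image similarity first, text similarity only for the screened candidates); the helper names and the choice of cosine similarity are assumptions, not requirements of the disclosure.
```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def two_stage_compare(query_img_feat, query_txt_feat, candidates, n=5):
    """candidates: list of (name, image_feature, text_feature) tuples from the library."""
    # Stage 1: screen n reference images by image similarity only.
    scored = [(cosine_similarity(query_img_feat, img), name, txt)
              for name, img, txt in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    screened = scored[:n]

    # Stage 2: compute the text similarity only for the screened reference images,
    # which saves computation compared with scoring the whole library.
    return [(name, img_sim, cosine_similarity(query_txt_feat, txt))
            for img_sim, name, txt in screened]
```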
  • Fig. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S103 in FIG. 1 can be implemented by at least one of S1031 and S1032, which will be described in conjunction with the steps shown in FIG. 4.
  • Both the image similarity and the text similarity can be numeric values in the form of percentages; for example, if the image similarity is 98% and the text similarity is 95%, the two can be multiplied and the product determined as the comprehensive similarity.
  • the weighted average of the two can also be calculated.
  • When calculating the weighted average, the two similarities can be considered to have the same importance; therefore, the weights of both can be set to 1 and the average of the two calculated directly.
  • the weights of the two may be determined first, and then weighted average processing is performed. There are many ways to determine the weight. For example, it can be based on the number of characters included in the text. When the number is large, the weight of the text similarity can be made higher, otherwise, the weight of the text similarity can be lowered. The weight value can also be determined through the characteristics of the image to be processed.
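  • The two combination rules above can be sketched as follows; the default text weight of 0.5 corresponds to treating both similarities as equally important, and all names are illustrative.
```python
def comprehensive_similarity(image_sim: float, text_sim: float,
                             mode: str = "weighted_average",
                             text_weight: float = 0.5) -> float:
    """Combine the image similarity and the text similarity into a comprehensive similarity."""
    if mode == "product":
        return image_sim * text_sim
    # Weighted average; text_weight = 0.5 gives a plain average of the two.
    return (1.0 - text_weight) * image_sim + text_weight * text_sim
```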
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may be implemented in the following manner: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • The type of the image to be processed may be determined according to the source of the image. For example, if the source of the image to be processed is a screenshot of a communication tool interaction page or a news website page, its type may be "screenshot of a communication tool interaction page" or "screenshot of a news website page"; if the source of the image to be processed is a street camera, its type may be "street view image"; or, if the source of the image to be processed is an access control camera, its type may be "face image", and so on.
  • the type of the image to be processed can also be determined according to the classification mark of the image.
  • The classification mark of the image can be added manually, or added automatically when the image to be processed is generated, such as the above "screenshot of a communication tool interaction page".
  • the type of the image to be processed can also be determined based on the first image feature.
  • The above-mentioned first image feature is feature information in the form of a matrix or vector, and the first image feature can be processed by deconvolution, activation, and other operations to obtain the type of the image to be processed; for example, it may be determined that the image to be processed is a face image, a street view image, a news screenshot, a calligraphy image, and the like.
  • the type of the image to be processed may also be defined and set by the user, and the present disclosure does not limit the manner of determining the type of the image to be processed.
  • The weight information may be determined based on the type of the image to be processed. For example, when the type of the image to be processed indicates that the image is mainly characterized by image features (such as face images or landscape images), the weight of the image similarity can be made higher and the weight of the text similarity lower. For another example, when the type indicates that the image is mainly characterized by text features (such as news screenshots or web page screenshots), the weight of the image similarity can be made lower and the weight of the text similarity higher.
  • corresponding weights may be set for each type of image in advance, and weight information may also be calculated based on the type.
  • The type of the image to be processed can be expressed in the form of probabilities.
  • For example, the probability that the type of the image to be processed is a news screenshot is 95%, and the probability that it is another type is 5%.
  • The weight information can be calculated based on these data.
  • various types of probabilities may be used as elements of a vector, and the vector may be activated to obtain the above weight information. The present disclosure does not limit the method of calculating the weight.
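  • A hedged sketch of the last idea, in which a hypothetical type-probability vector is activated into weight information; using softmax as the activation here, and mapping the "news screenshot" probability to the text-similarity weight, are illustrative assumptions rather than choices fixed by the disclosure.
```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical classifier output: 95% news screenshot, 5% other types.
type_probabilities = np.array([0.95, 0.05])

# Activate the probability vector into weights; a text-heavy type such as a
# news screenshot pushes more weight towards the text similarity.
text_weight, image_weight = softmax(type_probabilities)
```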
  • S1032. Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the area proportion of the region where the text is located may be determined based on the first image feature.
  • the first image feature can be used to represent the layout of the image, and the area ratio of the area where the text is located can be calculated based on the first image feature.
  • the weight information may be determined based on the area ratio.
  • the area proportion of the region where the text is located can be determined as the weight of the text similarity, and then the weight of the image similarity can be calculated.
  • the area ratio of the area where the text is located can be activated to obtain the above weight information.
  • the present disclosure does not limit the specific method for calculating the weight.
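  • A minimal sketch of the area-ratio weighting described above; the example values are illustrative only.
```python
def weights_from_text_area_ratio(area_ratio: float):
    """area_ratio: fraction of the image occupied by the text region, in [0, 1]."""
    text_weight = area_ratio          # weight of the text similarity
    image_weight = 1.0 - area_ratio   # weight of the image similarity
    return image_weight, text_weight

# e.g. text occupies 60% of a screenshot -> image weight 0.4, text weight 0.6
image_weight, text_weight = weights_from_text_area_ratio(0.6)
```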
  • The type of the image to be processed and the area ratio of the region where the text is located can also be obtained in other ways; for example, the type of the image to be processed can be determined through manual labeling, and the area proportion of the region where the text is located can likewise be determined manually.
  • The area ratio of the region where the text is located can also be determined based on attributes such as color and shape. For example, in images such as screenshots of news websites, the font is usually black, and the area of the region where the text is located can be determined from the proportion of black pixels.
  • the area where the font is located in the above screenshot is a neat row or column, and the area ratio of the area where the text is located can be determined based on the area of the rectangle presented by the row or column.
  • The present disclosure does not limit the manner of determining the type of the image to be processed or the area ratio of the region where the text is located.
  • the weight may be positively correlated with the area ratio, for example, the larger the area ratio of the text area, the lower the weight of the image similarity, and the higher the weight of the text similarity.
  • the method of determining the area ratio of the area where the text is located can also be used in scenarios where the text itself is also an image.
  • the text is a word art, and the word art itself is both text and an image.
  • the area ratio of the region where it is located is 100%, and the area ratio of the region where the image is located is also 100%, so the weights of the two can be made equal.
  • The number of words in the image can also be counted, and the interval to which the number of words belongs can be associated with a weight value: the more words, the higher the weight of the text similarity. For example, if the number of words is greater than or equal to 100, the weight of the text similarity is 0.8; if the number of words is greater than or equal to 50 and less than 100, the weight is 0.5; if the number of words is less than 50, the weight is 0.3. The present disclosure does not limit the correspondence between the number of words and the weight.
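  • The interval-to-weight correspondence in the example above can be sketched as follows; the cut-offs 100 and 50 and the weights 0.8, 0.5, and 0.3 are taken from the example and are not fixed by the disclosure.
```python
def text_weight_from_word_count(word_count: int) -> float:
    # More words -> higher weight for the text similarity.
    if word_count >= 100:
        return 0.8
    if word_count >= 50:
        return 0.5
    return 0.3
```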
  • In this way, the weight information of the image similarity and the text similarity can be determined by the type of the image to be processed or the area ratio of the text region, which improves the accuracy of the weight information and thereby the accuracy of the matching results.
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the characteristics of the image to be processed can be determined by the first image feature representing the feature information of the pattern level of the image to be processed.
  • The type of the image to be processed can be determined according to the first image feature in the above manner, or features such as the area ratio of the region where the text is located in the image to be processed can be determined, and the weight can then be determined based on these features.
  • the weight information may also be obtained directly through a trained network model according to the first image feature, which is not limited in the present disclosure.
  • weighted average processing may be performed on the image similarity and the text similarity based on the weight information to obtain a comprehensive similarity between the image to be processed and the reference image. Furthermore, a matching result can be obtained based on the comprehensive similarity. For example, when a one-to-one comparison is performed between the image to be processed and the reference image, it may be determined that the image to be processed matches the reference image when the comprehensive similarity is higher than a threshold. For another example, when the target image matching the image to be processed is determined in the image library, the reference image with the highest comprehensive similarity with the image to be processed may be determined as the target image matching the image to be processed.
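  • A hedged sketch of the two matching decisions described above; the threshold value is an assumption, as the disclosure does not fix one.
```python
SIMILARITY_THRESHOLD = 0.9  # assumed value for the one-to-one case

def one_to_one_match(comprehensive_sim: float) -> bool:
    # One-to-one comparison: matched when the comprehensive similarity reaches a threshold.
    return comprehensive_sim >= SIMILARITY_THRESHOLD

def best_match(comprehensive_sims: dict) -> str:
    # Library search: the reference image with the highest comprehensive similarity
    # is taken as the target image matching the image to be processed.
    return max(comprehensive_sims, key=comprehensive_sims.get)
```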
  • In this way, image features and text features can be obtained for comprehensive comparison. The text contained in the image is taken into account, and the type of the image to be processed or the area ratio of the region where the text is located is used to determine the weight information of the image similarity and the text similarity, from which the matching result is obtained. This reduces the probability of false positives when images have high similarity in color, texture, light and shade, layout, style, feature point position, etc., but inconsistent text content, and improves the accuracy of the matching result.
  • Fig. 5 shows a schematic diagram of the application of the image processing method according to an embodiment of the present disclosure.
  • the image to be processed is an image including text content
  • For the image to be processed, the image similarity between the first image feature of the image to be processed and the second image feature of the reference image, and the text similarity between the first text feature of the image to be processed and the second text feature of the reference image, can be comprehensively considered.
  • the second image feature and the second text feature of each reference image in the image library may be acquired in advance, and stored in the feature library for comparison with the features of the image to be processed.
  • the first image feature and the first text feature of the image to be processed may be extracted.
  • The first image feature of the image to be processed can be extracted through a convolutional neural network, the text information in the image can be obtained through OCR technology, and the first text feature of the image to be processed can be obtained through a recurrent neural network.
  • The first image feature can be used for screening: from the images in the image library, the n images with the highest image similarity can be selected as reference images, and the text similarity between the first text feature and the second text features of these reference images can then be determined. Furthermore, the comprehensive similarity between the image to be processed and these reference images can be determined through weighted average processing.
  • the weights of the text similarity and the image similarity may be determined, for example, the weight of the text similarity may be determined according to the area proportion of the region where the text is located in the image to be processed.
  • The area proportion of the region where the text is located can be determined based on the first image feature of the image to be processed; the area proportion is then determined as the weight x of the text similarity, and 1-x is determined as the weight of the image similarity.
  • weighted average processing can be performed on the image similarity and text similarity based on the above weights, so as to respectively determine the comprehensive similarity between the image to be processed and each reference image, And the reference image with the highest comprehensive similarity is determined as the target image matching the image to be processed.
  • The image processing method can be used in fields such as network supervision. For example, for a website with strong anti-crawling measures, it is difficult to directly supervise the text content it publishes. Screenshots of the content published on the website can be taken and compared with preset images containing specific words or sentences to determine whether those words or sentences appear in the screenshots, and thus whether the content of the website includes them. In this way, publishers of website content can be effectively supervised.
  • the present disclosure does not limit the application scenarios of the image processing method.
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
  • Fig. 7 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • First, S701 is performed to obtain a retrieval picture; then, S7021 (image feature comparison) and S7023 (text feature comparison) are performed on the retrieval picture respectively; then, through the similarity calculation of S7022 and the similarity calculation of S7024, the similarity results between the retrieval picture and A_i, B_i, C_i, and D_i in the image feature library are obtained, as well as the similarity results between the retrieval picture and A_w, B_w, C_w, and D_w in the text feature library.
  • Table 1 summarizes the similarity results:

| Library image | Image feature similarity | Text feature similarity |
| --- | --- | --- |
| A (A_i / A_w) | 94% | 98% |
| B (B_i / B_w) | 96% | 90% |
| C (C_i / C_w) | 91% | 85% |
| D (D_i / D_w) | 80% | 60% |
  • The comparison logic of S703 in the related art is: compare the retrieval picture with the image feature library, and determine the image corresponding to the maximum similarity result as the image most similar to the retrieval picture. As shown in Table 1, the maximum similarity result obtained by comparing the retrieval picture with the image features in the image feature library is 96%, and the image corresponding to this maximum similarity result is image B; therefore, image B is determined to be the image most similar to the retrieval picture.
  • The comparison logic of S704 in the present disclosure is: compare the retrieval picture with the image features in the image feature library and the text features in the text feature library respectively, add together the two similarity results of the image corresponding to each pair of image and text features (for example, the results for A_i and A_w) to obtain a comprehensive similarity, and determine the image corresponding to the maximum comprehensive similarity as the image most similar to the retrieval picture. It can be seen from Table 1 that the comprehensive similarity result of the retrieval picture and image A is 192%, that with image B is 186%, that with image C is 176%, and that with image D is 140%. Therefore, the comprehensive similarity between the retrieval picture and image A is the largest, and image A is determined to be the image most similar to the retrieval picture.
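  • The two comparison logics of S703 and S704 can be reproduced from the Table 1 values with a few lines; the percentages are kept as integers, and the names A to D denote the library images.
```python
# Similarity results from Table 1 (in percent).
image_sim = {"A": 94, "B": 96, "C": 91, "D": 80}
text_sim = {"A": 98, "B": 90, "C": 85, "D": 60}

# S703: image features only -> B (96%) is judged most similar.
best_by_image = max(image_sim, key=image_sim.get)

# S704: add the two similarity results -> A: 192, B: 186, C: 176, D: 140,
# so A is judged most similar to the retrieval picture.
comprehensive = {k: image_sim[k] + text_sim[k] for k in image_sim}
best_by_comprehensive = max(comprehensive, key=comprehensive.get)
```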
  • a related image comparison method is to extract depth features (corresponding to the image features in the above-mentioned embodiments) of the image, and perform similarity calculation based on the image depth features, and then obtain a comparison result.
  • In such cases, the image features are often relatively similar. If the comparison is performed only through traditional image comparison methods, there may be more false positive data, that is, images that are relatively similar but whose text content is completely different.
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in FIG. 8, the device includes: a feature extraction part 801 configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; a similarity determining part 802 configured to respectively determine the image similarity between the first image feature and the second image feature of a reference image, and the text similarity between the first text feature and the second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and a matching part 803 configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • In some embodiments, the apparatus further includes: a target image determining part configured to determine, according to the image matching results between the image to be processed and at least two reference images, a target image in the at least two reference images that matches the image to be processed.
  • In some embodiments, the similarity determining part 802 is further configured to: respectively determine the image similarity between the first image feature and the second image features of at least two candidate images; determine a reference image among the at least two candidate images according to the image similarity; and determine the text similarity between the first text feature and the second text feature of the reference image.
  • the image matching result includes a comprehensive similarity between the reference image and the image to be processed
  • The matching part 803 is further configured to: determine the product of the image similarity and the text similarity as the comprehensive similarity; or determine the weighted average of the image similarity and the text similarity as the comprehensive similarity.
  • In some embodiments, the matching part 803 is further configured to: determine the type of the image to be processed; determine weight information according to the type of the image to be processed; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In some embodiments, the matching part 803 is further configured to: determine the area ratio of the region where the text is located in the image to be processed; determine the weight information according to the area ratio; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the matching part 803 is further configured to: determine weight information according to the first image feature; compare the image similarity and the text similarity according to the weight information Perform weighted average processing to obtain the comprehensive similarity.
  • the first text features include at least one of semantic features, format features, font features, size features, typesetting features, and language features.
  • the functions or parts included in the apparatus provided by the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the implementation manner may refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • the computer readable storage medium may be a non-transitory computer readable storage medium.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure also provide another computer program product configured to store computer-readable instructions, and when the instructions are executed, the computer executes the operations of the image processing method provided by any of the above embodiments.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • FIG. 9 shows a block diagram of an electronic device 900 according to an embodiment of the present disclosure.
  • the electronic device 900 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • The electronic device 900 may include one or more of the following components: a processing component 901, a memory 902, a power supply component 903, a multimedia component 904, an audio component 905, an input/output (I/O) interface 906, a sensor component 907, and a communication component 908.
  • The processing component 901 generally controls the overall operations of the electronic device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • The processing component 901 may include one or more processors 909 to execute instructions to complete all or part of the steps of the above method. Additionally, the processing component 901 may include one or more modules to facilitate interaction between the processing component 901 and other components. For example, the processing component 901 may include a multimedia module to facilitate interaction between the multimedia component 904 and the processing component 901.
  • the memory 902 is configured to store various types of data to support operations at the electronic device 900 . Examples of such data include instructions for any application or method operating on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and the like.
  • The memory 902 can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the power supply component 903 provides power to various components of the electronic device 900 .
  • Power supply components 903 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 900 .
  • the multimedia component 904 includes a screen providing an output interface between the electronic device 900 and the user.
  • the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Panel, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense an edge of a touch or slide action, but also detect a duration and pressure associated with the touch or slide operation.
  • the multimedia component 904 includes at least one of a front camera and a rear camera.
  • At least one of the front camera and the rear camera can receive external multimedia data.
  • Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 905 is configured to at least one of output and input an audio signal.
  • the audio component 905 includes a microphone (Microphone, MIC), which is configured to receive external audio signals when the electronic device 900 is in an operation mode, such as a calling mode, a recording mode and a voice recognition mode. Received audio signals may be further stored in memory 902 or sent via communication component 908 .
  • the audio component 905 also includes a speaker configured to output audio signals.
  • the I/O interface 906 provides an interface between the processing component 901 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 907 includes one or more sensors configured to provide various aspects of status assessment for electronic device 900 .
  • The sensor component 907 can detect the open/closed state of the electronic device 900 and the relative positioning of components (for example, the display and the keypad of the electronic device 900). The sensor component 907 can also detect changes in the position of the electronic device 900 or one of its components, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and temperature changes of the electronic device 900.
  • the sensor assembly 907 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor assembly 907 may also include an optical sensor, such as a CMOS image sensor (Complementary Metal-Oxide-Semiconductor, CMOS) or a CCD image sensor (Charge Coupled Device, CCD), configured to be used in imaging applications.
  • the sensor component 907 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 908 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices.
  • the electronic device 900 can access a wireless network based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 908 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 908 further includes a near field communication (Near Field Communication, NFC) module to facilitate short-range communication.
  • The NFC module can be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • The electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, configured to execute the above method.
  • a non-volatile computer-readable storage medium such as a memory 902 including computer program instructions, which can be executed by the processor 909 of the electronic device 900 to implement the above method.
  • FIG. 10 shows a block diagram of an electronic device 1000 according to an embodiment of the present disclosure.
  • the electronic device 1000 may be provided as a server.
  • electronic device 1000 includes processing component 1001 , which also includes one or more processors, and a memory resource represented by memory 1002 configured to store instructions executable by processing component 1001 , such as application programs.
  • the application program stored in memory 1002 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1001 is configured to execute instructions to perform the above method.
  • the electronic device 1000 may also include a power supply component 1003 configured to perform power management of the electronic device 1000 , a wired or wireless network interface 1004 configured to connect the electronic device 1000 to a network, and an I/O interface 1005 .
  • the electronic device 1000 can operate based on an operating system stored in the memory 1002, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
  • a non-volatile computer-readable storage medium such as the memory 1002 including computer program instructions, which can be executed by the processing component 1001 of the electronic device 1000 to implement the above method.
  • the present disclosure may be at least one of a system, method and computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, Random-Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM) or flash memory, Static Random-Access Memory (SRAM), portable Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as at least one of the Internet, a local area network, a wide area network, and a wireless network .
  • the network may include at least one of copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Electronic circuits, such as programmable logic circuits, Field Programmable Gate Arrays (FPGA), or Programmable Logic Arrays (PLA), can execute computer-readable program instructions, thereby implementing various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that these instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of at least one of the flowcharts and the block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause at least one of computers, programmable data processing devices, and other devices to work in a specific way, so that the computer-readable medium storing the instructions then includes an article of manufacture, including instructions for implementing various aspects of the functions/actions specified in one or more blocks of at least one of the flowcharts and the block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of at least one of the block diagrams and flowcharts, and combinations of blocks of at least one of the block diagrams and flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • the computer program product may be embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • the image processing method of the embodiment of the present disclosure can obtain image features and text features for comprehensive comparison.
  • the text contained in the image is considered during the comparison, and the weight information of the image similarity and the text similarity is determined according to the type of the image to be processed or the area ratio of the region where the text is located, so as to obtain the matching result.
  • the determined text similarity can be used to reduce the probability of false positives in image matching, which improves the accuracy of the matching results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: performing feature extraction on an image to be processed, to obtain a first image feature and a first text feature; respectively determining the image similarity between the first image feature and a second image feature of a reference image and the text similarity between the first text feature and a second text feature of the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.

Description

Image processing method and device, electronic device, storage medium and computer program product
Cross References to Related Applications
The present disclosure claims priority to Chinese patent application No. 202111435625.7, filed on November 29, 2021 by the applicant Shenzhen Shangtang Technology Co., Ltd. and entitled "Image processing method and device, electronic equipment and storage medium", which is incorporated by reference in its entirety into the present disclosure.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an image processing method and device, electronic equipment, storage media and computer program products.
Background
In related technologies, the image comparison method extracts features from images, performs similarity calculation based on the image features, and then obtains a comparison result. However, when text content is included in an image, there may be two or more images with high similarity in color, texture, light and shade, layout, style, feature point positions and the like, but different text content (for example, screenshots of the chat interface of social software, news screenshots, etc.).
Summary of the Invention
Embodiments of the present disclosure provide at least an image processing method, apparatus, device, storage medium, and computer program product.
The technical solutions of the embodiments of the present disclosure are implemented as follows:
According to an aspect of the embodiments of the present disclosure, an image processing method is provided, including: performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, which reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
In a possible implementation, the method further includes: determining, according to image matching results between the image to be processed and at least two reference images, a target image matching the image to be processed among the at least two reference images.
In a possible implementation, respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, includes: respectively determining image similarities between the first image feature and second image features of at least two candidate images; determining the reference image from the at least two candidate images according to the image similarities; and determining the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, where determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity includes one of the following: determining the product of the image similarity and the text similarity as the comprehensive similarity; or determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In this way, the weight information of the image similarity and the text similarity can be determined according to the type of the image to be processed or the area ratio of the region where the text is located, which can improve the accuracy of the weight information and thereby improve the accuracy of the matching result.
In a possible implementation, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the first text feature includes at least one of the following: a semantic feature, a format feature, a font feature, a size feature, a typesetting feature, and a language feature.
According to an aspect of the embodiments of the present disclosure, an image processing device is provided, including: a feature extraction part configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; a similarity determining part configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and a matching part configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to an aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory configured to store processor-executable instructions, where the processor is configured to call the instructions stored in the memory to implement the above method.
According to an aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
According to an aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, the above method is implemented.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification. These drawings show embodiments consistent with the present disclosure, and are used together with the specification to explain the technical solutions of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description of the Embodiments
Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
The term "at least one" in the present disclosure means any one of multiple items or any combination of at least two of the multiple items. For example, including at least one of A, B, and C may mean including any one or more elements selected from the set formed by A, B, and C.
In addition, in order to better illustrate the present disclosure, numerous details are given in the following specific embodiments. Those skilled in the art should understand that the present disclosure can also be implemented without certain details. In some embodiments, methods, means, components, and circuits well known to those skilled in the art are not described in detail in order to highlight the gist of the present disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The method is executed by an electronic device and will be described with reference to the steps shown in FIG. 1.
S101. Perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed;
S102. Respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image;
S103. Determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, which reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
In related technologies, in the process of determining whether two images match, feature extraction processing is usually performed on the two images, for example, through a deep learning neural network, to obtain the image features of the two images respectively, and the similarity between the image features of the two images is determined; if the similarity is higher than a threshold, the two images can be considered to match. This method can be used in fields such as face recognition and object recognition. The image features extracted in this way can usually describe pattern-level feature information such as the color, texture, light and shade, layout, style, and feature point positions of an image, but when an image includes text information, that text information is difficult to recognize. For example, there may be cases where two images have similar patterns but different text information (for example, screenshots of the chat interface of social software); in such cases, although the image features obtained from the two images have a high similarity, the meanings expressed by the two images are very different, so it is difficult to determine whether such images match based on image features alone.
In a possible implementation, to address the above problem, the image features and text features of the images participating in the comparison can be obtained separately, and whether the images participating in the comparison match can be judged comprehensively based on the image similarity between the image features and the text similarity between the text features. The text features can describe feature information such as the semantics, format, font, size, typesetting (including the position of the text in the image), and language of the text. The present disclosure does not limit the feature information described by the text features.
In a possible implementation, in S101, the image to be processed may be an image that includes text information, for example, an image that includes one or more characters. When determining whether the image to be processed matches the reference image, the judgment can be made comprehensively based on the image similarity and the text similarity between the two.
In a possible implementation, feature extraction processing may be performed on the image to be processed to obtain the first image feature and the first text feature of the image to be processed. The image to be processed can be processed through a deep learning neural network to obtain the above features. For example, the first image feature of the image to be processed may be extracted through a convolutional neural network. The text information in the image (for example, the content of the text, the position of the region where the text is located, the shape of the characters, etc.) can be obtained through Optical Character Recognition (OCR) technology, and the first text feature can be obtained through a recurrent neural network. The present disclosure does not limit the manner in which the first image feature and the first text feature are acquired.
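As an illustration of this extraction step, the following is a minimal sketch. The specific models and the OCR call (a torchvision ResNet backbone, pytesseract for OCR, and a GRU text encoder) are assumptions chosen for illustration; the embodiments do not prescribe particular networks.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pytesseract

# Assumed backbone: a ResNet-50 truncated before the classifier, used as a
# generic convolutional extractor for the first image feature.
_backbone = torch.nn.Sequential(*list(models.resnet50(weights="DEFAULT").children())[:-1])
_backbone.eval()
_preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def extract_image_feature(img: Image.Image) -> torch.Tensor:
    """First image feature: a pattern-level embedding produced by a CNN."""
    with torch.no_grad():
        x = _preprocess(img.convert("RGB")).unsqueeze(0)
        return _backbone(x).flatten(1).squeeze(0)   # e.g. a 2048-dimensional vector

def extract_text(img: Image.Image) -> str:
    """Text content recovered by OCR; positions and character shapes are also
    available from the OCR engine but are omitted in this sketch."""
    return pytesseract.image_to_string(img)

class TextEncoder(torch.nn.Module):
    """Assumed recurrent encoder that turns tokenized OCR output into the first text feature."""
    def __init__(self, vocab_size: int = 10000, dim: int = 256):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.rnn = torch.nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (1, sequence_length) integer tensor from an assumed tokenizer.
        _, h = self.rnn(self.embed(token_ids))
        return h.squeeze(0).squeeze(0)              # a 256-dimensional text embedding
```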
In a possible implementation, the first text feature may include at least one of feature information such as the semantic feature, format feature, font feature, size feature, typesetting feature, and language feature of the text. Each feature can be obtained separately, for example through a neural network, and weighted average processing is performed to obtain the first text feature. When determining the weights, the weights may be determined based on information such as the type of the image and the number of characters. For example, if the image to be processed is a calligraphy image with only a few characters, the weight of the semantic feature can be made lower, and the weights of the format feature, font feature, size feature, etc. can be made higher. For another example, if the image to be processed is an image such as a chat interface screenshot or a news screenshot that includes a lot of text, the weight of the semantic feature can be made higher and the weights of the other features lower.
In a possible implementation, the above features may also be fused in other ways to obtain the first text feature. For example, the above features can be expressed in the form of feature matrices or feature vectors, and the above features can be multiplied or otherwise combined to obtain the first text feature. The present disclosure does not limit the manner in which the first text feature is determined.
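The weighted-average fusion of the text sub-features described above can be sketched as follows; the sub-feature names and weights are illustrative assumptions rather than prescribed values.

```python
import numpy as np

def fuse_text_features(sub_features: dict, weights: dict) -> np.ndarray:
    """Weighted average of text sub-features (semantic, format, font, size,
    typesetting, language), each given as a vector of the same dimension."""
    total = sum(weights[k] for k in sub_features)
    return sum(weights[k] * sub_features[k] for k in sub_features) / total

# For example, for a calligraphy image with only a few characters, the semantic
# feature could be down-weighted relative to font and size:
# fused = fuse_text_features(
#     {"semantic": f_semantic, "font": f_font, "size": f_size},
#     {"semantic": 0.2, "font": 0.4, "size": 0.4})
```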
In a possible implementation, in S102, the reference image is an image to be compared with the image to be processed. For example, the reference image and the image to be processed may be images of the same type, for example, both calligraphy images or advertising images containing little text, or both chat interface screenshots or news screenshots containing a lot of text. The reference image may also be any image in an image library, and does not have to be of the same type as the image to be processed. The present disclosure does not limit the types of the reference image and the image to be processed.
In a possible implementation, the second image feature and the second text feature of the reference image may be acquired in the same manner as the first image feature and the first text feature of the image to be processed, respectively. In an example, the second image feature and the second text feature of the reference image can be acquired in advance and stored in a feature library corresponding to the above image library, so that when matching is performed, there is no need to perform feature extraction on the reference image again, which improves the matching efficiency.
In a possible implementation, the image similarity between the first image feature and the second image feature, and the text similarity between the first text feature and the second text feature, may be determined respectively. In an example, the above features may be feature information in the form of feature matrices or feature vectors, and the above similarities may be determined by computing parameters such as cosine similarity, the Jaccard similarity coefficient, the Pearson correlation coefficient, or relative entropy. The present disclosure does not limit the calculation methods of the image similarity and the text similarity.
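For feature vectors, the cosine similarity mentioned above could be computed as in the sketch below; any of the other listed measures could be substituted. The commented-out feature variables are assumed to come from the extraction step described earlier.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# image_similarity = cosine_similarity(first_image_feature, second_image_feature)
# text_similarity = cosine_similarity(first_text_feature, second_text_feature)
```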
In a possible implementation, in S103, the image similarity and the text similarity can be comprehensively calculated to determine the image matching result between the image to be processed and the reference image, and the matching result may include a comprehensive similarity obtained through the above comprehensive calculation. The reference image can be matched one-to-one with the image to be processed, and when the comprehensive similarity is greater than or equal to a similarity threshold, the two are determined to match. Alternatively, multiple reference images in the image library may each be matched with the image to be processed, and the target image with the highest matching degree with the image to be processed is determined in the image library.
FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 2, based on FIG. 1, the method further includes:
S201. Determine, according to image matching results between the image to be processed and at least two reference images, a target image matching the image to be processed among the at least two reference images.
In a possible implementation, the image library may include multiple candidate images. The image similarity between the first image feature of the image to be processed and the second image feature of each candidate image may be determined respectively, the text similarity between the first text feature of the image to be processed and the second text feature of each candidate image may be determined, and then the comprehensive similarity between the image to be processed and each candidate image may be determined.
In a possible implementation, if the number of candidate images in the image library is large, calculating the above two similarities for all images one by one requires a large amount of calculation and is inefficient. To solve this problem, some images can first be determined as reference images from the large number of candidate images, and then the comprehensive similarity between the image to be processed and the reference images can be calculated.
FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 3, based on FIG. 2, S102 can be implemented through S301 to S303, which will be described with reference to the steps shown in FIG. 3.
S301. Respectively determine image similarities between the first image feature and second image features of at least two candidate images;
S302. Determine a reference image from the at least two candidate images according to the image similarities;
S303. Determine the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation, the image similarity is first used as a screening condition for the reference images, and the image similarity between the image to be processed and each candidate image is determined. Then, according to the image similarities, reference images with higher image similarity can be selected from the multiple candidate images. For example, images whose image similarity is higher than a threshold can be selected as reference images, or the n (n is a positive integer) images with the highest image similarity can be selected as reference images. The selected reference images have a high similarity to the image to be processed at the pattern level; therefore, the image with the highest matching degree with the image to be processed is likely to come from these reference images. Thus, when determining the text similarity, it is only necessary to determine the text similarity between the selected reference images and the image to be processed, and then the comprehensive similarity between the selected reference images and the image to be processed, which saves calculation and improves processing efficiency.
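A minimal sketch of this two-stage screening is given below (top-n candidates by image similarity, then text similarity only for the selected reference images); the data layout and helper names are assumptions.

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def screen_and_compare(query_img_feat, query_txt_feat, candidates, n=10):
    """candidates: list of (image_id, second_image_feature, second_text_feature)."""
    # Stage 1: rank all candidates by image similarity and keep the top n as reference images.
    scored = [(_cos(query_img_feat, img_f), image_id, txt_f)
              for image_id, img_f, txt_f in candidates]
    reference = sorted(scored, key=lambda t: t[0], reverse=True)[:n]
    # Stage 2: compute text similarity only for the selected reference images.
    return [(image_id, img_sim, _cos(query_txt_feat, txt_f))
            for img_sim, image_id, txt_f in reference]
```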
Of course, the text similarity can also be used as the screening condition (for example, when the image to be processed mainly includes text content and contains little pattern information, the text similarity can be used as the screening condition). In this case, the reference images with the highest text similarity to the image to be processed are first selected from the multiple candidate images, then the image similarity between the reference images and the image to be processed is determined, and then the comprehensive similarity between the reference images and the image to be processed is determined. The present disclosure does not limit the screening condition.
FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 4, based on FIG. 1, S103 in FIG. 1 can be implemented through at least one of S1031 and S1032, which will be described with reference to the steps shown in FIG. 4.
S1031. Determine the product of the image similarity and the text similarity as the comprehensive similarity;
S1032. Determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
In a possible implementation, both the image similarity and the text similarity may be numerical values in percentage form, for example, an image similarity of 98% and a text similarity of 95%. The two can be multiplied, and the product is determined as the comprehensive similarity.
In another possible implementation, a weighted average of the two can also be calculated. For example, the two similarities can be considered equally important; therefore, the weights of both can be set to 1 and the average of the two calculated directly.
Alternatively, the weights of the two may be determined first, and then weighted average processing is performed. The weights can be determined in many ways. For example, they can be based on the number of characters included in the text: when the number is large, the weight of the text similarity can be made higher; otherwise, the weight of the text similarity is made lower. The weights can also be determined through the characteristics of the image to be processed itself.
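The two combination rules of S1031 and S1032 amount to the following sketch; the default weights are placeholders to be replaced by weight information determined as described below.

```python
def comprehensive_similarity_product(image_sim: float, text_sim: float) -> float:
    """S1031: product of the two similarities, e.g. 0.98 * 0.95 = 0.931."""
    return image_sim * text_sim

def comprehensive_similarity_weighted(image_sim: float, text_sim: float,
                                      w_image: float = 1.0, w_text: float = 1.0) -> float:
    """S1032: weighted average; with equal weights this reduces to the plain average."""
    return (w_image * image_sim + w_text * text_sim) / (w_image + w_text)
```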
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can be implemented in the following manner: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the type of the image to be processed may be determined according to the source of the image. For example, if the source of the image to be processed is a screenshot of a communication tool interaction page or a news website page, its type may be "communication tool interaction page screenshot" or "news website page screenshot"; if the source of the image to be processed is a street camera, its type may be "street view image"; and if the source of the image to be processed is an access control camera, its type may be "face image", and so on.
In a possible implementation, the type of the image to be processed can also be determined according to a classification mark of the image. The classification mark of the image can be added manually or automatically when the image to be processed is generated, such as the above "communication tool interaction page screenshot".
In a possible implementation, the type of the image to be processed can also be determined based on the first image feature. For example, when the above first image feature is feature information in the form of a matrix or vector, the first image feature can be processed by deconvolution, activation, etc. to obtain the type of the image to be processed. For example, it may be determined that the image to be processed is a face image, a street view image, a news screenshot, a calligraphy image, or another type. The type of the image to be processed may also be defined and set by the user. The present disclosure does not limit the manner of determining the type of the image to be processed.
In a possible implementation, the weight information can be determined based on the type of the image to be processed. For example, when the type of the image to be processed indicates that the image is mainly characterized by image features (for example, a face image or a landscape image), the weight of the image similarity can be made higher and the weight of the text similarity lower. For another example, when the type of the image to be processed indicates that the image is mainly characterized by text features (for example, a news screenshot or a web page screenshot), the weight of the image similarity can be made lower and the weight of the text similarity higher.
In a possible implementation, corresponding weights may be set in advance for each type of image, or the weight information may be calculated based on the type. For example, the type of the image to be processed can be expressed in the form of probabilities; for example, if the probability that the type of the image to be processed is a news screenshot is 95% and the probability that it is another type is 5%, the weight information can be calculated from this data. For example, the probabilities of the various types may be used as elements of a vector, and the vector may be activated to obtain the above weight information. The present disclosure does not limit the method of calculating the weights.
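One possible sketch of the type-based weighting is given below: a preset per-type weight table, plus one reading of the probability-vector activation mentioned above (softmax is assumed as the activation, and the table values are illustrative only).

```python
import numpy as np

# Illustrative presets: (weight of image similarity, weight of text similarity) per type.
TYPE_WEIGHTS = {
    "face image": (0.9, 0.1),
    "street view image": (0.8, 0.2),
    "news screenshot": (0.2, 0.8),
    "communication tool interaction page screenshot": (0.2, 0.8),
}

def weights_from_type(image_type: str):
    """Look up preset weights for a known image type; fall back to equal weights."""
    return TYPE_WEIGHTS.get(image_type, (0.5, 0.5))

def weights_from_type_probs(type_probs: np.ndarray):
    """Alternative reading: treat per-type probabilities (ordered as in TYPE_WEIGHTS)
    as a vector, activate it, and blend the per-type text weights accordingly."""
    acts = np.exp(type_probs) / np.exp(type_probs).sum()     # softmax activation
    text_weights = np.array([w for _, w in TYPE_WEIGHTS.values()])
    w_text = float(acts @ text_weights)
    return 1.0 - w_text, w_text
```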
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can also be implemented in the following manner: determining the area ratio of the region where the text is located in the image to be processed; determining the weight information according to the area ratio; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
In a possible implementation, the area ratio of the region where the text is located can be determined based on the first image feature. The first image feature can be used to represent the layout of the image, and the area ratio of the region where the text is located can be calculated based on the first image feature.
In a possible implementation, the weight information can be determined based on the area ratio. For example, the area ratio of the region where the text is located can be determined as the weight of the text similarity, from which the weight of the image similarity can then be calculated. For another example, the area ratio of the region where the text is located can be activated to obtain the above weight information. The present disclosure does not limit the specific method of calculating the weights.
In a possible implementation, the type of the image to be processed and the area ratio of the region where the text is located can also be obtained in other ways. For example, the type of the image to be processed can be determined by manual annotation, and the area ratio of the region where the text is located can be determined by manual measurement. In an example, the area ratio of the region where the text is located can also be determined based on attributes such as color and shape. For example, in images such as news website screenshots, the font is usually black, and the area ratio of the region where the text is located can be determined by the proportion of black; alternatively, the region where the font is located in such a screenshot forms neat rows or columns, and the area ratio of the region where the text is located can be determined based on the area of the rectangles formed by the rows or columns. The present disclosure does not limit the manner of determining the type of the image to be processed or the area ratio of the region where the text is located.
In a possible implementation, the weight may be positively correlated with the area ratio. For example, the larger the area ratio of the region where the text is located, the lower the weight of the image similarity and the higher the weight of the text similarity.
In a possible implementation, the manner of determining the area ratio of the region where the text is located can also be applied to scenarios where the text itself is also an image. For example, when the text is word art, the word art itself is both text and image; it can be determined that the area ratio of the region where the text is located is 100% and the area ratio of the region where the image is located is also 100%, so the two weights can be made equal.
In a possible implementation, the number of characters in the image can also be counted. For example, the interval to which the number of characters belongs can be mapped to a weight: the more characters, the higher the weight of the text similarity. For example, if the number of characters is greater than or equal to 100, the weight of the text similarity is 0.8; if the number of characters is greater than or equal to 50 and less than 100, the weight of the text similarity is 0.5; and if the number of characters is less than 50, the weight of the text similarity is 0.3. The present disclosure does not limit the correspondence between the number of characters and the weight.
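The two heuristics above can be sketched directly; the area-ratio rule and the character-count thresholds follow the values given in the text.

```python
def weights_from_area_ratio(text_area_ratio: float):
    """Weight of text similarity = area ratio of the text region; image similarity gets the rest."""
    return 1.0 - text_area_ratio, text_area_ratio

def text_weight_from_char_count(num_chars: int) -> float:
    """Character-count intervals mapped to the weight of text similarity."""
    if num_chars >= 100:
        return 0.8
    if num_chars >= 50:
        return 0.5
    return 0.3
```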
In this way, the weight information of the image similarity and the text similarity can be determined according to the type of the image to be processed or the area ratio of the region where the text is located, which can improve the accuracy of the weight information and thereby improve the accuracy of the matching result.
In a possible implementation, S1032, determining the weighted average of the image similarity and the text similarity as the comprehensive similarity, can also be implemented in the following manner: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity. The characteristics of the image to be processed itself can be determined through the first image feature, which represents the pattern-level feature information of the image to be processed. For example, the type of the image to be processed can be determined from the first image feature as described above, or characteristics such as the area ratio of the region where the text is located in the image to be processed can be determined, and the weights are determined based on these characteristics. Alternatively, the weight information can also be obtained directly from the first image feature through a trained network model, which is not limited in the present disclosure.
In a possible implementation, after the weight information is determined, weighted average processing may be performed on the image similarity and the text similarity based on the weight information to obtain the comprehensive similarity between the image to be processed and the reference image, and a matching result can then be obtained based on the comprehensive similarity. For example, when a one-to-one comparison is performed between the image to be processed and the reference image, it may be determined that the image to be processed matches the reference image when the comprehensive similarity is higher than a threshold. For another example, when a target image matching the image to be processed is to be determined in the image library, the reference image with the highest comprehensive similarity to the image to be processed may be determined as the target image matching the image to be processed.
According to the image processing method of the embodiments of the present disclosure, image features and text features can be obtained for comprehensive comparison. The comparison process takes into account the text contained in the images, and the weight information of the image similarity and the text similarity is determined according to the type of the image to be processed or the area ratio of the region where the text is located, so as to obtain the matching result. This reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions and the like but inconsistent text content, and improves the accuracy of the matching results.
FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 5, the image to be processed is an image including text content. When determining the matching result between such an image to be processed and the reference images in the image library, the image similarity between the first image feature of the image to be processed and the second image feature of the reference image, and the text similarity between the first text feature of the image to be processed and the second text feature of the reference image, can be comprehensively considered.
In a possible implementation, the second image feature and the second text feature of each reference image in the image library can be acquired in advance and stored in a feature library for comparison with the features of the image to be processed.
In a possible implementation, the first image feature and the first text feature of the image to be processed can be extracted. For example, the first image feature of the image to be processed can be extracted through a convolutional neural network, the text information in the image can be obtained through OCR technology, and the first text feature of the image to be processed can be obtained through a recurrent neural network.
In a possible implementation, screening can first be performed based on the image features: from the images in the image library, the n images with the highest image similarity are selected as reference images, and then the text similarity between the first text feature and the second text features of these reference images is determined. The comprehensive similarity between the image to be processed and these reference images can then be determined through weighted average processing.
In a possible implementation, the weights of the text similarity and the image similarity can be determined. For example, the weight of the text similarity can be determined from the area ratio of the region where the text is located in the image to be processed. In an example, the area ratio of the region where the text is located can be determined based on the first image feature of the image to be processed, the area ratio is then determined as the weight x of the text similarity, and 1-x is determined as the weight of the image similarity.
In a possible implementation, after the above weights are determined, weighted average processing can be performed on the image similarity and the text similarity based on the above weights, so as to respectively determine the comprehensive similarity between the image to be processed and each reference image, and the reference image with the highest comprehensive similarity is determined as the target image matching the image to be processed.
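The FIG. 5 flow can be put together as the following sketch (area-ratio weighting, weighted average, then picking the reference image with the highest comprehensive similarity); the data layout is an assumption.

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve_best_match(query_img_feat, query_txt_feat, text_area_ratio, reference_features):
    """reference_features: list of (image_id, second_image_feature, second_text_feature)."""
    w_image, w_text = 1.0 - text_area_ratio, text_area_ratio
    best_id, best_score = None, -1.0
    for image_id, img_f, txt_f in reference_features:
        img_sim = _cos(query_img_feat, img_f)
        txt_sim = _cos(query_txt_feat, txt_f)
        score = w_image * img_sim + w_text * txt_sim   # weighted average (weights sum to 1)
        if score > best_score:
            best_id, best_score = image_id, score
    return best_id, best_score
```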
In a possible implementation, the image processing method can be used in fields such as network supervision. For example, for a website with strong anti-crawling measures, it is difficult to directly supervise the text content published by the website. Screenshots of the content published on the website can be taken and compared with preset images including specific words or sentences to determine whether the specific words or sentences exist in the screenshots, and thus whether the content of the website includes the specific words or sentences, so that publishers of website content can be effectively supervised. The present disclosure does not limit the application scenarios of the image processing method.
FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
S601. Obtain a picture with text content (referred to as the retrieval picture, corresponding to the image to be processed in the above embodiments);
S602. Compare the image feature corresponding to the retrieval picture (corresponding to the first image feature in the above embodiments) with the image feature library corresponding to the pictures in the search base library (the search base library corresponds to the image library, and its pictures correspond to the reference images in the above embodiments), and compare the text feature corresponding to the retrieval picture (corresponding to the first text feature in the above embodiments) with the text feature library corresponding to the pictures in the search base library;
S6021. Extract the image feature of the retrieval picture based on a deep learning algorithm;
S6022. Perform a 1:N similarity calculation between the image feature of the retrieval picture and the image features in the image feature library;
S6023. Obtain the result of each image feature similarity calculation;
S6024. Recognize and extract the text content of the retrieval picture through an OCR algorithm;
S6025. Further extract a text feature (corresponding to the first text feature in the above embodiments) from the extracted text content;
S6026. Perform a 1:N similarity calculation between the text feature of the retrieval picture and the text features in the text feature library;
S6027. Obtain the result of each text feature similarity calculation;
S603. Perform fusion calculation to obtain a comprehensive ranking result;
S6031. Accumulate the image feature similarity and the text feature similarity corresponding to each picture in the search base library to obtain comprehensive similarity results;
S6032. Sort the comprehensive similarity results from high to low, and determine the image corresponding to the comprehensive similarity result with the largest value as the image most similar to the retrieval picture.
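S6031 and S6032 amount to summing the two per-picture similarities and sorting; a sketch with the values from Table 1 (floating-point results shown approximately):

```python
def fuse_and_rank(image_sims: dict, text_sims: dict):
    """image_sims / text_sims: picture id -> similarity from the 1:N comparisons."""
    combined = {pid: image_sims[pid] + text_sims[pid] for pid in image_sims}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# fuse_and_rank({"A": 0.94, "B": 0.96, "C": 0.91, "D": 0.80},
#               {"A": 0.98, "B": 0.90, "C": 0.85, "D": 0.60})
# -> [("A", 1.92), ("B", 1.86), ("C", 1.76), ("D", 1.40)]   # image A ranks first
```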
FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 7, first, S701 is performed to obtain the retrieval picture; then, the image feature comparison of S7021 and the text feature comparison of S7023 are performed on the retrieval picture respectively; then, through the similarity calculation of S7022 and the similarity calculation of S7024, the similarity results between the retrieval picture and A_i, B_i, C_i, and D_i in the image feature library, and the similarity results between the retrieval picture and A_w, B_w, C_w, and D_w in the text feature library, are obtained. As shown in Table 1, the similarity result between the retrieval picture and A_i is 94%, with B_i is 96%, with C_i is 91%, with D_i is 80%, with A_w is 98%, with B_w is 90%, with C_w is 85%, and with D_w is 60%.
Table 1

| Image | Image feature similarity | Text feature similarity | Comprehensive similarity |
| --- | --- | --- | --- |
| A | 94% (A_i) | 98% (A_w) | 192% |
| B | 96% (B_i) | 90% (B_w) | 186% |
| C | 91% (C_i) | 85% (C_w) | 176% |
| D | 80% (D_i) | 60% (D_w) | 140% |
根据上述检索图片与图像特征库和文字特征库比对的结果,S703的相关比对逻辑是:将检索图片与图像特征库进行比对,将最大值的相似度结果对应的图像确定为和检索图片最相似的图像,如表1中所示的,检索图像与图像特征库的图像特征进行比对得到的相似度结果最大值为96%,该最大值相似度结果对应的图像为B图像,从而将B图像确定为与检索图片最相似的图像。According to the result of comparing the retrieved picture with the image feature database and the text feature database, the relevant comparison logic of S703 is: compare the retrieved picture with the image feature database, and determine the image corresponding to the maximum similarity result as the one with the retrieved image. The image most similar to the picture, as shown in Table 1, the maximum value of the similarity result obtained by comparing the retrieved image with the image features of the image feature library is 96%, and the image corresponding to the maximum similarity result is a B image. Thus, the B image is determined to be the image most similar to the retrieved picture.
S704的本公开比对逻辑是:将检索图片与图像特征库中的图像特征和文字特征库中的文字特征分别进行比对,将图像特征库中图像特征和文字特征库中文字特征对应的图像的相似度结果相加,得到综合相似度,将最大值综合相似度对应的图像确定为和检索图片最相似的图像,由表1可知,检索图片与A图像的综合相似度结果为192%,与B图像的综合相似度结果为186%,与C图像的综合相似度结果为176%,与D图像的综合相似度结果为140%,所以,检索图片与A图像的综合相似度结果数值最大,从而将A图像确定为与检索图片最相似的图像。The disclosed comparison logic of S704 is: compare the retrieved picture with the image features in the image feature database and the text features in the text feature database respectively, and compare the images corresponding to the image features in the image feature database and the text features in the text feature database The similarity results of A and A are added together to obtain a comprehensive similarity, and the image corresponding to the maximum comprehensive similarity is determined to be the most similar image to the retrieved image. It can be seen from Table 1 that the composite similarity result of the retrieved image and A image is 192%. The comprehensive similarity result with image B is 186%, the comprehensive similarity result with image C is 176%, and the comprehensive similarity result with image D is 140%. Therefore, the comprehensive similarity result value between the retrieved image and image A is the largest , so that the A image is determined to be the most similar image to the retrieved image.
A related image comparison method extracts deep features of the images (corresponding to the image features in the above embodiments) and performs similarity calculation based on these deep features to obtain a comparison result. However, in image comparison scenarios where text content accounts for a large proportion of the image, such as screenshots of social applications (Weibo, Facebook, Twitter, etc.), the image features of different images are often quite similar. If only such a traditional image comparison method is used, a large amount of false-positive data may occur, that is, cases where the image pictures are similar but the text content is completely different.
In order to reduce as far as possible the problem of frequent false positives in this subdivided scenario (image comparison where text content accounts for a large proportion of the image), in this example, in addition to extracting deep features from the image based on the related image comparison logic, the text content of the image is also recognized and extracted through OCR technology, text features are extracted from the text content, and similarity calculation is performed based on the text features to obtain a text feature comparison result. Then, the image feature comparison result and the text feature comparison result are fused, for example by adding the two corresponding comparison results, to obtain the result with the highest comprehensive similarity. According to the image processing method of this example, only images whose image picture and text content are both highly similar are output, which reduces false positives in this scenario and improves comparison accuracy.
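The fused pipeline described above can be sketched as follows. This is a sketch under stated assumptions: extract_image_feature, ocr_text, extract_text_feature and cosine are placeholder callables supplied by the caller (for example a CNN backbone, an OCR engine, a text encoder and cosine similarity), not components named by the disclosure.

```python
def fused_similarity(query, reference,
                     extract_image_feature, ocr_text, extract_text_feature, cosine):
    # Deep (image) features of both pictures and their similarity.
    image_sim = cosine(extract_image_feature(query), extract_image_feature(reference))

    # Text recognised by OCR, encoded into text features, and their similarity.
    text_sim = cosine(extract_text_feature(ocr_text(query)),
                      extract_text_feature(ocr_text(reference)))

    # Simple additive fusion, as in the example above; product or weighted
    # average are alternative fusions described elsewhere in this disclosure.
    return image_sim + text_sim
```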
It can be understood that the above method embodiments mentioned in the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principles and logic. Those skilled in the art will understand that, in the methods of the above specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure further provides an image processing apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any image processing method provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding records in the method section. Fig. 8 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 8, the apparatus includes: a feature extraction part 801, configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of text included in the image to be processed; a similarity determination part 802, configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of text included in the reference image; and a matching part 803, configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
In a possible implementation manner, the apparatus further includes: a target image determination part, configured to determine, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
In a possible implementation manner, the similarity determination part 802 is further configured to: respectively determine image similarities between the first image feature and second image features of at least two candidate images; determine a reference image among the at least two candidate images according to the image similarities; and determine the text similarity between the first text feature and the second text feature of the reference image.
In a possible implementation manner, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, and the matching part 803 is further configured to: determine the product of the image similarity and the text similarity as the comprehensive similarity; or, determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
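A brief sketch of the two fusion options just described is given below; the equal default weights are an assumption, since the disclosure does not fix particular values.

```python
def combined_similarity(image_sim, text_sim, mode="weighted_average",
                        image_weight=0.5, text_weight=0.5):
    # Option 1: product of the two similarities.
    if mode == "product":
        return image_sim * text_sim
    # Option 2: weighted average of the two similarities.
    return (image_weight * image_sim + text_weight * text_sim) / (image_weight + text_weight)
```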
In a possible implementation manner, the matching part 803 is further configured to: determine a type of the image to be processed; determine weight information according to the type of the image to be processed; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
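One way the type-based weighting above might be realised is a lookup table from image type to weights, as sketched below; the concrete types and values are hypothetical and are not taken from the disclosure.

```python
# Hypothetical mapping from image type to (image_weight, text_weight).
TYPE_WEIGHTS = {
    "chat_screenshot": (0.3, 0.7),  # text-heavy images: text similarity weighs more
    "news_screenshot": (0.4, 0.6),
    "photograph":      (0.8, 0.2),  # little text: image similarity weighs more
}

def weights_for_type(image_type, default=(0.5, 0.5)):
    # Fall back to equal weights for unknown types.
    return TYPE_WEIGHTS.get(image_type, default)
```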
In a possible implementation manner, the matching part 803 is further configured to: determine an area proportion of a region where text is located in the image to be processed; determine the weight information according to the area proportion; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
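The area-proportion variant above could, for instance, map the proportion of the image covered by text regions directly to the text weight, as in the sketch below; the linear mapping is an assumption, since the disclosure only states that the weights are determined from the area proportion.

```python
def weights_from_text_area(text_boxes, image_width, image_height):
    # text_boxes: detected text regions as (x, y, w, h) tuples.
    text_area = sum(w * h for (_x, _y, w, h) in text_boxes)
    ratio = min(1.0, text_area / float(image_width * image_height))
    # The larger the text region, the more weight the text similarity receives.
    return 1.0 - ratio, ratio  # (image_weight, text_weight)
```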
In a possible implementation manner, the matching part 803 is further configured to: determine weight information according to the first image feature; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
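For the feature-based variant above, the disclosure does not specify how the weight information is derived from the first image feature; a hypothetical realisation is a small learned head, sketched below with assumed, pre-trained parameters w and b.

```python
import numpy as np

def predict_text_weight(image_feature, w, b):
    # image_feature: 1-D feature vector; w, b: assumed learned parameters.
    logit = float(np.dot(w, image_feature) + b)
    text_weight = 1.0 / (1.0 + np.exp(-logit))   # sigmoid keeps the weight in (0, 1)
    return 1.0 - text_weight, text_weight        # (image_weight, text_weight)
```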
In a possible implementation manner, the first text feature includes at least one of a semantic feature, a format feature, a font feature, a size feature, a typesetting feature and a language feature.
In some embodiments, the functions of, or the parts included in, the apparatus provided in the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments; for their implementation, reference may be made to the descriptions of the above method embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored, and the above method is implemented when the computer program instructions are executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store processor-executable instructions; where the processor is configured to invoke the instructions stored in the memory to execute the above method.
An embodiment of the present disclosure further provides another computer program product configured to store computer-readable instructions which, when executed, cause a computer to perform the operations of the image processing method provided by any of the above embodiments.
The electronic device may be provided as a terminal, a server or a device in another form.
Fig. 9 shows a block diagram of an electronic device 900 according to an embodiment of the present disclosure. For example, the electronic device 900 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device or a personal digital assistant.
Referring to Fig. 9, the electronic device 900 may include one or more of the following components: a processing component 901, a memory 902, a power supply component 903, a multimedia component 904, an audio component 905, an input/output (I/O) interface 906, a sensor component 907 and a communication component 908.
The processing component 901 generally controls the overall operation of the electronic device 900, such as operations associated with display, telephone calls, data communication, camera operations and recording operations. The processing component 901 may include one or more processors 909 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 901 may include one or more modules to facilitate interaction between the processing component 901 and other components. For example, the processing component 901 may include a multimedia part to facilitate interaction between the multimedia component 904 and the processing component 901.
The memory 902 is configured to store various types of data to support operations on the electronic device 900. Examples of such data include instructions for any application or method operated on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 902 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc.
The power supply component 903 provides power for the various components of the electronic device 900. The power supply component 903 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 900.
The multimedia component 904 includes a screen that provides an output interface between the electronic device 900 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 904 includes at least one of a front camera and a rear camera. When the electronic device 900 is in an operation mode, such as a shooting mode or a video mode, at least one of the front camera and the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 905 is configured to output and/or input audio signals. For example, the audio component 905 includes a microphone (MIC), and the microphone is configured to receive external audio signals when the electronic device 900 is in an operation mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 902 or sent via the communication component 908. In some embodiments, the audio component 905 further includes a speaker configured to output audio signals.
The I/O interface 906 provides an interface between the processing component 901 and a peripheral interface module, and the peripheral interface module may be a keyboard, a click wheel, buttons or the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.
The sensor component 907 includes one or more sensors configured to provide state assessments of various aspects of the electronic device 900. For example, the sensor component 907 can detect the on/off state of the electronic device 900 and the relative positioning of components, for example, the components being the display and the keypad of the electronic device 900; the sensor component 907 can also detect a change in position of the electronic device 900 or of a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a temperature change of the electronic device 900. The sensor component 907 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 907 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor, configured for use in imaging applications. In some embodiments, the sensor component 907 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 908 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 908 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 908 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements, configured to execute the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the memory 902 including computer program instructions, which can be executed by the processor 909 of the electronic device 900 to implement the above method.
Fig. 10 shows a block diagram of an electronic device 1000 according to an embodiment of the present disclosure. For example, the electronic device 1000 may be provided as a server. Referring to Fig. 10, the electronic device 1000 includes a processing component 1001, which further includes one or more processors, and a memory resource represented by a memory 1002, configured to store instructions executable by the processing component 1001, such as an application program. The application program stored in the memory 1002 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1001 is configured to execute the instructions to perform the above method.
The electronic device 1000 may further include a power supply component 1003 configured to perform power management of the electronic device 1000, a wired or wireless network interface 1004 configured to connect the electronic device 1000 to a network, and an I/O interface 1005. The electronic device 1000 can operate based on an operating system stored in the memory 1002, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the memory 1002 including computer program instructions, which can be executed by the processing component 1001 of the electronic device 1000 to implement the above method.
The present disclosure may be at least one of a system, a method and a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from the computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as at least one of the Internet, a local area network, a wide area network and a wireless network. The network may include at least one of copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is personalized and customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to at least one of flowcharts and block diagrams of the methods, apparatuses (systems) and computer program products according to the embodiments of the present disclosure. It should be understood that each block of at least one of the flowcharts and block diagrams, and combinations of blocks in at least one of the flowcharts and block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, thereby producing a machine, such that these instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause at least one of a computer, a programmable data processing apparatus and other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operation steps are executed on the computer, the other programmable data processing apparatus or the other device to produce a computer-implemented process, so that the instructions executed on the computer, the other programmable data processing apparatus or the other device implement the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of an instruction that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the accompanying drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of at least one of the block diagrams and flowcharts, and combinations of blocks of at least one of the block diagrams and flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The computer program product may be specifically implemented by means of hardware, software or a combination thereof. In an optional embodiment, the computer program product is specifically embodied as a computer storage medium; in another optional embodiment, the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to explain the principles of the embodiments, their practical applications or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Industrial Applicability
The image processing method of the embodiments of the present disclosure can obtain image features and text features for a comprehensive comparison. The text contained in the image is taken into account in the comparison process, and the weight information of the image similarity and the text similarity is determined by, for example, the type of the image to be processed or the area proportion of the region where the text is located, so as to obtain the matching result. In this way, even when the images have a high similarity in color, texture, light and shade, layout, style, feature point positions and so on but inconsistent text content, the determined text similarity can reduce the probability of false positives in image matching and improve the accuracy of the matching result.

Claims (18)

  1. An image processing method, the method being executed by an electronic device, the method comprising:
    performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, wherein the first text feature is feature information of text included in the image to be processed;
    respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, wherein the second text feature is feature information of text included in the reference image; and
    determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  2. The method according to claim 1, wherein the method further comprises:
    determining, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
  3. The method according to claim 2, wherein respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, comprises:
    respectively determining image similarities between the first image feature and second image features of at least two candidate images;
    determining the reference image among the at least two candidate images according to the image similarities; and
    determining the text similarity between the first text feature and the second text feature of the reference image.
  4. The method according to any one of claims 1 to 3, wherein the image matching result comprises a comprehensive similarity between the reference image and the image to be processed,
    wherein determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity comprises one of the following:
    determining the product of the image similarity and the text similarity as the comprehensive similarity; and
    determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  5. The method according to claim 4, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining a type of the image to be processed;
    determining weight information according to the type of the image to be processed; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  6. The method according to claim 4, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining an area proportion of a region where text is located in the image to be processed;
    determining weight information according to the area proportion; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  7. The method according to any one of claims 4 to 6, wherein determining the weighted average of the image similarity and the text similarity as the comprehensive similarity comprises:
    determining weight information according to the first image feature; and
    performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  8. The method according to any one of claims 1 to 7, wherein the first text feature comprises at least one of the following: a semantic feature, a format feature, a font feature, a size feature, a typesetting feature and a language feature.
  9. An image processing apparatus, comprising:
    a feature extraction part, configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, wherein the first text feature is feature information of text included in the image to be processed;
    a similarity determination part, configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, wherein the second text feature is feature information of text included in the reference image; and
    a matching part, configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  10. The apparatus according to claim 9, wherein the apparatus further comprises:
    a target image determination part, configured to determine, according to image matching results between the image to be processed and at least two reference images, a target image among the at least two reference images that matches the image to be processed.
  11. The apparatus according to claim 10, wherein the similarity determination part is further configured to:
    respectively determine image similarities between the first image feature and second image features of at least two candidate images;
    determine the reference image among the at least two candidate images according to the image similarities; and
    determine the text similarity between the first text feature and the second text feature of the reference image.
  12. The apparatus according to any one of claims 9 to 11, wherein the image matching result comprises a comprehensive similarity between the reference image and the image to be processed, and the matching part is further configured to:
    determine the product of the image similarity and the text similarity as the comprehensive similarity;
    or, determine a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  13. The apparatus according to claim 12, wherein the matching part is further configured to:
    determine a type of the image to be processed;
    determine weight information according to the type of the image to be processed; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  14. The apparatus according to claim 12, wherein the matching part is further configured to:
    determine an area proportion of a region where text is located in the image to be processed;
    determine the weight information according to the area proportion; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  15. The apparatus according to any one of claims 12 to 14, wherein the matching part is further configured to:
    determine weight information according to the first image feature; and
    perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  16. An electronic device, comprising:
    a processor; and
    a memory configured to store processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1 to 8 when invoking the instructions stored in the memory.
  17. A computer-readable storage medium on which computer program instructions are stored, wherein the method according to any one of claims 1 to 8 is implemented when the computer program instructions are executed by a processor.
  18. A computer program product, the computer program product comprising a non-transitory computer-readable storage medium storing a computer program, wherein the method according to any one of claims 1 to 8 is implemented when the computer program is read and executed by a computer.
PCT/CN2022/096004 2021-11-29 2022-05-30 Image processing method and apparatus, electronic device, storage medium, and computer program product WO2023092975A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111435625.7A CN114118278A (en) 2021-11-29 2021-11-29 Image processing method and device, electronic equipment and storage medium
CN202111435625.7 2021-11-29

Publications (1)

Publication Number Publication Date
WO2023092975A1 true WO2023092975A1 (en) 2023-06-01

Family

ID=80371521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096004 WO2023092975A1 (en) 2021-11-29 2022-05-30 Image processing method and apparatus, electronic device, storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN114118278A (en)
WO (1) WO2023092975A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118278A (en) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210124976A1 (en) * 2019-10-28 2021-04-29 Samsung Sds Co., Ltd. Apparatus and method for calculating similarity of images
CN111694978A (en) * 2020-05-20 2020-09-22 Oppo(重庆)智能科技有限公司 Image similarity detection method and device, storage medium and electronic equipment
CN112990376A (en) * 2021-04-29 2021-06-18 北京世纪好未来教育科技有限公司 Text image similarity evaluation method and device and computing equipment
CN113111154A (en) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 Similarity evaluation method, answer search method, device, equipment and medium
CN114118278A (en) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114118278A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
CN113538519B (en) Target tracking method and device, electronic equipment and storage medium
EP3173948A1 (en) Method and apparatus for recommendation of reference documents
WO2020029966A1 (en) Method and device for video processing, electronic device, and storage medium
CN111783756B (en) Text recognition method and device, electronic equipment and storage medium
CN107102746B (en) Candidate word generation method and device and candidate word generation device
WO2021056621A1 (en) Text sequence recognition method and apparatus, electronic device, and storage medium
CN111581488B (en) Data processing method and device, electronic equipment and storage medium
WO2021031645A1 (en) Image processing method and apparatus, electronic device and storage medium
WO2021027343A1 (en) Human face image recognition method and apparatus, electronic device, and storage medium
CN111259967B (en) Image classification and neural network training method, device, equipment and storage medium
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
WO2021208666A1 (en) Character recognition method and apparatus, electronic device, and storage medium
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN110659690B (en) Neural network construction method and device, electronic equipment and storage medium
CN110391966B (en) Message processing method and device and message processing device
EP3734472A1 (en) Method and device for text processing
CN111222316B (en) Text detection method, device and storage medium
WO2023078414A1 (en) Related article search method and apparatus, electronic device, and storage medium
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN110633715B (en) Image processing method, network training method and device and electronic equipment
WO2023092975A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
CN110232181B (en) Comment analysis method and device
CN114168798A (en) Text storage management and retrieval method and device
CN110070046B (en) Face image recognition method and device, electronic equipment and storage medium