WO2023092975A1 - Procédé et appareil de traitement d'images, dispositif électronique, support d'enregistrement, et produit programme informatique - Google Patents


Info

Publication number
WO2023092975A1
WO2023092975A1 (application PCT/CN2022/096004; priority CN2022096004W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
similarity
text
feature
processed
Prior art date
Application number
PCT/CN2022/096004
Other languages
English (en)
Chinese (zh)
Inventor
郭彤
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023092975A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, an electronic device, a storage medium, and a computer program product.
  • A typical image comparison method extracts features from the images, computes a similarity based on those image features, and then obtains a comparison result.
  • When text content is included in an image, there may be two or more images with high similarity in color, texture, light and shade, layout, style, feature point positions, etc., but with different text content (for example, screenshots of the chat interface of social software, news screenshots, etc.).
  • Embodiments of the present disclosure at least provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a computer program product.
  • According to an aspect, an image processing method is provided, including: performing feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; respectively determining an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • In this way, image features and text features can both be obtained for a comprehensive comparison. Because the text contained in the image is considered, the probability of false positives is reduced in cases where two images are highly similar in color, texture, light and shade, layout, style, and feature point positions but their text content differs, and the accuracy of the matching result is improved.
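The flow described above can be sketched as follows. This is an illustrative sketch only: the cosine similarity measure, the equal weights, and the 0.9 threshold are assumptions chosen for demonstration, not values prescribed by the disclosure.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_result(img_feat, txt_feat, ref_img_feat, ref_txt_feat, threshold=0.9):
    # Determine the image similarity and the text similarity separately,
    # then combine them into a comprehensive similarity (equal weights here;
    # the disclosure also allows a product or other weightings).
    image_sim = cosine_similarity(img_feat, ref_img_feat)
    text_sim = cosine_similarity(txt_feat, ref_txt_feat)
    comprehensive = 0.5 * image_sim + 0.5 * text_sim
    return comprehensive, comprehensive >= threshold
```

Two images whose image features match but whose text features are orthogonal would score only 0.5 here, which is the false-positive reduction the passage describes.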
  • In some embodiments, the method further includes: according to image matching results between the image to be processed and at least two reference images, determining, among the at least two reference images, a target image that matches the image to be processed.
  • In some embodiments, respectively determining the image similarity between the first image feature and the second image feature of the reference image, and the text similarity between the first text feature and the second text feature of the reference image, includes: respectively determining the image similarity between the first image feature and second image features of at least two candidate images; determining a reference image from the at least two candidate images according to the image similarities; and determining the text similarity between the first text feature and the second text feature of the reference image.
  • In some embodiments, the image matching result includes a comprehensive similarity between the reference image and the image to be processed, where determining the matching result between the image to be processed and the reference image according to the image similarity and the text similarity includes one of the following: determining the product of the image similarity and the text similarity as the comprehensive similarity; or determining a weighted average of the image similarity and the text similarity as the comprehensive similarity.
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the type of the image to be processed; determining weight information according to the type; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining the area proportion of the region where the text is located in the image to be processed; determining the weight information according to the area proportion; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • In this way, the weight information for the image similarity and the text similarity can be determined from the type of the image to be processed or from the area proportion of the text region, which can improve the accuracy of the weight information and thereby the accuracy of the matching results.
  • Determining a weighted average of the image similarity and the text similarity as the comprehensive similarity includes: determining weight information according to the first image feature; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the first text feature includes at least one of the following: semantic feature, format feature, font feature, size feature, typesetting feature, and language feature.
  • According to an aspect, an image processing apparatus is provided, including: a feature extraction part configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, where the first text feature is feature information of the text included in the image to be processed; a similarity determination part configured to respectively determine an image similarity between the first image feature and a second image feature of a reference image, and a text similarity between the first text feature and a second text feature of the reference image, where the second text feature is feature information of the text included in the reference image; and a matching part configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • According to an aspect, an electronic device is provided, including: a processor; and a memory configured to store instructions executable by the processor; where the processor is configured to call the instructions stored in the memory to implement the above method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • A computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, the above-mentioned method is implemented.
  • FIG. 1 shows a flowchart of an image processing method according to an embodiment of the disclosure
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an application of an image processing method according to an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 7 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure
  • FIG. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • In the present disclosure, "at least one" means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set formed by A, B, and C.
  • FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure, the method is executed by an electronic device, and will be described with reference to the steps shown in FIG. 1 .
  • In this way, image features and text features can both be obtained for a comprehensive comparison. Because the text contained in the image is considered, the probability of false positives is reduced in cases where the images are highly similar in color, texture, light and shade, layout, style, and feature point positions but the text content is inconsistent, and the accuracy of the matching results is improved.
  • In the related art, feature extraction processing is usually performed on the two images, for example through a deep learning neural network, to obtain the image features of each image respectively; the similarity between the image features of the two images is then determined, and if the similarity is higher than a threshold, the two images can be considered to match.
  • This method can be used in face recognition, object recognition and other fields.
  • The image features extracted by this method can usually describe pattern-level feature information such as the color, texture, light and shade, layout, style, and feature point positions of the image, but it is difficult for such features to represent the text information when the image contains text.
  • In embodiments of the present disclosure, the image features and text features of the images participating in the comparison can be obtained separately, and a comprehensive judgment can be made, based on the image similarity between the image features and the text similarity between the text features, as to whether the images participating in the comparison match.
  • the text features can describe the semantics, format, font, size, typesetting (including the position of the text in the image), language and other feature information of the text.
  • the present disclosure does not limit the feature information described by the text features.
  • the image to be processed may be an image including text information, for example, the image includes one or more characters.
  • feature extraction processing may be performed on the image to be processed to obtain the first image feature and the first text feature of the image to be processed.
  • the above features can be obtained by processing the image to be processed through a deep learning neural network.
  • the first image feature of the image to be processed may be extracted through a convolutional neural network.
  • The text information in the image (for example, the content of the text, the position of the text region, the shape of the characters, etc.) can be obtained through Optical Character Recognition (OCR) technology, and the first text feature can be obtained through a recurrent neural network.
  • the present disclosure does not limit the acquisition manners of the first image feature and the first text feature.
  • the first text features may include at least one of feature information such as semantic features, format features, font features, size features, typesetting features, and language features of the text.
  • Each feature can be obtained separately through a neural network, and weighted average processing is performed to obtain the first text feature.
  • the weight may be determined based on information such as the type of the image and the number of characters. For example, if the image to be processed is a calligraphy image with only a few characters, the weight of semantic features can be made lower, and the weights of format features, font features, and size features can be higher. For another example, if the image to be processed is a screenshot of a chat interface or a news screenshot that includes a lot of text, the weight of the semantic feature can be made higher, and the weight of other features can be made lower.
  • the above features may also be fused in other ways to obtain the first text feature.
  • the above-mentioned features can be expressed in the form of a feature matrix or a feature vector, and the above-mentioned features can be multiplied to obtain the first text feature.
  • the present disclosure does not limit the manner of determining the first text feature.
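One possible fusion of the text sub-features into a single first text feature, assuming each sub-feature has already been extracted as a vector of the same length. The weighted-average rule is one of the options mentioned above (element-wise multiplication is another); the sub-feature names and weights below are illustrative, not prescribed.

```python
import numpy as np

def fuse_text_features(sub_features, weights):
    # sub_features: {name: feature vector}, e.g. semantic, format, font, ...
    # weights: {name: scalar weight}, e.g. higher semantic weight for
    # text-heavy screenshots, higher font/format weight for calligraphy.
    names = sorted(sub_features)
    total = sum(weights[n] for n in names)
    stacked = np.stack(
        [weights[n] * np.asarray(sub_features[n], dtype=float) for n in names]
    )
    return stacked.sum(axis=0) / total
```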
  • The reference image is an image to be compared with the image to be processed; for example, the reference image and the image to be processed are images of the same type, such as both being images containing little text (calligraphy images, advertising images, etc.), or both being screenshots of chat interfaces or news pages that contain a lot of text.
  • the reference image can also be any image in the image library, not necessarily the same type as the image to be processed. The present disclosure does not limit the types of the reference image and the image to be processed.
  • the acquisition manner of the second image feature and the second text feature of the reference image may be the same as the acquisition manner of the first image feature and the first text feature of the image to be processed respectively.
  • the second image feature and the second text feature of the reference image can be obtained in advance and stored in the feature library corresponding to the above image library.
  • the image similarity between the first image feature and the second image feature, and the text similarity between the first text feature and the second text feature may be determined respectively.
  • The above features may be feature information in the form of a feature matrix or feature vector, and the above similarities may be determined by computing measures such as cosine similarity, the Jaccard similarity coefficient, the Pearson correlation coefficient, or relative entropy.
  • the present disclosure does not limit the calculation methods of image similarity and text similarity.
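Two of the similarity measures named above, sketched as plain functions: cosine similarity over feature vectors, and the Jaccard coefficient over, for example, sets of recognized words. The other listed measures (Pearson correlation, relative entropy) could be substituted, since the disclosure does not fix the choice.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_similarity(tokens_a, tokens_b):
    # Jaccard coefficient |A ∩ B| / |A ∪ B|, e.g. over OCR-recognized words.
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0
```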
  • The image similarity and the text similarity can be comprehensively calculated to determine the image matching result between the image to be processed and the reference image, and the matching result can include the comprehensive similarity so obtained.
  • the reference image can be matched one-to-one with the image to be processed, and when the comprehensive similarity is greater than or equal to the similarity threshold, it is determined that the two match.
  • multiple reference images in the image library may be matched with the image to be processed, and the target image with the highest matching degree with the image to be processed may be determined in the image library.
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 2, based on FIG. 1, the method further includes:
  • the image library may include multiple candidate images, and the image similarity between the first image feature of the image to be processed and the second image feature of each candidate image may be determined respectively , and determine the text similarity between the first text feature of the image to be processed and the second text feature of each candidate image, and then determine the comprehensive similarity between the image to be processed and each candidate image.
  • FIG. 3 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S102 can be implemented through S301 to S303 , which will be described in conjunction with the steps shown in FIG. 3 .
  • The image similarity is used as a screening condition for the reference images, and the image similarity between the image to be processed and each candidate image is determined. Then, according to the image similarities, reference images with higher image similarity can be screened out from the multiple candidate images: for example, images with an image similarity higher than a threshold can be selected as reference images, or the n (n is a positive integer) images with the highest image similarity can be used as reference images.
  • The screened reference images have a high similarity with the image to be processed at the pattern level, so the image with the highest matching degree most likely comes from these reference images. Therefore, when determining the text similarity, it is only necessary to determine the text similarity between this screened subset of reference images and the image to be processed, and then determine the comprehensive similarity between that subset and the image to be processed, which saves computation and improves processing efficiency.
  • The text similarity can also be used as a screening condition (for example, when the image to be processed mainly contains text content and there is little pattern information). In that case, the reference images with the highest text similarity to the image to be processed are first found among the candidate images, then the image similarity between each reference image and the image to be processed is determined, and then the comprehensive similarity between each reference image and the image to be processed is determined.
  • the present disclosure does not limit the screening conditions.
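The two-stage screening described above can be sketched as follows. The similarity functions are passed in as parameters, and the equal weighting of the two similarities in stage two is an illustrative assumption.

```python
def screen_and_match(query_img, query_txt, candidates, image_sim, text_sim, n=5):
    # candidates: list of (image_feature, text_feature) pairs.
    # Stage 1: keep only the n candidates with the highest image similarity,
    # so text similarity never needs to be computed for the rest.
    survivors = sorted(
        range(len(candidates)),
        key=lambda i: image_sim(query_img, candidates[i][0]),
        reverse=True,
    )[:n]
    # Stage 2: compute text similarity only for the survivors, then combine
    # (equal weights here, for illustration).
    return [
        (i, 0.5 * image_sim(query_img, candidates[i][0])
            + 0.5 * text_sim(query_txt, candidates[i][1]))
        for i in survivors
    ]
```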
  • Fig. 4 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • S103 in FIG. 1 can be realized by at least one of S1031 and S1032, which will be described in conjunction with the steps shown in FIG. 4.
  • Both the image similarity and the text similarity can be numeric values in the form of percentages; for example, if the image similarity is 98% and the text similarity is 95%, the two can be multiplied and the product determined as the comprehensive similarity.
  • the weighted average of the two can also be calculated.
  • the two similarities can be considered to have the same importance. Therefore, the weights of both can be set to 1, and directly calculate the average of the two.
  • Alternatively, the weights of the two may be determined first, and weighted average processing then performed. There are many ways to determine the weights: for example, based on the number of characters in the text, where a large number raises the weight of the text similarity and a small number lowers it. The weight values can also be determined from the features of the image to be processed.
  • Determining a weighted average of the image similarity and the text similarity as the comprehensive similarity may be implemented in the following manner: determining the type of the image to be processed; determining weight information according to the type of the image to be processed; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • The type of the image to be processed may be determined according to the source of the image. For example, if the source is a screenshot of a communication tool interaction page or a news website page, its type may be "communication tool interaction page screenshot" or "news website page screenshot"; if the source is a street camera, its type may be "street view image"; if the source is an access control camera, its type may be "face image", and so on.
  • the type of the image to be processed can also be determined according to the classification mark of the image.
  • The classification mark of the image can be added manually, or added automatically when the image to be processed is generated, such as the above "communication tool interaction page screenshot".
  • the type of the image to be processed can also be determined based on the first image feature.
  • The above-mentioned first image feature is feature information in the form of a matrix or vector, and the first image feature can be deconvolved, activated, etc., to obtain the type of the image to be processed; for example, it may be determined that the image to be processed is a face image, a street view image, a news screenshot, a calligraphy image, or the like.
  • the type of the image to be processed may also be defined and set by the user, and the present disclosure does not limit the manner of determining the type of the image to be processed.
  • The weight information may be determined based on the type of the image to be processed. For example, when the type indicates that the image is mainly characterized by image features (such as face images or landscape images), the weight of the image similarity can be made higher and the weight of the text similarity lower. Conversely, when the type indicates that the image is mainly characterized by text features (such as news screenshots or web page screenshots), the weight of the image similarity can be made lower and the weight of the text similarity higher.
  • corresponding weights may be set for each type of image in advance, and weight information may also be calculated based on the type.
  • The type of the image to be processed can also be expressed in the form of probabilities. For example, the probability that the image to be processed is a news screenshot is 95%, and the probability that it is of another type is 5%. The weight information can be calculated from these probabilities.
  • various types of probabilities may be used as elements of a vector, and the vector may be activated to obtain the above weight information. The present disclosure does not limit the method of calculating the weight.
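A sketch of deriving weight information from type probabilities. The per-type weight table and the expected-value combination below are assumptions for illustration; the passage only requires that text-heavy types yield a higher text-similarity weight.

```python
# Hypothetical per-type text-similarity weights; the actual values are
# not specified by the disclosure.
TEXT_WEIGHT_BY_TYPE = {
    "face image": 0.1,
    "landscape image": 0.1,
    "news screenshot": 0.8,
    "web page screenshot": 0.8,
}

def weights_from_type_probs(type_probs):
    # type_probs: {type name: probability}. The text weight is the expected
    # per-type weight under the predicted type distribution; unknown types
    # fall back to a neutral 0.5.
    w_text = sum(p * TEXT_WEIGHT_BY_TYPE.get(t, 0.5) for t, p in type_probs.items())
    return 1.0 - w_text, w_text  # (image-similarity weight, text-similarity weight)
```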
  • S1032. Determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: determining the area proportion of the region where the text is located in the image to be processed; determining the weight information according to the area proportion; and performing weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the area proportion of the region where the text is located may be determined based on the first image feature.
  • the first image feature can be used to represent the layout of the image, and the area ratio of the area where the text is located can be calculated based on the first image feature.
  • the weight information may be determined based on the area ratio.
  • the area proportion of the region where the text is located can be determined as the weight of the text similarity, and then the weight of the image similarity can be calculated.
  • the area ratio of the area where the text is located can be activated to obtain the above weight information.
  • the present disclosure does not limit the specific method for calculating the weight.
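A minimal sketch of the area-proportion rule above, where the proportion x of the image occupied by the text region becomes the text-similarity weight and 1 - x the image-similarity weight. This is one of the options the text describes; an activation over the proportion would be another.

```python
def weights_from_text_area(text_area, image_area):
    # x: proportion of the image covered by the text region.
    x = text_area / image_area
    # (image-similarity weight, text-similarity weight): the larger the
    # text region, the more the text similarity dominates.
    return 1.0 - x, x
```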
  • The type of the image to be processed and the area proportion of the region where the text is located can also be obtained in other ways; for example, the type of the image to be processed can be determined through manual labeling, as can the area proportion of the region where the text is located.
  • The area proportion of the region where the text is located can also be determined based on attributes such as color and shape. For example, in images such as screenshots of news websites, the font is usually black, and the area of the region where the text is located can be estimated from the proportion of black pixels.
  • Since the text in such a screenshot is laid out in neat rows or columns, the area proportion of the region where the text is located can also be determined from the area of the rectangles formed by those rows or columns.
  • The present disclosure does not limit the method of determining the area proportion.
  • the present disclosure does not limit the manner of determining the type of the image to be processed and the area proportion of the region where the text is located.
  • the weight may be positively correlated with the area ratio, for example, the larger the area ratio of the text area, the lower the weight of the image similarity, and the higher the weight of the text similarity.
  • the method of determining the area ratio of the area where the text is located can also be used in scenarios where the text itself is also an image.
  • the text is a word art, and the word art itself is both text and an image.
  • the area ratio of the region where it is located is 100%, and the area ratio of the region where the image is located is also 100%, so the weights of the two can be made equal.
  • The number of words in the image can also be counted, and the interval to which that number belongs can be associated with a weight value: the more words, the higher the weight of the text similarity. For example, if the number of words is greater than or equal to 100, the weight of the text similarity is 0.8; if it is greater than or equal to 50 and less than 100, the weight is 0.5; if it is less than 50, the weight is 0.3. The present disclosure does not limit the correspondence between the number of words and the weight.
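The interval-to-weight mapping in the example above can be written directly; the interval boundaries and weight values are those given in the text, which states that this correspondence is not limiting.

```python
def text_weight_from_word_count(n_words):
    # Interval-to-weight mapping from the example above: more words in the
    # image means the text similarity should carry more weight.
    if n_words >= 100:
        return 0.8
    if n_words >= 50:
        return 0.5
    return 0.3
```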
  • In this way, the weight information for the image similarity and the text similarity can be determined from the type of the image to be processed or from the area proportion of the text region, which can improve the accuracy of the weight information and thereby the accuracy of the matching results.
  • determining the weighted average of the image similarity and the text similarity as the comprehensive similarity may also be implemented in the following manner: according to the first image feature, Determine weight information; perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the characteristics of the image to be processed can be determined by the first image feature representing the feature information of the pattern level of the image to be processed.
  • The type of the image to be processed can be determined according to the first image feature in the above manner, or features such as the area proportion of the region where the text is located in the image to be processed can be determined, and the weights can then be determined based on these features.
  • the weight information may also be obtained directly through a trained network model according to the first image feature, which is not limited in the present disclosure.
  • weighted average processing may be performed on the image similarity and the text similarity based on the weight information to obtain a comprehensive similarity between the image to be processed and the reference image. Furthermore, a matching result can be obtained based on the comprehensive similarity. For example, when a one-to-one comparison is performed between the image to be processed and the reference image, it may be determined that the image to be processed matches the reference image when the comprehensive similarity is higher than a threshold. For another example, when the target image matching the image to be processed is determined in the image library, the reference image with the highest comprehensive similarity with the image to be processed may be determined as the target image matching the image to be processed.
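The two decision modes described above (one-to-one thresholding, and picking the best match from an image library) can be sketched in one small helper; the threshold value in the test is an assumption.

```python
def best_match(comprehensive_sims, threshold=None):
    # comprehensive_sims: comprehensive similarity per reference image.
    # Library mode: return the index of the reference image with the highest
    # comprehensive similarity. With a threshold set, also require the best
    # score to clear it (one-to-one mode is the single-element case).
    best_i = max(range(len(comprehensive_sims)), key=lambda i: comprehensive_sims[i])
    if threshold is not None and comprehensive_sims[best_i] < threshold:
        return None
    return best_i
```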
  • In this way, image features and text features can both be obtained for a comprehensive comparison. The text contained in the image is considered, and the type of the image to be processed or the area proportion of the region where the text is located is used to determine the weight information for the image similarity and the text similarity before the matching result is obtained. This reduces the probability of false positives when the images have high similarity in color, texture, light and shade, layout, style, feature point positions, etc., but inconsistent text content, and improves the accuracy of the matching result.
  • Fig. 5 shows a schematic diagram of the application of the image processing method according to an embodiment of the present disclosure.
  • In this application, the image to be processed is an image including text content. The matching can comprehensively consider the image similarity between the first image feature of the image to be processed and the second image feature of a reference image, and the text similarity between the first text feature of the image to be processed and the second text feature of the reference image.
  • the second image feature and the second text feature of each reference image in the image library may be acquired in advance, and stored in the feature library for comparison with the features of the image to be processed.
  • the first image feature and the first text feature of the image to be processed may be extracted.
  • the first image feature of the image to be processed can be extracted through a convolutional neural network
  • the text information in the image can be obtained through OCR technology
  • the first text feature of the image to be processed can be obtained through a recurrent neural network.
  • The first image feature can be used for screening: from the images in the image library, the n images with the highest image similarity can be selected as reference images, and then the text similarity between the first text feature and the second text features of these reference images can be determined. Furthermore, the comprehensive similarity between the image to be processed and these reference images can be determined through weighted average processing.
  • the weights of the text similarity and the image similarity may be determined, for example, the weight of the text similarity may be determined according to the area proportion of the region where the text is located in the image to be processed.
  • The area proportion of the region where the text is located can be determined based on the first image feature of the image to be processed; that proportion is then determined as the weight x of the text similarity, and 1 - x is determined as the weight of the image similarity.
  • weighted average processing can be performed on the image similarity and text similarity based on the above weights, so as to respectively determine the comprehensive similarity between the image to be processed and each reference image, And the reference image with the highest comprehensive similarity is determined as the target image matching the image to be processed.
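Assuming the text regions are available as rectangles (for example from the OCR step), the weight x and the weighted average described above might be computed as follows; the box format and non-overlap assumption are illustrative choices for this sketch.

```python
def text_area_weight(text_boxes, image_size):
    """Weight x for the text similarity: the proportion of the image
    area covered by the text regions.

    text_boxes: iterable of (x0, y0, x1, y1) rectangles, assumed
    non-overlapping; image_size: (width, height).
    """
    w, h = image_size
    text_area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in text_boxes)
    return min(text_area / (w * h), 1.0)

def comprehensive_similarity(image_sim, text_sim, x):
    """Weighted average: text similarity weighted x, image similarity 1 - x."""
    return x * text_sim + (1 - x) * image_sim
```

An image dominated by text (large x) is thus matched mainly on its text content, while a mostly pictorial image falls back toward plain image similarity.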
  • the image processing method can be used in fields such as network supervision. For example, for a website with strong anti-crawling protection, it is difficult to supervise the published text content directly. Instead, screenshots of the content published on the website can be taken and compared with preset images containing specific words or sentences, to determine whether those words or sentences appear in the screenshots and hence whether the website content includes them. In this way, publishers of website content can be supervised effectively.
  • the present disclosure does not limit the application scenarios of the image processing method.
  • FIG. 6 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
  • Fig. 7 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
  • first, S701 is performed to obtain a retrieval picture; then, S7021 image feature comparison and S7023 text feature comparison are performed on the retrieval picture respectively; next, through the similarity calculation of S7022 and the similarity calculation of S7024, the similarity results between the retrieval picture and A_i, B_i, C_i, D_i in the image feature library, and the similarity results between the retrieval picture and the text feature library, are obtained correspondingly.
  • the similarity result of retrieval picture and A_i is 94%
  • the similarity result with B_i is 96%
  • the similarity result with C_i is 91%
  • the similarity result with D_i is 80%
  • the result of similarity with A_w is 98%
  • the result of similarity with B_w is 90%
  • the result of similarity with C_w is 85%
  • the result of similarity with D_w is 60%.
  • the comparison logic of S703 is: compare the retrieved picture with the image feature library, and determine the image corresponding to the maximum similarity result as the image most similar to the retrieved picture. As shown in Table 1, the maximum similarity result obtained by comparing the retrieved picture with the image features in the image feature library is 96%, and the image corresponding to this maximum is image B; image B is therefore determined to be the image most similar to the retrieved picture.
  • the comparison logic of S704 in the present disclosure is: compare the retrieved picture with the image features in the image feature library and the text features in the text feature library respectively, add together the image similarity result and the text similarity result of each image to obtain a comprehensive similarity, and determine the image corresponding to the maximum comprehensive similarity as the image most similar to the retrieved picture. As can be seen from Table 1, the comprehensive similarity result of the retrieved picture with image A is 192%, with image B is 186%, with image C is 176%, and with image D is 140%. The comprehensive similarity with image A is therefore the largest, so image A is determined to be the image most similar to the retrieved picture.
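The worked example above can be checked directly; the percentages are those stated for Table 1 in the text, and the additive combination is the S704 logic described above.

```python
# Similarity results from the worked example (Table 1), in percent.
image_sims = {"A": 94, "B": 96, "C": 91, "D": 80}
text_sims = {"A": 98, "B": 90, "C": 85, "D": 60}

# S703: image features alone pick the maximum image similarity (image B).
image_only = max(image_sims, key=image_sims.get)

# S704: adding image and text similarity yields the comprehensive
# similarity, which picks image A instead.
composite = {k: image_sims[k] + text_sims[k] for k in image_sims}
combined_best = max(composite, key=composite.get)
```

The example shows how the text similarity flips the decision: image B wins on pixels alone, but image A wins once the matching text content is taken into account.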
  • a related image comparison method is to extract depth features (corresponding to the image features in the above-mentioned embodiments) of the image, and perform similarity calculation based on the image depth features, and then obtain a comparison result.
  • image features are often relatively similar in such cases. If the comparison is performed only through traditional image comparison methods, there may be more false positive data, that is, images that are relatively similar but whose text content is completely different.
  • FIG. 8 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in FIG. 8, the device includes: a feature extraction part 801 configured to perform feature extraction processing on an image to be processed to obtain a first image feature and a first text feature of the image to be processed, wherein the first text feature is the feature information of the text included in the image to be processed; a similarity determining part 802 configured to respectively determine the image similarity between the first image feature and a second image feature of a reference image, and the text similarity between the first text feature and a second text feature of the reference image, wherein the second text feature is the feature information of the text included in the reference image; and a matching part 803 configured to determine an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
  • the apparatus further includes: a target image determining part configured to determine, according to the image matching results between the image to be processed and at least two reference images, the target image matching the image to be processed.
  • the similarity determining part 802 is further configured to: respectively determine the image similarity between the first image feature and the second image features of at least two candidate images; determine a reference image among the at least two candidate images according to the image similarity; and determine the text similarity between the first text feature and the second text feature of the reference image.
  • the image matching result includes a comprehensive similarity between the reference image and the image to be processed
  • the matching part 803 is further configured to: determine the product of the image similarity and the text similarity as the comprehensive similarity; or determine the weighted average of the image similarity and the text similarity as the comprehensive similarity.
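A small sketch of the two combination modes described above (the product, or a weighted average with text weight x); the mode flag and defaults are illustrative conventions, not taken from the disclosure.

```python
def combine(image_sim, text_sim, mode="product", x=0.5):
    """Comprehensive similarity from the two scores.

    mode="product": multiply the image and text similarities.
    mode="average": weighted average with text weight x and
    image weight 1 - x.
    """
    if mode == "product":
        return image_sim * text_sim
    return x * text_sim + (1 - x) * image_sim
```

The product mode penalizes a mismatch in either modality sharply (either low score drags the result down), while the weighted average lets a strong score in one modality partially compensate for the other.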
  • the matching part 803 is further configured to: determine the type of the image to be processed; determine weight information according to the type of the image to be processed; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the matching part 803 is further configured to: determine the area proportion of the region where the text is located in the image to be processed; determine weight information according to the area proportion; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the matching part 803 is further configured to: determine weight information according to the first image feature; and perform weighted average processing on the image similarity and the text similarity according to the weight information to obtain the comprehensive similarity.
  • the first text features include at least one of semantic features, format features, font features, size features, typesetting features, and language features.
  • the functions or parts included in the apparatus provided by the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the implementation manner may refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • the computer readable storage medium may be a non-transitory computer readable storage medium.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory configured to store instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure also provide another computer program product configured to store computer-readable instructions, and when the instructions are executed, the computer executes the operations of the image processing method provided by any of the above embodiments.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • FIG. 9 shows a block diagram of an electronic device 900 according to an embodiment of the present disclosure.
  • the electronic device 900 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • the electronic device 900 may include one or more of the following components: a processing component 901, a memory 902, a power supply component 903, a multimedia component 904, an audio component 905, an input/output (I/O) interface 906, a sensor component 907, and a communication component 908.
  • the processing component 901 generally controls the overall operations of the electronic device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 901 may include one or more processors 909 to execute instructions to complete all or part of the steps of the above method. Additionally, the processing component 901 may include one or more modules to facilitate interaction between the processing component 901 and other components. For example, the processing component 901 may include a multimedia part to facilitate interaction between the multimedia component 904 and the processing component 901.
  • the memory 902 is configured to store various types of data to support operations at the electronic device 900. Examples of such data include instructions for any application or method operating on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and the like.
  • the memory 902 can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 903 provides power to various components of the electronic device 900. The power supply component 903 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 900.
  • the multimedia component 904 includes a screen providing an output interface between the electronic device 900 and the user.
  • the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Panel, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense an edge of a touch or slide action, but also detect a duration and pressure associated with the touch or slide operation.
  • the multimedia component 904 includes at least one of a front camera and a rear camera.
  • At least one of the front camera and the rear camera can receive external multimedia data.
  • Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 905 is configured to at least one of output and input an audio signal.
  • the audio component 905 includes a microphone (Microphone, MIC), which is configured to receive external audio signals when the electronic device 900 is in an operation mode, such as a calling mode, a recording mode and a voice recognition mode. Received audio signals may be further stored in memory 902 or sent via communication component 908 .
  • the audio component 905 also includes a speaker configured to output audio signals.
  • the I/O interface 906 provides an interface between the processing component 901 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 907 includes one or more sensors configured to provide various aspects of status assessment for electronic device 900 .
  • the sensor component 907 can detect the open/closed state of the electronic device 900 and the relative positioning of components (for example, the display and the keypad of the electronic device 900); the sensor component 907 can also detect a change in the position of the electronic device 900 or a component thereof, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and temperature changes of the electronic device 900.
  • the sensor assembly 907 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor assembly 907 may also include an optical sensor, such as a Complementary Metal-Oxide-Semiconductor (CMOS) image sensor or a Charge Coupled Device (CCD) image sensor, configured for use in imaging applications.
  • the sensor component 907 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 908 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices.
  • the electronic device 900 can access a wireless network based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 908 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 908 further includes a near field communication (Near Field Communication, NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, configured to execute the above method.
  • a non-volatile computer-readable storage medium such as a memory 902 including computer program instructions, which can be executed by the processor 909 of the electronic device 900 to implement the above method.
  • FIG. 10 shows a block diagram of an electronic device 1000 according to an embodiment of the present disclosure.
  • the electronic device 1000 may be provided as a server.
  • the electronic device 1000 includes a processing component 1001, which in turn includes one or more processors, and a memory resource represented by memory 1002 configured to store instructions executable by the processing component 1001, such as application programs.
  • the application program stored in memory 1002 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1001 is configured to execute instructions to perform the above method.
  • the electronic device 1000 may also include a power supply component 1003 configured to perform power management of the electronic device 1000, a wired or wireless network interface 1004 configured to connect the electronic device 1000 to a network, and an I/O interface 1005.
  • the electronic device 1000 can operate based on an operating system stored in the memory 1002, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • a non-volatile computer-readable storage medium such as the memory 1002 including computer program instructions, which can be executed by the processing component 1001 of the electronic device 1000 to implement the above method.
  • the present disclosure may be at least one of a system, method and computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, Random-Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM) or flash memory, Static Random-Access Memory (SRAM), Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as at least one of the Internet, a local area network, a wide area network, and a wireless network.
  • the network may include at least one of copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • in some embodiments, electronic circuits, such as programmable logic circuits, Field Programmable Gate Arrays (FPGA), or Programmable Logic Arrays (PLA), can execute computer-readable program instructions, thereby implementing various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device that implements the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause at least one of computers, programmable data processing devices, and other devices to work in a specific way, so that the computer-readable medium then comprises an article of manufacture including instructions for implementing various aspects of the functions/actions specified in one or more blocks of at least one of the flowcharts and block diagrams.
  • each block in a flowchart or block diagram may represent a module, a portion of a program segment, or an instruction, which comprises one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of at least one of the block diagrams and flowcharts, and combinations of blocks of at least one of the block diagrams and flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be specifically realized by means of hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • the image processing method of the embodiments of the present disclosure can obtain image features and text features for comprehensive comparison. The text contained in the image is taken into account, and weight information for the image similarity and the text similarity can be determined according to the type of the image to be processed or the area proportion of the region where the text is located, so as to obtain the matching result. The determined text similarity can reduce the probability of false positives in image matching, thereby improving the accuracy of the matching results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: performing feature extraction on an image to be processed to obtain a first image feature and a first text feature; respectively determining the image similarity between the first image feature and a second image feature of a reference image, and the text similarity between the first text feature and a second text feature of the reference image; and determining an image matching result between the image to be processed and the reference image according to the image similarity and the text similarity.
PCT/CN2022/096004 2021-11-29 2022-05-30 Procédé et appareil de traitement d'images, dispositif électronique, support d'enregistrement, et produit programme informatique WO2023092975A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111435625.7 2021-11-29
CN202111435625.7A CN114118278A (zh) 2021-11-29 2021-11-29 图像处理方法及装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023092975A1 true WO2023092975A1 (fr) 2023-06-01

Family

ID=80371521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096004 WO2023092975A1 (fr) 2021-11-29 2022-05-30 Procédé et appareil de traitement d'images, dispositif électronique, support d'enregistrement, et produit programme informatique

Country Status (2)

Country Link
CN (1) CN114118278A (fr)
WO (1) WO2023092975A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118278A (zh) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 图像处理方法及装置、电子设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694978A (zh) * 2020-05-20 2020-09-22 Oppo(重庆)智能科技有限公司 图像相似度检测方法、装置、存储介质与电子设备
US20210124976A1 (en) * 2019-10-28 2021-04-29 Samsung Sds Co., Ltd. Apparatus and method for calculating similarity of images
CN112990376A (zh) * 2021-04-29 2021-06-18 北京世纪好未来教育科技有限公司 一种文本图像相似度评估方法、装置及计算设备
CN113111154A (zh) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 相似度评估方法、答案搜索方法、装置、设备及介质
CN114118278A (zh) * 2021-11-29 2022-03-01 深圳市商汤科技有限公司 图像处理方法及装置、电子设备和存储介质


Also Published As

Publication number Publication date
CN114118278A (zh) 2022-03-01

Similar Documents

Publication Publication Date Title
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
CN111310616B (zh) 图像处理方法及装置、电子设备和存储介质
EP3173948A1 (fr) Procédé et appareil de recommandation de documents de référence
CN111783756B (zh) 文本识别方法及装置、电子设备和存储介质
CN107102746B (zh) 候选词生成方法、装置以及用于候选词生成的装置
WO2021056621A1 (fr) Procédé et appareil de reconnaissance de séquence de texte, dispositif électronique et support de stockage
WO2021031645A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support d'informations
CN111581488B (zh) 一种数据处理方法及装置、电子设备和存储介质
WO2021027343A1 (fr) Procédé et appareil de reconnaissance d'images de visages humains, dispositif électronique, et support d'informations
CN111259967B (zh) 图像分类及神经网络训练方法、装置、设备及存储介质
CN110781813B (zh) 图像识别方法及装置、电子设备和存储介质
WO2021208666A1 (fr) Procédé et appareil de reconnaissance de caractères, dispositif électronique et support de stockage
CN107784034B (zh) 页面类别识别方法及装置、用于页面类别识别的装置
CN107564526B (zh) 处理方法、装置和机器可读介质
CN110659690B (zh) 神经网络的构建方法及装置、电子设备和存储介质
CN110391966B (zh) 一种消息处理方法、装置和用于消息处理的装置
EP3734472A1 (fr) Procédé et dispositif de traitement de texte
WO2023078414A1 (fr) Procédé et appareil de recherche d'articles apparentés, dispositif électronique et support de stockage
CN107424612B (zh) 处理方法、装置和机器可读介质
CN110633715B (zh) 图像处理方法、网络训练方法及装置、和电子设备
WO2023092975A1 (fr) Procédé et appareil de traitement d'images, dispositif électronique, support d'enregistrement, et produit programme informatique
CN110232181B (zh) 评论分析方法及装置
CN114168798A (zh) 文本存储管理与检索方法及装置
CN111222316B (zh) 文本检测方法、装置及存储介质
CN110070046B (zh) 人脸图像识别方法及装置、电子设备和存储介质