WO2019169769A1 - Advertisement picture identification method, electronic device, and readable storage medium

Info

Publication number: WO2019169769A1
Application number: PCT/CN2018/089720
Authority: WO
Inventors: 宋杰; 郑佳; 赵骏
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-03-06
Filing date: 2018-06-03
Publication date: 2019-09-12
Also published as: CN108399161A

Abstract

The present application relates to an advertisement picture identification method, an electronic device, and a readable storage medium. The method comprises: performing optical character recognition on a picture to be analyzed, to recognize characters; performing word segmentation on the recognized characters; comparing segmented words with advertisement keywords in a pre-established advertisement keyword library, to obtain segmented words matching the advertisement keywords in the advertisement keyword library; allocating corresponding keyword matching scores according to the matching result and according to a preset matching scoring rule; recognizing different font sizes of characters in said picture, and allocating corresponding font scores according to the font sizes of the matching segmented words and according to a preset font scoring rule; according to the keyword matching score and the font score, using a preset rule to determine whether said picture is an advertisement picture. The present application can accurately and effectively determine whether a picture to be analyzed is an advertisement picture. Furthermore, the present invention can automatically identify an advertisement picture without manual detection, effectively improving detection efficiency.

Description

Advertising picture identification method, electronic device and readable storage medium

Priority claim

This application claims priority to Chinese Patent Application No. CN 201810183371.6, entitled "Advertising Picture Identification Method, Electronic Device and Readable Storage Medium", filed on March 6, 2018, the content of which is incorporated herein by reference.

Technical field

The present application relates to the field of computer technologies, and in particular, to an advertisement picture identification method, an electronic device, and a readable storage medium.

Background

At present, large Internet finance companies handle a large number of business pictures in their various business processes, and various advertisement pictures may be mixed in among them. These advertisement pictures contain advertisement information, junk information, and the like, which may interfere with normal business processing and must therefore be effectively identified and eliminated. The traditional way of identifying advertisement pictures is to manually check a large number of business pictures one by one to screen out the advertisement pictures. Such manual detection is costly, time-consuming, and inefficient.

Summary of the invention

The purpose of the present application is to provide an advertisement picture identification method, an electronic device, and a readable storage medium, which are intended to improve the efficiency of identifying an advertisement picture.

In order to achieve the above object, a first aspect of the present application provides an electronic device. The electronic device includes a memory and a processor, the memory stores an advertisement picture authentication system that can be run on the processor, and the advertisement picture authentication system implements the following steps when executed by the processor:

after receiving a picture to be analyzed, performing optical character recognition on the picture to be analyzed to recognize the text in the picture to be analyzed;

performing word segmentation on the recognized text;

matching each segmented word against each advertisement keyword in a pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the library, and assigning a corresponding keyword matching score according to the matching result and a preset matching scoring rule;

identifying the different font sizes of the characters in the picture to be analyzed, and assigning a corresponding font score according to the font size of the matched segmented words and a preset font scoring rule; and

determining, according to the keyword matching score and the font score and by using a preset rule, whether the picture to be analyzed is an advertisement picture.

In addition, in order to achieve the above object, the second aspect of the present application further provides an advertisement picture identification method, where the advertisement picture identification method includes:

Perform word segmentation on the recognized words;

Further, in order to achieve the above object, a third aspect of the present application provides a computer readable storage medium. The computer readable storage medium stores an advertisement picture authentication system, and the advertisement picture authentication system is executable by at least one processor to cause the at least one processor to perform the steps of the advertisement picture identification method described above.

In the advertisement picture identification method, system, and readable storage medium proposed by the present application, optical character recognition is performed on the picture to be analyzed to recognize its text; the recognized text is segmented into words; each segmented word is matched against each advertisement keyword in the pre-established advertisement keyword library, and a corresponding keyword matching score is assigned according to the matching result and the preset matching scoring rule; the different font sizes of the characters are identified, and a corresponding font score is assigned according to the font size of the matched segmented words and the preset font scoring rule; and whether the picture to be analyzed is an advertisement picture is determined from the keyword matching score and the font score by using the preset rule. Because advertisement text in a picture is generally displayed in a font that differs from the other, normal text, the present application matches each segmented word in the picture to be analyzed against the advertisement keywords in the pre-established library, assigns a keyword matching score according to the matching situation, assigns a font score according to the font size of the matched segmented words, and combines the two scores for a comprehensive identification. This makes it possible to determine more accurately and effectively whether the picture to be analyzed is an advertisement picture containing advertisement information. Moreover, advertisement pictures are identified automatically, without manual detection, which effectively improves detection efficiency.

Brief description of the drawings

FIG. 1 is a schematic diagram of an operating environment of a preferred embodiment of an advertisement picture authentication system 10 of the present application;

FIG. 2 is a schematic flow chart of an embodiment of an advertisement picture identification method according to the present application.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort fall within the scope of the present application.

It should be noted that the descriptions "first", "second", and the like in the present application are for the purpose of description only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art. When a combination of technical solutions is contradictory or impossible to implement, it should be considered that the combination does not exist and is not within the scope of protection claimed by this application.

The application provides an advertisement picture identification system. Please refer to FIG. 1, which is a schematic diagram of an operating environment of a preferred embodiment of the advertisement picture authentication system 10 of the present application.

In this embodiment, the advertisement picture authentication system 10 is installed and runs in the electronic device 1. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

The memory 11 is at least one type of computer readable storage medium. In some embodiments it may be an internal storage unit of the electronic device 1, such as a hard disk or memory of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk equipped on the electronic device 1, a smart memory card (SMC), a Secure Digital (SD) card, a flash card, etc. Further, the memory 11 may include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 is configured to store the application software installed in the electronic device 1 and various types of data, such as the program code of the advertisement picture authentication system 10. The memory 11 can also be used to temporarily store data that has been output or is about to be output.

The processor 12, in some embodiments, may be a central processing unit (CPU), a microprocessor, or another data processing chip for running program code or processing data stored in the memory 11, for example executing the advertisement picture authentication system 10.

The display 13 in some embodiments may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch sensor, or the like. The display 13 is configured to display information processed in the electronic device 1 and to display a visual user interface, for example the text recognized by optical character recognition in the picture to be analyzed, the word segmentation result of the recognized text, the segmented words (marked) in the picture to be analyzed that match advertisement keywords in the advertisement keyword library, and the final identification result of whether the picture to be analyzed is an advertisement picture. The components 11-13 of the electronic device 1 communicate with one another via a system bus.

The advertising picture authentication system 10 includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement various embodiments of the present application.

Wherein, when the advertisement picture authentication system 10 is executed by the processor 12, the following steps are implemented:

Step S1: After receiving the picture to be analyzed, perform optical character recognition on the picture to be analyzed, and identify the text in the picture to be analyzed.

In this embodiment, the advertisement picture identification system receives an advertisement picture identification request sent by a user, for example a request sent through a terminal such as a mobile phone, a tablet computer, or a self-service terminal device. The request may be sent from a client pre-installed on such a terminal, or from a browser on such a terminal.

After receiving the advertisement picture identification request sent by the user, the advertisement picture identification system performs Optical Character Recognition (OCR) on the picture to be analyzed in the request. That is, the printed characters are optically converted into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into text format.

OCR is performed on the picture to be analyzed to recognize the text in it. In this embodiment, an uncommon-character matching strategy can be applied during OCR. Because advertisement information is meant to be easy to understand and easy to publicize, uncommon characters generally do not appear in it. Therefore, if during OCR of the picture to be analyzed an uncommon character has a high matching degree for some character while common characters have a lower matching degree, the OCR result for that character is treated as a possible recognition error: the character is combined with the characters around it and checked against a phrase lexicon, and when the combination matches some phrase to a high degree, the character is recognized as the common character at the corresponding position in the matched phrase. In this way, the recognition accuracy of advertisement information in the picture to be analyzed can be improved for the subsequent steps.
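The fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `resolve_character`, the candidate list format, the set of uncommon characters, and the phrase lexicon are all hypothetical and are not specified in the application.

```python
def resolve_character(candidates, left, right, uncommon, phrases):
    """Pick a character for one OCR position.

    candidates: list of (char, score) pairs sorted by descending score (assumed format).
    left, right: the already recognized neighbouring text ("" if none).
    uncommon: set of characters treated as uncommon (assumed to be given).
    phrases: set of known phrases used as the lexicon (assumed to be given).
    """
    top_char = candidates[0][0]
    if top_char not in uncommon:
        return top_char  # a common character needs no second check
    # The best match is an uncommon character: try the common candidates in context.
    for char, _score in candidates[1:]:
        if char in uncommon:
            continue
        context_forms = (left + char, char + right, left + char + right)
        if any(form in phrases for form in context_forms):
            return char  # the common character completes a known phrase
    return top_char  # no phrase matched, keep the original OCR result
```

For instance, if the top candidate is an uncommon character but a lower-ranked common character forms a lexicon phrase with its right-hand neighbour, the common character is returned instead.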

It is also possible to perform distortion detection on the uncommon characters recognized in the picture to be analyzed. Advertisement information sometimes applies special processing to its text so that the text is distorted, for example by circling characters, striking through them, or assembling them from an advertisement font. These special additions can be removed after detection and the text itself restored, to facilitate subsequent matching and identification of the advertisement information.

In an optional implementation, two-dimensional code detection may also be performed on the picture to be analyzed. If the picture to be analyzed contains two-dimensional code information, it is directly determined to be an advertisement picture, and the identification is completed without any follow-up operations.

Step S2: Perform word segmentation on the recognized text.

In this embodiment, the characters extracted by OCR are first preprocessed, for example by removing special characters from the preliminary recognition result and by applying line-break processing to characters that have the same font size and lie close together. The preprocessed text is then segmented, which includes:

a. Take m characters of the sentence to be segmented, from left to right, as the matching field, where m is the number of characters in the longest entry of the preset machine dictionary.

b. Look up the matching field in the machine dictionary. If the match succeeds, the matching field is segmented off as a word. If the match fails, the last character of the matching field is removed, the remaining string is used as the new matching field, and the lookup is repeated. This process is repeated until all words have been segmented.

c. Perform a and b from right to left for word segmentation.
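Steps a to c above correspond to forward and reverse maximum matching. The following is a minimal sketch, assuming the machine dictionary is a plain set of strings and that a single character is accepted as a word when nothing longer matches (the fallback is an assumption, since the text does not say what happens in that case). The function names are illustrative.

```python
def forward_max_match(sentence, dictionary):
    """Segment left to right, always trying the longest dictionary entry first."""
    m = max((len(w) for w in dictionary), default=1)  # length of the longest entry
    words, i = [], 0
    while i < len(sentence):
        field = sentence[i:i + m]  # step a: take up to m characters
        # Step b: drop the last character until the field is a dictionary word.
        while len(field) > 1 and field not in dictionary:
            field = field[:-1]
        words.append(field)  # a single leftover character is kept as a word
        i += len(field)
    return words


def reverse_max_match(sentence, dictionary):
    """Step c: the same procedure, scanning from right to left."""
    m = max((len(w) for w in dictionary), default=1)
    words, j = [], len(sentence)
    while j > 0:
        field = sentence[max(0, j - m):j]
        # Scanning right to left, the first character is the one dropped.
        while len(field) > 1 and field not in dictionary:
            field = field[1:]
        words.append(field)
        j -= len(field)
    words.reverse()
    return words
```

The two directions can disagree: with the toy dictionary `{"ab", "abc", "cd"}` and the input `"abcd"`, the forward pass yields `["abc", "d"]` while the reverse pass yields `["ab", "cd"]`, which is why both passes are sometimes run and compared.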

Further, a secondary processing pass may be performed after word segmentation, in which runs of consecutive uppercase numerals or English letters are treated as a whole and translated, so as to identify advertisement information that is promoted by means of consecutive numerals or English text.

In the present embodiment, an N-gram model, a Hidden Markov Model (HMM), or a Maximum Entropy Model may be used for word segmentation, and the word segmentation algorithms that may be used include forward maximum matching, reverse maximum matching, bidirectional maximum matching, and the shortest path algorithm.

Step S3: Match each segmented word against each advertisement keyword in the pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the library, and assign a corresponding keyword matching score according to the matching result and a preset matching scoring rule.

In this embodiment, an advertisement keyword library may be established in advance. For example, the library may be organized by advertisement category, with keyword libraries established for product advertisements, brand advertisements, concept advertisements, public service advertisements, and the like. Advertisements can also be graded into levels. For example, pornography, gambling, and fraud advertisements that are common on the network are set to the high-risk level and must be eliminated; advertisements for competing systems and brand advertisements related to the business system are set to the dangerous level; and general merchandise advertisements and the like are set to the normal level.
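One way to hold such a library in memory is sketched below. The class names, field names, and sample entries are hypothetical; the application only states that keywords are grouped by category and graded as high-risk, dangerous, or normal.

```python
from dataclasses import dataclass
from enum import Enum


class RiskLevel(Enum):
    HIGH_RISK = "high-risk"   # must be eliminated
    DANGEROUS = "dangerous"
    NORMAL = "normal"


@dataclass(frozen=True)
class AdKeyword:
    text: str
    category: str      # e.g. "product", "brand", "concept", "public service"
    level: RiskLevel


def build_library(keywords):
    """Index keywords by their text for direct lookup."""
    return {kw.text: kw for kw in keywords}


# Illustrative entries only; a real library would be curated by the operator.
LIBRARY = build_library([
    AdKeyword("online casino", "product", RiskLevel.HIGH_RISK),
    AdKeyword("rival brand", "brand", RiskLevel.DANGEROUS),
    AdKeyword("discount", "product", RiskLevel.NORMAL),
])


def level_of(word, library=LIBRARY):
    """Return the risk level of an exact keyword hit, or None if there is no hit."""
    entry = library.get(word)
    return entry.level if entry else None
```

Keeping the level on each entry lets a caller stop early as soon as a high-risk keyword is hit.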

The established keyword library is used to perform keyword matching on the segmented words of the picture to be analyzed, and a score p3 is assigned according to the result of matching those segmented words against the keyword library. The specific matching scoring rule includes:

a. Exact inclusion: if a segmented word of the picture to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library, a corresponding first keyword matching score is assigned. That is, an exact hit is counted when the word to be matched completely contains a keyword in the keyword library, and p3 is scored 10 points.

b. Synonymous inclusion: if a segmented word of the picture to be analyzed matches a preset related word of an advertisement keyword in the pre-established advertisement keyword library, a corresponding second keyword matching score is assigned. The preset related words of an advertisement keyword include synonyms and near-synonyms of the keyword, phrases related to the keyword, and/or deformed forms of the keyword produced by reversing its characters or inserting characters between them. In other words, the matching condition is extended relative to exact inclusion to cover synonyms, near-synonyms, related words, and phrases containing the keyword, as well as forms in which part of the literal order is reversed or separated. The matching condition is that the word to be matched completely contains a deformed form of a keyword in the keyword library (insertion, reversal, synonym, near-synonym, related word), and p3 is scored 8 points.

c. Core inclusion: if a segmented word of the picture to be analyzed matches the core part of an advertisement keyword in the pre-established advertisement keyword library, or a preset related word of that core part, a corresponding third keyword matching score is assigned. That is, the matching condition is that the word to be matched contains the core part of a keyword in the keyword library or a deformed form of that core part (insertion, reversal, synonym, near-synonym, related word), and p3 is scored 6 points.
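The three-tier rule can be sketched as a single scoring function. This is an illustration under assumptions: the `KeywordEntry` structure and its fields are hypothetical, related words are assumed to be listed explicitly per keyword, and of the deformations named above only full reversal is generated automatically (insertion is not handled here).

```python
from dataclasses import dataclass, field


@dataclass
class KeywordEntry:
    keyword: str
    related: set = field(default_factory=set)       # synonyms, related words (assumed given)
    core: str = ""                                   # core part of the keyword (assumed given)
    core_related: set = field(default_factory=set)  # related words of the core part


def p3_score(word, entry):
    """Return 10, 8, 6, or 0 for exact, synonymous, core, or no inclusion."""
    if entry.keyword in word:
        return 10  # a. exact inclusion
    deformed = set(entry.related) | {entry.keyword[::-1]}
    if any(d and d in word for d in deformed):
        return 8   # b. synonymous inclusion
    core_forms = {entry.core, entry.core[::-1]} | set(entry.core_related)
    if any(c and c in word for c in core_forms):
        return 6   # c. core inclusion
    return 0


def best_p3(word, library):
    """Highest score of `word` against every entry in an iterable of entries."""
    return max((p3_score(word, e) for e in library), default=0)
```

Checking the tiers in descending order means a word that satisfies several conditions always receives the highest applicable score.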

After the keyword matching is completed, if a segmented word in the picture to be analyzed matches a keyword in the keyword library (whether by exact inclusion, synonymous inclusion, or core inclusion) and the matched keyword belongs to the wording of a high-risk advertisement, it is directly determined that the picture to be analyzed contains a high-risk advertisement and needs to be eliminated, and the identification is completed without subsequent operations.

If the matched keywords do not belong to the wording of a high-risk advertisement, that is, they belong to the wording of dangerous-level or normal-level advertisements, further semantic analysis can be carried out. For example, whether the picture to be analyzed contains advertisement information, and its advertisement category, level, and so on, may be determined from the contextual meaning of the matched keyword or of a combination of multiple keywords. It is also possible to detect whether the picture to be analyzed contains direct contact information such as a QQ number, a WeChat account, an email address, a website address, or a mobile phone number. If it does, it can be directly determined that the picture to be analyzed contains advertisement information, such as an advertisement unrelated to the business system. Specifically, direct contact information is detected as follows: when the text in the picture to be analyzed includes a series of digits, it is checked whether monetary unit information, unit-of-measurement information, or the like is present; if not, it is checked whether the digits are a phone number.
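The digit check at the end of the paragraph above can be sketched with regular expressions. The patterns are assumptions made for illustration: the unit list is a small sample, and the phone pattern assumes an 11-digit mobile number starting with 1, which the application does not specify.

```python
import re

DIGIT_RUN = re.compile(r"\d{5,}")  # a "series of digits"; the minimum length is an assumption
# Sample monetary and measurement units that may follow a number (illustrative list).
UNIT_AFTER = re.compile(r"\s*(?:元|万|%|(?:yuan|rmb|usd|kg|km|cm)\b)", re.IGNORECASE)
CURRENCY_BEFORE = re.compile(r"[¥$€]\s*$")
MOBILE = re.compile(r"1[3-9]\d{9}")  # assumed 11-digit mobile number format


def find_phone_numbers(text):
    """Return digit runs that carry no unit and look like a mobile number."""
    phones = []
    for match in DIGIT_RUN.finditer(text):
        before, after = text[:match.start()], text[match.end():]
        if CURRENCY_BEFORE.search(before) or UNIT_AFTER.match(after):
            continue  # the digits are an amount or a measurement
        if MOBILE.fullmatch(match.group()):
            phones.append(match.group())
    return phones
```

Filtering out amounts and measurements first keeps long prices or quantities from being mistaken for contact numbers.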

Step S4: Identify the different font sizes of the characters in the picture to be analyzed, and assign a corresponding font score according to the font size of the matched segmented words and a preset font scoring rule.

When the individual characters in the picture to be analyzed are recognized by OCR, font size analysis may be performed on each recognized character. Specifically, the picture to be analyzed may first be subjected to Gaussian blur processing, such as $f'(x,y) = f(x,y) * g(x,y)$, where $g(x,y) = \exp(-(x^2 + y^2)/9)$. A peak distribution map is drawn for $f'(x,y)$, and peak distribution maps of different levels are extracted according to the step distribution. That is, the general outline of each character in the picture to be analyzed is analyzed in order to distinguish the different font sizes of the characters. Characters in the peak distribution map of a preset level are recognized as larger-font characters, and the remaining characters in the picture to be analyzed are recognized as smaller-font characters. In practice, if a business picture contains advertisement information, the advertisement information is generally displayed in a larger font in order to attract attention. Therefore, in the present embodiment, a font score p1 is assigned to the characters in the picture to be analyzed, with characters in the larger font receiving a higher font score than characters in the smaller font. For example, a larger-font character has p1=2, and a smaller-font character has p1=1.
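The blur step and the two-level font score can be sketched in plain Python. This is an interpretation, not the method of the application: the kernel radius, the normalisation of the kernel, the use of a character bounding box, and the single `level` cut-off standing in for the "preset level" of the peak distribution map are all assumptions.

```python
import math


def gaussian_kernel(radius=3):
    """Kernel g(x, y) = exp(-(x^2 + y^2) / 9), normalised to sum to 1 (assumption)."""
    offsets = range(-radius, radius + 1)
    kernel = [[math.exp(-(x * x + y * y) / 9) for x in offsets] for y in offsets]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def blur(image, kernel):
    """Convolve a 2-D list `image` with `kernel`; pixels outside the image count as 0."""
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
    return out


def font_score(blurred, box, level):
    """Return p1 = 2 if the blurred peak inside `box` reaches `level`, else p1 = 1.

    box is (top, left, bottom, right) with exclusive bottom and right edges.
    """
    top, left, bottom, right = box
    peak = max(blurred[y][x] for y in range(top, bottom) for x in range(left, right))
    return 2 if peak >= level else 1
```

The idea behind the sketch is that thick, large glyphs keep a high value after blurring while thin, small strokes are smeared to a low value, so a cut-off on the blurred peak separates the two groups.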

Further, in an optional implementation, font color analysis may be performed on each recognized character. For the characters recognized by OCR in the picture to be analyzed, the font color saliency of each character is calculated; a character whose font color saliency is greater than a preset color saliency threshold is identified as a high-color-saliency character, and a character whose font color saliency is less than or equal to the threshold is identified as a low-color-saliency character. A corresponding color saliency score is then set for each character according to its font color saliency, the score of a high-color-saliency character being greater than that of a low-color-saliency character. Specifically, for a font detected by OCR, the color saliency is calculated, for example, as $d_{rgb} = (rgb(x, y) - rgb(s, t))^2$; when $d_{rgb}$ is greater than a certain threshold, the color of the font is highly salient. In practical applications, advertisement information may achieve a better publicity effect by increasing its color saliency. Therefore, in this embodiment a color saliency score p2 is assigned to the font color of the characters in the picture to be analyzed, with high-saliency characters scoring higher than low-saliency ones. For example, a character with low color saliency has p2=0.5, and a character with high color saliency has a larger p2.
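The color saliency test can be sketched as a squared RGB distance compared against a threshold. The assumptions here are that rgb(s, t) is a reference pixel such as the background, that the squared difference is summed over the three channels, and that the threshold and the high score are supplied by the caller, since the text does not fix them; only the low score of 0.5 comes from the text.

```python
def color_distance_sq(font_rgb, reference_rgb):
    """Sum of squared channel differences between two (r, g, b) triples."""
    return sum((a - b) ** 2 for a, b in zip(font_rgb, reference_rgb))


def color_saliency_score(font_rgb, reference_rgb, threshold, high_score, low_score=0.5):
    """Return p2: `high_score` when the distance exceeds `threshold`, else `low_score`.

    `threshold` and `high_score` are caller-chosen placeholders (not given in the text).
    """
    d_rgb = color_distance_sq(font_rgb, reference_rgb)
    return high_score if d_rgb > threshold else low_score
```

For pure red text (255, 0, 0) against a white reference (255, 255, 255), the distance is 0 + 255² + 255² = 130050, so the text counts as highly salient under any threshold below that value.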

Step S5: Determine, according to the keyword matching score and the font score, whether the image to be analyzed is an advertisement image by using a preset rule.

In this embodiment, when determining whether the image to be analyzed is an advertisement image by using a preset rule, the P value may be calculated according to the following formula:

P=a1*P1+a2*P2+a3*P3

Here, P1 is the font score corresponding to the font size of the matched segmented word in the picture to be analyzed, P2 is the color saliency score corresponding to the font color saliency of the matched segmented word, and P3 is the keyword matching score corresponding to the matched segmented word. a1, a2, and a3 are parameter weights set in advance for the font score P1, the color saliency score P2, and the keyword matching score P3; for example, a1 = 0.2, a2 = 0.1, and a3 = 0.7 can be set.

A threshold is set in advance; when the calculated P value reaches the threshold, the picture to be analyzed is determined to be an advertisement picture containing advertisement information, and an early warning is issued. In addition, the advertisement information may be evaluated comprehensively according to the font, color, keyword level, number of keywords, etc. of the matched segmented words in the picture to be analyzed, and different measures may be taken for different advertisements by setting advertisement categories and advertisement levels.
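The weighted sum and the threshold decision can be sketched as follows. The weights are the example values given in the text; the threshold is not specified there, so it is left as a caller-supplied parameter, and the function names are illustrative.

```python
WEIGHTS = (0.2, 0.1, 0.7)  # example values of a1, a2, a3 from the text


def combined_score(p1, p2, p3, weights=WEIGHTS):
    """P = a1*P1 + a2*P2 + a3*P3."""
    a1, a2, a3 = weights
    return a1 * p1 + a2 * p2 + a3 * p3


def is_advertisement(p1, p2, p3, threshold, weights=WEIGHTS):
    """True when P reaches the preset threshold (the threshold value is an assumption)."""
    return combined_score(p1, p2, p3, weights) >= threshold
```

As a worked case, a larger-font word (P1 = 2) with low color saliency (P2 = 0.5) and an exact keyword hit (P3 = 10) gives P = 0.4 + 0.05 + 7.0 = 7.45. With these weights the keyword score dominates, so font and color mainly act as tie-breakers near the threshold.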

Compared with the prior art, the present embodiment performs optical character recognition on the picture to be analyzed to recognize its text; segments the recognized text into words; matches each segmented word against each advertisement keyword in the pre-established advertisement keyword library and assigns a corresponding keyword matching score according to the matching result and the preset matching scoring rule; identifies the different font sizes of the characters and assigns a corresponding font score according to the font size of the matched segmented words and the preset font scoring rule; and determines from the keyword matching score and the font score, by using the preset rule, whether the picture to be analyzed is an advertisement picture. When advertisement information appears in a picture, the advertisement text generally differs from the other, normal text, for example in font size or font color. In this embodiment, each segmented word in the picture to be analyzed is matched against the advertisement keywords in the pre-established library and a keyword matching score is assigned according to the matching situation, a font score is assigned according to the font size of the matched segmented words, and a color saliency score is set according to the color saliency of the matched segmented words. Finally, the keyword matching score, the font score, and the color saliency score are combined for a comprehensive identification, which makes it possible to judge more accurately and effectively whether the picture to be analyzed is an advertisement picture containing advertisement information. Moreover, advertisement pictures are identified automatically, without manual detection, which effectively improves detection efficiency.

FIG. 2 is a schematic flowchart of an embodiment of the advertisement picture identification method of the present application. The advertisement picture identification method includes the following steps:

Step S10: After receiving the picture to be analyzed, perform optical character recognition on the picture to be analyzed, and identify the text in the picture to be analyzed.

Step S20: Perform word segmentation on the recognized text.

Step S30: Match each segmented word against each advertisement keyword in the pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the library, and assign a corresponding keyword matching score according to the matching result and a preset matching scoring rule.

Step S40: Identify the different font sizes of the characters in the picture to be analyzed, and assign a corresponding font score according to the font size of the matched segmented words and a preset font scoring rule.

When the individual characters in the picture to be analyzed are recognized by OCR, font size analysis may be performed on each recognized character. Specifically, the picture to be analyzed may first be subjected to Gaussian blur processing, such as $f'(x,y) = f(x,y) * g(x,y)$, where $g(x,y) = \exp(-(x^2 + y^2)/9)$. A peak distribution map is drawn for $f'(x,y)$, and peak distribution maps of different levels are extracted according to the step distribution. That is, the general outline of each character in the picture to be analyzed is analyzed in order to distinguish the different font sizes of the characters. Characters in the peak distribution map of a preset level are recognized as larger-font characters, and the remaining characters in the picture to be analyzed are recognized as smaller-font characters. In practice, if a business picture contains advertisement information, the advertisement information is generally displayed in a larger font in order to attract attention. Therefore, in this embodiment, a font score p1 is assigned to the characters in the picture to be analyzed, with characters in the larger font receiving a higher font score than characters in the smaller font. For example, a larger-font character has p1=2, and a smaller-font character has p1=1.

Step S50: Determine, according to the keyword matching score and the font score, whether the image to be analyzed is an advertisement image by using a preset rule.

P=a1*P1+a2*P2+a3*P3

Moreover, the present application also provides a computer readable storage medium storing an advertisement picture authentication system, the advertisement picture authentication system being executable by at least one processor to cause the at least one processor to perform the steps of the advertisement picture identification method of the foregoing embodiments. The specific implementation of steps S10, S20, S30, and the other steps of the method is as described above and is not repeated here.

It is to be understood that the term "comprises", "comprising", or any other variants thereof, is intended to encompass a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes those elements. It also includes other elements that are not explicitly listed, or elements that are inherent to such a process, method, article, or device. An element that is defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, method, item, or device that comprises the element.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and can also be implemented by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present application.

The preferred embodiments of the present application have been described above with reference to the drawings, and are not intended to limit the scope of the application. The serial numbers of the embodiments of the present application are merely for the description, and do not represent the advantages and disadvantages of the embodiments. Additionally, although logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than the ones described herein.

A person skilled in the art can implement the present application in various variants without departing from the scope and spirit of the present application. For example, a feature described as part of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modifications, equivalent substitutions, and improvements made within the technical concept of the application should fall within the scope of the application.

Claims

An electronic device, comprising a memory and a processor, the memory storing an advertisement picture identification system operable on the processor, wherein the following steps are implemented when the advertisement picture identification system is executed by the processor:

After receiving the picture to be analyzed, performing optical character recognition on the picture to be analyzed, and identifying the text in the picture to be analyzed;

Performing word segmentation on the recognized text;

Matching each segmented word against each advertisement keyword in the pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the pre-established advertisement keyword library, and assigning a corresponding keyword matching score according to the matching result and a preset matching scoring rule;

Identifying different font sizes of each character in the image to be analyzed, and assigning a corresponding font score according to a preset font score rule according to a font size of the matched word segment;

And determining, according to the keyword matching score and the font score, whether the image to be analyzed is an advertisement image by using a preset rule.
The electronic device according to claim 1, wherein the identifying different font sizes of each character in the image to be analyzed comprises:

Performing Gaussian blur processing on the image to be analyzed, drawing a peak distribution map of the image to be analyzed after Gaussian blur processing, and extracting peak distribution maps of different levels according to the step distribution; and identifying the characters in the peak distribution map of a preset level as larger-font characters, and the remaining characters in the image to be analyzed as smaller-font characters;

The preset font scoring rules include:

The corresponding font score is set according to the font size for each character in the image to be analyzed, wherein the font score corresponding to the character of the larger font is greater than the font score corresponding to the character of the smaller font.
The electronic device according to claim 1, wherein the processor is further configured to execute the advertisement picture authentication system to implement the following steps:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.
The electronic device according to claim 2, wherein the processor is further configured to execute the advertisement picture authentication system to implement the following steps:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.
The electronic device according to claim 3, wherein the preset matching scoring rule comprises:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library that is an advertisement word of a preset high-risk level, directly determining that the image to be analyzed is an advertisement picture;

If the segmented words of the image to be analyzed do not match any advertisement word of the preset high-risk level in the pre-established advertisement keyword library, then:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding first keyword matching score;

If a segmented word of the image to be analyzed matches a preset related word of an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding second keyword matching score, wherein the preset related words of an advertisement keyword include synonyms, synonymous words, and phrases related to the advertisement keyword, and/or deformed forms of the advertisement keyword obtained by reversing its characters or inserting intervals between them;

If a segmented word of the image to be analyzed matches the core part of an advertisement keyword in the pre-established advertisement keyword library, or a preset related word of the core part, assigning a corresponding third keyword matching score;

The first keyword matching score is greater than the second keyword matching score, and the second keyword matching score is greater than the third keyword matching score.
The electronic device according to claim 4, wherein the preset matching scoring rule comprises:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library that is an advertisement word of a preset high-risk level, directly determining that the image to be analyzed is an advertisement picture;

If the segmented words of the image to be analyzed do not match any advertisement word of the preset high-risk level in the pre-established advertisement keyword library, then:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding first keyword matching score;

If a segmented word of the image to be analyzed matches a preset related word of an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding second keyword matching score, wherein the preset related words of an advertisement keyword include synonyms, synonymous words, and phrases related to the advertisement keyword, and/or deformed forms of the advertisement keyword obtained by reversing its characters or inserting intervals between them;

If a segmented word of the image to be analyzed matches the core part of an advertisement keyword in the pre-established advertisement keyword library, or a preset related word of the core part, assigning a corresponding third keyword matching score;

The first keyword matching score is greater than the second keyword matching score, and the second keyword matching score is greater than the third keyword matching score.
The electronic device according to claim 5, wherein the determining, by the preset rule, whether the image to be analyzed is an advertisement image comprises:

Calculate the P value according to the following formula:

P=a1*P1+a2*P2+a3*P3

Wherein P1 is the font score corresponding to the font size of the matched segmented word in the picture to be analyzed, P2 is the color saliency score corresponding to the font color saliency of the matched segmented word in the picture to be analyzed, and P3 is the keyword matching score corresponding to the matched segmented word in the picture to be analyzed; a1, a2, and a3 are the weights set in advance for the font score P1, the color saliency score P2, and the keyword matching score P3, respectively;

The calculated P value is compared with a preset threshold. If the P value is greater than a preset threshold, it is determined that the to-be-analyzed picture is an advertisement picture.
The electronic device according to claim 6, wherein the determining, by the preset rule, whether the image to be analyzed is an advertisement image comprises:

Calculate the P value according to the following formula:

P=a1*P1+a2*P2+a3*P3

Wherein P1 is the font score corresponding to the font size of the matched segmented word in the picture to be analyzed, P2 is the color saliency score corresponding to the font color saliency of the matched segmented word in the picture to be analyzed, and P3 is the keyword matching score corresponding to the matched segmented word in the picture to be analyzed; a1, a2, and a3 are the weights set in advance for the font score P1, the color saliency score P2, and the keyword matching score P3, respectively;

The calculated P value is compared with a preset threshold. If the P value is greater than a preset threshold, it is determined that the to-be-analyzed picture is an advertisement picture.
An advertisement picture identification method, characterized in that the advertisement picture identification method comprises:

After receiving the picture to be analyzed, performing optical character recognition on the picture to be analyzed, and identifying the text in the picture to be analyzed;

Performing word segmentation on the recognized text;

Matching each segmented word against each advertisement keyword in the pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the pre-established advertisement keyword library, and assigning a corresponding keyword matching score according to the matching result and a preset matching scoring rule;

Identifying different font sizes of each character in the image to be analyzed, and assigning a corresponding font score according to a preset font score rule according to a font size of the matched word segment;

And determining, according to the keyword matching score and the font score, whether the image to be analyzed is an advertisement image by using a preset rule.
The method for identifying an advertisement image according to claim 9, wherein the identifying different font sizes of each character in the image to be analyzed includes:

Performing Gaussian blur processing on the image to be analyzed, drawing a peak distribution map of the image to be analyzed after Gaussian blur processing, and extracting peak distribution maps of different levels according to the step distribution; and identifying the characters in the peak distribution map of a preset level as larger-font characters, and the remaining characters in the image to be analyzed as smaller-font characters;

The preset font scoring rules include:

The corresponding font score is set according to the font size for each character in the image to be analyzed, wherein the font score corresponding to the character of the larger font is greater than the font score corresponding to the character of the smaller font.
The method for identifying an advertisement picture according to claim 9, wherein the method further comprises:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.
The method for identifying an advertisement picture according to claim 10, wherein the method further comprises:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.
The method for identifying an advertisement picture according to claim 11, wherein the preset matching scoring rule comprises:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library that is an advertisement word of a preset high-risk level, directly determining that the image to be analyzed is an advertisement picture;

If the segmented words of the image to be analyzed do not match any advertisement word of the preset high-risk level in the pre-established advertisement keyword library, then:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding first keyword matching score;

If a segmented word of the image to be analyzed matches a preset related word of an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding second keyword matching score, wherein the preset related words of an advertisement keyword include synonyms, synonymous words, and phrases related to the advertisement keyword, and/or deformed forms of the advertisement keyword obtained by reversing its characters or inserting intervals between them;

If a segmented word of the image to be analyzed matches the core part of an advertisement keyword in the pre-established advertisement keyword library, or a preset related word of the core part, assigning a corresponding third keyword matching score;

The first keyword matching score is greater than the second keyword matching score, and the second keyword matching score is greater than the third keyword matching score.
The method for identifying an advertisement picture according to claim 12, wherein the preset matching scoring rule comprises:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library that is an advertisement word of a preset high-risk level, directly determining that the image to be analyzed is an advertisement picture;

If the segmented words of the image to be analyzed do not match any advertisement word of the preset high-risk level in the pre-established advertisement keyword library, then:

If a segmented word of the image to be analyzed matches an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding first keyword matching score;

If a segmented word of the image to be analyzed matches a preset related word of an advertisement keyword in the pre-established advertisement keyword library, assigning a corresponding second keyword matching score, wherein the preset related words of an advertisement keyword include synonyms, synonymous words, and phrases related to the advertisement keyword, and/or deformed forms of the advertisement keyword obtained by reversing its characters or inserting intervals between them;

If a segmented word of the image to be analyzed matches the core part of an advertisement keyword in the pre-established advertisement keyword library, or a preset related word of the core part, assigning a corresponding third keyword matching score;

The first keyword matching score is greater than the second keyword matching score, and the second keyword matching score is greater than the third keyword matching score.
The method for identifying an advertisement image according to claim 13, wherein the determining, by the preset rule, whether the image to be analyzed is an advertisement image comprises:

Calculate the P value according to the following formula:

P=a1*P1+a2*P2+a3*P3

Wherein P1 is the font score corresponding to the font size of the matched segmented word in the picture to be analyzed, P2 is the color saliency score corresponding to the font color saliency of the matched segmented word in the picture to be analyzed, and P3 is the keyword matching score corresponding to the matched segmented word in the picture to be analyzed; a1, a2, and a3 are the weights set in advance for the font score P1, the color saliency score P2, and the keyword matching score P3, respectively;

The calculated P value is compared with a preset threshold. If the P value is greater than a preset threshold, it is determined that the to-be-analyzed picture is an advertisement picture.
The method for identifying an advertisement image according to claim 14, wherein the determining, by the preset rule, whether the image to be analyzed is an advertisement image comprises:

Calculate the P value according to the following formula:

P=a1*P1+a2*P2+a3*P3

Wherein P1 is the font score corresponding to the font size of the matched segmented word in the picture to be analyzed, P2 is the color saliency score corresponding to the font color saliency of the matched segmented word in the picture to be analyzed, and P3 is the keyword matching score corresponding to the matched segmented word in the picture to be analyzed; a1, a2, and a3 are the weights set in advance for the font score P1, the color saliency score P2, and the keyword matching score P3, respectively;

The calculated P value is compared with a preset threshold. If the P value is greater than a preset threshold, it is determined that the to-be-analyzed picture is an advertisement picture.
A computer readable storage medium, wherein the computer readable storage medium stores an advertisement picture identification system, and when the advertisement picture identification system is executed by a processor, the following steps are implemented:

After receiving the picture to be analyzed, performing optical character recognition on the picture to be analyzed, and identifying the text in the picture to be analyzed;

Performing word segmentation on the recognized text;

Matching each segmented word against each advertisement keyword in the pre-established advertisement keyword library to obtain the segmented words that match advertisement keywords in the pre-established advertisement keyword library, and assigning a corresponding keyword matching score according to the matching result and a preset matching scoring rule;

Identifying different font sizes of each character in the image to be analyzed, and assigning a corresponding font score according to a preset font score rule according to a font size of the matched word segment;

And determining, according to the keyword matching score and the font score, whether the image to be analyzed is an advertisement image by using a preset rule.
The computer readable storage medium according to claim 17, wherein the identifying different font sizes of each of the characters in the image to be analyzed comprises:

Performing Gaussian blur processing on the image to be analyzed, drawing a peak distribution map of the image to be analyzed after Gaussian blur processing, and extracting peak distribution maps of different levels according to the step distribution; and identifying the characters in the peak distribution map of a preset level as larger-font characters, and the remaining characters in the image to be analyzed as smaller-font characters;

The preset font scoring rules include:

The corresponding font score is set according to the font size for each character in the image to be analyzed, wherein the font score corresponding to the character of the larger font is greater than the font score corresponding to the character of the smaller font.
The computer readable storage medium of claim 17 further comprising:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.
The computer readable storage medium of claim 18, wherein the method further comprises:

Calculating a font color saliency of each character recognized by optical character recognition in the image to be analyzed;

Recognizing a character whose font color saliency is greater than a preset color saliency threshold as a character with high color saliency, and recognizing a character whose font color saliency is less than or equal to the preset color saliency threshold as a character with low color saliency;

Setting a corresponding color saliency score for each character in the image to be analyzed according to its font color saliency, wherein the color saliency score corresponding to a character with high color saliency is greater than the color saliency score corresponding to a character with low color saliency.