CN114881698A - Advertisement compliance auditing method and device, electronic equipment and storage medium

Info

Publication number
CN114881698A
Authority
CN
China
Prior art keywords
picture
advertisement
brand
pictures
enhanced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210542100.1A
Other languages
Chinese (zh)
Inventor
王帅峰
乔建秀
朱运
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210542100.1A
Publication of CN114881698A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1463 Orientation detection or correction, e.g. rotation of multiples of 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/147 Determination of region of interest
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/162 Quantising the image signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/168 Smoothing or thinning of the pattern; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques

Abstract

The invention relates to artificial intelligence technology and discloses an advertisement compliance auditing method, which comprises the following steps: obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture; classifying the enhanced pictures into brand pictures and sensitive information pictures; extracting text information from the sensitive information pictures, and extracting sensitive information violation content from the text information by using a preset auditing model; and retrieving brand violation content from the brand pictures one by one, and collecting the sensitive information violation content and the brand violation content as the compliance auditing result of the advertisement picture. In addition, the invention also relates to blockchain technology, and the data list can be stored in a node of a blockchain. The invention also provides an advertisement compliance auditing device, electronic equipment and a storage medium. The invention can improve the efficiency of advertisement compliance auditing.

Description

Advertisement compliance auditing method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an advertisement compliance auditing method, an advertisement compliance auditing device, electronic equipment and a computer-readable storage medium.
Background
With the development of the internet, the speed at which people obtain information online and the richness of that information continue to increase. However, the wide spread of harmful information on the internet, especially low-quality content in commercial advertisements, has a negative social impact. Under the dual pressure of risk control and user experience, strengthening the review of internet advertisement content is something every company must do.
As the formats and delivery channels of brand advertisements multiply, manual review becomes time-consuming, labor-intensive and difficult. Auditors have to memorize what counts as illegal advertisement content in order to review it, but the violation rules are numerous and complicated, and auditors often lack thorough knowledge of advertising law and sensitive information. In addition, properties such as margins, spacing, font type and font color are difficult to judge by eye. All of these lead to low efficiency in advertisement compliance auditing, so how to improve that efficiency has become a problem to be solved urgently.
Disclosure of Invention
The invention provides an advertisement compliance auditing method, an advertisement compliance auditing device and a computer readable storage medium, and mainly aims to solve the problem of low efficiency in advertisement compliance auditing.
In order to achieve the above object, the present invention provides an advertisement compliance auditing method, which includes:
obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture;
classifying the enhanced pictures into brand pictures and sensitive information pictures;
extracting text information of the sensitive information picture, and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and searching the illegal brand contents of the brand pictures one by one, and collecting the illegal sensitive information contents and the illegal brand contents as the compliance audit results of the advertisement pictures.
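As an illustration only, the four steps above can be sketched as a small pipeline. The function name `audit`, the `AuditResult` type and the label strings `"brand"` and `"sensitive"` are assumptions introduced for this sketch and do not come from the embodiment; the concrete enhancement, classification, text extraction and review logic is passed in as callables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditResult:
    """Hypothetical container for the compliance auditing result."""
    sensitive_violations: List[str] = field(default_factory=list)
    brand_violations: List[str] = field(default_factory=list)


def audit(pictures, enhance, classify, extract_text, review_text,
          find_brand_violations):
    """Run the four steps in order; every callable is supplied by the caller."""
    # Step 1: picture enhancement.
    enhanced = [enhance(p) for p in pictures]

    # Step 2: classification; pictures with any other label are dropped.
    brand, sensitive = [], []
    for pic in enhanced:
        label = classify(pic)
        if label == "brand":
            brand.append(pic)
        elif label == "sensitive":
            sensitive.append(pic)

    result = AuditResult()
    # Step 3: text extraction followed by the preset auditing model.
    for pic in sensitive:
        result.sensitive_violations.extend(review_text(extract_text(pic)))
    # Step 4: brand violation retrieval, then collect both result lists.
    result.brand_violations.extend(find_brand_violations(brand))
    return result
```

Passing the steps in as callables keeps the sketch independent of any particular model or image library.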
Optionally, the performing picture enhancement processing on the advertisement picture to obtain an enhanced picture includes:
uniformly cutting the advertisement pictures to obtain a plurality of advertisement picture blocks;
performing pixel convolution on each advertisement picture block to obtain a plurality of convolution advertisement picture blocks;
respectively carrying out Gaussian smoothing treatment on each convolution advertisement picture block to obtain a plurality of smooth advertisement picture blocks;
and splicing the smooth advertisement picture blocks to obtain an enhanced picture of the advertisement picture.
Optionally, the performing picture enhancement processing on the advertisement picture to obtain an enhanced picture includes:
counting black point pixel values of horizontal projection of the advertisement picture, and selecting a region with the maximum black point pixel value in the horizontal projection as a target region;
calculating the variance of the black pixel values of the target area;
rotating the target area according to a preset angle, and calculating the variance of the black pixel values of the horizontal projection image of the rotated target area to obtain a rotation variance;
calculating to obtain an optimal inclination angle according to the difference between the variance of the black pixel value of the target area and the rotation variance;
and rotating the advertisement picture by utilizing the optimal inclination angle to obtain an enhanced picture.
Optionally, the classifying the enhanced picture into a brand picture and a sensitive information picture includes:
extracting a characteristic region in the enhanced picture, and determining the picture characteristic of the enhanced picture according to the characteristic region;
and classifying the enhanced pictures according to the picture characteristics to obtain brand pictures and sensitive information pictures.
Optionally, the extracting a feature region in the enhanced picture, and determining a picture characteristic of the enhanced picture by using the feature region includes:
dividing the enhanced picture into a plurality of enhanced picture blocks according to a preset proportion;
selecting one enhancement picture block from the plurality of enhancement picture blocks one by one as a target enhancement picture block;
generating global features of the target enhancement picture block according to the pixel gradient in the target enhancement picture block;
performing frame selection on the regions in the target enhanced picture block one by using a preset sliding window to obtain a pixel window;
generating local features of the target enhancement picture block according to the pixel values in each pixel window;
and collecting the global features and the local features as the picture characteristics of the target enhanced picture block.
Optionally, the extracting the text information of the sensitive information picture includes:
acquiring projection information of the sensitive information picture, and performing layout analysis on the projection information to obtain a layout analysis image;
performing line character segmentation on all lines of the layout analysis image one by one to obtain line texts;
and acquiring column segmentation characters of each line text, and identifying the column segmentation characters by using a pre-trained character identification model to obtain text information of the sensitive information picture.
Optionally, the retrieving brand violation content of the brand pictures one by one includes:
extracting the features of the brand pictures to obtain brand picture features;
coding the brand picture features one by one to obtain brand picture codes;
selecting one of the brand picture codes from the brand picture codes one by one to serve as a target brand picture code, calculating the Hamming distance between the target brand picture code and the unselected brand picture codes one by one, and selecting the brand picture with the Hamming distance smaller than a preset distance threshold value as the illegal content of the brand.
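The coding and Hamming distance comparison can be illustrated with a minimal sketch. The text does not specify how the brand picture features are coded, so the mean-threshold binary code used here (similar to an average hash) is an assumption, as are the function names and the threshold value in the test.

```python
def binary_code(feature_map):
    """Assumed coding: 1 where a feature value exceeds the mean, else 0."""
    flat = [v for row in feature_map for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def close_pairs(codes, threshold):
    """Index pairs whose Hamming distance is smaller than the threshold."""
    return [(i, j)
            for i in range(len(codes))
            for j in range(i + 1, len(codes))
            if hamming(codes[i], codes[j]) < threshold]
```

Each code is compared with every other code exactly once, which corresponds to selecting a target code and measuring its distance to the codes not yet selected.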
In order to solve the above problem, the present invention also provides an advertisement compliance auditing apparatus, including:
the picture enhancement module is used for acquiring an advertisement picture and carrying out picture enhancement processing on the advertisement picture to obtain an enhanced picture;
the picture classification module is used for classifying the enhanced pictures into brand pictures and sensitive information pictures;
the sensitive information module is used for extracting text information of the sensitive information picture and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and the brand violation module is used for retrieving the brand violation contents of the brand pictures one by one and collecting the sensitive information violation contents and the brand violation contents into a compliance auditing result of the advertisement pictures.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the advertisement compliance auditing method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the advertisement compliance auditing method described above.
Massive amounts of content are published on the internet every day, so the number of advertisement pictures is huge, and image classification faces challenges such as changes in viewing angle, illumination conditions, shape and size variation, occlusion, background interference and intra-class differences. In the embodiments of the invention, performing enhancement processing on the advertisement pictures makes it possible to distinguish the different types of target advertisement pictures effectively; extracting text information from the sensitive information pictures and then filtering risky content with a preset auditing model is the preferred approach for sensitive information; and retrieving brand violation content according to the features of the brand pictures enables automated, pipelined brand auditing.
Drawings
Fig. 1 is a schematic flow chart of an advertisement compliance auditing method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of picture enhancement according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of sensitive information extraction according to an embodiment of the present invention;
Fig. 4 is a functional block diagram of an advertisement compliance auditing device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device for implementing the advertisement compliance auditing method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides an advertisement compliance auditing method. The execution subject of the advertisement compliance auditing method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the application. In other words, the advertisement compliance auditing method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flow chart of an advertisement compliance auditing method according to an embodiment of the present invention. In this embodiment, the advertisement compliance auditing method includes:
S1, obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture;
In the embodiment of the invention, the advertisement picture can be obtained in several ways: it can be retrieved from major websites by a picture collector and stored in a specified database; it can be obtained by scanning a specified paper picture; or it can be obtained by accessing the user's photo album after the user grants authorization.
In the embodiment of the invention, the obtained advertisement picture may contain a large number of noise pixels or other interference information. To improve the accuracy of the final recognition, image enhancement can be performed on the advertisement picture, where the image enhancement includes processing such as noise pixel elimination, binarization, texture enhancement and inclination correction.
In this embodiment of the present invention, the performing picture enhancement processing on the advertisement picture to obtain an enhanced picture includes:
S21, uniformly cutting the advertisement picture to obtain a plurality of advertisement picture blocks;
S22, performing pixel convolution on each advertisement picture block to obtain a plurality of convolution advertisement picture blocks;
S23, respectively carrying out Gaussian smoothing processing on each convolution advertisement picture block to obtain a plurality of smooth advertisement picture blocks;
S24, splicing the smooth advertisement picture blocks to obtain the enhanced picture of the advertisement picture.
In the embodiment of the invention, the advertisement picture is uniformly cut into a plurality of advertisement picture blocks, which is beneficial to reducing the number of pixels in each advertisement picture block, thereby improving the efficiency of eliminating noise pixels of the advertisement picture.
Specifically, in the embodiment of the present invention, a Gabor filter is used to perform pixel convolution on the advertisement picture blocks. The Gabor filter convolves each advertisement picture block according to a preset number of orientations and a preset number of scales, so that pixels meeting the preset standard pass through and pixels that do not match the filter are suppressed.
In the embodiment of the invention, a Gaussian kernel function is used to perform Gaussian smoothing on the convolution advertisement picture blocks to obtain a plurality of smooth advertisement picture blocks. The Gaussian kernel function, also called a radial basis function, is a commonly used smoothing kernel. According to this embodiment, its rotational symmetry allows the finite-dimensional data (namely the pixel values) to be mapped smoothly into a high-dimensional space, thereby realizing the Gaussian smoothing of the convolution advertisement picture blocks.
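Steps S21 to S24 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the kernel size, wavelength, sigma values, the two orientations and the rule of keeping the strongest absolute response across orientations are all chosen here for illustration, and the function names are hypothetical.

```python
import math


def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0, gamma=0.5):
    """Real part of a Gabor kernel at orientation theta (illustrative defaults)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gaussian_kernel(size, sigma=1.0):
    """Normalised 2D Gaussian kernel (weights sum to 1)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)] for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(block, kernel):
    """Same-size 2D filtering with replicated edges (kernels here are symmetric)."""
    h, w, half = len(block), len(block[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, kv in enumerate(krow):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += block[yy][xx] * kv
            out[y][x] = acc
    return out


def split_blocks(image, block):
    """S21: cut the picture into a grid of block x block tiles."""
    h, w = len(image), len(image[0])
    return [[[row[x:x + block] for row in image[y:y + block]]
             for x in range(0, w, block)] for y in range(0, h, block)]


def stitch(grid):
    """S24: splice a grid of tiles back into one picture."""
    image = []
    for block_row in grid:
        for r in range(len(block_row[0])):
            image.append([v for blk in block_row for v in blk[r]])
    return image


def enhance(image, block=4, thetas=(0.0, math.pi / 2)):
    smooth = gaussian_kernel(3)
    grid = []
    for block_row in split_blocks(image, block):
        out_row = []
        for blk in block_row:
            # S22: Gabor filtering at each assumed orientation.
            responses = [convolve(blk, gabor_kernel(3, t)) for t in thetas]
            combined = [[max(abs(r[y][x]) for r in responses)
                         for x in range(len(blk[0]))] for y in range(len(blk))]
            # S23: Gaussian smoothing of the filtered tile.
            out_row.append(convolve(combined, smooth))
        grid.append(out_row)
    return stitch(grid)
```

Processing each tile independently keeps the work per call small, which matches the stated motivation for cutting the picture first; a production system would more likely rely on an image library than on nested Python loops.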
In this embodiment of the present invention, the performing picture enhancement processing on the advertisement picture to obtain an enhanced picture includes: counting black point pixel values of horizontal projection of the advertisement picture, and selecting a region with the maximum black point pixel value in the horizontal projection as a target region; calculating the variance of the black pixel values of the target area; rotating the target area according to a preset angle, and calculating the variance of the black pixel values of the horizontal projection image of the rotated target area to obtain a rotation variance; calculating to obtain an optimal inclination angle according to the difference between the variance of the black pixel value of the target area and the rotation variance; and rotating the advertisement picture by utilizing the optimal inclination angle to obtain an enhanced picture.
Specifically, the pixel is the smallest image unit, and a picture is composed of many pixels. For example, for a picture of 500 × 338 pixels, the color of each pixel is represented by three RGB values, so the pixel matrix corresponds to three color matrices: an R matrix (500 × 338), a G matrix (500 × 338) and a B matrix (500 × 338). If the values in the first row and first column of the three matrices are R: 240, G: 223 and B: 204, then the color of this pixel is (240, 223, 204).
In detail, the black pixel value of the horizontal projection of the advertisement picture can be obtained by using image binarization processing. The image binarization processing is a process of setting the gray value of a pixel point on an image to be 0 or 255, namely, enabling the whole image to have an obvious black-and-white effect. Each pixel of the binary image has only two values: either pure black or pure white.
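A minimal sketch of binarization and of counting black points per row is given below. The threshold of 128 and the function names are assumptions for illustration; the text does not state how the threshold is chosen.

```python
def binarize(gray, threshold=128):
    """Set every grayscale value to 0 (black) or 255 (white)."""
    return [[0 if v < threshold else 255 for v in row] for row in gray]


def black_points_per_row(binary):
    """Horizontal projection: number of black pixels in each row."""
    return [sum(1 for v in row if v == 0) for row in binary]
```

For example, the grayscale rows `[10, 200, 90]`, `[250, 250, 250]` and `[0, 0, 130]` binarize to `[0, 255, 0]`, `[255, 255, 255]` and `[0, 0, 255]`, giving the projection `[2, 0, 2]`.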
In detail, selecting the region with the maximum black point pixel value in the horizontal projection as the target region may consist of randomly dividing a binarized picture into 5 blocks, selecting the block with the largest number of black points as the target region, and counting the number of black points in each row of the target region. Because binary image data is simple, many vision algorithms rely on binary images, through which the shape and contour of an object can be analyzed more easily.
Specifically, the variance of the black pixel values of the target region represents the contrast between light and dark in the image: the larger the variance, the more pronounced the contrast.
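The inclination correction can be illustrated with a projection-variance search. The sketch below makes a simplifying assumption: rather than reproducing the exact comparison between the target-region variance and the rotation variance, it tries a range of candidate angles and keeps the one whose row profile has the largest variance, since text lines that are level produce sharp peaks in the horizontal projection. The angle range, step and function names are hypothetical.

```python
import math


def row_profile_variance(points, angle_deg):
    """Variance of black-pixel counts per row after rotating by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    counts = {}
    for x, y in points:
        row = round(x * sin_a + y * cos_a)  # y coordinate after rotation
        counts[row] = counts.get(row, 0) + 1
    lo, hi = min(counts), max(counts)
    profile = [counts.get(r, 0) for r in range(lo, hi + 1)]
    mean = sum(profile) / len(profile)
    return sum((c - mean) ** 2 for c in profile) / len(profile)


def estimate_correction_angle(points, max_angle=10.0, step=0.5):
    """Candidate rotation (degrees) that maximises the row-profile variance."""
    n = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n + 1)]
    return max(candidates, key=lambda ang: row_profile_variance(points, ang))
```

Here `points` is the list of (x, y) coordinates of the black pixels in the target region; the returned angle is the rotation to apply to the whole advertisement picture.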
S2, classifying the enhanced pictures into brand pictures and sensitive information pictures;
In the embodiment of the invention, the enhanced pictures are classified because the whole system is intended for auditing brand compliance content; if irrelevant pictures (people, buildings, landscapes, expressions) went through the same operations, the efficiency of the system would suffer considerably. Therefore, before the enhanced pictures are analyzed, they need to be classified: the brand pictures and the sensitive information pictures are detected, and the other irrelevant pictures are filtered out to improve auditing efficiency.
In an embodiment of the present invention, the classifying the enhanced picture into a brand picture and a sensitive information picture includes: extracting a characteristic region in the enhanced picture, and determining the picture characteristic of the enhanced picture according to the characteristic region; and classifying the enhanced pictures according to the picture characteristics to obtain brand pictures and sensitive information pictures.
In detail, the extracting a feature region in the enhanced picture, and determining a picture characteristic of the enhanced picture by using the feature region includes: dividing the enhanced picture into a plurality of enhanced picture blocks according to a preset proportion; selecting one enhancement picture block from the plurality of enhancement picture blocks one by one as a target enhancement picture block; generating global features of the target enhancement picture block according to the pixel gradient in the target enhancement picture block; performing frame selection on the regions in the target enhanced picture block one by using a preset sliding window to obtain a pixel window; generating local features of the target enhancement picture block according to the pixel values in each pixel window; and collecting the global features and the local features as the picture characteristics of the target enhanced picture block.
In detail, the enhanced picture contains a large amount of pixel information, but not every pixel carries key information. The enhanced picture may therefore be divided according to a preset proportion into a plurality of enhanced picture blocks, so that each block can be analyzed accurately in subsequent steps.
Specifically, an image frame may be generated according to the preset size, and then the generated image frame is used to perform non-repetitive framing in the enhanced picture to obtain a plurality of enhanced picture blocks.
For example, if the length of the enhanced picture is 10cm and the width of the enhanced picture is 10cm, and the length of the image frame generated according to the preset size is 2cm and the width of the image frame is 2cm, 25 enhanced picture blocks with the length of 2cm and the width of 2cm can be obtained by using the image frame to perform frame selection in the enhanced picture.
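The arithmetic of this example can be checked with a short sketch that lists the non-overlapping frame positions. The function name and the (left, top, right, bottom) box layout are assumptions for illustration; units are whatever the picture is measured in.

```python
def frame_boxes(width, height, frame_w, frame_h):
    """Non-overlapping (left, top, right, bottom) boxes covering the picture."""
    return [(x, y, min(x + frame_w, width), min(y + frame_h, height))
            for y in range(0, height, frame_h)
            for x in range(0, width, frame_w)]
```

For a 10 × 10 picture and a 2 × 2 frame, `frame_boxes(10, 10, 2, 2)` yields 5 × 5 = 25 boxes, the first being `(0, 0, 2, 2)` and the last `(8, 8, 10, 10)`.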
Further, in order to perform a targeted analysis on each enhanced picture block in the enhanced picture, the image features corresponding to each enhanced picture block in the plurality of enhanced picture blocks may be extracted respectively.
In detail, the image features include global and local features of each enhanced picture block.
In one embodiment of the present invention, the global features of the target enhanced picture block may be generated by using HOG (Histogram of Oriented Gradient), DPM (Deformable Part Model), LBP (Local Binary Patterns), and the like, or may be extracted by using a pre-trained artificial intelligence Model with a specific image feature extraction function, where the artificial intelligence Model includes, but is not limited to, VGG-net Model and U-net Model.
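To make the distinction between global and local features concrete, the following sketch computes a simplified gradient-orientation histogram (in the spirit of HOG, but without cells or block normalisation) as the global feature, and the mean pixel value of each sliding window as the local features. The bin count, window size, stride and function names are assumptions for illustration.

```python
import math


def gradient_histogram(block, bins=8):
    """Global feature: magnitude-weighted histogram of gradient orientations."""
    hist = [0.0] * bins
    h, w = len(block), len(block[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = block[y][x + 1] - block[y][x - 1]
            gy = block[y + 1][x] - block[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def window_means(block, size=2, stride=2):
    """Local features: mean pixel value inside each sliding window."""
    h, w = len(block), len(block[0])
    feats = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            vals = [block[yy][xx]
                    for yy in range(y, y + size) for xx in range(x, x + size)]
            feats.append(sum(vals) / len(vals))
    return feats


def picture_characteristics(block):
    """Collect the global and local features into one feature vector."""
    return gradient_histogram(block) + window_means(block)
```

A block containing a single vertical edge puts all of its gradient energy into the first orientation bin, while the window means record where the dark and bright halves lie.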
In detail, the picture characteristics may include visual features, statistical features, algebraic features, transform coefficient features, and other physical features. The visual features refer to the sensory features of human vision on the target, and comprise colors, edges, contours, regional textures, shapes and the like; the statistical characteristics refer to uniqueness expression obtained by performing statistical calculation on related samples in the image, and the uniqueness expression comprises a color histogram, a gray level histogram, invariant moment and the like; the algebraic characteristics refer to the algebraic relation of image contents, such as singular value decomposition, principal component analysis and the like of the image; the transform coefficient characteristics refer to frequency domain coefficients obtained by performing spatial domain and frequency domain conversion on an image, such as Fourier transform coefficients and wavelet transform coefficients.
Specifically, the core of classifying the enhanced picture according to the picture characteristics is the task of assigning a picture a label from a given classification set, for example a color label. A color label is based on a color histogram, which counts how the pixels of an image are distributed over different quantization intervals of color. It is easy to compute and, because the histogram contains no coordinate information, it carries no spatial information about where colors occur. This makes it robust to rotation, deformation and partial occlusion of the target, and well suited to applications where the target differs markedly in color from the background.
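A quantised color histogram can be sketched as follows. The number of levels per channel (4, giving 64 bins) and the function name are assumptions for illustration; `levels` should divide 256 evenly.

```python
def color_histogram(pixels, levels=4):
    """Count pixels per quantised (R, G, B) bin, with `levels` bins per channel."""
    step = 256 // levels
    hist = [0] * (levels ** 3)
    for r, g, b in pixels:
        idx = (r // step) * levels * levels + (g // step) * levels + (b // step)
        hist[idx] += 1
    return hist
```

Because only counts are kept, reordering the pixels (as happens under rotation) leaves the histogram unchanged, which is the robustness property described above.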
S3, extracting the text information of the sensitive information picture, and extracting the violation content of the sensitive information in the text information by using a preset auditing model;
In the embodiment of the invention, character detection is a very important step in extracting the text information of the sensitive information picture. Its main aim is to locate the character regions in the picture; only once the text regions have been found can the text information of the sensitive information picture be extracted from them.
In this embodiment of the present invention, the extracting the text information of the sensitive information picture includes:
S31, acquiring projection information of the sensitive information picture, and performing layout analysis on the projection information to obtain a layout analysis image;
S32, performing line character segmentation on all lines of the layout analysis image one by one to obtain line texts;
S33, acquiring column segmentation characters of each line text, and recognizing the column segmentation characters by using a pre-trained character recognition model to obtain text information of the sensitive information picture.
Specifically, the projection information of the sensitive information picture is the projection of the image in a given direction: a straight line (axis) is taken in that direction, the black pixels of the image along each line perpendicular to the axis are counted, and the count is taken as the value at that position on the axis. For example, horizontal projection counts the black pixels in each row of the picture (i.e. counts along the horizontal direction) and plots the counts as a profile, from which the starting point and ending point of each text line can be determined. Vertical projection is similar, except that the projection direction is downward, i.e. the black pixels in each column are counted.
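The projections described above can be sketched as follows for a binarized picture in which 1 denotes a black pixel. The helper names and the toy `page` array are assumptions made for illustration; the same `runs` helper is reused for the line segmentation of S32 and the column segmentation of S33.

```python
def horizontal_projection(binary):
    """Number of black pixels (value 1) in each row."""
    return [sum(row) for row in binary]

def vertical_projection(binary):
    """Number of black pixels (value 1) in each column."""
    return [sum(col) for col in zip(*binary)]

def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero values."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

# Toy binarized picture with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
]

line_spans = runs(horizontal_projection(page))      # rows of each text line
first_line = page[line_spans[0][0]:line_spans[0][1]]
char_spans = runs(vertical_projection(first_line))  # columns of each character
```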
In detail, layout analysis refers to analyzing the block structure of a text image so that the text information of the sensitive information picture can be extracted. Layout analysis is particularly important because recognition accuracy depends on recovering the correct character sequence. A few layout rules play an important role here. A typical text image has features such as the following: the line spacing within a paragraph is smaller than the spacing between paragraphs; a paragraph begins with blank space and ends with blank space; image regions have a higher proportion of black pixels than text regions; and text is typeset horizontally or vertically. Simple layout analysis can be achieved using these rules. One simple method is top-down: the image is divided into blocks at the blank areas, and each block is divided again at its blank areas until it cannot be subdivided further. Finally, whether each block is text or image is determined according to the ratio of black to white pixels in the block.
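The top-down method above resembles what is commonly called a recursive X-Y cut. The sketch below is one possible reading, assuming a binarized image (1 = black) and treating any fully blank row or column as a split point; a practical system would likely require a minimum gap width, which this disclosure does not specify. All names are hypothetical.

```python
def nonblank_runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero profile values."""
    spans, start = [], None
    for i, value in enumerate(list(profile) + [0]):  # trailing 0 closes the last run
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    return spans

def xy_cut(img, top, bottom, left, right, by_rows=True, tried_other=False):
    """Split region [top:bottom, left:right] at blank rows/columns until indivisible."""
    if by_rows:
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
        spans = [(top + s, top + e, left, right) for s, e in nonblank_runs(profile)]
    else:
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
        spans = [(top, bottom, left + s, left + e) for s, e in nonblank_runs(profile)]
    if spans == [(top, bottom, left, right)]:
        if tried_other:
            return spans  # no blank gap in either direction: a final block
        return xy_cut(img, top, bottom, left, right, not by_rows, True)
    blocks = []
    for t, b, l, r in spans:
        blocks.extend(xy_cut(img, t, b, l, r, not by_rows, False))
    return blocks

def black_ratio(img, block):
    """Fraction of black pixels in a block, usable for a text-versus-image decision."""
    t, b, l, r = block
    return sum(sum(img[y][l:r]) for y in range(t, b)) / ((b - t) * (r - l))

page = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
blocks = xy_cut(page, 0, len(page), 0, len(page[0]))
```

Each block is returned as `(top, bottom, left, right)`. The threshold on `black_ratio` that separates text from image is left open here because the source gives no value.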
Specifically, the pre-trained character recognition model operates as follows. Characters are first cut by a general segmentation method to obtain a set of candidate cut characters. The sizes of the majority of characters in this set are counted to obtain a standard size, and standard characters are selected according to the standard size, cut out, and stored as segmented characters. The original positions of the segmented characters are then painted white, and the remaining picture is eroded to obtain the sticky (touching) characters. The sticky characters are cut by the general segmentation method to obtain a complete character set, and the sensitive information picture is recognized according to the complete character set to obtain the text information of the sensitive information picture.
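The standard-size step above can be illustrated with a small sketch. It assumes that each candidate cut is represented as an `(x, width)` pair, that the standard size is the most frequent width, and that a candidate more than 25% wider than the standard is treated as sticky; none of these choices is stated in the source, and the erosion and recognition steps are not shown.

```python
from collections import Counter

def sort_candidates(boxes, tolerance=0.25):
    """Split candidate cuts (x, width) into standard-size and over-wide (sticky) ones."""
    # Standard size: the width shared by most candidates (an assumed definition).
    standard_width = Counter(w for _, w in boxes).most_common(1)[0][0]
    limit = standard_width * (1 + tolerance)
    standard = [box for box in boxes if box[1] <= limit]
    sticky = [box for box in boxes if box[1] > limit]
    return standard_width, standard, sticky

# Hypothetical candidate cuts on one text line; the 21-pixel box holds two touching characters.
boxes = [(0, 10), (12, 10), (24, 21), (47, 10), (59, 11)]
standard_width, standard, sticky = sort_candidates(boxes)
```

The boxes in `sticky` would then be cut again by the general segmentation method, as described above.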
S4, retrieving the brand violation content of the brand pictures one by one, and collecting the sensitive information violation content and the brand violation content as the compliance audit result of the advertisement picture.
In the embodiment of the invention, image retrieval technology can be used to reduce auditing difficulty and to achieve automated, pipelined auditing of brand advertisements. Properties that are difficult to judge manually, such as margins, spacing, font type, and font color, can be audited easily and accurately by image processing and analysis. When brand fonts are audited, the font type of each individual character is obtained by image retrieval, and the final brand font is determined by a voting mechanism, which improves auditing efficiency.
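The voting mechanism mentioned above is not detailed in the source; a plain majority vote over the per-character font predictions is one plausible reading. The function name and the sample predictions below are hypothetical.

```python
from collections import Counter

def vote_brand_font(per_character_fonts):
    """Return the most frequent per-character font and its share of the votes."""
    font, votes = Counter(per_character_fonts).most_common(1)[0]
    return font, votes / len(per_character_fonts)

# Hypothetical font types retrieved for the four characters of a brand name.
predictions = ["Arial", "Arial", "Verdana", "Arial"]
brand_font, support = vote_brand_font(predictions)
```

Returning the vote share alongside the winner lets a caller flag low-agreement cases for manual review, which is a design choice of this sketch rather than something the source requires.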
In an embodiment of the present invention, the retrieving the brand violation content of the brand pictures one by one includes: extracting features of the brand pictures to obtain brand picture features; encoding the brand picture features one by one to obtain brand picture codes; and taking the brand picture codes one by one as a target brand picture code, calculating one by one the Hamming distance between the target brand picture code and each unselected brand picture code, and selecting the brand pictures whose Hamming distance is smaller than a preset distance threshold as the brand violation content.
Specifically, feature extraction refers to using a computer to extract image information and to decide whether each point of an image belongs to an image feature. The result of feature extraction is a division of the points of the image into different subsets, which are typically isolated points, continuous curves, or continuous regions. The feature extraction includes extracting the font type of the brand advertisement. Font types are either serif or sans-serif: for example, Georgia and Times New Roman are commonly used serif fonts, while Arial, Tahoma, and Verdana are sans-serif fonts. Chinese fonts fall into the same two broad categories: Song and Ming typefaces (the latter commonly used for Traditional Chinese) are serif, while Hei (boldface) and YouYuan (rounded) typefaces are sans-serif. Font types are also divided into monospaced and proportional fonts.
In detail, when an image is viewed, what is usually seen are connected regions of similar texture and gray level, which combine to form objects. If an object is small or has low contrast, it is usually examined at a higher resolution; if an object is large or has high contrast, a lower resolution is sufficient. If objects of different sizes, or of both strong and weak contrast, are present at the same time, extracting the features of the image is advantageous for studying it. Common feature extraction methods include the Fourier transform, the windowed Fourier transform (Gabor), the wavelet transform, the least squares method, the boundary direction histogram method, and texture feature extraction based on Tamura texture features.
Specifically, the brand picture features are encoded one by one using hash coding. Encoding requires a set of hash functions, and this set is obtained by learning, so hash coding splits into two sub-stages: a hash function learning stage and a formal hash coding stage. In the hash function learning stage, the compliance feature library is divided into a training set and a test set, and the constructed hash function set is trained on the training set. In the formal hash coding stage, the original brand picture features are substituted into the trained hash function set to obtain the corresponding hash codes.
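The source does not state what form the learned hash functions take. As a deliberately simple stand-in, the sketch below "learns" one threshold per feature dimension (the mean over the training set) and then encodes a feature vector as one bit per dimension; real systems typically learn richer functions. The function names and sample data are assumptions.

```python
def learn_hash_functions(training_features):
    """Learning stage: one threshold per dimension, here the training-set mean."""
    count = len(training_features)
    dims = len(training_features[0])
    return [sum(f[d] for f in training_features) / count for d in range(dims)]

def hash_encode(feature, thresholds):
    """Coding stage: bit d is 1 when the feature exceeds threshold d."""
    return "".join("1" if v > t else "0" for v, t in zip(feature, thresholds))

# Hypothetical training split of a compliance feature library (3-dimensional features).
train = [[0.0, 4.0, 1.0], [2.0, 0.0, 1.0], [4.0, 2.0, 4.0]]
thresholds = learn_hash_functions(train)

code = hash_encode([3.0, 1.0, 5.0], thresholds)
```

The resulting fixed-length binary codes are what makes the Hamming-distance comparison in the retrieval step cheap.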
Specifically, the Hamming distance is the number of positions at which the corresponding bits of two words of the same length differ; d(x, y) denotes the Hamming distance between two words x and y. It can be computed by performing an exclusive-OR operation on the two strings and counting the number of 1s in the result. Equivalently, the Hamming distance between two equal-length strings s1 and s2 is the minimum number of substitutions required to change one into the other. For example, the Hamming distance between the strings "1111" and "1001" is 2.
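The XOR-and-count computation, together with the threshold selection used in the retrieval step, can be sketched as follows. Codes are assumed to be equal-length bit strings; the dictionary of picture codes and the threshold value are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings via XOR and bit count."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return bin(int(a, 2) ^ int(b, 2)).count("1")

def near_matches(target_code, other_codes, threshold):
    """Names of pictures whose code lies within the distance threshold of the target."""
    return [name for name, code in other_codes.items()
            if hamming(target_code, code) < threshold]

distance = hamming("1111", "1001")  # the example from the text

# Hypothetical brand picture codes compared against a target code.
codes = {"picture_a": "1101", "picture_b": "0000", "picture_c": "1111"}
matches = near_matches("1111", codes, threshold=2)
```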
A massive amount of content is released on the Internet every day, and the number of advertisement pictures is huge. Image classification faces challenges such as changes in viewing angle, illumination conditions, shape and size changes, occlusion, background interference, and intra-class variation. Performing enhancement processing on the advertisement pictures makes it possible to distinguish effectively between different types of target advertisement pictures. Extracting text information from the sensitive information pictures and filtering risky content with a preset auditing model is the preferred approach, and retrieving the brand violation content of the brand pictures according to the brand picture features achieves automated, pipelined brand auditing.
Fig. 4 is a functional block diagram of an advertisement compliance auditing apparatus according to an embodiment of the present invention.
The advertisement compliance audit device 100 of the present invention may be installed in an electronic device. According to the realized functions, the advertisement compliance auditing device 100 may include a picture enhancement module 101, a picture classification module 102, a sensitive information module 103, and a brand violation module 104. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the picture enhancement module is used for acquiring an advertisement picture and carrying out picture enhancement processing on the advertisement picture to obtain an enhanced picture;
the picture classification module is used for classifying the enhanced pictures into brand pictures and sensitive information pictures;
the sensitive information module is used for extracting text information of the sensitive information picture and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and the brand violation module is used for retrieving the brand violation contents of the brand pictures one by one and collecting the sensitive information violation contents and the brand violation contents into a compliance auditing result of the advertisement pictures.
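The way the four modules fit together can be sketched as a small pipeline. This is an architectural illustration only: the class name, the callables injected for each module, and the `"brand"`/`"sensitive"` labels are assumptions, and the stub functions in the usage example stand in for the real enhancement, classification, and auditing logic.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AdComplianceAuditor:
    enhance: Callable      # picture enhancement module 101
    classify: Callable     # picture classification module 102: "brand" or "sensitive"
    audit_text: Callable   # sensitive information module 103: picture -> violations
    audit_brand: Callable  # brand violation module 104: brand pictures -> violations

    def audit(self, ad_pictures):
        enhanced = [self.enhance(p) for p in ad_pictures]
        brand = [p for p in enhanced if self.classify(p) == "brand"]
        sensitive = [p for p in enhanced if self.classify(p) == "sensitive"]
        result = []
        for picture in sensitive:
            result.extend(self.audit_text(picture))
        result.extend(self.audit_brand(brand))
        return result  # combined compliance audit result

# Usage with trivial stand-in functions operating on strings instead of pictures.
auditor = AdComplianceAuditor(
    enhance=str.strip,
    classify=lambda p: "brand" if p.startswith("logo") else "sensitive",
    audit_text=lambda p: [f"text:{p}"] if "banned" in p else [],
    audit_brand=lambda ps: [f"brand:{p}" for p in ps if p.endswith("!")],
)
report = auditor.audit([" logo-a! ", "banned claim", "logo-b", "ok text"])
```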
Fig. 5 is a schematic structural diagram of an electronic device for implementing an advertisement compliance auditing method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as an advertisement compliance audit program, stored in the memory 11 and executable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), a microprocessor, a digital Processing chip, a graphics processor, a combination of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., executing an advertisement compliance audit program, etc.) stored in the memory 11 and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of advertisement compliance audit programs, etc., but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 5 shows only an electronic device having certain components. It will be understood by those skilled in the art that the structure shown in Fig. 5 does not constitute a limitation on the electronic device, which may include fewer or more components than shown, may combine some components, or may have a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The advertisement compliance audit program stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture;
classifying the enhanced pictures into brand pictures and sensitive information pictures;
extracting text information of the sensitive information picture, and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and searching the illegal brand contents of the brand pictures one by one, and collecting the illegal sensitive information contents and the illegal brand contents as the compliance audit results of the advertisement pictures.
Specifically, the specific implementation method of the instruction by the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to the drawings, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture;
classifying the enhanced pictures into brand pictures and sensitive information pictures;
extracting text information of the sensitive information picture, and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and searching the illegal brand contents of the brand pictures one by one, and collecting the illegal sensitive information contents and the illegal brand contents as the compliance audit results of the advertisement pictures.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An advertisement compliance auditing method, the method comprising:
obtaining an advertisement picture, and performing picture enhancement processing on the advertisement picture to obtain an enhanced picture;
classifying the enhanced pictures into brand pictures and sensitive information pictures;
extracting text information of the sensitive information picture, and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and searching the illegal brand contents of the brand pictures one by one, and collecting the illegal sensitive information contents and the illegal brand contents as the compliance audit results of the advertisement pictures.
2. The advertisement compliance auditing method of claim 1, wherein the image enhancement processing of the advertisement image to obtain an enhanced image comprises:
uniformly cutting the advertisement pictures to obtain a plurality of advertisement picture blocks;
performing pixel convolution on each advertisement picture block to obtain a plurality of convolution advertisement picture blocks;
respectively carrying out Gaussian smoothing treatment on each convolution advertisement picture block to obtain a plurality of smooth advertisement picture blocks;
and splicing the smooth advertisement picture blocks to obtain an enhanced picture of the advertisement picture.
3. The advertisement compliance auditing method of claim 1, wherein the image enhancement processing of the advertisement image to obtain an enhanced image comprises:
counting black point pixel values of horizontal projection of the advertisement picture, and selecting a region with the maximum black point pixel value in the horizontal projection as a target region;
calculating the variance of the black pixel values of the target area;
rotating the target area according to a preset angle, and calculating the variance of the black pixel values of the horizontal projection image of the rotated target area to obtain a rotation variance;
calculating to obtain an optimal inclination angle according to the difference between the variance of the black pixel value of the target area and the rotation variance;
and rotating the advertisement picture by utilizing the optimal inclination angle to obtain an enhanced picture.
4. The advertisement compliance review method of claim 1, wherein the classifying the enhanced pictures into brand pictures and sensitive information pictures comprises:
extracting a characteristic region in the enhanced picture, and determining the picture characteristic of the enhanced picture according to the characteristic region;
and classifying the enhanced pictures according to the picture characteristics to obtain brand pictures and sensitive information pictures.
5. The advertisement compliance auditing method according to claim 4, wherein said extracting a feature region in the enhanced picture, determining picture characteristics of the enhanced picture using the feature region, comprises:
dividing the enhanced picture into a plurality of enhanced picture blocks according to a preset proportion;
selecting one enhancement picture block from the plurality of enhancement picture blocks one by one as a target enhancement picture block;
generating global features of the target enhancement picture block according to the pixel gradient in the target enhancement picture block;
performing frame selection on the regions in the target enhanced picture block one by using a preset sliding window to obtain a pixel window;
generating local features of the target enhancement picture block according to the pixel values in each pixel window;
and collecting the global features and the local features as the picture characteristics of the target enhanced picture block.
6. The advertisement compliance review method of claim 1, wherein the extracting the text information of the sensitive information picture comprises:
acquiring projection information of the sensitive information picture, and performing layout analysis on the projection information to obtain a layout analysis image;
performing line character segmentation on all lines of the layout analysis image one by one to obtain line texts;
and acquiring column segmentation characters of each line text, and identifying the column segmentation characters by using a pre-trained character identification model to obtain text information of the sensitive information picture.
7. The advertisement compliance review method according to any one of claims 1 to 6, wherein the retrieving brand violation content of the brand pictures one by one includes:
extracting the features of the brand pictures to obtain brand picture features;
coding the brand picture features one by one to obtain brand picture codes;
selecting one of the brand picture codes from the brand picture codes one by one to serve as a target brand picture code, calculating the Hamming distance between the target brand picture code and the unselected brand picture codes one by one, and selecting the brand picture with the Hamming distance smaller than a preset distance threshold value as the illegal content of the brand.
8. An advertisement compliance auditing apparatus, characterized in that the apparatus comprises:
the picture enhancement module is used for acquiring an advertisement picture and carrying out picture enhancement processing on the advertisement picture to obtain an enhanced picture;
the picture classification module is used for classifying the enhanced pictures into brand pictures and sensitive information pictures;
the sensitive information module is used for extracting text information of the sensitive information picture and extracting the illegal content of the sensitive information in the text information by using a preset auditing model;
and the brand violation module is used for retrieving the brand violation contents of the brand pictures one by one and collecting the sensitive information violation contents and the brand violation contents into a compliance auditing result of the advertisement pictures.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the advertisement compliance review method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements an advertisement compliance review method as claimed in any one of claims 1 to 7.
CN202210542100.1A 2022-05-17 2022-05-17 Advertisement compliance auditing method and device, electronic equipment and storage medium Withdrawn CN114881698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210542100.1A CN114881698A (en) 2022-05-17 2022-05-17 Advertisement compliance auditing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210542100.1A CN114881698A (en) 2022-05-17 2022-05-17 Advertisement compliance auditing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114881698A true CN114881698A (en) 2022-08-09

Family

ID=82675636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210542100.1A Withdrawn CN114881698A (en) 2022-05-17 2022-05-17 Advertisement compliance auditing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114881698A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206227A (en) * 2023-04-23 2023-06-02 上海帜讯信息技术股份有限公司 Picture examination system and method for 5G rich media information, electronic equipment and medium
CN116206227B (en) * 2023-04-23 2023-07-25 上海帜讯信息技术股份有限公司 Picture examination system and method for 5G rich media information, electronic equipment and medium
CN116520987A (en) * 2023-04-28 2023-08-01 中广电广播电影电视设计研究院有限公司 VR content problem detection method, device, equipment and storage medium
CN116824603A (en) * 2023-07-28 2023-09-29 广州淘通科技股份有限公司 Advertisement page image auditing method and device

Similar Documents

Publication Publication Date Title
US20200074169A1 (en) System And Method For Extracting Structured Information From Image Documents
CN112528863A (en) Identification method and device of table structure, electronic equipment and storage medium
CN114881698A (en) Advertisement compliance auditing method and device, electronic equipment and storage medium
US11361570B2 (en) Receipt identification method, apparatus, device and storage medium
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN108491866B (en) Pornographic picture identification method, electronic device and readable storage medium
CN112861648A (en) Character recognition method and device, electronic equipment and storage medium
CN112036292A (en) Character recognition method and device based on neural network and readable storage medium
CN111860377A (en) Live broadcast method and device based on artificial intelligence, electronic equipment and storage medium
CN113095076A (en) Sensitive word recognition method and device, electronic equipment and storage medium
CN113033543A (en) Curved text recognition method, device, equipment and medium
CN113887438A (en) Watermark detection method, device, equipment and medium for face image
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN114708461A (en) Multi-modal learning model-based classification method, device, equipment and storage medium
CN112686026B (en) Keyword extraction method, device, equipment and medium based on information entropy
CN112861750B (en) Video extraction method, device, equipment and medium based on inflection point detection
US11436852B2 (en) Document information extraction for computer manipulation
CN113821602A (en) Automatic answering method, device, equipment and medium based on image-text chatting record
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
Zhang et al. Computational method for calligraphic style representation and classification
CN115690819A (en) Big data-based identification method and system
CN114943306A (en) Intention classification method, device, equipment and storage medium
CN114783042A (en) Face recognition method, device, equipment and storage medium based on multiple moving targets
CN114267064A (en) Face recognition method and device, electronic equipment and storage medium
CN113888760A (en) Violation information monitoring method, device, equipment and medium based on software application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220809)