CN114677178A - Illegal advertisement detection method and device and electronic equipment - Google Patents

Illegal advertisement detection method and device and electronic equipment


Publication number
CN114677178A
Authority
CN
China
Prior art keywords
advertisement
text
standard
video
training
Legal status
Pending
Application number
CN202210334874.5A
Other languages
Chinese (zh)
Inventor
陈文海
沈菁
康单
张聪
张天生
陆璐
熊家治
Current Assignee
Feishu Shennuo Digital Technology Shanghai Co ltd
Original Assignee
Feishu Shennuo Digital Technology Shanghai Co ltd
Application filed by Feishu Shennuo Digital Technology Shanghai Co ltd
Priority to CN202210334874.5A
Publication of CN114677178A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a device and electronic equipment for detecting illegal advertisements. The method can detect advertisements that contain non-standard expressions, accurately determine whether an advertisement contains such expressions, detect them more reliably, accurately and in real time, and reduce the risk that an advertiser's account is suspended for placing illegal advertisements. It also helps standardize advertising wording and promote products appropriately.

Description

Illegal advertisement detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for detecting illegal advertisements and electronic equipment.
Background
At present, advertisements often contain non-standard expressions such as false publicity and exaggerated claims of effect. Such expressions can cause an advertisement to violate the regulations of the country where it is placed, leading to the advertisement being taken down and to losses for the advertiser.
Disclosure of Invention
In order to solve the above problem, an embodiment of the present invention provides a method, an apparatus, and an electronic device for detecting an illegal advertisement.
In a first aspect, an embodiment of the present invention provides a method for detecting an illegal advertisement, including:
acquiring non-standard expressions, traversing the advertisement texts of delivered advertisements using the non-standard expressions, and identifying the advertisement texts that contain the non-standard expressions and the regular advertisement texts, wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
taking the advertisement texts containing the non-standard expressions as negative training samples and the regular advertisement texts as positive training samples, and training a BERT model to obtain a non-standard expression advertisement text prediction model;
acquiring an advertisement to be detected, extracting the advertisement text of the advertisement, and processing the advertisement text with the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text contains non-standard expressions;
acquiring the advertisement image of the advertisement, and processing the advertisement image to obtain a second probability value that the advertisement image contains non-standard expressions;
acquiring the advertisement video of the advertisement, and processing the advertisement video to obtain a third probability value that the advertisement video contains non-standard expressions;
calculating a violation parameter of the advertisement from the first, second and third probability values;
when the violation parameter of the advertisement is greater than a violation parameter threshold, determining that the advertisement is an illegal advertisement.
In a second aspect, an embodiment of the present invention further provides an illegal advertisement detection device, comprising:
an acquisition module, configured to acquire non-standard expressions, traverse the advertisement texts of delivered advertisements using the non-standard expressions, and identify the advertisement texts that contain the non-standard expressions and the regular advertisement texts, wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
a training module, configured to take the advertisement texts containing the non-standard expressions as negative training samples and the regular advertisement texts as positive training samples, and to train a BERT model to obtain a non-standard expression advertisement text prediction model;
a first detection module, configured to acquire an advertisement to be detected, extract the advertisement text of the advertisement, and process the advertisement text with the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text contains non-standard expressions;
a second detection module, configured to acquire the advertisement image of the advertisement and process it to obtain a second probability value that the advertisement image contains non-standard expressions;
a third detection module, configured to acquire the advertisement video of the advertisement and process it to obtain a third probability value that the advertisement video contains non-standard expressions;
a calculation module, configured to calculate a violation parameter of the advertisement from the first, second and third probability values;
a determination module, configured to determine that the advertisement is an illegal advertisement when the violation parameter of the advertisement is greater than a violation parameter threshold.
In a third aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method in the first aspect.
In a fourth aspect, embodiments of the present invention also provide an electronic device, which includes a memory, a processor, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method according to the first aspect.
In the solutions provided by the first to fourth aspects of the embodiments of the present invention, a BERT model is trained with advertisement texts containing non-standard expressions to obtain a non-standard expression advertisement text prediction model. The advertisement text of an advertisement is processed by this model to obtain a first probability value that the text contains non-standard expressions; the advertisement image and the advertisement video are then processed separately to obtain a second probability value that the image contains non-standard expressions and a third probability value that the video contains them. A violation parameter of the advertisement is calculated from the three probability values, and whether the advertisement is an illegal advertisement is finally determined from that parameter. Compared with the related art, in which advertisements with non-standard expressions cannot be detected, this approach can detect such advertisements, accurately determine whether an advertisement contains non-standard expressions, detect them more reliably, accurately and in real time, and reduce the risk that an advertiser's account is blocked for placing illegal advertisements. It also helps standardize advertising wording and promote products appropriately, and because advertisements with non-standard expressions are detected automatically, detection efficiency is greatly improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for detecting illegal advertisements according to embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram illustrating an illegal advertisement detection device provided in embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like indicate orientations and positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured" and the like are to be construed broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct or indirect through an intervening medium; or an internal communication between two elements. The specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific situation.
The non-standard expressions may include, but are not limited to, the following: for cosmetic products: spot removal, scar removal, skin whitening and softening, and tooth whitening;
for medical products: regulating blood sugar and aiding sleep;
for clothing, shoes and hats: relieving knee pain, correcting walking posture, and correcting body posture.
Any advertisement whose text, images or video contains these non-standard expressions is an illegal advertisement.
Based on this, the embodiments provide a method, an apparatus and an electronic device for detecting illegal advertisements. A BERT model is trained with advertisement texts containing non-standard expressions to obtain a non-standard expression advertisement text prediction model, which processes the advertisement text of an advertisement to obtain a first probability value that the text contains non-standard expressions. The advertisement image and the advertisement video are then processed separately to obtain a second probability value that the image contains non-standard expressions and a third probability value that the video contains them. A violation parameter of the advertisement is calculated from the three probability values, and whether the advertisement is illegal is finally determined from that parameter. In this way, advertisements with non-standard expressions can be detected, whether an advertisement contains non-standard expressions can be determined accurately, such expressions can be detected more reliably, accurately and in real time, and the risk that an advertiser's account is suspended for placing illegal advertisements is reduced.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
Example 1
In the illegal advertisement detection method provided by this embodiment, the execution subject is a server.
Referring to the flowchart of the method for detecting illegal advertisements shown in Fig. 1, this embodiment provides a method for detecting illegal advertisements that includes the following steps:
Step 100, acquiring non-standard expressions, traversing the advertisement texts of delivered advertisements using the non-standard expressions, and identifying the advertisement texts that contain the non-standard expressions and the regular advertisement texts, wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions.
In step 100, the non-standard expressions are extracted by staff from illegal advertisements and stored in the server.
The advertisement texts of delivered advertisements are cached in the server. Such an advertisement text may be the text of a text advertisement itself, or text extracted from an advertisement image using OCR technology.
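Step 100 can be illustrated with a minimal sketch that traverses cached advertisement texts and splits them into texts containing at least one non-standard expression and regular texts. The description does not say how matching is performed, so plain case-insensitive substring matching is an assumption here, and `NON_STANDARD_EXPRESSIONS` and `split_ad_texts` are hypothetical names; the sample expressions are taken from the examples listed in the description.

```python
from typing import Iterable, List, Tuple

# Hypothetical expression list; in the method, staff extract these from
# illegal advertisements and store them in the server.
NON_STANDARD_EXPRESSIONS = [
    "spot removal", "scar removal", "tooth whitening",
    "regulating blood sugar", "relieving knee pain",
]


def split_ad_texts(ad_texts: Iterable[str],
                   expressions: List[str]) -> Tuple[List[str], List[str]]:
    """Split delivered ad texts into (texts with a non-standard expression, regular texts).

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    lowered = [e.lower() for e in expressions]
    flagged, regular = [], []
    for text in ad_texts:
        t = text.lower()
        if any(e in t for e in lowered):
            flagged.append(text)   # becomes a negative training sample
        else:
            regular.append(text)   # becomes a positive training sample
    return flagged, regular


if __name__ == "__main__":
    ads = ["New serum for spot removal in 7 days", "Comfortable cotton T-shirt"]
    flagged, regular = split_ad_texts(ads, NON_STANDARD_EXPRESSIONS)
    print(flagged)  # ['New serum for spot removal in 7 days']
    print(regular)  # ['Comfortable cotton T-shirt']
```

Substring matching works for Chinese text without word segmentation, but it also flags negated or quoted mentions; a production system would likely need more careful matching.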
Step 102, taking the advertisement texts containing the non-standard expressions as negative training samples and the regular advertisement texts as positive training samples, and training a BERT model to obtain a non-standard expression advertisement text prediction model.
In step 102, the BERT model may be trained to obtain the non-standard expression advertisement text prediction model through the following steps (1) to (3):
(1) translating the advertisement texts containing non-standard expressions into first translated texts in multiple languages and taking the first translated texts as negative training samples; translating the regular advertisement texts into second translated texts in multiple languages and taking the second translated texts as positive training samples;
(2) randomly extracting a preset proportion of samples from the negative training samples and the positive training samples to form a first training set;
(3) training the BERT model with the first training set to obtain the non-standard expression advertisement text prediction model.
In step (1), translation software is used to translate the advertisement texts containing non-standard expressions into the first translated texts and the regular advertisement texts into the second translated texts.
The multiple languages include Chinese, English, Japanese, Thai, French, Spanish, German, Russian, Korean and Arabic.
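The translation-based augmentation in step (1) can be sketched as follows. The translation software is not named in the description, so it is modeled as a hypothetical callable `translate(text, target_language)`; the ISO 639-1 codes and the numeric labels (0 for negative samples that contain a non-standard expression, 1 for positive regular samples) are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical interface for the translation software: (text, target_language) -> text.
Translator = Callable[[str, str], str]

# ISO 639-1 codes for Chinese, English, Japanese, Thai, French,
# Spanish, German, Russian, Korean and Arabic (codes chosen for this sketch).
LANGUAGES = ["zh", "en", "ja", "th", "fr", "es", "de", "ru", "ko", "ar"]

NEGATIVE = 0  # text contains a non-standard expression
POSITIVE = 1  # regular advertisement text


def build_multilingual_samples(flagged: List[str], regular: List[str],
                               translate: Translator,
                               languages: List[str] = LANGUAGES) -> List[Tuple[str, int]]:
    """Translate every text into every language and attach its class label."""
    samples = []
    for label, texts in ((NEGATIVE, flagged), (POSITIVE, regular)):
        for text in texts:
            for lang in languages:
                samples.append((translate(text, lang), label))
    return samples


if __name__ == "__main__":
    # Stub translator that only tags the text; no real translation happens here.
    fake_translate = lambda text, lang: f"[{lang}] {text}"
    samples = build_multilingual_samples(["spot removal in 7 days"], ["cotton T-shirt"], fake_translate)
    print(len(samples))  # 20
    print(samples[0])    # ('[zh] spot removal in 7 days', 0)
```

Injecting the translator as a function keeps the sketch independent of any particular translation service.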
In step (2), in one embodiment, the preset proportion may be set to 0.8.
Alternatively, the preset proportion may be set to any value between 0.6 and 0.85.
When samples are randomly extracted from the negative and positive training samples in the preset proportion to form the first training set, approximately equal numbers of samples may be drawn from each class, which helps ensure the effectiveness of the trained model.
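A minimal sketch of step (2), assuming the preset proportion is applied to the smaller class so that both classes contribute exactly the same number of samples; the description only asks for "approximately the same" numbers, so this is one possible reading. `build_training_set` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_training_set(negatives: Sequence[T], positives: Sequence[T],
                       ratio: float = 0.8, seed: Optional[int] = None) -> List[T]:
    """Randomly draw a balanced first training set from both classes.

    `ratio` is the preset proportion (0.8 by default; 0.6 to 0.85 per the description).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)
    # Apply the proportion to the smaller class so both classes contribute equally.
    k = int(min(len(negatives), len(positives)) * ratio)
    training_set = rng.sample(list(negatives), k) + rng.sample(list(positives), k)
    rng.shuffle(training_set)  # mix the classes before training
    return training_set


if __name__ == "__main__":
    neg = [("neg", i) for i in range(10)]
    pos = [("pos", i) for i in range(25)]
    train = build_training_set(neg, pos, ratio=0.8, seed=42)
    print(len(train))  # 16
```

The samples left out of the training set could serve as a validation set, although the description does not say how they are used.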
In step (3), the specific process of training the BERT model with the first training set to obtain the non-standard expression advertisement text prediction model is prior art and is not described here again.
Step 104, acquiring the advertisement to be detected, extracting the advertisement text of the advertisement, and processing the advertisement text with the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text contains non-standard expressions.
In step 104, the advertisement text is extracted according to the type of the advertisement: for a text advertisement, its advertisement text is processed directly; for an image advertisement, the advertisement text is first extracted from the image using OCR (optical character recognition) and then processed; for a video advertisement, video key frames are first extracted from the video, the advertisement text is then extracted from the key frames using OCR, and finally the text is processed.
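The type-dependent text extraction in step 104 can be sketched as a small dispatch function. The OCR engine and the key frame extractor are not specified in the description, so both are passed in as hypothetical callables (`ocr`, `extract_key_frames`), and the type names "text", "image" and "video" are assumptions of this sketch.

```python
from typing import Callable, Iterable


def extract_ad_text(kind: str, payload: object,
                    ocr: Callable[[object], str],
                    extract_key_frames: Callable[[object], Iterable[object]]) -> str:
    """Return the advertisement text for a text, image or video advertisement."""
    if kind == "text":
        return str(payload)                      # text ad: use the text directly
    if kind == "image":
        return ocr(payload)                      # image ad: OCR the image
    if kind == "video":
        frames = extract_key_frames(payload)     # video ad: key frames first
        return "\n".join(ocr(frame) for frame in frames)
    raise ValueError(f"unknown advertisement type: {kind!r}")


if __name__ == "__main__":
    fake_ocr = lambda img: f"ocr:{img}"          # stub OCR for illustration
    fake_frames = lambda video: ["f1", "f2"]     # stub key frame extractor
    print(extract_ad_text("video", "clip.mp4", fake_ocr, fake_frames))
```

The returned string is what would then be fed to the non-standard expression advertisement text prediction model.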
The specific process by which the non-standard expression advertisement text prediction model processes the advertisement text to obtain the first probability value is prior art and is not described here again.
Step 106, acquiring the advertisement image of the advertisement, and processing the advertisement image to obtain a second probability value that the advertisement image contains non-standard expressions.
In step 106, the following steps (1) to (4) may be performed to process the advertisement image and obtain the second probability value:
(1) acquiring advertisement pictures containing non-standard expressions from an illegal advertisement database, and acquiring regular advertisement pictures used by regular advertisements from a regular advertisement database, wherein a regular advertisement picture is an advertisement picture that does not contain non-standard expressions;
(2) randomly extracting a preset proportion of pictures from the advertisement pictures containing non-standard expressions and from the regular advertisement pictures to form a second training set, and converting the pictures in the second training set to a preset resolution;
(3) training a ResNet34 model with the pictures of the second training set at the preset resolution to obtain a non-standard expression advertisement picture prediction model;
(4) acquiring the advertisement image of the advertisement, inputting it into the non-standard expression advertisement picture prediction model, and processing it with the model to obtain the second probability value that the advertisement image contains non-standard expressions.
In step (1), the illegal advertisement database may store the advertisement texts, advertisement pictures and advertisement videos of illegal advertisements that contain non-standard expressions.
The regular advertisement database may store the advertisement texts, advertisement pictures and advertisement videos of regular advertisements.
In the step (2), in one embodiment, the preset resolution may be 224 × 224 pixels.
In step (3), the specific process of training the ResNet34 model with the second training set to obtain the non-standard expression advertisement picture prediction model is prior art and is not described here again.
In step (4), the specific process by which the non-standard expression advertisement picture prediction model processes the advertisement image to obtain the second probability value is prior art and is not described here again.
Step 108, acquiring the advertisement video of the advertisement, and processing the advertisement video to obtain a third probability value that the advertisement video contains non-standard expressions.
In step 108, the following steps (1) to (3) may be performed to process the advertisement video and obtain the third probability value:
(1) acquiring the advertisement video of the advertisement, and extracting video key frames from the advertisement video using a key frame extraction technique;
(2) deleting, from the extracted video key frames, the key frames located at the beginning and at the end of the advertisement video, and extracting a plurality of video key frames to be detected from the remaining key frames at a preset time interval;
(3) inputting the plurality of video key frames to be detected into the non-standard expression advertisement picture prediction model, and processing them with the model to obtain the third probability value that the advertisement video contains non-standard expressions.
In step (1), the specific process of extracting video key frames from the advertisement video using a key frame extraction technique is prior art and is not repeated here. Each extracted video key frame carries a timestamp.
In step (2), the key frames at the beginning and at the end of the advertisement video are identified from the times indicated by their timestamps and deleted from the extracted video key frames.
The remaining key frames are then sorted in ascending order of the times indicated by their timestamps, and a plurality of video key frames to be detected are extracted from the sorted frames at the preset time interval.
In one embodiment, the preset time interval may be set to any value between 2 seconds and 5 seconds.
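The frame selection in step (2) can be sketched as below. The description does not say how many key frames count as "beginning" and "end", so this sketch assumes one key frame is dropped at each end; `KeyFrame`, `select_frames_to_detect` and the 3-second default interval (within the stated 2 to 5 second range) are illustrative choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyFrame:
    timestamp: float  # seconds from the start of the advertisement video
    image: object     # decoded frame; its type is left open in this sketch


def select_frames_to_detect(frames: List[KeyFrame], interval: float = 3.0) -> List[KeyFrame]:
    """Drop the opening and closing key frame, then keep frames spaced >= `interval` apart."""
    if len(frames) <= 2:
        return []
    ordered = sorted(frames, key=lambda f: f.timestamp)  # ascending timestamps
    remaining = ordered[1:-1]  # assumption: one key frame removed at each end
    selected: List[KeyFrame] = []
    next_time = None
    for frame in remaining:
        if next_time is None or frame.timestamp >= next_time:
            selected.append(frame)
            next_time = frame.timestamp + interval
    return selected


if __name__ == "__main__":
    frames = [KeyFrame(float(t), None) for t in range(11)]  # key frames at 0 s .. 10 s
    print([f.timestamp for f in select_frames_to_detect(frames, interval=3.0)])  # [1.0, 4.0, 7.0]
```

Sorting before removing the end frames is equivalent to removing them by timestamp first and sorting afterwards, as the description does.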
In step (3), the specific process by which the non-standard expression advertisement picture prediction model processes the video key frames of the advertisement video is prior art and is not described here again.
Each of the video key frames to be detected is input into the non-standard expression advertisement picture prediction model, and the maximum of the probability values output by the model is taken as the third probability value.
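The max-aggregation rule can be written in a few lines. Here `predict` is a hypothetical stand-in for the non-standard expression advertisement picture prediction model, and returning 0.0 when no key frame remains is an assumption the description does not cover.

```python
from typing import Callable, Iterable


def video_probability(frames: Iterable[object], predict: Callable[[object], float]) -> float:
    """Third probability value: the largest per-frame probability from the picture model.

    Returns 0.0 when there is no key frame to check (an assumption of this sketch).
    """
    return max((predict(frame) for frame in frames), default=0.0)


if __name__ == "__main__":
    per_frame = {"f1": 0.2, "f2": 0.9, "f3": 0.4}  # made-up model outputs
    print(video_probability(["f1", "f2", "f3"], per_frame.get))  # 0.9
```

Taking the maximum means a single frame showing a non-standard expression is enough to make the whole video score high.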
Step 110, calculating the violation parameter of the advertisement from the first, second and third probability values.
In step 110, the violation parameter of the advertisement is calculated by the following formula:
M=A1*S1+A2*S2+A3*S3
where M is the violation parameter of the advertisement; A1, A2 and A3 are the first, second and third weight values; and S1, S2 and S3 are the first, second and third probability values.
The first, second and third weight values are preset in the server.
Step 112, when the violation parameter of the advertisement is greater than the violation parameter threshold, determining that the advertisement is an illegal advertisement.
In step 112, the violation parameter threshold is cached in the server and may be set to any value between 0.6 and 0.8.
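Steps 110 and 112 reduce to a weighted sum followed by a strict threshold comparison. The weight values are only said to be preset in the server, so the weights 0.5, 0.3 and 0.2 below are made up for illustration; choosing weights that sum to 1 keeps M in [0, 1] so that it is comparable to a threshold between 0.6 and 0.8, which is an assumption of this sketch.

```python
def violation_parameter(s1: float, s2: float, s3: float,
                        a1: float, a2: float, a3: float) -> float:
    """M = A1*S1 + A2*S2 + A3*S3 (weighted text, image and video probabilities)."""
    return a1 * s1 + a2 * s2 + a3 * s3


def is_illegal(m: float, threshold: float = 0.7) -> bool:
    """Step 112: the advertisement is illegal only if M is strictly greater than the threshold."""
    return m > threshold


if __name__ == "__main__":
    # Made-up values: S1 = 0.9 (text), S2 = 0.6 (image), S3 = 0.8 (video).
    m = violation_parameter(0.9, 0.6, 0.8, a1=0.5, a2=0.3, a3=0.2)
    print(round(m, 2), is_illegal(m))  # 0.79 True
```

Worked by hand: 0.5 × 0.9 + 0.3 × 0.6 + 0.2 × 0.8 = 0.45 + 0.18 + 0.16 = 0.79, which exceeds a threshold of 0.7.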
In summary, this embodiment provides a method for detecting illegal advertisements. A BERT model is trained with advertisement texts containing non-standard expressions to obtain a non-standard expression advertisement text prediction model, which processes the advertisement text of an advertisement to obtain a first probability value that the text contains non-standard expressions. The advertisement image and the advertisement video are then processed separately to obtain a second probability value for the image and a third probability value for the video. A violation parameter of the advertisement is calculated from the three probability values, and whether the advertisement is an illegal advertisement is finally determined from that parameter. Compared with the related art, in which advertisements with non-standard expressions cannot be detected, the method can detect such advertisements, accurately determine whether an advertisement contains non-standard expressions, detect them more reliably, accurately and in real time, and reduce the risk that an advertiser's account is blocked for placing illegal advertisements. It also helps standardize advertising wording and promote products appropriately, and because advertisements with non-standard expressions are detected automatically, detection efficiency is greatly improved.
Example 2
The present embodiment provides an illegal advertisement detection device, which is used for executing the illegal advertisement detection method described in embodiment 1 above.
Referring to a schematic structural diagram of an illegal advertisement detection device shown in fig. 2, the present embodiment provides an illegal advertisement detection device, including:
an obtaining module 200, configured to obtain non-standard expressions, traverse the advertisement texts of delivered advertisements using the non-standard expressions, and identify advertisement texts containing the non-standard expressions and regular advertisement texts; wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
a training module 202, configured to train a BERT model using the advertisement texts containing the non-standard expressions as training negative samples and the regular advertisement texts as training positive samples, so as to obtain a non-standard expression advertisement text prediction model;
a first detection module 204, configured to obtain an advertisement to be detected, extract the advertisement text in the advertisement, and process the advertisement text through the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text of the advertisement has non-standard expressions;
a second detection module 206, configured to obtain an advertisement image of the advertisement and process it to obtain a second probability value that the advertisement image of the advertisement has non-standard expressions;
a third detection module 208, configured to obtain an advertisement video of the advertisement and process it to obtain a third probability value that the advertisement video of the advertisement has non-standard expressions;
a calculating module 210, configured to calculate a violation parameter of the advertisement according to the obtained first, second and third probability values;
a determining module 212, configured to determine that the advertisement is a violating advertisement when the violation parameter of the advertisement is greater than a violation parameter threshold.
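The obtaining module 200 only states that delivered advertisement texts are traversed with the non-standard expressions. A minimal Python sketch of that split follows, assuming the expressions are literal strings matched case-insensitively as substrings; the function name and the sample phrases are hypothetical and are not taken from the patent.

```python
from typing import Iterable, List, Tuple


def split_by_non_standard_expressions(
    ad_texts: Iterable[str], expressions: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition delivered ad texts into (containing, regular).

    Assumption: a text "contains" an expression when the expression occurs
    anywhere in it as a substring, ignoring case.
    """
    # Lower-case the expression list once so every text uses the same set.
    lowered = [e.lower() for e in expressions if e]
    containing: List[str] = []
    regular: List[str] = []
    for text in ad_texts:
        t = text.lower()
        if any(e in t for e in lowered):
            containing.append(text)  # later used as a training negative sample
        else:
            regular.append(text)  # regular text: no expression found
    return containing, regular


if __name__ == "__main__":
    expressions = ["best in the world", "100% cure"]  # hypothetical examples
    ads = [
        "The best in the world running shoes",
        "Comfortable running shoes for daily training",
        "A 100% cure for hair loss",
    ]
    bad, good = split_by_non_standard_expressions(ads, expressions)
    print(bad)   # ['The best in the world running shoes', 'A 100% cure for hair loss']
    print(good)  # ['Comfortable running shoes for daily training']
```

Plain substring matching will also flag expressions that appear inside longer, harmless phrases; a production system might add word-boundary or tokenizer-aware matching.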
Specifically, the training module 202 is configured for:
translating the advertisement text containing the non-standard expression into first translation texts of various languages, taking the first translation texts as training negative samples, translating the regular advertisement text into second translation texts of various languages, and taking the second translation texts as training positive samples;
randomly extracting samples with a preset proportion from the training negative samples and the training positive samples to form a first training set;
and training the BERT model using the first training set to obtain the non-standard expression advertisement text prediction model.
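The translation-based augmentation and preset-proportion sampling performed by the training module 202 can be sketched as follows. The `translate` callable is a hypothetical placeholder for whatever machine-translation service is used, the function names are illustrative, and labelling texts with non-standard expressions as class 1 is an assumption made so that the model's class-1 probability can later serve as the first probability value.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical translator signature: (text, target_language) -> translated text.
Translator = Callable[[str, str], str]


def augment_by_translation(texts: Sequence[str], languages: Sequence[str],
                           translate: Translator) -> List[str]:
    """Return one translation of every text into every target language."""
    return [translate(text, lang) for text in texts for lang in languages]


def build_first_training_set(negatives: Sequence[str], positives: Sequence[str],
                             proportion: float, seed: int = 0) -> List[Tuple[str, int]]:
    """Randomly draw `proportion` of each pool and label the samples.

    Label 1 = contains a non-standard expression (training negative sample),
    label 0 = regular advertisement text (training positive sample).
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the split is reproducible
    neg = rng.sample(list(negatives), round(len(negatives) * proportion))
    pos = rng.sample(list(positives), round(len(positives) * proportion))
    data = [(t, 1) for t in neg] + [(t, 0) for t in pos]
    rng.shuffle(data)  # mix the two classes before training
    return data


if __name__ == "__main__":
    fake = lambda text, lang: f"[{lang}] {text}"  # stand-in translator
    neg = augment_by_translation(["a", "b"], ["en", "ja"], fake)
    pos = augment_by_translation(["c", "d"], ["en", "ja"], fake)
    print(len(build_first_training_set(neg, pos, 0.5)))  # 4
```

The resulting `(text, label)` pairs could then be fed to any BERT sequence-classification fine-tuning loop; the patent does not specify the fine-tuning details.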
Specifically, the second detection module 206 is configured for:
acquiring an advertisement picture containing the non-standard phrases from a violation advertisement database, and acquiring a regular advertisement picture used by a regular advertisement from a regular advertisement database; wherein the regular advertisement picture is an advertisement picture not containing the non-standard phrase;
randomly extracting pictures with a preset proportion from the advertisement pictures containing the non-standard phrases and the regular advertisement pictures to form a second training set, and converting the resolution of the pictures in the second training set to a preset resolution;
training a ResNet34 model using the second training set of preset-resolution pictures to obtain a non-standard expression advertisement picture prediction model;
and acquiring an advertisement image of the advertisement, inputting the advertisement image of the advertisement into the non-standard expression advertisement picture prediction model, and processing the advertisement image of the advertisement through the non-standard expression advertisement picture prediction model to obtain a second probability value that the advertisement image of the advertisement has non-standard expression.
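The text does not say how pictures are converted to the preset resolution or how the ResNet34 output becomes a probability. The sketch below assumes nearest-neighbour resizing and a two-class output head whose softmax probability for the violating class is taken as the second probability value. Images are modelled as nested lists of grayscale pixels purely for illustration; a real pipeline would more likely use an image library for resizing and a framework implementation of ResNet34 (for example `torchvision.models.resnet34`). The function names are hypothetical.

```python
import math
from typing import List, Sequence

# Grayscale image as rows of pixel values; an illustrative stand-in only.
Image = List[List[int]]


def resize_to_preset(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbour resize of `img` to the preset resolution width x height."""
    src_h, src_w = len(img), len(img[0])
    # Map each target pixel back to the source pixel that covers it.
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def class_probability(logits: Sequence[float], violating_class: int = 1) -> float:
    """Softmax over class scores; returns P(contains a non-standard expression)."""
    peak = max(logits)  # subtract the largest score for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return exps[violating_class] / sum(exps)


if __name__ == "__main__":
    tiny = [[10, 20], [30, 40]]
    print(resize_to_preset(tiny, 4, 4)[0])  # [10, 10, 20, 20]
    print(round(class_probability([0.0, 2.0]), 3))  # 0.881
```

The same softmax step would apply to the BERT text model's two-class output when producing the first probability value.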
In summary, the present embodiment provides an illegal advertisement detection apparatus. A BERT model is trained on advertisement texts containing non-standard expressions, together with regular advertisement texts, to obtain a non-standard expression advertisement text prediction model. The advertisement text of an advertisement is processed by this model to obtain a first probability value that the advertisement text contains non-standard expressions. The advertisement image and the advertisement video of the advertisement are then processed to obtain, respectively, a second probability value that the advertisement image contains non-standard expressions and a third probability value that the advertisement video contains non-standard expressions. A violation parameter of the advertisement is calculated from the first, second and third probability values, and whether the advertisement is a violating advertisement is finally determined from this violation parameter. Whereas the related art cannot detect advertisements containing non-standard expressions, this apparatus can detect such advertisements, accurately determine whether an advertisement contains non-standard expressions, and detect them in a more reliable, accurate and real-time manner, which reduces the risk that an advertiser's account is banned for delivering an illegal advertisement. In addition, the apparatus makes advertising wording more standardized so that products can be promoted appropriately, and because advertisements containing non-standard expressions are detected automatically, detection efficiency is greatly improved.
Example 3
The present embodiment proposes a computer-readable storage medium, which stores thereon a computer program, which, when executed by a processor, performs the steps of the illegal advertisement detection method described in embodiment 1 above. For specific implementation, refer to method embodiment 1, which is not described herein again.
In addition, referring to the schematic structural diagram of an electronic device shown in Fig. 3, the present embodiment further provides an electronic device, which includes a bus 51, a processor 52, a transceiver 53, a bus interface 54, a memory 55 and a user interface 56.
In this embodiment, the electronic device further includes one or more programs stored in the memory 55 and executable on the processor 52, the one or more programs being configured to be executed by the processor to perform the following steps (1) to (7):
(1) acquiring non-standard expressions, traversing the advertisement texts of delivered advertisements using the non-standard expressions, and identifying advertisement texts containing the non-standard expressions and regular advertisement texts; wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
(2) taking the advertisement text containing the non-standard expressions as a training negative sample and the regular advertisement text as a training positive sample, and training a BERT model to obtain a non-standard expression advertisement text prediction model;
(3) acquiring an advertisement to be detected, extracting an advertisement text in the advertisement, and processing the advertisement text in the advertisement through the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text of the advertisement has non-standard expressions;
(4) acquiring an advertisement image of the advertisement, and processing the advertisement image of the advertisement to obtain a second probability value that the advertisement image of the advertisement has the non-standard expression;
(5) acquiring an advertisement video of the advertisement, and processing the advertisement video of the advertisement to obtain a third probability value that the advertisement video of the advertisement has the non-standard expression;
(6) calculating a violation parameter of the advertisement according to the obtained first probability value, the second probability value and the third probability value;
(7) when the violation parameter of the advertisement is greater than a violation parameter threshold, determining that the advertisement is a violating advertisement.
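Steps (6) and (7) can be sketched with the weighted sum given in claim 5, M = A1*S1 + A2*S2 + A3*S3. The weight values and the threshold below are hypothetical placeholders, since the patent does not disclose them, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConfig:
    # Hypothetical values: the patent does not disclose the weights or threshold.
    a1: float = 0.5  # weight A1 of the text probability S1
    a2: float = 0.3  # weight A2 of the image probability S2
    a3: float = 0.2  # weight A3 of the video probability S3
    threshold: float = 0.5  # violation parameter threshold


def violation_parameter(s1: float, s2: float, s3: float, cfg: FusionConfig) -> float:
    """Step (6): M = A1*S1 + A2*S2 + A3*S3."""
    return cfg.a1 * s1 + cfg.a2 * s2 + cfg.a3 * s3


def is_violating(s1: float, s2: float, s3: float,
                 cfg: FusionConfig = FusionConfig()) -> bool:
    """Step (7): flag the advertisement only when M is strictly greater than the threshold."""
    return violation_parameter(s1, s2, s3, cfg) > cfg.threshold


if __name__ == "__main__":
    print(is_violating(0.9, 0.2, 0.1))  # True
```

With these placeholder weights, S1 = 0.9, S2 = 0.2 and S3 = 0.1 give M = 0.45 + 0.06 + 0.02 = 0.53, which exceeds the 0.5 threshold. Making the configuration immutable keeps the weights fixed between calibration runs.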
The transceiver 53 is configured to receive and transmit data under the control of the processor 52.
Where a bus architecture (represented by the bus 51) is used, the bus 51 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by the processor 52 and memory represented by the memory 55. The bus 51 may also link various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further in this embodiment. The bus interface 54 provides an interface between the bus 51 and the transceiver 53. The transceiver 53 may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. For example, the transceiver 53 receives external data from other devices and transmits data processed by the processor 52 to other devices. Depending on the nature of the computing system, a user interface 56, such as a keypad, display, speaker, microphone or joystick, may also be provided.
The processor 52 is responsible for managing the bus 51 and for general processing, including running a general-purpose operating system. The memory 55 may be used to store data used by the processor 52 when performing operations.
Optionally, the processor 52 may be, but is not limited to, a central processing unit, a single-chip microcomputer, a microprocessor or a programmable logic device.
It will be appreciated that the memory 55 in embodiments of the invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 55 of the systems and methods described in this embodiment is intended to include, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 55 stores elements, executable modules or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system 551 and application programs 552.
The operating system 551 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 552 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 552.
In summary, the present embodiment provides a computer-readable storage medium and an electronic device. A BERT model is trained on advertisement texts containing non-standard expressions, together with regular advertisement texts, to obtain a non-standard expression advertisement text prediction model. The advertisement text of an advertisement is processed by this model to obtain a first probability value that the advertisement text contains non-standard expressions. The advertisement image and the advertisement video of the advertisement are then processed to obtain, respectively, a second probability value that the advertisement image contains non-standard expressions and a third probability value that the advertisement video contains non-standard expressions. A violation parameter of the advertisement is calculated from the first, second and third probability values, and whether the advertisement is a violating advertisement is finally determined from this violation parameter. Whereas the related art cannot detect advertisements containing non-standard expressions, this approach can detect such advertisements, accurately determine whether an advertisement contains non-standard expressions, and detect them in a more reliable, accurate and real-time manner, which reduces the risk that an advertiser's account is blocked for delivering illegal advertisements. In addition, it makes advertising wording more standardized so that products can be publicized appropriately, and because advertisements containing non-standard expressions are detected automatically, detection efficiency is greatly improved.
The above description covers only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for detecting an illegal advertisement, comprising:
acquiring non-standard expressions, traversing the advertisement texts of delivered advertisements using the non-standard expressions, and identifying advertisement texts containing the non-standard expressions and regular advertisement texts; wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
taking the advertisement text containing the non-standard expressions as a training negative sample and the regular advertisement text as a training positive sample, and training a BERT model to obtain a non-standard expression advertisement text prediction model;
acquiring an advertisement to be detected, extracting an advertisement text in the advertisement, and processing the advertisement text in the advertisement through the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text of the advertisement has non-standard expressions;
acquiring an advertisement image of the advertisement, and processing the advertisement image of the advertisement to obtain a second probability value that the advertisement image of the advertisement has the non-standard expression;
acquiring an advertisement video of the advertisement, and processing the advertisement video of the advertisement to obtain a third probability value that the advertisement video of the advertisement has the non-standard expression;
calculating a violation parameter of the advertisement according to the obtained first probability value, the second probability value and the third probability value;
when the violation parameter of the advertisement is greater than a violation parameter threshold, determining that the advertisement is a violating advertisement.
2. The method of claim 1, wherein taking the advertisement text containing the non-standard expressions as a training negative sample and the regular advertisement text as a training positive sample, and training a BERT model to obtain a non-standard expression advertisement text prediction model, comprises:
translating the advertisement text containing the non-standard expression into first translation texts of various languages, taking the first translation texts as training negative samples, translating the regular advertisement text into second translation texts of various languages, and taking the second translation texts as training positive samples;
randomly extracting samples with a preset proportion from the training negative samples and the training positive samples to form a first training set;
and training the BERT model using the first training set to obtain the non-standard expression advertisement text prediction model.
3. The method of claim 1, wherein acquiring the advertisement image of the advertisement and processing it to obtain the second probability value that the advertisement image of the advertisement has the non-standard expression comprises:
acquiring an advertisement picture containing the non-standard phrases from a violation advertisement database, and acquiring a regular advertisement picture used by a regular advertisement from a regular advertisement database; wherein the regular advertisement picture is an advertisement picture not containing the non-standard phrase;
randomly extracting pictures with a preset proportion from the advertisement pictures containing the non-standard expressions and the regular advertisement pictures to form a second training set, and converting the resolution of the pictures in the second training set to a preset resolution;
training a ResNet34 model using the second training set of preset-resolution pictures to obtain a non-standard expression advertisement picture prediction model;
and acquiring an advertisement image of the advertisement, inputting the advertisement image of the advertisement into the non-standard expression advertisement picture prediction model, and processing the advertisement image of the advertisement through the non-standard expression advertisement picture prediction model to obtain a second probability value that the advertisement image of the advertisement has non-standard expression.
4. The method of claim 3, wherein acquiring the advertisement video of the advertisement and processing it to obtain the third probability value that the advertisement video of the advertisement has the non-standard expression comprises:
acquiring an advertisement video of the advertisement, and extracting a video key frame in the advertisement video by using a key frame extraction technology;
deleting, from the extracted video key frames, the key frames located at the beginning of the advertisement video and the key frames located at the end of the advertisement video, and extracting a plurality of video key frames to be detected from the remaining video key frames according to a preset time interval;
and inputting the plurality of video key frames to be detected into the non-standard expression advertisement picture prediction model, and processing them through the non-standard expression advertisement picture prediction model to obtain a third probability value that the advertisement video of the advertisement has non-standard expressions.
5. The method of claim 1, wherein the calculating the violation parameter for the advertisement based on the obtained first probability value, the second probability value, and the third probability value comprises:
calculating a violation parameter for the advertisement by:
M=A1*S1+A2*S2+A3*S3
wherein M represents the advertisement violation parameter; A1 denotes a first weight value; A2 denotes a second weight value; A3 denotes a third weight value; S1 represents the first probability value; S2 represents the second probability value; S3 represents the third probability value.
6. An illegal advertisement detection device, comprising:
the acquisition module is used for acquiring non-standard expressions, traversing the advertisement texts of delivered advertisements using the non-standard expressions, and identifying advertisement texts containing the non-standard expressions and regular advertisement texts; wherein a regular advertisement text is an advertisement text that does not contain the non-standard expressions;
the training module is used for taking the advertisement text containing the non-standard expression as a training negative sample and the regular advertisement text as a training positive sample, and training a BERT model to obtain a non-standard expression advertisement text prediction model;
the first detection module is used for acquiring an advertisement to be detected, extracting the advertisement text in the advertisement, and processing the advertisement text through the non-standard expression advertisement text prediction model to obtain a first probability value that the advertisement text of the advertisement has non-standard expressions;
the second detection module is used for acquiring the advertisement image of the advertisement and processing the advertisement image of the advertisement to obtain a second probability value that the advertisement image of the advertisement has the non-standard expression;
the third detection module is used for acquiring the advertisement video of the advertisement, processing the advertisement video of the advertisement and obtaining a third probability value that the advertisement video of the advertisement has the non-standard expression;
a calculation module, configured to calculate a violation parameter of the advertisement according to the obtained first probability value, the obtained second probability value, and the obtained third probability value;
and the determining module is used for determining that the advertisement is a violating advertisement when the violation parameter of the advertisement is greater than the violation parameter threshold.
7. The apparatus of claim 6, wherein the training module is specifically configured to:
translating the advertisement text containing the non-standard wording into first translated texts of various languages, taking the first translated texts as training negative samples, translating the regular advertisement text into second translated texts of various languages, and taking the second translated texts as training positive samples;
randomly extracting samples with a preset proportion from the training negative samples and the training positive samples to form a first training set;
and training the BERT model using the first training set to obtain the non-standard expression advertisement text prediction model.
8. The apparatus of claim 6, wherein the second detection module is specifically configured to:
acquiring an advertisement picture containing the non-standard phrases from a violation advertisement database, and acquiring a regular advertisement picture used by a regular advertisement from a regular advertisement database; wherein the regular advertisement picture is an advertisement picture not containing the non-standard phrase;
randomly extracting pictures with a preset proportion from the advertisement pictures containing the non-standard phrases and the regular advertisement pictures to form a second training set, and converting the resolution of the pictures in the second training set to a preset resolution;
training a ResNet34 model using the second training set of preset-resolution pictures to obtain a non-standard expression advertisement picture prediction model;
and acquiring an advertisement image of the advertisement, inputting the advertisement image of the advertisement into the non-standard expression advertisement picture prediction model, and processing the advertisement image of the advertisement through the non-standard expression advertisement picture prediction model to obtain a second probability value that the advertisement image of the advertisement has non-standard expressions.
9. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory, a processor, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method of any of claims 1-5.
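Claim 4 does not define which key frames count as the beginning or end of the video, how the preset time interval is applied, or how per-frame results are combined into the third probability value. The sketch below assumes key-frame timestamps have already been extracted, treats frames inside hypothetical `head` and `tail` windows (in seconds) as the beginning and end, keeps remaining frames at least `interval` seconds apart, and takes the maximum per-frame probability as S3. `score_frame` is a hypothetical stand-in for the non-standard expression advertisement picture prediction model.

```python
from typing import Callable, List, Sequence


def select_frames_to_detect(keyframe_times: Sequence[float], duration: float,
                            head: float, tail: float, interval: float) -> List[float]:
    """Drop key frames in the first `head` and last `tail` seconds, then keep
    remaining frames that are at least `interval` seconds apart."""
    remaining = [t for t in sorted(keyframe_times) if head <= t <= duration - tail]
    selected: List[float] = []
    for t in remaining:
        if not selected or t - selected[-1] >= interval:
            selected.append(t)
    return selected


def third_probability(keyframe_times: Sequence[float], duration: float,
                      score_frame: Callable[[float], float], head: float = 2.0,
                      tail: float = 2.0, interval: float = 1.0) -> float:
    """S3 = highest per-frame probability among the frames to be detected
    (0.0 if no frame survives the trimming)."""
    frames = select_frames_to_detect(keyframe_times, duration, head, tail, interval)
    return max((score_frame(t) for t in frames), default=0.0)


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0, 3.0, 3.4, 4.5, 6.0, 9.5, 10.0]
    print(select_frames_to_detect(times, 10.0, head=1.0, tail=1.0, interval=1.0))
    # [1.0, 3.0, 4.5, 6.0]
```

Taking the maximum means a single clearly violating frame is enough to flag the video; averaging the frame scores would be a more lenient alternative.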
CN202210334874.5A 2022-03-31 2022-03-31 Illegal advertisement detection method and device and electronic equipment Pending CN114677178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334874.5A CN114677178A (en) 2022-03-31 2022-03-31 Illegal advertisement detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210334874.5A CN114677178A (en) 2022-03-31 2022-03-31 Illegal advertisement detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114677178A true CN114677178A (en) 2022-06-28

Family

ID=82076699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334874.5A Pending CN114677178A (en) 2022-03-31 2022-03-31 Illegal advertisement detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114677178A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7223950B1 (en) * 2022-08-16 2023-02-17 株式会社アートワークスコンサルティング ADVERTISING EXPRESSION DETERMINATION DEVICE, STORAGE MEDIUM, AND PROGRAM
CN116956897A (en) * 2023-09-20 2023-10-27 湖南财信数字科技有限公司 Method, device, computer equipment and storage medium for processing hidden advertisement

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7223950B1 (en) * 2022-08-16 2023-02-17 株式会社アートワークスコンサルティング ADVERTISING EXPRESSION DETERMINATION DEVICE, STORAGE MEDIUM, AND PROGRAM
TWI833657B (en) * 2022-08-16 2024-02-21 日商藝術工作顧問股份有限公司 Advertising performance determination devices, memory media and programs
WO2024038509A1 (en) * 2022-08-16 2024-02-22 株式会社アートワークスコンサルティング Advertising expression determination device, storage medium, and program
CN116956897A (en) * 2023-09-20 2023-10-27 湖南财信数字科技有限公司 Method, device, computer equipment and storage medium for processing hidden advertisement
CN116956897B (en) * 2023-09-20 2023-12-15 湖南财信数字科技有限公司 Method, device, computer equipment and storage medium for processing hidden advertisement

Similar Documents

Publication Publication Date Title
CN114677178A (en) Illegal advertisement detection method and device and electronic equipment
US11074436B1 (en) Method and apparatus for face recognition
EP4002197A1 (en) Sign language recognition method and apparatus, computer-readable storage medium, and computer device
US11856277B2 (en) Method and apparatus for processing video, electronic device, medium and product
CN110457699B (en) Method and device for mining stop words, electronic equipment and storage medium
CN110198464B (en) Intelligent voice broadcasting method and device, computer equipment and storage medium
CN111046904B (en) Image description method, image description device and computer storage medium
EP3944188A1 (en) Image processing device, image processing method, and recording medium in which program is stored
CN110727803A (en) Text event extraction method and device
CN112395391A (en) Concept graph construction method and device, computer equipment and storage medium
CN112749639B (en) Model training method and device, computer equipment and storage medium
CN111368056B (en) Ancient poetry generating method and device
CN113112185A (en) Teacher expressive force evaluation method and device and electronic equipment
CN116248906A (en) Live caption adding method, device and equipment
CN111191446A (en) Interactive information processing method and device, computer equipment and storage medium
CN115294592A (en) Claim settlement information acquisition method and acquisition device, computer equipment and storage medium
CN114067362A (en) Sign language recognition method, device, equipment and medium based on neural network model
CN111382322B (en) Method and device for determining similarity of character strings
CN113869099A (en) Image processing method and device, electronic equipment and storage medium
CN113705697A (en) Information pushing method, device, equipment and medium based on emotion classification model
CN114663152A (en) Advertisement processing method and device and electronic equipment
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN111863268B (en) Method suitable for extracting and structuring medical report content
CN113434895B (en) Text decryption method, device, equipment and storage medium
CN113673414B (en) Bullet screen generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination