CN115311022A - Advertisement traffic identification method and device and computer readable storage medium - Google Patents

Info

Publication number
CN115311022A
Authority
CN
China
Prior art keywords
advertisement
click
data
traffic
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211028480.3A
Other languages
Chinese (zh)
Inventor
张晟
张运辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zooking Software Co ltd
Original Assignee
Shenzhen Zooking Software Co ltd
Application filed by Shenzhen Zooking Software Co ltd
Priority to CN202211028480.3A
Publication of CN115311022A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to an advertisement traffic identification method, an advertisement traffic identification device and a computer-readable storage medium. The method comprises: obtaining advertisement traffic data when an advertisement is clicked, the advertisement traffic data comprising advertisement click data and advertisement image data associated with the advertisement click; and inputting the advertisement traffic data into a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data. The target recognition result indicates whether the advertisement traffic data is invalid traffic data, where invalid traffic data is mis-click traffic data or fraudulent traffic data. With the advertisement traffic recognition model, the method can identify invalid traffic and low-quality valid traffic in advertising effectively and quickly, can also make real-time anti-cheating judgments on click heatmap data for different media advertisement slots and different advertisement images, and improves the accuracy of the recognition result.

Description

Advertisement traffic identification method and device and computer readable storage medium
Technical Field
The present application relates to the field of advertisement technologies, and in particular, to an advertisement traffic identification method, an advertisement traffic identification device, and a computer-readable storage medium.
Background
In advertisement delivery, the effect of a campaign is assessed by monitoring advertisement traffic data. Once an advertisement has been delivered, various cheating behaviors readily occur, including advertisement clicks generated by illegitimate traffic, for example a program that simulates user clicks, which harm the interests of advertisers and advertisement platforms. A method of identifying whether advertisement traffic is valid is therefore needed.
In the prior art, advertisement traffic identification methods usually rely on a single classification model and do not consider both advertisement click fraud scenarios and fraud scenarios involving the advertisement images delivered by media, so the identification is not accurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an advertisement traffic identification method, device and computer readable storage medium to improve the advertisement traffic identification accuracy.
In a first aspect, an embodiment of the present application provides an advertisement traffic identification method, where the advertisement traffic identification method includes:
acquiring advertisement traffic data when an advertisement is clicked; wherein the advertisement traffic data comprises advertisement click data and advertisement image data associated with the advertisement click;
inputting the advertisement traffic data into a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data; the target recognition result is used for representing whether the advertisement traffic data belongs to invalid traffic data, and the invalid traffic data comprises mis-click traffic data and fraudulent traffic data.
In one embodiment, the advertisement traffic recognition model includes an advertisement click recognition model and an advertisement image recognition model, and the inputting the advertisement traffic data into the trained advertisement traffic recognition model to obtain the target recognition result of the advertisement traffic data includes:
inputting the advertisement click data to the advertisement click recognition model to obtain a first recognition result of the advertisement click data;
inputting the advertisement image data into the advertisement image recognition model to obtain a second recognition result of the advertisement image data;
determining a target recognition result of the advertisement traffic data based on the first recognition result and the second recognition result.
In one embodiment, the advertisement clicks include mis-clicked advertisements and fraudulent advertisement clicks, the advertisement click data includes advertisement click coordinate information, and the inputting the advertisement click data to the advertisement click recognition model to obtain a first recognition result of the advertisement click data includes:
inputting the advertisement click data to the advertisement click recognition model, and judging whether the advertisement click coordinate information is located in a preset advertisement click area;
if the advertisement click coordinate information is not located in the preset advertisement click area, judging the advertisement click to be a mis-clicked advertisement, and judging that the first recognition result of the advertisement click data is an invalid click;
and if the advertisement click coordinate information is located in the preset advertisement click area, recognizing the advertisement click data by using a first recognition rule preset in the advertisement click recognition model, judging an advertisement click that meets the first recognition rule to be a fraudulent advertisement click, and judging that the first recognition result of the advertisement click data is an invalid click.
In one embodiment, the inputting the advertisement image data into the advertisement image recognition model to obtain a second recognition result of the advertisement image data includes:
inputting the advertisement image data into the advertisement image recognition model, and outputting an advertisement image feature vector of the advertisement image data;
and recognizing the advertisement image feature vector by using a second recognition rule in the advertisement image recognition model, and judging advertisement image data that meets the second recognition rule to be a fraud image, wherein the second recognition result of the advertisement image data is an invalid image.
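The text does not say what the second recognition rule checks, so the following is only a minimal sketch under stated assumptions. It supposes that the rule compares the extracted feature vector against feature vectors of known fraud images using cosine similarity. The function names, the `known_fraud_vectors` list and the `min_similarity` threshold are all hypothetical and are not taken from the application.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def second_recognition_result(
    feature_vector: Sequence[float],
    known_fraud_vectors: List[Sequence[float]],
    min_similarity: float = 0.9,
) -> str:
    """Assumed rule: an image whose feature vector is close to any known
    fraud image meets the second recognition rule and is an invalid image."""
    for fraud_vector in known_fraud_vectors:
        if cosine_similarity(feature_vector, fraud_vector) >= min_similarity:
            return "invalid image"
    return "valid image"
```

In a real system the feature vector would come from the advertisement image recognition model; here it is passed in directly so that the rule can be read in isolation.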
In one embodiment, the determining a target recognition result of the advertisement traffic data based on the first recognition result and the second recognition result includes:
if the first recognition result is an invalid click or the second recognition result is an invalid image, judging that the target recognition result of the advertisement traffic data is invalid traffic data;
and if the first recognition result is a valid click and the second recognition result is a valid image, judging that the target recognition result of the advertisement traffic data is valid traffic data, the valid traffic data being normal traffic data.
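The combination of the two results described above can be sketched as a small function. The string labels are illustrative choices made here, not identifiers from the application.

```python
def target_recognition_result(first_result: str, second_result: str) -> str:
    """Combine the click result and the image result into the target result.

    Traffic is invalid if either part is invalid; it is valid only when the
    click is valid and the image is valid.
    """
    if first_result == "invalid click" or second_result == "invalid image":
        return "invalid traffic"
    if first_result == "valid click" and second_result == "valid image":
        return "valid traffic"
    # Anything else is an unexpected label rather than a recognition result.
    raise ValueError(f"unexpected results: {first_result!r}, {second_result!r}")
```

Checking the invalid branch first means that a fraud image makes the traffic invalid even when the click itself looks normal.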
In one embodiment, the training method of the advertisement traffic recognition model comprises the following steps:
acquiring a first sample training set and a second sample training set, wherein the first sample training set is sample advertisement click data, and the second sample training set is three-dimensional sample advertisement image data;
inputting the first sample training set to an initial advertisement click recognition model for training, and inputting the second sample training set to an initial advertisement image recognition model for training;
calculating the gradient of a loss function and updating the weight parameters of each model by gradient descent until convergence, which is reached when the accuracy and recall of each model reach a preset threshold, to obtain a trained advertisement click recognition model and a trained advertisement image recognition model;
and fusing the trained advertisement click recognition model and the trained advertisement image recognition model to obtain a trained advertisement traffic recognition model.
In one embodiment, the loss function is a cross-entropy loss function, and the gradient descent is stochastic gradient descent.
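As a minimal sketch of the training loop described above, the following trains a single-feature logistic model with cross-entropy loss and stochastic gradient descent, stopping once accuracy and recall both reach a preset threshold. The logistic model is a deliberately simple stand-in for the recognition models in the application. The toy samples, the learning rate, the threshold value and all function names are assumptions made for illustration.

```python
import math
import random

# Toy samples: ([click_frequency], label), label 1 = fraudulent click (assumed data).
SAMPLES = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.7], 1), ([0.8], 1), ([0.9], 1)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that sample x is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def accuracy_and_recall(weights, bias, samples):
    """Accuracy over all samples and recall over the positive (fraud) class."""
    correct = true_pos = actual_pos = 0
    for x, y in samples:
        pred = 1 if predict_proba(weights, bias, x) >= 0.5 else 0
        correct += int(pred == y)
        if y == 1:
            actual_pos += 1
            true_pos += int(pred == 1)
    recall = true_pos / actual_pos if actual_pos else 1.0
    return correct / len(samples), recall


def train_sgd(samples, lr=0.1, max_epochs=500, threshold=0.95, seed=0):
    """Stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    order = list(samples)
    for _ in range(max_epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for x, y in order:
            # For cross-entropy with a sigmoid output, dLoss/dlogit = p - y.
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
        accuracy, recall = accuracy_and_recall(weights, bias, samples)
        if accuracy >= threshold and recall >= threshold:
            break  # both metrics reached the preset threshold
    return weights, bias
```

Calling `train_sgd(SAMPLES)` returns the learned weights and bias. The same loop would be run once per model before the two trained models are fused.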
In one embodiment, the advertisement clicks include offline clicks and online clicks.
In a second aspect, an embodiment of the present application further provides an advertisement traffic identification apparatus, where the advertisement traffic identification apparatus includes:
the advertisement traffic data acquisition module is used for acquiring advertisement traffic data when an advertisement is clicked; wherein the advertisement traffic data comprises advertisement click data and advertisement image data associated with the advertisement click;
the advertisement traffic identification module is used for inputting the advertisement traffic data into a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data; the target recognition result is used for representing whether the advertisement traffic data belongs to invalid traffic data, and the invalid traffic data comprises mis-click traffic data and fraudulent traffic data.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a computer processor, performs the advertisement traffic identification method according to the first aspect.
Compared with the prior art, the advertisement traffic identification method, device and computer-readable storage medium provided by the application obtain advertisement traffic data when an advertisement is clicked, the advertisement traffic data comprising advertisement click data and advertisement image data associated with the advertisement click, and input the advertisement traffic data into the trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data. The target recognition result indicates whether the advertisement traffic data is invalid traffic data, where invalid traffic data is mis-click traffic data or fraudulent traffic data. With the advertisement traffic recognition model, the method can identify invalid traffic and low-quality valid traffic in advertising effectively and quickly, can also make real-time anti-cheating judgments on click heatmap data for different media advertisement slots and different advertisement images, and improves the accuracy of the recognition result.
Drawings
Fig. 1 is a schematic flowchart of an advertisement traffic identification method according to an embodiment of the present application;
Fig. 2 is a detailed flowchart of step S200 of the advertisement traffic identification method according to an embodiment of the present application;
Fig. 3 is a detailed flowchart of step S201 of the advertisement traffic identification method according to an embodiment of the present application;
Fig. 4 is a detailed flowchart of step S202 of the advertisement traffic identification method according to an embodiment of the present application;
Fig. 5 is a detailed flowchart of step S203 of the advertisement traffic identification method according to an embodiment of the present application;
Fig. 6 is a flowchart of a training method of an advertisement traffic recognition model according to another embodiment of the present application;
Fig. 7 is a structural block diagram of an advertisement traffic identification apparatus according to another embodiment of the present application;
Fig. 8 is a structural block diagram of a device according to another embodiment of the present application.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only and do not represent the only embodiment.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a flowchart of an advertisement traffic identification method according to an embodiment of the present application is exemplarily shown, where the advertisement traffic identification method includes the following steps:
S100, obtaining advertisement traffic data when the advertisement is clicked.
In the embodiment of the application, advertisement clicks can be divided into online clicks and offline clicks by the user. An advertisement click may be a click that issues an advertisement request, or simply a click on an advertisement image. Specifically, the user may click on the advertisement through an Internet web page or through a terminal device. The terminal device may be a communication terminal, an Internet access terminal such as a computer, or a music/video playing terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function; it may be a smart television, a set-top box or a similar device; it may also be a mobile terminal such as a mobile phone, a tablet or a palmtop computer.
The advertisement traffic data includes, but is not limited to, advertisement click data, advertisement image data, user basic data, and the like. In this embodiment, the advertisement traffic data includes advertisement click data and advertisement image data.
The advertisement click data includes, but is not limited to, advertisement click area coordinate information, coordinate information of the point where the user clicks the advertisement, advertisement slot information, the advertisement link address, the advertisement click time, the IP address of the clicking device, the media ID, and the device model. Specifically, the advertisement slot information may describe the delivery platform of the advertisement, the layout in which it appears, its specific position in that layout, its playing time, and the like. The advertisement link address may describe the link address or download address corresponding to the medium on which the advertisement is delivered.
Illustratively, the advertisement click area coordinate information is the coordinate information of the four vertices of a rectangular area within an advertisement slot. The four vertex coordinates delimit the area the user can click, and together form the advertisement click area coordinate information. For example, when a user visits a page such as a shopping website, an advertisement display area is arranged in a preset advertisement area of the site, such as the lower right corner of the web page.
The advertisement image data refers to the advertisement material image, which may describe the text, special effects, logos and other information contained in the advertisement. Several advertisement images may be delivered to one advertisement slot, and the advertisement slots of different media may deliver different advertisement material images.
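The fields listed above can be gathered into a simple record type. This is a sketch only: the class names, field names, field types and the sample values are assumptions made for illustration, and the application does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdClickData:
    """Click-side fields named in the description (names are hypothetical)."""
    click_xy: Tuple[float, float]  # coordinates of the user's click
    ad_slot_id: str                # identifies the advertisement slot
    link_address: str              # advertisement link or download address
    click_time: float              # click time, here as a Unix timestamp
    device_ip: str                 # IP address of the clicking device
    media_id: str                  # ID of the medium showing the advertisement
    device_model: str              # model of the clicking device


@dataclass(frozen=True)
class AdTrafficData:
    """One advertisement click together with the image that was clicked."""
    click: AdClickData
    image: bytes                   # raw bytes of the advertisement material image


# Example record with placeholder values.
example = AdTrafficData(
    click=AdClickData(
        click_xy=(900.0, 550.0),
        ad_slot_id="slot-bottom-right",
        link_address="https://example.com/landing",
        click_time=1_660_000_000.0,
        device_ip="203.0.113.7",
        media_id="media-001",
        device_model="phone-x",
    ),
    image=b"",
)
```

Keeping the click data and the image in one record mirrors the requirement that the image data be the one associated with the click.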
The user basic data includes user behavior data, such as uninstall information for an application recommended by an advertisement, retention information for an application in which the data-collection SDK is embedded, and recommendation behavior for content recommended by the advertisement (for example, whether the user forwards the content to a friend circle, or recommends it through an application such as a microblog), and also includes user basic information. The user basic information may be the time at which the user goes online, the IP address, the Internet access mode, the language, and the like. Illustratively, users are classified into normal users and fraudulent users according to their existing basic information. Advertisement clicks made by normal users are normal advertisement clicks, and advertisement clicks made by fraudulent users are fraudulent advertisement clicks. As another example, when the current advertisement click is detected at the advertisement request stage to be the action of a fraudulent user, no advertisement is returned.
S200, inputting the advertisement traffic data to a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data.
In the embodiment of the application, the advertisement traffic recognition model comprises an advertisement click recognition model and an advertisement image recognition model. The target recognition result is used for representing whether the advertisement traffic data belongs to invalid traffic data, wherein the invalid traffic data is mis-click traffic data or fraudulent traffic data.
Specifically, the advertisement traffic recognition model is a trained convolutional neural network model. In an embodiment, the advertisement traffic recognition model is obtained by fusing a trained advertisement click recognition model and a trained advertisement image recognition model.
The advertisement click recognition model is used to recognize traffic data generated by advertisement clicks; the advertisement image recognition model is used to recognize traffic data generated by the advertisement image data.
Further, the advertisement traffic data is divided into valid traffic data and invalid traffic data. Valid traffic data is traffic data from a normal advertisement click on a normal image. Invalid traffic data falls into three categories: the first is mis-click traffic data, the second is fraudulent-click traffic data, and the third is traffic data from advertisement clicks on a false image.
Further, in this embodiment, the traffic data from advertisement clicks on a false image includes normal clicks on a false image, mis-clicks on a false image, and fraudulent clicks on a false image. In other words, whenever the clicked image is a false image, the advertisement click is classified as invalid traffic data regardless of whether the click itself is fraudulent. It should be noted that a false image may also be referred to as a fraud image.
In one embodiment, as shown in fig. 2, step S200, inputting the advertisement traffic data into a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data, includes:
S201, inputting the advertisement click data to the advertisement click recognition model to obtain a first recognition result of the advertisement click data;
Specifically, the advertisement click recognition model judges advertisement clicks on the basis of user behavior, so advertisement clicks may be classified as normal advertisement clicks, mis-clicked advertisements, and fraudulent advertisement clicks. A mis-clicked advertisement is a click generated when the user triggers the advertisement unintentionally. In this embodiment, a click that lands outside the preset advertisement click area is determined to be a mis-click.
Further, the first recognition result represents how the advertisement click recognition model classifies the advertisement click data. In the embodiment of the application, the first recognition result is either a valid click or an invalid click: traffic data from mis-clicked advertisements or fraudulent advertisement clicks belongs to invalid clicks, and traffic data from normal advertisement clicks belongs to valid clicks.
Further, in the present embodiment, step S201 is detailed as shown in fig. 3. Inputting the advertisement click data into the advertisement click recognition model to obtain a first recognition result of the advertisement click data includes:
S2011, inputting the advertisement click data to the advertisement click recognition model, and judging whether the advertisement click coordinate information is located in a preset advertisement click area;
S2012, if the advertisement click coordinate information is not located in the preset advertisement click area, judging the advertisement click to be a mis-clicked advertisement, and judging that the first recognition result of the advertisement click data is an invalid click;
S2013, if the advertisement click coordinate information is located in the preset advertisement click area, recognizing the advertisement click data by using a first recognition rule preset in the advertisement click recognition model, judging an advertisement click that meets the first recognition rule to be a fraudulent advertisement click, and judging that the first recognition result of the advertisement click data is an invalid click.
In one embodiment, the advertisement click data includes the advertisement click coordinate information recorded when the user clicks. It should be noted that the advertisement click recognition model is pre-configured with each advertisement slot identifier, the size of each advertisement slot, and the preset advertisement click area corresponding to each advertisement slot identifier. The preset advertisement click area is preferably given as the coordinate information of the vertices of the advertisement click area.
On this basis, when an advertisement click is detected, the advertisement click coordinate information and the advertisement slot identifier for that click are obtained. The advertisement click recognition model uses the mapping between advertisement slot identifiers and preset advertisement click areas to determine the target preset advertisement click area for the click, and can then judge whether the advertisement click coordinate information is located in that area.
When the advertisement click coordinate information is not located in the target preset advertisement click area, the click is regarded as a mis-click operation, in other words a click triggered unintentionally by the user. The advertisement click is therefore determined to be a mis-clicked advertisement, and the first recognition result output by the advertisement click recognition model is an invalid click.
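The lookup and area check described above can be sketched as follows. The sketch assumes an axis-aligned rectangle given by its four vertices; the slot identifier, the coordinates and the function names are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point, Point, Point]  # four vertices of the click area

# Assumed mapping from advertisement slot identifier to preset click area.
PRESET_CLICK_AREAS: Dict[str, Rect] = {
    "slot-bottom-right": ((800, 500), (1000, 500), (1000, 600), (800, 600)),
}


def is_inside(click: Point, area: Rect) -> bool:
    """True if the click lies in the axis-aligned rectangle (borders included)."""
    xs = [vertex[0] for vertex in area]
    ys = [vertex[1] for vertex in area]
    return min(xs) <= click[0] <= max(xs) and min(ys) <= click[1] <= max(ys)


def is_mis_click(slot_id: str, click: Point) -> bool:
    """Look up the target preset area for the slot and test the click against it."""
    area = PRESET_CLICK_AREAS[slot_id]  # raises KeyError for an unknown slot
    return not is_inside(click, area)
```

A click for which `is_mis_click` returns `True` corresponds to an invalid click as the first recognition result. A click inside the area still has to pass the first recognition rule.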
Of course, in another embodiment, the first recognition result output by the advertisement click recognition model may also be the mis-click advertisement traffic data.
When the advertisement click coordinate information is located in the target preset advertisement click area, the click reflects the user's intention and was not triggered unintentionally. In this case, it still has to be determined whether the intentional click is a normal advertisement click or a fraudulent advertisement click arising from advertisement fraud or false-advertisement behavior.
In an embodiment of the application, the advertisement click recognition model is further configured with a first recognition rule. The first recognition rule is used to recognize whether the advertisement click is a normal advertisement click or a fraudulent advertisement click. In some embodiments, the first recognition rule may be a set of conditional judgment statements, such as multiple advertisement clicks in a short time being fraudulent advertisement clicks, too many changes in geographic location IP in a short time being fraudulent advertisement clicks, etc. In the embodiment of the present application, the first identification rule is that the advertisement click frequency is greater than or equal to the preset click threshold value, and is a fraudulent advertisement click, and the advertisement click frequency is less than the preset click threshold value, and is a normal advertisement click. The preset click threshold may be set according to actual requirements of different scenes, and may be 60% or 70%. Of course, the first identification rule is not limited, and may be a fraudulent advertisement click when a certain characteristic factor satisfies a condition. The special factors may be unload rate, retention rate, click-through rate, advertising revenue, etc. The condition may be that the average value, the median value, the maximum value or the minimum value corresponding to the probability value is used as a factor for determining the fraudulent advertisement click, or that the average value, the median value, the maximum value or the minimum value corresponding to the probability value is greater than a preset probability threshold or a probability area is used as a factor for determining the fraudulent advertisement click.
Further, the advertisement click data meeting the first identification rule is judged as fraudulent advertisement click, and the advertisement click data not meeting the first identification rule is judged as normal advertisement click. Based on the above, the first recognition result output by the advertisement click recognition model corresponding to the normal advertisement click is an effective click, and the first recognition result output by the advertisement click recognition model corresponding to the fraudulent advertisement click is an invalid click.
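By way of illustration only, the two-stage click judgment (area check, then first recognition rule) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `Region` and `classify_click`, the modelling of the preset click area as an axis-aligned rectangle, and the reading of the click frequency as a ratio compared against a threshold such as 0.6 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned preset advertisement click area."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def classify_click(x: float, y: float, region: Region,
                   click_frequency: float,
                   threshold: float = 0.6) -> Tuple[str, str]:
    """Return (first recognition result, click type) for one click."""
    # Stage 1: a click outside the preset area is treated as a mis-click.
    if not region.contains(x, y):
        return "invalid click", "mis-click"
    # Stage 2 (first recognition rule, as assumed here):
    # frequency >= threshold is judged to be a fraudulent click.
    if click_frequency >= threshold:
        return "invalid click", "fraudulent"
    return "valid click", "normal"
```

The threshold default of 0.6 mirrors the 60% example in the text; any other characteristic-factor rule could be substituted in stage 2.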
S202, inputting the advertisement image data into the advertisement image recognition model to obtain a second recognition result of the advertisement image data.
In the embodiment of the application, the advertisement image data refers to the advertisement material image displayed when the user clicks, in other words, the advertisement image placed in the media advertisement slot. Advertisement images are divided into fraud images and normal images. For example, images delivered by the advertiser are normal images, and images with which an illegitimate platform imitates the normal images are false images. In one embodiment, images placed by a fraudulent user can also be regarded as false images or fraud images.
The advertisement image recognition model is used to identify the advertisement image clicked by the user and to judge whether it is a fraud image, a false image, or a normal image. If the advertisement image is determined to be a fraud image or a false image, the second recognition result output by the advertisement image recognition model is an invalid image; if it is determined to be a normal image, the second recognition result is a valid image.
Further, in the present embodiment, step S202 is detailed as shown in fig. 4. Inputting the advertisement image data into the advertisement image recognition model to obtain the second recognition result of the advertisement image data includes:
S2021, inputting the advertisement image data into the advertisement image recognition model, and outputting an advertisement image feature vector of the advertisement image data;
S2022, identifying the advertisement image feature vector by using a second recognition rule in the advertisement image recognition model, and judging the advertisement image data that satisfies the second recognition rule to be a fraud image, in which case the second recognition result of the advertisement image data is an invalid image.
In an embodiment of the application, the advertisement image recognition model is configured with a second recognition rule, which is used to judge whether an advertisement image is a fraud image, a false image, or a normal image. An advertisement image that satisfies the second recognition rule is a fraud image or a false image; one that does not is a normal image. Specifically, the second recognition rule is: if the cosine similarity between the feature vector of the advertisement image clicked by the user and the feature vector of the advertisement image delivered to the advertisement slot is smaller than a preset similarity threshold, the advertisement image is judged to be a fraud image or a false image. Conversely, if the cosine similarity is greater than or equal to the preset similarity threshold, the advertisement image is judged to be a normal image. It should be noted that the second recognition rule is not limited to this and may be set according to the actual scenario; for example, an advertisement image whose cosine similarity falls within a specified range may be judged to be a fraud image or a false image.
In an embodiment, the features of the advertisement image clicked by the user are extracted as three feature vectors, one per RGB channel. Under the second recognition rule, the image is judged to be a fraud image or a false image if the cosine similarities between these three feature vectors and the three corresponding feature vectors of the associated target advertisement image are all smaller than the preset similarity threshold. That is, as long as any one of the cosine similarities is greater than or equal to the preset similarity threshold, the image is judged to be a normal image.
Specifically, the advertisement image is input into the advertisement image recognition model and separated into an R pixel image, a G pixel image, and a B pixel image; in other words, the model splits the advertisement image into an R advertisement image, a G advertisement image, and a B advertisement image. Feature conversion is then performed on each of the three to obtain an R advertisement image feature vector, a G advertisement image feature vector, and a B advertisement image feature vector. The preset target advertisement image corresponding to the advertisement click is acquired, and feature extraction is performed on it to obtain an R target advertisement image feature vector, a G target advertisement image feature vector, and a B target advertisement image feature vector. The cosine similarity formula is then used to calculate the similarity between the R advertisement image feature vector and the R target advertisement image feature vector, between the G advertisement image feature vector and the G target advertisement image feature vector, and between the B advertisement image feature vector and the B target advertisement image feature vector, and the three similarity values are checked against the second recognition rule. If the rule is satisfied, the second recognition result output by the advertisement image recognition model is an invalid image, that is, a false image or a fraud image. If not, the second recognition result is a valid image, also called a normal image.
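As a non-limiting illustration, the per-channel cosine similarity check of the RGB embodiment may be sketched as follows. The function names, the dictionary layout keyed by "R", "G", and "B", the threshold value 0.9, and the handling of a zero vector are assumptions made for this sketch; the feature extraction itself is assumed to have already produced the vectors.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def is_fraud_image(clicked: Dict[str, Sequence[float]],
                   target: Dict[str, Sequence[float]],
                   threshold: float = 0.9) -> bool:
    """Second recognition rule, RGB variant: the image is judged a fraud
    or false image only if ALL three channel similarities are below the
    threshold; one similarity at or above it yields a normal image."""
    return all(
        cosine_similarity(clicked[ch], target[ch]) < threshold
        for ch in ("R", "G", "B")
    )
```

Under this rule a single matching channel is enough for the image to count as normal, which makes the check lenient toward images that differ from the target in only one or two colour channels.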
S203, determining a target identification result of the advertisement traffic data based on the first identification result and the second identification result.
In the embodiment of the application, the target recognition result of the advertisement traffic data is determined by combining the first recognition result and the second recognition result. Basing the judgment on at least two recognition results makes the target recognition result more accurate and more robust.
Further, in the present embodiment, step S203 is detailed as shown in fig. 5. Determining the target recognition result of the advertisement traffic data based on the first recognition result and the second recognition result includes:
S2031, if the first recognition result is an invalid click or the second recognition result is an invalid image, judging that the target recognition result of the advertisement traffic data is invalid traffic data;
S2032, if the first recognition result is a valid click and the second recognition result is a valid image, judging that the target recognition result of the advertisement traffic data is valid traffic data, the valid traffic data being normal traffic data.
That is, the target recognition result of the advertisement traffic data is invalid traffic data as long as either the first recognition result or the second recognition result is invalid, where invalid covers false or fraudulent results as well as mis-clicks.
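The combination logic of steps S2031 and S2032 amounts to a logical OR over the two invalid outcomes. A minimal sketch follows; the function name `target_result` and the string labels are hypothetical and chosen only to mirror the wording of the steps.

```python
def target_result(first_result: str, second_result: str) -> str:
    """Combine the click result and the image result (S2031/S2032)."""
    valid_first = {"valid click", "invalid click"}
    valid_second = {"valid image", "invalid image"}
    if first_result not in valid_first or second_result not in valid_second:
        raise ValueError("unexpected recognition result label")
    # S2031: either result being invalid makes the traffic invalid.
    if first_result == "invalid click" or second_result == "invalid image":
        return "invalid traffic data"
    # S2032: both results valid means normal (valid) traffic.
    return "valid traffic data"
```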
In one embodiment, as shown in fig. 6, the training method of the advertisement traffic recognition model includes:
S601, obtaining a first sample training set and a second sample training set, wherein the first sample training set is sample advertisement click data and the second sample training set is three-dimensional sample advertisement image data;
S602, inputting the first sample training set into an initial advertisement click recognition model for training, and inputting the second sample training set into an initial advertisement image recognition model for training;
S603, calculating the gradient of the loss function and updating the weight parameters of each model by gradient descent until the accuracy and recall of each model reach preset thresholds, at which point training is considered to have converged, so as to obtain a trained advertisement click recognition model and a trained advertisement image recognition model;
S604, fusing the trained advertisement click recognition model with the trained advertisement image recognition model to obtain a trained advertisement traffic recognition model.
Specifically, the sample advertisement click data may be derived from open-source libraries as well as from historical advertisement click data. The sample advertisement click data includes at least advertisement click coordinate information and a label marking the reference normal click region (i.e., its coordinate information). The three-dimensional sample advertisement image data includes at least an RGB three-channel advertisement image and a label for the reference three-channel advertisement image. For example, if the size of the advertisement material image (that is, the advertisement image) is (m, n), the complete training sample has data dimension (m, n, 4): it is formed by concatenating the three-channel RGB advertisement image data (m, n, 3), which serves as the second sample training set, with the distribution of advertisement click coordinates over the advertisement image (m, n, 1), which serves as the first sample training set.
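The (m, n, 4) training sample described above may be assembled as in the following sketch, written with plain Python lists to stay self-contained. The representation of the click distribution as a per-pixel click count, the (row, column) coordinate convention, and the decision to ignore clicks outside the image are assumptions of the sketch, as are the names `click_channel` and `build_sample`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Sample = List[List[Tuple[int, int, int, int]]]


def click_channel(m: int, n: int,
                  clicks: List[Tuple[int, int]]) -> List[List[int]]:
    """Count clicks per pixel on an m-by-n advertisement image."""
    grid = [[0] * n for _ in range(m)]
    for row, col in clicks:
        if 0 <= row < m and 0 <= col < n:  # assumption: drop outside clicks
            grid[row][col] += 1
    return grid


def build_sample(rgb: List[List[Pixel]],
                 clicks: List[Tuple[int, int]]) -> Sample:
    """Concatenate (m, n, 3) RGB data with the (m, n, 1) click channel."""
    m, n = len(rgb), len(rgb[0])
    counts = click_channel(m, n, clicks)
    return [[(*rgb[i][j], counts[i][j]) for j in range(n)]
            for i in range(m)]
```

In a tensor library the same operation would typically be a concatenation along the last axis; the list version is used here only to avoid external dependencies.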
An initial advertisement click recognition model and an initial advertisement image recognition model with the same or partially the same structure are constructed. In this embodiment, the two initial models use the same loss function and the same gradient descent method. The first sample training set is input into the initial advertisement click recognition model for training, the parameters of the model are adjusted according to the computed loss function and gradient descent, and accuracy and recall are used to decide whether to stop training. Specifically, if during training the accuracy and the recall both satisfy preset conditions, for example both are greater than or equal to preset thresholds, training can be stopped, and the model at the time training stops is taken as the trained advertisement click recognition model.
Similarly, the second sample training set is input into the initial advertisement image recognition model for training, the parameters of the model are adjusted according to the computed loss function and gradient descent, and accuracy and recall are used to decide whether to stop training. Specifically, if during training the accuracy and the recall both satisfy preset conditions, for example both are greater than or equal to preset thresholds, training can be stopped, and the model at the time training stops is taken as the trained advertisement image recognition model. Finally, the trained advertisement click recognition model and the trained advertisement image recognition model are fused to obtain the advertisement traffic recognition model.
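The stopping criterion used for both models may be illustrated by the following sketch. It assumes a binary labelling in which 1 denotes the positive class (for example, invalid) and 0 the negative class, and it assumes threshold values of 0.95; neither the labelling nor the thresholds are specified in the text, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy over all samples and recall of the positive class (1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    accuracy = correct / len(pairs)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


def should_stop(y_true: Sequence[int], y_pred: Sequence[int],
                min_accuracy: float = 0.95,
                min_recall: float = 0.95) -> bool:
    """Stop training once both metrics reach their preset thresholds."""
    accuracy, recall = accuracy_and_recall(y_true, y_pred)
    return accuracy >= min_accuracy and recall >= min_recall
```

In a training loop, `should_stop` would be evaluated on held-out predictions after each epoch, and the model parameters at the first epoch where it returns true would be kept as the trained model.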
In the embodiment of the application, the advertisement traffic recognition model adopts an EfficientNet model, which is a convolutional neural network whose structure includes an input layer, convolutional layers, a fully connected layer, and an output layer. A cross-entropy loss function is applied between the fully connected layer and the output layer, and the output layer uses a classifier to output the classification result of the advertisement traffic data. The classification result comprises traffic data of normal advertisement clicks, traffic data of mis-clicked advertisements, and traffic data of fraudulent advertisement clicks. Traffic data of normal advertisement clicks is valid traffic, while traffic data of mis-clicked advertisements and of fraudulent advertisement clicks is invalid traffic.
Further, the loss function is a cross-entropy loss function, and the gradient descent is stochastic gradient descent. Specifically, the Softmax cross-entropy loss function is selected as the loss function; formula one is as follows:
Figure BDA0003816524870000121
Stochastic gradient descent (SGD) is given by formula two as follows:
Figure BDA0003816524870000122
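The two formulas are present in this text only as figure references. As a hedged illustration, the following sketch uses the standard textbook forms of the Softmax cross-entropy loss, L = -log(exp(z_y) / sum_k exp(z_k)), and of the SGD update, w := w - lr * grad; the notation and any additional terms in the applicant's formulas may differ. The function names and the learning rate value are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy loss for one sample with true class `label`."""
    return -math.log(softmax(logits)[label])


def sgd_step(weights: Sequence[float], grads: Sequence[float],
             lr: float = 0.01) -> List[float]:
    """One stochastic gradient descent update: w := w - lr * grad."""
    return [w - lr * g for w, g in zip(weights, grads)]
```

With three output classes (normal click, mis-click, fraudulent click), `logits` would hold three scores per sample and `label` the index of the true class.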
Further, in a possible embodiment, the trained advertisement traffic recognition model is deployed on a network device or a web page, so that when an automated task is created, advertisement traffic data from online clicks is judged in real time. Advertisement traffic data from offline clicks may be identified in the same way.
In summary, the advertisement traffic identification method provided by the present application includes: obtaining advertisement traffic data when an advertisement is clicked, the advertisement traffic data comprising advertisement click data and advertisement image data associated with the advertisement click; and inputting the advertisement traffic data into the trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data, the target recognition result indicating whether the advertisement traffic data is invalid traffic data, where invalid traffic data is mis-click traffic data or fraudulent traffic data. With the advertisement traffic recognition model, the method can identify invalid traffic and low-quality valid traffic effectively and quickly, can perform anti-cheating judgment in real time on click heat map data for different advertisement images in different media advertisement slots, and improves the accuracy of the recognition result.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 7, the present application further provides an advertisement traffic recognition apparatus 10, where the advertisement traffic recognition apparatus 10 includes an advertisement traffic data obtaining module 110 and an advertisement traffic recognition module 120.
As an alternative embodiment, the advertisement traffic recognition device 10 is used for real-time determination of advertisement traffic data for advertisement clicks, such as online clicks.
The advertisement traffic data acquiring module 110 is configured to acquire advertisement traffic data when an advertisement is clicked; wherein the advertisement traffic data comprises advertisement click data and advertisement graph data associated with the advertisement click;
the advertisement traffic identification module 120 is configured to input the advertisement traffic data to a trained advertisement traffic identification model to obtain a target identification result of the advertisement traffic data; the target identification result is used for representing whether the advertisement traffic data belongs to invalid traffic data, wherein the invalid traffic data is mis-click traffic data or fraud traffic data.
As an optional implementation, the advertisement traffic identification module 120 includes:
the advertisement flow first identification module 121 is configured to input the advertisement click data to the advertisement click identification model, and obtain a first identification result of the advertisement click data;
the advertisement flow second identification module 122 is configured to input the advertisement graph data to the advertisement graph identification model, so as to obtain a second identification result of the advertisement graph data;
a first determining module 123, configured to determine a target identification result of the advertisement traffic data based on the first identification result and the second identification result.
Further, as an optional implementation manner, the advertisement flow first identification module 121 includes:
a click region determination module 1211, configured to input the advertisement click data to the advertisement click recognition model, and determine whether the advertisement click coordinate information is located in a preset advertisement click region;
a click determination first module 1212, configured to determine that the advertisement click is a mis-click advertisement if the advertisement click coordinate information is not located in the preset advertisement click area, and determine that the first identification result of the advertisement click data is an invalid click;
a click determination second module 1213, configured to identify the advertisement click data according to a first identification rule preset in the advertisement click identification model if the advertisement click coordinate information is located in the preset advertisement click area, determine that an advertisement click meeting the first identification rule is a fraudulent advertisement click, and determine that a first identification result of the advertisement click data is an invalid click.
Further, as an optional implementation manner, the advertisement flow second identification module 122 includes:
the feature extraction module 1221 is configured to input the advertisement image data to the advertisement image identification model, and output an advertisement image feature vector of the advertisement image data;
the advertisement graph recognition module 1222 is configured to recognize the advertisement graph feature vector by using a second recognition rule in the advertisement graph recognition model, and determine the advertisement graph data meeting the second recognition rule as a fraud graph, where a second recognition result of the advertisement graph data is an invalid graph.
Further, as an optional implementation manner, the first determining module 123 includes:
an invalid determination module 1231, configured to determine that a target recognition result of the advertisement traffic data is invalid traffic data if the first recognition result is an invalid click or the second recognition result is an invalid graph;
an effective determination module 1232, configured to determine that the target identification result of the advertisement traffic data is effective traffic data if the first identification result is an effective click and the second identification result is an effective graph, where the effective traffic data is normal traffic data.
An embodiment of the present application further provides a machine-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the advertisement traffic identification method of any of the above embodiments.
The integrated components/modules/units of the system/computer device, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above can be realized. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
As another aspect, as shown in fig. 7, the present application also provides an apparatus 1300 including one or more Central Processing Units (CPUs) 1301 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1302 or a program loaded from a storage portion 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 also stores various programs and data necessary for the operation of the apparatus 1300. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input portion 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 1308 including a hard disk and the like; and a communication portion 1309 including a network interface card such as a LAN card, a modem, or the like. The communication portion 1309 performs communication processing via a network such as the internet. A drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1310 as necessary, so that a computer program read from it is installed into the storage portion 1308 as needed.
Of course, in another embodiment, the present application further provides a computer device comprising: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to perform the advertisement traffic identification method of any of the above embodiments via execution of the executable instructions.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the components is only one logical division, and other divisions may be realized in practice.
In addition, each functional module/component in the embodiments of the present application may be integrated into the same processing module/component, or each functional module/component may exist alone physically, or two or more functional modules/components may be integrated into the same processing module/component. The integrated modules/components can be implemented in the form of hardware, or can be implemented in the form of hardware plus software functional modules/components.
It will be evident to those skilled in the art that the embodiments of the present application are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present application can be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Several units, modules or means recited in the system, apparatus or terminal claims may also be implemented by one and the same unit, module or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (10)

1. An advertisement traffic identification method, characterized by comprising:
acquiring advertisement traffic data when an advertisement is clicked; wherein the advertisement traffic data comprises advertisement click data and advertisement graph data associated with the advertisement click;
inputting the advertisement traffic data into a trained advertisement traffic recognition model to obtain a target recognition result of the advertisement traffic data; the target identification result is used for representing whether the advertisement traffic data belongs to invalid traffic data, and the invalid traffic data is mis-click traffic data or fraud traffic data.
2. The advertisement traffic recognition method according to claim 1, wherein the advertisement traffic recognition model includes an advertisement click recognition model and an advertisement graph recognition model, and the inputting the advertisement traffic data into the trained advertisement traffic recognition model to obtain the target recognition result of the advertisement traffic data includes:
inputting the advertisement click data to the advertisement click recognition model to obtain a first recognition result of the advertisement click data;
inputting the advertisement image data into the advertisement image recognition model to obtain a second recognition result of the advertisement image data;
determining a target recognition result of the advertisement traffic data based on the first recognition result and the second recognition result.
3. The advertisement traffic identification method according to claim 2, wherein the advertisement clicks include a mis-click advertisement and a fraudulent advertisement click, the advertisement click data includes advertisement click coordinate information, and the inputting the advertisement click data to the advertisement click identification model to obtain the first identification result of the advertisement click data includes:
inputting the advertisement click data to the advertisement click recognition model, and judging whether the advertisement click coordinate information is located in a preset advertisement click area;
if the advertisement click coordinate information is not located in the preset advertisement click area, judging the advertisement click as a mistaken click advertisement, and judging that a first identification result of the advertisement click data is an invalid click;
and if the advertisement click coordinate information is located in the preset advertisement click area, recognizing the advertisement click data by using a first recognition rule preset in the advertisement click recognition model, judging the advertisement click meeting the first recognition rule as a fraudulent advertisement click, and judging a first recognition result of the advertisement click data as an invalid click.
4. The advertisement traffic identification method according to claim 2, wherein the inputting the advertisement graph data to the advertisement graph identification model to obtain a second identification result of the advertisement graph data comprises:
inputting the advertisement image data into the advertisement image identification model, and outputting an advertisement image feature vector of the advertisement image data;
and identifying the characteristic vector of the advertisement image by using a second identification rule in the advertisement image identification model, and judging the advertisement image data meeting the second identification rule as a fraud image, wherein a second identification result of the advertisement image data is an invalid image.
5. The advertisement traffic identification method according to any one of claims 2 to 4, wherein the determining a target identification result of the advertisement traffic data based on the first identification result and the second identification result includes:
if the first recognition result is an invalid click or the second recognition result is an invalid graph, judging that the target recognition result of the advertisement traffic data is invalid traffic data;
and if the first identification result is a valid click and the second identification result is a valid graph, determining that the target identification result of the advertisement traffic data is valid traffic data, and the valid traffic data is normal traffic data.
6. The advertisement traffic recognition method according to claim 1, wherein the training method of the advertisement traffic recognition model comprises:
acquiring a first sample training set and a second sample training set, wherein the first sample training set is sample advertisement click data, and the second sample training set is three-dimensional sample advertisement graph data;
inputting the first sample training set to an initial advertisement click recognition model for training, and inputting the second sample training set to an initial advertisement image recognition model for training;
calculating the gradient of a loss function, and updating the weight parameters of each model through gradient descent until the accuracy and recall rate corresponding to each model reach a preset threshold value and training converges, so as to obtain a trained advertisement click recognition model and a trained advertisement image recognition model;
and fusing the trained advertisement click recognition model and the trained advertisement image recognition model to obtain a trained advertisement traffic recognition model.
7. The advertisement traffic identification method of claim 6, wherein the loss function is a cross-entropy loss function, and the gradient descent employs stochastic gradient descent.
8. The advertisement traffic identification method of claim 1, wherein the advertisement clicks include offline clicks and online clicks.
9. An advertisement traffic recognition apparatus, characterized in that the advertisement traffic recognition apparatus comprises:
the advertisement traffic data acquisition module is used for acquiring advertisement traffic data when an advertisement is clicked; wherein the advertisement traffic data comprises advertisement click data and advertisement graph data associated with the advertisement click;
the advertisement traffic identification module is used for inputting the advertisement traffic data into a trained advertisement traffic identification model to obtain a target identification result of the advertisement traffic data; the target identification result is used for representing whether the advertisement traffic data belongs to invalid traffic data, wherein the invalid traffic data is error click traffic data or fraud traffic data.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a computer processor, performs the advertisement traffic identification method of any of claims 1-8.
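As an illustration only of the training procedure of claims 6 and 7 and the fusion rule of claim 5, the following is a minimal Python sketch. It makes several assumptions that do not come from the claims: each recognition model is replaced by a small logistic-regression stand-in, the samples are toy two-feature vectors rather than real click records or three-dimensional image data, the preset threshold is set to 0.95, and every combination other than "valid click and valid graph" is treated as invalid. All function and variable names (`train`, `predict_valid`, `fuse`, `click_x`, `image_x`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_valid(model, x):
    """Return True when the model scores the sample as valid (p >= 0.5)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


def accuracy_and_recall(model, samples, labels):
    """Accuracy and recall of the 'valid' class on a labelled set."""
    preds = [predict_valid(model, x) for x in samples]
    correct = sum(1 for p, y in zip(preds, labels) if p == bool(y))
    true_pos = sum(1 for p, y in zip(preds, labels) if p and y == 1)
    positives = sum(labels)
    recall = true_pos / positives if positives else 0.0
    return correct / len(labels), recall


def train(samples, labels, threshold=0.95, lr=0.5, max_epochs=500, seed=0):
    """Stochastic gradient descent on the binary cross-entropy loss.

    Training stops once accuracy and recall both reach the threshold
    (an assumed value), or after max_epochs as a safety limit.
    """
    rng = random.Random(seed)
    weights, bias = [0.0] * len(samples[0]), 0.0
    order = list(range(len(samples)))
    for _ in range(max_epochs):
        rng.shuffle(order)  # one sample per update, in random order
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = p - y  # gradient of cross-entropy w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
        acc, rec = accuracy_and_recall((weights, bias), samples, labels)
        if acc >= threshold and rec >= threshold:
            break
    return weights, bias


def fuse(click_is_valid, image_is_valid):
    """Fusion rule: valid traffic only if both results are valid.

    Treating all other combinations as invalid is an assumption here.
    """
    return "valid" if click_is_valid and image_is_valid else "invalid"


# Toy, hand-made samples (label 1 = valid, 0 = invalid); purely illustrative.
click_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
image_x = [[0.7, 0.3], [0.9, 0.2], [0.2, 0.7], [0.3, 0.9]]
y = [1, 1, 0, 0]

click_model = train(click_x, y)
image_model = train(image_x, y)
print(fuse(predict_valid(click_model, click_x[0]),
           predict_valid(image_model, image_x[0])))
```

In this sketch the two models are trained independently and combined only at decision time, which is one simple reading of "fusing"; the claims do not specify how the fusion is implemented.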
CN202211028480.3A 2022-08-25 2022-08-25 Advertisement traffic identification method and device and computer readable storage medium Pending CN115311022A (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202211028480.3A CN115311022A (en) 2022-08-25 2022-08-25 Advertisement traffic identification method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
CN202211028480.3A CN115311022A (en) 2022-08-25 2022-08-25 Advertisement traffic identification method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115311022A (en) 2022-11-08

Family

ID=83864083

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202211028480.3A Pending CN115311022A (en) 2022-08-25 2022-08-25 Advertisement traffic identification method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115311022A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051185A (en) * 2023-04-03 2023-05-02 深圳媒介之家文化传播有限公司 Advertisement position data abnormality detection and screening method

Similar Documents

Publication Publication Date Title
US8732015B1 (en) Social media pricing engine
US20090112715A1 (en) Engine, system and method for generation of brand affinity content
CN111522724B (en) Method and device for determining abnormal account number, server and storage medium
CN111932268A (en) Enterprise risk identification method and device
CN110659961A (en) Method and device for identifying off-line commercial tenant
CN109325845A (en) A kind of financial product intelligent recommendation method and system
CN109102324B (en) Model training method, and red packet material laying prediction method and device based on model
CN111178146A (en) Method and device for identifying anchor based on face features
CN112529575A (en) Risk early warning method, equipment, storage medium and device
CN115311022A (en) Advertisement traffic identification method and device and computer readable storage medium
JP7015927B2 (en) Learning model application system, learning model application method, and program
WO2020202327A1 (en) Learning system, learning method, and program
CN115730125A (en) Object identification method and device, computer equipment and storage medium
CN111402027B (en) Identity recognition method, commodity loan auditing method, device and terminal equipment
CN109711984B (en) Pre-loan risk monitoring method and device based on collection urging
CN114817518B (en) License handling method, system and medium based on big data archive identification
CN110717817A (en) Pre-loan approval method and device, electronic equipment and computer-readable storage medium
CN114842198A (en) Intelligent loss assessment method, device and equipment for vehicle and storage medium
CN112583860B (en) Method, device and equipment for detecting abnormal internet traffic
CN113256401A (en) Method, device, server and storage medium for intercepting user outside pre-loan domain
KR102285964B1 (en) Preference evaluation method and system
CN107818483B (en) Network card and ticket recommendation method and system
CN112434136B (en) Sex classification method, apparatus, electronic device and computer storage medium
CN112529623B (en) Malicious user identification method, device and equipment
CN115115843B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination