WO2017206811A1 - Information processing method, server, and non-volatile storage medium - Google Patents

Information processing method, server, and non-volatile storage medium

Info

Publication number
WO2017206811A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
information display
media information
abnormal
display position
Prior art date
Application number
PCT/CN2017/086126
Other languages
English (en)
French (fr)
Inventor
李东豫
彭作杰
刘杰
王春辉
孙宇
李益群
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP17805756.8A priority Critical patent/EP3471044A4/en
Priority to JP2018527752A priority patent/JP6628376B2/ja
Publication of WO2017206811A1 publication Critical patent/WO2017206811A1/zh
Priority to US15/989,997 priority patent/US11373205B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0245 Surveys
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0246 Traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0273 Determination of fees for advertising

Definitions

  • the present invention relates to information processing technologies, and in particular, to an information processing method, a server, and a nonvolatile storage medium.
  • Current cheating techniques mainly include the following. The first is to forge terminal information by technical means, such as the International Mobile Equipment Identity (IMEI) or the Android ID on an Android system, or the advertising identifier (IDFA) or the medium access control (MAC) address on an iOS system.
  • IMEI International Mobile Equipment Identity
  • IDFA Advertising identifier
  • MAC medium access control
  • a mobile terminal can be identified as a plurality of terminals by forged terminal information.
  • the second is to obtain almost all Internet Protocol (IP) resources through technical means.
  • IP Internet Protocol
  • The third is to simulate user click behavior through simulated-click technology.
  • The embodiments of the present invention are intended to provide an information processing method, a server, and a non-volatile storage medium, so as to solve the problem that, in the prior art, cheating users cannot be accurately detected and identified in the face of the cheating techniques used against media information display.
  • An embodiment of the present invention provides an information processing method, where the method includes:
  • An embodiment of the present invention further provides a server, where the server includes: a data obtaining unit, a data analysis unit, and a determining unit;
  • the data obtaining unit is configured to obtain first log information in a first time period;
  • the data analysis unit is configured to obtain, according to the first log information, terminal information of a terminal that has a click behavior on a media information display position; determine, according to the terminal information, region information corresponding to the terminal, where the region information is used to indicate the region where the terminal is located; and determine, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold;
  • the determining unit is configured to obtain, according to the determination result obtained by the data analysis unit, first terminal information for which the corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal.
  • the embodiment of the present invention further provides a non-volatile storage medium, which stores program instructions, and when the processor executes the stored program instructions, the information processing method is performed, and the information processing method includes:
  • The information processing method, the server, and the non-volatile storage medium provided by the embodiments of the present invention include: obtaining first log information in a first time period; obtaining, based on the first log information, terminal information of a terminal that has a click behavior on a media information display position; determining, based on the terminal information, region information corresponding to the terminal, where the region information is used to indicate the region where the terminal is located; determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than the first threshold; and obtaining, based on the determination result, first terminal information for which the corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal.
  • In this way, the terminal information of terminals that have a click behavior on a media information display position and the corresponding region information are analyzed, and a terminal whose number of regions is greater than the first threshold is determined to be an abnormal terminal. This effectively solves the problem that, in the prior art, cheating users cannot be accurately detected and identified in the face of the cheating techniques used against media information display, and greatly improves the accuracy of the click amount of media information display positions.
  • FIG. 1 is a schematic diagram of hardware entities of each party performing information interaction in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a first process of an information processing method according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a second process of an information processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a normal click amount curve of a media information display position in an embodiment of the present invention.
  • FIG. 5a and FIG. 5b are schematic diagrams showing abnormal click amounts of media information display positions in an embodiment of the present invention.
  • FIG. 6 is a scatter diagram of a day-to-night ratio distribution of a media information display bit in an embodiment of the present invention
  • FIG. 7a to FIG. 7c are respectively schematic diagrams showing a click position distribution of a media information display position
  • FIG. 8 is a schematic diagram showing the relationship between the proportion of abnormal terminals and the number of advertisement slots in the embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a processing procedure of an information processing method according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a processing procedure in an application scenario of an information processing method according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of a first component structure of a server according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a second component structure of a server according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of a modularization of a server according to an embodiment of the present invention.
  • FIG. 14 is a diagram showing an example of a server as a hardware entity according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of hardware entities of various parties performing information interaction according to an embodiment of the present invention.
  • FIG. 1 includes: a server 11...1n and a terminal device 21-24.
  • the terminal device 21-24 performs information exchange with the server through a wired network or a wireless network.
  • the terminal device may include a mobile phone, a desktop computer, a PC, an all-in-one, and the like.
  • the server can interact with the first type of terminal over the network.
  • the first type of terminal may be, for example, a terminal where the advertiser is located, or an object that provides promotion of the creative and content.
  • the terminal devices 21-24 may be referred to as a second type of terminal relative to the first type of terminal.
  • the second type of terminal may be, for example, a terminal where an ordinary user is located, or an object that is displayed or exposed by an advertisement, and the second type of terminal may be a user who watches a video through a video application, a user who uses a social application, or the like.
  • all the applications installed in the terminal device or specified applications can add advertisements to show the user more recommendation information.
  • the server includes two types of servers; wherein the first type of server is used to provide traffic for media information delivery, which may be referred to as a traffic party in this embodiment.
  • the first type of terminal (such as the terminal where the advertiser is located) needs to purchase traffic from the first type of server (such as the traffic party) to deliver media information through the purchased traffic.
  • the second type of server is used to detect the above behaviors to prevent the traffic party from increasing the amount of clicks on the media information by cheating, thereby damaging the interests of the advertiser.
  • the information processing method of this embodiment is applied to the above-described second type of server or server cluster.
  • the embodiment of the invention provides an information processing method.
  • FIG. 2 is a first schematic flowchart of an information processing method according to an embodiment of the present invention; as shown in FIG. 2, the method includes:
  • Step 101: Obtain first log information in the first time period.
  • Step 102: Obtain, based on the first log information, terminal information of a terminal that has a click behavior on the media information display position.
  • Step 103: Determine, according to the terminal information, region information corresponding to the terminal, where the region information is used to indicate the region where the terminal is located.
  • Step 104: Determine, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold.
  • Step 105: Obtain, according to the determination result, first terminal information for which the corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal.
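Steps 101 to 105 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the log has already been reduced to (terminal identifier, region) pairs for one short window, that the IP-to-region lookup has happened upstream, and that the function name, the sample identifiers, and the default threshold of 2 are hypothetical choices.

```python
from collections import defaultdict


def find_abnormal_terminals(click_log, first_threshold=2):
    """Return the terminal IDs seen in more than `first_threshold` regions.

    click_log: iterable of (terminal_id, region) pairs taken from one short
    window of click logs (for example, one hour). The region is assumed to
    have been resolved from the IP address before this function is called.
    """
    regions_by_terminal = defaultdict(set)
    for terminal_id, region in click_log:
        regions_by_terminal[terminal_id].add(region)

    # A terminal is flagged when its count of distinct regions exceeds the threshold.
    return {
        terminal_id
        for terminal_id, regions in regions_by_terminal.items()
        if len(regions) > first_threshold
    }


# Hypothetical one-hour sample: only imei-C appears in three regions.
hourly_log = [
    ("imei-A", "Shenzhen"), ("imei-A", "Shenzhen"),
    ("imei-B", "Beijing"), ("imei-B", "Shanghai"),
    ("imei-C", "Beijing"), ("imei-C", "Chengdu"), ("imei-C", "Wuhan"),
]
abnormal = find_abnormal_terminals(hourly_log)
```

Using a set per terminal keeps repeated clicks from the same region from inflating the count, so only distinct regions are compared with the threshold.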
  • the information processing method of the embodiment of the present invention is applied to a media information delivery system.
  • the media information content is specifically, for example, advertisement content or the like.
  • the information processing method of this embodiment is mainly directed to the cheating technical means for forging users in the prior art.
  • a forged user is implemented mainly by transforming an IP address and transforming terminal information.
  • The inventor found that, taking the IMEI as an example of terminal information, the values are dispersed separately in the IMEI dimension and in the IP address dimension.
  • An IP address corresponds to the region where the terminal is located (the region can be resolved down to the city level, for example). Under normal circumstances, within a short period of time (for example, one hour), a normal terminal stays in one region, and a terminal appearing in multiple regions is a small-probability event. Therefore, if the data shows that some IMEIs correspond to multiple regions within a short period of time, the terminals corresponding to these IMEIs may be forged terminals.
  • Table 1 shows the geographical distribution of all IMEIs in the media information delivery platform within one hour. The statistics show that about 97% of IMEIs appear in only one region, about 2% appear in two regions, and less than 1% appear in more than two regions within one hour. The most widely spread IMEI appeared in 261 different regions (not shown in the table).
  • The probability that an IMEI appears in three or more regions within one hour is very small, about 0.22%. Therefore, when terminal information appears in three or more regions within a preset time range (for example, one hour), the terminal information can be determined to have been forged by combining forged terminal information with changing IP addresses, and the corresponding terminal is an abnormal terminal.
  • the first threshold may be two. Based on this, in the embodiment, the terminal information in the media information pushing platform and the corresponding regional information are obtained, and the regional information may be specifically represented by a city-level regional range.
  • the first terminal corresponding to the first terminal information may be an abnormal terminal.
  • the first log information is log information in a short time range.
  • The first log information may be hour-level log information, such as one hour of log information; it may also be minute-level log information, and so on.
  • the first log information includes all the information obtained by the media information pushing platform, including the click behavior of the media information display position, the terminal information, the user information, and the location information of the terminal.
  • In this way, the terminal information of terminals that have a click behavior on a media information display position and the corresponding region information are analyzed, and a terminal whose number of regions is greater than the first threshold is determined to be an abnormal terminal. On the one hand, this effectively solves the problem that, in the prior art, cheating users cannot be detected and identified in the face of the cheating techniques used against media information display, and greatly improves the accuracy of the traffic volume of media information display positions; on the other hand, it also protects the interests of media information distributors (such as advertisers).
  • the embodiment of the invention also provides an information processing method. Based on the information processing scheme of the first embodiment, based on the identified abnormal terminal and the second log information with a long time range, the information processing scheme of the embodiment is mainly used for identifying the abnormal media information display position.
  • FIG. 3 is a schematic diagram of a second flow of an information processing method according to an embodiment of the present invention. As shown in FIG. 3, after the step 105 of the first embodiment, the information processing method includes:
  • Step 106: Count, for a first media information display position, the number of abnormal terminals and the total number of terminals; where the first media information display position is any one of the media information display positions corresponding to the abnormal terminals.
  • Step 107: Calculate a first ratio of the number of abnormal terminals to the total number of terminals.
  • Step 108: When the first ratio is greater than the second threshold, mark the first media information display position as a suspected abnormal media information display position, and further perform step 208.
  • Step 201: Obtain second log information in the second time period, where the time range of the second time period is greater than the time range of the first time period.
  • Step 202: Obtain, according to the second log information, a first click amount of the second media information display position in the first preset time range and a second click amount in the second preset time range, where the first preset time range is different from the second preset time range.
  • The first preset time range may be used to represent the daytime, and the second preset time range may be used to represent the night.
  • Step 203: Calculate a second ratio of the first click amount to the second click amount.
  • Step 204: When the second ratio is less than the second threshold, determine that the second media information display position is a suspected abnormal media information display position, and further perform step 208.
  • Step 205: Obtain click position information of the third media information display position based on the second log information.
  • Step 206: Calculate a first parameter according to the click position information, where the first parameter represents the distribution of click positions of the third media information display position.
  • Step 207: When the first parameter is not in the preset threshold range, determine that the third media information display position is a suspected abnormal media information display position, and further perform step 208.
  • Step 208: Calculate, according to the first ratio, the second ratio, the first parameter, and the corresponding weight values, a second parameter corresponding to the suspected abnormal media information display position.
  • Step 209: When the second parameter is greater than the third threshold, determine that the suspected abnormal media information display position is an abnormal media information display position.
  • Analysis shows that the ultimate goal of a cheating traffic party is to obtain higher revenue from the media information delivery party (such as an advertiser). Most such traffic parties therefore inflate the click amount of media information as a way to maximize their revenue. As a result, click behavior produced by cheating techniques looks different from normal click behavior on media information. The first difference is that, to maximize revenue, the traffic party clicks on the media information day and night, which is contrary to normal behavior.
  • FIG. 4 is a schematic diagram of a normal click amount curve of a media information display position in an embodiment of the present invention. As shown in FIG. 4, the number of clicks on the media information display position is low between 2 am and 6 am; from 8 am it gradually increases to the highest level of the day, and it drops again at around 11 pm. It can be seen that the click behavior on the media information display position is related to the user's daily schedule: the probability of a click while the user is awake during the day is far greater than the probability of a click while the user is asleep at night.
  • If the click behavior on a media information display position does not show the day/night difference shown in FIG. 4, and clicks are instead generated regardless of the time of day, this is also considered a small-probability event from a statistical point of view.
  • FIG. 5a and FIG. 5b are respectively schematic diagrams of abnormal click amounts of media information display positions in an embodiment of the present invention. The curves in FIG. 5a and FIG. 5b follow a different pattern from the normal click amount curve shown in FIG. 4. As shown in FIG. 5a, although the click amount of the media information display position fluctuates up and down, there is no day/night difference, and the click amount can be considered evenly distributed over the day. As shown in FIG. 5b, the click amount of the media information display position is higher from 0:00 to 7:00 am and gradually decreases after 7:00 am until about 11:00 am, when it is close to its lowest level. This is completely different from the normal click behavior shown in FIG. 4. Therefore, it can be considered that the click behavior on these media information display positions has been artificially manipulated and is suspected of cheating.
  • Time periods may be chosen to represent day and night: for example, "night" (the period when users are asleep) can be defined as the period from 0:00 to 8:00, and "day" (the period when users are awake) can be defined as the period from 8:00 to 24:00.
  • "day" is the first preset time range
  • the first preset time range represents the time period in which the user is awake
  • "night" is the second preset time range
  • the second preset time range represents the time period in which the user is asleep.
  • The first click amount of the second media information display position in the first preset time range and the second click amount in the second preset time range are counted, where the second media information display position is any media information display position in the media information push platform.
  • the second threshold is a value less than or equal to 1, and the second ratio is smaller than the second threshold, indicating that the "day" click amount of the second media information display position is smaller than the "night” click amount.
  • The portion with an abscissa less than 1 corresponds to display positions that are more active at night than during the day, and the ordinate indicates the number of advertisement slots. It can be seen that some advertisement slots are more active at night; apart from a few special applications, a large part of these are cheating advertisement slots.
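The day/night check of steps 202 to 204 can be sketched as below. This is an assumption-laden illustration: the 24-element hourly list, the function names, the 0:00 to 8:00 night window, and the threshold of 1.0 are hypothetical choices consistent with the example ranges given in the text, and raw click counts (not per-hour averages) are compared.

```python
def day_night_ratio(hourly_clicks, night_hours=range(0, 8)):
    """Return day clicks divided by night clicks.

    hourly_clicks: list of 24 click counts indexed by hour of day.
    night_hours: hours treated as "night" (assumed 0:00 to 8:00 here).
    """
    night = sum(hourly_clicks[h] for h in night_hours)
    day = sum(hourly_clicks) - night
    if night == 0:
        # No night clicks at all: the ratio is unbounded, never suspicious.
        return float("inf")
    return day / night


def is_suspect_by_day_night(hourly_clicks, threshold=1.0):
    """Flag a slot whose day/night ratio falls below the (assumed) threshold."""
    return day_night_ratio(hourly_clicks) < threshold


# Hypothetical profiles: a normal slot is quiet at night, a suspect one is not.
normal_slot = [5] * 8 + [50] * 16
night_heavy_slot = [60] * 8 + [10] * 16
```

With these windows a perfectly flat hourly profile yields a ratio of 2, because the day window is twice as long as the night window, so a threshold at or below 1 only catches slots that are busier at night in absolute terms.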
  • The second difference is that the position at which each user clicks on a media information display position can be reported through the Software Development Kit (SDK), and the server counts the distribution of click coordinates for each media information display position. Statistical analysis shows that the click position distribution of a media information display position clicked by cheating means differs from that of a normal media information display position.
  • SDK Software Development Kit
  • FIG. 7a to 7c are schematic diagrams showing the distribution of click positions of media information display positions, respectively.
  • FIG. 7a is a schematic diagram showing a click position distribution of a normal media information display position.
  • a click position of a normal media information display position has a certain hot spot distribution according to the style and content of the media information. For example, the coordinates of the media information display bits in some areas are scattered in a dotted manner; while the media information display positions in some areas are more concerned by the users, and the coordinates are distributed in a centralized manner.
  • The click position distribution of an abnormal media information display position can be as shown in FIG. 7b and FIG. 7c.
  • Because clicks produced by cheating come from a fixed programmatic pattern, the click positions exhibit a certain regularity.
  • The click position distribution of an abnormal media information display position tends to be either scattered or concentrated, as shown in FIG. 7b and FIG. 7c respectively.
  • FIG. 7b shows the scatter-like click position distribution.
  • FIG. 7c shows a concentrated click position distribution.
  • The thickness and the solid or dashed style of the lines in FIG. 7c indicate different click amounts; for example, a thin solid line indicates a first click amount, a thick solid line indicates a second click amount, and a dashed line indicates a third click amount.
  • A first parameter is calculated from the click position information of the third media information display position, where the first parameter represents the distribution of click positions of the third media information display position; when the first parameter does not meet a preset condition, the third media information display position is determined to be a suspected abnormal media information display position.
  • The first parameter may be represented by an entropy value; that is, an entropy algorithm is used to identify abnormal media information display positions.
  • When the click position distribution of an abnormal media information display position is scatter-like, as shown in FIG. 7b, the coordinate points are also uniformly distributed in the horizontal direction and in the vertical direction. Therefore, if the distribution in the horizontal or vertical direction can be recognized as uniform, the click position distribution can be recognized as scattered, that is, the click positions of the media information display position are abnormal.
  • For a given horizontal and vertical range, the uniform distribution maximizes the entropy value. Entropy is used to describe the degree of uniformity of the click distribution. Taking the horizontal direction as an example, the calculation method is as follows:
  • x represents the abscissa of the click position
  • p(x) represents the probability when the abscissa of the click position is x
  • H(x) represents the entropy value of the click position of the media information display bit.
  • x and y represent the abscissa and ordinate of the click position, respectively;
  • p(x) represents the probability when the abscissa of the click position is x;
  • p(x, y) represents the probability that the abscissa of the click position is x and the ordinate is y;
  • X) represents the entropy value of the click position of the media information display bit.
  • The first parameter may specifically be an entropy value, and a corresponding preset threshold range is set separately for each type of media information display position (such as a banner or an interstitial). When the calculated first parameter (for example, the entropy value) of a media information display position is not within the preset threshold range, the media information display position is determined to be a suspected abnormal media information display position.
  • An entropy value of about 8 bits can, under certain circumstances, be taken to represent a uniform distribution, which may indicate cheating.
  • the placard advertisement bit can be determined as a suspected abnormal media information display position.
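The entropy check can be sketched as follows. The formula itself is not reproduced in the text above, so this sketch assumes the standard Shannon entropy in bits, H(x) = -Σ p(x) log2 p(x), estimated from observed click abscissas; the function names and the threshold bounds are hypothetical.

```python
import math
from collections import Counter


def coordinate_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    values: non-empty list of click coordinates along one axis, for example
    the abscissa of every click on one display position.
    """
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_suspect_by_entropy(values, lower, upper):
    """Flag a slot whose entropy lies outside the assumed range [lower, upper]."""
    entropy = coordinate_entropy(values)
    return not (lower <= entropy <= upper)


# Hypothetical samples: clicks spread evenly over 256 abscissas, and clicks
# that always land on the same abscissa.
scattered_clicks = list(range(256))
concentrated_clicks = [120] * 256
```

A uniform spread over 256 distinct values gives exactly 8 bits, which matches the order of magnitude mentioned in the text, while fully concentrated clicks give 0 bits; a range check with both bounds therefore catches both the scattered and the concentrated pattern.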
  • the third difference is that, for the abnormal terminal identified in the first embodiment, assuming that one advertisement bit has 100 clicks per day, and 10% of the clicks are from such IMEI, then the following can be adopted.
  • P is a value infinitely close to 0, indicating that this situation is a very-small-probability event. If the traffic of an advertisement slot contains a large number of such IMEIs, the slot is very likely to be a cheating advertisement slot.
  • the distribution in Figure 8 also illustrates this. In the normal ad slot, the proportion of abnormal terminals is very small.
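The claim that this probability is close to 0 can be illustrated numerically. The source formula is not reproduced in the text, so the sketch below assumes a simple binomial model in which each of the 100 clicks independently comes from a multi-region IMEI with probability 0.22% (the rate quoted earlier); the function name and the independence assumption are not from the source.

```python
import math


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumed model: 100 clicks per day, each independently from a multi-region
# IMEI with probability 0.0022; probability that at least 10 of them are.
p_value = binomial_tail(n=100, k=10, p=0.0022)
```

Under these assumptions the tail probability is far below one in a trillion, which is why a slot with 10% of its clicks from such IMEIs is hard to explain as normal traffic.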
  • the terminal is an abnormal terminal.
  • the number of abnormal terminals corresponding to the first media information display position and the total number of terminals in the media information display position corresponding to the abnormal terminal are counted;
  • the first media information display position is a media information display bit in the media information display position corresponding to the abnormal terminal; calculating a first ratio of the number of abnormal terminals to the total number of terminals; and when the first ratio is greater than the second threshold, the first media information The display bit is marked as a suspected abnormal media information display bit.
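The first ratio can be sketched as below, assuming the set of abnormal terminals has already been identified and that the terminals that clicked on one display position are available as a list of identifiers. The function names, sample identifiers, and the threshold value of 0.1 are hypothetical.

```python
def abnormal_terminal_ratio(slot_terminals, abnormal_terminals):
    """Share of distinct terminals on one display position that are abnormal.

    slot_terminals: terminal IDs that clicked on the display position.
    abnormal_terminals: set of terminal IDs already flagged as abnormal.
    """
    terminals = set(slot_terminals)
    if not terminals:
        return 0.0
    return len(terminals & abnormal_terminals) / len(terminals)


def is_suspect_by_terminals(slot_terminals, abnormal_terminals, threshold=0.1):
    """Flag the slot when the ratio exceeds the (assumed) threshold."""
    return abnormal_terminal_ratio(slot_terminals, abnormal_terminals) > threshold


# Hypothetical data: one of the four distinct terminals on the slot is abnormal.
flagged = {"imei-C", "imei-X"}
slot_clicks = ["imei-A", "imei-B", "imei-C", "imei-C", "imei-D"]
first_ratio = abnormal_terminal_ratio(slot_clicks, flagged)
```

Counting distinct terminals rather than clicks follows the wording of the text, which compares the number of abnormal terminals with the total number of terminals.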
  • For a suspected abnormal media information display position marked in any of the above three manners, a second parameter is calculated from the second ratio obtained from the day and night click behavior, the first parameter obtained from the click position distribution, and the first ratio obtained from the abnormal terminals (that is, the user dimension), together with preset weight values; the weight value corresponding to the first parameter may be relatively large. Each parameter is multiplied by its corresponding weight value and the results are summed to obtain the second parameter. The second parameter is then compared with a third threshold; when the second parameter is greater than the third threshold, the suspected abnormal media information display position is determined to be an abnormal media information display position.
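The weighted combination can be sketched as a plain weighted sum, which is what the text describes. The weight values, the threshold, and the function names below are hypothetical, and the text does not say whether the three signals are normalized before weighting, so the raw values are used here.

```python
def second_parameter(first_ratio, second_ratio, first_parameter, weights):
    """Weighted sum of the three signals for one suspected display position.

    weights: dict with keys "first_ratio", "second_ratio", "first_parameter".
    """
    return (
        weights["first_ratio"] * first_ratio
        + weights["second_ratio"] * second_ratio
        + weights["first_parameter"] * first_parameter
    )


def is_abnormal_slot(score, third_threshold):
    """A suspected slot is confirmed when its score exceeds the third threshold."""
    return score > third_threshold


# Hypothetical weights; the entropy-based first parameter gets the largest one.
example_weights = {"first_ratio": 0.3, "second_ratio": 0.2, "first_parameter": 0.5}
score = second_parameter(
    first_ratio=0.4, second_ratio=0.5, first_parameter=8.0, weights=example_weights
)
```

With these sample values the score is 0.3 × 0.4 + 0.2 × 0.5 + 0.5 × 8.0 = 4.22, so the slot is confirmed as abnormal for any third threshold below that value.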
  • FIG. 9 is a schematic diagram of a processing procedure of an information processing method according to an embodiment of the present invention.
  • The log information obtained from the log system includes an hour-level log and a day-level log.
  • The hour-level log may be, for example, the first log information described in this embodiment.
  • The day-level log may be, for example, the second log information described in this embodiment.
  • the method further includes: The click behavior of the media information display bit corresponding to the abnormal terminal is invalid.
  • the cheating media information display position (such as an advertisement space) is identified based on the sky-level log, and the abnormal media information display position is determined in the manner described in Embodiment 2. Further, the determining the first suspected abnormal media After the information display position is the abnormal media information display position, the method further includes: marking the click behavior for the abnormal media information display position as invalid.
  • the terminal information having the clicked behavior of the media information display position and the corresponding regional information are analyzed, and the terminal whose number of regions is greater than the first threshold is determined as an abnormal terminal;
  • the abnormal media information display position is identified. Therefore, the problem that the cheating technical means of the media information display in the prior art can effectively detect the problem of identifying the cheating user is effectively solved, and the hit amount of the media information display position that cannot be accurately obtained in the prior art is also solved, which is greatly improved.
  • the accuracy of the traffic volume of the media information display while also protecting the interests of media information providers (such as advertisers).
  • FIG. 10 is a schematic diagram of a processing procedure in an application scenario of an information processing method according to an embodiment of the present invention.
  • the information processing solution in the application scenario includes two parts: the identification and penalty process for abnormal users, which specifically includes steps 41 to 43; and the identification and penalty process for abnormal advertisement slots, which includes steps 51 to 57 together with steps 44 and 45 leading into step 54.
  • the identification of the abnormal user and the penalty process specifically include:
  • Step 41: Obtain hour-level log information.
  • the server can set a one-hour timer; each time the timer expires, the IMEIs and the corresponding address information recorded in the log system within the past hour are obtained.
  • Step 42: Every hour, calculate the abnormal IMEIs that appear in multiple regions.
  • after the hour-level log information is obtained, the regions in which each IMEI appears within the current one-hour range may be counted; when the number of regions in which an IMEI appears within the one-hour range reaches the preset threshold, the IMEI is determined to be an abnormal IMEI.
  • the preset threshold may be 3.
  • Step 43: Push the abnormal users to online penalty.
  • the terminal corresponding to an abnormal IMEI is determined to be an abnormal terminal; equivalently, the user corresponding to the abnormal IMEI may be regarded as an abnormal user.
  • pushing the abnormal users to online penalty specifically includes: marking the click behavior corresponding to the abnormal terminal as invalid.
  • the identification and penalty process for abnormal advertisement slots specifically includes three parts: the first part determines suspected abnormal advertisement slots from the proportion of abnormal users among the users clicking an advertisement slot; the second part determines suspected abnormal advertisement slots from the day/night click ratio of the advertisement slot; the third part determines suspected abnormal advertisement slots from the coordinate distribution of the clicks on the advertisement slot. The suspected abnormal advertisement slots determined in these three manners are then combined to determine the final abnormal advertisement slots. Specifically:
  • Step 44: Aggregate the abnormal IMEIs by advertisement slot, and count, among the clicks on each advertisement slot, the number of abnormal users and the total number of users.
  • Step 45: Determine whether the proportion of abnormal users among all users exceeds a preset threshold; when it does, the advertisement slot is determined to be a suspected abnormal advertisement slot and is pushed to step 54 for the determination of abnormal advertisement slots.
  • according to the analysis in Embodiment 2, the proportion of abnormal users among the clicks on a normal advertisement slot is small; if the proportion of abnormal users among the clicks on an advertisement slot is relatively large, the advertisement slot may be determined to be a suspected abnormal advertisement slot.
  • Step 51: Obtain the day-level log.
  • the server can obtain the log information of the previous day at a fixed time every day.
  • Step 52: Calculate the day/night click ratio of each advertisement slot based on the obtained day-level log.
  • Step 53: Determine whether the day/night click ratio exceeds a preset threshold.
  • when the day/night click ratio exceeds the preset threshold, the advertisement slot may be determined to be a suspected abnormal advertisement slot, and it is pushed to step 54 for the determination of abnormal advertisement slots.
  • Step 55: Based on the obtained day-level log, compile the distribution of click coordinates for each advertisement slot.
  • Step 56: Calculate a parameter characterizing the degree of concentration or dispersion of the click coordinates, and then proceed to step 54.
  • the parameter characterizing the degree of concentration or dispersion of the click coordinates may be determined by calculating the entropy of the click coordinates of the advertisement slot.
  • Step 53: Determine whether the parameter is greater than a preset threshold.
  • when the parameter is greater than the preset threshold, the advertisement slot may be determined to be a suspected abnormal advertisement slot, and it is pushed to step 54 for the determination of abnormal advertisement slots.
  • when step 53 is performed here, the preset threshold used for the comparison is different from the preset threshold compared with the day/night click ratio.
  • Step 54: Determine the abnormal advertisement slots from the suspected abnormal advertisement slots determined in the above three manners.
  • weight values may be pre-configured for the three manners of determining suspected abnormal advertisement slots.
  • the suspected abnormal advertisement slots determined in the above three manners correspond to three parameters: a second ratio representing the day/night click volume, obtained from day and night click behavior; a first parameter representing the degree of concentration or dispersion of the click coordinates, obtained from the click position distribution; and a first ratio representing abnormal users relative to all users, obtained from abnormal terminals (the user dimension).
  • each parameter is multiplied by its corresponding weight value and the results are summed; if the final result is greater than a preset threshold, the suspected abnormal advertisement slot is determined to be an abnormal advertisement slot.
  • Step 57: Push the abnormal advertisement slots to online penalty.
  • pushing the abnormal advertisement slots to online penalty specifically includes: marking the click behavior on the abnormal advertisement slot as invalid.
  • FIG. 11 is a schematic diagram of a first component structure of a server according to an embodiment of the present invention.
  • the server includes a data acquisition unit 31, a data analysis unit 32, and a determining unit 33.
  • the data acquisition unit 31 is configured to obtain first log information within a first time period;
  • the data analysis unit 32 is configured to: obtain, based on the first log information, terminal information of terminals that have click behavior on a media information display position; determine, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located; and determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold;
  • the determining unit 33 is configured to obtain, based on the determination result from the data analysis unit 32, first terminal information whose corresponding number of regions is greater than the first threshold, and to determine that the terminal corresponding to the first terminal information is an abnormal terminal.
  • the first log information is log information within a relatively short time range.
  • as one implementation, the first log information may be hour-level log information, such as one hour of log information; it is not limited to hour-level log information and may also be minute-level log information, and so on.
  • the first log information includes all the information obtained by the media information push platform, including click behavior on media information display positions, terminal information, user information, and the location information of terminals.
  • the data analysis unit 32 obtains the terminal information on the media information push platform and the corresponding region information; the region information may specifically be expressed as a city-level region.
  • when the number of regions corresponding to first terminal information within the preset time range (for example, one hour) is greater than the first threshold, the determining unit 33 may determine that the first terminal corresponding to the first terminal information is an abnormal terminal.
  • the first threshold may be three.
  • the server further includes a first penalty unit 35, configured to mark the click behavior of the abnormal terminal on media information display positions as invalid after the determining unit 33 determines that the terminal corresponding to the first terminal information is an abnormal terminal.
  • FIG. 12 is a schematic diagram of a second component structure of a server according to an embodiment of the present invention; as shown in FIG. 12, the server includes: a data acquisition unit 31, a data analysis unit 32, a determining unit 33, and a setting unit 34.
  • the data acquisition unit 31 is configured to obtain first log information within a first time period;
  • the data analysis unit 32 is configured to: obtain, based on the first log information obtained by the data acquisition unit 31, terminal information of terminals that have click behavior on a media information display position; determine, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located; determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; count, among the media information display positions corresponding to abnormal terminals, the number of abnormal terminals and the total number of terminals corresponding to a first media information display position, where the first media information display position is any one of the media information display positions corresponding to the abnormal terminals; and calculate a first ratio of the number of abnormal terminals to the total number of terminals;
  • the determining unit 33 is configured to obtain, based on the determination result from the data analysis unit 32, first terminal information whose corresponding number of regions is greater than the first threshold, and to determine that the terminal corresponding to the first terminal information is an abnormal terminal; and, when the first ratio is greater than the second threshold, to mark the first media information display position as a suspected abnormal media information display position;
  • the data acquisition unit 31 is further configured to obtain second log information within a second time period, where the time range of the second time period is greater than the time range of the first time period;
  • the data analysis unit 32 is further configured to obtain, based on the second log information, a first click volume of a second media information display position within a first preset time range and a second click volume within a second preset time range, where the first preset time range is different from the second preset time range, and to calculate a second ratio of the first click volume to the second click volume; it is further configured to obtain, based on the second log information, click position information of a third media information display position, to calculate a first parameter according to that click position information, where the first parameter represents the distribution of the click positions of the third media information display position, and to determine whether the first parameter is within a preset threshold range;
  • the determining unit 33 is further configured to: when the second ratio is less than the second threshold, determine that the second media information display position is a suspected abnormal media information display position; and when the first parameter is not within the preset threshold range, determine that the third media information display position is a suspected abnormal media information display position;
  • the setting unit 34 is configured to set a corresponding weight value for each of the first ratio, the second ratio, and the first parameter;
  • the determining unit 33 is further configured to calculate, from the first ratio, the second ratio, the first parameter, and the corresponding weight values, a second parameter corresponding to the suspected abnormal media information display position; and, when the second parameter is greater than the third threshold, to determine that the suspected abnormal media information display position is an abnormal media information display position.
  • the data acquisition unit 31 acquires the first log information and the second log information, where the first log information is log information within a relatively short time range.
  • as one implementation, the first log information may be hour-level log information, such as one hour of log information; it is not limited to hour-level log information and may also be minute-level log information, and so on.
  • the time range of the second log information is greater than the time range of the first log information; that is, the second log information covers a longer time range than the first log information.
  • the second log information may be one day of log information; it is not limited to a one-day time range and may also cover, for example, ten days.
  • the data analysis unit 32 analyzes, based on the second log information, the first click volume (the daytime click volume) of a media information display position within the first preset time range and the second click volume (the nighttime click volume) within the second preset time range, and calculates the ratio of the daytime click volume to the nighttime click volume.
  • when the ratio is less than the second threshold, the determining unit 33 determines that the corresponding media information display position is a suspected abnormal media information display position.
  • the data analysis unit 32 analyzes the click position information of a media information display position based on the second log information, and calculates a first parameter (an entropy value) from that click position information.
  • when the first parameter (the entropy value) is not within the preset threshold range, the determining unit 33 determines that the third media information display position is a suspected abnormal media information display position.
  • for the abnormal terminals determined by the determining unit 33, the data analysis unit 32 analyzes the media information display positions that the abnormal terminals have clicked, and calculates the ratio of abnormal terminals to the total number of terminals for each such position; when the ratio is greater than the second threshold, the determining unit 33 marks the first media information display position as a suspected abnormal media information display position.
  • the determining unit 33 calculates the second parameter corresponding to each suspected abnormal media information display position based on the weight values set by the setting unit 34, where the weight value corresponding to the first parameter may be relatively large: each parameter is multiplied by its corresponding weight value and the results are summed to obtain the second parameter. The second parameter is then compared with a third threshold; when the second parameter is greater than the third threshold, the suspected abnormal media information display position is determined to be an abnormal media information display position.
  • the server further includes a first penalty unit 35, configured to mark the click behavior of the abnormal terminal on media information display positions as invalid after the determining unit 33 determines that the terminal corresponding to the first terminal information is an abnormal terminal.
  • the server further includes a second penalty unit 36, configured to mark the click behavior on the abnormal media information display position as invalid after the determining unit 33 determines that the suspected abnormal media information display position is an abnormal media information display position.
  • the data obtaining unit 31, the data analyzing unit 32, the determining unit 33, the setting unit 34, the first penalty unit 35, and the second penalty unit 36 in the server are in practical applications.
  • FIG. 13 is a schematic structural diagram of a server modularization according to an embodiment of the present invention.
  • a cheating terminal identification module, a click day/night ratio abnormality identification module, and a click position distribution abnormality identification module each obtain log information from a log system;
  • the cheating terminal identification module obtains hour-level log information; the click day/night ratio abnormality identification module and the click position distribution abnormality identification module each obtain day-level log information.
  • the cheating determination policy is configured into the policy update module through policy configuration information.
  • the cheating terminal identification module, the click day/night ratio abnormality identification module, and the click position distribution abnormality identification module perform analysis and identification according to the cheating determination policy in the policy update module; the policy update module obtains the abnormal terminals and the abnormal media information display positions (i.e., the abnormal advertisement slots).
  • after receiving a request from the billing module, the online real-time system applies penalties according to the obtained determination results: the click behavior of abnormal terminals is marked as invalid, and the click behavior on abnormal media information display positions is marked as invalid; the billing module does not charge for click behavior that is marked as invalid.
  • the server includes a processor 61, a storage medium 62, and at least one external communication interface 63; the processor 61, the storage medium 62, and the external communication interface 63 are all connected by a bus 64.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units, that is, may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit;
  • the unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs the steps of the foregoing method.
  • the foregoing storage medium includes: a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
  • the above-described integrated unit of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a standalone product.
  • the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product that is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the invention.
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)
  • Computer And Data Communications (AREA)

Abstract

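Embodiments of the present invention disclose an information processing method, a server, and a non-volatile storage medium. The method includes: obtaining first log information within a first time period; obtaining, based on the first log information, terminal information of terminals that have click behavior on a media information display position; determining, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located; determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and obtaining, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal.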

Description

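Information processing method, server, and non-volatile storage medium
This application claims priority to Chinese Patent Application No. 201610389956.4, entitled "Information processing method and server", filed with the Chinese Patent Office on June 2, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to information processing technologies, and in particular to an information processing method, a server, and a non-volatile storage medium.
Background
In the display of media information (such as advertisements) on mobile terminals, economic interests drive cheating techniques to escalate continually as anti-cheating techniques improve. To obtain more revenue, a cheating party needs to report more clicks, and the most direct way to do so is to forge new users. Current cheating techniques mainly include the following. The first is forging terminal information by technical means, for example the International Mobile Equipment Identity (IMEI) or Android ID on the Android system, or the Identifier for Advertising (IDFA) or Media Access Control (MAC) address on the iOS system. With forged terminal information, one mobile terminal can be identified as multiple terminals. The second is obtaining almost all Internet Protocol (IP) resources by technical means. The third is reproducing users' click behavior with simulated-click technology.
With these three cheating techniques and their combinations, existing anti-cheating detection strategies cannot accurately detect and identify cheating users, and consequently the click volume of media information display positions cannot be accurately counted. In the related art, there is as yet no effective solution to this problem.
Summary
Embodiments of the present invention are intended to provide an information processing method, a server, and a non-volatile storage medium, to solve the problem that, in the prior art, cheating users cannot be accurately detected and identified in the face of cheating techniques for media information display.
To achieve the above objective, the technical solutions of the embodiments of the present invention are implemented as follows: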
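An embodiment of the present invention provides an information processing method, the method including:
obtaining first log information within a first time period;
obtaining, based on the first log information, terminal information of terminals that have click behavior on a media information display position;
determining, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located;
determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and
obtaining, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal.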
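An embodiment of the present invention further provides a server, the server including a data acquisition unit, a data analysis unit, and a determining unit, where:
the data acquisition unit is configured to obtain first log information within a first time period;
the data analysis unit is configured to obtain, based on the first log information, terminal information of terminals that have click behavior on a media information display position; determine, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located; and determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and
the determining unit is configured to obtain, based on the determination result from the data analysis unit, first terminal information whose corresponding number of regions is greater than the first threshold, and to determine that the terminal corresponding to the first terminal information is an abnormal terminal.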
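An embodiment of the present invention further provides a non-volatile storage medium storing program instructions; when a processor executes the stored program instructions, it performs an information processing method, the information processing method including:
obtaining first log information within a first time period;
obtaining, based on the first log information, terminal information of terminals that have click behavior on a media information display position;
determining, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located;
determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and
obtaining, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal.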
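In the information processing method, server, and non-volatile storage medium provided by the embodiments of the present invention, the method includes: obtaining first log information within a first time period; obtaining, based on the first log information, terminal information of terminals that have click behavior on a media information display position; determining, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located; determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and obtaining, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal. With the technical solution of the embodiments of the present invention, the terminal information of terminals that have click behavior on media information display positions is analyzed together with the corresponding region information, and a terminal whose number of regions is greater than the first threshold is determined to be an abnormal terminal. This effectively solves the problem that, in the prior art, cheating users cannot be accurately detected and identified in the face of cheating techniques for media information display, and greatly improves the accuracy of the click volume of media information display positions.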
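Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the hardware entities of the parties that exchange information in an embodiment of the present invention;
FIG. 2 is a first schematic flowchart of the information processing method according to an embodiment of the present invention;
FIG. 3 is a second schematic flowchart of the information processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the normal click volume curve of a media information display position in an embodiment of the present invention;
FIG. 5a and FIG. 5b are schematic diagrams of abnormal click volumes of media information display positions in an embodiment of the present invention;
FIG. 6 is a scatter plot of the distribution of the day/night click volume ratio of media information display positions in an embodiment of the present invention;
FIG. 7a to FIG. 7c are schematic diagrams of click position distributions of media information display positions;
FIG. 8 is a schematic diagram of the relationship between the share of abnormal terminals and the number of advertisement slots in an embodiment of the present invention;
FIG. 9 is a schematic diagram of a processing procedure of the information processing method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a processing procedure of the information processing method in an application scenario according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a first component structure of the server according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a second component structure of the server according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the specific modular structure of the server according to an embodiment of the present invention;
FIG. 14 is a diagram of an example of the server as a hardware entity according to an embodiment of the present invention.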
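Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic diagram of the hardware entities of the parties that exchange information in an embodiment of the present invention. FIG. 1 includes servers 11 ... 1n and terminal devices 21-24. The terminal devices 21-24 exchange information with the servers over a wired or wireless network. The terminal devices may include mobile phones, desktop computers, PCs, all-in-one machines, and the like. In one example, a server may interact with first-type terminals over the network. A first-type terminal may be, for example, the terminal of an advertiser, that is, a party providing advertising material and promoted content. After a first-type terminal submits the advertisement it wants to deliver, the advertisement is stored on the server or server cluster, and technical staff may be assigned to review and otherwise process the advertisements delivered by first-type terminals. Relative to the first-type terminals, the terminal devices 21-24 may be called second-type terminals. A second-type terminal may be, for example, the terminal of an ordinary user, that is, a party to whom advertisements are displayed or exposed, such as a user watching videos through a video application or a user of a social application. All applications installed on a terminal device, or specified applications (such as game, video, or navigation applications), can carry advertisements so as to show users more recommended information.
In this embodiment, the servers are of two types. A first-type server provides the traffic for media information delivery and may be called the traffic provider in this embodiment. A first-type terminal (such as an advertiser's terminal) needs to purchase traffic from a first-type server (the traffic provider) and delivers media information through the purchased traffic. A second-type server performs cheating detection on this behavior, to prevent the traffic provider from inflating the click volume of media information by cheating and thereby harming the interests of advertisers. The information processing method of this embodiment is applied to the second-type server or server cluster.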
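Embodiment 1
An embodiment of the present invention provides an information processing method. FIG. 2 is a first schematic flowchart of the information processing method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
Step 101: Obtain first log information within a first time period.
Step 102: Obtain, based on the first log information, terminal information of terminals that have click behavior on a media information display position.
Step 103: Determine, based on the terminal information, region information corresponding to the terminal, where the region information indicates the region in which the terminal is located.
Step 104: Determine, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold.
Step 105: Obtain, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal.
The information processing method of this embodiment is applied to a media information delivery system. The media information content is, for example, advertisement content.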
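The information processing method of this embodiment mainly targets the cheating technique of forging users in the prior art, which is mainly achieved by combining changed IP addresses with changed terminal information. The inventors found that, taking the IMEI as the terminal information, the IMEI dimension and the IP address dimension each look scattered when viewed alone. However, if the IMEI and the IP address are viewed together, with the IP address mapped to the region in which the terminal is located (the region may be resolved to city level, for example), then under normal circumstances a normal terminal stays within one region for a short period (for example, one hour), and a terminal appearing in multiple regions is a low-probability event. In the data, therefore, one looks for IMEIs that correspond to multiple regions within a short period; the terminals corresponding to these IMEIs are very likely forged terminals.
Table 1 shows the distribution of the number of regions in which all IMEIs on the media information delivery platform appeared within a certain hour. The statistics show that about 97% of IMEIs appear in only 1 region, a small share of about 2% appear in 2 regions, and fewer than 1% appear in more than 2 regions within the hour; the most widespread IMEI appeared in 261 different regions (not shown in the table).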
Number of regions    Number of IMEIs    Share
Equal to 1           513457             97.78%
Equal to 2           10530              2.00%
Greater than 2       1127               0.22%
Table 1
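It can be seen that the probability of one IMEI appearing in 3 or more regions within an hour is very small, about 0.22%. It can therefore be considered that, within a preset time range (for example, one hour), if the number of regions in which a piece of terminal information appears is greater than the first threshold, the terminal information was forged by combining forged terminal information with changed IP addresses and belongs to an abnormal terminal. The first threshold may be 2. On this basis, this embodiment obtains the terminal information on the media information push platform and the corresponding region information, where the region information may specifically be expressed as a city-level region. When the number of regions corresponding to first terminal information within the preset time range (for example, one hour) is greater than the first threshold (for example, 3), the first terminal corresponding to the first terminal information can be determined to be an abnormal terminal.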
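The region-count check of steps 102 to 105 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_abnormal_terminals`, the `(terminal_id, region)` record layout, and the sample data are assumptions, and the default threshold of 2 follows the "may be 2" value given above.

```python
from collections import defaultdict

def find_abnormal_terminals(click_records, first_threshold=2):
    """Return terminal IDs seen in more than `first_threshold` distinct regions.

    click_records: iterable of (terminal_id, region) pairs taken from the log
    of one preset time range (for example, one hour). The record layout is a
    hypothetical simplification of the first log information.
    """
    regions_by_terminal = defaultdict(set)
    for terminal_id, region in click_records:
        regions_by_terminal[terminal_id].add(region)
    return {
        terminal_id
        for terminal_id, regions in regions_by_terminal.items()
        if len(regions) > first_threshold
    }

# Hypothetical one-hour sample: imei-C shows up in three different cities.
hourly_clicks = [
    ("imei-A", "Shenzhen"), ("imei-A", "Shenzhen"),
    ("imei-B", "Beijing"), ("imei-B", "Shanghai"),
    ("imei-C", "Beijing"), ("imei-C", "Chengdu"), ("imei-C", "Wuhan"),
]
abnormal = find_abnormal_terminals(hourly_clicks)
```

Using a set per terminal keeps repeated clicks from the same region from inflating the count, which matches the text's focus on the number of distinct regions rather than the number of clicks.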
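In this embodiment, the first log information is log information within a relatively short time range. As one implementation, the first log information may be hour-level log information, such as one hour of log information; it is not limited to hour-level log information and may also be minute-level log information, and so on. The first log information includes all the information obtained by the media information push platform, including click behavior on media information display positions, terminal information, user information, and the location information of terminals.
With the technical solution of the embodiments of the present invention, the terminal information of terminals that have click behavior on media information display positions is analyzed together with the corresponding region information, and a terminal whose number of regions is greater than the first threshold is determined to be an abnormal terminal. This effectively solves the problem that, in the prior art, cheating users cannot be accurately detected and identified in the face of cheating techniques for media information display, and greatly improves the accuracy of the click volume of media information display positions; it also protects the interests of media information providers (such as advertisers).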
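Embodiment 2
An embodiment of the present invention further provides an information processing method. Building on the information processing solution of Embodiment 1, and using the identified abnormal terminals together with second log information that covers a longer time range, the information processing solution of this embodiment is mainly used to identify abnormal media information display positions. FIG. 3 is a second schematic flowchart of the information processing method according to an embodiment of the present invention. As shown in FIG. 3, after step 105 of Embodiment 1, the information processing method includes:
Step 106: Among the media information display positions corresponding to abnormal terminals, count the number of abnormal terminals and the total number of terminals corresponding to a first media information display position, where the first media information display position is any one of the media information display positions corresponding to the abnormal terminals.
Step 107: Calculate a first ratio of the number of abnormal terminals to the total number of terminals.
Step 108: When the first ratio is greater than a second threshold, mark the first media information display position as a suspected abnormal media information display position, and proceed to step 208.
Step 201: Obtain second log information within a second time period, where the time range of the second time period is greater than the time range of the first time period.
Step 202: Obtain, based on the second log information, a first click volume of a second media information display position within a first preset time range and a second click volume within a second preset time range, where the first preset time range is different from the second preset time range. Specifically, the first preset time range may represent daytime, and the second preset time range may represent nighttime.
Step 203: Calculate a second ratio of the first click volume to the second click volume.
Step 204: When the second ratio is less than the second threshold, determine that the second media information display position is a suspected abnormal media information display position, and proceed to step 208.
Step 205: Obtain, based on the second log information, click position information of a third media information display position.
Step 206: Calculate a first parameter according to the click position information, where the first parameter represents the distribution of the click positions of the third media information display position.
Step 207: When the first parameter is not within a preset threshold range, determine that the third media information display position is a suspected abnormal media information display position, and proceed to step 208.
Step 208: Calculate, from the first ratio, the second ratio, the first parameter, and the corresponding weight values, a second parameter corresponding to the suspected abnormal media information display position.
Step 209: When the second parameter is greater than a third threshold, determine that the suspected abnormal media information display position is an abnormal media information display position.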
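In this embodiment, the ultimate purpose of cheating by a traffic provider is simply to obtain higher delivery fees from media information providers (such as advertisers). Most traffic providers therefore use cheating to increase the click volume of media information as much as possible. As a result, the click behavior produced by cheating looks different from normal click behavior on media information. The first difference is that, to maximize revenue, a traffic provider clicks on media information around the clock, which runs contrary to normal behavior.
FIG. 4 is a schematic diagram of the normal click volume curve of a media information display position in an embodiment of the present invention. As shown in FIG. 4, the number of clicks on the media information display position is low between 2 a.m. and 6 a.m.; from 8 a.m. the number of clicks gradually rises to the highest level of the day, and it falls again at around 11 p.m. Click behavior on media information display positions is thus related to users' daily routines: the probability of a click during the day, when users are awake, is far greater than the probability of a click at night, when users are asleep.
On this basis, if the click behavior on a media information display position does not show the day/night difference of FIG. 4 and clicks are instead produced around the clock, this is regarded as a low-probability event from a statistical point of view.
FIG. 5a and FIG. 5b are schematic diagrams of abnormal click volumes of media information display positions in an embodiment of the present invention. It can be seen that FIG. 5a and FIG. 5b follow a different pattern from the normal click volume curve shown in FIG. 4. As shown in FIG. 5a, the click volume of the media information display position fluctuates but shows no day/night difference, and can be regarded as evenly distributed over the day. As shown in FIG. 5b, the click volume of the media information display position stays at a high level from midnight until 7 a.m., then gradually falls after 7 a.m. until it approaches its lowest level at 11 a.m., which is entirely contrary to the normal click behavior distribution shown in FIG. 4. It can therefore be considered that the click behavior on these media information display positions has been subject to human intervention and is suspected of cheating.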
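Following the distribution of normal click behavior shown in FIG. 4, in steps 202 to 204 of this embodiment several representative periods can be taken to represent day and night: for example, "night" (the period when users are asleep) may be defined as the period from midnight (0:00) to 8 a.m., and "day" (the period when users are awake) may be defined as the period from 8 a.m. to midnight. In this embodiment, "day" is the first preset time range, which represents the period when users are awake; "night" is the second preset time range, which represents the period when users are asleep. The first click volume of a second media information display position within the first preset time range and its second click volume within the second preset time range are counted, where the second media information display position is any media information display position on the media information push platform. A second ratio of the first click volume to the second click volume is calculated; when the second ratio is less than a second threshold, the second media information display position is determined to be a suspected abnormal media information display position. The second threshold is a value less than or equal to 1, so a second ratio below the second threshold indicates that the "day" click volume of the second media information display position is smaller than its "night" click volume. As shown in FIG. 6, the part with an abscissa less than 1 corresponds to advertisement slots that are more active at night than during the day, and the ordinate indicates the number of advertisement slots. It can be seen that some advertisement slots are active at night; apart from some special applications, a large portion of them are cheating advertisement slots.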
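As a rough illustration of steps 202 to 204, the day/night ratio test might look like the sketch below. The function names, the representation of clicks as integer hours, and the sample data are assumptions; the night window of 0:00 to 8:00 and a second threshold of at most 1 follow the example values in the text.

```python
def day_night_ratio(click_hours, night_start=0, night_end=8):
    """Ratio of daytime clicks to nighttime clicks for one display position.

    click_hours: iterable of integer hours (0-23), one entry per click.
    Night is the half-open window [night_start, night_end); every other hour
    counts as day. Returns None when there are no night clicks, because the
    ratio is undefined in that case.
    """
    day = night = 0
    for hour in click_hours:
        if night_start <= hour < night_end:
            night += 1
        else:
            day += 1
    if night == 0:
        return None
    return day / night

def is_suspected_by_day_night(click_hours, second_threshold=1.0):
    """Flag a position whose day/night ratio falls below the threshold."""
    ratio = day_night_ratio(click_hours)
    return ratio is not None and ratio < second_threshold

# Hypothetical samples: one slot clicked mostly by day, one mostly by night.
normal_slot = [9, 10, 12, 14, 19, 20, 21, 22, 3]
night_heavy_slot = [1, 2, 3, 4, 5, 6, 13]
```

A position with no night clicks at all is treated as not suspected here; that is a design choice of this sketch, since the source does not say how an undefined ratio should be handled.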
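In this embodiment, the second difference is as follows. The position of each user's click on a media information display position can be reported through the software development kit (SDK), and the server compiles the distribution of click coordinates for each media information display position. Statistical analysis shows that the click position distribution of a media information display position subject to cheating differs from that of a normal media information display position.
FIG. 7a to FIG. 7c are schematic diagrams of click position distributions of media information display positions. FIG. 7a shows the click position distribution of a normal media information display position: as shown in FIG. 7a, the click positions of a normal media information display position show a certain hotspot distribution that depends on the style and content of the media information. For example, the click coordinates are scattered in some areas of the display position, while other areas attract more user attention and their click coordinates are concentrated. The click position distributions of abnormal media information display positions may be as shown in FIG. 7b and FIG. 7c. Because click behavior produced by cheating comes from a fixed, programmed pattern, the click positions show a certain regularity; data analysis shows that the click positions of abnormal media information display positions tend to be either scattered or concentrated, as shown in FIG. 7b and FIG. 7c respectively. FIG. 7b shows a scattered click position distribution. FIG. 7c shows a concentrated click position distribution; the thickness of the lines in FIG. 7c, and whether they are solid or dashed, indicate different click volumes: for example, a thin solid line indicates one click volume, a thick solid line a second click volume, and a dashed line a third click volume.
Based on the above, in steps 205 to 207 of this embodiment, a first parameter is calculated according to the click position information of the third media information display position, where the first parameter represents the distribution of the click positions of the third media information display position; when the first parameter does not satisfy a preset condition, the third media information display position is determined to be a suspected abnormal media information display position. The first parameter may be expressed as an entropy value; that is, abnormal media information display positions are identified using an entropy algorithm.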
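Specifically, for the case where the click positions of an abnormal media information display position are scattered, as shown in FIG. 7b, the coordinate points are evenly distributed in both the horizontal and the vertical direction. So if it can be established that the distribution is very even in the horizontal or vertical direction, the scattered click position distribution can be recognized, that is, the click positions of the media information display position are identified as abnormal. For a given range in the horizontal and vertical directions, a uniform distribution maximizes the entropy. Entropy is therefore used to describe how uniform the click distribution is; taking the horizontal direction as an example, it is calculated as follows: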
H(x)=-∑p(x)log(p(x))            (1)
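Here x denotes the abscissa of a click position; p(x) denotes the probability that the abscissa of a click position is x; and H(x) denotes the entropy of the click positions of the media information display position.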
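Formula (1) can be evaluated on the empirical distribution of click abscissas, as in the sketch below. The function name `coordinate_entropy` and the sample data are assumptions, and the logarithm is taken to base 2 so that the result is in bits; the source formula does not state the base.

```python
import math
from collections import Counter

def coordinate_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`.

    Implements H(x) = -sum p(x) log2 p(x), with p(x) estimated as the
    relative frequency of each coordinate value.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical samples: clicks spread evenly over 256 x-coordinates,
# and clicks concentrated on two neighbouring x-coordinates.
uniform_x = list(range(256))
clustered_x = [100] * 200 + [101] * 56

h_uniform = coordinate_entropy(uniform_x)
h_clustered = coordinate_entropy(clustered_x)
```

With these samples the evenly spread clicks reach the maximum possible entropy for 256 distinct values (8 bits), while the concentrated clicks stay below 1 bit, which is the contrast the text relies on to recognize a scattered distribution.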
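On the other hand, for the case where the click positions of an abnormal media information display position are concentrated, as shown in FIG. 7c, when the horizontal coordinate x is fixed, the vertical coordinate y takes relatively few values; that is, given the abscissa x, the uncertainty of the ordinate y is relatively small. This can therefore be expressed by the conditional entropy, as follows: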
H(Y|X)=-∑∑p(x,y)log(p(x,y)/p(x))            (2)
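Here x and y denote the abscissa and the ordinate of a click position; p(x) denotes the probability that the abscissa of a click position is x; p(x,y) denotes the probability that the abscissa of a click position is x and its ordinate is y; and H(Y|X) denotes the entropy value of the click positions of the media information display position.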
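The conditional entropy described here can be estimated from observed click coordinates as sketched below. This uses the standard definition H(Y|X) = -∑∑ p(x,y) log(p(x,y)/p(x)) with base-2 logarithms; the function name `conditional_entropy` and the sample point sets are assumptions for illustration.

```python
import math
from collections import Counter

def conditional_entropy(points):
    """Estimate H(Y|X) in bits from a list of (x, y) click coordinates.

    p(x, y) and p(x) are estimated as relative frequencies of the observed
    coordinate pairs and of the observed abscissas.
    """
    joint = Counter(points)
    marginal_x = Counter(x for x, _ in points)
    total = len(points)
    h = 0.0
    for (x, _y), n_xy in joint.items():
        p_xy = n_xy / total
        p_x = marginal_x[x] / total
        h -= p_xy * math.log2(p_xy / p_x)
    return h

# Hypothetical samples: y is fully determined by x (clicks along a line),
# versus y taking four equally likely values for every x.
line_clicks = [(x, 2 * x) for x in range(50)]
scattered_clicks = [(x, y) for x in range(50) for y in range(4)]

h_line = conditional_entropy(line_clicks)
h_scattered = conditional_entropy(scattered_clicks)
```

When y is fixed once x is known the estimate is 0 bits, and it grows as more ordinates become possible for each abscissa (2 bits for four equally likely values), so a small value corresponds to the concentrated pattern of FIG. 7c.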
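Based on the above entropy calculations, the first parameter is calculated from the click positions of each media information display position; the first parameter may specifically be an entropy value. Corresponding preset threshold ranges are set for different types of media information display position (for example, banner or interstitial). When the calculated first parameter (for example, the entropy value) of a media information display position is not within the preset threshold range, the media information display position can be determined to be a suspected abnormal media information display position. For example, for an interstitial advertisement slot, an entropy of about 8 bits can be taken to represent a uniform distribution under certain conditions and may indicate cheating; that is, the interstitial advertisement slot can be determined to be a suspected abnormal media information display position.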
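In this embodiment, the third difference is as follows. For the abnormal terminals identified in Embodiment 1, suppose an advertisement slot receives 100 clicks per day and 10% of those clicks come from such IMEIs; the probability of this event can then be roughly estimated as P = (0.0022)^10 × (0.9978)^90. P is a value infinitely close to 0, which shows that this is an extremely low-probability event. If the traffic of an advertisement slot contains a large number of such IMEIs appearing in multiple regions, the slot is very likely a cheating advertisement slot. The distribution in FIG. 8 also illustrates this: for normal advertisement slots the share of abnormal terminals is very small. For example, among three thousand advertisement slots, for about 2,600 to 2,700 of them fewer than 5% of the terminals clicking the slot are abnormal terminals; for some advertisement slots the share of abnormal terminals exceeds 10%; and for about one hundred advertisement slots, 100% of the terminals clicking the slot are abnormal terminals.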
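The order of magnitude of this estimate can be checked numerically. The sketch below evaluates the source's expression in log space; the variable names are illustrative. The last line additionally applies the binomial coefficient C(100, 10), which the source's rough estimate leaves out, to show that the conclusion does not depend on that simplification.

```python
import math

p_abnormal = 0.0022          # share of IMEIs seen in more than 2 regions (Table 1)
p_normal = 1 - p_abnormal    # 0.9978

# P = (0.0022)^10 * (0.9978)^90, evaluated via log10 to expose the exponent.
log10_p = 10 * math.log10(p_abnormal) + 90 * math.log10(p_normal)
p = 10 ** log10_p

# Not in the source: the full binomial probability of exactly 10 abnormal
# clicks out of 100, which multiplies P by C(100, 10).
p_binomial = math.comb(100, 10) * p
```

Both values are many orders of magnitude below any plausible chance level, which is consistent with the text's statement that P is practically 0.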
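Based on the above, in steps 106 to 108 of this embodiment, among the media information display positions corresponding to abnormal terminals, the number of abnormal terminals and the total number of terminals corresponding to a first media information display position are counted, where the first media information display position is any one of the media information display positions corresponding to the abnormal terminals; a first ratio of the number of abnormal terminals to the total number of terminals is calculated; and when the first ratio is greater than the second threshold, the first media information display position is marked as a suspected abnormal media information display position.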
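A minimal sketch of this per-position ratio follows. The function name, the `(slot_id, terminal_id)` record layout, the sample data, and the 10% default threshold are all assumptions; the source does not state a concrete value for the threshold used in this step.

```python
from collections import defaultdict

def suspected_slots_by_abnormal_ratio(clicks, abnormal_terminals, threshold=0.10):
    """Return {slot_id: first_ratio} for slots whose share of abnormal terminals
    exceeds `threshold`.

    clicks: iterable of (slot_id, terminal_id) pairs (hypothetical layout).
    abnormal_terminals: set of terminal IDs already judged abnormal.
    """
    terminals_by_slot = defaultdict(set)
    for slot_id, terminal_id in clicks:
        terminals_by_slot[slot_id].add(terminal_id)

    suspected = {}
    for slot_id, terminals in terminals_by_slot.items():
        # First ratio: abnormal terminals over all terminals clicking this slot.
        first_ratio = len(terminals & abnormal_terminals) / len(terminals)
        if first_ratio > threshold:
            suspected[slot_id] = first_ratio
    return suspected

# Hypothetical sample: slot-2 is clicked by one abnormal and one normal terminal.
sample_clicks = [
    ("slot-1", "a"), ("slot-1", "b"), ("slot-1", "c"), ("slot-1", "d"),
    ("slot-2", "x"), ("slot-2", "y"),
]
suspected = suspected_slots_by_abnormal_ratio(sample_clicks, {"x"})
```

Slots that no abnormal terminal has clicked get a ratio of 0 and can never exceed the threshold, so iterating over all slots gives the same result as restricting the count to the positions corresponding to abnormal terminals.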
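For the suspected abnormal media information display positions marked in the above three manners, that is, based on the second ratio obtained from day and night click behavior, the first parameter obtained from the click position distribution, and the first ratio obtained from abnormal terminals (the user dimension), a second parameter corresponding to each suspected abnormal media information display position is calculated according to preset weight values, where the weight value corresponding to the first parameter may be relatively large: each parameter is multiplied by its corresponding weight value and the results are summed to obtain the second parameter. The second parameter is then compared with a third threshold; when the second parameter is greater than the third threshold, the suspected abnormal media information display position is determined to be an abnormal media information display position.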
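The weighted combination can be sketched as a plain weighted sum. Everything concrete below is an assumption made for illustration: the function names, the weight values (chosen only so that the first parameter has the largest weight, as the text allows), the third threshold, and the sample inputs. The source also does not say how the three quantities are scaled before being combined, so the inputs here are simply taken as given.

```python
def second_parameter(first_ratio, second_ratio, first_parameter, weights):
    """Weighted sum of the three per-position quantities."""
    return (
        weights["first_ratio"] * first_ratio
        + weights["second_ratio"] * second_ratio
        + weights["first_parameter"] * first_parameter
    )

def is_abnormal_position(first_ratio, second_ratio, first_parameter,
                         weights, third_threshold):
    """A suspected position is confirmed when the weighted sum exceeds the threshold."""
    score = second_parameter(first_ratio, second_ratio, first_parameter, weights)
    return score > third_threshold

# Hypothetical weights and inputs.
example_weights = {"first_ratio": 0.3, "second_ratio": 0.2, "first_parameter": 0.5}
score = second_parameter(0.4, 0.5, 0.9, example_weights)
confirmed = is_abnormal_position(0.4, 0.5, 0.9, example_weights, third_threshold=0.6)
```

In practice the weights and the third threshold would come from configuration, in line with the setting unit and policy configuration described elsewhere in the document.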
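FIG. 9 is a schematic diagram of a processing flow of the information processing method according to an embodiment of the present invention. As shown in FIG. 9, log information is obtained from a log system and includes hour-level logs and day-level logs. Specifically, the hour-level log may be, for example, the first log information described in this embodiment, and the day-level log may be, for example, the second log information described in this embodiment. On the one hand, cheating terminals are identified from the hour-level logs, and abnormal terminals are determined in the manner described in Embodiment 1; further, after the terminal corresponding to the first terminal information is determined to be an abnormal terminal, the method further includes: marking as invalid the click behavior of the abnormal terminal on media information display positions. On the other hand, cheating media information display positions (for example, advertising slots) are identified from the day-level logs, and abnormal media information display positions are determined in the manner described in Embodiment 2; further, after the suspected abnormal media information display position is determined to be an abnormal media information display position, the method further includes: marking as invalid the click behavior directed at the abnormal media information display position.
With the technical solution of the embodiments of the present invention, on the one hand, the terminal information of terminals that clicked media information display positions and the corresponding region information are analyzed, and a terminal whose number of regions exceeds the first threshold is determined to be an abnormal terminal; on the other hand, the day/night click ratio and the click position distribution of media information display positions are analyzed to identify abnormal media information display positions. This effectively solves the problem that prior-art techniques cannot accurately detect cheating users in media information display, as well as the problem that the click volume of media information display positions cannot be counted accurately. It greatly improves the accuracy of click counts for media information display positions and protects the interests of the party placing the media information (for example, the advertiser).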
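The information processing method of the embodiments of the present invention is described in detail below using a concrete advertisement display scenario.
FIG. 10 is a schematic diagram of the processing flow of the information processing method in one application scenario. As shown in FIG. 10, the information processing scheme in this scenario has two parts: identification and penalization of abnormal users, comprising steps 41 to 43; and identification and penalization of abnormal advertising slots, comprising steps 51 to 57 as well as steps 44 and 45 through step 54. The identification and penalization of abnormal users specifically includes:
Step 41: Obtain hour-level log information.
Specifically, the server may set a one-hour timer; each time the timer expires, it obtains from the log system the IMEIs recorded within the past hour and the corresponding address information.
Step 42: Every hour, compute the abnormal IMEIs that appear in multiple regions.
After the hour-level log information is obtained, the regions in which each IMEI appeared during the current hour are counted; when the number of regions in which an IMEI appeared within the hour reaches a preset threshold, that IMEI is determined to be an abnormal IMEI. The preset threshold may be 3.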
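Steps 41 and 42 can be sketched as a simple aggregation over one hour of log records. The record layout of (IMEI, region) pairs and the function name `find_abnormal_imeis` are assumptions made for illustration. The comparison uses "reaches" (>=), following the wording of this step, with the example threshold of 3.

```python
from collections import defaultdict

def find_abnormal_imeis(records, region_threshold=3):
    """Return the IMEIs seen in at least `region_threshold` distinct regions.

    `records` is an iterable of (imei, region) pairs taken from one hour of logs.
    """
    regions_by_imei = defaultdict(set)
    for imei, region in records:
        regions_by_imei[imei].add(region)
    return {imei for imei, regions in regions_by_imei.items()
            if len(regions) >= region_threshold}

# Illustrative records: one IMEI appears in three cities within the hour.
hourly_records = [
    ("imei-a", "Shenzhen"), ("imei-a", "Beijing"), ("imei-a", "Shanghai"),
    ("imei-b", "Shenzhen"), ("imei-b", "Shenzhen"),
]
abnormal = find_abnormal_imeis(hourly_records)
```

Using a set per IMEI means repeated clicks from the same region are counted once, so only the number of distinct regions matters.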
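Step 43: Push abnormal users to the online system for penalization.
Specifically, the terminal corresponding to an abnormal IMEI is determined to be an abnormal terminal; equivalently, the user corresponding to the abnormal IMEI is determined to be an abnormal user. Pushing abnormal users for online penalization specifically includes: marking the click behavior of the abnormal terminal as invalid.
The identification and penalization of abnormal advertising slots consists of three parts: the first determines suspected abnormal advertising slots from the proportion of abnormal users among the clicks on a slot; the second determines them from the slot's day/night click ratio; the third determines them by identifying the coordinate distribution of clicks on the slot. The suspected abnormal advertising slots determined by these three methods are then combined to determine the final abnormal advertising slots. Specifically:
Step 44: Aggregate the abnormal IMEIs by advertising slot and, for the clicks on each slot, count the number of abnormal users and the total number of users.
Step 45: Determine whether the proportion of abnormal users among all users exceeds a preset threshold; if it does, the advertising slot is determined to be a suspected abnormal advertising slot and is pushed to step 54 for the abnormal advertising slot decision.
Specifically, according to the analysis in Embodiment 2, abnormal users account for a very small share of the clicks on a normal advertising slot; if abnormal users account for a large share of the clicks on a slot, the slot can be determined to be a suspected abnormal advertising slot.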
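Steps 44 and 45 can be sketched as follows. The click record layout of (slot id, IMEI) pairs, the function name, and the threshold value in the example are all assumptions; users are identified by IMEI, as in the steps above.

```python
from collections import defaultdict

def suspected_slots_by_abnormal_share(clicks, abnormal_imeis, share_threshold):
    """Map each suspected slot id to its share of abnormal users.

    `clicks` is an iterable of (slot_id, imei) pairs; `abnormal_imeis` is a set.
    """
    users_by_slot = defaultdict(set)
    for slot_id, imei in clicks:
        users_by_slot[slot_id].add(imei)

    suspected = {}
    for slot_id, imeis in users_by_slot.items():
        share = len(imeis & abnormal_imeis) / len(imeis)
        if share > share_threshold:
            suspected[slot_id] = share
    return suspected

# Illustrative data: slot "s1" is clicked mostly by abnormal IMEIs.
sample_clicks = [("s1", "a"), ("s1", "b"), ("s1", "c"), ("s2", "c"), ("s2", "d")]
flagged = suspected_slots_by_abnormal_share(sample_clicks, {"a", "b"}, 0.1)
```

Here slot "s1" has two abnormal users out of three and is flagged, while slot "s2" has none and is not.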
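Step 51: Obtain day-level logs.
Specifically, the server may obtain the previous day's log information at a fixed time each day.
Step 52: Compute the day/night click ratio of each advertising slot from the day-level logs.
Step 53: Determine whether the day/night click ratio exceeds a preset threshold; if so, the slot may be determined to be a suspected abnormal advertising slot and is pushed to step 54 for the abnormal advertising slot decision.
Step 55: Compile the click coordinate distribution of each advertising slot from the day-level logs.
Step 56: Compute a parameter characterizing how concentrated or dispersed the click coordinates are, then proceed to step 54.
Specifically, the parameter characterizing the concentration or dispersion of the click coordinates may be determined by computing the entropy of the slot's click coordinates.
Step 53: Determine whether the parameter is greater than a preset threshold; if so, the slot may be determined to be a suspected abnormal advertising slot and is pushed to step 54 for the abnormal advertising slot decision. Note that the preset threshold used when step 53 is performed here differs from the preset threshold compared with the day/night click ratio above.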
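Step 54: Decide which of the suspected abnormal advertising slots determined by the above three methods are abnormal. Specifically, corresponding weights may be preconfigured for the three methods. The three methods correspond to three parameters: the second ratio, which characterizes day and night click volume and is obtained from day/night click behavior; the first parameter, which characterizes the concentration or dispersion of click coordinates and is obtained from the click position distribution; and the first ratio, which characterizes the share of abnormal users among all users and is obtained from abnormal terminals (the user dimension). Each parameter is multiplied by its weight and the products are summed; if the result is greater than a preset threshold, the suspected abnormal advertising slot is determined to be an abnormal advertising slot.
Step 57: Push abnormal advertising slots to the online system for penalization.
Specifically, pushing abnormal advertising slots for online penalization includes: marking as invalid the click behavior directed at the abnormal advertising slot.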
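The two penalization steps (steps 43 and 57) both come down to marking certain clicks as invalid. A minimal sketch follows, assuming each click record is a dictionary with hypothetical keys `imei` and `slot_id`; the function name `split_clicks` is also an assumption.

```python
def split_clicks(clicks, abnormal_imeis, abnormal_slots):
    """Partition click records into (valid, invalid) lists.

    A click is invalid if it comes from an abnormal terminal or if it is
    directed at an abnormal advertising slot.
    """
    valid, invalid = [], []
    for click in clicks:
        cheating = (click["imei"] in abnormal_imeis
                    or click["slot_id"] in abnormal_slots)
        (invalid if cheating else valid).append(click)
    return valid, invalid

# Illustrative data: one abnormal terminal and one abnormal slot.
day_clicks = [
    {"imei": "a", "slot_id": "s1"},
    {"imei": "b", "slot_id": "s2"},
    {"imei": "c", "slot_id": "s3"},
]
valid_clicks, invalid_clicks = split_clicks(day_clicks, {"a"}, {"s2"})
```

Only the clicks in the valid list would then be counted toward the slot's click volume.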
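Embodiment 3
Based on Embodiment 1, an embodiment of the present invention further provides a server. FIG. 11 is a schematic diagram of a first structure of the server according to an embodiment of the present invention. As shown in FIG. 11, the server includes a data acquisition unit 31, a data analysis unit 32, and a determination unit 33.
The data acquisition unit 31 is configured to obtain first log information within a first time period;
The data analysis unit 32 is configured to obtain, based on the first log information, terminal information of terminals that have click behavior on media information display positions; determine, based on the terminal information, region information corresponding to each terminal, where the region information indicates the region in which the terminal is located; and determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold;
The determination unit 33 is configured to obtain, based on the result from the data analysis unit 32, first terminal information whose corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal.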
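In this embodiment, the first log information is log information covering a relatively short time range. In one implementation, it may be hour-level log information, for example one hour of logs; it is of course not limited to hour-level logs and may also be minute-level log information, and so on. The first log information includes all information obtained by the media information push platform, including click behavior on media information display positions, terminal information, user information, the location information of terminals, and so on.
In this embodiment, based on the detailed description of Embodiment 1, the data analysis unit 32 obtains the terminal information and the corresponding region information from the media information push platform; the region information may specifically be expressed at city level. When the number of regions corresponding to the first terminal information within the preset time range (for example, one hour) is greater than the first threshold, the determination unit 33 may determine that the first terminal corresponding to the first terminal information is an abnormal terminal. The first threshold may be 3.
In one implementation, the server further includes a first penalty unit 35 configured to, after the determination unit 33 determines that the terminal corresponding to the first terminal information is an abnormal terminal, mark as invalid the click behavior of the abnormal terminal on media information display positions.
Those skilled in the art should understand that the functions of the processing units in the server of this embodiment can be understood with reference to the foregoing description of the information processing method. Each processing unit may be implemented by an analog circuit that realizes the functions described in the embodiments of the present invention, or by software that performs those functions running on an intelligent terminal.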
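Embodiment 4
Based on Embodiment 2, an embodiment of the present invention further provides a server. FIG. 12 is a schematic diagram of a second structure of the server according to an embodiment of the present invention. As shown in FIG. 12, the server includes a data acquisition unit 31, a data analysis unit 32, a determination unit 33, and a setting unit 34, where:
The data acquisition unit 31 is configured to obtain first log information within a first time period;
The data analysis unit 32 is configured to obtain, based on the first log information obtained by the data acquisition unit 31, terminal information of terminals that have click behavior on media information display positions; determine, based on the terminal information, region information corresponding to each terminal, where the region information indicates the region in which the terminal is located; and determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold. It is further configured to count, among the media information display positions associated with abnormal terminals, the number of abnormal terminals and the total number of terminals corresponding to a first media information display position, where the first media information display position is any one of the media information display positions associated with abnormal terminals; and calculate a first ratio of the number of abnormal terminals to the total number of terminals;
The determination unit 33 is configured to obtain, based on the result from the data analysis unit 32, first terminal information whose corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal; it is further configured to mark the first media information display position as a suspected abnormal media information display position when the first ratio is greater than a second threshold;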
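The data acquisition unit 31 is further configured to obtain second log information within a second time period, where the second time period is longer than the first time period;
The data analysis unit 32 is further configured to obtain, based on the second log information, a first click count of a second media information display position within a first preset time range and a second click count within a second preset time range, where the first preset time range differs from the second preset time range; and calculate a second ratio of the first click count to the second click count. It is further configured to obtain, based on the second log information, click position information of a third media information display position; calculate a first parameter from the click position information of the third media information display position, where the first parameter characterizes the distribution of click positions on the third media information display position; and determine whether the first parameter is within a preset threshold range;
The determination unit 33 is further configured to determine that the second media information display position is a suspected abnormal media information display position when the second ratio is less than the second threshold; and to determine that the third media information display position is a suspected abnormal media information display position when the first parameter is not within the preset threshold range;
The setting unit 34 is configured to set corresponding weights for the first ratio, the second ratio, and the first parameter, respectively;
The determination unit 33 is further configured to calculate, from the first ratio, the second ratio, and the first parameter using the corresponding weights, a second parameter for a suspected abnormal media information display position; and, when the second parameter is greater than a third threshold, determine that the suspected abnormal media information display position is an abnormal media information display position.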
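In this embodiment, the data acquisition unit 31 obtains the first log information and the second log information. The first log information covers a relatively short time range; in one implementation it may be hour-level log information, for example one hour of logs, though it is not limited to this and may also be minute-level log information, and so on. The second log information covers a longer time range than the first log information. In one implementation, the second log information may be day-level log information; it is of course not limited to one day of logs and may also cover, for example, ten days.
Based on the description of Embodiment 2, in a first aspect, the data analysis unit 32 analyzes, from the second log information, the first click count of a media information display position within the first preset time range (the daytime click count) and the second click count within the second preset time range (the nighttime click count), and calculates the ratio of the daytime click count to the nighttime click count. When this ratio is less than the second threshold, the determination unit 33 determines that the corresponding media information display position is a suspected abnormal media information display position.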
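The day/night check in this first aspect can be sketched as follows. The daytime window of 08:00 to 22:00 and the function names are assumptions, since the text does not state the two preset time ranges. The sketch flags a position when the ratio of daytime to nighttime clicks is below the threshold, as described in this paragraph.

```python
def day_night_ratio(click_hours, day_start=8, day_end=22):
    """Ratio of daytime clicks to nighttime clicks for one display position.

    `click_hours` is a list of hour-of-day values (0 to 23), one per click.
    The daytime window [day_start, day_end) is a hypothetical choice.
    """
    day = sum(1 for hour in click_hours if day_start <= hour < day_end)
    night = len(click_hours) - day
    return day / night if night else float("inf")

def is_suspected_by_day_night(click_hours, ratio_threshold):
    """Flag the position when the day/night ratio is below the threshold."""
    return day_night_ratio(click_hours) < ratio_threshold
```

A position clicked mostly at night yields a small ratio and is flagged; a position with no nighttime clicks yields an infinite ratio and is never flagged by this check.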
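In a second aspect, the data analysis unit 32 analyzes the click position information of media information display positions from the second log information and calculates the first parameter (the entropy value) from that click position information. When the first parameter (the entropy value) is not within the preset threshold range, the determination unit 33 determines that the third media information display position is a suspected abnormal media information display position.
In a third aspect, for the abnormal terminals determined by the determination unit 33, the data analysis unit 32 analyzes the media information display positions involving those terminals and calculates, for each such position, the ratio of abnormal terminals to the total number of terminals; when this ratio is greater than the second threshold, the determination unit 33 marks the first media information display position as a suspected abnormal media information display position.
Combining the above three aspects, the determination unit 33 calculates the second parameter for each suspected abnormal media information display position using the weights set by the setting unit 34; the weight of the first parameter may be relatively large. Each parameter is multiplied by its weight and the products are summed to obtain the second parameter. The second parameter is compared with the third threshold; when it is greater than the third threshold, the suspected abnormal media information display position is determined to be an abnormal media information display position.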
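In one implementation, the server further includes a first penalty unit 35 configured to, after the determination unit 33 determines that the terminal corresponding to the first terminal information is an abnormal terminal, mark as invalid the click behavior of the abnormal terminal on media information display positions.
In another implementation, the server includes a second penalty unit 36 configured to, after the determination unit 33 determines that a suspected abnormal media information display position is an abnormal media information display position, mark as invalid the click behavior directed at the abnormal media information display position.
Those skilled in the art should understand that the functions of the processing units in the server of this embodiment can be understood with reference to the foregoing description of the information processing method. Each processing unit may be implemented by an analog circuit that realizes the functions described in the embodiments of the present invention, or by software that performs those functions running on an intelligent terminal.
In Embodiments 3 and 4 of the present invention, the data acquisition unit 31, data analysis unit 32, determination unit 33, setting unit 34, first penalty unit 35, and second penalty unit 36 in the server may in practice each be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA) of the server.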
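FIG. 13 is a schematic diagram of a specific modular structure of the server according to an embodiment of the present invention. As shown in FIG. 13, a cheating terminal identification module, a day/night click ratio anomaly identification module, and a click position distribution anomaly identification module each obtain log information from the log system: the cheating terminal identification module obtains hour-level log information, while the other two modules obtain day-level log information. Policy configuration information supplies the cheating decision policy to a policy update module. The three identification modules perform their analysis according to the cheating decision policy in the policy update module; the policy update module obtains the abnormal terminals and abnormal media information display positions (that is, abnormal advertising slots). After receiving a request from the billing module, the online real-time system applies penalties according to the decision results: clicks made by abnormal terminals are marked invalid, and clicks directed at abnormal media information display positions are marked invalid. The billing module does not charge for clicks marked invalid.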
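In this embodiment, an example of the server as a hardware entity is shown in FIG. 14. The server includes a processor 61, a storage medium 62, and at least one external communication interface 63, all connected through a bus 64.
It should be noted that the above description of the server is similar to the description of the method and has the same beneficial effects, which are not repeated here. For technical details not disclosed in the server embodiments of the present invention, refer to the description of the method embodiments.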
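In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in an actual implementation other divisions are possible: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as hardware plus software functional units.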
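A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. On this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.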

Claims (15)

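  1. An information processing method, characterized in that the method comprises:
    obtaining first log information within a first time period;
    obtaining, based on the first log information, terminal information of terminals that have click behavior on media information display positions;
    determining, based on the terminal information, region information corresponding to the terminal, wherein the region information indicates the region in which the terminal is located;
    determining, according to the region information, whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and
    obtaining, based on the determination result, first terminal information whose corresponding number of regions is greater than the first threshold, and determining that the terminal corresponding to the first terminal information is an abnormal terminal.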
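  2. The method according to claim 1, characterized in that the method further comprises:
    counting, among the media information display positions associated with abnormal terminals, the number of abnormal terminals and the total number of terminals corresponding to a first media information display position, wherein the first media information display position is any one of the media information display positions associated with abnormal terminals;
    calculating a first ratio of the number of abnormal terminals to the total number of terminals; and
    when the first ratio is greater than a second threshold, marking the first media information display position as a suspected abnormal media information display position.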
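  3. The method according to claim 2, characterized in that the method further comprises:
    obtaining second log information within a second time period, wherein the second time period is longer than the first time period;
    obtaining, based on the second log information, a first click count of a second media information display position within a first preset time range and a second click count within a second preset time range, wherein the first preset time range differs from the second preset time range;
    calculating a second ratio of the first click count to the second click count; and
    when the second ratio is less than the second threshold, determining that the second media information display position is a suspected abnormal media information display position.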
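  4. The method according to claim 3, characterized in that, after the second log information is obtained, the method further comprises:
    obtaining, based on the second log information, click position information of a third media information display position;
    calculating a first parameter from the click position information, wherein the first parameter characterizes the distribution of click positions on the third media information display position; and
    when the first parameter is not within a preset threshold range, determining that the third media information display position is a suspected abnormal media information display position.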
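  5. The method according to claim 4, characterized in that the method further comprises:
    setting weights for the first ratio, the second ratio, and the first parameter, respectively;
    calculating, from the first ratio, the second ratio, and the first parameter using the corresponding weights, a second parameter for a suspected abnormal media information display position; and
    when the second parameter is greater than a third threshold, determining that the suspected abnormal media information display position is an abnormal media information display position.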
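  6. The method according to claim 1, characterized in that, after the terminal corresponding to the first terminal information is determined to be an abnormal terminal, the method further comprises: marking as invalid the click behavior of the abnormal terminal on media information display positions.
  7. The method according to claim 5, characterized in that, after the suspected abnormal media information display position is determined to be an abnormal media information display position, the method further comprises: marking as invalid the click behavior directed at the abnormal media information display position.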
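  8. A server, characterized by comprising a data acquisition unit, a data analysis unit, and a determination unit, wherein:
    the data acquisition unit is configured to obtain first log information within a first time period;
    the data analysis unit is configured to obtain, based on the first log information, terminal information of terminals that have click behavior on media information display positions; determine, based on the terminal information, region information corresponding to the terminal, wherein the region information indicates the region in which the terminal is located; and determine whether the number of regions in which the terminal is located within a preset time range is greater than a first threshold; and
    the determination unit is configured to obtain, based on the result from the data analysis unit, first terminal information whose corresponding number of regions is greater than the first threshold, and determine that the terminal corresponding to the first terminal information is an abnormal terminal.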
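  9. The server according to claim 8, characterized in that the data analysis unit is further configured to count, among the media information display positions associated with abnormal terminals, the number of abnormal terminals and the total number of terminals corresponding to a first media information display position, wherein the first media information display position is any one of the media information display positions associated with abnormal terminals; and calculate a first ratio of the number of abnormal terminals to the total number of terminals; and
    the determination unit is further configured to mark the first media information display position as a suspected abnormal media information display position when the first ratio is greater than a second threshold.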
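  10. The server according to claim 9, characterized in that the data acquisition unit is further configured to obtain second log information within a second time period, wherein the second time period is longer than the first time period;
    the data analysis unit is further configured to obtain, based on the second log information, a first click count of a second media information display position within a first preset time range and a second click count within a second preset time range, wherein the first preset time range differs from the second preset time range; and calculate a second ratio of the first click count to the second click count; and
    the determination unit is further configured to determine that the second media information display position is a suspected abnormal media information display position when the second ratio is less than the second threshold.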
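  11. The server according to claim 10, characterized in that the data analysis unit is further configured to obtain, based on the second log information, click position information of a third media information display position; calculate a first parameter from the click position information, wherein the first parameter characterizes the distribution of click positions on the third media information display position; and determine whether the first parameter is within a preset threshold range; and
    the determination unit is further configured to determine that the third media information display position is a suspected abnormal media information display position when the first parameter is not within the preset threshold range.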
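  12. The server according to claim 11, characterized in that the server further comprises:
    a setting unit configured to set corresponding weights for the first ratio, the second ratio, and the first parameter, respectively; and
    the determination unit is further configured to calculate, from the first ratio, the second ratio, and the first parameter using the corresponding weights, a second parameter for a suspected abnormal media information display position; and, when the second parameter is greater than a third threshold, determine that the suspected abnormal media information display position is an abnormal media information display position.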
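  13. The server according to claim 8, characterized in that the server further comprises:
    a first penalty unit configured to, after the determination unit determines that the terminal corresponding to the first terminal information is an abnormal terminal, mark as invalid the click behavior of the abnormal terminal on media information display positions.
  14. The server according to claim 12, characterized in that the server comprises a second penalty unit configured to, after the determination unit determines that a suspected abnormal media information display position is an abnormal media information display position, mark as invalid the click behavior directed at the abnormal media information display position.
  15. A non-volatile storage medium storing program instructions, characterized in that, when a processor executes the stored program instructions, the information processing method according to any one of claims 1 to 7 is performed.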
PCT/CN2017/086126 2016-06-02 2017-05-26 Information processing method, server, and non-volatile storage medium WO2017206811A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP17805756.8A EP3471044A4 (en) 2016-06-02 2017-05-26 INFORMATION PROCESSING, SERVER AND NON-VOLATILE STORAGE MEDIUM
JP2018527752A JP6628376B2 (ja) 2016-06-02 2017-05-26 Information processing method, server, and non-volatile storage medium
US15/989,997 US11373205B2 (en) 2016-06-02 2018-05-25 Identifying and punishing cheating terminals that generate inflated hit rates

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610389956.4A CN106097000B (zh) 2016-06-02 2016-06-02 Information processing method and server
CN201610389956.4 2016-06-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/989,997 Continuation US11373205B2 (en) 2016-06-02 2018-05-25 Identifying and punishing cheating terminals that generate inflated hit rates

Publications (1)

Publication Number Publication Date
WO2017206811A1 (zh) 2017-12-07

Family

ID=57448565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086126 WO2017206811A1 (zh) 2016-06-02 2017-05-26 Information processing method, server, and non-volatile storage medium

Country Status (5)

Country Link
US (1) US11373205B2 (zh)
EP (1) EP3471044A4 (zh)
JP (1) JP6628376B2 (zh)
CN (1) CN106097000B (zh)
WO (1) WO2017206811A1 (zh)

Also Published As

Publication number Publication date
EP3471044A1 (en) 2019-04-17
JP6628376B2 (ja) 2020-01-08
US11373205B2 (en) 2022-06-28
EP3471044A4 (en) 2019-11-27
CN106097000A (zh) 2016-11-09
JP2019510283A (ja) 2019-04-11
US20180276709A1 (en) 2018-09-27
CN106097000B (zh) 2022-07-26

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018527752

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17805756

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE