WO2017114209A1 - Method and device for detecting tag data leakage channels - Google Patents

Method and device for detecting tag data leakage channels

Info

Publication number
WO2017114209A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
channel
label
tag
detection
Prior art date
Application number
PCT/CN2016/110714
Other languages
English (en)
French (fr)
Inventor
文镇
Original Assignee
Alibaba Group Holding Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to JP2018532787A (patent JP6895972B2)
Publication of WO2017114209A1
Priority to US16/020,872 (patent US10678946B2)
Priority to US16/874,012 (patent US11080427B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6272: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database by registering files or documents with a third party
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/12: Network monitoring probes

Definitions

  • the invention belongs to the technical field of data security, and in particular relates to a method and a device for detecting tag data leakage channels.
  • a tag is a form of Internet content organization: a keyword that is strongly correlated with the attributes of an object entity. Tags make it easy to describe and categorize content for retrieval and sharing.
  • user personal data (PII)
  • Existing data security technologies use encryption, system hardening, access control, and audit monitoring to prevent data from leaking out of the data owner's controllable environment.
  • in data cooperation scenarios, the data usually leaves the data owner's controllable environment and enters a partner's environment that the owner cannot control.
  • traditional database watermarking techniques and data trajectory tracking techniques cannot cope with massive, dynamic user tag data.
  • tag data is usually used in a scattered manner, which makes watermark detection difficult.
  • tag data is massive and dynamic, which also makes updating and detecting watermarks very challenging.
  • tag values are generally very common, so tracing them on the Internet is very difficult.
  • the object of the present invention is to provide a method and a device for detecting tag data leakage channels, so as to solve the technical problem that tag data is difficult to track and detect under prior-art schemes, and to effectively detect possible data leakage channels.
  • a tag data leakage channel detecting method is used for detecting a leak channel of user tag data, and the detecting method includes:
  • the push information received by the user is intercepted according to the probability that the push information is generated by the normal label of the user;
  • the intercepted push information is screened according to the probability that it was generated by the user's detection tags, and if the probability that the push information was generated by a detection tag of the user is higher than a given threshold, that detection tag is added to the suspected leaked tag set;
  • the channel index is searched according to the suspected leaked tag set, and the corresponding list of suspected leak channel IDs is obtained;
  • adding a detection tag to the user based on the normal tags owned by the user includes:
  • the probability that the newly added detection tag appears together with the user's existing tags is lower than the set first threshold.
  • assigning a detection tag to a given channel according to the user tag data set, and establishing a channel index that associates the user ID, the detection tag, and the channel ID, includes:
  • the user ID is used as a variable, and the detection tag corresponding to the channel is selected from the user's detection tags according to the selected hash function;
  • intercepting the push information received by the user according to the probability that the push information was generated by the user's normal tags includes: if that probability is lower than a set second threshold, the push information is intercepted; otherwise it is displayed to the user.
  • the detection method further includes a step of updating the user's detection tags according to changes in the user's normal tags, which specifically includes:
  • detection tags that have a high probability of appearing together with the user's new normal tags are deleted;
  • new detection tags are added for the user, and the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
  • the detection method further includes removing the entries related to the deleted detection tags from the channel index.
  • the present invention also provides a tag data leakage channel detecting device for detecting a leak channel of user tag data, the detecting device comprising:
  • the detection label adding module is configured to add a detection label to the user based on the normal label owned by the user, and generate a user label data set;
  • a channel association module configured to assign a detection label to a given channel according to the user label data set, and establish a channel index associated with the user ID, the detection label, and the channel ID;
  • the intercepting module is configured to intercept the push information received by the user according to the probability that the push information was generated by the user's normal tags;
  • the interception information analysis module is configured to screen the intercepted push information according to the probability that it was generated by the user's detection tags, and, if the probability that the push information was generated by a detection tag of the user is higher than a given threshold, to add that detection tag to the suspected leaked tag set;
  • a channel retrieval module configured to search the channel index according to the suspected leaked tag set, and obtain the corresponding list of suspected leak channel IDs;
  • the output module is configured to detect whether the push information is derived from a channel found by the search, and if so, delete the corresponding channel, and output the remaining channels as suspected leak channels.
  • when the detection tag adding module adds a detection tag to the user based on the normal tags owned by the user, the probability that the newly added detection tag appears together with the user's existing tags is lower than the set first threshold.
  • when the channel association module assigns a detection tag to a given channel according to the user tag data set, the following operations are performed:
  • the user ID is used as a variable, and the detection tag corresponding to the channel is selected from the user's detection tags according to the selected hash function;
  • when the intercepting module intercepts the push information received by the user according to the probability that the push information was generated by the user's normal tags, it performs the following operation: if that probability is lower than a set second threshold, the push information is intercepted; otherwise it is displayed to the user.
  • the detection tag adding module is further configured to update the user's detection tags according to changes in the user's normal tags, specifically by performing the following steps:
  • detection tags that have a high probability of appearing together with the user's new normal tags are deleted;
  • new detection tags are added for the user, and the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
  • the channel association module is further configured to remove the entries related to the deleted detection tags from the channel index.
  • the invention provides a method and a device for detecting tag data leakage channels, which use the different probabilities with which tags appear for the same user to generate different detection tags for different data usage channels. The use of the detection tags is then detected indirectly, and finally massive-data indexing and search techniques are used to effectively detect possible data leakage channels.
  • the detection method is highly efficient and can handle massive and dynamic user tag data.
  • FIG. 1 is a flow chart of a method for detecting a tag data leakage channel according to the present invention
  • FIG. 2 is a schematic structural view of a tag data leakage channel detecting device of the present invention.
  • When a user browses the Internet, the browsed web pages generate tags that indicate the user's preferences, and the development of the Internet has accumulated a large amount of user preference data represented by tags.
  • the invention adds a certain number of detection tags to each user on top of the user's normal tags.
  • when push information caused by a detection tag is found, the channel that leaked the user tag data can be found according to that push information.
  • the push information in this embodiment may include an advertisement, a pushed webpage, etc., and an advertisement is taken as an example for description.
  • a label data leakage channel detecting method as shown in FIG. 1 , includes:
  • Step S1 Add a detection label to the user based on the normal label owned by the user, and generate a user label data set.
  • tags that identify the user's preferences and are generated by the user's Internet access are referred to as normal tags
  • tags generated for the user in this step for subsequent detection are referred to as detection tags.
  • a detection tag does not represent the user's preferences and is used only for subsequent detection.
  • the user tag data set includes a normal tag and a detection tag.
  • each user needs to have enough detection tags to correspond to different channels.
  • a detection tag is generated for the user, so that the user's detection tag reaches a set number.
  • the user U1 has two normal tags, namely watching TV and junk food, and this embodiment requires two detection tags, so two detection tags are generated for U1, for example vegetables and hiking shoes.
  • a tag is generated whose probability of appearing together with the user's existing tags is lower than the set first threshold, and the tag is added to the user tag data set as a detection tag of the user.
  • Step S2 assign a detection label to the given channel according to the user label data set, and establish a channel index associated with the user ID, the detection label, and the channel ID.
  • the channel in this embodiment refers to a channel that uses user data. For example, when a network platform provides its own user data to an advertiser, the advertiser is a customer of the network platform and also a channel that uses the user data.
  • the credibility of a channel refers to how reliably the channel sends advertisements according to the user data. A channel that does not push advertisements based on the user data, but instead pushes advertisements the user is not interested in, is not credible. The unique ID of the channel can be used as the variable key to select a hash function H1 from a set of hash functions. Next, a user group is sampled based on the channel credibility; for a channel with high credibility, the sampled group can be smaller. Then, for each user of the sampled group, with the user ID as the key, the H1 function is used to select the detection tag corresponding to the channel from the user's detection tags.
  • for a given channel 1 whose sampled users include the user U1, a random value is calculated from the H1 function and the user ID of U1, and according to that value one detection tag is selected from all of U1's detection tags and assigned to channel 1.
  • for example, if the random value calculated by the H1 function is 1, the first detection tag in the ordering of U1's detection tags is selected and assigned to channel 1. It is assumed that U1's detection tag "vegetables" is assigned to channel 1.
  • detection tags are added to the user tag data set, and only the detection tag corresponding to a channel is assigned to that channel; for example, [U1, hiking shoes] is assigned to channel 2. If channel 2 pushes advertisements based on the user tag data set, advertisements sent according to either the normal tags or the detection tag [U1, hiking shoes] are considered safe. If an unauthorized party obtains the leaked user tag data and also sends the user advertisements for hiking shoes and the like, and the channel index shows that this unauthorized channel is not channel 2, the user tag data is considered to have leaked.
  • Step S3: intercept the push information received by the user according to the probability that the push information was generated by the user's normal tags.
  • the advertisement received by the user appears on the user's terminal, so detection of the advertisement can first be performed by a client on the user terminal.
  • security assistants are installed on many personal computers and smartphones, and the existing security assistants can be used directly to intercept advertisements on user terminals. It is of course also possible to develop a dedicated client for advertisement detection on the user terminal.
  • in step S2, the users who have not installed the security assistant are first filtered out of the user tag data set. That is, only users who have installed the security assistant are sampled, and those without it are not considered. This eliminates the need to develop an additional client and directly uses the user's security assistant to filter advertisements on the user terminal side.
  • the advertisement is filtered, that is, the advertisement received by the user is intercepted according to the probability that it was generated by the user's normal tags. If that probability is lower than the set threshold, processing proceeds to the next step; otherwise the advertisement is shown to the user.
  • the user's normal tags need to be synchronized to the security assistant on that user's client, so that the security assistant can intercept according to the probability that the normal tags generated the advertisement.
  • the probability that the normal tags generated the advertisement is generally calculated by the security assistant according to the degree of match between the advertisement source and the user's normal tags, and details are not described herein again.
  • advertisements whose probability of having been generated by the normal tags is lower than the set threshold are intercepted and sent to a dedicated back-end server for further processing.
  • Step S4: screen the intercepted push information according to the probability that it was generated by the user's detection tags. If the probability that the push information was generated by a detection tag of the user is higher than a given threshold, that detection tag is added to the suspected leaked tag set.
  • the screening is further based on the probability that the advertisement was generated by the user's detection tags. If the probability that the advertisement was generated by one of the user's detection tags is above a given threshold, that detection tag is added to the suspected leaked tag set.
  • for example, an advertisement for a trekking pole sent to the user U1 has a relatively low probability of having been generated by the normal tags "watching TV" and "junk food", so it is sent to the back-end server.
  • for the detection tag "hiking shoes" of the user U1, however, the probability that it generated the advertisement is relatively high, so [user U1, hiking shoes] is added to the suspected leaked tag set.
  • Step S5 Searching for the channel index according to the suspected leaked label set, and obtaining a corresponding suspected leak channel ID list.
  • the suspect tags are taken from the suspected leaked tag set and searched in the channel index to get a list of possible channel IDs.
  • in the example above, the suspected leaked tag [user U1, hiking shoes] is taken from the suspected leaked tag set; because the detection tag of channel 2 in the channel index is "hiking shoes", channel 2 is added to the list of suspected leak channel IDs.
  • Step S6: detect whether the push information is derived from a channel found by the search, and if so, delete the corresponding channel, and output the remaining channels as suspected leak channels.
  • the final channel list includes all possible tag data leakage channels.
  • for these channels, more investigative means can be used to collect evidence, such as including monitorable bait (honeypot) data in the shared data, combined with offline investigation.
  • after the user's normal tags are updated, the user's detection tags need to be updated as well.
  • the process of updating the user's detection tags in this embodiment is as follows:
  • the detection label with high probability of occurrence at the same time as the new normal label of the user is deleted;
  • the new detection label is newly added to the user, and the probability that the newly added detection label appears simultaneously with the existing label of the user is lower than the first threshold.
  • a tag data leakage channel detecting device is configured to detect a leak channel of user tag data, and the detecting device includes:
  • the detection label adding module is configured to add a detection label to the user based on the normal label owned by the user, and generate a user label data set;
  • a channel association module configured to assign a detection label to a given channel according to the user label data set, and establish a channel index associated with the user ID, the detection label, and the channel ID;
  • the intercepting module is configured to intercept the push information received by the user according to the probability of generating the push information by the normal label of the user;
  • the interception information analysis module is configured to screen the intercepted push information according to the probability that it was generated by the user's detection tags, and, if the probability that the push information was generated by a detection tag of the user is higher than a given threshold, to add that detection tag to the suspected leaked tag set;
  • a channel retrieval module configured to search for a channel index according to a suspected leaked label set, and obtain a corresponding suspected leak channel ID list;
  • the output module is configured to detect whether the push information is derived from the searched channel, and if so, delete the corresponding channel, and output the remaining channels as suspected leak channels.
  • the apparatus of this embodiment can be applied to a back-end server of an application system. The interception module can be integrated in the user terminal and intercept on the user terminal side, using either a third-party client such as a security guard or a dedicated client.
  • when the detection tag adding module adds a detection tag to the user based on the normal tags owned by the user, the probability that the newly added detection tag appears together with the user's existing tags is lower than the set first threshold. That is, the newly generated detection tag is dissimilar to the existing tags in the user tag set and has a low probability of appearing together with them, so the tags do not affect each other.
  • when the channel association module assigns a detection tag to a given channel according to the user tag data set, it performs the following operations:
  • the user ID is used as a variable, and the detection tag corresponding to the channel is selected from the user's detection tags according to the selected hash function;
  • when intercepting the push information received by the user according to the probability that the push information was generated by the user's normal tags, the intercepting module performs the following operation: if that probability is lower than a set second threshold, the push information is intercepted; otherwise it is displayed to the user.
  • the detection tag adding module is further configured to update the user's detection tags according to changes in the user's normal tags, specifically by performing the following steps:
  • the detection label with high probability of occurrence at the same time as the new normal label of the user is deleted;
  • the new detection label is newly added to the user, and the probability that the newly added detection label appears simultaneously with the existing label of the user is lower than the first threshold.
  • the channel association module of this embodiment is further configured to remove the deleted detection tag related item from the channel index. Therefore, when the user generates a new normal label, the user label set is updated in time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Storage Device Security (AREA)

Abstract

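The present invention discloses a method and a device for detecting tag data leakage channels. The method uses the different probabilities with which tags appear for the same user to generate different detection tags for different data usage channels, then indirectly detects the use of the detection tags, and finally uses massive-data indexing and search techniques to effectively detect possible data leakage channels. The device of the invention comprises a detection tag adding module, a channel association module, an interception module, an interception information analysis module, a channel retrieval module, and an output module. The detection method and device of the invention are efficient and can handle massive, dynamic user tag data.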

Description

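Method and device for detecting tag data leakage channels
Technical Field
The present invention belongs to the technical field of data security, and in particular relates to a method and a device for detecting tag data leakage channels.
Background
A tag is a form of Internet content organization: a keyword that is strongly correlated with the attributes of an object entity. Tags make it easy to describe and classify content for retrieval and sharing. The development of the Internet has accumulated a large amount of user preference data expressed as tags, and this data forms the basis of products such as Internet advertising and recommendation. On the other hand, because of its value, this data has become, together with other user personal data (PII), a target of data leakage, and is illegally obtained and resold. Existing data security technologies use encryption, system hardening, access control, and audit monitoring to prevent data from leaking out of the data owner's controllable environment. In data cooperation business scenarios, however, data usually leaves the data owner's controllable environment and enters a partner's environment that the owner cannot control. In this scenario, traditional database watermarking and data trajectory tracking technologies cannot cope with massive, dynamic user tag data.
Traditional database watermarking and data trajectory tracking technologies cannot effectively generate watermarks for data such as user tags, which lack numeric fields. Second, tag data is usually used in a scattered manner, which makes watermark detection difficult. In addition, tag data is massive and dynamic, which poses great challenges for updating and detecting watermarks. Finally, tag values are generally very common, so tracing them on the Internet is very difficult.
Summary of the Invention
The object of the present invention is to provide a method and a device for detecting tag data leakage channels, so as to solve the technical problem that tag data is difficult to track and detect under prior-art solutions, and to effectively detect possible data leakage channels.
To achieve the above object, the technical solution of the present invention is as follows: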
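A method for detecting tag data leakage channels, used to detect leakage channels of user tag data, the detection method comprising:
adding detection tags for a user on the basis of the normal tags the user owns, to generate a user tag data set;
assigning detection tags to a given channel according to the user tag data set, and establishing a channel index that associates user IDs, detection tags, and channel IDs;
intercepting push information received by the user according to the probability that the push information was generated from the user's normal tags;
screening the intercepted push information according to the probability that it was generated from the user's detection tags, and, if the probability that the push information was generated from a detection tag of the user is higher than a given threshold, adding that detection tag to a suspected leaked tag set;
searching the channel index according to the suspected leaked tag set to obtain the corresponding list of suspected leak channel IDs;
detecting whether the push information originates from a channel found by the search and, if so, deleting the corresponding channel, and outputting the remaining channels as suspected leak channels.
Further, adding detection tags for the user on the basis of the normal tags the user owns comprises:
the probability that a newly added detection tag appears together with the user's existing tags is lower than a set first threshold.
Further, assigning detection tags to a given channel according to the user tag data set, and establishing a channel index that associates user IDs, detection tags, and channel IDs, comprises:
for the given channel, calculating its credibility according to its historical behavior;
using the channel ID of the channel as a variable, selecting one hash function from a set hash function set;
sampling a user group based on the channel credibility;
for each user in the sampled user group, using the user ID as a variable, selecting the detection tag corresponding to the channel from the user's detection tags according to the selected hash function;
establishing a channel index from [user ID, detection tag] to channel ID.
Further, intercepting the push information received by the user according to the probability that the push information was generated from the user's normal tags comprises:
if the probability that the push information was generated from the normal tags is lower than a set second threshold, intercepting it; otherwise, displaying the push information to the user.
Further, the detection method also comprises a step of updating the user's detection tags according to changes in the user's normal tags, specifically comprising:
according to the probability that the new normal tags appear together with the existing detection tags, deleting the detection tags that have a high probability of appearing together with the user's new normal tags;
adding new detection tags for the user, where the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
Further, the detection method also comprises:
removing the entries related to the deleted detection tags from the channel index.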
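The present invention also provides a device for detecting tag data leakage channels, used to detect leakage channels of user tag data, the detection device comprising:
a detection tag adding module, configured to add detection tags for a user on the basis of the normal tags the user owns, to generate a user tag data set;
a channel association module, configured to assign detection tags to a given channel according to the user tag data set, and to establish a channel index that associates user IDs, detection tags, and channel IDs;
an interception module, configured to intercept push information received by the user according to the probability that the push information was generated from the user's normal tags;
an interception information analysis module, configured to screen the intercepted push information according to the probability that it was generated from the user's detection tags, and, if the probability that the push information was generated from a detection tag of the user is higher than a given threshold, to add that detection tag to a suspected leaked tag set;
a channel retrieval module, configured to search the channel index according to the suspected leaked tag set to obtain the corresponding list of suspected leak channel IDs;
an output module, configured to detect whether the push information originates from a channel found by the search and, if so, to delete the corresponding channel, and to output the remaining channels as suspected leak channels.
Further, when the detection tag adding module adds detection tags for the user on the basis of the normal tags the user owns, the probability that a newly added detection tag appears together with the user's existing tags is lower than a set first threshold.
Further, when assigning detection tags to a given channel according to the user tag data set, the channel association module performs the following operations:
for the given channel, calculating its credibility according to its historical behavior;
using the channel ID of the channel as a variable, selecting one hash function from a set hash function set;
sampling a user group based on the channel credibility;
for each user in the sampled user group, using the user ID as a variable, selecting the detection tag corresponding to the channel from the user's detection tags according to the selected hash function;
establishing a channel index from [user ID, detection tag] to channel ID.
Further, when intercepting the push information received by the user according to the probability that the push information was generated from the user's normal tags, the interception module performs the following operation:
if the probability that the push information was generated from the normal tags is lower than a set second threshold, intercepting it; otherwise, displaying the push information to the user.
Further, the detection tag adding module is also configured to update the user's detection tags according to changes in the user's normal tags, specifically by performing the following steps:
according to the probability that the new normal tags appear together with the existing detection tags, deleting the detection tags that have a high probability of appearing together with the user's new normal tags;
adding new detection tags for the user, where the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
Further, the channel association module is also configured to remove the entries related to the deleted detection tags from the channel index.
The present invention provides a method and a device for detecting tag data leakage channels, which use the different probabilities with which tags appear for the same user to generate different detection tags for different data usage channels. The use of the detection tags is then detected indirectly, and finally massive-data indexing and search techniques are used to effectively detect possible data leakage channels. The detection method is efficient and can handle massive, dynamic user tag data.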
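Brief Description of the Drawings
FIG. 1 is a flowchart of the method for detecting tag data leakage channels according to the present invention;
FIG. 2 is a schematic structural diagram of the device for detecting tag data leakage channels according to the present invention.
Detailed Description
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments. The following embodiments do not limit the present invention.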
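When a user browses the Internet, the web pages browsed generate tags that represent the user's preferences, and the development of the Internet has accumulated a large amount of user preference data expressed as tags. On top of the normal tags a user owns, the present invention adds a certain number of detection tags for each user. When push information caused by a detection tag is found, the channel through which the user tag data leaked can be traced from that push information. In this embodiment, push information may include advertisements, pushed web pages, and the like; advertisements are used as the example below.
The method for detecting tag data leakage channels of this embodiment, as shown in FIG. 1, comprises:
Step S1: add detection tags for the user on the basis of the normal tags the user owns, to generate a user tag data set.
In this embodiment, tags that identify user preferences and are generated by the user's Internet activity are called normal tags, while tags generated for the user in this step for subsequent detection are called detection tags. Detection tags obviously do not represent the user's preferences and are used only for subsequent detection. The user tag data set includes both normal tags and detection tags.
To facilitate subsequent analysis, each user needs enough detection tags to correspond to different channels. Therefore, when a user does not have enough detection tags, detection tags are generated for the user until the number of detection tags reaches a set number.
For example, user U1 has two normal tags, watching TV and junk food, and this embodiment requires two detection tags, so two detection tags are generated for U1, for example vegetables and hiking shoes.
The specific process of generating detection tags for a user is as follows:
Determine whether the user tag data set contains the specified number of detection tags. If the specified number has been reached, end; otherwise, go to the next step;
Generate a tag whose probability of appearing together with the user's existing tags is lower than the set first threshold, and add it to the user tag data set as a detection tag of the user.
When generating a new detection tag, a tag must be found among the common tags that has a low probability of appearing together with the user's existing normal tags and existing detection tags. That is, the newly generated detection tag is dissimilar to every existing tag in the user's tag set and has a low probability of appearing together with them.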
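The generation loop in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how the co-occurrence probability is estimated, so `cooccurrence_probability` simply assumes it is the fraction of users in a sample population who hold both tags, and the function names, the candidate list, and the threshold value are all hypothetical.

```python
def cooccurrence_probability(tag_a, tag_b, population):
    """Assumed estimate: fraction of users in the population holding both tags."""
    if not population:
        return 0.0
    both = sum(1 for tags in population if tag_a in tags and tag_b in tags)
    return both / len(population)


def add_detection_tags(normal_tags, detection_tags, candidates,
                       population, required_count, first_threshold):
    """Add candidates as detection tags until required_count is reached.

    A candidate is accepted only if its co-occurrence probability with every
    existing tag (normal and detection) is below first_threshold.
    """
    detection = list(detection_tags)
    for candidate in candidates:
        if len(detection) >= required_count:
            break  # the user already has the specified number of detection tags
        existing = set(normal_tags) | set(detection)
        if candidate in existing:
            continue
        if all(cooccurrence_probability(candidate, tag, population) < first_threshold
               for tag in existing):
            detection.append(candidate)
    return detection


# Hypothetical sample population of user tag sets.
population = [
    {"watching TV", "junk food"},
    {"watching TV", "junk food", "soda"},
    {"vegetables", "yoga"},
    {"hiking shoes", "camping"},
]
result = add_detection_tags(["watching TV", "junk food"], [],
                            ["soda", "vegetables", "hiking shoes"],
                            population, required_count=2, first_threshold=0.2)
print(result)
```

In this sample, "soda" is rejected because it co-occurs with "watching TV" for one user in four (0.25, not below 0.2), while "vegetables" and "hiking shoes" never co-occur with U1's existing tags and are accepted.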
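Step S2: assign detection tags to a given channel according to the user tag data set, and establish a channel index that associates user IDs, detection tags, and channel IDs.
For a given channel, its credibility can be calculated from its historical behavior. A channel in this embodiment is a channel that uses user data. For example, when a network platform provides its user data to an advertiser, the advertiser is a customer of the network platform and is also a channel that uses the user data. The credibility of a channel refers to how reliably the channel sends advertisements based on the user data: a channel that does not push advertisements based on the user data, but instead pushes advertisements the user is not interested in, is not credible. The unique ID of the channel can be used as the variable key to select a hash function H1 from a set hash function set. Next, a user group is sampled based on the channel credibility; for a channel with high credibility, the sampled group can be smaller. Then, for each user in the sampled group, with the user ID as the key, the function H1 is used to select the detection tag corresponding to the channel from the user's detection tag set.
For example, for a given channel 1 whose sampled users include user U1, a random value is computed from the function H1 and the user ID of U1, and one detection tag is selected from all of U1's detection tags according to that value and assigned to channel 1. For example, if the random value computed by H1 for channel 1 is 1, the first detection tag in the ordering of U1's detection tags is selected and assigned to channel 1. Suppose that U1's detection tag "vegetables" is assigned to channel 1.
Similarly, U1's detection tag "hiking shoes" is assigned to channel 2.
In this way a channel index from [user ID, detection tag] to channel ID can be established, that is, a record is created in the channel index, for example the channel index shown in Table 1:
No.  [User ID, detection tag]  Channel ID
1    [U1, vegetables]          Channel 1
2    [U1, hiking shoes]        Channel 2
Table 1
Detection tags are added to the user tag data set, and only the detection tag corresponding to a channel is assigned to that channel; for example, [U1, hiking shoes] is assigned to channel 2. If channel 2 pushes advertisements according to the user tag data set, advertisements sent on the basis of either the normal tags or the detection tag [U1, hiking shoes] are considered safe. If, however, an unauthorized party obtains the leaked user tag data and also sends the user advertisements for hiking shoes and the like, and the channel index shows that this unauthorized channel is not channel 2, the user tag data is considered to have leaked.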
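One way the hash-based assignment in step S2 might look is sketched below. The patent does not name the hash functions or the sampling procedure, so everything concrete here is an assumption: the hash function set is three algorithms from Python's `hashlib`, the channel ID is itself hashed to pick one of them, and the sampled users per channel are passed in already chosen. Because two channels can hash to the same tag for the same user, the sketch maps each [user ID, detection tag] key to a set of channel IDs.

```python
import hashlib

# Hypothetical "set hash function set"; the source does not list its members.
HASH_FUNCTION_SET = ("sha256", "sha512", "blake2b")


def select_hash_name(channel_id):
    """Pick one hash function from the set, keyed by the channel ID."""
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return HASH_FUNCTION_SET[digest[0] % len(HASH_FUNCTION_SET)]


def pick_detection_tag(hash_name, user_id, detection_tags):
    """Use the selected hash of the user ID to choose one detection tag."""
    digest = hashlib.new(hash_name, user_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") % len(detection_tags)
    return detection_tags[position]


def build_channel_index(sampled_users_by_channel, detection_tags_by_user):
    """Map (user ID, detection tag) to the set of channel IDs given that tag."""
    index = {}
    for channel_id, users in sampled_users_by_channel.items():
        hash_name = select_hash_name(channel_id)
        for user_id in users:
            tag = pick_detection_tag(hash_name, user_id,
                                     detection_tags_by_user[user_id])
            index.setdefault((user_id, tag), set()).add(channel_id)
    return index


detection_tags_by_user = {"U1": ["vegetables", "hiking shoes"]}
sampled = {"channel-1": ["U1"], "channel-2": ["U1"]}
channel_index = build_channel_index(sampled, detection_tags_by_user)
```

The assignment is deterministic, so the index can be rebuilt from the channel IDs and user IDs alone. Which tag each channel receives depends on the hash values, so this sketch will not necessarily reproduce the exact pairing shown in Table 1.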
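Step S3: intercept the push information received by the user according to the probability that the push information was generated from the user's normal tags.
In general, because the terminal a user uses to access the Internet is on the user's side, the advertisements the user receives appear on the user's terminal, so advertisement detection can first be performed by a client on the user terminal. For example, many personal computers and smartphones now have a security assistant installed, and the existing security assistant can be used directly to intercept advertisements on the user terminal. A dedicated client can of course also be developed for advertisement detection on the user terminal.
When intercepting advertisements, if the probability that an advertisement was generated from the normal tags is lower than the set second threshold, the advertisement is intercepted; otherwise the advertisement is displayed to the user.
It is easy to understand that if the security assistant already on user terminals is used, then in step S2 the users who have not installed the security assistant must first be filtered out of the user tag data set. That is, only users who have installed the security assistant are sampled, and users without it are not considered. In this way no additional client needs to be developed, and the user's security assistant is used directly to filter advertisements on the user terminal side.
Specifically, advertisements are filtered, that is, advertisements received by the user are intercepted according to the probability that they were generated from the user's normal tags. If the probability that an advertisement was generated from the normal tags is lower than the set threshold, processing proceeds to the next step; otherwise the advertisement is displayed to the user.
Note that the user's normal tags need to be synchronized to the security assistant on that user's client, so that the security assistant can intercept according to the probability that the normal tags generated the advertisement. This probability is generally calculated by the security assistant from the degree of match between the advertisement's source and the user's normal tags, and is not described further here. Advertisements whose probability of having been generated from the normal tags is lower than the set threshold are intercepted and sent to a dedicated back-end server for further processing.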
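The client-side decision in step S3 reduces to one threshold comparison, sketched below. The patent leaves the probability calculation unspecified, so this sketch assumes a hypothetical lookup table `relevance` that scores how likely a tag is to produce an advertisement on a given topic, and takes the best score over the user's normal tags as the probability. The table contents and the threshold are illustrative only.

```python
def generation_probability(ad_topic, tags, relevance):
    """Assumed score: highest relevance between the ad topic and any given tag."""
    return max((relevance.get((tag, ad_topic), 0.0) for tag in tags), default=0.0)


def handle_ad(ad_topic, normal_tags, relevance, second_threshold):
    """Return "intercept" when the normal tags are unlikely to explain the ad."""
    probability = generation_probability(ad_topic, normal_tags, relevance)
    return "intercept" if probability < second_threshold else "show"


# Hypothetical relevance scores between (tag, ad topic) pairs.
relevance = {
    ("junk food", "pizza coupon"): 0.9,
    ("hiking shoes", "trekking pole"): 0.8,
}
normal_tags = ["watching TV", "junk food"]
print(handle_ad("trekking pole", normal_tags, relevance, second_threshold=0.3))
print(handle_ad("pizza coupon", normal_tags, relevance, second_threshold=0.3))
```

Only the normal tags are consulted here, which matches the text: the client sees the synchronized normal tags, and intercepted advertisements are forwarded to the back-end server for the detection-tag screening.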
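Step S4: screen the intercepted push information according to the probability that it was generated from the user's detection tags. If the probability that the push information was generated from a detection tag of the user is higher than a given threshold, add that detection tag to the suspected leaked tag set.
Advertisements sent to the back-end server are further screened according to the probability that they were generated from the user's detection tags. If the probability that an advertisement was generated from one of the user's detection tags is higher than the given threshold, that detection tag is added to the suspected leaked tag set.
For example, an advertisement for a trekking pole sent to user U1 has a relatively low probability of having been generated from the normal tags "watching TV" and "junk food", so it is sent to the back-end server. For U1's detection tag "hiking shoes", however, the probability that it generated the advertisement is relatively high, so [user U1, hiking shoes] is added to the suspected leaked tag set.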
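The server-side screening in step S4 can be sketched in the same style. As before, the scoring is an assumption: `relevance` is a hypothetical table of (tag, ad topic) scores standing in for the unspecified probability, and the threshold value is illustrative.

```python
def screen_intercepted_ad(user_id, ad_topic, detection_tags,
                          relevance, threshold, suspected_tags):
    """Add (user ID, detection tag) pairs that likely produced the ad."""
    for tag in detection_tags:
        if relevance.get((tag, ad_topic), 0.0) > threshold:
            suspected_tags.add((user_id, tag))
    return suspected_tags


# Hypothetical relevance scores between (tag, ad topic) pairs.
relevance = {("hiking shoes", "trekking pole"): 0.8}
suspected = screen_intercepted_ad("U1", "trekking pole",
                                  ["vegetables", "hiking shoes"],
                                  relevance, threshold=0.5,
                                  suspected_tags=set())
print(suspected)
```

With these sample values the trekking pole advertisement is attributed to "hiking shoes" but not to "vegetables", which mirrors the worked example in the text.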
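Step S5: search the channel index according to the suspected leaked tag set to obtain the corresponding list of suspected leak channel IDs.
Next, the suspected tags are taken from the suspected leaked tag set and searched for in the channel index, giving a ranked list of possible channel IDs.
In the example above, the suspected leaked tag [user U1, hiking shoes] is taken from the suspected leaked tag set. Because the detection tag of channel 2 in the channel index is "hiking shoes", channel 2 is added to the list of suspected leak channel IDs.
Step S6: detect whether the push information originates from a channel found by the search. If so, delete the corresponding channel, and output the remaining channels as suspected leak channels.
Finally, it is necessary to check whether the source of the advertisement on the user terminal is channel 2. If it is, this is a compliant case and channel 2 is deleted from the channel list.
The final channel list includes all possible tag data leakage channels. For these channels, further investigative means can be used to collect evidence, such as adding monitorable bait (honeypot) data to the shared data, combined with offline investigation.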
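Steps S5 and S6 amount to an index lookup followed by removal of the legitimate sender, as in the sketch below. The index layout (a dictionary from (user ID, detection tag) to a set of channel IDs) and the channel names are assumptions made for illustration; the source only says that the index associates the three fields.

```python
def suspected_leak_channels(suspected_tags, channel_index, ad_source_channel):
    """Look up channels holding the suspected tags, minus the ad's own source."""
    candidates = set()
    for key in suspected_tags:
        candidates |= channel_index.get(key, set())
    # A channel that both holds the tag and sent the ad is the compliant case.
    candidates.discard(ad_source_channel)
    return sorted(candidates)


channel_index = {
    ("U1", "vegetables"): {"channel-1"},
    ("U1", "hiking shoes"): {"channel-2"},
}
suspected_tags = {("U1", "hiking shoes")}

# Ad came from a party that was never given this tag: channel-2 is suspected.
print(suspected_leak_channels(suspected_tags, channel_index, "unknown-advertiser"))
# Ad came from channel-2 itself: compliant, nothing to report.
print(suspected_leak_channels(suspected_tags, channel_index, "channel-2"))
```

The output names the channels that were entrusted with the tag, since an advertisement driven by that tag from any other source suggests the data left through one of them.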
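Further, because a user's normal tags are updated frequently, the user's detection tags need to be updated after the normal tags are updated. The process of updating a user's detection tags in this embodiment is as follows:
According to the probability that the new normal tags appear together with the existing detection tags, delete the detection tags that have a high probability of appearing together with the user's new normal tags;
Add new detection tags for the user, where the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
Correspondingly, the channel index also needs to be updated:
Remove the entries related to the deleted detection tags from the channel index.
The channel index is thereby updated, so that the next time advertisements are intercepted, the new channel index is used to detect suspected leak channels.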
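The update procedure can be sketched as below. Two points are assumptions rather than statements from the source: the text only says tags with a "high" co-occurrence probability are deleted, so this sketch reuses the first threshold as the cut-off, and the pairwise probabilities are supplied in a hypothetical `cooccurrence` table keyed by unordered tag pairs. Replacement tags would then be generated as in step S1.

```python
def refresh_detection_tags(user_id, new_normal_tags, detection_tags,
                           cooccurrence, first_threshold, channel_index):
    """Drop detection tags that now co-occur with new normal tags.

    Returns (kept, removed) and deletes the removed tags' index entries.
    """
    kept, removed = [], []
    for tag in detection_tags:
        too_close = any(
            cooccurrence.get(frozenset((tag, normal)), 0.0) >= first_threshold
            for normal in new_normal_tags
        )
        (removed if too_close else kept).append(tag)
    for tag in removed:
        channel_index.pop((user_id, tag), None)  # remove the related index entry
    return kept, removed


# Hypothetical pairwise co-occurrence probabilities.
cooccurrence = {frozenset(("hiking shoes", "camping")): 0.6}
channel_index = {
    ("U1", "vegetables"): {"channel-1"},
    ("U1", "hiking shoes"): {"channel-2"},
}
kept, removed = refresh_detection_tags("U1", ["camping"],
                                       ["vegetables", "hiking shoes"],
                                       cooccurrence, 0.2, channel_index)
print(kept, removed)
```

After U1 gains the normal tag "camping", "hiking shoes" no longer works as a detection tag, because a legitimate advertiser could plausibly send hiking advertisements, so both the tag and its channel index entry are dropped.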
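As shown in FIG. 2, a device for detecting tag data leakage channels is used to detect leakage channels of user tag data, and the detection device comprises:
a detection tag adding module, configured to add detection tags for a user on the basis of the normal tags the user owns, to generate a user tag data set;
a channel association module, configured to assign detection tags to a given channel according to the user tag data set, and to establish a channel index that associates user IDs, detection tags, and channel IDs;
an interception module, configured to intercept push information received by the user according to the probability that the push information was generated from the user's normal tags;
an interception information analysis module, configured to screen the intercepted push information according to the probability that it was generated from the user's detection tags, and, if the probability that the push information was generated from a detection tag of the user is higher than a given threshold, to add that detection tag to a suspected leaked tag set;
a channel retrieval module, configured to search the channel index according to the suspected leaked tag set to obtain the corresponding list of suspected leak channel IDs;
an output module, configured to detect whether the push information originates from a channel found by the search and, if so, to delete the corresponding channel, and to output the remaining channels as suspected leak channels.
It is easy to understand that the device of this embodiment can be applied to the back-end server of an application system. The interception module can be integrated in the user terminal and intercept on the user terminal side, using either a third-party client such as a security guard or a dedicated client.
In this embodiment, when the detection tag adding module adds detection tags for the user on the basis of the normal tags the user owns, the probability that a newly added detection tag appears together with the user's existing tags is lower than the set first threshold. That is, the newly generated detection tag is dissimilar to every existing tag in the user's tag set and has a low probability of appearing together with them, so the tags do not affect each other.
In this embodiment, when assigning detection tags to a given channel according to the user tag data set, the channel association module performs the following operations:
for the given channel, calculating its credibility according to its historical behavior;
using the channel ID of the channel as a variable, selecting one hash function from a set hash function set;
sampling a user group based on the channel credibility;
for each user in the sampled user group, using the user ID as a variable, selecting the detection tag corresponding to the channel from the user's detection tags according to the selected hash function;
establishing a channel index from [user ID, detection tag] to channel ID.
In this embodiment, when intercepting the push information received by the user according to the probability that the push information was generated from the user's normal tags, the interception module performs the following operation:
if the probability that the push information was generated from the normal tags is lower than the set second threshold, intercepting it; otherwise, displaying the push information to the user.
In this embodiment, the detection tag adding module is also configured to update the user's detection tags according to changes in the user's normal tags, specifically by performing the following steps:
according to the probability that the new normal tags appear together with the existing detection tags, deleting the detection tags that have a high probability of appearing together with the user's new normal tags;
adding new detection tags for the user, where the probability that a newly added detection tag appears together with the user's existing tags is lower than the first threshold.
In this embodiment, the channel association module is also configured to remove the entries related to the deleted detection tags from the channel index, so that the user tag set is updated promptly when the user gains new normal tags.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Without departing from the spirit and essence of the present invention, those skilled in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the claims appended to the present invention.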

Claims (12)

  1. 一种标签数据泄漏渠道检测方法,用于检测用户标签数据的泄漏渠道,其特征在于,所述检测方法包括:
    在用户拥有的正常标签基础上为用户添加检测标签,生成用户标签数据集;
    根据用户标签数据集为给定的渠道赋予检测标签,建立用户ID、检测标签和渠道ID相关联的渠道索引;
    根据由用户正常标签产生推送信息的概率,对用户接收的推送信息进行拦截;
    对于拦截的推送信息,根据由该用户检测标签产生该推送信息的概率进行筛选,如果由该用户检测标签产生该推送信息的概率高于给定的阈值,则将该用户检测标签加入到疑似泄漏标签集合;
    根据疑似泄漏标签集合,搜索渠道索引,得到对应的疑似泄漏渠道ID列表;
    检测该推送信息是否来源于所搜索到的渠道,如果是,删除对应的渠道,将剩下的渠道作为疑似泄漏渠道输出。
  2. 根据权利要求1所述的标签数据泄漏渠道检测方法,其特征在于,所述在用户拥有的正常标签基础上为用户添加检测标签,包括:
    新添加的检测标签与用户现有标签同时出现的概率低于设定的第一阈值。
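  3. The label data leakage channel detection method according to claim 1, characterized in that the assigning detection labels to a given channel according to the user label data set, and establishing a channel index in which user IDs, detection labels, and channel IDs are associated comprises:
    for the given channel, calculating its credibility according to its historical behavior;
    selecting a HASH function from a preset set of HASH functions, using the channel ID of the channel as the variable;
    sampling a user group based on the channel credibility;
    for each user in the sampled user group, using the user ID as the variable, selecting the detection label corresponding to the channel from the user's detection labels according to the selected HASH function;
    establishing a channel index from [user ID, detection label] to channel ID.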
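  4. The label data leakage channel detection method according to claim 1, characterized in that the intercepting push information received by the user according to the probability that the push information is generated from the user's normal labels comprises:
    if the probability that the push information is generated from the normal labels is lower than a set second threshold, intercepting the push information; otherwise, displaying the push information to the user.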
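  5. The label data leakage channel detection method according to claim 1, characterized in that the detection method further comprises a step of updating the user's detection labels according to changes in the user's normal labels, which specifically comprises:
    according to the probability that the new normal labels and the existing detection labels appear at the same time, deleting the detection labels that have a high probability of appearing together with the user's new normal labels;
    adding new detection labels for the user again, where the probability that a newly added detection label appears together with the user's existing labels is lower than the first threshold.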
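  6. The label data leakage channel detection method according to claim 5, characterized in that the detection method further comprises:
    removing from the channel index the entries related to the deleted detection labels.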
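  7. A label data leakage channel detection apparatus for detecting a leakage channel of user label data, characterized in that the detection apparatus comprises:
    a detection label adding module, configured to add detection labels for a user on the basis of the normal labels the user has, to generate a user label data set;
    a channel association module, configured to assign detection labels to a given channel according to the user label data set, and to establish a channel index in which user IDs, detection labels, and channel IDs are associated;
    an interception module, configured to intercept push information received by the user according to the probability that the push information is generated from the user's normal labels;
    an intercepted information analysis module, configured to screen the intercepted push information according to the probability that the push information is generated from the user's detection labels, and if the probability that the push information is generated from a detection label of the user is higher than a given threshold, to add that detection label of the user to a suspected leakage label set;
    a channel retrieval module, configured to search the channel index according to the suspected leakage label set to obtain a corresponding list of suspected leakage channel IDs;
    an output module, configured to detect whether the push information originates from a channel found in the search, and if so, to delete the corresponding channel and output the remaining channels as suspected leakage channels.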
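  8. The label data leakage channel detection apparatus according to claim 7, characterized in that when the detection label adding module adds detection labels for a user on the basis of the normal labels the user has, the probability that a newly added detection label appears together with the user's existing labels is lower than a set first threshold.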
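  9. The label data leakage channel detection apparatus according to claim 7, characterized in that when the channel association module assigns detection labels to a given channel according to the user label data set, it performs the following operations:
    for the given channel, calculating its credibility according to its historical behavior;
    selecting a HASH function from a preset set of HASH functions, using the channel ID of the channel as the variable;
    sampling a user group based on the channel credibility;
    for each user in the sampled user group, using the user ID as the variable, selecting the detection label corresponding to the channel from the user's detection labels according to the selected HASH function;
    establishing a channel index from [user ID, detection label] to channel ID.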
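  10. The label data leakage channel detection apparatus according to claim 7, characterized in that when the interception module intercepts push information received by the user according to the probability that the push information is generated from the user's normal labels, it performs the following operation:
    if the probability that the push information is generated from the normal labels is lower than a set second threshold, intercepting the push information; otherwise, displaying the push information to the user.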
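  11. The label data leakage channel detection apparatus according to claim 7, characterized in that the detection label adding module is further configured to update the user's detection labels according to changes in the user's normal labels, and specifically performs the following steps:
    according to the probability that the new normal labels and the existing detection labels appear at the same time, deleting the detection labels that have a high probability of appearing together with the user's new normal labels;
    adding new detection labels for the user again, where the probability that a newly added detection label appears together with the user's existing labels is lower than the first threshold.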
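  12. The label data leakage channel detection apparatus according to claim 11, characterized in that the channel association module is further configured to remove from the channel index the entries related to the deleted detection labels.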
PCT/CN2016/110714 2015-12-31 2016-12-19 Method and apparatus for detecting label data leakage channel WO2017114209A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2018532787A JP6895972B2 (ja) 2015-12-31 2016-12-19 Method and apparatus for detecting label data leakage channel
US16/020,872 US10678946B2 (en) 2015-12-31 2018-06-27 Method and apparatus for detecting label data leakage channel
US16/874,012 US11080427B2 (en) 2015-12-31 2020-05-14 Method and apparatus for detecting label data leakage channel

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511028180.5 2015-12-31
CN201511028180.5A CN106933880B (zh) 2015-12-31 2015-12-31 Method and apparatus for detecting label data leakage channel

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/020,872 Continuation US10678946B2 (en) 2015-12-31 2018-06-27 Method and apparatus for detecting label data leakage channel

Publications (1)

Publication Number Publication Date
WO2017114209A1 true WO2017114209A1 (zh) 2017-07-06

Family

ID=59225617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110714 WO2017114209A1 (zh) 2015-12-31 2016-12-19 Method and apparatus for detecting label data leakage channel

Country Status (4)

Country Link
US (2) US10678946B2 (zh)
JP (1) JP6895972B2 (zh)
CN (1) CN106933880B (zh)
WO (1) WO2017114209A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678946B2 (en) 2015-12-31 2020-06-09 Alibaba Group Holding Limited Method and apparatus for detecting label data leakage channel

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020009861A1 (en) 2018-07-02 2020-01-09 Walmart Apollo, Llc Systems and methods for detecting exposed data
CN109739889B (zh) * 2018-12-27 2020-12-08 北京三未信安科技发展有限公司 一种基于数据映射的数据泄漏溯源判定方法及系统
CN117528154B (zh) * 2024-01-04 2024-03-29 湖南快乐阳光互动娱乐传媒有限公司 一种视频投放方法、装置、电子设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7668821B1 (en) * 2005-11-17 2010-02-23 Amazon Technologies, Inc. Recommendations based on item tagging activities of users
CN104778419A (zh) * 2015-04-15 2015-07-15 华中科技大学 云环境下基于动态数据流跟踪的用户隐私数据保护方法
CN104965890A (zh) * 2015-06-17 2015-10-07 深圳市腾讯计算机系统有限公司 广告推荐的方法和装置

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005149126A (ja) * 2003-11-14 2005-06-09 Sony Corp 情報取得システム、情報取得方法、及び情報処理プログラム
JP2005222135A (ja) * 2004-02-03 2005-08-18 Internatl Business Mach Corp <Ibm> データベースアクセス監視装置、情報流出元特定システム、データベースアクセス監視方法、情報流出元特定方法、およびプログラム
US8893300B2 (en) * 2010-09-20 2014-11-18 Georgia Tech Research Corporation Security systems and methods to reduce data leaks in enterprise networks
JP2012150652A (ja) * 2011-01-19 2012-08-09 Kddi Corp インフルエンサー抽出装置、インフルエンサー抽出方法およびプログラム
US8799227B2 (en) * 2011-11-11 2014-08-05 Blackberry Limited Presenting metadata from multiple perimeters
JP5572646B2 (ja) * 2012-02-10 2014-08-13 ヤフー株式会社 情報提供装置、情報提供方法および情報提供プログラム
US9349015B1 (en) * 2012-06-12 2016-05-24 Galois, Inc. Programmatically detecting collusion-based security policy violations
CN103581863B (zh) 2012-08-08 2018-06-22 中兴通讯股份有限公司 扣费方法及装置
JP5921693B2 (ja) * 2012-08-09 2016-05-24 日本電信電話株式会社 トレースセンタ装置
CN103870000B (zh) * 2012-12-11 2018-12-14 百度国际科技(深圳)有限公司 一种对输入法所产生的候选项进行排序的方法及装置
US9444719B2 (en) * 2013-03-05 2016-09-13 Comcast Cable Communications, Llc Remote detection and measurement of data signal leakage
CN103237018A (zh) * 2013-03-29 2013-08-07 东莞宇龙通信科技有限公司 一种客户端匹配方法、服务器及通信系统
CN103281403A (zh) * 2013-06-19 2013-09-04 浙江工商大学 一种在网络销售渠道中提高个人信息安全的云保护系统
EP2998901B1 (en) * 2013-07-05 2020-06-17 Nippon Telegraph and Telephone Corporation Unauthorized-access detection system and unauthorized-access detection method
US9208551B2 (en) * 2013-08-28 2015-12-08 Intuit Inc. Method and system for providing efficient feedback regarding captured optical image quality
US10108918B2 (en) * 2013-09-19 2018-10-23 Acxiom Corporation Method and system for inferring risk of data leakage from third-party tags
CN103581883A (zh) * 2013-10-31 2014-02-12 宇龙计算机通信科技(深圳)有限公司 通信终端及其应用数据的获取方法
CN103581190B (zh) * 2013-11-07 2016-04-27 江南大学 一种基于云计算技术的文件安全访问控制方法
CN103593465A (zh) * 2013-11-26 2014-02-19 北京网秦天下科技有限公司 用于诊断应用推广渠道异常的方法和设备
US9256727B1 (en) * 2014-02-20 2016-02-09 Symantec Corporation Systems and methods for detecting data leaks
JP6215095B2 (ja) * 2014-03-14 2017-10-18 株式会社日立製作所 情報システム
CN104133837B (zh) * 2014-06-24 2017-10-31 上海交通大学 一种基于分布式计算的互联网信息投放渠道优化系统
CN106933880B (zh) 2015-12-31 2020-08-11 阿里巴巴集团控股有限公司 一种标签数据泄漏渠道检测方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678946B2 (en) 2015-12-31 2020-06-09 Alibaba Group Holding Limited Method and apparatus for detecting label data leakage channel
US11080427B2 (en) 2015-12-31 2021-08-03 Alibaba Group Holding Limited Method and apparatus for detecting label data leakage channel

Also Published As

Publication number Publication date
CN106933880B (zh) 2020-08-11
JP6895972B2 (ja) 2021-06-30
JP2019508779A (ja) 2019-03-28
CN106933880A (zh) 2017-07-07
US20180314856A1 (en) 2018-11-01
US10678946B2 (en) 2020-06-09
US11080427B2 (en) 2021-08-03
US20200272765A1 (en) 2020-08-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880998

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018532787

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880998

Country of ref document: EP

Kind code of ref document: A1