WO2019019711A1 - Method and apparatus for publishing behaviour pattern data, terminal device and medium

Info

Publication number
WO2019019711A1
WO2019019711A1 PCT/CN2018/083551 CN2018083551W WO2019019711A1 WO 2019019711 A1 WO2019019711 A1 WO 2019019711A1 CN 2018083551 W CN2018083551 W CN 2018083551W WO 2019019711 A1 WO2019019711 A1 WO 2019019711A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior pattern
noise
data
pattern distribution
behavior
Prior art date
Application number
PCT/CN2018/083551
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
吴天博
黄章成
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019019711A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries

Definitions

  • The present application belongs to the field of data processing and in particular relates to a method, an apparatus, a terminal device and a medium for publishing behavior pattern data.
  • Because social media data includes not only the information published by users but also the relationships between users, an attacker can mine that information and those relationships to match a user's identity in the social network, and can thereby obtain the user's private information comprehensively.
  • However, if these social media data are provided to suitable researchers, they can use data mining and analysis methods to extract information that is valuable for driving social progress.
  • In view of this, the embodiments of the present application provide a method, an apparatus, a terminal device and a medium for publishing behavior pattern data, so as to solve the problem in the prior art that behavior pattern data with mining value cannot be published while the privacy of users is protected.
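  • A first aspect of the embodiments of the present application provides a method for publishing behavior pattern data, including: acquiring social media data published by multiple users; establishing a behavior pattern distribution function corresponding to the social media data; integrating a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function; inputting a behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data; and publishing the noise-added behavior pattern data.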
  • A second aspect of the embodiments of the present application provides an apparatus for publishing behavior pattern data, the apparatus comprising units for performing the method for publishing behavior pattern data according to the first aspect.
  • A third aspect of the embodiments of the present application provides a terminal device, including a memory and a processor, where the memory stores computer readable instructions executable on the processor, and the processor, when executing the computer readable instructions, implements the steps of the method for publishing behavior pattern data according to the first aspect.
  • A fourth aspect of the embodiments of the present application provides a computer readable storage medium storing computer readable instructions which, when executed by a processor, implement the steps of the method for publishing behavior pattern data according to the first aspect.
  • By establishing a behavior pattern distribution function corresponding to the social media data, behavior pattern data that are not related to each other can be generalized. By integrating the random noise function into the behavior pattern distribution function, all behavior pattern data published on the basis of the noise-added behavior pattern distribution function carry noise.
  • Therefore, even if an attacker steals the published noise-added behavior pattern data, the attacker still cannot accurately match that data to individual users, which strengthens the users' personal privacy security.
  • At the same time, the noise-added behavior pattern data still retains the reference value of the original behavior pattern data, which ensures that researchers can perform effective analysis and mining based on the noise-added behavior pattern data.
  • FIG. 1 is an implementation flowchart of a method for publishing behavior pattern data provided by an embodiment of the present application.
  • FIG. 2 is a specific implementation flowchart of a method for publishing behavior pattern data provided by an embodiment of the present application.
  • FIG. 3 is a specific implementation flowchart of a method for publishing behavior pattern data according to an embodiment of the present application.
  • FIG. 4 is a structural block diagram of an apparatus for distributing behavior pattern data according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • FIG. 1 is a flowchart showing an implementation process of a method for distributing behavior pattern data provided by an embodiment of the present application, which is described in detail as follows:
  • S101 Acquire social media data published by multiple users.
  • All data published by a user on a social platform such as a forum, a blog, a microblog, a chat room or a circle of friends is social media data. Because each social platform collects and stores the social media data published by its users, the stored data can be read by calling a preset platform docking interface.
  • Social media data published by social accounts that share the same registered mailbox are aggregated and counted together.
  • This aggregation yields the social media data corresponding to each real user.
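The aggregation step can be illustrated with a minimal sketch. The function name `aggregate_by_mailbox`, the account identifiers and the sample data are hypothetical; the text only states that accounts with the same registered mailbox are merged, not how the data is laid out.

```python
from collections import defaultdict

def aggregate_by_mailbox(account_mailboxes, posts):
    """Merge posts from accounts that share a registered mailbox.

    account_mailboxes: dict mapping account id -> registered mailbox
    posts: iterable of (account id, post text) pairs
    Returns a dict mapping mailbox -> list of post texts, one entry per real user.
    """
    per_user = defaultdict(list)
    for account_id, text in posts:
        mailbox = account_mailboxes[account_id]
        per_user[mailbox].append(text)
    return dict(per_user)

# Hypothetical data: two accounts on different platforms belong to one person.
account_mailboxes = {
    "blog:tom": "tom@example.com",
    "forum:tommy": "tom@example.com",
    "blog:lily": "lily@example.com",
}
posts = [
    ("blog:tom", "Watched the football match"),
    ("forum:tommy", "Rain in London again"),
    ("blog:lily", "New pasta recipe"),
]
merged = aggregate_by_mailbox(account_mailboxes, posts)
```

Here `merged` holds one list of posts per mailbox, so the two accounts registered to the same mailbox are treated as a single real user.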
  • S102 Establish a behavior pattern distribution function corresponding to the social media data.
  • The social media data published by each user on each social platform is analyzed to determine that user's behavior pattern distribution.
  • The behavior pattern distribution indicates the probability with which the user publishes each category of social media data.
  • The categories may be, for example, sports, music, food and travel.
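As a rough illustration of such a distribution, the sketch below turns a list of per-post category labels into publication probabilities. It assumes each post has already been assigned a category; how that assignment is made is not specified here, and the function name is hypothetical.

```python
from collections import Counter

def behavior_pattern_distribution(post_categories):
    """Return the probability of each category among a user's posts."""
    counts = Counter(post_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no posts, so no distribution
    return {category: n / total for category, n in counts.items()}

# Hypothetical labels for four posts by one user.
distribution = behavior_pattern_distribution(["sports", "sports", "music", "food"])
```

The probabilities sum to 1 by construction, so the result can be compared across users who publish different numbers of posts.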
  • The users' behavior pattern distributions are then modeled. Specifically, the behavior pattern distribution of each user is traversed and then processed by an existing behavior pattern modeling algorithm to generate a behavior pattern distribution function that describes the behavior patterns of multiple users.
  • the foregoing S102 specifically includes:
  • Each user has one corresponding user tag.
  • User tags are used to identify different users.
  • A user tag can be, for example, the user's social account number or user name.
  • If the social media data obtained in S101 was published by N users (N is an integer greater than zero), the N users constitute one user set, and the user tag of each user is recorded in the user set.
  • A subset is a user set obtained by dividing the plurality of users; that is, each subset is itself a user set containing one or more user tags.
  • For example, if social media data published by 8 users is obtained and the user tags of the 8 users are "Tom", "Susan", "Sue", "Jack", "Bob", "John", "Zoe" and "Lily", the user set composed of these 8 users is {Tom; Susan; Sue; Jack; Bob; John; Zoe; Lily}.
  • Subsets of this user set are, for example, {Tom; Susan; Sue} and {Jack; Bob; John}.
  • S202 Parse the social media data corresponding to each user set to obtain the behavior pattern data of that user set.
  • The social media data corresponding to a user set is the social media data published by the users in that set. Because different user sets contain different user tags, the social media data corresponding to different user sets also differs.
  • Processing the social media data corresponding to each user set determines the behavior pattern data of each user in that set.
  • The behavior pattern data indicates the frequency of occurrence and the weight of words of each category in the social media data published by the user.
  • S203 Input the behavior pattern data of each user set into a preset behavior pattern distribution model.
  • The above behavior pattern distribution model, formula (1), uses the following quantities:
  • x is a set of users;
  • y is a set containing the words in the social media data;
  • f_j(x, y) is the proportion of the j-th feature of the user in y;
  • λ_{j,y} is the attribute parameter corresponding to the j-th feature of the user;
  • Z(x) is a normalization factor;
  • F is the total number of features of the user set x.
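The formula itself is not reproduced in this text. Given the quantities listed above and the later statement that the attribute parameters are fixed by the maximum entropy principle, a plausible reconstruction is the standard maximum-entropy (log-linear) form below. The exact shape of formula (1), the use of λ for the attribute parameter, and the notation p(y | x) are assumptions.

```latex
\[
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{j=1}^{F} \lambda_{j,y}\, f_j(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{j=1}^{F} \lambda_{j,y'}\, f_j(x, y') \Big)
\]
```

Under this form, Z(x) is exactly the factor that makes the values of p(y | x) sum to 1 over all y, which matches its description as a normalization factor.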
  • The social media data published by each user in the user set x is segmented into words, yielding a plurality of words for each user.
  • Social media data often contains words with no actual semantics, which have no reference value for data mining. The stop words are therefore removed from the segmented words to obtain the keywords corresponding to each user.
  • The first vector, formed by the keywords corresponding to a user, is referred to as that user's information state space; its elements are all distinct. For example, the information state space corresponding to the user Tom is [apple, London, Messi, football, rain].
  • n is the total number of elements in the user's information state space.
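The word segmentation, stop-word removal and de-duplication steps can be sketched as follows. This is only an illustration: the regular-expression tokenizer, the lower-casing and the tiny `STOP_WORDS` set are assumptions, and text in languages without word spacing would need a dedicated segmenter.

```python
import re

# Illustrative stop-word list; a real deployment would use a much larger one.
STOP_WORDS = {"the", "a", "in", "and", "is", "of", "to", "it"}

def information_state_space(posts, stop_words=STOP_WORDS):
    """Return the distinct keywords of a user's posts, in first-seen order."""
    seen = set()
    keywords = []
    for text in posts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in stop_words or word in seen:
                continue  # drop stop words and repeated keywords
            seen.add(word)
            keywords.append(word)
    return keywords

# Hypothetical posts by the user Tom.
tom_space = information_state_space(
    ["The apple in London", "Messi and the football", "It is rain in London"]
)
```

Keeping first-seen order makes the position of each keyword stable, so a second vector of per-keyword values can be aligned with it index by index.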
  • Each feature corresponding to the user set is obtained by a Linear Discriminant Analysis algorithm; a feature represents the category of a keyword.
  • Each feature has one preset lexicon and corresponds to one attribute parameter λ.
  • It is determined whether each keyword in the user's information state space exists in the lexicon. If one or more keywords exist in the lexicon, the element values corresponding to those keywords in the second vector are obtained and added together, and the product of the resulting sum and the attribute parameter λ of the feature corresponding to the lexicon is output as a behavior feature distribution value of the user.
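A minimal sketch of this lexicon lookup is given below. The definition of the second vector is not reproduced in this text, so the sketch assumes it holds one numeric value per keyword of the information state space (for example a relative frequency); the function name, the lexicon contents and the numbers are hypothetical.

```python
def behavior_feature_value(state_space, second_vector, lexicon, attribute_parameter):
    """Sum the second-vector values of keywords found in the feature's lexicon,
    then scale the sum by the feature's attribute parameter."""
    matched_sum = sum(
        value
        for keyword, value in zip(state_space, second_vector)
        if keyword in lexicon
    )
    return attribute_parameter * matched_sum

# Hypothetical inputs for the user Tom and a "sports" feature.
state_space = ["apple", "london", "messi", "football", "rain"]
second_vector = [0.1, 0.2, 0.3, 0.3, 0.1]      # assumed per-keyword values
sports_lexicon = {"messi", "football", "tennis"}
sports_value = behavior_feature_value(state_space, second_vector, sports_lexicon, 2.0)
```

With these inputs only "messi" and "football" match, so the value is 2.0 times the sum of their two entries.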
  • Substituting these values into the above behavior pattern distribution model gives a behavior pattern distribution model expressed in terms of the attribute parameters λ.
  • Each attribute parameter in the behavior pattern distribution model is determined based on the maximum entropy principle, where each attribute parameter corresponds to one feature. Because the maximum entropy principle is a common technical means in the art, it is not described in detail here.
  • S204 may be implemented as follows: adjust each attribute parameter in the behavior pattern distribution model multiple times; after each adjustment, calculate the entropy of the current behavior pattern distribution model and record the attribute parameters at that point; then determine the largest of the calculated entropy values and select the attribute parameters corresponding to it.
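The adjust, score and select loop described for S204 can be sketched as below. The sketch assumes the log-linear model form (an exponential of the weighted feature sum divided by a normalization factor), which is a reconstruction rather than a formula given in this text, and it takes the candidate parameter settings as given. Practical maximum-entropy fitting usually also constrains the model to match observed feature statistics, which the description here does not cover.

```python
import math

def model_distribution(params, features):
    """p(y) proportional to exp(sum_j params[y][j] * features[y][j])."""
    scores = {
        y: sum(lam * f for lam, f in zip(params[y], features[y]))
        for y in features
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(weights.values())   # normalization factor Z(x)
    return {y: w / z for y, w in weights.items()}

def entropy(distribution):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)

def select_parameters(candidates, features):
    """Return the candidate parameters whose model has the largest entropy."""
    return max(candidates, key=lambda p: entropy(model_distribution(p, features)))

# Hypothetical feature values and two candidate parameter settings.
features = {"sports": [1.0, 0.0], "music": [0.0, 1.0]}
candidates = [
    {"sports": [2.0, 0.0], "music": [0.0, 0.0]},
    {"sports": [1.0, 0.0], "music": [0.0, 1.0]},
]
best = select_parameters(candidates, features)
```

In this toy case the second candidate gives both categories the same score, so its distribution is uniform and has the larger entropy.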
  • S205 Output the behavior pattern distribution model with the determined attribute parameters as the behavior pattern distribution function corresponding to the social media data.
  • Once determined, the attribute parameter corresponding to each feature is a constant value. Substituting these constant values back into the above formula (1) therefore gives the behavior pattern distribution function with the user set x and the set y as variables.
  • The behavior pattern data corresponding to each user set is fitted by the preset behavior pattern distribution model, which generates the behavior pattern distribution function once the attribute parameters are determined. Because a normalization factor is included in the behavior pattern distribution model, over-fitting of the behavior pattern data is avoided and the accuracy of the behavior pattern distribution function is improved, which ensures that the behavior pattern data obtained after the subsequent noise-adding processing still reflects the overall distribution of behavioral attributes and retains its data mining value.
  • the obtained noise-added behavior pattern distribution function A(x) is as follows:
  • The preset random noise function may be based on a mechanism such as the Laplace mechanism or the exponential mechanism.
  • The preset random noise function follows a Laplace distribution whose scale parameter is determined by Δf and ε, where Δf is the global sensitivity of the behavior pattern distribution function and ε is a preset privacy protection budget parameter.
  • The privacy protection budget parameter ε controls the ratio of the probabilities with which the behavior pattern distribution function produces the same output on two adjacent user sets, and thus reflects the level of privacy protection that the function can provide.
  • When ε is 0, the behavior pattern distribution function outputs results whose probability distributions on the two adjacent user sets are exactly the same.
  • Such results, however, reflect no useful information about the data set. Therefore, to balance the security and the usability of the output, the privacy protection budget parameter ε defaults to 0.1.
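The formulas for A(x) and for the noise function are not reproduced in this text. Assuming the standard Laplace mechanism of differential privacy, with scale parameter Δf/ε (an assumption consistent with the description, not a quotation), they can be written as follows, where η denotes the random noise value and S is any set of possible outputs:

```latex
\[
\begin{aligned}
A(x) &= F(x) + \eta, \qquad \eta \sim \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right), \\
p(\eta) &= \frac{\varepsilon}{2\,\Delta f}\, \exp\!\left(-\frac{\varepsilon\,|\eta|}{\Delta f}\right), \\
\Pr[A(x) \in S] &\le e^{\varepsilon}\, \Pr[A(x') \in S] \quad \text{for adjacent user sets } x, x'.
\end{aligned}
\]
```

The last line is the probability-ratio bound that ε controls: a smaller ε forces the two output distributions closer together, and at ε = 0 they coincide. With the default ε = 0.1 and, for instance, Δf = 1, the noise scale Δf/ε is 10.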
  • When the preset random noise function is based on the Laplace mechanism, the global sensitivity Δf of the user set x is calculated first and then combined with the preset privacy protection budget parameter ε to determine a random noise function that follows the Laplace distribution.
  • The global sensitivity Δf of the user set x is calculated by a formula in which:
  • F(x) is the above behavior pattern distribution function.
  • ‖F(x) − F(x')‖ is the first-order norm distance between F(x) and F(x').
  • x and x' are adjacent data sets, that is, the total numbers of set elements corresponding to F(x) and F(x') differ by at most 1.
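The sensitivity formula is likewise missing from this text. The standard definition of global sensitivity in differential privacy, which matches the three descriptions above, is the largest first-order norm distance over all pairs of adjacent data sets; presenting it as the formula used here is an assumption.

```latex
\[
\Delta f = \max_{x,\,x' \text{ adjacent}} \big\lVert F(x) - F(x') \big\rVert_{1}
\]
```

Intuitively, Δf bounds how much adding or removing a single user can change the output of F, which is why the noise scale grows with it.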
  • The random noise function is added to the behavior pattern distribution function to protect potential private user information in the behavior pattern data.
  • When an attacker queries the behavior pattern data, the noise-adding process is not visible to the attacker. Thus, even if the attacker has obtained any of the noise-added behavior pattern data, the attacker still cannot infer the user's original behavior pattern data, which removes the possibility of private information leaking from the data source.
  • S104 Input the behavior mode data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  • the behavior mode query parameter may be automatically generated, or may be obtained from a data query request sent by the querier.
  • the noise-added behavior pattern data corresponding to the behavior pattern data query parameter can be directly calculated.
  • FIG. 3 shows a specific implementation process of the method for publishing behavior pattern data provided by an embodiment of the present application, which is described in detail below.
  • S301 Acquire a data query request carrying a behavior pattern data query parameter, where the behavior pattern data query parameter includes a user set and a feature.
  • S303 Input the random parameter and the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  • The user's behavior pattern data can be calculated by the behavior pattern distribution function F(x). For each feature in the user set, a random parameter a is generated and input into the random noise function, which calculates a random noise value based on the global sensitivity Δf of the user set x and the preset privacy protection budget parameter ε. Adding the random noise value to the user's behavior pattern data gives the noise-added behavior pattern data.
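The query path can be sketched as follows. The sketch assumes that the random parameter a is drawn uniformly from (-0.5, 0.5) and mapped to Laplace noise by inverse-CDF sampling with scale Δf/ε; the text does not state how a is distributed or how the noise function uses it, so this is one plausible reading. The function names and sample values are hypothetical.

```python
import math
import random

def laplace_noise(a, sensitivity, epsilon):
    """Map a in (-0.5, 0.5) to a Laplace(0, sensitivity / epsilon) sample."""
    scale = sensitivity / epsilon
    sign = 1.0 if a >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(a))

def noisy_behavior_pattern(true_values, sensitivity, epsilon=0.1, rng=random):
    """Add independent Laplace noise to each feature's behavior pattern value."""
    noisy = {}
    for feature, value in true_values.items():
        a = rng.random() - 0.5          # random parameter a in [-0.5, 0.5)
        while a == -0.5:                # avoid log(0) at the boundary
            a = rng.random() - 0.5
        noisy[feature] = value + laplace_noise(a, sensitivity, epsilon)
    return noisy

# Hypothetical true values F(x) for one user set and two features.
true_values = {"sports": 0.6, "music": 0.4}
released = noisy_behavior_pattern(true_values, sensitivity=1.0, rng=random.Random(0))
```

Only `released` would be handed to the querier; the values in `true_values` and the drawn parameters a stay on the publishing side.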
  • Publishing the noise-added behavior pattern data means sharing it with any external user while satisfying privacy protection. This includes responding to behavior pattern data query requests issued by a querier and actively publishing all noise-added behavior pattern data on a preset data sharing platform.
  • FIG. 4 is a block diagram showing the structure of the apparatus for publishing behavior pattern data provided in the embodiment of the present application. For convenience of description, only the parts related to the embodiment of the present application are shown.
  • the apparatus includes:
  • the obtaining unit 41 is configured to acquire social media data published by a plurality of users.
  • the establishing unit 42 is configured to establish a behavior pattern distribution function corresponding to the social media data.
  • the noise adding unit 43 is configured to integrate the preset random noise function with the behavior pattern distribution function to obtain a noise added behavior pattern distribution function.
  • the input unit 44 is configured to input the behavior mode data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  • the issuing unit 45 is configured to release the noise-added behavior mode data.
  • the establishing unit 42 includes:
  • the parsing subunit is configured to parse the social media data corresponding to each of the user sets to obtain behavior pattern data of the user set.
  • the first input subunit is configured to input behavior mode data of each of the user sets into a preset behavior pattern distribution model.
  • an output subunit configured to output the behavior pattern distribution model determined by the attribute parameter as a behavior pattern distribution function corresponding to the social media data.
  • The behavior pattern distribution model is the same as in the method embodiment: x is a set of users, y is a set containing the words in the social media data, f_j(x, y) is the proportion of the j-th feature of the user in y, λ_{j,y} is the attribute parameter corresponding to the j-th feature of the user, and Z(x) is a normalization factor.
  • the input unit 44 includes:
  • the obtaining subunit is configured to obtain a data query request carrying a behavior mode data query parameter, where the behavior mode data query parameter includes a user set and a feature.
  • a second input subunit configured to input the random parameter and the behavior mode data query parameter into the noise added behavior mode distribution function to obtain the noise added behavior pattern data.
  • The preset random noise function follows a Laplace distribution whose scale parameter is determined by Δf and ε, where Δf is the global sensitivity of the behavior pattern distribution function and ε is a preset privacy protection budget parameter.
  • FIG. 5 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • The terminal device 5 of this embodiment includes a processor 50 and a memory 51; the memory 51 stores computer readable instructions 52 executable on the processor 50, such as a behavior pattern data publishing program.
  • When executing the computer readable instructions 52, the processor 50 implements the steps in the foregoing embodiments of the method for publishing behavior pattern data, such as steps S101 to S105 shown in FIG. 1.
  • Alternatively, when executing the computer readable instructions 52, the processor 50 implements the functions of the modules/units in the apparatus embodiments described above, such as the functions of the units 41 to 45 shown in FIG. 4.
  • The computer readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to complete this application.
  • the one or more modules/units may be a series of computer readable instruction segments capable of performing a particular function, the instruction segments being used to describe the execution of the computer readable instructions 52 in the terminal device 5.
  • the terminal device 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • The terminal device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will understand that FIG. 5 is only an example of the terminal device 5 and does not limit it; the terminal device 5 may include more or fewer components than illustrated, combine some components, or use different components.
  • the terminal device may further include an input/output device, a network access device, a bus, and the like.
  • the processor 50 may be a central processing unit (CPU), or may be other general-purpose processors, a digital signal processor (DSP), an application specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5.
  • The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk equipped on the terminal device 5, a smart memory card (SMC), a secure digital (SD) card or a flash card. Further, the memory 51 may include both an internal storage unit of the terminal device 5 and an external storage device.
  • the memory 51 is configured to store the computer readable instructions and other programs and data required by the terminal device.
  • the memory 51 can also be used to temporarily store data that has been output or is about to be output.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • The computer readable storage medium includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided are a method and apparatus for publishing behaviour pattern data, a terminal device and a medium, which are suitable for the field of data processing. The method comprises: acquiring social media data published by a plurality of users; establishing a behaviour pattern distribution function corresponding to the social media data; integrating a pre-set random noise function and the behaviour pattern distribution function, so as to obtain a noise-added behaviour pattern distribution function; inputting behaviour pattern data query parameters into the noise-added behaviour pattern distribution function, so as to obtain noise-added behaviour pattern data; and publishing the noise-added behaviour pattern data. According to the solution, by integrating a random noise function with a behaviour pattern distribution function, all behaviour pattern data published on the basis of a noise-added behaviour pattern distribution function have noise, and thus the privacy security of users is enhanced. At the same time, the noise-added behaviour pattern data still retain the reference value of original behaviour pattern data, and thus it is ensured that scientific research personnel can carry out effective analysis and mining based on the noise-added behaviour pattern data.

Description

Method, apparatus, terminal device and medium for publishing behavior pattern data
This application claims priority to Chinese Patent Application No. 201710605631.X, filed with the Chinese Patent Office on July 24, 2017 and entitled "Method for publishing behavior pattern data and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the field of data processing and in particular relates to a method, an apparatus, a terminal device and a medium for publishing behavior pattern data.
Background
The development of social media has made users' personal privacy increasingly easy for others to obtain. Specifically, because social media data includes not only the information published by users but also the relationships between users, an attacker can mine that information and those relationships to match a user's identity in the social network, and can thereby obtain the user's private information comprehensively. However, if these social media data are provided to suitable researchers, they can use data mining and analysis methods to extract information that is valuable for driving social progress.
In the prior art, protecting users' privacy usually requires encrypting the user behavior pattern data corresponding to the social media data. Under this approach, it is difficult for researchers to analyze users' behavior patterns on the basis of the encrypted data. Protecting the personal privacy concentrated in social media data and analyzing and using that data to promote social progress are therefore mutually exclusive. In the prior art, behavior pattern data with mining value cannot be published while the privacy of users is protected.
Technical Problem
In view of this, the embodiments of the present application provide a method, an apparatus, a terminal device and a medium for publishing behavior pattern data, so as to solve the problem in the prior art that behavior pattern data with mining value cannot be published while the privacy of users is protected.
技术解决方案Technical solution
A first aspect of the embodiments of the present application provides a method for publishing behavior pattern data, including:
acquiring social media data published by a plurality of users;
establishing a behavior pattern distribution function corresponding to the social media data;
integrating a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
inputting behavior pattern data query parameters into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data;
publishing the noise-added behavior pattern data.
A second aspect of the embodiments of the present application provides an apparatus for publishing behavior pattern data, the apparatus including units for performing the method for publishing behavior pattern data according to the first aspect.
A third aspect of the embodiments of the present application provides a terminal device including a memory and a processor, where the memory stores computer readable instructions executable on the processor, and the processor, when executing the computer readable instructions, implements the steps of the method for publishing behavior pattern data according to the first aspect.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium storing computer readable instructions which, when executed by a processor, implement the steps of the method for publishing behavior pattern data according to the first aspect.
Beneficial Effects
In the embodiments of the present application, establishing the behavior pattern distribution function corresponding to the social media data makes it possible to summarize behavior pattern data that are otherwise unrelated to one another. Integrating the random noise function into the behavior pattern distribution function ensures that all behavior pattern data published on the basis of the noise-added behavior pattern distribution function carries noise. Therefore, even if an attacker steals the published noise-added behavior pattern data, the attacker still cannot accurately match that data to individual users, which strengthens the security of users' personal privacy. At the same time, the noise-added behavior pattern data retains the reference value of the original behavior pattern data, so researchers can still perform effective analysis and mining on it.
Brief Description of the Drawings
FIG. 1 is a flowchart of an implementation of the method for publishing behavior pattern data provided by an embodiment of the present application;
FIG. 2 is a flowchart of a specific implementation of step S102 of the method for publishing behavior pattern data provided by an embodiment of the present application;
FIG. 3 is a flowchart of a specific implementation of step S104 of the method for publishing behavior pattern data provided by an embodiment of the present application;
FIG. 4 is a structural block diagram of the apparatus for publishing behavior pattern data provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the terminal device provided by an embodiment of the present application.
Embodiments of the Invention
To explain the technical solutions described in the present application, specific embodiments are described below.
FIG. 1 shows the implementation flow of the method for publishing behavior pattern data provided by an embodiment of the present application, detailed as follows:
S101: Acquire social media data published by a plurality of users.
In the embodiments of the present application, all data that a user publishes on social platforms such as forums, blogs, microblogs, chat rooms, or Moments (friend circles) is social media data. When a social platform receives social media data published by a user, it collects and stores that data, so the social media data stored by each social platform can be read by calling a preset platform docking interface.
A single user often has social accounts on several social platforms, and those accounts are frequently registered with the same e-mail address. Therefore, the social media data published by the social accounts that share a registered e-mail address is grouped and tallied, so as to aggregate the social media data corresponding to each real user.
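The aggregation by registered e-mail address can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record layout (`platform`, `email`, `posts`) and the function name `merge_by_email` are assumptions introduced here.

```python
from collections import defaultdict

def merge_by_email(accounts):
    """Group the posts of all accounts that share a registration e-mail.

    `accounts` is a hypothetical list of dicts with keys
    'platform', 'email' and 'posts'.
    """
    merged = defaultdict(list)
    for account in accounts:
        # Normalise the address so that case or stray spaces do not split one user.
        key = account["email"].strip().lower()
        merged[key].extend(account["posts"])
    return dict(merged)

accounts = [
    {"platform": "forum", "email": "tom@example.com", "posts": ["football tonight"]},
    {"platform": "blog", "email": "Tom@example.com", "posts": ["rain in London"]},
    {"platform": "forum", "email": "sue@example.com", "posts": ["new album"]},
]
users = merge_by_email(accounts)
```

Here the two accounts registered under the same address collapse into one real user, while the third account remains separate.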
S102: Establish a behavior pattern distribution function corresponding to the social media data.
The social media data published by each user on the various social platforms is analyzed to determine the user's behavior pattern distribution. The behavior pattern distribution represents the probability with which social media data of each category is published. The categories may be, for example, sports, music, food, and travel.
To summarize the overall state of the behavior pattern distributions across the social media data of multiple users, the users' behavior pattern distributions are modeled. Specifically, the behavior pattern distribution of each user is first traversed and then processed by an existing behavior pattern modeling algorithm, thereby generating a behavior pattern distribution function that can describe the behavior patterns of multiple users.
As an embodiment of the present application, as shown in FIG. 2, S102 specifically includes:
S201: Divide the plurality of users into a plurality of user sets.
In the embodiments of the present application, each user has one corresponding user tag. User tags are used to identify different users. A user tag may be, for example, the user's social account or user name.
If the social media data obtained in S101 was published by N users (N being an integer greater than zero), the N users constitute one user set. The user tags are recorded in the user set.
The subsets of this user set are then obtained. Each subset is a user set resulting from dividing the plurality of users; that is, each subset is likewise a user set containing one or more user tags.
For example, if social media data published by 8 users is obtained and the user tags of the 8 users are "Tom", "Susan", "Sue", "Jack", "Bob", "John", "Zoe", and "Lily", then the user set formed by these 8 users is {Tom; Susan; Sue; Jack; Bob; John; Zoe; Lily}. Subsets of this user set include, for example, {Tom; Susan; Sue} and {Jack; Bob; John}.
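The subset enumeration in S201 can be sketched as follows. The sketch assumes that every non-empty subset is of interest, which the text does not state explicitly; the function name `non_empty_subsets` is introduced here for illustration.

```python
from itertools import combinations

def non_empty_subsets(user_tags):
    """Yield every non-empty subset of the user set as a frozenset of user tags."""
    tags = sorted(user_tags)
    for size in range(1, len(tags) + 1):
        for subset in combinations(tags, size):
            yield frozenset(subset)

user_set = {"Tom", "Susan", "Sue", "Jack", "Bob", "John", "Zoe", "Lily"}
subsets = list(non_empty_subsets(user_set))
```

For N users there are 2^N - 1 such subsets, so an exhaustive enumeration like this is only practical for small N; a real deployment would presumably restrict which subsets are considered.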
S202: Parse the social media data corresponding to each user set to obtain the behavior pattern data of that user set.
For any one user set, the corresponding social media data is the social media data published by the users in that user set. Because different user sets contain different user tags, the social media data corresponding to different user sets also differs.
Processing the social media data corresponding to each user set determines the behavior pattern data of each user in that user set. The behavior pattern data represents, among other things, the occurrence frequency and weight of words of each category in the social media data published by the user.
S203: Input the behavior pattern data of each user set into a preset behavior pattern distribution model.
As an embodiment of the present application, the above behavior pattern distribution model is:
Figure PCTCN2018083551-appb-000001
where x is a user set, y is the set containing the individual words in the social media data, f_j(x, y) is the proportion that the j-th feature of the user takes up in y, λ_{j,y} is the attribute parameter corresponding to the j-th feature of the user, Z(x) is the normalization factor, and F is the total number of features of the user set x.
In the embodiments of the present application, word segmentation is performed on the social media data published by each user in the user set x to extract multiple words for each user. Social media data often contains words with no actual semantics, which have no reference value for data mining. Stop words are therefore removed from the extracted words, leaving the keywords corresponding to each user. The first vector formed by the keywords of a user is called that user's information state space, and the elements of an information state space are all distinct. For example, the information state space corresponding to the user Tom is [apple, London, Messi, football, rain].
For each keyword in a user's information state space, the occurrence frequency of that keyword in the user's social media data is obtained. The occurrence frequencies of the keywords are output together as a second vector [a_1, a_2, a_3, a_4, ..., a_n], where n is the total number of elements in the user's information state space.
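The construction of the information state space (first vector) and the frequency vector (second vector) can be sketched as follows. Several details are assumptions made for illustration: whitespace splitting stands in for word segmentation, `STOP_WORDS` is a hypothetical stop-word list, and "occurrence frequency" is taken to mean relative frequency among the retained keywords.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "in", "and", "is", "of"}  # hypothetical stop-word list

def state_space_and_frequencies(posts):
    """Return (state_space, frequencies) for one user's posts.

    state_space: distinct keywords in first-seen order (the first vector).
    frequencies: relative frequency of each keyword (the second vector).
    """
    tokens = [word for post in posts for word in post.lower().split()]
    keywords = [word for word in tokens if word not in STOP_WORDS]
    counts = Counter(keywords)  # keeps first-seen order of distinct keywords
    state_space = list(counts)
    total = sum(counts.values())
    frequencies = [counts[word] / total for word in state_space]
    return state_space, frequencies

space, freqs = state_space_and_frequencies(
    ["Messi and football", "football in the rain"]
)
```

A production system for Chinese text would need a real word segmenter in place of `split()`.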
In the embodiments of the present application, the features corresponding to the user set are obtained by a Linear Discriminant Analysis algorithm; a feature represents the category of a keyword. Each feature has one preset lexicon, and each feature corresponds to one attribute parameter λ.
It is determined whether each keyword in the user's information state space is present in the lexicon. If one or more keywords in the user's information state space are present in the lexicon, the element values corresponding to those keywords in the second vector are obtained. These element values are summed, and the product of the sum and the attribute parameter λ of the feature corresponding to the lexicon is output as one behavior feature distribution value of the user.
For example, suppose f_1 is a feature representing "sports", its lexicon is [football, Messi, Ronaldo, manchester], and its attribute parameter is λ_1. For the information state space [apple, London, Messi, football, rain] of the user Tom, both "Messi" and "football" are present in the lexicon. Assuming the second vector corresponding to this information state space is [a_1, a_2, a_3, a_4, ..., a_n], the element values corresponding to "Messi" and "football" in the second vector are a_3 and a_4. The behavior feature distribution value of the user Tom for the "sports" feature is therefore (a_3 + a_4)λ_1.
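The "sports" example above can be reproduced with a short sketch. The frequency values and the value of λ_1 are invented for illustration, and `feature_value` is a name introduced here.

```python
def feature_value(state_space, frequencies, lexicon, lam):
    """Sum the frequencies of keywords found in the feature's lexicon, times lambda."""
    matched = [a for word, a in zip(state_space, frequencies) if word in lexicon]
    return lam * sum(matched)

space = ["apple", "London", "Messi", "football", "rain"]
freqs = [0.10, 0.20, 0.30, 0.25, 0.15]  # illustrative values for a_1 .. a_5
sports_lexicon = {"football", "Messi", "Ronaldo", "manchester"}

# Only "Messi" (a_3) and "football" (a_4) match, so this is (a_3 + a_4) * lambda_1.
value = feature_value(space, freqs, sports_lexicon, lam=2.0)
```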
By analogy, after the behavior feature distribution values of every user in the user set have been calculated and substituted into the above behavior pattern distribution model, a behavior pattern distribution model that varies with the attribute parameters λ is obtained.
Further, the value of the normalization factor Z(x) in the above behavior pattern distribution model is:
Figure PCTCN2018083551-appb-000002
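The formula images for the model and for Z(x) are not available in this text. Assuming the model takes the standard conditional maximum-entropy (log-linear) form, which is consistent with the symbol definitions above and with the entropy maximization in S204, the two expressions would read as follows. The left-hand symbol P(y | x) and the summation variable y' are notation introduced here, not taken from the original figures.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{j=1}^{F} \lambda_{j,y}\, f_j(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{j=1}^{F} \lambda_{j,y'}\, f_j(x, y') \Big)
```

Under this reading, Z(x) simply ensures that the values P(y | x) sum to 1 over all y for a fixed user set x.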
S204: Maximize the entropy of the behavior pattern distribution model to determine the attribute parameters in the behavior pattern distribution model.
In the embodiments of the present application, the attribute parameters in the behavior pattern distribution model are determined on the basis of the maximum entropy principle, where each attribute parameter corresponds to one feature. Because the maximum entropy principle is a common technique in the art, it is not described in detail here.
As an implementation example of the present application, S204 may be implemented as follows: the attribute parameters in the behavior pattern distribution model are adjusted multiple times; after each adjustment, the entropy of the current behavior pattern distribution model is calculated and the attribute parameters at that moment are recorded; among the calculated entropies, the largest entropy value is identified and the attribute parameters corresponding to it are selected.
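The adjust, score, and select loop described for S204 can be sketched as follows. The sketch assumes the log-linear model form (an assumption, since the formula image is unavailable), a fixed user set x, and a hypothetical list of candidate parameter settings; all function names are introduced here.

```python
import math

def model_distribution(lams, feats):
    """Distribution over y for a fixed x under an assumed log-linear model.

    feats[y][j] plays the role of f_j(x, y); lams[y][j] that of lambda_{j,y}.
    """
    scores = [
        math.exp(sum(l * f for l, f in zip(lams[y], feats[y])))
        for y in range(len(feats))
    ]
    z = sum(scores)  # normalization factor Z(x)
    return [s / z for s in scores]

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def pick_max_entropy(candidates, feats):
    """Return the candidate parameter setting whose model has the largest entropy."""
    return max(candidates, key=lambda lams: entropy(model_distribution(lams, feats)))

feats = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    [[2.0, 0.0], [0.0, 0.0]],  # skewed distribution, lower entropy
    [[0.5, 0.0], [0.0, 0.5]],  # uniform distribution, higher entropy
]
best = pick_max_entropy(candidates, feats)
```

In standard maximum-entropy estimation the entropy is maximized subject to feature-expectation constraints derived from the data; the sketch omits those constraints and only mirrors the loop described in the text.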
S205: Output the behavior pattern distribution model, with its attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
Once the attribute parameters λ_{j,y} are determined, the attribute parameter corresponding to each feature is a constant. Substituting these constants back into formula (1) above therefore yields a behavior pattern distribution function whose variables are the user set x and the set y.
In the embodiments of the present application, the behavior pattern data corresponding to each user set is fitted by the preset behavior pattern distribution model, which generates a behavior pattern distribution function with determined attribute parameters. Because a normalization factor is included in the behavior pattern distribution model, overfitting of the behavior pattern data is avoided and the accuracy of the behavior pattern distribution function is improved. This ensures that the behavior pattern data obtained after the subsequent noise-adding processing still reflects the overall distribution of behavior attributes, preserving the mining value of the data.
S103: Integrate a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function.
Adding the random noise function Q(x) to the behavior pattern distribution function F(x) gives the noise-added behavior pattern distribution function A(x):
A(x) = F(x) + Q(x)
where x denotes the above user set.
In the embodiments of the present application, the preset random noise function may be a function based on a mechanism such as the Laplace mechanism or the exponential mechanism.
As an embodiment of the present application, the preset random noise function obeys a Laplace distribution whose scale parameter is
Figure PCTCN2018083551-appb-000003
in which case the random noise function can be expressed as
Figure PCTCN2018083551-appb-000004
where Δf is the global sensitivity of the behavior pattern distribution function and ε is a preset privacy protection budget parameter.
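The formula images for the scale parameter and the noise function are not available in this text. The sketch below assumes the scale is Δf/ε, as in the standard Laplace mechanism, and treats F as scalar-valued for simplicity; `laplace_noise` and `add_noise` are names introduced here. It samples Laplace noise as the difference of two exponential variables, a well-known identity, rather than any method stated in the source.

```python
import random

def laplace_noise(sensitivity, epsilon, rng):
    """Draw one sample from a Laplace distribution with scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponential variables with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(pattern_fn, sensitivity, epsilon, rng):
    """Return A, where A(x) = F(x) + Q(x) for a scalar-valued F (pattern_fn)."""
    def noisy(x):
        return pattern_fn(x) + laplace_noise(sensitivity, epsilon, rng)
    return noisy

rng = random.Random(0)
# Hypothetical F: the size of the user set, with sensitivity 1 and epsilon 0.1.
noisy_count = add_noise(lambda users: float(len(users)), 1.0, 0.1, rng)
released = noisy_count({"Tom", "Susan", "Sue"})
```

With Δf = 1 and ε = 0.1 the noise scale is 10, which shows how a small privacy budget translates into comparatively large perturbations.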
In the embodiments of the present application, the privacy protection budget parameter ε controls the ratio of the probabilities with which the behavior pattern distribution function produces the same output on two neighboring user sets, and thus reflects the level of privacy protection that the behavior pattern distribution function can provide. The smaller the value of ε, the higher the level of privacy protection. When ε = 0 the protection level is at its highest: for any two neighboring user sets, the behavior pattern distribution function outputs results with identical probability distributions. Such results, however, reveal no useful information about the data set. To balance the security and the usability of the output, the preset privacy protection budget parameter ε is therefore set to 0.1.
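The probability ratio described above matches the usual definition of ε-differential privacy. Stated for the noise-added function A and two neighboring user sets x and x', it reads as follows, where S, an arbitrary set of possible outputs, is notation introduced here:

```latex
\Pr[A(x) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(x') \in S]
\quad \text{for all neighbouring } x, x' \text{ and all output sets } S
```

With ε = 0 the bound forces the two output distributions to coincide, and with ε = 0.1 the factor is e^{0.1} ≈ 1.105, so the two probabilities may differ by roughly 10 percent at most.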
Specifically, when the preset random noise function is based on the Laplace mechanism, the global sensitivity Δf of the user set x is calculated first and then combined with the preset privacy protection budget parameter ε to determine a random noise function that follows the Laplace distribution.
The global sensitivity Δf of the user set x is calculated by the following formula:
Figure PCTCN2018083551-appb-000005
where F(x) is the above behavior pattern distribution function and ‖F(x) - F(x′)‖ is the first-order norm (L1) distance between F(x) and F(x′). Here x and x′ are neighboring data sets, that is, the total numbers of elements in the sets corresponding to F(x) and F(x′) differ by at most 1.
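The formula image for Δf is not available in this text. The standard definition of global sensitivity, which agrees with the description of the L1 distance over neighboring data sets, would be:

```latex
\Delta f = \max_{x,\,x'} \lVert F(x) - F(x') \rVert_{1}
```

where the maximum is taken over all pairs of neighboring data sets x and x'. This is offered as the likely intended expression rather than a transcription of the original figure.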
In the embodiments of the present application, adding a random noise function to the behavior pattern distribution function protects the user privacy information latent in the behavior pattern data. When an attacker queries the behavior pattern data, the noise-adding process is invisible to the attacker. Consequently, even if the attacker has obtained any of the noise-added behavior pattern data, the attacker still cannot infer the user's original behavior pattern data, which removes the possibility of privacy leakage at the data source.
S104: Input behavior pattern data query parameters into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data.
In the embodiments of the present application, the behavior pattern data query parameters may be generated automatically or obtained from a data query request sent by a querier.
For example, the user sets corresponding to the social media data and the features appearing in each user set can be traversed; each user set together with any one of its features is then a behavior pattern data query parameter that may appear in a data query request. One such user set and feature can therefore be read and input, as the behavior pattern data query parameters, into the noise-added behavior pattern distribution function A(x) = F(x) + Q(x).
Based on the attribute parameters in the noise-added behavior pattern distribution function and the obtained behavior pattern data query parameters, the noise-added behavior pattern data corresponding to the query parameters can be calculated directly.
As an embodiment of the present application, FIG. 3 shows the specific implementation flow of step S104 of the method for publishing behavior pattern data provided by an embodiment of the present application, detailed as follows:
S301: Acquire a data query request carrying behavior pattern data query parameters, the behavior pattern data query parameters including a user set and a feature.
S302: Generate a random parameter corresponding to the feature.
S303: Input the random parameter and the behavior pattern data query parameters into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data.
Because the user set and the feature are already determined, the user's behavior pattern data can be calculated from the behavior pattern distribution function F(x). For each feature of the user set, a random parameter a is generated and input into the random noise function; the random noise value can then be calculated from the global sensitivity Δf of the user set x and the preset privacy protection budget parameter ε. Adding the random noise value to the user's behavior pattern data gives the noise-added behavior pattern data.
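Steps S301 to S303 can be sketched as follows. The text does not say how the random parameter a is turned into a noise value; the sketch assumes a is uniform on (0, 1) and is mapped through the inverse cumulative distribution function of a Laplace distribution with scale Δf/ε. The function names and the stand-in `pattern_fn` are hypothetical.

```python
import math
import random

def noise_from_random_parameter(a, sensitivity, epsilon):
    """Map a uniform random parameter a in (0, 1) to Laplace noise (inverse CDF)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    scale = sensitivity / epsilon
    u = a - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def answer_query(user_set, feature, pattern_fn, sensitivity, epsilon, rng):
    """S301-S303: evaluate F for the query, draw a, and add the resulting noise."""
    true_value = pattern_fn(user_set, feature)  # F(x) for the queried feature
    a = rng.random()                            # S302: random parameter
    while a == 0.0:                             # random() can return exactly 0.0
        a = rng.random()
    return true_value + noise_from_random_parameter(a, sensitivity, epsilon)

rng = random.Random(42)
released = answer_query(
    frozenset({"Tom", "Susan"}), "sports",
    pattern_fn=lambda users, feature: 0.4,      # hypothetical stand-in for F
    sensitivity=1.0, epsilon=0.1, rng=rng,
)
```

A value of a = 0.5 maps to zero noise, and values nearer 0 or 1 map to increasingly large negative or positive noise.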
S105: Publish the noise-added behavior pattern data.
Publishing the noise-added behavior pattern data is the process of sharing that data with any external user while satisfying the privacy protection conditions. It includes responding to behavior pattern data query requests issued by queriers and actively publishing all noise-added behavior pattern data to a preset data sharing platform.
In the embodiments of the present application, establishing the behavior pattern distribution function corresponding to the social media data makes it possible to summarize behavior pattern data that are otherwise unrelated to one another. Integrating the random noise function into the behavior pattern distribution function ensures that all behavior pattern data published on the basis of the noise-added behavior pattern distribution function carries noise. Therefore, even if an attacker steals the published noise-added behavior pattern data, the attacker still cannot accurately match that data to individual users, which strengthens the security of users' personal privacy. At the same time, the noise-added behavior pattern data retains the reference value of the original behavior pattern data, so researchers can still perform effective analysis and mining on it.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
Corresponding to the method for publishing behavior pattern data described in the above embodiments, FIG. 4 shows a structural block diagram of the apparatus for publishing behavior pattern data provided by an embodiment of the present application. For ease of description, only the parts relevant to the embodiments of the present application are shown.
Referring to FIG. 4, the apparatus includes:
an acquiring unit 41, configured to acquire social media data published by a plurality of users;
an establishing unit 42, configured to establish a behavior pattern distribution function corresponding to the social media data;
a noise adding unit 43, configured to integrate a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
an input unit 44, configured to input behavior pattern data query parameters into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data;
a publishing unit 45, configured to publish the noise-added behavior pattern data.
Optionally, the establishing unit 42 includes:
a dividing subunit, configured to divide the plurality of users into a plurality of user sets;
a parsing subunit, configured to parse the social media data corresponding to each user set to obtain the behavior pattern data of that user set;
a first input subunit, configured to input the behavior pattern data of each user set into a preset behavior pattern distribution model;
a determining subunit, configured to maximize the entropy of the behavior pattern distribution model to determine the attribute parameters in the behavior pattern distribution model;
an output subunit, configured to output the behavior pattern distribution model, with its attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
Optionally, the behavior pattern distribution model is:
Figure PCTCN2018083551-appb-000006
where x is a user set, y is the set containing the individual words in the social media data, f_j(x, y) is the proportion that the j-th feature of the user takes up in y, λ_{j,y} is the attribute parameter corresponding to the j-th feature of the user, and Z(x) is the normalization factor.
Optionally, the input unit 44 includes:
an acquiring subunit, configured to acquire a data query request carrying behavior pattern data query parameters, the behavior pattern data query parameters including a user set and a feature;
a generating subunit, configured to generate a random parameter corresponding to the feature;
a second input subunit, configured to input the random parameter and the behavior pattern data query parameters into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data.
Optionally, the preset random noise function obeys a Laplace distribution whose scale parameter is
Figure PCTCN2018083551-appb-000007
where Δf is the global sensitivity of the behavior pattern distribution function and ε is a preset privacy protection budget parameter.
In the embodiments of the present application, establishing the behavior pattern distribution function corresponding to the social media data makes it possible to summarize behavior pattern data that are otherwise unrelated to one another. Integrating the random noise function into the behavior pattern distribution function ensures that all behavior pattern data published on the basis of the noise-added behavior pattern distribution function carries noise. Therefore, even if an attacker steals the published noise-added behavior pattern data, the attacker still cannot accurately match that data to individual users, which strengthens the security of users' personal privacy. At the same time, the noise-added behavior pattern data retains the reference value of the original behavior pattern data, so researchers can still perform effective analysis and mining on it.
图5是本申请一实施例提供的终端设备的示意图。如图5所示,该实施例的终端设备5包括:处理器50以及存储器51,所述存储器51中存储有可在所述处理器50上运行的计算机可读指令52,例如行为模式数据的发布程序。所述处理器50执行所述计算机可读指令52 时实现上述各个行为模式数据的发布方法实施例中的步骤,例如图1所示的步骤S101至S105。或者,所述处理器50执行所述计算机可读指令52时实现上述各装置实施例中各模块/单元的功能,例如图4所示单元41至45的功能。FIG. 5 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in FIG. 5, the terminal device 5 of this embodiment includes a processor 50 and a memory 51 in which computer readable instructions 52, such as behavioral mode data, executable on the processor 50 are stored. Publish the program. The processor 50 executes the computer readable instructions 52 to implement the steps in the foregoing method for distributing the respective behavior pattern data, such as steps S101 to S105 shown in FIG. 1. Alternatively, the processor 50, when executing the computer readable instructions 52, implements the functions of the various modules/units in the various apparatus embodiments described above, such as the functions of the units 41 through 45 shown in FIG.
By way of example, the computer-readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and these instruction segments describe the execution of the computer-readable instructions 52 in the terminal device 5.
The terminal device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will understand that FIG. 5 is merely an example of the terminal device 5 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may further include input/output devices, network access devices, a bus, and the like.
The processor 50 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or internal memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store the computer-readable instructions and other programs and data required by the terminal device. The memory 51 may also be used to temporarily store data that has been output or is about to be output.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A method for publishing behavior pattern data, comprising:
    acquiring social media data published by a plurality of users;
    establishing a behavior pattern distribution function corresponding to the social media data;
    integrating a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
    inputting a behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data; and
    publishing the noise-added behavior pattern data.
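The five steps of claim 1 can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the relative word frequency per user set is a hypothetical stand-in for the behavior pattern distribution function, additive Laplace noise is one assumed way of "integrating" the random noise function, and every function name and the `scale` value are illustrative.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical data shape: a user-set name mapped to the words its users posted.
SocialMediaData = Dict[str, List[str]]
Distribution = Callable[[str, str], float]


def build_distribution(data: SocialMediaData) -> Distribution:
    """Step 2 (stand-in): share of `word` among all words posted by `user_set`."""
    counts = {name: Counter(words) for name, words in data.items()}

    def distribution(user_set: str, word: str) -> float:
        total = sum(counts[user_set].values())
        return counts[user_set][word] / total if total else 0.0

    return distribution


def add_noise(distribution: Distribution, scale: float, rng: random.Random) -> Distribution:
    """Step 3 (assumed form): wrap the function so every answer carries Laplace noise."""

    def noisy(user_set: str, word: str) -> float:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        return distribution(user_set, word) + noise

    return noisy


def publish(data: SocialMediaData, queries: Iterable[Tuple[str, str]],
            scale: float, seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Steps 1 to 5: build the function, add noise, answer each query, return the results."""
    noisy = add_noise(build_distribution(data), scale, random.Random(seed))
    return {query: noisy(*query) for query in queries}


if __name__ == "__main__":
    data = {"set_a": ["run", "run", "swim"], "set_b": ["read", "run"]}
    print(publish(data, [("set_a", "run"), ("set_b", "read")], scale=0.05))
```

Only the noisy wrapper is ever queried in `publish`, which mirrors the claim: the clean distribution function is built internally but never exposed to the requester.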
  2. The method for publishing behavior pattern data according to claim 1, wherein establishing the behavior pattern distribution function corresponding to the social media data comprises:
    dividing the plurality of users into a plurality of user sets;
    parsing the social media data corresponding to each user set to obtain behavior pattern data of that user set;
    inputting the behavior pattern data of each user set into a preset behavior pattern distribution model;
    maximizing the entropy of the behavior pattern distribution model to determine attribute parameters in the behavior pattern distribution model; and
    outputting the behavior pattern distribution model, with the attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
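The parameter-determination step of claim 2 can be illustrated with a small fitting routine. This is a sketch under assumptions rather than the procedure of the application: it assumes the model is a conditional log-linear (maximum-entropy) model, for which maximizing entropy under feature-expectation constraints is equivalent to maximum-likelihood fitting, and it uses plain gradient ascent. The features `bias` and `in_set_b`, the labels, and the sample data are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Callable[[str, str], float]  # plays the role of f_j(x, y)
Weights = List[Dict[str, float]]       # weights[j][y] plays the role of lambda_{j,y}


def conditional_probs(x: str, labels: Sequence[str],
                      features: Sequence[Feature], weights: Weights) -> Dict[str, float]:
    """Assumed model: P(y | x) proportional to exp(sum_j weights[j][y] * f_j(x, y))."""
    scores = [sum(weights[j][y] * f(x, y) for j, f in enumerate(features)) for y in labels]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)      # normalization factor Z(x), up to the constant exp(top)
    return {y: e / z for y, e in zip(labels, exps)}


def fit(samples: Sequence[Tuple[str, str]], labels: Sequence[str],
        features: Sequence[Feature], steps: int = 500, lr: float = 0.5) -> Weights:
    """Gradient ascent on the average log-likelihood of the observed (x, y) pairs."""
    weights: Weights = [{y: 0.0 for y in labels} for _ in features]
    for _ in range(steps):
        grads = [{y: 0.0 for y in labels} for _ in features]
        for x, y_obs in samples:
            probs = conditional_probs(x, labels, features, weights)
            for j, f in enumerate(features):
                for y in labels:
                    observed = 1.0 if y == y_obs else 0.0
                    # Gradient term: observed feature value minus its expected value.
                    grads[j][y] += f(x, y) * (observed - probs[y])
        for j in range(len(features)):
            for y in labels:
                weights[j][y] += lr * grads[j][y] / len(samples)
    return weights


def bias(x: str, y: str) -> float:
    """Hypothetical feature that is always active."""
    return 1.0


def in_set_b(x: str, y: str) -> float:
    """Hypothetical feature that is active only for user set 'set_b'."""
    return 1.0 if x == "set_b" else 0.0


if __name__ == "__main__":
    labels = ["sports", "music"]
    samples = ([("set_a", "sports")] * 3 + [("set_a", "music")]
               + [("set_b", "sports")] + [("set_b", "music")] * 3)
    w = fit(samples, labels, [bias, in_set_b])
    print(conditional_probs("set_a", labels, [bias, in_set_b], w))
    print(conditional_probs("set_b", labels, [bias, in_set_b], w))
```

With these two features the fitted model can reproduce the observed label shares of each user set, which makes it easy to check that the weights have converged.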
  3. The method for publishing behavior pattern data according to claim 2, wherein the behavior pattern distribution model is:
    Figure PCTCN2018083551-appb-100001
    wherein x is a user set, y is a set containing the words in the social media data, f_j(x, y) is the proportion of the user's j-th feature in y, λ_{j,y} is the attribute parameter corresponding to the user's j-th feature, and Z(x) is a normalization factor.
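The formula itself is an image in the source and is not reproduced in this text. Given the symbols defined in the claim and the entropy maximization recited in claim 2, the model presumably has the standard conditional maximum-entropy (log-linear) form. The expression below is that textbook form, offered as an assumption and not as a transcription of the figure:

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{j} \lambda_{j,y}\, f_j(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{j} \lambda_{j,y'}\, f_j(x, y') \Big)
```

Under this reading, Z(x) simply makes the values of P(y | x) sum to one over all y for a fixed user set x.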
  4. The method for publishing behavior pattern data according to claim 2 or 3, wherein inputting the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data comprises:
    acquiring a data query request carrying the behavior pattern data query parameter, the behavior pattern data query parameter including a user set and a feature;
    generating a random parameter corresponding to the feature; and
    inputting the random parameter and the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
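The query flow of claim 4 can be sketched as follows. The request shape, the per-feature `scales` table, the Laplace draw for the random parameter, and the lookup-table distribution function are all hypothetical choices made for illustration; the claim only requires that a random parameter be generated for the queried feature and passed in together with the query parameter.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical request: the query parameter holds a user set and a feature."""
    user_set: str
    feature: str


def generate_random_parameter(feature: str, scales: Dict[str, float],
                              rng: random.Random) -> float:
    """Draw a fresh random parameter for `feature` (assumed here: Laplace(0, scales[feature]))."""
    scale = scales[feature]
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def handle_request(request: QueryRequest,
                   noisy_distribution: Callable[[str, str, float], float],
                   scales: Dict[str, float], rng: random.Random) -> float:
    """Generate the random parameter, then evaluate the noise-added function."""
    r = generate_random_parameter(request.feature, scales, rng)
    return noisy_distribution(request.user_set, request.feature, r)


# Hypothetical noise-added distribution function: a clean lookup plus the random parameter.
CLEAN = {("set_a", "sports"): 0.6, ("set_a", "music"): 0.4}


def noisy_distribution(user_set: str, feature: str, random_parameter: float) -> float:
    return CLEAN[(user_set, feature)] + random_parameter


if __name__ == "__main__":
    rng = random.Random(0)
    scales = {"sports": 0.05, "music": 0.05}
    print(handle_request(QueryRequest("set_a", "sports"), noisy_distribution, scales, rng))
```

Drawing the random parameter per request, instead of once at startup, means repeated queries for the same user set and feature do not return identical values.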
  5. The method for publishing behavior pattern data according to claim 1, wherein the preset random noise function follows a Laplace distribution whose scale parameter is
    Figure PCTCN2018083551-appb-100002
    wherein Δf is the global sensitivity of the behavior pattern distribution function, and ε is a preset privacy protection budget parameter.
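The scale-parameter expression is an image in the source and is not reproduced here. In the standard Laplace mechanism of differential privacy the scale is Δf/ε, and the sketch below assumes that this is the intended expression. The function names are illustrative, and sampling uses only the Python standard library.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Assumed scale b = sensitivity / epsilon (standard Laplace mechanism)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


if __name__ == "__main__":
    rng = random.Random(0)
    for epsilon in (0.1, 1.0, 10.0):
        b = laplace_scale(1.0, epsilon)
        mean_abs = sum(abs(sample_laplace(b, rng)) for _ in range(10000)) / 10000
        # The expected absolute value of Laplace(0, b) noise is b.
        print(f"epsilon={epsilon}: scale={b}, mean |noise| is about {mean_abs:.3f}")
```

A smaller privacy budget ε gives a larger scale and therefore more noise, which is the trade-off between privacy and the reference value of the published data.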
  6. An apparatus for publishing behavior pattern data, comprising:
    an acquiring unit, configured to acquire social media data published by a plurality of users;
    an establishing unit, configured to establish a behavior pattern distribution function corresponding to the social media data;
    a noise-adding unit, configured to integrate a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
    an input unit, configured to input a behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data; and
    a publishing unit, configured to publish the noise-added behavior pattern data.
  7. The apparatus for publishing behavior pattern data according to claim 6, wherein the establishing unit 42 comprises:
    a dividing subunit, configured to divide the plurality of users into a plurality of user sets;
    a parsing subunit, configured to parse the social media data corresponding to each user set to obtain behavior pattern data of that user set;
    a first input subunit, configured to input the behavior pattern data of each user set into a preset behavior pattern distribution model;
    a determining subunit, configured to maximize the entropy of the behavior pattern distribution model to determine attribute parameters in the behavior pattern distribution model; and
    an output subunit, configured to output the behavior pattern distribution model, with the attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
  8. The apparatus for publishing behavior pattern data according to claim 7, wherein the behavior pattern distribution model is:
    Figure PCTCN2018083551-appb-100003
    wherein x is a user set, y is a set containing the words in the social media data, f_j(x, y) is the proportion of the user's j-th feature in y, λ_{j,y} is the attribute parameter corresponding to the user's j-th feature, and Z(x) is a normalization factor.
  9. The apparatus for publishing behavior pattern data according to claim 7 or 8, wherein the input unit comprises:
    an acquiring subunit, configured to acquire a data query request carrying the behavior pattern data query parameter, the behavior pattern data query parameter including a user set and a feature;
    a generating subunit, configured to generate a random parameter corresponding to the feature; and
    a second input subunit, configured to input the random parameter and the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  10. The apparatus for publishing behavior pattern data according to claim 6, wherein the preset random noise function follows a Laplace distribution whose scale parameter is
    Figure PCTCN2018083551-appb-100004
    wherein Δf is the global sensitivity of the behavior pattern distribution function, and ε is a preset privacy protection budget parameter.
  11. A terminal device, comprising a memory and a processor, wherein the memory stores computer-readable instructions executable on the processor, and the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring social media data published by a plurality of users;
    establishing a behavior pattern distribution function corresponding to the social media data;
    integrating a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
    inputting a behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data; and
    publishing the noise-added behavior pattern data.
  12. The terminal device according to claim 11, wherein establishing the behavior pattern distribution function corresponding to the social media data comprises:
    dividing the plurality of users into a plurality of user sets;
    parsing the social media data corresponding to each user set to obtain behavior pattern data of that user set;
    inputting the behavior pattern data of each user set into a preset behavior pattern distribution model;
    maximizing the entropy of the behavior pattern distribution model to determine attribute parameters in the behavior pattern distribution model; and
    outputting the behavior pattern distribution model, with the attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
  13. The terminal device according to claim 12, wherein the behavior pattern distribution model is:
    Figure PCTCN2018083551-appb-100005
    wherein x is a user set, y is a set containing the words in the social media data, f_j(x, y) is the proportion of the user's j-th feature in y, λ_{j,y} is the attribute parameter corresponding to the user's j-th feature, and Z(x) is a normalization factor.
  14. The terminal device according to claim 12 or 13, wherein inputting the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data comprises:
    acquiring a data query request carrying the behavior pattern data query parameter, the behavior pattern data query parameter including a user set and a feature;
    generating a random parameter corresponding to the feature; and
    inputting the random parameter and the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  15. The terminal device according to claim 11, wherein the preset random noise function follows a Laplace distribution whose scale parameter is
    Figure PCTCN2018083551-appb-100006
    wherein Δf is the global sensitivity of the behavior pattern distribution function, and ε is a preset privacy protection budget parameter.
  16. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by at least one processor, implement the following steps:
    acquiring social media data published by a plurality of users;
    establishing a behavior pattern distribution function corresponding to the social media data;
    integrating a preset random noise function with the behavior pattern distribution function to obtain a noise-added behavior pattern distribution function;
    inputting a behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain noise-added behavior pattern data; and
    publishing the noise-added behavior pattern data.
  17. The computer-readable storage medium according to claim 16, wherein establishing the behavior pattern distribution function corresponding to the social media data comprises:
    dividing the plurality of users into a plurality of user sets;
    parsing the social media data corresponding to each user set to obtain behavior pattern data of that user set;
    inputting the behavior pattern data of each user set into a preset behavior pattern distribution model;
    maximizing the entropy of the behavior pattern distribution model to determine attribute parameters in the behavior pattern distribution model; and
    outputting the behavior pattern distribution model, with the attribute parameters determined, as the behavior pattern distribution function corresponding to the social media data.
  18. The computer-readable storage medium according to claim 17, wherein the behavior pattern distribution model is:
    Figure PCTCN2018083551-appb-100007
    wherein x is a user set, y is a set containing the words in the social media data, f_j(x, y) is the proportion of the user's j-th feature in y, λ_{j,y} is the attribute parameter corresponding to the user's j-th feature, and Z(x) is a normalization factor.
  19. The computer-readable storage medium according to claim 17 or 18, wherein inputting the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data comprises:
    acquiring a data query request carrying the behavior pattern data query parameter, the behavior pattern data query parameter including a user set and a feature;
    generating a random parameter corresponding to the feature; and
    inputting the random parameter and the behavior pattern data query parameter into the noise-added behavior pattern distribution function to obtain the noise-added behavior pattern data.
  20. The computer-readable storage medium according to claim 16, wherein the preset random noise function follows a Laplace distribution whose scale parameter is
    Figure PCTCN2018083551-appb-100008
    wherein Δf is the global sensitivity of the behavior pattern distribution function, and ε is a preset privacy protection budget parameter.
PCT/CN2018/083551 2017-07-24 2018-04-18 Method and apparatus for publishing behaviour pattern data, terminal device and medium WO2019019711A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710605631.XA CN107798249B (en) 2017-07-24 2017-07-24 Method for releasing behavior pattern data and terminal equipment
CN201710605631.X 2017-07-24

Publications (1)

Publication Number Publication Date
WO2019019711A1 true WO2019019711A1 (en) 2019-01-31

Family

ID=61530306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083551 WO2019019711A1 (en) 2017-07-24 2018-04-18 Method and apparatus for publishing behaviour pattern data, terminal device and medium

Country Status (2)

Country Link
CN (1) CN107798249B (en)
WO (1) WO2019019711A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798249B (en) * 2017-07-24 2020-02-21 平安科技(深圳)有限公司 Method for releasing behavior pattern data and terminal equipment
CN109784006A (en) * 2019-01-04 2019-05-21 平安科技(深圳)有限公司 Watermark insertion and extracting method and terminal device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102377804A (en) * 2010-08-19 2012-03-14 王轶彤 Geographic location-based interactive information service system and method
CN102549614A (en) * 2009-10-07 2012-07-04 微软公司 A privacy vault for maintaining the privacy of user profiles
CN104050267A (en) * 2014-06-23 2014-09-17 中国科学院软件研究所 Individuality recommendation method and system protecting user privacy on basis of association rules
CN104361123A (en) * 2014-12-03 2015-02-18 中国科学技术大学 Individual behavior data anonymization method and system
CN104809408A (en) * 2015-05-08 2015-07-29 中国科学技术大学 Histogram release method based on difference privacy
CN106209457A (en) * 2016-07-14 2016-12-07 北京工业大学 Tackle method for secret protection and the system of bypass attack in smart home environment
WO2017062601A1 (en) * 2015-10-09 2017-04-13 Interdigital Technology Corporation Multi-level dynamic privacy management in an internet of things environment with multiple personalized service providers
CN107798249A (en) * 2017-07-24 2018-03-13 平安科技(深圳)有限公司 The dissemination method and terminal device of behavioral pattern data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608446B (en) * 2016-02-02 2019-02-12 北京大学深圳研究生院 A kind of detection method and device of video flowing anomalous event

Also Published As

Publication number Publication date
CN107798249B (en) 2020-02-21
CN107798249A (en) 2018-03-13

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/08/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18839478

Country of ref document: EP

Kind code of ref document: A1