CN109670342B - Method and device for measuring risk of information leakage


Info

Publication number
CN109670342B
CN109670342B (application CN201811646432.4A)
Authority
CN
China
Prior art keywords
node
privacy
information
target node
value
Prior art date
Legal status
Active
Application number
CN201811646432.4A
Other languages
Chinese (zh)
Other versions
CN109670342A (en)
Inventor
朱娜斐
王思雨
何泾沙
滕达
李文欣
Current Assignee
Beijing Yongbo Technology Co ltd
Hebei Free Trade Zone Zhimao Technology Co.,Ltd.
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201811646432.4A
Publication of CN109670342A
Application granted
Publication of CN109670342B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of information security, and in particular to a method and an apparatus for measuring the risk of information leakage. Applied to a target system, the method comprises: constructing a privacy information ontology tree comprising a plurality of nodes; determining known privacy information and unknown privacy information; mapping the known privacy information and the unknown privacy information onto the nodes of the privacy information ontology tree; selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node; acquiring, for the target node, a node having a parent-child relationship with the target node as a current node; calculating the privacy leakage value of the target node from the privacy leakage value of the current node; and determining the leakage risk degree of the unknown privacy information from the privacy leakage value of the target node. This addresses the problem that privacy protection which ignores inference between pieces of information protects too weakly, since information meant to be withheld can in practice be deduced from information already published. The degree of protection of unknown private information is thereby improved.

Description

Method and device for measuring risk of information leakage
Technical Field
The present application relates to the field of information security, and in particular, to a method and an apparatus for measuring risk of information leakage.
Background
In the big data era, while people enjoy the convenience brought by intelligent technologies such as recommendation algorithms, speech recognition, image recognition and autonomous driving, their privacy is also being revealed without their awareness. To address this, technologies such as k-anonymity, differential privacy, data provenance and data watermarking have been proposed in succession, but they solve only the most basic problems, namely the encryption and hiding of private information. Information, however, is correlated, and privacy leakage caused by such correlations remains largely unaddressed.
Disclosure of Invention
In view of this, embodiments of the present application provide a method and an apparatus for measuring risk of information disclosure, so as to improve protection of private information.
In a first aspect, an embodiment of the present application provides a method for measuring the risk of information leakage, applied to a target system, the method comprising: constructing a privacy information ontology tree comprising a plurality of nodes; determining known privacy information and unknown privacy information; mapping the known privacy information and the unknown privacy information onto each node in the privacy information ontology tree; selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node; acquiring, for the target node, a node having a parent-child relationship with the target node as a current node; if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node from the privacy disclosure value of the current node; and determining the leakage risk degree of the unknown privacy information from the privacy leakage value of the target node.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where: the constructing of the privacy information ontology tree comprising a plurality of nodes comprises: acquiring nouns and adjectives which accord with upper and lower relations, instance relations, integral part relations and attribute relations from a dictionary; and constructing the privacy information ontology tree by using the nouns and the adjectives.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where: the mapping of the known privacy information and the unknown privacy information onto each node in the privacy information ontology tree comprises: calculating the semantic similarity of the known privacy information and of the unknown privacy information with the keywords corresponding to each node by using a vector space model (VSM) algorithm; if the semantic similarity is not zero, mapping the known privacy information and the unknown privacy information to the corresponding nodes; if the semantic similarity is zero, calculating the word similarity of the known privacy information to be mapped and of the unknown privacy information with the keywords corresponding to each node by using a cosine similarity algorithm; and if the word similarity is greater than a predetermined threshold, mapping the known privacy information and the unknown privacy information to the corresponding nodes.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where: the calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node comprises: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node; and calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where: the calculating the privacy disclosure value of the target node comprises: if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node; if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node, where n is the number of child nodes comprised by the parent node; if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node; and if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where: the determining the leakage risk degree of the unknown privacy information according to the privacy leakage value of the target node comprises the following steps: if the privacy disclosure value reaches a preset threshold value, the unknown privacy information is determined to be high disclosure risk information or disclosure information; and if the privacy leakage value does not reach a preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
In a second aspect, an embodiment of the present application provides an apparatus for measuring risk of information leakage, which is applied to a target system, and the apparatus includes: the building module is used for building a privacy information ontology tree comprising a plurality of nodes; the privacy information determining module is used for determining known privacy information and unknown privacy information; the mapping module is used for mapping the known privacy information and the unknown privacy information to each node in the privacy information ontology tree respectively; the computing module is used for selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node; acquiring a node having a parent-child relationship with the target node according to the target node as a current node; if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node; and the leakage risk degree determining module is used for determining the leakage risk degree of the unknown privacy information according to the privacy leakage value of the target node.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where: the calculation module is further to: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node; and calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node.
With reference to the first possible implementation manner of the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where: the calculation module is further to: if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node; if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node; if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node; and if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node.
In combination with the second aspect, embodiments of the present invention provide a third possible implementation manner of the second aspect, where: the leakage risk level determination module is further configured to: if the privacy disclosure value reaches a preset threshold value, the unknown privacy information is determined to be high disclosure risk information or disclosure information; and if the privacy leakage value does not reach a preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
By adopting this scheme, the privacy leakage value of the target node is calculated from the privacy leakage values of nodes whose privacy information is known, and the leakage risk degree of the unknown privacy information is determined from the privacy leakage value of the target node. In this way the leakage degree of unknown privacy information is computed from the leakage degree of known privacy information, so the strength of protection of the unknown privacy information can be raised according to its leakage degree. This addresses the problem that privacy protection which ignores inference between pieces of information protects too weakly, since information meant to be withheld can in practice be deduced from information already published.
Additional features and advantages of the disclosure will be set forth in the description that follows, or may in part be learned by practicing the techniques of the disclosure.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an information leakage risk measurement method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of reasoning for private information provided in an embodiment of the present application;
fig. 3 is a schematic diagram of an application of a vector space model VSM algorithm and a cosine similarity algorithm according to an embodiment of the present application;
fig. 4 is a schematic diagram of the calculation rules applied when privacy leakage values are computed by traversal in the ontology- and inference-relationship-based privacy leakage risk measurement algorithm provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an information leakage risk measurement apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Privacy protection technologies in the prior art do not address the association between pieces of information, that is, information inference. Information inference includes methods such as associative inference, synthetic inference, causal inference and hypothetical inference, by which an information acquirer can derive further information from the part already known. Research in this area is scarce: no universal privacy propagation model has gained acceptance, determining the type of private information requires semantic analysis, and there is as yet no authoritative standard for how analyzed information gives rise to inference relationships. In 2002 the World Wide Web Consortium (W3C) released P3P, the Platform for Privacy Preferences, which approached privacy protection from a new angle: instead of passively letting software identify private data, the user actively supplies a privacy preference, and the service provider matches its own policy against that preference, thereby achieving privacy protection. Because users make the rules themselves on a P3P platform, they can account for some simple inference; for example, a user who does not want to reveal a birthday will certainly not allow the identity card number to be published, because the birthday can easily be deduced from the identity card number. The inference a user can anticipate, however, is shallow and incomplete, so private information is still poorly protected; the difficulty that inference poses for privacy protection can only be resolved by introducing thorough semantic reasoning.
Based on this, the embodiment of the application provides a method and a device for measuring information disclosure risk, so as to improve the protection of private information.
The embodiments of the present application do not limit specific application scenarios, and any method using the embodiments of the present application is within the scope of the present application. The following is a detailed description by way of specific examples.
See fig. 1 for a flow chart of a method of information leakage risk measurement. The embodiment provides a method for measuring risk of information leakage, which is applied to a target system and comprises the following steps:
s101, constructing a privacy information ontology tree comprising a plurality of nodes;
each kind of private information is regarded as one information point. The information point may be described by a specific word, such as "name", "house number", etc., and the specific information contained therein is different in each flow of the system operation and is related to the user using the system. In real life, people can often reason by knowing information, so as to obtain more information, because the information has the characteristic of reasoning, so does the privacy information. Referring to fig. 2, a diagram of inference of private information is shown, where PA represents a known data set and dm represents a new data set inferred from the known data set.
In the PA set, p1, p2, p3, p4 and p5 are known data; d1 is new data inferred from p1, d2 from p2, d3 from p3, d4 from p4, and d5 from p5.
Taking age information as an example: an information system often treats "working years" and "retirement age" as two unrelated fields, yet as long as a person's age at the time of employment is known, the retirement age can easily be inferred from the working years. In the semantic inference model, the age at employment and the working years are PA, the retirement age is dm, and dm can in turn be used to infer still more information.
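As a minimal illustration of this kind of inference (a hypothetical sketch in Python; the function name and the figures used are ours, not part of the patent), the unpublished item follows from the known items by simple arithmetic:

# Hypothetical sketch of information inference: two known items (PA)
# yield a new, never-published item (dm).
def infer_retirement_age(employment_age: int, working_years: int) -> int:
    """Derive the retirement age from the age at employment and the
    number of years worked, a simple associative inference."""
    return employment_age + working_years

# A person who started work at 25 and worked for 35 years retired at 60,
# even though the "retirement age" field itself was never published.
print(infer_retirement_age(25, 35))  # prints 60

Knowing any two of the three fields is thus equivalent to knowing all three, which is exactly the kind of leakage the method sets out to measure.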
Step S102, known privacy information and unknown privacy information are determined;
Here the known and unknown privacy information in the target system is determined. The target system is any system that operates on data, such as a data processing system, an access control system or a data distribution system. Known private information is information that is freely accessible or has been published within the data processing system; this portion of information can be obtained by any role in the system and used for inference. Inference means deriving more information from existing information through the associations between them; for example, knowing a person's age and working years, the retirement time can be roughly inferred. The invention relies mainly on the inference relationships among private information: measurement is performed on the premise that some information has been leaked or actively published by its owner, and the leakage risk of other information is calculated from it. In an access control system, for example, known private information is information that has already been accessed, i.e., someone has made an access request and the system has returned an "allow" decision.
Unknown private information, i.e., the information to be calculated, is likewise determined in the target system. The information to be calculated is private information whose degree of leakage a user needs to know; in a data processing system the user usually has no way to obtain it directly, either because the content is vacant or because the data owner deliberately hides it. For example, if a person sets "detailed address" to be inaccessible in the access control system, the detailed address is one piece of information to be calculated. Once the known information is available, it is also necessary to know which information is to be calculated; this step only fixes the target, so as to reduce the running time of the algorithm and improve efficiency. In theory the scope of the information to be calculated can be very wide, even extending to all information; that is, the algorithm may have a definite calculation target or no specific target at all, merely measuring what information has been leaked under the current information background.
Step S103, respectively mapping the known privacy information and the unknown privacy information to each node in the privacy information ontology tree;
step S104, selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node;
acquiring, for the target node, a node having a parent-child relationship with the target node as a current node;
if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node;
The privacy leakage value of a node is represented as a specific numerical value between 0 and 1. If the privacy disclosure value of the current node is known, the privacy disclosure value of the target node is calculated from it; if the privacy disclosure value of the current node is unknown, the current node is in turn treated as a target node and traversal continues, searching further nodes in a parent-child relationship with it, until a node with a known privacy disclosure value is found.
And step S105, determining the leakage risk degree of unknown privacy information according to the privacy leakage value of the target node.
According to this method, the privacy leakage value of the target node is calculated from the privacy leakage values of nodes whose privacy information is known, and the leakage risk degree of the unknown privacy information is determined from the privacy leakage value of the target node. In this way the leakage degree of unknown privacy information is computed from the leakage degree of known privacy information, so the strength of protection of the unknown privacy information can be raised according to its leakage degree. This addresses the problem that privacy protection which ignores inference between pieces of information protects too weakly, since information meant to be withheld can in practice be deduced from information already published.
In a possible implementation manner, step S101, constructing a privacy information ontology tree including a plurality of nodes, includes: acquiring nouns and adjectives which accord with upper and lower relations, instance relations, integral part relations and attribute relations from a dictionary; and constructing a private information ontology tree by using nouns and adjectives.
Determining the privacy information ontology tree comprises two parts: determining the relations and determining the parts of speech. The inference process over private information requires dependency relations (for example, "working years" and "retirement age" are temporally related), and the private information itself must be described by words of specific parts of speech.
The WordNet ontology library provides many different relations; the invention selects only the four most strongly associated with private information, namely hyponym-of, instance-of, part-of and attribute-of. hyponym-of is a superior-subordinate relation, such as "job → teacher, employee"; instance-of is an instance relation, such as "emperor → Alexander"; part-of is an integral-part relation, such as "car → wheel, engine"; attribute-of is an attribute relation, such as "beauty → neat, beautiful". These four relations are cut out of the WordNet ontology library on their own to serve as the privacy information ontology tree of the invention.
For private information, two parts of speech suffice: nouns and adjectives. Nouns describe concrete objects and are the most common in private information, e.g. "occupation", "age", "address"; adjectives describe the nature of an object, as a scientist carries labels such as "intelligent" and "smart", and an actress carries labels such as "beautiful" and "elegant". Words of these two parts of speech are cut out of the WordNet ontology library and, together with the four relations, form the privacy information ontology tree.
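The patent gives no code for this extraction. The sketch below shows one way the four relations might be read out of WordNet through the NLTK interface (an assumption of this illustration; the function name and the edge representation are ours):

# Sketch: reading the four relations used by the ontology tree from
# WordNet via NLTK (pip install nltk; then nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def child_edges(synset):
    """Return (relation, child synset) pairs for the four relations
    kept in the privacy information ontology tree."""
    edges = [("hyponym-of", s) for s in synset.hyponyms()]
    edges += [("instance-of", s) for s in synset.instance_hyponyms()]
    edges += [("part-of", s) for s in synset.part_meronyms()]
    edges += [("attribute-of", s) for s in synset.attributes()]
    return edges

# "car -> wheel, engine" is the part-of example from the description.
for relation, child in child_edges(wn.synset('car.n.01')):
    if relation == "part-of":
        print(relation, child.lemma_names())

Walking these edges recursively from a root synset would yield the node-and-relation structure that the privacy information ontology tree needs.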
In one possible implementation, step S103, mapping the known privacy information and the unknown privacy information onto each node in the privacy information ontology tree, respectively, includes:
calculating the semantic similarity of the known privacy information and of the unknown privacy information with the keywords corresponding to each node by using a vector space model (VSM) algorithm;
in the body tree of the private information, each node represents a specific private information, the private information is represented by a keyword, and the keyword inherits the vocabulary of WordNet and has corresponding meaning explanation. The principle of the vector space model VSM algorithm is to calculate the semantic similarity of two words according to the probability of repeated words in the meaning of the words. The calculation of this step is decisive.
if the semantic similarity is not zero, mapping the known privacy information and the unknown privacy information to the corresponding nodes;
when the semantic similarity of the two words is not 0, the mapping relation is added to the two words, and the two words are considered to express the same privacy information. For example, in the target system, "home" and "address" both represent the address of the user, and when the attribute of certain information in the target system is "home", the algorithm maps the information to the "address" node of the privacy information ontology tree, thereby completing subsequent node traversal and calculation.
if the semantic similarity is zero, calculating the word similarity of the known privacy information to be mapped and of the unknown privacy information with the keywords corresponding to each node by using a cosine similarity algorithm;
the VSM algorithm is an algorithm produced on the basis of a WordNet ontology library, can only calculate words in the WordNet ontology library, but cannot correctly measure the similarity between abbreviations and the words not in the WordNet, so a cosine similarity calculation method is introduced. Referring to the application diagram of the vector space model VSM algorithm and the cosine similarity algorithm shown in FIG. 3; inputting the attribute of certain information in a target system and the key words of nodes into a Vector Space Model (VSM) algorithm and a cosine similarity algorithm; obtaining semantic similarity between information and keywords through a Vector Space Model (VSM) algorithm; obtaining word similarity between the information and the keywords through a cosine similarity algorithm; the result can be obtained according to the semantic similarity or the word similarity. The meaning of the arrow labeled "decide" in fig. 3 is: if the cosine similarity algorithm can convert the mapped words into the existing words in WordNet, then VSM is further used for semantic similarity calculation. After the computation is finished, all the known information in the information system is mapped to the nodes of the privacy information ontology tree.
If the word similarity is greater than a predetermined threshold, the known privacy information and the unknown privacy information are mapped to the corresponding nodes.
For example, when the cosine similarity exceeds 0.7, the word to be mapped is regarded as an abbreviation of the node keyword, so that mapping relations such as "addr" → "address" are also established. The cosine similarity algorithm can likewise expand an abbreviation into the full word, providing suggestions for the VSM algorithm.
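A minimal sketch of this two-stage mapping follows; the character-count cosine vectors are one plausible instantiation (the patent does not fix the vector representation), and the VSM gloss-similarity step is only stubbed as a callable:

import math
from collections import Counter

def cosine_word_similarity(a: str, b: str) -> float:
    """Cosine similarity between two words over character-count
    vectors (an assumed instantiation of the word similarity)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def maps_to_node(word, node_keyword, vsm_similarity, threshold=0.7):
    """Stage 1: non-zero VSM semantic similarity maps the word to the
    node. Stage 2 (fallback): cosine word similarity above the
    threshold treats the word as an abbreviation of the keyword."""
    if vsm_similarity(word, node_keyword) > 0:
        return True
    return cosine_word_similarity(word, node_keyword) > threshold

# "addr" is absent from WordNet, but its cosine similarity to
# "address" is about 0.82, above the 0.7 cutoff, so it is mapped.
print(cosine_word_similarity("addr", "address"))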
In a possible implementation manner, see fig. 4 for a schematic diagram of the calculation rules applied when privacy leakage values are computed by traversal in the ontology- and inference-relationship-based privacy leakage risk measurement algorithm;
calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node, comprising: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node; and calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node.
If a plurality of current nodes having a parent-child relationship with the target node exist, a privacy disclosure value of the target node is calculated from the privacy disclosure value of each current node, and these per-node values are summed to obtain the privacy disclosure value of the target node.
In one possible embodiment, the step of calculating the privacy disclosure value of the target node includes: if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is a whole part relationship part-of or an attribute relationship attribute-of, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
and if the relationship between the target node and the current node is an integral part relationship part-of or an attribute relationship attribute-of, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as that of the current node.
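These four rules condense into a single propagation function. The sketch below is an illustration rather than the patent's code: following claim 1, n is the number of child nodes of the parent node, and the summation over several current nodes follows the preceding paragraph (capping the sum at 1 mirrors the stop-at-1 traversal rule and is our assumption):

def propagated_value(current_value: float, relation: str,
                     target_is_child: bool, n: int) -> float:
    """Apply the four propagation rules: hyponym-of/instance-of keep
    the value going down to a child and divide by n going up to a
    parent; part-of/attribute-of do the opposite. n is the number of
    child nodes of the parent node (per claim 1)."""
    keeps_downward = relation in ("hyponym-of", "instance-of")
    keeps = keeps_downward if target_is_child else not keeps_downward
    return current_value if keeps else current_value / n

def summed_target_value(contributions) -> float:
    """Sum the contributions from all current nodes in a parent-child
    relationship with the target; capping at 1 is our assumption,
    mirroring the stop-at-1 traversal rule."""
    return min(1.0, sum(contributions))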
In a specific implementation, traversal starts outward from the node mapped with unknown privacy information and leakage values are calculated according to the above steps; if a node's leakage value is still unknown after it has been passed, traversal continues until a node mapped with known private information is encountered. The leakage value of the node whose privacy information is unknown is then calculated from the risk leakage value of the node whose privacy information is known.
Traversal assignments follow three principles:
1. nodes that have already been computed are not computed again, unless a path intersection is involved;
2. structurally, traversal is breadth-first; relationally, relations that do not change the leakage value are traversed first (for example, hyponym-of from parent to child, and part-of from child to parent), which improves the algorithm's efficiency so that it finishes in as short a time as possible;
3. when paths cross, the leakage value at the crossing point takes the larger of the values produced by the two traversals.
When the privacy leakage value of a target node reaches 1, traversal beyond that node stops immediately and the algorithm returns to the previous node to find a new traversal path, since further traversal past it would be meaningless. The algorithm terminates as a whole when the leakage values at the current points of all paths have reached 1 or when no new node can be traversed. If a threshold is set, traversal can be controlled by the threshold of the specific information: traversal likewise stops when the privacy leakage value of the target node reaches that threshold.
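A breadth-first traversal embodying these principles might look like the following sketch. The adjacency representation and names are assumptions of this illustration; propagated_value is the rule function sketched above, and values here flow outward from the known nodes, which produces the same per-node results as the patent's search outward from the target node:

from collections import deque

def compute_leakage(adjacency, known, threshold=1.0):
    """Breadth-first propagation of privacy leakage values.
    `adjacency[node]` yields (neighbor, relation, neighbor_is_child, n)
    tuples; `known` maps nodes to their known leakage values. At a
    path crossing the larger value wins, and no computed node is
    expanded further once it reaches the threshold."""
    values = dict(known)
    queue = deque(known)
    while queue:
        node = queue.popleft()
        # Stop traversing past computed nodes that reached the threshold.
        if node not in known and values[node] >= threshold:
            continue
        for nbr, relation, nbr_is_child, n in adjacency[node]:
            if nbr in known:
                continue  # known values are never recomputed
            new = propagated_value(values[node], relation, nbr_is_child, n)
            # Principle 3: a crossing point keeps the larger value.
            if new > values.get(nbr, 0.0):
                values[nbr] = min(1.0, new)
                queue.append(nbr)
    return values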
In one possible implementation manner, determining the disclosure risk degree of unknown privacy information according to the privacy disclosure value of the target node includes: if the privacy disclosure value reaches a preset threshold value, the unknown privacy information is determined to be high disclosure risk information or disclosure information; and if the privacy leakage value does not reach the preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
In a specific implementation, the results are then sorted. Information represented by a node whose leakage value has reached 1 (or the threshold) is judged to be "leaked information"; information represented by a node whose leakage value has not reached 1 (or the threshold) is judged to be "un-leaked information"; and information whose leakage value has not reached 1 (or the threshold) but is close to it is judged to be "information with a higher leakage risk". In this step all computed privacy leakage values are stored on the corresponding nodes, so that when the set of known information is unchanged and only the calculation target changes, the stored values can be used directly in the next calculation, which greatly improves the algorithm's efficiency. When the set of known information changes, the leakage value of every node must be recalculated.
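The sorting step then reduces to a comparison against the threshold; in the sketch below the margin that defines "close to the threshold" is an assumed example, since the description does not fix one:

def classify(leakage: float, threshold: float = 1.0,
             margin: float = 0.1) -> str:
    """Sort a node's information by its computed leakage value; the
    0.1 margin for "close to the threshold" is an assumption."""
    if leakage >= threshold:
        return "leaked information"
    if leakage >= threshold - margin:
        return "information with a higher leakage risk"
    return "un-leaked information"

print(classify(0.95))  # information with a higher leakage risk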
Finally, the determined result is transmitted to the corresponding module in the target system or, in a simpler system, presented directly to the user.
Corresponding to the method, the present application further provides an apparatus for measuring the risk of information leakage, applied to a target system; see the schematic structural diagram of the apparatus in fig. 5. The apparatus includes:
a building module 51, configured to build a privacy information ontology tree including a plurality of nodes;
a privacy information determination module 52 for determining known privacy information and unknown privacy information;
the mapping module 53 is configured to map the known privacy information and the unknown privacy information to each node in the privacy information ontology tree respectively;
the calculation module 54 is configured to select a node mapped with unknown privacy information from the privacy information ontology tree as a target node;
acquiring a node having a parent-child relationship with a target node as a current node according to the target node;
if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node;
and the disclosure risk degree determining module 55 is configured to determine the disclosure risk degree of the unknown privacy information according to the privacy disclosure value of the target node.
The device calculates the privacy disclosure value of the target node according to the privacy disclosure value of the node with known privacy information through a calculation module 54; the disclosure risk degree determining module 55 determines the disclosure risk degree of the unknown privacy information according to the privacy disclosure value of the target node. In the mode, the privacy leakage degree of the unknown privacy information is calculated by utilizing the leakage degree of the known privacy information, so that the protection strength of the unknown privacy information can be improved according to the leakage degree of the unknown privacy information.
In one possible implementation, the calculation module 54 is further configured to: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node; and calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node.
In one possible implementation, the calculation module 54 is further configured to: if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
and if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as that of the current node.
In a possible embodiment, the leakage risk level determination module 55 is further configured to: if the privacy disclosure value reaches a preset threshold value, the unknown privacy information is determined to be high disclosure risk information or disclosure information; and if the privacy leakage value does not reach the preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A method for measuring risk of information leakage is applied to a target system, and the method comprises the following steps:
constructing a privacy information ontology tree comprising a plurality of nodes;
determining known privacy information and unknown privacy information;
respectively mapping the known privacy information and the unknown privacy information to each node in the privacy information ontology tree;
selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node;
acquiring a node having a parent-child relationship with the target node according to the target node as a current node;
if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node;
determining the leakage risk degree of the unknown privacy information according to the privacy leakage value of the target node;
the calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node comprises: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node;
the calculating the privacy disclosure value of the target node comprises:
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
where n represents the number of all child nodes that the parent node includes.
2. The method of claim 1, wherein the constructing the private information ontology tree comprising a plurality of nodes comprises:
acquiring nouns and adjectives which accord with upper and lower relations, instance relations, integral part relations and attribute relations from a dictionary;
and constructing the privacy information ontology tree by using the nouns and the adjectives.
3. The method of claim 1, wherein the mapping the known and unknown privacy information onto each node in the privacy information ontology tree, respectively, comprises:
calculating the semantic similarity of the known privacy information and the keywords corresponding to each node and the semantic similarity of the unknown privacy information and the keywords corresponding to each node by using a Vector Space Model (VSM) algorithm;
if the semantic similarity is not zero, mapping the known privacy information and the unknown privacy information with each node;
if the semantic similarity is zero, calculating the word similarity of the known privacy information to be mapped and the keywords corresponding to each node and the word similarity of the unknown privacy information to be mapped and the keywords corresponding to each node by using a cosine similarity algorithm;
mapping the known and unknown privacy information with each node if the word similarity is greater than a predetermined threshold.
4. The method of claim 1, wherein the determining the leakage risk degree of the unknown privacy information according to the privacy leakage value of the target node comprises:
if the privacy leakage value of the target node reaches a preset threshold value, the unknown privacy information is determined to be high-leakage-risk information or leaked information;
and if the privacy leakage value of the target node does not reach a preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
5. An apparatus for measuring risk of information leakage, applied to a target system, the apparatus comprising:
the building module is used for building a privacy information ontology tree comprising a plurality of nodes;
the privacy information determining module is used for determining known privacy information and unknown privacy information;
the mapping module is used for mapping the known privacy information and the unknown privacy information to each node in the privacy information ontology tree respectively;
the computing module is used for selecting a node mapped with unknown privacy information from the privacy information ontology tree as a target node;
acquiring a node having a parent-child relationship with the target node according to the target node as a current node;
if the privacy disclosure value of the current node is known, calculating the privacy disclosure value of the target node according to the privacy disclosure value of the current node;
the leakage risk degree determining module is used for determining the leakage risk degree of the unknown privacy information according to the privacy leakage value of the target node;
the calculation module is further to: starting from the target node, finding a corresponding current node according to the parent-child relationship stored in the target node;
the calculation module is further to:
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
if the relationship between the target node and the current node is a top-bottom relationship or an instance relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a child node of the current node, the privacy disclosure value of the target node is 1/n of the privacy disclosure value of the current node;
if the relationship between the target node and the current node is an integral part relationship or an attribute relationship, and the target node is a parent node of the current node, the privacy disclosure value of the target node is the same as the privacy disclosure value of the current node;
where n represents the number of all child nodes that the parent node includes.
6. The apparatus of claim 5, wherein the leak risk level determination module is further to:
if the privacy leakage value of the target node reaches a preset threshold value, the unknown privacy information is determined to be high-leakage-risk information or leaked information;
and if the privacy leakage value of the target node does not reach a preset threshold value, the unknown privacy information is determined to be low-leakage-risk information or non-leakage information.
CN201811646432.4A 2018-12-30 2018-12-30 Method and device for measuring risk of information leakage Active CN109670342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811646432.4A CN109670342B (en) 2018-12-30 2018-12-30 Method and device for measuring risk of information leakage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811646432.4A CN109670342B (en) 2018-12-30 2018-12-30 Method and device for measuring risk of information leakage

Publications (2)

Publication Number Publication Date
CN109670342A CN109670342A (en) 2019-04-23
CN109670342B true CN109670342B (en) 2021-02-26

Family

ID=66146763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646432.4A Active CN109670342B (en) 2018-12-30 2018-12-30 Method and device for measuring risk of information leakage

Country Status (1)

Country Link
CN (1) CN109670342B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807208B (en) * 2019-10-31 2022-02-18 北京工业大学 K anonymous privacy protection method capable of meeting personalized requirements of users
CN111539382A (en) * 2020-05-22 2020-08-14 支付宝(杭州)信息技术有限公司 Image recognition model privacy risk assessment method and device and electronic equipment
CN112084531B (en) * 2020-09-10 2024-05-17 杭州中奥科技有限公司 Data sensitivity grading method, device, equipment and storage medium
CN113709090B (en) * 2020-10-15 2023-03-17 天翼数字生活科技有限公司 System and method for determining group privacy disclosure risk
CN112580097B (en) * 2020-12-18 2023-12-26 北京工业大学 User privacy data protection method and device based on semantic reasoning, electronic equipment and storage medium
CN112668055B (en) * 2021-01-15 2023-11-10 北京工业大学 Privacy information access control method and system based on ontology reasoning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930462A (en) * 2010-08-20 2010-12-29 华中科技大学 Comprehensive body similarity detection method
US9817892B2 (en) * 2014-01-31 2017-11-14 Verint Systems Ltd. Automated removal of private information
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 A kind of information retrieval method based on multi-source semantic analysis
CN106572111B (en) * 2016-11-09 2019-06-28 南京邮电大学 A kind of privacy information towards big data issues the discovery method of exposure chain
CN108491731A (en) * 2018-03-11 2018-09-04 海南大学 Information privacy protection method under IoT environment towards typing resource

Also Published As

Publication number Publication date
CN109670342A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
CN109670342B (en) Method and device for measuring risk of information leakage
CN112084331B (en) Text processing and model training method and device, computer equipment and storage medium
US20190332672A1 (en) Methods, devices, and systems for constructing intelligent knowledge base
Cao et al. Efficient processing of spatial group keyword queries
Meliou et al. Tracing data errors with view-conditioned causality
US20110246439A1 (en) Augmented query search
US20090112827A1 (en) System and method for employing social networks for information discovery
CN111783903B (en) Text processing method, text model processing method and device and computer equipment
Schockaert et al. Generating approximate region boundaries from heterogeneous spatial information: An evolutionary approach
Trojahn et al. A cooperative approach for composite ontology mapping
CN112287020A (en) Big data mining method based on graph analysis
Baum et al. Towards a framework combining machine ethics and machine explainability
CN114706989A (en) Intelligent recommendation method based on technical innovation assets as knowledge base
CN111177404A (en) Knowledge graph construction method and device of home decoration knowledge and computer equipment
CN111126617B (en) Method, device and equipment for selecting fusion model weight parameters
CN110543601B (en) Method and system for recommending context-aware interest points based on intelligent set
CN112417174A (en) Data processing method and device
CN109409102B (en) Data privacy protection method based on dynamic context
Gall Active goal recognition design
CN109828984A (en) A kind of method, apparatus, computer storage medium and the terminal of analysis processing
CN115577119A (en) Knowledge graph inference model training method, device and storage medium
Pant et al. A Real-Time Application of Soft Set in Parameterization Reduction for Decision Making Problem
Xu et al. The semantic analysis of knowledge map for the traffic violations from the surveillance video big data.
Chen et al. Stability of Martin boundary under non-local Feynman-Kac perturbations
Zheng et al. A Family of Neural Contextual Matrix Factorization Models for Context-Aware Recommendations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230807

Address after: 101200 room 205-211526, No. 40, Fuqian West Street, Pinggu town, Pinggu District, Beijing (cluster registration)

Patentee after: BEIJING YONGBO TECHNOLOGY CO.,LTD.

Address before: 100000 No. 100 Pingleyuan, Chaoyang District, Beijing

Patentee before: Beijing University of Technology

Effective date of registration: 20230807

Address after: No. 2418, Block B, Investment Service Center, Caofeidian Comprehensive Bonded Zone, Caofeidian Area, China (Hebei) Pilot Free Trade Zone, Tangshan City, Hebei Province, 063205

Patentee after: Hebei Free Trade Zone Zhimao Technology Co.,Ltd.

Address before: 101200 room 205-211526, No. 40, Fuqian West Street, Pinggu town, Pinggu District, Beijing (cluster registration)

Patentee before: BEIJING YONGBO TECHNOLOGY CO.,LTD.