CN112163160B - Sensitive identification method based on knowledge graph - Google Patents

Sensitive identification method based on knowledge graph

Info

Publication number
CN112163160B
CN112163160B CN202011082927.6A CN202011082927A CN112163160B CN 112163160 B CN112163160 B CN 112163160B CN 202011082927 A CN202011082927 A CN 202011082927A CN 112163160 B CN112163160 B CN 112163160B
Authority
CN
China
Prior art keywords
data
user
sensitive
knowledge graph
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011082927.6A
Other languages
Chinese (zh)
Other versions
CN112163160A (en)
Inventor
王利娥
李小聪
李先贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202011082927.6A
Publication of CN112163160A
Application granted
Publication of CN112163160B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a knowledge-graph-based method for identifying sensitive data. First, to construct a user-item knowledge graph, the acquired original data set is preprocessed, a user-item schema graph (pattern graph) is constructed from the preprocessed data, and the knowledge graph is built from the preprocessed data and the schema graph. Second, to identify sensitive data, sensitive-relation inference rules are constructed and used to complete the knowledge graph with sensitive relations between users and items that were not originally present. Finally, the completed knowledge graph is queried for sensitive data and the sensitive data are output, which improves identification speed.

Description

Sensitive identification method based on knowledge graph
Technical Field
The invention relates to the technical field of data security, in particular to a sensitive identification method based on a knowledge graph.
Background
A recommender system is an information filtering tool that aims to accurately predict users' preferences for items, so that items of greater value to a user are presented first. Users' historical behavior data are the foundation of a recommender system and often involve users' personal sensitive data. A prerequisite for protecting the privacy of sensitive data is being able to pick the sensitive data out of a large volume of data, that is, to identify it.
Traditional methods for identifying sensitive data mainly comprise dictionary matching and manual identification, and industry mostly combines the two. The main process is as follows: the user defines a pattern-matching expression for sensitive data; the dictionary matching range is determined according to a predefined model; the target is scanned by dictionary matching; after the scan, the matching results are filtered manually and the pattern-matching expression is optimized. However, because of the judgment criteria and the limitations of dictionary matching, identification is slow.
Disclosure of Invention
The invention aims to provide a sensitive identification method based on a knowledge graph, which improves the identification speed.
To achieve the above purpose, the invention provides a sensitive identification method based on a knowledge graph, comprising the following steps:
preprocessing the acquired original data and constructing a user-item schema graph (pattern graph);
constructing a knowledge graph from the schema graph and the preprocessed data;
constructing sensitive-relation inference rules and completing the knowledge graph;
querying the knowledge graph for sensitive data and outputting the sensitive data.
Preprocessing the acquired original data and constructing the user-item schema graph comprises:
unifying the data storage formats and encoding methods of the various types of acquired original data, and deleting redundant data.
Preprocessing the acquired original data and constructing the user-item schema graph further comprises:
taking the user's age, occupation and sex as the user's attributes, labeling the relation between a user and an item as a purchase relation, and then performing entity alignment on the preprocessed data with a database tool to construct the user-item schema graph.
Constructing the knowledge graph from the schema graph and the preprocessed data comprises:
taking users and items as nodes, and constructing a property graph model from the key-value pairs of each attribute of the users and items.
Constructing the knowledge graph from the schema graph and the preprocessed data further comprises:
mapping the user to a head entity and the item to a tail entity, mapping the relation between the user and the corresponding item to 0 or 1, and storing the knowledge graph in a graph database.
Querying the knowledge graph for sensitive data and outputting the sensitive data comprises:
querying the graph data in the completed knowledge graph with a graph query language, and returning all user and item nodes that have a sensitive relation according to the declared query target.
Querying the knowledge graph for sensitive data and outputting the sensitive data further comprises:
restoring the returned sensitive nodes to the corresponding original data according to the data storage format and encoding method, and saving the original data to a corresponding storage file.
In the sensitive identification method based on the knowledge graph, first, to construct the user-item knowledge graph, the acquired original data set is preprocessed, a user-item schema graph is constructed from the preprocessed data, and the knowledge graph is then built from the preprocessed data and the schema graph. Second, to identify sensitive data, the constructed sensitive-relation inference rules complete the knowledge graph with sensitive relations between users and items that were not originally present. Finally, the completed knowledge graph is queried for sensitive data and the sensitive data are output, which improves identification speed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of the steps of the knowledge-graph-based sensitive identification method provided by the invention.
Fig. 2 is a schematic flow chart of the knowledge-graph-based sensitive identification method provided by the invention.
Fig. 3 is the user-item schema graph provided by the invention.
Fig. 4 is the property graph model provided by the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Referring to fig. 1 and 2, the present invention provides a sensitive identification method based on a knowledge graph, which includes the following steps:
S101, preprocessing the acquired original data and constructing the user-item schema graph.
Specifically, the original data include structured rating data and unstructured text data, such as user comments and product descriptions. This part of the work mainly covers the following aspects:
1. For the various types of data, the data storage format and encoding method are unified first. To meet the requirements of subsequent knowledge extraction and data storage, rating data from multiple data sources are converted into files containing user IDs, item IDs, user attributes and item attributes.
2. E-commerce data contain a great deal of redundant data, such as repeated rating records, advertisements embedded in text, repeated paragraphs and low-quality data. Unreliable and incomplete data are removed so that the data are normalized, their usability is improved, the amount of data to be identified later is reduced, and identification speed is improved.
The preprocessed data mainly comprise three files: ratings, users and products. The ratings file mainly stores users and their ratings of products, the users file mainly stores users and their attributes (gender, age, occupation and the like), and the products file mainly stores products and their attributes (such as category).
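The preprocessing described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the field names `user_id`, `item_id` and `score` and the rule "keep the first record per user-item pair" are assumptions made for illustration only.

```python
from typing import Iterable


def preprocess_ratings(rows: Iterable[dict]) -> list:
    """Unify field formats and drop incomplete or repeated rating records."""
    seen = set()
    cleaned = []
    for row in rows:
        # Unify the storage format: IDs become stripped strings, scores become floats.
        user_id = str(row.get("user_id") or "").strip()
        item_id = str(row.get("item_id") or "").strip()
        score = row.get("score")
        # Remove incomplete records (missing user, item or score).
        if not user_id or not item_id or score is None:
            continue
        # Remove repeated rating records; only the first one per pair is kept.
        key = (user_id, item_id)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"user_id": user_id, "item_id": item_id, "score": float(score)})
    return cleaned


raw = [
    {"user_id": " 231 ", "item_id": "324", "score": "5"},
    {"user_id": "231", "item_id": "324", "score": 5},  # repeated record
    {"user_id": "232", "item_id": "", "score": 3},     # incomplete record
]
cleaned = preprocess_ratings(raw)
print(cleaned)
```

Filtering advertisements or repeated paragraphs out of unstructured text would need additional logic that the description does not specify, so it is left out here.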
The age, occupation and sex of the user are selected as the user's attributes, and the relation between a user and an item is labeled as a purchase relation. Entity alignment is performed on the data in the data set with a database tool. The user-item schema graph is then constructed from the preprocessed information; as shown in Fig. 3, it comprises the user, the item, the user's related attributes, and the purchase relation between the user and the item.
S102, constructing a knowledge graph from the schema graph and the preprocessed data.
Specifically, a user-item knowledge graph is constructed from the preprocessed data and the designed schema graph. This mainly involves knowledge representation, knowledge extraction, knowledge fusion and similar steps.
The specific steps of knowledge representation are as follows:
The property graph model is used as the data model. Users and items are represented as nodes; user nodes have attributes such as age, occupation and sex, and item nodes have attributes such as category and price, each attribute being a key-value pair. Each edge has a label that indicates the relation, and edges can also have attributes. For example, for the statement "user Li Ming once purchased a pen", with the corresponding attributes as shown in Fig. 4, the corresponding triple can be represented as (Li Ming, purchase, pen), meaning that a "purchase" relation links the head entity "Li Ming" and the tail entity "pen"; the user's age and the item's price are displayed as well.
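As an illustration of the property graph model just described, the following Python sketch represents nodes and labeled edges with key-value attributes. The class names `Node` and `Edge` and the concrete age and price values are hypothetical; the patent only gives the triple (Li Ming, purchase, pen).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # "User" or "Item" (label names are assumptions)
    properties: dict = field(default_factory=dict)  # key-value attributes


@dataclass
class Edge:
    head: Node
    relation: str  # the edge label, e.g. "purchase"
    tail: Node
    properties: dict = field(default_factory=dict)  # edges may carry attributes too

    def as_triple(self) -> tuple:
        """Return the (head, relation, tail) triple for this edge."""
        return (self.head.node_id, self.relation, self.tail.node_id)


# Attribute values below are made up for illustration.
li_ming = Node("Li Ming", "User", {"age": 30})
pen = Node("pen", "Item", {"price": 2.5})
purchase = Edge(li_ming, "purchase", pen)
print(purchase.as_triple())
```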
The specific steps of knowledge extraction are as follows:
The purpose of knowledge extraction is to perform entity extraction, entity-relation extraction and attribute extraction on the structured and unstructured data and to store the results in the knowledge graph. For structured data, the user ID is mapped directly to the ID of the head entity, the item ID is mapped to the ID of the tail entity, and the relation between the user ID and the item ID is mapped to 1 or 0, indicating whether the user has interacted with the item. For example, the triple (231, 1, 324) indicates that the user with ID 231 purchased the item with ID 324.
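The mapping of structured data to (head, relation, tail) triples can be sketched as follows. The function name `to_triples` and the input layout are assumptions; only the triple form and the example (231, 1, 324) come from the description.

```python
def to_triples(ratings, user_ids, item_ids):
    """Map structured records to (user_id, relation, item_id) triples.

    relation is 1 if the user interacted with the item, otherwise 0.
    """
    interacted = {(r["user_id"], r["item_id"]) for r in ratings}
    return [
        (u, 1 if (u, i) in interacted else 0, i)
        for u in user_ids
        for i in item_ids
    ]


ratings = [{"user_id": 231, "item_id": 324}]
triples = to_triples(ratings, user_ids=[231], item_ids=[324, 325])
print(triples)
```

Enumerating every user-item pair grows quadratically, so a real system would more plausibly store only the pairs with relation 1; the full enumeration is shown here only to make the 0 case visible.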
The specific steps of storing the knowledge graph are as follows:
The user-item knowledge graph has a graph structure, and storing it in a relational database would hinder the subsequent searching and processing of sensitive relations. The user-item knowledge graph is therefore stored in the mainstream graph database Neo4j: each user entity is stored as a node in the graph database, and the relation between a user and an item is stored as an edge connecting the nodes.
After entity extraction, entity-relation extraction and attribute extraction, the corresponding knowledge graph is established, and a graph database can be used to persist its data. A knowledge graph stored in a graph database integrates well with multi-source data in a recommender-system scenario, and querying the database with a graph query language allows sensitive data to be extracted quickly.
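One way to load such data into Neo4j is to issue parameterized Cypher statements. The sketch below only builds the statement text and its parameters, so it runs without a database; the labels `User` and `Item`, the relationship type `PURCHASE` and the `id` property are assumptions, not names taken from the patent.

```python
def purchase_statement(user: dict, item: dict):
    """Build a parameterized Cypher statement that stores one purchase edge."""
    query = (
        "MERGE (u:User {id: $user_id}) SET u += $user_props "
        "MERGE (i:Item {id: $item_id}) SET i += $item_props "
        "MERGE (u)-[:PURCHASE]->(i)"
    )
    params = {
        "user_id": user["id"],
        # Remaining attributes are stored as key-value properties on the node.
        "user_props": {k: v for k, v in user.items() if k != "id"},
        "item_id": item["id"],
        "item_props": {k: v for k, v in item.items() if k != "id"},
    }
    return query, params


query, params = purchase_statement(
    {"id": 231, "age": 30, "occupation": "teacher"},
    {"id": 324, "category": "stationery"},
)
print(query)
print(params)
```

With the official Neo4j Python driver, such a pair would typically be passed to a session's `run` method; that step is omitted because it requires a running database.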
S103, constructing sensitive-relation inference rules and completing the knowledge graph.
Specifically, to identify sensitive data, inference rules for the sensitive relation between users and items must first be constructed. Users have attributes such as age, occupation and sex, and items have attributes such as price and category, so rules can be defined over these attributes. An example of a defined rule is as follows:
Rule 1: if a user with an ordinary occupation often purchases medications, the item is sensitive for that user.
Inference is performed with the rule-based inference engine built into the Jena toolkit. This mainly involves importing the basic modeling modules, constructing the ontology, and adding the inference engine.
1. Importing the modules required for modeling
The most basic modeling package, org.apache.jena.rdf.model, is imported first. Next, org.apache.jena.vocabulary.RDF and org.apache.jena.vocabulary.RDFS are used to establish binary relations in RDF and RDFS. org.apache.jena.reasoner.Reasoner and org.apache.jena.reasoner.ReasonerRegistry are used to create inference engines.
2. Building an ontology
The Model built in step 1 above is essentially the knowledge base structure in Jena, namely the ontology.
3. Adding the inference engine
The built-in RDFS (RDF Schema) inference engine is selected directly to complete the sensitive-relation inference.
After Rule 1 is executed, sensitive relations are added between all nodes in the knowledge graph that satisfy the conditions.
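The effect of Rule 1 can be illustrated in plain Python. This is a stand-in for the Jena-based inference, not a reproduction of it. The interpretation of "ordinary occupation" as "not a medical occupation", the set `MEDICAL_OCCUPATIONS`, the category name `"medication"` and the threshold `MIN_PURCHASES` for "often" are all assumptions introduced for this sketch.

```python
from collections import Counter

MEDICAL_OCCUPATIONS = {"doctor", "nurse", "pharmacist"}  # assumed, not from the patent
MIN_PURCHASES = 2  # assumed threshold for "often purchases"


def infer_sensitive(users, items, purchases):
    """Return the (user, "sensitive", item) triples implied by Rule 1.

    users: user_id -> {"occupation": ...}
    items: item_id -> {"category": ...}
    purchases: list of (user_id, item_id) pairs, one per purchase
    """
    # Count how many medication purchases each user has made.
    medication_counts = Counter(
        u for u, i in purchases if items[i]["category"] == "medication"
    )
    sensitive = set()
    for u, i in purchases:
        if (
            items[i]["category"] == "medication"
            and users[u]["occupation"] not in MEDICAL_OCCUPATIONS
            and medication_counts[u] >= MIN_PURCHASES
        ):
            sensitive.add((u, "sensitive", i))
    return sensitive


users = {231: {"occupation": "teacher"}, 232: {"occupation": "doctor"}}
items = {
    324: {"category": "medication"},
    325: {"category": "medication"},
    400: {"category": "stationery"},
}
purchases = [(231, 324), (231, 325), (231, 400), (232, 324), (232, 325)]
sensitive_edges = infer_sensitive(users, items, purchases)
print(sorted(sensitive_edges))
```

The new triples play the role of the sensitive relations that complete the knowledge graph; the existing purchase edges are left untouched.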
S104, inquiring the sensitive data in the knowledge graph and outputting the sensitive data.
Specifically, step S103 has already completed the knowledge graph with the sensitive relations between users and items, so the relations between nodes have been expanded from the original single purchase relation to a purchase relation and a sensitive relation. The graph data in the knowledge graph are queried with Cypher, a declarative graph query language; the main approach is to declare the "query target" directly against the knowledge graph. The example script below returns all nodes that have a relation carrying the sensitive label, that is, all user and item nodes that have a sensitive relation.
// Return every node n connected to a node m carrying the `sensitive` label
MATCH (n)--(m:sensitive)
RETURN n;
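For readers without a graph database at hand, the intent of the query can be mimicked over in-memory triples. This sketch assumes the sensitive relation is stored as an edge named `"sensitive"` between a user and an item, which is one reading of the description; the function name `sensitive_nodes` is hypothetical.

```python
def sensitive_nodes(triples):
    """Return all user and item nodes that take part in a sensitive relation."""
    nodes = set()
    for head, relation, tail in triples:
        if relation == "sensitive":
            nodes.add(("User", head))
            nodes.add(("Item", tail))
    return nodes


graph = [
    (231, "purchase", 324),
    (231, "sensitive", 324),
    (232, "purchase", 400),
]
found = sensitive_nodes(graph)
print(sorted(found))
```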
Outputting sensitive data
All identified sensitive nodes are restored to the original data according to the data storage format and encoding method. The original data are saved to a storage file and also stored in the established graph database. In this way the sensitive data are selected from a large volume of data, and the identification of the sensitive data is complete.
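The output step can be sketched as mapping sensitive user-item pairs back to their original records and writing them out as CSV. The column names and the choice of CSV are assumptions; the description only says that the data are restored according to the original storage format and saved to a file.

```python
import csv
import io

FIELDS = ["user_id", "item_id", "score"]  # assumed layout of the original records


def export_sensitive(sensitive_pairs, original_rows, out):
    """Write the original records behind each sensitive pair to a text stream."""
    wanted = set(sensitive_pairs)
    rows = [r for r in original_rows if (r["user_id"], r["item_id"]) in wanted]
    # extrasaction="ignore" skips any columns not listed in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return rows


original = [
    {"user_id": "231", "item_id": "324", "score": 5},
    {"user_id": "232", "item_id": "400", "score": 4},
]
buffer = io.StringIO()
kept = export_sensitive({("231", "324")}, original, buffer)
print(buffer.getvalue())
```

To write an actual storage file, a handle opened with `open(path, "w", newline="", encoding="utf-8")` can be passed in place of the in-memory buffer.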
The innovations of the invention include the following aspects:
A personalized privacy definition is proposed that takes into account the sensitivity between a user and an item. Existing work on privacy protection in recommender systems assumes that all user feedback data are sensitive. In fact, different users differ in their sensitivity to different items: different users have different privacy protection requirements, and different items have different degrees of sensitivity.
A method is proposed that identifies sensitive data by constructing a user-item knowledge graph and using relation inference to complete the knowledge graph with sensitive relations that were not originally present, which addresses the slow identification speed of traditional sensitive data identification methods.
Compared with the prior art, the method not only effectively improves the accuracy of sensitive data identification but also improves identification speed when facing large volumes of complex data.
The above disclosure is only a preferred embodiment of the present invention, and it should be understood that the scope of the invention is not limited thereto, and those skilled in the art will appreciate that all or part of the procedures described above can be performed according to the equivalent changes of the claims, and still fall within the scope of the present invention.

Claims (1)

1. A sensitive identification method based on a knowledge graph, characterized by comprising the following steps:
preprocessing the acquired original data and constructing a user-item schema graph;
constructing a knowledge graph from the schema graph and the preprocessed data;
constructing sensitive-relation inference rules and completing the knowledge graph;
querying the knowledge graph for sensitive data and outputting the sensitive data;
wherein preprocessing the acquired original data and constructing the user-item schema graph comprises:
unifying the data storage formats and encoding methods of the various types of acquired original data, and deleting redundant data;
preprocessing the acquired original data and constructing the user-item schema graph further comprises:
taking the user's age, occupation and sex as the user's attributes, labeling the relation between a user and an item as a purchase relation, and then performing entity alignment on the preprocessed data with a database tool to construct the user-item schema graph;
constructing the knowledge graph from the schema graph and the preprocessed data comprises:
taking users and items as nodes, and constructing a property graph model from the key-value pairs of each attribute of the users and items;
constructing the knowledge graph from the schema graph and the preprocessed data further comprises:
mapping the user to a head entity and the item to a tail entity, mapping the relation between the user and the corresponding item to 0 or 1, and storing the knowledge graph in a graph database;
querying the knowledge graph for sensitive data and outputting the sensitive data comprises:
querying the graph data in the completed knowledge graph with a graph query language, and returning all user and item nodes that have a sensitive relation according to the declared query target;
querying the knowledge graph for sensitive data and outputting the sensitive data further comprises:
restoring the returned sensitive nodes to the corresponding original data according to the data storage format and encoding method, and saving the original data to a corresponding storage file.
CN202011082927.6A 2020-10-12 2020-10-12 Sensitive identification method based on knowledge graph Active CN112163160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011082927.6A CN112163160B (en) 2020-10-12 2020-10-12 Sensitive identification method based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011082927.6A CN112163160B (en) 2020-10-12 2020-10-12 Sensitive identification method based on knowledge graph

Publications (2)

Publication Number Publication Date
CN112163160A (en) 2021-01-01
CN112163160B (en) 2023-08-08

Family

ID=73868157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011082927.6A Active CN112163160B (en) 2020-10-12 2020-10-12 Sensitive identification method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112163160B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239392A (en) * 2021-04-02 2021-08-10 国网福建省电力有限公司信息通信分公司 Desensitization method based on data center sensitive data
CN114021191B (en) * 2021-11-05 2022-07-01 江苏安泰信息科技发展有限公司 Safe production informatization sensitive data management method and system
CN114186689B (en) * 2022-02-14 2022-05-20 支付宝(杭州)信息技术有限公司 Methods, systems, apparatus, and media for path discovery in a knowledge graph
CN114417845B (en) * 2022-03-30 2022-07-12 支付宝(杭州)信息技术有限公司 Same entity identification method and system based on knowledge graph
CN115099909A (en) * 2022-08-23 2022-09-23 深圳洽客科技有限公司 Information processing method and system based on E-commerce intention database mining

Citations (8)

Publication number Priority date Publication date Assignee Title
US9419841B1 (en) * 2011-06-29 2016-08-16 Amazon Technologies, Inc. Token-based secure data management
CN108471414A (en) * 2018-03-24 2018-08-31 海南大学 Internet of Things data method for secret protection towards typing resource
CN108647532A (en) * 2018-05-15 2018-10-12 广东因特利信息科技股份有限公司 Method, apparatus, electronic equipment and the storage medium of sensitive users mark secrecy
CN108804950A (en) * 2018-06-09 2018-11-13 海南大学 Based on data collection of illustrative plates, modeling and the data-privacy guard method of Information Atlas and knowledge mapping
CN109284394A (en) * 2018-09-12 2019-01-29 青岛大学 A method of Company Knowledge map is constructed from multi-source data integration visual angle
KR101987915B1 (en) * 2017-12-22 2019-06-12 주식회사 솔트룩스 System for generating template used to generate query to knowledge base from natural language question and question answering system including the same
CN111259260A (en) * 2020-03-30 2020-06-09 九江学院 Privacy protection method in personalized recommendation based on sorting classification
CN111353091A (en) * 2018-12-24 2020-06-30 北京三星通信技术研究有限公司 Information processing method and device, electronic equipment and readable storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20140214895A1 (en) * 2013-01-31 2014-07-31 Inplore Systems and method for the privacy-maintaining strategic integration of public and multi-user personal electronic data and history
US20150286709A1 (en) * 2014-04-02 2015-10-08 Samsung Electronics Co., Ltd. Method and system for retrieving information from knowledge-based assistive network to assist users intent
US10542015B2 (en) * 2016-08-15 2020-01-21 International Business Machines Corporation Cognitive offense analysis using contextual data and knowledge graphs
US10521608B2 (en) * 2018-01-09 2019-12-31 Accenture Global Solutions Limited Automated secure identification of personal information
US11016965B2 (en) * 2019-01-22 2021-05-25 International Business Machines Corporation Graphical user interface for defining atomic query for querying knowledge graph databases

Non-Patent Citations (1)

Title
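Jiao Jia. A privacy protection method for sensitive relationships in social network data publishing. Modern Computer (Professional Edition). 2013, full text. *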

Also Published As

Publication number Publication date
CN112163160A (en) 2021-01-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant