CN107491472B - Life cycle-based big data platform sensitive data secure sharing system and method - Google Patents

Life cycle-based big data platform sensitive data secure sharing system and method

Info

Publication number
CN107491472B
Authority
CN
China
Prior art keywords
data
service
sensitive
platform
sharing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710483185.XA
Other languages
Chinese (zh)
Other versions
CN107491472A (en)
Inventor
陈海江
祝超峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lishi Technology Co Ltd
Original Assignee
Zhejiang Lishi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lishi Technology Co Ltd
Priority to CN201710483185.XA
Publication of CN107491472A
Application granted
Publication of CN107491472B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a life cycle-based system and method for securely sharing sensitive data across big data platforms. The invention integrates the big data platforms of various network services and provides a unified big data storage and sharing system for their big data services. For sensitive information related to user security and privacy, the invention adds a data security layer, in which service processing data is separated into sensitive information data and common service processing data. The sensitive information data is stored independently within each network service platform, and its life cycle is set according to its security level; on this basis, a dedicated life cycle-based secure exchange and sharing mechanism for sensitive information data is implemented among the big data platforms.

Description

Life cycle-based big data platform sensitive data secure sharing system and method
Technical Field
The invention belongs to the field of big data information mining and analysis, and particularly relates to a life cycle-based big data platform sensitive data secure sharing system and method.
Background
Big data technology uses a storage and computing platform capable of carrying massive data loads to collect, process and analyze the business data generated by various computer network services, mine meaningful associations and patterns of change in that information, and apply those patterns in practice.
Currently, network service operators that serve large customer bases are vigorously developing and using big data platforms. For example, a bank can use a big data platform to organize and analyze its clients' deposit and payment records, credit card transactions and loan records, determine each client's income level and asset scale, and serve clients in areas such as setting overdraft limits, assessing loan risk and customizing financial management products. A big data platform for online shopping analyzes users' spending power and preferences from customers' product selections and transaction records; on the one hand it can forecast future sales volumes of various types of goods at the macro level, and on the other hand it can deliver targeted services such as advertisement pushing and personalized customization to specific customers. A big data platform for a medical system can acquire a target individual's records of medical treatment, major-illness surgery, regular physical examinations, immunization, psychological counseling and the like, and perform specialized analysis to produce an assessment report on the individual's health. The establishment of big data platforms has further raised the intelligence of various network services and effectively exploited the value of information data.
However, traditional big data platforms are deployed and operated independently by the operators of the various network services, and the business data generated under one network service is stored, analyzed and applied only within that service's own big data platform. The big data platforms are dispersed and isolated from one another, with no cross-platform integration, so comprehensive analysis and application of big data spanning multiple platforms cannot be supported. For example, the user asset and income information held in a bank's big data platform cannot be obtained by an online shopping platform, let alone applied to that platform's analysis of users' spending power.
The mutual isolation of big data platforms stems largely from considerations of information security and user privacy protection. Because a big data platform collects all or most of the business data generated while users use a network service, it inevitably carries sensitive information related to user identity confidentiality, account security and personal privacy, such as bank account numbers, stored-value account numbers, personal names, identity card numbers, medical insurance card numbers, telephone numbers and addresses. Once such information is leaked, it causes serious harm to users' lives and also damages the reputation of the service operator.
Therefore, operators set access right restrictions on the service data in a big data platform, and on the processing data obtained by analyzing and computing on that service data, and apply relatively strict security measures. In particular, data access requests from outside the big data platform (including access requests submitted by other big data platforms) are served through fixed, preset interfaces according to strictly defined rules, so that measures such as authority control, identity authentication, data encryption and logging can conveniently be applied to the data that the platform outputs to external parties.
However, these measures also make data exchange between big data platforms dependent on a limited set of preset interfaces and complex transmission rules, resulting in high latency, low speed and low efficiency; thus only sporadic, small-volume data exchanges are possible.
In fact, valuable information is distributed extremely sparsely in big data, and can be extracted only by sorting, comparing, associating and clustering massive amounts of business data. Comprehensive analysis and application of big data across multiple network services therefore cannot be achieved through small-volume data exchange between the services' big data platforms; large-scale data sharing among the big data platforms must be supported.
That is to say, cross-platform data sharing requires that the massive business data generated in multiple network services be transmitted between, and applied jointly by, the big data platforms, which existing data exchange methods cannot support.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a life cycle-based big data platform sensitive data secure sharing system and method.
The invention can integrate big data platforms under various network services and provide a uniform big data storage and sharing system for big data services of the network services; the big data platforms can execute big data processing analysis operation on original business data generated by self network service to generate business processing data, and the business processing data is put into the big data storage sharing system; moreover, each big data platform can utilize the big data storage sharing system to obtain required business processing data related to various network services in a cross-platform mode, and the analysis and application of the big data level are carried out on the data.
For sensitive information related to user security and privacy, a data security layer is added to the big data storage and sharing system. In this layer, the service processing data is separated into sensitive information data and common service processing data, and a mapping relation table between the two is established. For sensitive information data, a security level is defined in the data security layer and a life cycle is set according to that security level. On this basis, sensitive information data is stored independently under the big data platform of each network service, and, based on the security level and the life cycle, a dedicated secure exchange and sharing mechanism for sensitive information data is implemented among the big data platforms, comprising temporary shared storage and restricted exchange through a specific interface. Common service processing data, after further combing to remove personal information, enters the data sharing layer for unified cross-platform storage, sharing and application.
The invention provides a secure sharing system for sensitive data of a big data platform, which is characterized by comprising the following components: the system comprises a data acquisition layer, a data processing layer, a data security layer and a big data sharing layer; the data acquisition layer, the data processing layer and the data security layer are deployed in service systems of various network service platforms; the big data sharing layer is deployed across a plurality of network service platforms and provides a uniform big data storage sharing layer for big data services of each network service platform;
the data acquisition layer is used for acquiring all original service data generated in the operation process of the network service platform and adding the original service data into an original service data warehouse;
the data processing layer is a big data mining analysis platform of the network service platform and is used for carrying out data mining and analysis algorithms in a highly parallel computing mode facing to a data stream formed by original business data and generating business processing data after data primary processing;
the data security layer is embedded in a service system of the network service platform and is used for separating a small amount of sensitive information from massive service processing data, generating sensitive data that contains the sensitive information and common service data that does not; registering a mapping relation that records which sensitive data corresponds to which common service data; storing and managing the sensitive data independently in the data security layer, wherein a life cycle is set for the sensitive data and strictly limited exchange sharing is carried out between network service platforms, the exchange sharing comprising temporary shared storage based on the life cycle duration and restricted exchange through a specific interface; and transmitting the common service data from the data security layer to the big data sharing layer for cross-platform sharing and exchange;
the big data sharing layer is used for uniformly storing the common business data obtained from the data security layers of the plurality of network service platforms and providing uniform data read-write access standards and operation specifications, so that each network service platform can obtain the common business data of each network service platform from the big data sharing layer and apply the common business data to big data mining analysis.
Preferably, the big data platform sensitive data secure sharing system further includes: a restricted exchange interface, used for responding to a short-time calling request sent by an external network service platform to the data security layer of the present network service platform and providing the requested sensitive data to the external network service platform; and for limiting, based on the life cycle of the sensitive data, the length of time during which the sensitive data provided to the external network service platform can be decrypted and used.
Preferably, the big data platform sensitive data secure sharing system further includes: a temporary shared storage area located in the big data sharing layer; in response to a short-time calling request sent by an external network service platform to the data security layer of the present network service platform, the data security layer uploads the requested sensitive data to the temporary shared storage area, from which the external network service platform can acquire it; and the temporary shared storage area controls the sharing duration of the sensitive data based on the life cycle of the sensitive data.
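The life cycle-controlled temporary shared storage area can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `TemporarySharedStore`, the `LIFECYCLE_SECONDS` table and its concrete durations are all hypothetical, and an in-memory dictionary stands in for real shared storage. The only behaviour it tries to show is that a higher security level yields a shorter sharing duration, after which the data can no longer be fetched.

```python
import time
from dataclasses import dataclass

# Hypothetical mapping from security level to life cycle (seconds).
# The source text does not give concrete levels or durations.
LIFECYCLE_SECONDS = {1: 3600.0, 2: 600.0, 3: 60.0}


@dataclass
class SharedEntry:
    payload: dict
    expires_at: float


class TemporarySharedStore:
    """In-memory stand-in for the temporary shared storage area."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, useful for testing
        self._entries = {}

    def upload(self, key, payload, security_level):
        """Called by the data security layer in response to a short-time request."""
        ttl = LIFECYCLE_SECONDS[security_level]
        self._entries[key] = SharedEntry(payload, self._clock() + ttl)

    def fetch(self, key):
        """Called by the external platform; returns None once the life cycle ends."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            del self._entries[key]   # expired data is purged on access
            return None
        return entry.payload
```

Passing the clock in as a parameter is a design choice made only so that expiry can be exercised without waiting; a real deployment would also need active purging and access control, which this sketch leaves out.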
Preferably, the data acquisition layer comprises:
the data adaptation interface is used for adapting to the data type, format and export rule of a service system of the network service platform and serving as an output channel of original service data generated by the service system;
the data acquisition module is used for receiving original service data from the data adaptation interface in real time or non-real time;
the data query module is used for actively sending a query message to the data adaptation interface and receiving original service data transmitted by the data adaptation interface in response to the query message;
the data verification module is used for verifying the acquired original service data according to a predefined data verification rule, removing or re-acquiring original service data found to be incomplete or not conforming to the rule, and providing the original service data verified as complete and rule-conforming to the data conversion processing module;
the data conversion processing module is used for carrying out ETL (extraction, conversion and loading) processing on the original service data which is verified to be qualified by the data verification module and converting the original service data into a standard data format;
and the original service data warehouse is used for storing the original service data in the standard data format after being converted by the data conversion processing module.
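The interplay between the data verification module and the data query module (verify, re-acquire on failure, remove if still defective) can be sketched as follows. This is an illustrative assumption rather than the patented implementation: the function names `verify` and `acquire_verified`, the numeric range rule, and the `max_retries` bound are hypothetical, since the text only speaks of a "predefined data verification rule" without fixing its content or a retry count.

```python
def verify(record, limits):
    """Return True if every limited field is present, numeric and within range.

    `limits` maps a field name to a (low, high) pair; this range check is a
    hypothetical example of a predefined data verification rule.
    """
    for field, (low, high) in limits.items():
        value = record.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def acquire_verified(record_id, first_copy, requery, limits, max_retries=3):
    """Verify a record, re-querying on failure; return None if it is removed.

    `requery` plays the role of the data query module: it is called with the
    record id and returns a fresh copy of the original service data.
    """
    record = first_copy
    for attempt in range(max_retries + 1):
        if record is not None and verify(record, limits):
            return record            # passed on to the conversion step
        if attempt < max_retries:
            record = requery(record_id)
    return None                      # still defective: removed
```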
Preferably, the big data mining analysis platform of the data processing layer comprises:
the intra-platform interface module is used for receiving the upper-layer scheduling of the network service platform service system and transmitting a data mining analysis task downwards; and externally outputting the service processing data processed by the lower layer;
the parallel flow task module is used for receiving a data mining analysis task, setting up at least one task flow according to the task and managing the creation, maintenance and termination of each task flow; extracting original service data from the original service data warehouse in the form of a data stream for each task flow, and providing it to different sub-modules of the data association analysis module to realize parallel task processing;
the data association analysis module comprises a data association calculation sub-module, a data classification calculation sub-module and a data clustering calculation sub-module; each sub-module undertakes the task flow that matches its algorithm type, receives the data stream of original service data corresponding to that task flow, computes on the data stream with its algorithm to obtain service processing data, and generates and outputs a data stream of the service processing data to the intra-platform interface module.
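As a rough illustration of task flows being handed to different sub-modules in parallel, the sketch below runs one task flow per algorithm type over the same stream of records using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function `run_task_flows`, the chunk size and the toy "algorithms" are assumptions made for the example; the patent does not prescribe a threading model or any of these names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(records, size):
    """Yield the records in fixed-size data units, emulating a data stream."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def run_task_flows(records, algorithms, chunk_size=2):
    """Run one task flow per algorithm type in parallel over the same data.

    `algorithms` maps an algorithm-type name to a function that takes a chunk
    of records and returns a list of results (hypothetical sub-modules).
    """
    def task_flow(name, func):
        results = []
        for chunk in chunks(records, chunk_size):
            results.extend(func(chunk))
        return name, results

    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = [pool.submit(task_flow, name, func)
                   for name, func in algorithms.items()]
        return dict(future.result() for future in futures)
```

For example, a "classification" function that labels each record and an "association" function that pairs customers with items can be passed together, and each result list is returned under its algorithm-type name.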
The data security layer comprises:
the sensitive data extraction module is used for filtering out a service processing data unit containing sensitive information from the service processing data according to a preset filtering rule aiming at the service processing data and providing the service processing data unit containing the sensitive information to the sensitive data separation module; the service processing data unit which is filtered and does not contain sensitive information is used as common service data to be provided to the big data sharing layer;
the sensitive data separation module is used for separating the service processing data unit containing sensitive information filtered by the sensitive data extraction module into a sensitive data part and a common service data part, establishing, storing and maintaining a mapping relation table of the sensitive data part and the common service data part, and registering the corresponding relation between the sensitive data part and the common service data part separated by the same service processing data unit in the table; after mapping registration, providing a common service data part to a big data sharing layer, and providing a sensitive data part to a sensitive data storage management unit for storage and management;
the sensitive data storage management unit receives and stores the sensitive data part from the sensitive data separation module and sets the life cycle for the sensitive data part according to the security level of the sensitive data part;
and the data sharing interface uploads the common service data to the big data sharing layer.
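The separation of a service processing data unit into a sensitive part and a common part, together with the registration of their correspondence in a mapping relation table, might look like the following sketch. The field names in `SENSITIVE_FIELDS`, the `SecurityLayer` class and the use of random identifiers are hypothetical choices for illustration; the text only requires a preset filtering rule and a mapping table, without defining either.

```python
import uuid

# Hypothetical filtering rule: field names regarded as sensitive information.
SENSITIVE_FIELDS = {"name", "id_card_number", "bank_account", "phone", "address"}


def separate(unit, sensitive_fields=SENSITIVE_FIELDS):
    """Split one service processing data unit into (sensitive, common) parts."""
    sensitive = {k: v for k, v in unit.items() if k in sensitive_fields}
    common = {k: v for k, v in unit.items() if k not in sensitive_fields}
    return sensitive, common


class SecurityLayer:
    """Toy data security layer: local sensitive store plus mapping table."""

    def __init__(self):
        self.sensitive_store = {}   # sensitive_id -> sensitive part (stays local)
        self.mapping_table = {}     # common_id -> sensitive_id
        self.outbox = []            # common parts bound for the sharing layer

    def process(self, unit):
        sensitive, common = separate(unit)
        common_id = uuid.uuid4().hex
        record = {"common_id": common_id, **common}
        if sensitive:
            sensitive_id = uuid.uuid4().hex
            self.sensitive_store[sensitive_id] = sensitive
            self.mapping_table[common_id] = sensitive_id
        self.outbox.append(record)
        return record
```

Only the records in `outbox` would leave the platform; the mapping table lets the originating platform re-associate a shared common record with its locally held sensitive part.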
Further preferably, the data security layer further includes: a de-personalization data combing module, used for combing out and removing personal information from the common service data that contains no sensitive information and from the common service data part whose sensitive information has been separated out by the sensitive data separation module.
Preferably, the big data sharing layer specifically includes:
the shared storage library is used for receiving common service data from the data security layers of the network service platforms and storing the common service data in a unified manner;
and the standardized sharing interface, through which each network service platform obtains the common service data of all platforms uniformly stored in the shared storage library, using a uniform data read-write access standard and operation specification.
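A minimal sketch of a shared storage library behind a single uniform read/write interface is given below. The class `SharedRepository` and its `write`/`read` signatures are hypothetical; they merely illustrate the idea that every platform uses the same two operations regardless of which platform produced the data.

```python
from collections import defaultdict


class SharedRepository:
    """Toy shared storage library with one uniform read/write interface."""

    def __init__(self):
        self._records = defaultdict(list)   # platform name -> common records

    def write(self, platform, record):
        """Store a copy of a common service data record from `platform`."""
        self._records[platform].append(dict(record))

    def read(self, platform=None, predicate=lambda record: True):
        """Return copies of matching records from one platform or from all."""
        sources = [platform] if platform is not None else list(self._records)
        return [dict(r)
                for p in sources
                for r in self._records.get(p, [])
                if predicate(r)]
```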
The invention further provides a life cycle-based big data platform sensitive data secure sharing method, which is characterized by comprising the following steps:
acquiring original business data generated by various network service platforms, and executing big data processing analysis operation on the original business data to generate business processing data;
separating sensitive information data containing sensitive information from common service processing data not containing sensitive information for the service processing data; for a sensitive data part and a common service data part separated from the same service processing data unit, establishing a mapping relation table of the sensitive information data part and the common service processing data part; for the sensitive information data part, defining a security level, and setting a life cycle according to the security level;
the separated sensitive information data are independently stored in each network service platform;
a special safe exchange sharing mechanism is adopted, and based on the life cycle, strictly limited exchange sharing is carried out on sensitive information data among all big data platforms, wherein the strictly limited exchange sharing comprises temporary shared storage and limited specific interface exchange;
and transmitting the common service data to a data sharing layer that spans a plurality of network service platforms, for unified cross-platform storage, sharing and application.
Preferably, the time during which the sensitive data provided to the external network platform is decryptable and applicable is restricted by the specific interface exchange according to the lifecycle of the sensitive data; or, limiting the sharing duration of the temporary shared storage according to the life cycle of the sensitive data.
In fact, sensitive information concerning user security and privacy is very sparse relative to the business data volume of an entire network platform. The invention separates this small amount of sensitive information from the massive business data; the sensitive information is stored and managed independently by the big data platform of each network service, and is exchanged and shared among platforms only under strict limits. A life cycle-based exchange and sharing mechanism ensures that an external platform can obtain and use the sensitive information of the present platform only temporarily and non-repeatably. Common service data, through the unified storage and sharing mechanism of the invention, can be obtained and analyzed by every big data platform, which breaks through the interface data exchange bottleneck of the prior art and enables cross-platform big data applications at massive scale.
Drawings
FIG. 1 is a schematic diagram of a hierarchical architecture of a big data platform sensitive data secure sharing system according to the present invention;
FIG. 2 is a schematic block diagram of a data acquisition layer according to the present invention;
FIG. 3 is a schematic structural diagram of a big data mining analysis platform module of the data processing layer according to the present invention;
FIG. 4 is a block diagram of the data security layer according to the present invention;
FIG. 5 is a block diagram of a big data sharing layer according to the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
As shown in fig. 1, the big data platform sensitive data security sharing system provided by the present invention is divided into a data acquisition layer 1, a data processing layer 2, a data security layer 3, and a big data sharing layer 4.
The data acquisition layer 1 serves various network services, such as online banking, online shopping and medical health information systems; it acquires all original business data generated during the operation of these network services and adds it to an original business data warehouse. For example, account balance statements, loan and interest repayment records and credit card transaction bills from online banking; shopping cart records, transaction records and payment records from online shopping; and registration and treatment records, prescription bills and physical examination reports from a medical health information system can all serve as original business data.
FIG. 2 is a schematic block diagram of the data acquisition layer according to the present invention. A data acquisition layer module framework is deployed independently in the service system of each network service, such as banking, online shopping and medical information. The architecture of the data acquisition layer 1 in each network service business system comprises:
the data adaptation interface 101 is configured in the service system according to a docking mechanism of the service system of the network service itself, and the data adaptation interface 101 adapts to the data type, format and export rule of the service system itself and serves as an output channel of original service data generated in the operation process of the service system.
The data acquisition module 102 receives original service data from the data adaptation interface 101 in real time or non-real time, triggered by a ready notification message that the data adaptation interface sends when it has original service data ready for output.
The data query module 103 actively sends query messages to the data adaptation interface 101 and receives the original service data that the interface transmits in response. When reception by the data acquisition module 102 is interrupted or the data is incomplete, the data query module 103 can send a query message and receive the original service data again to ensure the integrity of the service data. When a big data application needs to actively query target service data, the data query module 103 can likewise send a query message to actively extract the original service data.
The data verification module 104 verifies the original service data obtained by the data acquisition module 102 and the data query module 103 according to data verification rules predefined to suit the specific circumstances of the network service system. For data found to be incomplete or non-conforming, for example data with format errors, garbled characters, values exceeding limits or consistency check failures, the data verification module 104 sends feedback to the data query module 103, instructing it to retrieve the defective original service data again. If complete, rule-conforming original service data still cannot be obtained after repeated queries by the data query module 103, the data verification module 104 rejects the problematic data. Original service data verified as complete and rule-conforming is provided to the data conversion processing module 105 for further processing.
The data conversion processing module 105 performs ETL (extraction, conversion and loading) processing on the original service data that the data verification module 104 has verified as qualified, converting the multi-source, heterogeneous data formats of the network service systems into a uniform data format, so as to generate original service data in a standard data format suitable for big data processing and analysis by the data processing layer 2.
The original service data warehouse 106 is deployed independently inside the network service system and stores the original service data in the standard data format produced by the data conversion processing module 105. For example, the original service data warehouse 106 of the data acquisition layer of the present invention is embedded in the business systems of the online banking, online shopping and medical health information services, respectively.
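The conversion of heterogeneous source formats into one standard data format by the data conversion processing module 105 can be sketched as a field-mapping step. Everything concrete here is assumed for illustration: the two source schemas in `SOURCE_SCHEMAS`, their field names and date formats, and the target fields `customer_id`, `amount` and `timestamp` do not come from the patent.

```python
from datetime import datetime

# Hypothetical per-source field mappings and date formats.
SOURCE_SCHEMAS = {
    "bank": {
        "fields": {"cust": "customer_id", "amt": "amount", "dt": "timestamp"},
        "date_format": "%Y%m%d",
    },
    "shop": {
        "fields": {"user_id": "customer_id", "total": "amount", "paid_on": "timestamp"},
        "date_format": "%d/%m/%Y",
    },
}


def to_standard(source, raw):
    """Convert one source-specific record into the assumed standard format."""
    schema = SOURCE_SCHEMAS[source]
    out = {"source": source}
    for source_name, standard_name in schema["fields"].items():
        out[standard_name] = raw[source_name]          # rename fields
    out["amount"] = float(out["amount"])               # unify numeric type
    parsed = datetime.strptime(out["timestamp"], schema["date_format"])
    out["timestamp"] = parsed.date().isoformat()       # unify date format
    return out
```

After conversion, a bank record and a shopping record expose the same keys, so the data processing layer can treat them uniformly.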
The big data mining analysis platforms of network services such as banking, online shopping and medical health serve as the data processing layer 2 of the invention. For the original service data stored in the original service data warehouse 106 of the data acquisition layer 1, each network service calls the big data mining analysis platform of the data processing layer 2, which executes data mining and analysis algorithms in a highly parallel computing mode on the data stream formed by the original service data, generating service processing data as the result of this primary processing.
FIG. 3 shows a block diagram of a big data mining analysis platform of the data processing layer. As shown in fig. 3, the big data mining analysis platform sequentially includes, from top to bottom:
the intra-platform interface module 201 is used for receiving system upper layer scheduling and transmitting data mining analysis tasks downwards; and outputs the service processing data processed by the lower layer to the outside. The big data mining analysis platform of the data processing layer 2 is used as an execution carrier of big data mining analysis functions in various network service systems such as banks, online shopping, medical care and the like, receives a scheduling task instruction manually or automatically issued by the upper layer of the network service system through the in-platform interface module 201, obtains a data mining analysis task, and downwardly transmits the data mining analysis task to the parallel flow task module 202. The intra-platform interface module 201 provides a friendly man-machine interaction interface, an operator of a big data platform can conveniently start a mining analysis task on the interface, select an algorithm rule of the mining analysis, input a result feedback time requirement of the mining analysis task, select a target data range and a training data range of the mining analysis, and the interface converts the parameters into a scheduling task instruction issued manually. The upper layer of the system may also issue a task scheduling command automatically, for example, according to the magnitude of raw service data accumulated in the raw service data warehouse 106 or according to the requirement of new data mining analysis, issue the scheduling task command automatically to create a new task. 
The service processing data obtained by executing the data mining analysis task is output through the intra-platform interface module 201. On the one hand, it can be provided to the upper big data layer of the network service system for deeper analysis and application; on the other hand, and this is the key point of the invention, it is provided to the data security layer 3 for subsequent cross-platform sharing and exchange.
The parallel flow task module 202 receives a data mining analysis task from the intra-platform interface module 201 and, according to the task, sets up at least one task flow, generally a plurality of task flows operating in parallel. For each task flow, the parallel flow task module 202 extracts original service data from the original service data warehouse 106 in the form of a data stream, for example in predetermined data units (e.g., one data table, one data block, one data record) at a rate of 200M/s, and provides the extracted data stream of original service data to the data association analysis module 203. The parallel flow task module 202 manages the creation, maintenance and termination of each task flow, and allocates and reclaims the data range of original service data extracted by each task flow and the system resources occupied by each task flow. Where the same data mining analysis task involves several types of algorithm, such as data association calculation, data classification calculation and data clustering calculation, the parallel flow task module 202 sets up one task flow for each algorithm type and hands each task flow to the corresponding sub-module of the data association analysis module 203, so that the task is processed in parallel.
The data association analysis module 203 is the core of the data processing layer 2. As shown in fig. 3, it includes a data association calculation sub-module 203A, a data classification calculation sub-module 203B, and a data clustering calculation sub-module 203C. Each sub-module takes on the task flows that match its algorithm type, receives the data stream of original service data corresponding to the task flow, applies its algorithm to the data stream to obtain service processing data, and outputs the resulting data stream of service processing data to the intra-platform interface module 201. For example, the data association calculation sub-module 203A may analyze the associations between original service data streams based on at least one of a probability data association algorithm, a joint probability association algorithm, a multi-target data association algorithm, a neural network association algorithm, and the like, establish an association record table recording these associations, and output the association record table as a service processing data stream. The data classification calculation sub-module 203B may classify the original service data based on at least one of a Bayesian classifier algorithm, a divide-and-conquer decision tree algorithm, a Bagging algorithm, and a linear regression algorithm, and add a classification type tag to the original service data according to the result, so that the tagged data is output as a service processing data stream. The data clustering calculation sub-module 203C learns from a clustering training data set, clusters the original service data using a data clustering algorithm such as K-Means, and generates a cluster set record table that is output as a service processing data stream.
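The dispatch of one task flow per algorithm type can be sketched as follows. This is a minimal illustration, not the patented implementation: the three functions are toy stand-ins for sub-modules 203A, 203B and 203C, and the record fields (`id`, `user`, `amount`), the threshold and the use of a thread pool are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for sub-modules 203A/203B/203C; real algorithms would go here.
def associate(records):
    # Toy association: pair consecutive records that belong to the same user.
    return [(a["id"], b["id"]) for a, b in zip(records, records[1:]) if a["user"] == b["user"]]

def classify(records):
    # Toy classification: add a label based on an assumed amount threshold.
    return [{**r, "label": "large" if r["amount"] >= 1000 else "small"} for r in records]

def cluster(records):
    # Toy clustering: group record ids by user.
    groups = {}
    for r in records:
        groups.setdefault(r["user"], []).append(r["id"])
    return groups

SUBMODULES = {"association": associate, "classification": classify, "clustering": cluster}

def run_task(algorithm_types, records):
    """Create one task flow per algorithm type and run the flows in parallel."""
    with ThreadPoolExecutor(max_workers=len(algorithm_types)) as pool:
        futures = {name: pool.submit(SUBMODULES[name], records) for name in algorithm_types}
        return {name: future.result() for name, future in futures.items()}

records = [
    {"id": 1, "user": "u1", "amount": 1500},
    {"id": 2, "user": "u1", "amount": 20},
    {"id": 3, "user": "u2", "amount": 300},
]
results = run_task(["association", "classification", "clustering"], records)
```

Each entry of `results` corresponds to the service processing data that one sub-module would return to the intra-platform interface module 201.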
As the key link of the life cycle-based sensitive data secure sharing mechanism, a data security layer 3 is embedded in the service system of each network service, such as banking, online shopping and medical health, and operates on the service processing data provided by the data processing layer 2. The data security layer 3 separates a small amount of sensitive information from the massive service processing data, generating sensitive data and common service data that contains no sensitive information, and registers the mapping relation between each piece of sensitive data and the common service data it corresponds to. The data security layer 3 stores and manages sensitive data independently, always inside the big data platform of the respective network service, and allows only strictly limited exchange and sharing between platforms, namely temporary shared storage and exchange through a restricted specific interface. Common service data is further groomed by the data security layer 3 to remove personal information and is then handed to the big data sharing layer 4 for cross-platform sharing and exchange. The data security layer 3 establishes a life cycle-based exchange and sharing mechanism for sensitive information: once the life cycle of sensitive data provided to an external platform has elapsed, the sensitive data becomes invalid or is no longer shared, which ensures that external network service platforms can obtain and use the sensitive information of this platform only temporarily and not repeatedly.
Fig. 4 shows a schematic block diagram of the data security layer 3 according to the invention. The data security layer 3 in the respective business system of the network services of bank, online shopping, medical health and the like comprises the following modules:
a sensitive data extraction module 301, configured to filter out, according to a predetermined filtering rule, the service processing data units containing sensitive information from the service processing data obtained from the intra-platform interface module 201, and to provide these units to a sensitive data separation module 302; the remaining service processing data units, which contain no sensitive information, are provided to the de-personalization data grooming module 304 as common service data. As described above, the service processing data consists of the data units produced by the big data mining analysis algorithms, namely association record tables, cluster set record tables, and original service data to which classification labels have been added. The filtering rule specifies predetermined information types as filtering conditions: any type of information that directly relates to the identity confidentiality, account security or privacy of a user is set as a filtering condition, for example a bank account number, a stored-value account number, an identity card number, a medical insurance card number, a telephone number or an address. The sensitive data extraction module 301 filters the service processing data according to the filtering rule; that is, it determines whether a service processing data unit includes any information type specified in the filtering conditions and, if so, filters out that unit and provides it to the sensitive data separation module 302.
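The filtering step can be illustrated with a short sketch. It assumes, purely for illustration, that each service processing data unit is a dictionary and that the filtering rule is a set of field names; neither representation is specified in the source.

```python
# Assumed filtering rule: field names treated as sensitive information types.
SENSITIVE_FIELDS = {
    "bank_account", "stored_value_account", "id_card",
    "medical_insurance_card", "phone", "address",
}

def contains_sensitive(unit):
    """Return True if the data unit carries any information type named in the rule."""
    return any(field in SENSITIVE_FIELDS for field in unit)

def extract(units):
    """Split units into (units for separation module 302, units for grooming module 304)."""
    sensitive, ordinary = [], []
    for unit in units:
        (sensitive if contains_sensitive(unit) else ordinary).append(unit)
    return sensitive, ordinary

units = [
    {"bank_account": "6222-0000", "balance": 1200, "cluster": "A"},
    {"product": "laptop", "price": 5000, "label": "large"},
]
to_separation, to_grooming = extract(units)
```

Only the first unit is routed to separation here; the second contains none of the listed information types and goes directly to de-personalization grooming.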
The sensitive data separation module 302 separates each service processing data unit containing sensitive information, as filtered out by the sensitive data extraction module 301, into a sensitive data part and a common service data part. It establishes, stores and maintains a mapping relation table, in which it registers the correspondence between the sensitive data part and the common service data part separated from the same service processing data unit. For example, consider a service processing data unit generated from account-opening information in an online banking system, which contains the bank account numbers, identity card numbers, contact telephone numbers, addresses, stored-value balances and withdrawal records of a number of depositors in a certain cluster. The bank account number, identity card number, contact telephone number and address of each user are separated into the sensitive data part, the stored-value balance and withdrawal records are assigned to the common service data part, and the correspondence between each user's sensitive data part and common service data part is registered in the mapping relation table. A sequence number may be assigned to each sensitive data part and each common service data part, and the correspondence between the sequence numbers registered in the mapping relation table. After the mapping has been registered, the common service data part, now a service processing data unit with the sensitive information removed, is provided to the de-personalization data grooming module 304; the sensitive data part is provided to the sensitive data storage management unit 303 for storage and management.
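A minimal sketch of the separation and mapping registration follows. The class name, the field names and the `S`/`C` sequence-number format are hypothetical; the source only states that sequence numbers may be assigned and their correspondence registered.

```python
import itertools

# Assumed filtering rule; the field names are hypothetical.
SENSITIVE_FIELDS = {"bank_account", "id_card", "phone", "address"}

class SensitiveDataSeparator:
    """Illustrative stand-in for the sensitive data separation module 302."""

    def __init__(self):
        self._counter = itertools.count(1)
        # Mapping relation table: sensitive part number -> common part number.
        self.mapping_table = {}

    def separate(self, unit):
        n = next(self._counter)
        sensitive_no, common_no = f"S{n}", f"C{n}"
        sensitive = {k: v for k, v in unit.items() if k in SENSITIVE_FIELDS}
        common = {k: v for k, v in unit.items() if k not in SENSITIVE_FIELDS}
        # Register the correspondence between the two parts of the same unit.
        self.mapping_table[sensitive_no] = common_no
        return (sensitive_no, sensitive), (common_no, common)

separator = SensitiveDataSeparator()
unit = {"bank_account": "6222-0000", "id_card": "ID-0001", "phone": "555-0100",
        "address": "1 Example Road", "balance": 1200, "withdrawals": [100, 250]}
(s_no, sensitive_part), (c_no, common_part) = separator.separate(unit)
```

With the mapping table kept inside the platform, the common part can be shared without revealing which sensitive record it belongs to.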
As shown in fig. 4, the sensitive data storage management unit 303 includes a sensitive data warehouse 303A, a security level determination module 303B, and a life cycle setting module 303C. The sensitive data warehouse 303A receives and stores the sensitive data parts from the sensitive data separation module 302, so that they always remain stored and managed inside the data security layer 3 of the respective network service system. The security level determination module 303B determines the security level of each sensitive data part according to the information type of the sensitive information it contains. For example, if the sensitive data part contains a bank account number or an identity card number, its security level is set to the highest level; if it contains a contact telephone number, its security level is set to a medium level; if it contains address information, its security level may be set to the lowest level. The security level determination module 303B then adds a security level tag to the sensitive data part according to the determined level. The life cycle setting module 303C sets a life cycle for each sensitive data part according to the security level determined by the security level determination module 303B and a predetermined correspondence between security levels and life cycles: the sensitive data part with the highest level is given the shortest life cycle, and the lower the level, the longer the life cycle.
The life cycle is the length of time for which, in each subsequent cross-platform exchange or sharing, the sensitive data can be obtained or used by platforms other than the platform that owns it; once the life cycle has elapsed, the sensitive data can no longer be obtained or used by those other platforms. The life cycle setting module 303C adds a life cycle tag to the sensitive data part, in which the duration of its life cycle is recorded.
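The tagging of a sensitive data part with a security level and a life cycle can be sketched as below. The numeric levels, the durations in seconds, the rule of taking the highest level when several information types are present, and the default for unlisted fields are all assumptions; the source only requires that a higher security level yields a shorter life cycle.

```python
# Assumed security level per information type (3 = highest), following the examples in the text.
LEVEL_BY_FIELD = {"bank_account": 3, "id_card": 3, "phone": 2, "address": 1}

# Assumed life cycle durations in seconds: the higher the level, the shorter the life cycle.
LIFECYCLE_BY_LEVEL = {3: 600, 2: 3600, 1: 86400}

def tag(sensitive_part):
    """Attach a security level tag and a life cycle tag to a sensitive data part."""
    # Assumption: a part holding several information types takes the highest level
    # among them; fields not listed above default to the lowest level.
    level = max((LEVEL_BY_FIELD.get(field, 1) for field in sensitive_part), default=1)
    return {
        "data": sensitive_part,
        "security_level": level,
        "lifecycle_seconds": LIFECYCLE_BY_LEVEL[level],
    }

tagged_contact = tag({"phone": "555-0100", "address": "1 Example Road"})
tagged_identity = tag({"id_card": "ID-0001", "phone": "555-0100"})
```

Here the contact-only part receives the medium level and a one-hour life cycle, while the part containing an identity card number receives the highest level and the shortest life cycle.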
While the sensitive data part is stored and managed by the sensitive data storage management unit 303, the common service data, that is, both the data units that never contained sensitive information and the common service data parts from which the sensitive data separation module 302 has removed the sensitive information, is sent to the de-personalization data grooming module 304 for further de-personalization grooming. De-personalization grooming means replacing the original data that reveals personal information of a user within the common service data with substitute symbols. For example, the common service data may include the user's name, age, date of birth, medical card number, discount card number and the like. Although such information does not belong to the information types of the aforementioned sensitive information, it still reveals real information about the user and is not suitable for being provided directly to other platforms. The de-personalization data grooming module 304 therefore performs a substitution on the personal information in the common service data. Specifically, it first searches the common service data for personal information of users, and then replaces all or part of the original data of that personal information with preset substitute symbols; for example, the user name Zpeng may be replaced with the substitute symbol "ZP".
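One possible substitution rule is sketched below. The source does not specify how substitute symbols are derived (its own example maps a name to initials), so the masking rule here, which keeps the first character and replaces the rest with `*`, is an assumed illustration, as are the field names.

```python
# Assumed names of the fields that carry personal (but not sensitive) information.
PERSONAL_FIELDS = {"name", "age", "birth_date", "medical_card", "discount_card"}

def mask(value, keep=1, symbol="*"):
    """Replace all but the first `keep` characters with a preset substitute symbol."""
    text = str(value)
    return text[:keep] + symbol * (len(text) - keep)

def groom(record):
    """Return a copy of the record with personal fields masked and other fields untouched."""
    return {k: (mask(v) if k in PERSONAL_FIELDS else v) for k, v in record.items()}

groomed = groom({"name": "Zpeng", "age": 34, "balance": 1200})
```

The business fields (here `balance`) pass through unchanged, so the groomed record remains usable for cross-platform analysis.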
The data security layer 3 also has a data sharing interface 305. After de-personalization grooming, the common service data is uploaded through the data sharing interface 305 to the big data sharing layer 4, which stores it in a unified manner and provides a unified means of access.
The big data sharing layer 4 is the cross-platform layer of the invention. It serves network services such as online banking, online shopping and medical information systems, and provides a unified big data storage and sharing system for the big data services of these network service platforms. The common service data processed by the data security layer 3 of each network service and output through the data sharing interface 305 is stored and managed in a unified manner in the big data sharing layer 4, under unified data read-write standards and operation specifications. The big data platform of each network service can use the big data sharing layer 4 to obtain, across platforms, the common service data of the various network services that it needs, and to analyze and apply that data at the big data level.
FIG. 5 is a block diagram of the big data sharing layer. The cross-platform unified big data sharing layer 4 specifically comprises a shared repository 401 and a standardized sharing interface 402. The shared repository 401 receives the common service data from the data sharing interface 305 of each network service platform, such as banking, online shopping and medical health, and stores it in a unified manner. The shared repository 401 also allows the big data mining analysis platform of each network service to obtain, through the standardized sharing interface 402, the common service data of all platforms stored in the repository and to apply it to its own big data services, which facilitates comprehensive cross-platform big data analysis. The standardized sharing interface 402 adopts a unified data read-write access standard and operation specification, so that the big data mining analysis platform of each network service can obtain common service data simply and reliably.
For example, the big data mining analysis platform of a bank can retrieve from the shared repository 401 the customer spending capacity analysis data that the online shopping platform generated by big data mining of customers' large purchase records, together with the user health analysis data that the medical health service system derived from customers' physical examination data over the years. Combining these with the customer's overall balance status, obtained by classifying and analyzing the deposit and withdrawal records of the customer's bank account, the bank can evaluate the customer's medium- and long-term creditworthiness.
For any given network service, the sensitive data generated during its operation is always stored in the sensitive data storage management unit 303 of the data security layer 3 of its own platform. In some cases, however, other network services may need to call that sensitive data for a short time in order to realize their big data functions and applications; this is the cross-platform sharing and exchange requirement addressed below. For example, after determining through big data analysis that certain customers' credit capacity can support an unsecured personal consumption loan, a bank may wish to push advertisements for such loans not only to its own depositors but also to online shopping users. To do so it needs to learn the account information and contact telephone numbers of those users from the online shopping platform, which involves the cross-platform acquisition and use of information that is sensitive for the online shopping platform. To accommodate this requirement, the invention also includes a restrictive exchange interface 5 spanning the data security layer 3 and the big data sharing layer 4, and a temporary shared storage area 6. Both the restrictive exchange interface 5 and the temporary shared storage area 6 allow short-time cross-platform exchange and sharing of sensitive data, based on the life cycle that the data security layer 3 has set for the sensitive data.
Specifically, when the big data service of the bank platform needs to call sensitive data of the online shopping platform, it can send a short-time call request for the sensitive data to the sensitive data storage management unit 303 of the data security layer of the online shopping platform. In response to the short-time call request, the sensitive data storage management unit 303 may exchange and share the sensitive data in one of the following two ways.
In the first way, the sensitive data storage management unit 303 of the data security layer 3 of the online shopping platform provides the requested sensitive data to the bank platform via the restrictive exchange interface 5. The sensitive data is encrypted and is accompanied by a decryption key with a limited validity period, which does not exceed the life cycle of the sensitive data. The bank system can obtain the sensitive data by decrypting it with the decryption key; once the validity period has elapsed, the key is no longer valid and the sensitive data can no longer be decrypted or used.
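The time-limited key can be sketched as follows. This sketch makes two loud assumptions that are not in the source: expiry is enforced by keeping the key on the interface side and checking a clock on every decryption, and the cipher is a toy SHA-256-based XOR stream used only as a placeholder for a real encryption scheme. All class and method names are hypothetical.

```python
import hashlib
import secrets
import time

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher (SHA-256 over key + block offset). Placeholder only, not secure.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

class RestrictiveExchangeInterface:
    """Illustrative stand-in for the restrictive exchange interface 5."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._keys = {}  # key id -> (key bytes, expiry timestamp)

    def provide(self, plaintext: bytes, lifecycle_seconds: float):
        """Encrypt the data and register a key that is valid for the life cycle only."""
        key, key_id = secrets.token_bytes(32), secrets.token_hex(8)
        self._keys[key_id] = (key, self._clock() + lifecycle_seconds)
        return key_id, _keystream_xor(key, plaintext)

    def decrypt(self, key_id: str, ciphertext: bytes) -> bytes:
        """Decrypt while the key is valid; refuse and discard the key after expiry."""
        entry = self._keys.get(key_id)
        if entry is None or self._clock() >= entry[1]:
            self._keys.pop(key_id, None)
            raise PermissionError("decryption key has expired")
        return _keystream_xor(entry[0], ciphertext)

# Usage with a controllable clock so that expiry can be demonstrated.
now = [0.0]
interface = RestrictiveExchangeInterface(clock=lambda: now[0])
key_id, ciphertext = interface.provide(b"phone=555-0100", lifecycle_seconds=600)
recovered = interface.decrypt(key_id, ciphertext)  # within the life cycle
now[0] = 601.0  # any later decrypt call now raises PermissionError
```

A production design would use an authenticated cipher from a vetted cryptography library; the point of the sketch is only that the key's validity period is bounded by the life cycle of the data.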
In the second way, the sensitive data storage management unit 303 of the data security layer 3 of the online shopping platform uploads the sensitive data to the temporary shared storage area 6 located in the big data sharing layer 4, and the bank system obtains the sensitive data from the temporary shared storage area 6 through the standardized sharing interface 402. The temporary shared storage area 6 controls the sharing duration of the sensitive data based on its life cycle: once the sharing duration reaches the upper limit of the life cycle, the temporary shared storage area 6 terminates the sharing, and the bank system can no longer obtain the sensitive data from it. The temporary shared storage area 6 also deletes sensitive data that has reached the upper limit of its life cycle.
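The behaviour of the temporary shared storage area can be sketched as a store whose entries expire. The class and method names are hypothetical, and purging on every read is an assumed design choice; the source only states that sharing ends and the data is deleted when the life cycle is reached.

```python
import time

class TemporarySharedStorage:
    """Illustrative stand-in for the temporary shared storage area 6."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # item id -> (data, expiry timestamp)

    def upload(self, item_id, data, lifecycle_seconds):
        """Store sensitive data together with the end of its sharing period."""
        self._items[item_id] = (data, self._clock() + lifecycle_seconds)

    def purge_expired(self):
        """Delete every item whose life cycle has been reached."""
        now = self._clock()
        for item_id in [i for i, (_, expiry) in self._items.items() if now >= expiry]:
            del self._items[item_id]

    def fetch(self, item_id):
        """Return the data while it is still shared; raise KeyError afterwards."""
        self.purge_expired()
        if item_id not in self._items:
            raise KeyError(f"{item_id} is not shared or has expired")
        return self._items[item_id][0]

# Usage with a controllable clock so that expiry can be demonstrated.
now = [0.0]
storage = TemporarySharedStorage(clock=lambda: now[0])
storage.upload("S1", {"phone": "555-0100"}, lifecycle_seconds=3600)
first_read = storage.fetch("S1")  # within the life cycle
now[0] = 3600.0  # from this moment fetch("S1") raises KeyError and the item is deleted
```

Purging inside `fetch` keeps the sketch free of background threads; a deployed system might instead run the purge on a schedule.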
The invention thus separates a small amount of sensitive information from massive service data; the sensitive information is stored and managed independently by the big data platform of each network service, and its exchange and sharing between platforms is strictly limited. By establishing a life cycle-based exchange and sharing mechanism, the invention ensures that an external platform can obtain and use the sensitive information of this platform only temporarily and not repeatedly. Common service data, in contrast, can be obtained and analyzed by every big data platform through the unified storage and sharing mechanism of the invention, which overcomes the bottleneck of interface-based data exchange in the prior art and enables cross-platform big data applications at massive scale.
The above embodiments only illustrate the invention and are not to be construed as limiting it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, which is defined by the claims.

Claims (7)

1. A big data platform sensitive data secure sharing system is characterized by comprising: the system comprises a data acquisition layer, a data processing layer, a data security layer and a big data sharing layer; the data acquisition layer, the data processing layer and the data security layer are deployed in service systems of various network service platforms; the big data sharing layer is deployed across a plurality of network service platforms and provides a uniform big data storage sharing layer for big data services of each network service platform;
the data acquisition layer is used for acquiring all original service data generated in the operation process of the network service platform and adding the original service data into an original service data warehouse;
the data processing layer is a big data mining analysis platform of the network service platform and is used for carrying out data mining and analysis algorithms in a highly parallel computing mode facing to a data stream formed by original business data and generating business processing data after data primary processing;
the data security layer is embedded in a service system of the network service platform and is used for separating a small amount of sensitive information from massive service processing data and respectively generating sensitive data containing the sensitive information and common service data without the sensitive information; registering a mapping relation between the sensitive data and the common service data that correspond to each other; sensitive data are independently stored and managed in the data security layer, wherein a life cycle is set for the sensitive data, and strictly limited exchange sharing is carried out between network service platforms, wherein the exchange sharing comprises temporary shared storage based on the life cycle duration and limited specific interface exchange; transmitting common service data from the data security layer to the big data sharing layer for cross-platform sharing exchange;
the big data sharing layer is used for uniformly storing the common business data obtained from the data security layers of the plurality of network service platforms and providing uniform data read-write access standards and operation specifications so that each network service platform can obtain the common business data of each network service platform from the big data sharing layer and apply the common business data to big data mining analysis;
the big data platform sensitive data secure sharing system further comprises: the limiting exchange interface is used for responding to a short-time calling request sent by an external network service platform to a data security layer of the network service platform and providing requested sensitive data to the external network service platform; and, according to the lifecycle of the sensitive data, limiting a length of time that the sensitive data provided to the external network platform can be decrypted and applied;
the big data platform sensitive data secure sharing system further comprises: the temporary shared storage area is positioned in the big data sharing layer, responds to a short-time calling request sent by an external network service platform to a data security layer of the network service platform, and uploads requested sensitive data to the temporary shared storage area by the data security layer, and the external network service platform can acquire the sensitive data from the temporary shared storage area; and the temporary shared storage area controls the sharing duration of the sensitive data based on the life cycle of the sensitive data.
2. The big data platform sensitive data secure sharing system according to claim 1, wherein the data collection layer comprises:
the data adaptation interface is used for adapting to the data type, format and export rule of a service system of the network service platform and serving as an output channel of original service data generated by the service system;
the data acquisition module is used for receiving original service data from the data adaptation interface in real time or non-real time;
the data query module is used for actively sending a query message to the data adaptation interface and receiving original service data transmitted by the data adaptation interface in response to the query message;
the data verification module is used for verifying the acquired original service data according to a predefined data verification rule, removing or re-acquiring incomplete or non-rule-conforming original service data through verification, and providing the complete verified original service data meeting the rule requirements for the data processing module;
the data conversion processing module is used for carrying out ETL (extraction, conversion and loading) processing on the original service data which is verified to be qualified by the data verification module and converting the original service data into a standard data format;
and the original service data warehouse is used for storing the original service data in the standard data format after being converted by the data conversion processing module.
3. The big data platform sensitive data secure sharing system according to claim 1, wherein the big data mining analysis platform of the data processing layer comprises:
the intra-platform interface module is used for receiving the upper-layer scheduling of the network service platform service system and transmitting a data mining analysis task downwards; and externally outputting the service processing data processed by the lower layer;
the parallel flow task module is used for receiving a data mining analysis task, setting at least one task flow according to the task and managing the generation, maintenance and killing of each task flow; extracting original service data from an original service data warehouse in a data stream mode for each task stream, and providing the original service data to different sub-modules of a data association analysis module to realize parallel task processing;
the data association analysis module comprises a data association calculation submodule, a data classification calculation submodule and a data clustering calculation submodule; the sub-modules respectively undertake the task flow matched with the algorithm types of the sub-modules, receive the data flow of the original service data corresponding to the task flow, calculate the data flow by using the algorithm of the sub-modules to obtain service processing data, and generate and output the data flow of the service processing data to the interface module in the platform.
4. The big data platform sensitive data secure sharing system according to claim 1, wherein the data security layer comprises:
the sensitive data extraction module is used for filtering out a service processing data unit containing sensitive information from the service processing data according to a preset filtering rule aiming at the service processing data and providing the service processing data unit containing the sensitive information to the sensitive data separation module; the service processing data unit which is filtered and does not contain sensitive information is used as common service data to be provided to the big data sharing layer;
the sensitive data separation module is used for separating the service processing data unit containing sensitive information filtered by the sensitive data extraction module into a sensitive data part and a common service data part, establishing, storing and maintaining a mapping relation table of the sensitive data part and the common service data part, and registering the corresponding relation between the sensitive data part and the common service data part separated by the same service processing data unit in the table; after mapping registration, providing a common service data part to a big data sharing layer, and providing a sensitive data part to a sensitive data storage management unit for storage and management;
the sensitive data storage management unit receives and stores the sensitive data part from the sensitive data separation module and sets the life cycle for the sensitive data part according to the security level of the sensitive data part;
and the data sharing interface uploads the common service data to the big data sharing layer.
5. The big data platform sensitive data secure sharing system according to claim 4, wherein the data security layer further comprises: a de-personalization data grooming module, used for performing de-personalization data grooming on the common service data which does not contain sensitive information or on the common service data part from which the sensitive information has been separated by the sensitive data separation module.
6. The big data platform sensitive data secure sharing system according to claim 1, wherein the big data sharing layer specifically comprises:
the shared storage library is used for receiving common service data from the data security layers of the network service platforms and storing the common service data in a unified manner;
and the standardized sharing interface, which adopts a unified data read-write access standard and operation specification and through which each network service platform obtains the common service data of each platform uniformly stored in the shared repository.
7. A big data platform sensitive data secure sharing method based on a life cycle is characterized by comprising the following steps:
acquiring original business data generated by various network service platforms, and executing big data processing analysis operation on the original business data to generate business processing data;
separating sensitive information data containing sensitive information from common service processing data not containing sensitive information for the service processing data; for a sensitive data part and a common service data part separated from the same service processing data unit, establishing a mapping relation table of the sensitive information data part and the common service processing data part; for the sensitive information data part, defining a security level, and setting a life cycle according to the security level;
the separated sensitive information data are independently stored in each network service platform;
a special safe exchange sharing mechanism is adopted, and based on the life cycle, strictly limited exchange sharing is carried out on sensitive information data among all big data platforms, wherein the strictly limited exchange sharing comprises temporary shared storage and limited specific interface exchange; limiting a length of time during which sensitive data provided to an external network platform is decryptable and applicable in exchange through a particular interface according to the lifecycle of the sensitive data; or, limiting the sharing duration of the temporary sharing storage according to the life cycle of the sensitive data;
and transmitting the common service data to a data sharing layer which spans a plurality of network service platforms, and performing unified storage, sharing and application of the platforms.
CN201710483185.XA 2017-06-22 2017-06-22 Life cycle-based big data platform sensitive data secure sharing system and method Active CN107491472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710483185.XA CN107491472B (en) 2017-06-22 2017-06-22 Life cycle-based big data platform sensitive data secure sharing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710483185.XA CN107491472B (en) 2017-06-22 2017-06-22 Life cycle-based big data platform sensitive data secure sharing system and method

Publications (2)

Publication Number Publication Date
CN107491472A CN107491472A (en) 2017-12-19
CN107491472B CN107491472B (en) 2020-11-13

Family

ID=60643605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710483185.XA Active CN107491472B (en) 2017-06-22 2017-06-22 Life cycle-based big data platform sensitive data secure sharing system and method

Country Status (1)

Country Link
CN (1) CN107491472B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241571B (en) * 2018-11-28 2023-08-01 创新工场(北京)企业管理股份有限公司 Data sharing method, model system and storage medium
CN109636170A (en) * 2018-12-06 2019-04-16 控福(上海)智能科技有限公司 Shared laboratory information digital platform, leasing system and Design of Laboratory Management System
CN109785192A (en) * 2018-12-28 2019-05-21 桂林市鼎耀信息科技有限公司 Tourism intelligent perception system based on Internet of Things
CN111177694B (en) * 2019-12-16 2023-03-17 华为技术有限公司 Method and device for processing data
CN111143421A (en) * 2019-12-26 2020-05-12 杭州数梦工场科技有限公司 Data sharing method and device, electronic equipment and storage medium
CN111143880B (en) * 2019-12-27 2022-06-07 中电长城网际系统应用有限公司 Data processing method and device, electronic equipment and readable medium
CN112257113B (en) * 2020-11-17 2022-03-25 珠海大横琴科技发展有限公司 Safety control method, device, equipment and medium for data resource platform
CN112291278B (en) * 2020-12-29 2021-06-04 中天众达智慧城市科技有限公司 Personal consumption data processing device in urban brain system
CN113127575A (en) * 2021-03-19 2021-07-16 福建省万物智联科技有限公司 Employee data management method, system, device and storage medium
CN117972792B (en) * 2024-03-28 2024-06-07 江苏开博科技有限公司 Method for desensitizing massive user information in bank development environment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103780626A (en) * 2014-01-27 2014-05-07 北京飞流九天科技有限公司 Data sharing method of cloud server and intelligent terminal
CN104378386A (en) * 2014-12-09 2015-02-25 浪潮电子信息产业股份有限公司 Method for cloud data confidentiality protection and access control
US10747895B2 (en) * 2015-09-25 2020-08-18 T-Mobile Usa, Inc. Distribute big data security architecture
CN105553940A (en) * 2015-12-09 2016-05-04 北京中科云集科技有限公司 Safety protection method based on big data processing platform
CN105653981B (en) * 2015-12-31 2018-11-30 中国电子科技网络信息安全有限公司 The sensitive data protection system and method for the data circulation and transaction of big data platform
CN106209821B (en) * 2016-07-07 2017-04-05 广西电网有限责任公司 Information security big data management system based on credible cloud computing
CN106203146B (en) * 2016-08-30 2017-04-26 广东港鑫科技有限公司 Big data safety management system

Also Published As

Publication number Publication date
CN107491472A (en) 2017-12-19

Similar Documents

Publication Publication Date Title
CN107491472B (en) Life cycle-based big data platform sensitive data secure sharing system and method
US10628833B2 (en) Computer architecture incorporating blockchain based immutable audit ledger for compliance with data regulations
CN106779527A (en) Commodity circulation information inquiry system and method based on block chain
US10346620B2 (en) Systems and methods for authentication of access based on multi-data source information
CN106203140A (en) Data circulation method based on data structure, device and terminal
CN111639914A (en) Block chain case information management method and device, electronic equipment and storage medium
CN109658126B (en) Data processing method, device, equipment and storage medium based on product popularization
CN108491267A (en) Method and apparatus for generating information
CN111125042A (en) Method and device for determining risk operation event
US20210349955A1 (en) Systems and methods for real estate data collection, normalization, and visualization
CN110109905A (en) Risk list data generation method, device, equipment and computer storage medium
CN109273086A (en) User health data management system and method
CN112364086A (en) Business visualization method and system based on big data platform
KR20220073899A (en) Method for dividing profit of medical service by sharing medical data employing blockchain
CN110246033A (en) Credit risk monitoring method, device, equipment and storage medium
CN109816517A (en) Based on method for processing business, device, computer equipment and the storage medium for increasing letter
JP4413575B2 (en) Information processing apparatus that supports integrated management of account service information, integrated management method of account service information, program, and recording medium
CN104704521A (en) Multi-factor profile and security fingerprint analysis
CN112699088B (en) Method, system and medium for sharing fraud-related data
CN117709901A (en) Whole-flow control method and system for technological achievements based on blockchain
JP2016184330A (en) Approach support system, Approach support method and Approach support program
CN117151736A (en) Anti-electricity fraud management early warning method and system
CN112750043B (en) Service data pushing method, device and server
CN112966024A (en) Financial wind control data analysis system based on big data
CN112100657A (en) Data processing method based on block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant