CN117390190A - Data processing method, device, electronic equipment and computer readable medium - Google Patents

Info

Publication number
CN117390190A
Authority
CN
China
Prior art keywords
data
synchronization
determining
classification
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311458701.5A
Other languages
Chinese (zh)
Inventor
郝帅卫
陈旭
钱雪梅
秦朋飞
秦波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202311458701.5A
Publication of CN117390190A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing device, electronic equipment and a computer readable medium, and relates to the technical field of computers. The method includes: receiving a data processing request, acquiring a corresponding data source identifier, and determining a synchronization mode based on the data source identifier; synchronizing the data corresponding to the data source identifier according to the synchronization mode to obtain synchronous data; processing the synchronous data through the real-time processing flow of a data warehouse to obtain analysis data, inputting the analysis data into a convergence layer to obtain classification data, and storing the classification data by class; splitting the classification data based on preset detail types to obtain detail type data, acquiring the weight and score corresponding to each detail type datum, determining a final coefficient from the weights and scores, and determining a sensitivity level from the final coefficient; and constructing an attribute portrait and sensitivity for the application dimension according to the sensitivity level, thereby obtaining the corresponding sensitive data assets. This reduces manual effort and limits the influence of subjective judgment.

Description

Data processing method, device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a computer readable medium.
Background
Currently, in the data security field, existing technologies for data asset management lean mainly on the CMDB (configuration management database, used here as an operation and maintenance management platform), and those for data class management lean on metadata management. Neither meets the need to obtain sensitive data assets quickly in day-to-day data security work, so most of the time is spent on manual effort.
Disclosure of Invention
In view of this, the embodiments of the present application provide a data processing method, apparatus, electronic device, and computer readable medium that address the problem that existing approaches in the data security field cannot quickly obtain sensitive data assets in daily work, so that most of the time is spent on manual effort.
To achieve the above object, according to one aspect of the embodiments of the present application, there is provided a data processing method, including:
receiving a data processing request, acquiring a corresponding data source identifier, and determining a synchronization mode based on the data source identifier;
according to the synchronization mode, executing synchronization of the data corresponding to the corresponding data source identifier to obtain synchronous data;
processing the synchronous data through the real-time processing flow of a data warehouse to obtain analysis data, inputting the analysis data into a convergence layer to obtain classification data, and executing classification storage of the classification data;
dividing the classified data based on the preset detail types to obtain detail type data, obtaining weights and scores corresponding to the detail type data, determining final coefficients according to the weights and scores, and determining sensitivity levels according to the final coefficients;
and constructing attribute portraits and sensitivities based on the application dimensions according to the sensitivity level, and further obtaining corresponding sensitive data assets.
Optionally, determining the synchronization manner based on the data source identifier includes:
determining that the synchronization mode is interface synchronization in response to the data source identification corresponding to metadata or data table storage;
determining that the synchronization mode is unidirectional synchronization in response to the data source identifier corresponding to the application data or the data relationship;
in response to the data source identification corresponding to the data table storage or access amount, determining that the synchronization manner is real-time message queue synchronization.
Optionally, constructing the attribute representation and the sensitivity based on the application dimension according to the sensitivity level includes:
and serially connecting the sensitivity level, the classification data, the detail type data, the weight, the score and the final coefficient by using the application entity identity corresponding to the data source identification so as to generate the attribute portrait and the sensitivity of the corresponding application dimension.
Optionally, after obtaining the corresponding sensitive data asset, the method further comprises:
and customizing the sensitive data assets based on the preset scenes to obtain corresponding scene sensitive data assets.
Optionally, determining the final coefficient according to the weight and the score includes:
calculating the product of each weight and each corresponding score;
the products are accumulated to obtain the final coefficient.
Optionally, acquiring weights and scores corresponding to the detail type data includes:
and determining the score of the sensitivity corresponding to each detail type data, and distributing corresponding weight for each detail type data according to each score.
Optionally, obtaining classification data includes:
acquiring service attributes corresponding to the analysis data;
and classifying the analysis data according to the service attribute to obtain classified data.
In addition, the application also provides a data processing device, which comprises:
the receiving unit is configured to receive the data processing request, acquire the corresponding data source identifier and determine the synchronization mode based on the data source identifier;
the synchronization unit is configured to perform synchronization of the data corresponding to the corresponding data source identifier according to the synchronization mode so as to obtain synchronous data;
the classification storage unit is configured to process the synchronous data through the data warehouse in real time to obtain analysis data, input the analysis data into the convergence layer to obtain classification data, and execute classification storage of the classification data;
the sensitivity level determining unit is configured to split the classified data based on the preset detail types to obtain detail type data, obtain weights and scores corresponding to the detail type data, determine final coefficients according to the weights and scores, and determine sensitivity levels according to the final coefficients;
and the data processing unit is configured to construct attribute portraits and sensitivities based on the application dimensions according to the sensitivity level, so as to obtain corresponding sensitive data assets.
Optionally, the receiving unit is further configured to:
determining that the synchronization mode is interface synchronization in response to the data source identification corresponding to metadata or data table storage;
determining that the synchronization mode is unidirectional synchronization in response to the data source identifier corresponding to the application data or the data relationship;
in response to the data source identification corresponding to the data table storage or access amount, determining that the synchronization manner is real-time message queue synchronization.
Optionally, the data processing unit is further configured to:
and serially connecting the sensitivity level, the classification data, the detail type data, the weight, the score and the final coefficient by using the application entity identity corresponding to the data source identification so as to generate the attribute portrait and the sensitivity of the corresponding application dimension.
Optionally, the data processing apparatus further comprises a scene customization unit configured to:
and customizing the sensitive data assets based on the preset scenes to obtain corresponding scene sensitive data assets.
Optionally, the sensitivity level determination unit is further configured to:
calculating the product of each weight and each corresponding score;
the products are accumulated to obtain the final coefficient.
Optionally, the sensitivity level determination unit is further configured to:
and determining the score of the sensitivity corresponding to each detail type data, and distributing corresponding weight for each detail type data according to each score.
Optionally, the classification storage unit is further configured to:
acquiring service attributes corresponding to the analysis data;
and classifying the analysis data according to the service attribute to obtain classified data.
In addition, the application also provides data processing electronic equipment, which comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data processing method as described above.
In addition, the application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements a data processing method as described above.
One embodiment of the above invention has the following advantages or benefits: a data processing request is received, a corresponding data source identifier is acquired, and a synchronization mode is determined based on the data source identifier; the data corresponding to the data source identifier are synchronized according to the synchronization mode to obtain synchronous data; the synchronous data are processed through the real-time processing flow of a data warehouse to obtain analysis data, the analysis data are input into a convergence layer to obtain classification data, and the classification data are stored by class; the classification data are split based on preset detail types to obtain detail type data, the weight and score corresponding to each detail type datum are acquired, a final coefficient is determined from the weights and scores, and a sensitivity level is determined from the final coefficient; and an attribute portrait and sensitivity for the application dimension are constructed according to the sensitivity level, thereby obtaining the corresponding sensitive data assets. Sensitive data assets are thus mined and extracted efficiently in daily work, manual effort is reduced, and excessive influence of subjective judgment is avoided.
Further effects of the optional, non-conventional implementations described above are explained below together with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as unduly limiting the present application. Wherein:
FIG. 1 is a schematic diagram of the main flow of a data processing method provided according to one embodiment of the present application;
FIG. 2 is a schematic diagram of the main flow of a data processing method provided according to one embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing method provided according to one embodiment of the present application;
FIG. 4 is a schematic diagram of a data acquisition flow of a data processing method according to one embodiment of the present application;
FIG. 5 is a schematic diagram of an aggregate computation flow of a data processing method according to one embodiment of the present application;
FIG. 6 is a schematic diagram of the main units of a data processing apparatus according to an embodiment of the present application;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 8 is a schematic diagram of a computer system suitable for use in implementing the terminal device or server of the embodiments of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings. They include various details of the embodiments to facilitate understanding and should be considered merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, in the technical solution of the present disclosure, the collection, updating, analysis, processing, use, transmission, and storage of users' personal information all comply with the relevant laws and regulations, serve lawful purposes, and do not violate public order and good morals. Necessary measures are taken to protect users' personal information, prevent unlawful access to it, and safeguard users' personal information security, network security, and national security.
Fig. 1 is a schematic diagram of main flow of a data processing method according to an embodiment of the present application, and as shown in fig. 1, the data processing method includes:
Step S101, receiving a data processing request, acquiring a corresponding data source identifier, and determining a synchronization mode based on the data source identifier.
In this embodiment, the execution body of the data processing method (for example, a server) may receive the data processing request over a wired or wireless connection. The data processing request may be, for example, a request to quickly obtain sensitive data assets. After receiving the request, the execution body acquires the corresponding data source identifier, which characterizes the source from which the data are acquired. As shown in fig. 3, the data sources may include metadata, application data, data table storage, data relationships, and access amount (host/object). The synchronization modes may include interface synchronization, unidirectional synchronization, and real-time message queue synchronization, and a different mode may be selected depending on the data source. As shown in fig. 4, interface (API) synchronization may be used when the data source is metadata or data table storage; unidirectional synchronization (SYNC) may be used when the data source is application data or data relationships; and real-time message queue (TOPIC) synchronization may be used when the data source is data table storage or access amount (host/object).
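The selection rule above can be sketched as a lookup table. This is a minimal Python illustration under stated assumptions, not the implementation of the application: the identifier strings, the `SyncMode` enum, and `determine_sync_modes` are hypothetical names. Because the text lists data table storage under both interface synchronization and real-time message queue synchronization, the sketch returns every candidate mode rather than choosing one.

```python
from enum import Enum


class SyncMode(Enum):
    INTERFACE = "API"          # interface synchronization
    UNIDIRECTIONAL = "SYNC"    # unidirectional synchronization
    MESSAGE_QUEUE = "TOPIC"    # real-time message queue synchronization


# Hypothetical identifier strings; the text names the sources but not their encoding.
# Data table storage appears under two modes in the text, so it maps to both.
SYNC_MODES = {
    "metadata": [SyncMode.INTERFACE],
    "application_data": [SyncMode.UNIDIRECTIONAL],
    "data_relationship": [SyncMode.UNIDIRECTIONAL],
    "table_storage": [SyncMode.INTERFACE, SyncMode.MESSAGE_QUEUE],
    "access_amount": [SyncMode.MESSAGE_QUEUE],
}


def determine_sync_modes(data_source_id):
    """Return the candidate synchronization modes for a data source identifier."""
    try:
        return SYNC_MODES[data_source_id]
    except KeyError:
        raise ValueError(f"unknown data source identifier: {data_source_id!r}") from None
```

A caller that needs exactly one mode would have to add its own tie-breaking rule for data table storage; the text does not specify one.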
Step S102, synchronization of the data corresponding to the corresponding data source identifiers is executed according to the synchronization mode, so that synchronous data are obtained.
The corresponding source data are synchronized using the determined synchronization mode to obtain updated synchronous data.
Step S103, processing the synchronous data through the real-time processing flow of a data warehouse to obtain analysis data, inputting the analysis data into a convergence layer to obtain classification data, and executing classification storage of the classification data.
As shown in fig. 4, the synchronous data pass through the ETL processing flow to reach the convergence layer, where they are aggregated and then stored by class. Specifically, the acquisition process of fig. 4 runs on the server. First, the type of the data source is determined (metadata, application data, data relationships, or access amount), and the corresponding synchronization mode is selected from API interface synchronization, SYNC unidirectional synchronization, and TOPIC real-time message queue synchronization. The synchronized data then go through the ETL real-time processing flow, in which data assets from different sources are parsed to form the convergence layer, and the convergence layer stores them by class according to the results.
Step S104, splitting the classified data based on the preset detail types to obtain detail type data, obtaining the weight and the score corresponding to each detail type data, determining a final coefficient according to the weight and the score, and determining the sensitivity level according to the final coefficient.
Based on the result of the classified storage, data assets are obtained for the use object, data level, data magnitude, user group, data type, and service mode, as shown in fig. 5. Further, as shown in fig. 5, the execution body may split each of these into detail types. The use object is divided into external personnel and internal personnel. The data level is divided into L1-L4 (the level definitions that data security applies to data). The data magnitude is divided into 5 tiers (more than one million, one hundred thousand to one million, and so on down to 0 to one thousand). The user group is divided into toC, toB, and toE (toC refers to business that serves individuals, i.e., end users; toB refers to business that mainly serves enterprises, i.e., clients; toE refers to internal enterprise business systems, whose users are internal personnel). The data type is divided into order, waybill, and human resource. The service mode is divided into REST API (an application programming interface that follows the REST architectural specification), remote procedure call (Remote Procedure Call, RPC), SOCKET (a socket is one endpoint of a communication on a network and provides a mechanism by which application-layer processes exchange data using network protocols), and timing task. Each classification datum (use object, data level, data magnitude, user group, data type, and service mode) is thus split into detail type data according to its detail types.
Each detail type datum obtained by splitting is assigned a weight ratio and a specific score. The weight ratio is multiplied by the score, and the products are combined (for example, accumulated) to obtain a final coefficient, which is then mapped to a sensitivity level: high, medium, or low. All of the data are then linked through the application entity identity to construct an attribute portrait and sensitivity for the application dimension, and finally a sensitive data asset list is generated. When extracting sensitive data assets, the execution body can obtain the corresponding data assets from this list; the list can also be customized for different scenes, so that sensitive data asset calculations along different dimensions are supported.
As shown in fig. 5, take the use object classification data as an example: the corresponding preset detail types may be external personnel and internal personnel, the detail type data obtained by splitting may be external personnel data and internal personnel data, and the preset weights and scores of the external personnel data and the internal personnel data are acquired. The final coefficient is obtained from the products of the weights and the corresponding scores. The preset detail types corresponding to the data level may be levels L1-L4. The preset detail types corresponding to the data magnitude may be the magnitude tiers, from more than one million down to 0 to one thousand. The preset detail types corresponding to the user group may be toC, toB, and toE. The preset detail types corresponding to the data type may be orders, waybills, and financing. The preset detail types corresponding to the service mode may be REST API, RPC, SOCKET, and timing tasks. The final coefficient is calculated in a similar way for the other classification data, such as the data level, data magnitude, user group, data type, and service mode, and the details are not repeated here.
Specifically, determining the final coefficient according to the weights and the scores includes: calculating the product of each weight and its corresponding score; and accumulating the products to obtain the final coefficient.
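The weighted accumulation can be illustrated with a short Python sketch. It assumes that the (weight, score) pairs are the detail type data attached to one application; the example values and the two level thresholds are hypothetical, since the text gives neither concrete weights and scores nor the cut-off points between high, medium, and low.

```python
def final_coefficient(pairs):
    """Accumulate weight * score over (weight, score) pairs."""
    return sum(weight * score for weight, score in pairs)


def sensitivity_level(coefficient, medium_from=40.0, high_from=70.0):
    """Map a final coefficient to a level; both thresholds are assumed values."""
    if coefficient >= high_from:
        return "high"
    if coefficient >= medium_from:
        return "medium"
    return "low"


# Hypothetical (weight, score) pairs, one per detail type datum of an application.
pairs = [(0.5, 80), (0.25, 60), (0.25, 40)]
coefficient = final_coefficient(pairs)  # 0.5*80 + 0.25*60 + 0.25*40
level = sensitivity_level(coefficient)
```

With these example values the coefficient is 65.0, which the assumed thresholds map to the medium level.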
Step S105, constructing attribute portraits and sensitivities based on the application dimension according to the sensitivity level, and further obtaining the corresponding sensitive data assets.
Specifically, constructing attribute portraits and sensitivities based on application dimensions according to sensitivity levels, including: and serially connecting the sensitivity level, the classification data, the detail type data, the weight, the score and the final coefficient by using the application entity identity corresponding to the data source identification so as to generate the attribute portrait and the sensitivity of the corresponding application dimension.
An application entity is the only window through which an application process uses the OSI communication functions. It conveys the requirements of the application process according to the communication protocol (application protocol) agreed between application entities, and transfers application protocol control information between systems as the application entities require; some functions may be implemented by the presentation layer and the layers below it. An application entity consists of one user element and a set of application service elements. Application service elements fall into two categories: the common application service element (CASE) and the specific application service element (SASE). A specific application service element provides services that meet the particular requirements of a specific application. A common application service element is the part shared by specific application services, or it provides services common to all the other service elements. The user element is the processing unit inside the application entity for those application service elements that the application process needs for its communication. All data corresponding to the same application entity identity, such as the sensitivity level, classification data, detail type data, weights, scores, and final coefficient, are linked together, and finally a sensitive data asset list is generated.
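Linking the data by application entity identity amounts to a group-by on that identity. The following Python sketch shows one way to do this under stated assumptions: the record fields (`app_id`, `classification`, `detail_type`, `weight`, `score`), the thresholds, and the sample records are all hypothetical and only illustrate the shape of the resulting sensitive data asset list.

```python
from collections import defaultdict


def build_asset_list(records, medium_from=40.0, high_from=70.0):
    """Link records that share an application entity identity into one entry each."""
    by_app = defaultdict(list)
    for record in records:
        by_app[record["app_id"]].append(record)

    assets = []
    for app_id in sorted(by_app):
        rows = by_app[app_id]
        coefficient = sum(r["weight"] * r["score"] for r in rows)
        if coefficient >= high_from:      # thresholds are assumed values
            level = "high"
        elif coefficient >= medium_from:
            level = "medium"
        else:
            level = "low"
        assets.append({
            "app_id": app_id,
            # Attribute portrait: classification -> detail type for this application.
            "portrait": {r["classification"]: r["detail_type"] for r in rows},
            "final_coefficient": coefficient,
            "sensitivity_level": level,
        })
    return assets


records = [
    {"app_id": "app-a", "classification": "user_group", "detail_type": "toC",
     "weight": 0.5, "score": 100},
    {"app_id": "app-a", "classification": "data_level", "detail_type": "L4",
     "weight": 0.5, "score": 80},
    {"app_id": "app-b", "classification": "user_group", "detail_type": "toE",
     "weight": 0.5, "score": 20},
]
assets = build_asset_list(records)
```

Each entry carries the portrait, the final coefficient, and the level together, so later steps can filter the list without going back to the per-dimension records.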
Specifically, after obtaining the corresponding sensitive data asset, the method further comprises: and customizing the sensitive data assets based on the preset scenes to obtain corresponding scene sensitive data assets.
When extracting sensitive data assets, the execution body can obtain the corresponding data assets from the list; the list can also be customized for different scenes, so that sensitive data asset calculations along different dimensions are supported.
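The text does not say how a preset scene is expressed. One simple reading, sketched below in Python, treats each scene as a named predicate over the entries of the sensitive data asset list; the scene names, the entry fields, and `customize_for_scene` are hypothetical.

```python
def customize_for_scene(assets, scene_filters, scene):
    """Return the assets selected by the named scene's predicate."""
    try:
        keep = scene_filters[scene]
    except KeyError:
        raise ValueError(f"unknown scene: {scene!r}") from None
    return [asset for asset in assets if keep(asset)]


# Hypothetical scenes, each expressed as a predicate over one asset entry.
SCENE_FILTERS = {
    "high_sensitivity_review": lambda a: a["sensitivity_level"] == "high",
    "external_facing": lambda a: a["portrait"].get("user_group") == "toC",
}

asset_list = [
    {"app_id": "app-a", "sensitivity_level": "high", "portrait": {"user_group": "toC"}},
    {"app_id": "app-b", "sensitivity_level": "low", "portrait": {"user_group": "toE"}},
    {"app_id": "app-c", "sensitivity_level": "high", "portrait": {"user_group": "toB"}},
]
review = customize_for_scene(asset_list, SCENE_FILTERS, "high_sensitivity_review")
external = customize_for_scene(asset_list, SCENE_FILTERS, "external_facing")
```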
According to this embodiment, a data processing request is received, a corresponding data source identifier is acquired, and a synchronization mode is determined based on the data source identifier; the data corresponding to the data source identifier are synchronized according to the synchronization mode to obtain synchronous data; the synchronous data are processed through the real-time processing flow of a data warehouse to obtain analysis data, the analysis data are input into a convergence layer to obtain classification data, and the classification data are stored by class; the classification data are split based on preset detail types to obtain detail type data, the weight and score corresponding to each detail type datum are acquired, a final coefficient is determined from the weights and scores, and a sensitivity level is determined from the final coefficient; and an attribute portrait and sensitivity for the application dimension are constructed according to the sensitivity level, thereby obtaining the corresponding sensitive data assets. Sensitive data assets are thus mined and extracted efficiently in daily work, manual effort is reduced, and excessive influence of subjective judgment is avoided.
Fig. 2 is a main flow diagram of a data processing method according to an embodiment of the present application, and as shown in fig. 2, the data processing method includes:
Step S201, receiving a data processing request, and obtaining a corresponding data source identifier.
In step S202, in response to the data source identifier corresponding to the metadata or the data table storage amount, it is determined that the synchronization manner is interface synchronization.
Interface synchronization, such as the API synchronization in fig. 4, synchronizes data by calling an interface. It is used when the data come from metadata or the data table storage amount.
In step S203, in response to the data source identifier corresponding to the application data or the data relationship, it is determined that the synchronization manner is unidirectional synchronization.
Unidirectional synchronization is shown as SYNC in fig. 4. For example, when server A and server B are synchronized, only changes made on A need to be synchronized to B, and changes made on B should not be synchronized back to A; this can be achieved simply by setting the folder type on B to receive-only. When the data source is application data or a data relationship, unidirectional synchronization is adopted so that data synchronization meets this specific requirement.
In step S204, in response to the data source identifier corresponding to the data table storage amount or the access amount, it is determined that the synchronization manner is real-time message queue synchronization.
Real-time message queue synchronization, such as TOPIC in FIG. 4, places the data to be synchronized into a message queue so that message consumers can fetch and consume it in real time. When the data source is data table storage amount or access amount, real-time message queue synchronization is adopted to ensure the timeliness of data synchronization.
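As a non-limiting illustration, steps S202 to S204 can be sketched in Python as a lookup from data source identifier to synchronization mode. The identifier strings, the SyncMode enum, and the function name determine_sync_modes are assumptions made for this sketch and do not appear in the application. Because the description lists data table storage amount under both step S202 and step S204, the sketch returns every candidate mode for that source rather than choosing one.

```python
from enum import Enum


class SyncMode(Enum):
    INTERFACE = "interface"            # "API" in FIG. 4
    UNIDIRECTIONAL = "unidirectional"  # "SYNC" in FIG. 4
    MESSAGE_QUEUE = "message_queue"    # "TOPIC" in FIG. 4


# Hypothetical identifiers for the five data sources named in the description.
SYNC_MODES = {
    "metadata": [SyncMode.INTERFACE],
    "application_data": [SyncMode.UNIDIRECTIONAL],
    "data_relationship": [SyncMode.UNIDIRECTIONAL],
    # Listed under both S202 and S204, so both candidates are kept.
    "table_storage": [SyncMode.INTERFACE, SyncMode.MESSAGE_QUEUE],
    "access_amount": [SyncMode.MESSAGE_QUEUE],
}


def determine_sync_modes(data_source_id: str) -> list[SyncMode]:
    """Return the candidate synchronization modes for a data source identifier."""
    if data_source_id not in SYNC_MODES:
        raise ValueError(f"unknown data source identifier: {data_source_id!r}")
    return list(SYNC_MODES[data_source_id])
```

How a single mode is selected when a source maps to two candidates is not stated in the description; a caller of this sketch would need its own rule for that case.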
Step S205, synchronization of the data corresponding to the corresponding data source identifier is performed according to the synchronization mode, so as to obtain the synchronization data.
Step S206, the synchronous data is processed in real time through the data warehouse to obtain analysis data, the analysis data is input into the convergence layer to obtain classification data, and classification storage of the classification data is executed.
The real-time processing flow of the data warehouse is, for example, the ETL processing flow shown in fig. 4. ETL is an abbreviation for Extract-Transform-Load and describes the process of extracting data from a source, transforming it, and loading it into a destination. Passing the synchronous data through the real-time processing flow of the data warehouse extracts, transforms, and loads it into analysis data in a preset form. Convergence layer: the part between the access layer and the core layer, that is, the network equipment connecting the access layer and the core layer; it provides data aggregation, transmission, management, and distribution for the access layer, and provides policy-based connectivity such as address merging, protocol filtering, routing services, and authentication management. The convergence layer can cluster the input analysis data according to a preset clustering algorithm to obtain clusters, and then output each cluster for classified storage.
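The extract, transform, and load stages described above can be sketched as follows. This is a minimal illustration only: it assumes the synchronous data arrives as JSON lines, and the field names (app, table, rows), the function names, and the in-memory list used as the destination are all hypothetical.

```python
import json
from typing import Iterable, Iterator


def extract(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Extract: parse each non-empty synchronized line as a JSON object."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Transform: normalize each record into a preset form (hypothetical fields)."""
    for rec in records:
        yield {
            "app_id": str(rec["app"]).lower(),
            "table": rec.get("table", ""),
            "rows": int(rec.get("rows", 0)),
        }


def load(records: Iterable[dict], sink: list) -> int:
    """Load: append the analysis data to the destination and return the count."""
    count = 0
    for rec in records:
        sink.append(rec)
        count += 1
    return count


def run_etl(raw_lines: Iterable[str], sink: list) -> int:
    """Chain the three stages lazily so records stream through one at a time."""
    return load(transform(extract(raw_lines)), sink)
```

The stages are written as generators so that each record flows through all three steps before the next one is read, which loosely mirrors a streaming flow rather than a batch job.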
Specifically, obtaining the classification data includes: acquiring service attributes corresponding to the analysis data; and classifying the analysis data according to the service attributes to obtain the classification data.
The service attributes may be, for example, use objects, user groups, service modes, data levels, data magnitudes, data types, and the like. The analysis data is classified by clustering on its service attributes, and each resulting cluster is one piece of classification data.
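As a simple illustration of classifying analysis data by service attributes, the sketch below groups records whose chosen attribute values match exactly. Exact-match grouping is a stand-in chosen for this example; the application refers only to a preset clustering algorithm without naming one. The function name and the attribute keys used in the usage note are hypothetical.

```python
from collections import defaultdict


def classify_by_attributes(records, attribute_names):
    """Group records by the tuple of their values for the given service attributes."""
    groups = defaultdict(list)
    for rec in records:
        # Missing attributes become None so that incomplete records still get a class.
        key = tuple(rec.get(name) for name in attribute_names)
        groups[key].append(rec)
    return dict(groups)
```

For example, calling classify_by_attributes(records, ["user_group", "data_level"]) would place every record with the same user group and data level into one class, which can then be stored separately.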
Step S207, splitting the classified data based on the preset detail types to obtain detail type data, obtaining weights and scores corresponding to the detail type data, determining final coefficients according to the weights and scores, and determining sensitivity levels according to the final coefficients.
Specifically, acquiring the weights and scores corresponding to the detail type data includes: determining the sensitivity score corresponding to each detail type data, and assigning a corresponding weight to each detail type data according to its score.
For example, the executing body may determine the number of detail type data corresponding to each classification data and obtain the sensitivity score set by the user for each classification data. Taking one classification data as an example, the executing body may determine the sensitivity score of each corresponding detail type data from the user-set sensitivity score of the classification data and the number of its detail type data: the sensitivity score of the classification data may be divided evenly among its detail type data, or may be allocated according to the importance of each detail type data, where the importance may be specified by the user. A corresponding weight is then assigned to each detail type data according to its sensitivity score, so that the sensitivity level of each detail type data is located more accurately.
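The two allocation options described above (even division, or division by user-specified importance) can be sketched as follows. The function names are hypothetical, and making each weight proportional to its score is an assumption of this sketch; the application says only that weights are assigned according to the scores.

```python
def allocate_scores(class_score, detail_types, importance=None):
    """Split a classification's sensitivity score across its detail types.

    With no importance given, the score is divided evenly; otherwise it is
    divided in proportion to the user-specified importance values.
    """
    if not detail_types:
        raise ValueError("at least one detail type is required")
    if importance is None:
        importance = {name: 1.0 for name in detail_types}
    total = sum(importance[name] for name in detail_types)
    if total <= 0:
        raise ValueError("importance values must sum to a positive number")
    return {name: class_score * importance[name] / total for name in detail_types}


def assign_weights(scores):
    """Assumed rule: each weight is the detail type's share of the total score."""
    total = sum(scores.values())
    if total == 0:
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}
```

With a classification score of 90 and three detail types, even division gives each detail type a score of 30; supplying importance values of 1 and 2 for two detail types gives 30 and 60 instead.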
Step S208, constructing attribute portraits and sensitivities based on application dimensions according to the sensitivity level, and further obtaining corresponding sensitive data assets.
According to the embodiment of the application, sensitive data assets can be mined and extracted efficiently in daily work, which reduces manual effort and avoids excessive interference from subjective judgment.
Fig. 3 is an application scenario diagram of a data processing method according to an embodiment of the present application. The data processing method of the embodiment of the application can be applied to scenarios in which sensitive data assets need to be acquired quickly. As shown in the system schematic of fig. 3, a four-layer structure is used. The collection layer (data source layer) connects to and collects raw data from multiple teams within the enterprise, such as business, operation and maintenance, and traffic teams; the data sources may be metadata, application data, data table storage amount, data relationships, and access amount (host/object). The identification layer cleans and aggregates the raw data using classification, algorithm capabilities, and a service label system to form the relevant service attributes, such as use object, user group, service mode, data level, data magnitude, and data type. The calculation layer assigns percentiles, assembles the convergence layer, marks the sensitivity of the converged results, and calculates and summarizes each type of service attribute; the larger the score, the higher the sensitivity. The application layer collects the percentile calculation results and outputs a list of sensitive data assets, such as sensitive applications, sensitive user groups, and sensitive exports. In this way, sensitive data assets can be mined and extracted efficiently in daily work, which reduces manual effort and avoids excessive interference from subjective judgment. It also addresses the problem that, in data security, sensitive data assets lack a calculation basis and are identified by subjective judgment, by replacing that judgment with a systematic calculation.
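One way to picture the percentile step of the calculation layer and the list output of the application layer is the sketch below. It is an assumption-laden illustration: the application does not define how percentiles are computed, so the percentile rank used here (share of assets with a coefficient less than or equal to the asset's own), the 75.0 cutoff, and all names are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(coefficients):
    """Map each asset to the percentage of assets whose coefficient is <= its own."""
    ordered = sorted(coefficients.values())
    n = len(ordered)
    return {
        name: 100.0 * bisect_right(ordered, value) / n
        for name, value in coefficients.items()
    }


def sensitive_asset_list(coefficients, cutoff=75.0):
    """Return assets at or above the cutoff percentile, most sensitive first."""
    ranks = percentile_ranks(coefficients)
    selected = [name for name, rank in ranks.items() if rank >= cutoff]
    return sorted(selected, key=lambda name: coefficients[name], reverse=True)
```

Ranking by percentile rather than by raw score keeps the output list proportionate to the asset population, which is consistent with the statement that a larger score means higher sensitivity.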
Fig. 6 is a schematic diagram of main units of a data processing apparatus according to an embodiment of the present application. As shown in fig. 6, the data processing apparatus 600 includes a receiving unit 601, a synchronizing unit 602, a classification storage unit 603, a sensitivity level determining unit 604, and a data processing unit 605.
The receiving unit 601 is configured to receive a data processing request, obtain a corresponding data source identifier, and determine a synchronization manner based on the data source identifier.
The synchronization unit 602 is configured to perform synchronization of the data corresponding to the corresponding data source identifier according to the synchronization mode, so as to obtain synchronous data.
The classification storage unit 603 is configured to pass the synchronous data through the real-time processing flow of the data warehouse to obtain analysis data, input the analysis data into the convergence layer to obtain classification data, and perform classification storage of the classification data.
The sensitivity level determining unit 604 is configured to split the classified data based on the preset detail types to obtain detail type data, obtain weights and scores corresponding to the detail type data, determine final coefficients according to the weights and scores, and determine the sensitivity level according to the final coefficients.
The data processing unit 605 is configured to construct attribute portraits and sensitivities based on application dimensions according to the sensitivity level, resulting in corresponding sensitive data assets.
In some embodiments, the receiving unit 601 is further configured to: determine that the synchronization mode is interface synchronization in response to the data source identifier corresponding to metadata or data table storage amount; determine that the synchronization mode is unidirectional synchronization in response to the data source identifier corresponding to application data or a data relationship; and determine that the synchronization mode is real-time message queue synchronization in response to the data source identifier corresponding to data table storage amount or access amount.
In some embodiments, the data processing unit 605 is further configured to: link the sensitivity level, the classification data, the detail type data, the weight, the score, and the final coefficient together by using the application entity identity corresponding to the data source identifier, so as to generate the attribute portrait and the sensitivity of the corresponding application dimension.
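The linking step can be pictured as grouping per-detail-type records under a shared application entity identity. In the sketch below, the record keys (app_entity_id, final_coefficient, and so on) and the function name are hypothetical, and taking the maximum final coefficient as the application's sensitivity is an assumption of this example, since the application does not state how the application-level sensitivity is aggregated.

```python
from collections import defaultdict


def build_attribute_portraits(records):
    """Group records by application entity identity into one portrait per application."""
    grouped = defaultdict(list)
    for rec in records:
        # Keep every field except the join key inside the portrait entry.
        entry = {key: value for key, value in rec.items() if key != "app_entity_id"}
        grouped[rec["app_entity_id"]].append(entry)

    portraits = {}
    for app_id, entries in grouped.items():
        portraits[app_id] = {
            "entries": entries,
            # Assumed aggregation: the application is as sensitive as its most
            # sensitive entry.
            "sensitivity": max(entry["final_coefficient"] for entry in entries),
        }
    return portraits
```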
In some embodiments, the data processing apparatus further comprises a scene customization unit, not shown in fig. 6, configured to: customize the sensitive data assets based on preset scenes to obtain corresponding scene sensitive data assets.
In some embodiments, the sensitivity level determination unit 604 is further configured to: calculate the product of each weight and its corresponding score; and accumulate the products to obtain the final coefficient.
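The final coefficient described here is a weighted sum, that is, the sum over all detail types of weight multiplied by score. A minimal sketch follows; the function names are hypothetical, and the thresholds that map a coefficient to a sensitivity level are invented for illustration, since the application does not give concrete values.

```python
def final_coefficient(weights, scores):
    """Accumulate weight * score over all detail types."""
    if weights.keys() != scores.keys():
        raise ValueError("weights and scores must cover the same detail types")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical thresholds, checked from highest to lowest.
LEVEL_THRESHOLDS = [(80.0, "high"), (50.0, "medium")]


def sensitivity_level(coefficient):
    """Map a final coefficient to a level using the assumed thresholds above."""
    for threshold, level in LEVEL_THRESHOLDS:
        if coefficient >= threshold:
            return level
    return "low"
```

For instance, weights of 0.25 and 0.75 with scores of 40 and 100 give 0.25 × 40 + 0.75 × 100 = 85, which falls in the "high" band under the assumed thresholds.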
In some embodiments, the sensitivity level determination unit 604 is further configured to: determine the sensitivity score corresponding to each detail type data, and assign a corresponding weight to each detail type data according to its score.
In some embodiments, the classification storage unit 603 is further configured to: acquiring service attributes corresponding to the analysis data; and classifying the analysis data according to the service attribute to obtain classified data.
Note that the data processing method and the data processing apparatus of the present application have a corresponding relationship in terms of implementation contents, and therefore, the description is not repeated.
Fig. 7 illustrates an exemplary system architecture 700 in which the data processing methods or data processing apparatus of embodiments of the present application may be applied.
As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 701, 702, 703.
The terminal devices 701, 702, 703 may be various electronic devices having a data processing screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (by way of example only) providing support for data processing requests submitted by users using the terminal devices 701, 702, 703. The background management server can receive the data processing request, acquire the corresponding data source identifier, and determine the synchronization mode based on the data source identifier; perform synchronization of the data corresponding to the data source identifier according to the synchronization mode to obtain synchronous data; pass the synchronous data through the real-time processing flow of a data warehouse to obtain analysis data, input the analysis data into a convergence layer to obtain classification data, and perform classification storage of the classification data; split the classification data based on the preset detail types to obtain detail type data, obtain the weights and scores corresponding to the detail type data, determine the final coefficient according to the weights and scores, and determine the sensitivity level according to the final coefficient; and construct attribute portraits and sensitivities based on the application dimensions according to the sensitivity level, thereby obtaining the corresponding sensitive data assets. Sensitive data assets can thus be mined and extracted efficiently in daily work, which reduces manual effort and avoids excessive interference from subjective judgment.
It should be noted that, the data processing method provided in the embodiments of the present application is generally executed by the server 705, and accordingly, the data processing apparatus is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing the terminal device of an embodiment of the present application. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data required for the operation of the computer system 800 are also stored. The CPU801, ROM802, and RAM803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read from it is installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments disclosed herein include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes a receiving unit, a synchronizing unit, a classification storage unit, a sensitivity level determining unit, and a data processing unit. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs that, when executed by one of the devices, cause the device to receive a data processing request, obtain a corresponding data source identifier, and determine a synchronization manner based on the data source identifier; according to the synchronization mode, executing synchronization of the data corresponding to the corresponding data source identifier to obtain synchronous data; processing the stream of the synchronous data in real time through a data warehouse to obtain analysis data, inputting the analysis data into a convergence layer to obtain classification data, and executing classification storage of the classification data; dividing the classified data based on the preset detail types to obtain detail type data, obtaining weights and scores corresponding to the detail type data, determining final coefficients according to the weights and scores, and determining sensitivity levels according to the final coefficients; and constructing attribute portraits and sensitivities based on the application dimensions according to the sensitivity level, and further obtaining corresponding sensitive data assets.
According to the technical scheme of the embodiment of the application, sensitive data assets can be mined and extracted efficiently in daily work, which reduces manual effort and avoids excessive interference from subjective judgment.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A method of data processing, comprising:
receiving a data processing request, acquiring a corresponding data source identifier, and determining a synchronization mode based on the data source identifier;
according to the synchronization mode, executing synchronization of the data corresponding to the data source identifier to obtain synchronous data;
processing the stream of the synchronous data in real time through a data warehouse to obtain analysis data, inputting the analysis data into a convergence layer to obtain classification data, and executing classification storage of the classification data;
splitting the classified data based on a preset detail type to obtain detail type data, obtaining weight and score corresponding to each detail type data, determining a final coefficient according to the weight and the score, and determining a sensitivity level according to the final coefficient;
and constructing attribute portraits and sensitivities based on application dimensions according to the sensitivity level, and further obtaining corresponding sensitive data assets.
2. The method of claim 1, wherein said determining a synchronization mode based on said data source identifier comprises:
determining that the synchronization mode is interface synchronization in response to the data source identification corresponding to metadata or data table storage amount;
determining that the synchronization mode is unidirectional synchronization in response to the data source identifier corresponding to application data or a data relationship;
and determining that the synchronization mode is real-time message queue synchronization in response to the data source identification corresponding to the data table storage amount or the access amount.
3. The method of claim 1, wherein said constructing attribute portraits and sensitivities based on application dimensions according to said sensitivity level comprises:
and connecting the sensitivity level, the classification data, the detail type data, the weight, the score and the final coefficient in series by using the application entity identity corresponding to the data source identification so as to generate attribute portraits and sensitivities of corresponding application dimensions.
4. The method of claim 1, wherein after said obtaining the corresponding sensitive data asset, the method further comprises:
and customizing the sensitive data assets based on preset scenes to obtain corresponding scene sensitive data assets.
5. The method of claim 1, wherein said determining final coefficients from said weights and said scores comprises:
calculating the product of each weight and each corresponding score;
the products are accumulated to obtain a final coefficient.
6. The method according to claim 1, wherein the obtaining weights and scores corresponding to each detail type data includes:
and determining the score of the sensitivity corresponding to each detail type data, and distributing corresponding weight to each detail type data according to each score.
7. The method of claim 1, wherein the obtaining classification data comprises:
acquiring service attributes corresponding to the analysis data;
and classifying the analysis data according to the service attribute to obtain classification data.
8. A data processing apparatus, comprising:
the receiving unit is configured to receive a data processing request, acquire a corresponding data source identifier and determine a synchronization mode based on the data source identifier;
the synchronization unit is configured to perform synchronization of the data corresponding to the corresponding data source identifier according to the synchronization mode so as to obtain synchronous data;
the classification storage unit is configured to process the synchronous data through a data warehouse in real time to obtain analysis data, input the analysis data into a convergence layer to obtain classification data, and execute classification storage of the classification data;
the sensitivity level determining unit is configured to split the classified data based on a preset detail type to obtain detail type data, obtain weights and scores corresponding to the detail type data, determine a final coefficient according to the weights and the scores, and determine a sensitivity level according to the final coefficient;
and the data processing unit is configured to construct attribute portraits and sensitivities based on application dimensions according to the sensitivity level, so as to obtain corresponding sensitive data assets.
9. The apparatus of claim 8, wherein the receiving unit is further configured to:
determining that the synchronization mode is interface synchronization in response to the data source identification corresponding to metadata or data table storage amount;
determining that the synchronization mode is unidirectional synchronization in response to the data source identifier corresponding to application data or a data relationship;
and determining that the synchronization mode is real-time message queue synchronization in response to the data source identification corresponding to the data table storage amount or the access amount.
10. The apparatus of claim 8, wherein the data processing unit is further configured to:
and connecting the sensitivity level, the classification data, the detail type data, the weight, the score and the final coefficient in series by using the application entity identity corresponding to the data source identification so as to generate attribute portraits and sensitivities of corresponding application dimensions.
11. A data processing electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any of claims 1-7.
12. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202311458701.5A 2023-11-03 2023-11-03 Data processing method, device, electronic equipment and computer readable medium Pending CN117390190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311458701.5A CN117390190A (en) 2023-11-03 2023-11-03 Data processing method, device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311458701.5A CN117390190A (en) 2023-11-03 2023-11-03 Data processing method, device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN117390190A true CN117390190A (en) 2024-01-12

Family

ID=89462924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311458701.5A Pending CN117390190A (en) 2023-11-03 2023-11-03 Data processing method, device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN117390190A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination