CN118536159A - Data management method, data management system, big data platform and storage medium - Google Patents



Publication number
CN118536159A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410675588.4A
Other languages
Chinese (zh)
Inventor
严学俊
尹炬
吴安峻
张海风
李金莉
林玉婷
张梦娇
吴超
陈佳庆
胡杰
肖莹
王腾波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinxiang Semiconductor Technology Co ltd
Original Assignee
Shenzhen Xinxiang Semiconductor Technology Co ltd
Application filed by Shenzhen Xinxiang Semiconductor Technology Co ltd
Priority to CN202410675588.4A
Publication of CN118536159A

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The application relates to a data management method, a data management system, a big data platform and a computer-readable storage medium. The data management method is applied to a first big data platform and comprises the following steps: collecting original data and storing it in a first database, wherein the first database is accessed by the first big data platform; storing the data in the first database that meets preset conditions in a first shared source library; and transmitting the data in the first shared source library, according to a preset sharing rule, to at least one second big data platform with which data is interworked. In this way, the big data platform of the application and the other big data platforms are connected through a data interworking network, data collection and data sharing with the other big data platforms take place through the shared source library, and the other big data platforms cannot access the core database of the big data platform, which achieves data isolation. Data leakage can therefore be avoided when data is collected or shared among a plurality of big data platforms, and the application achieves secure data management.

Description

Data management method, data management system, big data platform and storage medium
Technical Field
The present application relates to the field of data sharing technologies, and in particular, to a data management method, a data management system, a big data platform, and a computer-readable storage medium.
Background
A big data platform can perform cross-platform and cross-node data acquisition among the multiple processing systems connected to it. Depending on its functions (e.g., analysis functions), a big data platform has a database that stores the data those functions require.
In practical applications there is a need to collect data among a plurality of big data platforms. Doing so, however, exposes the databases involved and therefore carries a risk of data leakage. How to avoid data leakage when collecting data among a plurality of big data platforms is thus a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above technical problem, the present application provides a data management method, a data management system, a big data platform, and a computer-readable storage medium, which can avoid data leakage when data is collected or shared among a plurality of big data platforms, thereby achieving secure data management.
To solve the above technical problem, a first aspect of the present application provides a data management method applied to a first big data platform configured with a first database and a first shared source library, including: collecting original data and storing it in the first database, wherein the first database is accessed by the first big data platform; storing the data in the first database that meets preset conditions in the first shared source library; and transmitting the data in the first shared source library, according to a preset sharing rule, to at least one second big data platform with which data is interworked.
Optionally, before the step of transmitting the data in the first shared source library to the at least one interworking second big data platform according to the preset sharing rule, the method includes: establishing a data transmission channel with the at least one second big data platform; and sending the first platform information of the first big data platform to the second big data platform, so that the second big data platform can create a first data acquisition task according to the first platform information.
Optionally, the step of transmitting the data in the first shared source library to the at least one interworking second big data platform according to the preset sharing rule includes: identifying, by a sensitive data identification technology, the sensitive data among the data in the first shared source library that corresponds to the first data acquisition task, together with its sensitivity level; desensitizing the sensitive data according to its sensitivity level; and transmitting the regular data corresponding to the first data acquisition task in the first shared source library, together with the desensitized sensitive data, to the at least one interworking second big data platform.
Optionally, the data management method provided by the application further includes: acquiring second platform information sent by the second big data platform; creating a second data acquisition task according to the second platform information; and receiving and storing the data fed back by the second big data platform based on the second data acquisition task.
Optionally, the step of receiving and storing the data fed back by the second big data platform based on the second data acquisition task includes: connecting a data path to the second big data platform based on the second data acquisition task; receiving, through the data path, the data fed back by the second big data platform while performing network security monitoring; and activating a threat defense mechanism when the detection result corresponds to a network attack.
Optionally, the threat defense mechanism comprises at least one of:
interrupting reception of the data transmitted over the data path;
discarding the data already received for the data acquisition task;
blocking data from unauthorized IP addresses;
generating a threat defense log;
performing a corresponding alert operation based on the detection result.
Optionally, the step of storing the data fed back by the second big data platform includes at least one of:
storing the data fed back by the second big data platform in the first shared source library;
storing the data fed back by the second big data platform in the shared source table, within the first shared source library, that corresponds to the second big data platform;
processing the data fed back by the second big data platform and then storing it in a first destination library of the first big data platform;
processing the data fed back by the second big data platform and then storing it in the destination table, within the first destination library of the first big data platform, that corresponds to the second big data platform.
A second aspect of the present application provides a data management system comprising a first big data platform and at least one second big data platform. The first big data platform is configured with a first database and a first shared source library, and is used for collecting original data and storing it in the first database, and for storing the data in the first database that meets preset conditions in the first shared source library, wherein the first database is accessed by the first big data platform. The second big data platform interworks data with the first big data platform and is used for receiving the data in the first shared source library that the first big data platform transmits according to a preset sharing rule.
A third aspect of the present application provides a big data platform comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, performs the steps of the data management method as described in any of the above.
A fourth aspect of the application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data management method as described in any of the above.
The application provides a data management method, a data management system, a big data platform and a computer-readable storage medium. The data management method is applied to a first big data platform configured with a first database and a first shared source library, and comprises the following steps: collecting original data and storing it in the first database, wherein the first database is accessed by the first big data platform; storing the data in the first database that meets preset conditions in the first shared source library; and transmitting the data in the first shared source library, according to a preset sharing rule, to at least one second big data platform with which data is interworked. In this way, the big data platform of the application and the other big data platforms are connected through a data interworking network, data collection and data sharing with the other big data platforms take place through the shared source library, and the other big data platforms cannot access the core database of the big data platform, which achieves data isolation. Data leakage can therefore be avoided when data is collected or shared among a plurality of big data platforms, and the application achieves secure data management.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a data management method according to a first embodiment of the present application.
Fig. 2 is a schematic diagram of a topology of network interworking between big data platforms according to an example of the present application.
Fig. 3 is a schematic diagram of an application scenario in which a big data platform collects data according to an example of the present application.
Fig. 4 is a schematic diagram of another application scenario in which a big data platform collects data according to an example of the present application.
Fig. 5 is a schematic structural diagram of a big data platform according to a second embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments. Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. In addition, components, features, or elements having the same name in different embodiments of the application may have the same meaning or may have different meanings, the particular meaning being determined by its interpretation in the particular embodiment or by the further context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "when", "upon" or "in response to a determination", depending on the context. Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, steps, operations, elements, components, items, categories, and/or groups, but do not preclude the presence, occurrence or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or operations is in some way inherently mutually exclusive.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited and the steps may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be noted that step numbers such as S1 and S2 are used in this document to describe the corresponding content more clearly and briefly, and do not constitute a substantial limitation on the sequence. Those skilled in the art may execute S2 first and then S1 when implementing the present application, which is within the scope of protection of the present application.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the description of the present application and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
First embodiment
Fig. 1 is a flowchart of a data management method according to a first embodiment of the present application. For a clear description of the data management method provided in the first embodiment of the present application, reference may be made to fig. 1.
The first embodiment of the application provides a data management method applied to a first big data platform configured with a first database and a first shared source library, comprising the following steps:
S1: collecting original data and storing it in the first database, wherein the first database is accessed by the first big data platform.
In one embodiment, the original data may be the raw data generated directly by a device or system at the application end, or such raw data after a simple processing operation.
In an embodiment, the application end may correspond to a specific application scenario, such as a factory, a product line, a warehouse, etc.
S2: storing the data in the first database that meets preset conditions in the first shared source library.
In one embodiment, the data satisfying the preset conditions includes, but is not limited to, at least one of the following:
data whose data type is sharable data;
target data selected by a setting operation;
target data determined based on the acquisition requirement corresponding to a data acquisition task.
In this embodiment, the data to be shared in the first database is stored in the first shared source library. This prevents other big data platforms from directly accessing the first database, which contains core service data, when they acquire data from the first big data platform. The first shared source library serves as a relay: other big data platforms can acquire data from the first big data platform only through the first shared source library.
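Step S2 can be illustrated with a minimal sketch. It assumes, purely for illustration, that both stores are SQLite databases and that the preset condition is a per-row `sharable` flag; the table and column names (`raw_data`, `shared_data`, `payload`, `sharable`) are hypothetical and do not come from the application.

```python
import sqlite3


def publish_to_shared_source(core: sqlite3.Connection,
                             shared: sqlite3.Connection) -> int:
    """Copy rows that meet the preset condition (here: sharable = 1)
    from the core database into the shared source library."""
    rows = core.execute(
        "SELECT id, payload FROM raw_data WHERE sharable = 1"
    ).fetchall()
    shared.executemany(
        "INSERT OR REPLACE INTO shared_data (id, payload) VALUES (?, ?)", rows
    )
    shared.commit()
    return len(rows)


def build_demo():
    """Create two separate in-memory databases with hypothetical sample rows."""
    core = sqlite3.connect(":memory:")      # first database (core data)
    shared = sqlite3.connect(":memory:")    # first shared source library
    core.execute(
        "CREATE TABLE raw_data (id INTEGER PRIMARY KEY, payload TEXT, sharable INTEGER)"
    )
    shared.execute(
        "CREATE TABLE shared_data (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    core.executemany(
        "INSERT INTO raw_data VALUES (?, ?, ?)",
        [(1, "line-3 yield", 1), (2, "customer contract", 0), (3, "warehouse stock", 1)],
    )
    core.commit()
    return core, shared
```

Because the two connections are separate, a peer platform that is only given access to the shared connection has no handle on `raw_data` at all, which is the isolation property the paragraph above describes.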
S3: transmitting the data in the first shared source library, according to a preset sharing rule, to at least one second big data platform with which data is interworked.
In one embodiment, before step S3 of transmitting the data in the first shared source library to the at least one interworking second big data platform according to the preset sharing rule, the method includes, but is not limited to: establishing a data transmission channel with the at least one second big data platform; and sending the first platform information of the first big data platform to the second big data platform, so that the second big data platform can create a first data acquisition task according to the first platform information.
In an embodiment, the first platform information may represent the information that other big data platforms need in order to create, manually or automatically, data acquisition tasks corresponding to the first big data platform. Optionally, the first platform information includes, but is not limited to, at least one of network information, platform database information, interface configuration information, data parsing information, and the like.
In an embodiment, the network information may represent the network-related information used when creating a data acquisition task corresponding to the first big data platform.
In one embodiment, the platform database information may represent the information about the platform database that other big data platforms use to create data acquisition tasks corresponding to the first big data platform, for example, identification information of each database in the platform and data table structure information of the platform database.
In an embodiment, the interface configuration information may represent the interface-related information used when creating a data acquisition task corresponding to the first big data platform.
In an embodiment, the data parsing information may represent the information used to parse or restore the desensitized data output by the first big data platform, such as keys and data dictionaries.
In an embodiment, the first platform information of the first big data platform is sent to the second big data platform so that the second big data platform can create a first data acquisition task according to it. For example, after a data transmission channel has been established with the other big data platforms, the first platform information is automatically broadcast to them, so that they can automatically obtain and store it. The other big data platforms can then create data acquisition tasks for the first big data platform based on the first platform information, and the first big data platform feeds back the data corresponding to each data acquisition task to the requesting platform.
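The broadcast-then-create-task flow described above might be sketched as follows. The class names (`PlatformInfo`, `AcquisitionTask`, `Platform`) and all field names are hypothetical; the sketch models the transmission channel as a direct method call and creates one task per advertised shared table, which is only one possible policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PlatformInfo:
    """Hypothetical container for the platform information."""
    platform_id: str
    host: str                          # network information
    port: int
    shared_tables: Tuple[str, ...]     # platform database information
    interface: str                     # interface configuration information
    data_dictionary: Dict[str, str] = field(default_factory=dict)  # data parsing information


@dataclass
class AcquisitionTask:
    source_platform: str
    endpoint: str
    table: str


class Platform:
    def __init__(self, info: PlatformInfo) -> None:
        self.info = info
        self.known_platforms: Dict[str, PlatformInfo] = {}
        self.tasks: List[AcquisitionTask] = []

    def broadcast(self, peers: List["Platform"]) -> None:
        """Send this platform's own information to every connected peer."""
        for peer in peers:
            peer.receive_info(self.info)

    def receive_info(self, info: PlatformInfo) -> None:
        """Store the peer's information and automatically create one
        acquisition task per advertised shared table."""
        self.known_platforms[info.platform_id] = info
        endpoint = f"{info.interface}://{info.host}:{info.port}"
        for table in info.shared_tables:
            self.tasks.append(AcquisitionTask(info.platform_id, endpoint, table))
```

In this sketch only shared source tables are advertised, so a task created from the broadcast never names a table in the core database.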
In one embodiment, step S3 of transmitting the data in the first shared source library to the at least one interworking second big data platform according to the preset sharing rule includes, but is not limited to: identifying, by a sensitive data identification technology, the sensitive data among the data in the first shared source library that corresponds to the first data acquisition task, together with its sensitivity level; desensitizing the sensitive data according to its sensitivity level; and transmitting the regular data corresponding to the first data acquisition task in the first shared source library, together with the desensitized sensitive data, to the at least one interworking second big data platform. In this way, the embodiment achieves data desensitization, and different sensitivity levels can correspond to different desensitization processes, which supports later data classification or other fine-grained processing. The embodiment also avoids data leakage caused by malicious interception of data in transit, which improves the security of data transmission.
In one embodiment, the step of desensitizing the sensitive data according to its sensitivity level includes: desensitizing the sensitive data according to its sensitivity level and desensitization mapping information, wherein the desensitization mapping information contains the correspondence between sensitivity levels and desensitization methods.
In one embodiment, the desensitization process includes, but is not limited to: encrypting the sensitive data, converting fields of the sensitive data, and the like.
In an embodiment, after the second big data platform receives the sensitive data desensitized by the first big data platform, it can decrypt the encrypted sensitive data with a key in the first platform information to obtain the original data, and/or restore the field-converted sensitive data with a data dictionary in the first platform information to obtain the original data.
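A minimal sketch of level-based desensitization, under stated assumptions: the level names (`"low"`, `"high"`), the sample dictionary and the field names are hypothetical. Field conversion through a data dictionary is reversible, as the text describes. For the second level the sketch uses simple masking as a stand-in for encryption, so that it needs no third-party cryptography library; unlike encryption, masking cannot be undone by the receiver.

```python
from typing import Callable, Dict, Optional


def mask(value: str) -> str:
    """Irreversible masking: keep the first and last character only."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def make_field_converter(dictionary: Dict[str, str]) -> Callable[[str], str]:
    """Reversible field conversion based on a shared data dictionary."""
    def convert(value: str) -> str:
        return dictionary.get(value, value)
    return convert


def restore(value: str, dictionary: Dict[str, str]) -> str:
    """Receiver side: invert the field conversion with the same dictionary."""
    reverse = {code: original for original, code in dictionary.items()}
    return reverse.get(value, value)


# Hypothetical data dictionary, shared as part of the platform information.
DATA_DICTIONARY = {"Supplier-Alpha": "S001", "Supplier-Beta": "S002"}

# Desensitization mapping information: sensitivity level -> method.
LEVEL_TO_HANDLER: Dict[str, Callable[[str], str]] = {
    "low": make_field_converter(DATA_DICTIONARY),
    "high": mask,
}


def desensitize(record: Dict[str, str],
                levels: Dict[str, str]) -> Dict[str, str]:
    """Apply the handler matching each field's sensitivity level;
    fields without a level are regular data and pass through unchanged."""
    result = {}
    for name, value in record.items():
        handler: Optional[Callable[[str], str]] = LEVEL_TO_HANDLER.get(levels.get(name, ""))
        result[name] = handler(value) if handler else value
    return result
```

Keeping the level-to-method table as data rather than as branching code means a new sensitivity level only requires a new table entry.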
In an embodiment, the sensitive data identification technology may be a data classification technology based on a clustering algorithm: a deep neural network structure performs feature learning on the original data, a K-Means clustering algorithm clusters the learned features in a low-dimensional space, and training and learning are iterated continuously, so that a machine can identify sensitive data automatically, making the classification of sensitive data automated and intelligent.
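The clustering stage alone can be sketched in plain Python. This is a textbook K-Means loop, not the application's implementation: it assumes the low-dimensional feature vectors have already been produced (the deep-network feature learning is omitted), and it initializes the centroids with the first `k` points to keep the result deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], k: int,
            iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Cluster feature vectors into k groups.

    Returns the centroids and, for each point, the index of its cluster.
    """
    centroids: List[Point] = [points[i] for i in range(k)]  # deterministic init
    assignment: List[int] = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment
```

Which cluster counts as "sensitive", and at what level, would still have to be decided separately, for example from a few labeled samples per cluster.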
In an embodiment, the data management method provided in this embodiment further includes, but is not limited to: acquiring second platform information sent by the second big data platform; creating a second data acquisition task according to the second platform information; and receiving and storing the data fed back by the second big data platform based on the second data acquisition task. Thus, the first big data platform in this embodiment can also receive the platform information of other big data platforms and create data acquisition tasks to acquire data from them.
In an embodiment, the second platform information includes, but is not limited to, at least one of network information, platform database information, interface configuration information, data parsing information, and the like.
In an embodiment, the network information may represent the network-related information used when creating a data acquisition task corresponding to the second big data platform.
In one embodiment, the platform database information may represent the information about the platform database that other big data platforms use to create data acquisition tasks corresponding to the second big data platform, for example, identification information of each database in the platform and data table structure information of the platform database.
In an embodiment, the interface configuration information may represent the interface-related information used when creating a data acquisition task corresponding to the second big data platform.
In an embodiment, the data parsing information may represent the information used to parse or restore the desensitized data output by the second big data platform, such as keys and data dictionaries.
In an embodiment, a data acquisition task (e.g., the first data acquisition task and/or the second data acquisition task) may represent a task that acquires specified or unspecified data from a target big data platform or from a target database in a target big data platform. A data acquisition task may be created automatically based on platform information and/or a preset data acquisition requirement, or may be created manually. The data acquisition requirement may correspond to the specified data. Optionally, this embodiment creates data acquisition tasks automatically based on the platform information and/or the preset data acquisition requirement, which avoids manual creation, improves data acquisition efficiency, simplifies operation, and reduces maintenance cost.
In one embodiment, the step of receiving and storing the data fed back by the second big data platform based on the second data acquisition task includes, but is not limited to: connecting a data path to the second big data platform based on the second data acquisition task; receiving, through the data path, the data fed back by the second big data platform while performing network security monitoring; and activating a threat defense mechanism when the detection result corresponds to a network attack. Thus, this embodiment has a network attack detection function and can trigger a threat defense mechanism when a network attack is detected, which further ensures data security.
In one embodiment, the threat defense mechanism includes at least one of:
interrupting reception of the data transmitted over the data path;
discarding the data already received for the data acquisition task;
blocking data from unauthorized IP addresses;
generating a threat defense log;
performing a corresponding alert operation based on the detection result.
This embodiment offers several threat defense mechanisms to choose from, which further ensures data security.
In an embodiment, the threat defense log may include, but is not limited to, at least one of the detection result, the threat defense mechanism applied, and the defense result. The threat defense log makes it easy to trace defense records.
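How several of these mechanisms could combine is sketched below. The detection rule is deliberately simplistic and is an assumption of this sketch: a packet from an IP address outside an allow-list is treated as an attack. The `Receiver` class, its fields and the log keys are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Receiver:
    authorized_ips: Set[str]
    stored: List[bytes] = field(default_factory=list)
    defense_log: List[Dict[str, object]] = field(default_factory=list)

    def receive(self, task_id: str,
                packets: Iterable[Tuple[str, bytes]]) -> bool:
        """Receive (source_ip, payload) pairs for one acquisition task.

        On the first packet from an unauthorized IP the receiver stops
        reading (interrupt), drops what it buffered for this task
        (discard) and writes a threat defense log entry.
        """
        buffered: List[bytes] = []
        for source_ip, payload in packets:
            if source_ip not in self.authorized_ips:
                self.defense_log.append({
                    "task": task_id,
                    "detection": f"unauthorized source {source_ip}",
                    "actions": ["interrupt_reception", "discard_task_data"],
                    "result": f"{len(buffered)} buffered packet(s) dropped",
                })
                return False
            buffered.append(payload)
        # Only a fully clean transfer is committed to storage.
        self.stored.extend(buffered)
        return True
```

Buffering until the transfer completes is what makes "discard the data already received for the task" cheap: nothing from a compromised transfer ever reaches storage.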
In one embodiment, performing the corresponding alert operation based on the detection result includes, but is not limited to, at least one of:
sending information corresponding to the detection result to an administrator terminal to notify the administrator;
displaying alarm information on the first big data platform and/or the at least one second big data platform based on the detection result.
Thus, this embodiment can issue alerts, so that the relevant personnel learn of the problem promptly and can carry out inspection and maintenance.
In one embodiment, the step of storing the data fed back by the second big data platform includes at least one of:
storing the data fed back by the second big data platform in the first shared source library;
storing the data fed back by the second big data platform in the shared source table, within the first shared source library, that corresponds to the second big data platform;
processing the data fed back by the second big data platform and then storing it in a first destination library of the first big data platform;
processing the data fed back by the second big data platform and then storing it in the destination table, within the first destination library of the first big data platform, that corresponds to the second big data platform.
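The second storage option, one shared source table per peer platform, might look like the following sketch. It assumes an SQLite shared source library and a hypothetical naming scheme (`shared_source_<platform id>`); the platform id is validated before it is used in a table name because SQL identifiers cannot be passed as bound parameters.

```python
import re
import sqlite3
from typing import Iterable, Tuple


def source_table_name(platform_id: str) -> str:
    """Derive the per-platform table name, rejecting unsafe identifiers."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", platform_id):
        raise ValueError(f"invalid platform id: {platform_id!r}")
    return f"shared_source_{platform_id}"


def store_feedback(shared: sqlite3.Connection, platform_id: str,
                   rows: Iterable[Tuple[int, str]]) -> str:
    """Store data fed back by a peer platform in that peer's own table."""
    table = source_table_name(platform_id)
    shared.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    shared.executemany(
        f"INSERT OR REPLACE INTO {table} (id, payload) VALUES (?, ?)", rows
    )
    shared.commit()
    return table
```

Keeping each peer's data in its own table keeps its origin visible, which helps if data from one platform later has to be quarantined or removed.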
In one embodiment, the destination library (Data Destination Database, in full the data destination library) generally refers to the location where data is stored after it has been collected and processed. During data integration, the path to the destination library may involve multiple stages, including preprocessing, cleansing, and converting the data, after which the data is stored in a particular database, file system, or destination table for further analysis and application. For example, in a Web big data system, selecting an appropriate data source and importing its data into a data warehouse is one instance of using a destination library.
In one embodiment, the source library (Data Source Database, in full the data source library) is mainly concerned with the original source of the data, i.e., how the data was generated and where it came from.
In one embodiment, network security monitoring may be performed by a network security detection system. The network detection system includes an intrusion detection system manager (Intrusion detection SYSTEM MANAGER, idsM) and an intrusion detection system reporter (Intrusion detection system Reporter, idsR).
In one embodiment, the intrusion detection system manager IdsM is responsible for collecting information and may consist of a time collector, a security event filter, a security event encoder, and a security event processor. Optionally, the intrusion detection system manager IdsM may house an intrusion detection system (Intrusion Detection System, IDS), an intrusion prevention system (Intrusion Prevention System, IPS), or an intrusion detection and prevention system (Intrusion Detection and Prevention System, IDPS) that functions to detect potential unauthorized activities and security threats and to block those threats. The IDS, IPS or IDPS includes, but is not limited to, the following systems or combinations of the following systems: network intrusion detection and prevention systems (Network Intrusion Detection and Prevention System, NIDPS), traffic monitoring systems (Traffic Monitor System, TMS), security event log systems (Security Log System, SLS).
In one embodiment, the capability requirements of the network intrusion detection and prevention system NIDPS include, but are not limited to: ① monitoring and analyzing data traffic passing through the network in real time; ② identifying various types of attacks, including known attacks (e.g., DoS attacks, Trojans, viruses, worms, etc.) and unknown attacks (zero-day exploits); ③ providing a behavior analysis function that identifies abnormal or malicious behavior by monitoring the patterns and behavior of network activity.
In one embodiment, the capability requirements of the traffic monitoring system TMS include, but are not limited to: ① supporting network-card-level traffic monitoring; ② supporting VLAN-level traffic monitoring; ③ supporting protocol-level (e.g., TCP, UDP) traffic monitoring; ④ supporting identification of network traffic surges and other anomalies; ⑤ supporting configuration of the traffic reporting frequency.
In one embodiment, the capabilities of the security event log system SLS include, but are not limited to: ① supporting a unified security log read/write interface; ② supporting definition of the log format, log level (high, medium, low) and log type; ③ supporting a configurable security log file size; ④ supporting local circular storage of security log files; ⑤ supporting real-time reporting of security logs and resumption of transmission after a network interruption; ⑥ supporting a security log library, which facilitates unified log management, aggregated reporting and the like for each application.
In one embodiment, network security monitoring may employ network attack detection techniques.
In an embodiment, the network attack detection technique includes, but is not limited to, a combination of one or more of an attack detection bypass technique, a Trojan communication identification technique, an attack detection technique, and the like.
In an embodiment, the attack detection bypass technology is based on machine learning and uses a 3-gram-based algorithm to automatically detect malicious attacks in HTTP traffic: the attack statements in all files of the data are read, the occurrence frequency of each triplet is counted using an interpolation method, and the triplets occurring in each attack type, together with their frequencies, are recorded as the model.
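The triplet-frequency model described above can be illustrated with a minimal sketch. It assumes character-level triplets, made-up training strings, and a fixed floor probability for unseen triplets in place of the interpolation method, whose details the description does not give; the function names are hypothetical.

```python
from collections import Counter
from math import log


def trigrams(text):
    """Return the overlapping character triplets of a string."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def train(samples_by_type):
    """Build {attack_type: {triplet: relative frequency}} from labelled attack strings."""
    model = {}
    for attack_type, samples in samples_by_type.items():
        counts = Counter(t for s in samples for t in trigrams(s.lower()))
        total = sum(counts.values())
        model[attack_type] = {t: c / total for t, c in counts.items()}
    return model


def score(model, attack_type, payload, floor=1e-6):
    """Average log-probability of the payload's triplets under one attack type."""
    grams = trigrams(payload.lower())
    if not grams:
        return float("-inf")
    freqs = model[attack_type]
    # Unseen triplets get a small floor probability (an assumption of this sketch).
    return sum(log(freqs.get(g, floor)) for g in grams) / len(grams)


def classify(model, payload):
    """Return the attack type whose triplet frequencies best match the payload."""
    return max(model, key=lambda attack_type: score(model, attack_type, payload))


# Illustrative training data only; a real model would be built from attack corpora.
training = {
    "sqli": ["' or 1=1 --", "union select password from users"],
    "xss": ["<script>alert(1)</script>", "<img src=x onerror=alert(1)>"],
}
model = train(training)
print(classify(model, "1 union select name from users"))
```

In practice a threshold on the score would also be needed so that benign HTTP traffic is not forced into one of the attack types.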
In one embodiment, the Trojan communication recognition technology is based on deep learning and uses an LSTM (Long Short-Term Memory) network, a neural network model capable of memorizing patterns, to learn character sequences (the patterns of domain names). More than ten million recorded DGA (Domain Generation Algorithm) domain names and normal domain names are used to train, through the LSTM neural network, a model capable of accurately identifying DGA domain names.
In one embodiment, the attack detection technology is based on semantic analysis: the character strings to be analyzed are spliced together and then subjected to lexical and syntactic analysis; if a syntax tree (AST) is successfully parsed and built, the string is regarded as an SQL statement, the AST is traversed, and attack features are extracted according to the functions commonly used in attacking SQL statements.
In an embodiment, a large data platform (e.g., a first large data platform and/or a second large data platform) may characterize a platform with data collection, analysis, and application functionality. Alternatively, the big data platform may be one or a combination of two of a server and a terminal device.
The data management method provided in this embodiment is applied to a first big data platform configured with a first database and a first shared source library, and includes: S1: collecting original data and storing the original data into the first database, wherein the first database is accessed by the first big data platform; S2: storing the data in the first database that meets the preset conditions into the first shared source library; S3: transmitting the data in the first shared source library to at least one second big data platform for data intercommunication according to a preset sharing rule. In this embodiment, a data intercommunication network is therefore established between the big data platform and other big data platforms, data acquisition and data sharing with the other big data platforms are performed through the shared source library, and the other big data platforms cannot access the core database of the big data platform, which realizes data isolation. Data leakage can thus be avoided when data acquisition or data sharing is performed among multiple big data platforms, and this embodiment can achieve the aim of secure data management.
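Steps S1 to S3 can be sketched as follows. This is a minimal in-memory illustration, not the claimed implementation: the class name, the row fields, and the use of Python callables for the preset condition and the preset sharing rule are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BigDataPlatform:
    name: str
    first_database: list = field(default_factory=list)         # core database, never exposed
    shared_source_library: list = field(default_factory=list)  # the only store peers read from
    received: list = field(default_factory=list)                # data obtained from peers

    def collect(self, records):
        """S1: store collected original data in the first database."""
        self.first_database.extend(records)

    def publish(self, condition):
        """S2: copy rows meeting the preset condition into the shared source library."""
        self.shared_source_library = [dict(r) for r in self.first_database if condition(r)]

    def share(self, peers, sharing_rule):
        """S3: send shared rows to each peer that the sharing rule allows."""
        for peer in peers:
            peer.received.extend(
                dict(r) for r in self.shared_source_library if sharing_rule(peer, r)
            )


t1, t2 = BigDataPlatform("T1"), BigDataPlatform("T2")
t1.collect([
    {"lot": "A-1", "yield": 0.97, "shareable": True},
    {"lot": "A-2", "yield": 0.91, "shareable": False},
])
t1.publish(lambda row: row["shareable"])
t1.share([t2], lambda peer, row: peer.name == "T2")
print(t2.received)
```

The second platform only ever receives copies of rows taken from the shared source library, which mirrors the data isolation described above.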
Second embodiment:
This embodiment of the present application provides a data management system, including: a first big data platform and at least one second big data platform.
The first big data platform is configured with a first database and a first shared source library, and is used for collecting and storing original data in the first database, and storing data meeting preset conditions in the first database in the first shared source library.
The first database is used for the first big data platform to access.
In an embodiment, the first big data platform may be further configured to establish a data transmission channel with at least one second big data platform and to send its own first platform information to the second big data platform, so that the second big data platform creates a first data acquisition task according to the first platform information.
In an embodiment, the first big data platform may be further configured to identify, by means of a sensitive data identification technology, the sensitive data in the data corresponding to the first data acquisition task in the shared source library and the sensitivity level of that sensitive data, to perform desensitization processing corresponding to the sensitivity level, and then to transmit the conventional data corresponding to the first data acquisition task in the shared source library, together with the desensitized sensitive data, to at least one second big data platform for data intercommunication.
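Level-dependent desensitization of this kind might look like the sketch below. The identification rules, the three levels and the masking strategy chosen for each level are hypothetical; the description does not specify which sensitive data identification technology or desensitization methods are used.

```python
import hashlib
import re

# Hypothetical identification rules: (pattern, sensitivity level).
RULES = [
    (re.compile(r"\d{17}[\dXx]"), "high"),             # 18-character ID number
    (re.compile(r"\d{11}"), "medium"),                 # 11-digit phone number
    (re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"), "low"),  # e-mail address
]


def sensitivity_level(value):
    """Return the level of the first rule matching the whole value, else None."""
    for pattern, level in RULES:
        if pattern.fullmatch(value):
            return level
    return None


def desensitize(value, level):
    """Apply the desensitization assigned to a sensitivity level."""
    if level == "high":      # irreversible digest
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
    if level == "medium":    # keep the first 3 and last 4 characters
        return value[:3] + "*" * (len(value) - 7) + value[-4:]
    if level == "low":       # keep the first character and the domain
        local, _, domain = value.partition("@")
        return local[0] + "***@" + domain
    return value             # conventional data passes through unchanged


def prepare_for_sharing(rows):
    """Desensitize every string field of every row before transmission."""
    return [{k: desensitize(v, sensitivity_level(v)) for k, v in row.items()} for row in rows]


rows = [{"lot": "A-1", "phone": "13800001234", "email": "alice@example.com"}]
shared = prepare_for_sharing(rows)
print(shared)
```

Conventional fields such as the lot number are transmitted unchanged, while each sensitive field is transformed according to its level.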
In an embodiment, the first big data platform may be further configured to obtain second platform information sent by the second big data platform, so as to create a second data acquisition task according to the second platform information, so that the second big data platform sends data corresponding to the second data acquisition task to the first big data platform.
In an embodiment, the first big data platform may be further configured to receive, based on the second data acquisition task, data fed back by the second big data platform through the data path, perform network security monitoring, and activate a threat defense mechanism when the detection result corresponds to a network attack.
In one embodiment, the threat defense mechanism includes at least one of:
interrupting the reception of data transmitted over the data path;
discarding the received data corresponding to the data acquisition task;
blocking data from unauthorized IP addresses;
generating a threat defense log;
performing a corresponding alert operation based on the detection result.
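The defense actions listed above can be combined in a small sketch. The class, its fields and the stand-in detector are hypothetical; an actual platform would rely on the network security monitoring described earlier rather than a substring check.

```python
from dataclasses import dataclass, field


@dataclass
class DataPath:
    authorized_ips: set
    is_open: bool = True
    received: list = field(default_factory=list)    # (task_id, payload) pairs
    threat_log: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def receive(self, source_ip, task_id, payload, is_attack):
        """Accept one payload; return True only if it was kept."""
        if not self.is_open:
            return False
        if source_ip not in self.authorized_ips:
            # Block data from unauthorized IP addresses and log the event.
            self.threat_log.append(f"blocked unauthorized source {source_ip}")
            return False
        if is_attack(payload):
            self.activate_defense(source_ip, task_id)
            return False
        self.received.append((task_id, payload))
        return True

    def activate_defense(self, source_ip, task_id):
        """Apply the defense actions once an attack has been detected."""
        self.is_open = False  # interrupt receiving on this data path
        # Discard data already received for the affected acquisition task.
        self.received = [item for item in self.received if item[0] != task_id]
        self.threat_log.append(f"attack detected from {source_ip} on task {task_id}")
        self.alerts.append(f"notify administrator: task {task_id} suspended")


def looks_malicious(payload):
    # Stand-in detector used only for this illustration.
    return "union select" in payload.lower()


path = DataPath(authorized_ips={"10.0.0.2"})
path.receive("10.0.0.2", "task-1", "lot=A-1", looks_malicious)
path.receive("10.0.0.9", "task-1", "lot=A-2", looks_malicious)
path.receive("10.0.0.2", "task-1", "1 UNION SELECT *", looks_malicious)
print(path.threat_log)
```

Here all five actions are applied together for simplicity; the embodiment only requires at least one of them.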
The second big data platform is in data intercommunication with the first big data platform and is used for receiving data in a first shared source library transmitted by the first big data platform according to a preset sharing rule.
In an embodiment, the second large data platform may function the same as or different from the first large data platform. Optionally, when the function of the second big data platform is the same as that of the first big data platform, the specific embodiment of the second big data platform may refer to the first big data platform, which will not be described herein.
In one embodiment, the step of storing the data fed back by the other big data platform by the target big data platform (for example, the first big data platform or the second big data platform) includes at least one of the following:
the target big data platform stores the data fed back by the other big data platform into its own shared source library;
the target big data platform stores the data fed back by the other big data platform into the shared source table corresponding to that other big data platform in its own shared source library;
the target big data platform processes the data fed back by the other big data platform and stores the processed data into its own destination library;
the target big data platform processes the data fed back by the other big data platform and stores the processed data into the destination table corresponding to that other big data platform in its own destination library.
This embodiment of the application provides a data management system in which a data intercommunication network is established among the big data platforms, and each big data platform performs data acquisition and data sharing with the other big data platforms through a shared source library, so that the other big data platforms cannot access its own core database, thereby realizing data isolation. Therefore, this embodiment can avoid data leakage when data acquisition or data sharing is performed among multiple big data platforms, and can achieve the aim of secure data management. In addition, this embodiment enables multiple big data platforms to maintain data channels for data intercommunication, which reduces the complexity of manually configuring data channels and improves data interaction efficiency. Furthermore, the big data platform acting as the acquiring party records the platform information of the big data platform of each data source party and automatically creates the library and table resources on its own side, while the big data platform of the source party automatically creates its library and table resources and acquisition tasks; this reduces the configuration complexity of data acquisition tasks, improves data interaction efficiency, and allows the data of the big data platforms of multiple data source parties to be acquired promptly.
Based on the technical concept of the above technical solution, referring to fig. 2, this embodiment illustrates a topology of network intercommunication between big data platforms, in which big data platform T1 (i.e., the first big data platform) collects the data of Factory A, big data platform T2 (i.e., a second big data platform) collects the data of Factory B, and big data platform T3 (i.e., another second big data platform) collects the data of Factory C. Big data platform T1 and big data platform T2 are networked with each other (a data transmission channel is established), receive each other's platform information by broadcast and store it, so that each of them can create a data acquisition task based on the other's platform information and the two platforms can send data to and receive data from each other. Likewise, big data platform T1 and big data platform T3 are networked with each other (a data transmission channel is established), receive each other's platform information by broadcast and store it, so that each of them can create a data acquisition task based on the other's platform information and the two platforms can send data to and receive data from each other.
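The exchange of platform information in this topology can be sketched as follows. The content of the platform information and the shape of an acquisition task are assumptions, since the description does not enumerate them.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    name: str
    factory: str
    peers: dict = field(default_factory=dict)   # peer name -> stored platform information
    tasks: list = field(default_factory=list)   # acquisition tasks created from peer info

    def platform_info(self):
        # Hypothetical content; the source does not list the fields.
        return {"platform": self.name, "factory": self.factory}

    def on_broadcast(self, info):
        """Store a peer's platform information and create an acquisition task from it."""
        self.peers[info["platform"]] = info
        self.tasks.append({"collect_from": info["platform"], "deliver_to": self.name})


def establish_channel(a, b):
    """Network two platforms: each broadcasts its information to the other."""
    a.on_broadcast(b.platform_info())
    b.on_broadcast(a.platform_info())


t1 = Platform("T1", "A")
t2 = Platform("T2", "B")
t3 = Platform("T3", "C")
establish_channel(t1, t2)
establish_channel(t1, t3)
print(sorted(t1.peers))
```

In this sketch T2 and T3 each know only T1, matching the topology in which T1 is networked with both of them but no channel between T2 and T3 is described.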
Based on the technical concept of the above technical solution, referring to fig. 3, this embodiment illustrates an application scenario in which a big data platform collects data. After original data (also called source data) is created on a big data platform, automatically or by a user, the collection authority for that data is opened; according to the platform information of the other big data platforms, a linked data acquisition task is then created automatically, and data is collected from the database of this big data platform into a platform database (e.g., a shared source library) for the other big data platforms.
For example, big data platform T1 (i.e., the first big data platform) corresponds to Factory A and is connected to the destination library factor_A_ods_DB of Factory A (including destination table ods_source_table_1); big data platform T2 (i.e., the second big data platform) corresponds to Factory B and is connected to the source library factor_B_DB of Factory B (including source table source_table) and the shared source library factor_B_source_DB (including shared source table source_table_1). Factory B or big data platform T2 creates the shared source library factor_B_source_DB, automatically or by a user, and then sets the data of source table source_table as collectable. Factory A or big data platform T1 creates a data acquisition task, automatically or by a user, and distributes it to big data platform T2 of Factory B. Big data platform T2 automatically generates a cascade data acquisition task, which first collects the data into shared source table source_table_1 of the shared source library factor_B_source_DB of Factory B; the data acquisition task of big data platform T1 of Factory A then automatically collects the data from shared source table source_table_1 into destination table ods_source_table_1, and each user of big data platform T1 of Factory A develops and uses the data from destination table ods_source_table_1 as required.
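The two-stage flow of fig. 3 can be sketched with plain dictionaries. Library and table names follow the example where they are legible in the text; the row contents and the function names are hypothetical.

```python
# Each library is modelled as a {table_name: rows} dictionary.
factor_B_DB = {"source_table": [{"lot": "B-1"}, {"lot": "B-2"}]}
factor_B_source_DB = {}
factor_A_ods_DB = {}

collectable = set()


def set_collectable(table):
    """Factory B / T2 opens the collection authority for one source table."""
    collectable.add(table)


def cascade_task(source_table, shared_table):
    """Runs on T2: copy a collectable source table into the shared source library."""
    if source_table not in collectable:
        raise PermissionError(f"{source_table} is not collectable")
    factor_B_source_DB[shared_table] = [dict(r) for r in factor_B_DB[source_table]]


def acquisition_task(shared_table, destination_table):
    """Runs on T1: read only from the shared source library, never from factor_B_DB."""
    factor_A_ods_DB[destination_table] = [dict(r) for r in factor_B_source_DB[shared_table]]


set_collectable("source_table")
cascade_task("source_table", "source_table_1")
acquisition_task("source_table_1", "ods_source_table_1")
print(factor_A_ods_DB["ods_source_table_1"])
```

Because the acquisition task on T1 only touches the shared source library, the source library of Factory B stays isolated from Factory A.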
Based on the technical concept of the above technical solution, referring to fig. 4, this embodiment illustrates another application scenario in which big data platforms collect data. When there are multiple platforms, an upper-level big data platform can automatically collect data from a lower-level big data platform according to the platform information of that lower-level platform, so that the upper-level big data platform can both collect the data of the lower-level big data platform and provide its own data to other big data platforms.
For example, big data platform T1 (i.e., the first big data platform) corresponds to Factory A and is connected to the destination library factor_A_ods_DB of Factory A (including destination table 1 ods_source_table_B_1, destination table 2 ods_source_table_C_1, destination table 3 ods_source_table_E_1, etc.); big data platform T2 (i.e., the second big data platform or the first big data platform) corresponds to Factory B and is connected to the source library factor_B_DB of Factory B (including source table source_table_B) and the shared source library factor_B_source_DB (including shared source table 1 source_table_B_1, shared source table 2 source_table_C_1, shared source table 3 source_table_E_1, etc.); big data platform T3 (i.e., the second big data platform) corresponds to Factory C and is connected to the source library factor_C_DB of Factory C (including source table source_table_C) and the shared source library factor_C_source_DB (including shared source table 1 source_table_C_1); and big data platform T4 (i.e., the second big data platform) corresponds to Factory E and is connected to the source library factor_E_DB of Factory E (including source table source_table_E) and the shared source library factor_E_source_DB.
Referring to fig. 4, Factory E or big data platform T4 creates the shared source library factor_E_source_DB, automatically or by a user, and sets source table source_table_E as collectable.
Factory B or big data platform T2 creates a data acquisition task, automatically or by a user, and distributes it to big data platform T4 of Factory E. Big data platform T4 automatically generates a cascade data acquisition task, which first collects the data in source table source_table_E into the shared source library factor_E_source_DB of Factory E; the data acquisition task of big data platform T2 of Factory B then automatically collects the data from the shared source library factor_E_source_DB into its destination table, namely shared source table 3 source_table_E_1.
In addition, Factory B or big data platform T2 creates the shared source library factor_B_source_DB, automatically or by a user, and sets any one or more of shared source table 1 source_table_B_1, shared source table 2 source_table_C_1, shared source table 3 source_table_E_1, source table source_table_B and the like as collectable.
Factory A or big data platform T1 creates a data acquisition task, automatically or by a user, and distributes it to big data platform T2 of Factory B. Big data platform T2 automatically generates a cascade data acquisition task, which first collects the data in source table source_table_B into shared source table source_table_B_1 of the shared source library factor_B_source_DB of Factory B; the data acquisition task of big data platform T1 of Factory A then automatically collects the data from shared source table source_table_B_1 into destination table ods_source_table_B_1, and each user of big data platform T1 of Factory A develops and uses the data from destination table ods_source_table_B_1 as required.
Big data platform T2 of Factory B automatically generates a cascade data acquisition task; the data acquisition task of big data platform T1 of Factory A automatically collects the data from shared source table 2 source_table_C_1 into destination table ods_source_table_C_1, and each user of big data platform T1 of Factory A develops and uses the data from destination table ods_source_table_C_1 as required.
Big data platform T2 of Factory B automatically generates a cascade data acquisition task; the data acquisition task of big data platform T1 of Factory A automatically collects the data from shared source table 3 source_table_E_1 into destination table ods_source_table_E_1, and each user of big data platform T1 of Factory A develops and uses the data from destination table ods_source_table_E_1 as required.
Factory C or big data platform T3 creates the shared source library factor_C_source_DB, automatically or by a user, and sets source table source_table_C as collectable.
Factory A or big data platform T1 creates a data acquisition task, automatically or by a user, and distributes it to big data platform T3 of Factory C. Big data platform T3 automatically generates a cascade data acquisition task, which first collects the data in source table source_table_C into shared source table source_table_C_1 of the shared source library factor_C_source_DB of Factory C; the data acquisition task of big data platform T1 of Factory A then automatically collects the data from shared source table source_table_C_1 into destination table ods_source_table_C_1, and each user of big data platform T1 of Factory A develops and uses the data from destination table ods_source_table_C_1 as required.
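The multi-level relay of fig. 4, in which data of Factory E reaches Factory A by way of Factory B, can be sketched as a chain of table copies. The table name `shared_E` is hypothetical, because the text does not name the table inside factor_E_source_DB; the row contents are also made up.

```python
def copy_table(src_lib, src_table, dst_lib, dst_table):
    """One acquisition hop: copy rows between {table_name: rows} libraries."""
    dst_lib[dst_table] = [dict(row) for row in src_lib[src_table]]


factor_E_DB = {"source_table_E": [{"lot": "E-7"}]}
factor_E_source_DB = {}
factor_B_source_DB = {}
factor_A_ods_DB = {}

hops = [
    # Cascade task on T4: source table -> shared source library of Factory E.
    (factor_E_DB, "source_table_E", factor_E_source_DB, "shared_E"),
    # Acquisition task of T2: shared source library of E -> shared source table 3 of B.
    (factor_E_source_DB, "shared_E", factor_B_source_DB, "source_table_E_1"),
    # Acquisition task of T1: shared source table 3 of B -> destination table of A.
    (factor_B_source_DB, "source_table_E_1", factor_A_ods_DB, "ods_source_table_E_1"),
]
for hop in hops:
    copy_table(*hop)

print(factor_A_ods_DB["ods_source_table_E_1"])
```

At every hop the reader touches only a shared source library of the platform below it, so T1 never needs direct access to the databases of Factory E.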
A second embodiment of the present application provides a processor configured to perform the data management method of the first embodiment of the present application. When the processor provided by this embodiment operates, a data intercommunication network can be established between the big data platform and other big data platforms, data acquisition and data sharing with the other big data platforms are performed through the shared source library, and the other big data platforms cannot access the big data platform's own core database, which realizes data isolation. Data leakage can thus be avoided when data acquisition or data sharing is performed among multiple big data platforms, and this embodiment can achieve the purpose of secure data management.
Fig. 5 is a schematic structural diagram of a big data platform according to a second embodiment of the present application. For a clear description of the big data platform 1 provided in the second embodiment of the present application, please refer to fig. 5.
The second embodiment of the present application also provides a big data platform 1, the big data platform 1 comprising at least one processor a101, capable of implementing the steps of the data management method of the first embodiment of the present application.
In an implementation manner, the big data platform 1 in this embodiment may further include at least one memory a201. Optionally, the at least one processor a101 is configured to execute a computer program A6 stored in the at least one memory a201 to implement the steps of the data management method as described in the first embodiment.
Alternatively, the at least one processor a101 may be referred to as a processing unit A1, and the at least one memory a201 may be referred to as a storage unit A2. Alternatively, the storage unit A2 stores a computer program A6, which when executed by the processing unit A1, causes the large data platform 1 provided by the present embodiment to implement the steps of the data management method as described in the first embodiment.
Alternatively, the large data platform 1 provided in the present embodiment may include a plurality of memories a201 (simply referred to as a storage unit A2).
Alternatively, the storage unit A2 may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. Alternatively, the nonvolatile memory may be a read-only memory (ROM, Read-Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disk, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The storage unit A2 described in embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.
Optionally, the big data platform 1 further comprises a bus connecting the different components (e.g. processor a101, memory a201, touch sensitive display A3).
Optionally, the big data platform 1 in the present embodiment may further include a communication interface (e.g. I/O interface A4), which may be used for communication with an external device or a service system (e.g. a service module).
Optionally, the big data platform 1 provided in the present embodiment may further include a communication device A5 for providing various communication services.
The big data platform 1 provided by the second embodiment of the present application includes a processor a101 and a memory a201, and the processor a101 is configured to execute a computer program A6 stored in the memory a201 to implement the steps of the data management method as described in the first embodiment; the big data platform 1 provided by the present embodiment can therefore achieve the purpose of secure data management.
The second embodiment of the present application also provides a machine-readable storage medium having stored thereon instructions or a computer program A6 which, when executed by a processor, cause the processor to perform the data management method provided by the first embodiment, for example the steps shown in fig. 1.
Alternatively, the machine-readable storage medium provided by the present embodiment may include any entity or device capable of carrying computer program code, or a recording medium such as a ROM, RAM, magnetic disk, optical disk or flash memory.
When the instructions or the computer program A6 stored in the machine-readable storage medium provided in the second embodiment of the present application are executed by the processor a101, a data intercommunication network can be established between the big data platform and other big data platforms, data acquisition and data sharing with the other big data platforms are performed through the shared source library, and the other big data platforms cannot access the big data platform's own core database, which realizes data isolation. Data leakage can thus be avoided when data acquisition or data sharing is performed among multiple big data platforms, and the present embodiment can achieve the purpose of secure data management.
Optionally, in the embodiments of the electronic device and the machine storage medium provided by the present application, all technical features of each embodiment of the foregoing data management method are included, and the expansion and explanation contents of the description are substantially the same as those of each embodiment of the foregoing data management method, which is not repeated herein.
Embodiments of the present application also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method as in the various possible embodiments described above.
The embodiment of the application also provides a chip, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for calling and running the computer program from the memory, so that the device provided with the chip executes the method in the various possible implementation manners.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
In the present application, descriptions of the same or similar terms, concepts, technical solutions and/or application scenarios are generally given in detail only at their first appearance; for brevity, they are generally not repeated at later appearances. When reading the technical solutions of the present application, reference may be made to the earlier detailed description for any such term, concept, technical solution and/or application scenario that is not described in detail later.
In the present application, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the descriptions of the other embodiments.
The technical features of the technical solutions of the present application may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it shall be considered to fall within the scope of this description.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, or a network device, etc.) to perform the method of each embodiment of the present application.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g., infrared, radio, microwave). A computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. Usable media may be magnetic media (e.g., floppy disks, storage disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., a solid state disk (SSD)), etc.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process based on the disclosure herein, whether applied directly or indirectly in other related technical fields, is likewise covered by the scope of the present application.

Claims (10)

1. A data management method, applied to a first big data platform configured with a first database and a first shared source library, comprising:
collecting original data and storing the original data into the first database, wherein the first database is accessed by the first big data platform;
storing data in the first database that meets a preset condition to the first shared source library;
and transmitting the data in the first shared source library to at least one second big data platform for data intercommunication according to a preset sharing rule.
2. The data management method according to claim 1, wherein before the step of transmitting the data in the first shared source library to the at least one second big data platform for data intercommunication according to a preset sharing rule, the method further comprises:
establishing a data transmission channel with the at least one second big data platform;
and sending first platform information of the first big data platform to the second big data platform, so that the second big data platform can establish a first data acquisition task according to the first platform information.
3. The data management method according to claim 2, wherein the step of transmitting the data in the first shared source library to at least one second big data platform for data intercommunication according to a preset sharing rule comprises:
identifying, by a sensitive data identification technology, sensitive data and corresponding sensitivity levels in the data in the first shared source library that corresponds to the first data acquisition task;
performing desensitization processing on the sensitive data according to the corresponding sensitivity level;
and transmitting the conventional data in the first shared source library that corresponds to the first data acquisition task, together with the desensitized sensitive data, to the at least one second big data platform in data intercommunication.
4. The data management method according to claim 1, further comprising:
acquiring second platform information sent by the second big data platform;
creating a second data acquisition task according to the second platform information;
and receiving and storing the data fed back by the second big data platform based on the second data acquisition task.
5. The data management method according to claim 4, wherein the step of receiving and storing the data fed back by the second big data platform based on the second data acquisition task comprises:
establishing, based on the second data acquisition task, a data path with the second big data platform;
receiving, through the data path, the data fed back by the second big data platform, and performing network security monitoring on the received data;
and activating a threat defense mechanism when a detection result of the network security monitoring corresponds to a network attack.
6. The data management method according to claim 5, wherein the threat defense mechanism comprises at least one of:
interrupting reception of the data transmitted through the data path;
discarding received data corresponding to the second data acquisition task;
blocking data corresponding to an unauthorized IP address;
generating a threat defense log;
and executing a corresponding reminding operation based on the detection result.
7. The data management method according to any one of claims 1 to 6, wherein the step of storing the data fed back by the second big data platform includes at least one of:
storing the data fed back by the second big data platform to the first shared source library;
storing the data fed back by the second big data platform into a shared source table corresponding to the second big data platform in the first shared source library;
after data processing is performed on the data fed back by the second big data platform, storing the data into a first destination library of the first big data platform;
and after data processing is performed on the data fed back by the second big data platform, storing the data into a destination table corresponding to the second big data platform in the first destination library of the first big data platform.
8. A data management system, comprising: a first big data platform and at least one second big data platform;
the first big data platform is configured with a first database and a first shared source library, and is used for collecting original data and storing it to the first database, and for storing data in the first database that meets a preset condition to the first shared source library, wherein the first database is accessed by the first big data platform;
the second big data platform is in data intercommunication with the first big data platform and is used for receiving the data in the first shared source library transmitted by the first big data platform according to a preset sharing rule.
9. A big data platform, comprising: a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the steps of the data management method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data management method according to any of claims 1 to 7.
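
The flow recited in claims 1 and 3 (collect into a private database, copy qualifying data into a shared source library, desensitize, then transmit under a sharing rule) can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the class `BigDataPlatform`, the helper `desensitize`, the regular expressions, the two numeric sensitivity levels, and the masking styles are all assumptions introduced here, since the claims do not specify any identification technique or desensitization method.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns and sensitivity levels (not taken from the claims).
SENSITIVE_PATTERNS = [
    (2, re.compile(r"\b\d{11}\b")),                # level 2: 11-digit phone number
    (1, re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),  # level 1: e-mail address
]


def desensitize(text: str) -> str:
    """Mask every sensitive match according to its assumed sensitivity level."""
    for level, pattern in SENSITIVE_PATTERNS:
        if level >= 2:
            # Higher level: redact the whole match.
            text = pattern.sub("***", text)
        else:
            # Lower level: keep the first character and mask the rest.
            text = pattern.sub(lambda m: m.group(0)[0] + "***", text)
    return text


@dataclass
class BigDataPlatform:
    name: str
    first_database: list = field(default_factory=list)         # private core store
    shared_source_library: list = field(default_factory=list)  # the only store peers are fed from
    received: list = field(default_factory=list)                # data delivered by peers

    def collect(self, record: dict) -> None:
        """Store original data in the private first database."""
        self.first_database.append(record)

    def publish(self, condition) -> None:
        """Copy records that satisfy the preset condition into the shared source library."""
        self.shared_source_library = [
            dict(r) for r in self.first_database if condition(r)
        ]

    def share_with(self, peer: "BigDataPlatform", sharing_rule) -> None:
        """Send desensitized copies of shared records that the sharing rule allows."""
        for record in self.shared_source_library:
            if sharing_rule(record, peer):
                peer.received.append({
                    key: desensitize(value) if isinstance(value, str) else value
                    for key, value in record.items()
                })
```

In this sketch a peer platform only ever receives masked copies taken from `shared_source_library`; nothing in `share_with` reads `first_database`, which mirrors the data isolation the abstract describes. A caller would invoke `collect`, then `publish` with a condition such as `lambda r: r["public"]`, then `share_with` with a rule that checks the peer.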
CN202410675588.4A 2024-05-27 2024-05-27 Data management method, data management system, big data platform and storage medium Pending CN118536159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410675588.4A CN118536159A (en) 2024-05-27 2024-05-27 Data management method, data management system, big data platform and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410675588.4A CN118536159A (en) 2024-05-27 2024-05-27 Data management method, data management system, big data platform and storage medium

Publications (1)

Publication Number Publication Date
CN118536159A true CN118536159A (en) 2024-08-23

Family

ID=92384068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410675588.4A Pending CN118536159A (en) 2024-05-27 2024-05-27 Data management method, data management system, big data platform and storage medium

Country Status (1)

Country Link
CN (1) CN118536159A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination