Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for extracting spatial features of data and for identifying data. They solve the problem in the prior art that sensitive data is identified only from data content features, without considering differences between service scenarios, which results in a high misidentification rate.
In a first aspect, an embodiment of the present invention provides a method for extracting spatial features of data, including: acquiring data characteristics of historical data, and constructing a data spatial feature model according to the data characteristics; and performing spatial feature extraction on the historical data based on the data spatial feature model, and determining the spatial features of the historical data to form a historical data spatial feature library.
Optionally, the data characteristics include a data attribute, a user attribute, an operation attribute and an environment attribute, and acquiring the data characteristics of the historical data includes: performing data analysis on a storage database of the historical data to determine the data attribute of the historical data; acquiring a preset relation between user information and access authority, together with the information of the users accessing the historical data, and determining the access authority according to the user information and the preset relation so as to determine the user attribute of the historical data; acquiring operation information about operations performed on the historical data to determine the operation attribute of the historical data; and acquiring gateway information of users accessing the historical data to determine the environment attribute of the historical data.
In a second aspect, an embodiment of the present invention further provides a data identification method, including: acquiring current power data, and determining a current power data spatial feature library by using the method for extracting spatial features of data of the first aspect; acquiring a historical data spatial feature library, and determining a sensitive data spatial feature library by using preset sensitive data features and the historical data spatial feature library; and performing a modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine the identification result of the current power data.
Optionally, acquiring the historical data spatial feature library and determining the sensitive data spatial feature library by using the preset sensitive data features and the historical data spatial feature library includes: acquiring a preset rule for sensitive data; and screening the historical data spatial feature library based on the preset rule to determine the sensitive data spatial feature library.
Optionally, performing the modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine the identification result of the current power data includes: performing the modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine a first operation result; and when the first operation result is zero, determining that the current power data is sensitive power data.
Optionally, the data identification method provided in the embodiment of the present invention further includes: when the first operation result is not zero, performing the modulo operation on the current power data spatial feature library and the historical data spatial feature library to determine a second operation result; when the second operation result is zero, determining that the current power data is conventional power data; and when the second operation result is not zero, determining that the current power data is newly generated power data and adding the newly generated power data to the historical data spatial feature library.
In a third aspect, an embodiment of the present invention further provides a system for extracting spatial features of data, including: a model building module, configured to obtain the data characteristics of historical data and build a data spatial feature model according to the data characteristics; and an extraction module, configured to perform spatial feature extraction on the historical data based on the data spatial feature model and determine the spatial features of the historical data to form a historical data spatial feature library.
In a fourth aspect, an embodiment of the present invention further provides a data identification system, including: a first processing module, configured to obtain current power data and determine a current power data spatial feature library by using the system for extracting spatial features of data according to the third aspect; a second processing module, configured to acquire a historical data spatial feature library and determine a sensitive data spatial feature library by using preset sensitive data features and the historical data spatial feature library; and a third processing module, configured to perform a modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine the identification result of the current power data.
In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer instructions for executing the method provided in the first aspect or the second aspect of the present invention.
An embodiment of the present invention further provides an electronic device, including a memory and a processor communicatively connected with each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the method provided in the first aspect or the second aspect of the embodiments of the present invention.
The technical solutions of the invention have the following advantages:
1. The method for extracting spatial features of data provided by the invention extracts spatial features from historical data by constructing a data spatial feature model, and determines those spatial features to form a historical data spatial feature library. Because sensitive attributes of the data, such as its application scenario and access objects, are fully considered when constructing the multidimensional spatial features, the identification process for sensitive data is complete and the identification rate of sensitive data is ensured.
2. The data identification method provided by the invention determines the identification result of the current power data by constructing the current power data spatial feature library and the sensitive data spatial feature library. This solves the low identification accuracy of traditional sensitive data identification methods that do not consider the application scenario of the data. Based on matching of spatial feature vectors, the method achieves accurate identification of sensitive data within massive power data, supports automatic identification of power sensitive data, improves identification efficiency, and raises the level of data security protection.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict.
An embodiment of the invention provides a method for extracting spatial features of data, which constructs multidimensional spatial features by fully considering sensitive attributes of the data such as its application scenario and access objects. The method therefore suits situations in which the same data is judged sensitive in one service scenario but not in another, and it solves the low identification accuracy of traditional methods (for example, identifying sensitive data only from data content features) and static methods (for example, ignoring the application scenario and service environment of the data). For example, the same piece of user information is sensitive in a user profile, where unrelated people should not have access rights, but is not sensitive as the contact in a meeting notification; judging it only from its content characteristics would lead to a misjudgment.
Specifically, as shown in fig. 1, the method for extracting spatial features of data includes:
Step S01: acquiring the data characteristics of the historical data, and constructing a data spatial feature model according to the data characteristics.
In this embodiment, the data characteristics of each piece of historical data are first obtained, and a data spatial feature model is then constructed from them, as shown in fig. 2, where different dotted line types represent different data attributes. That is, a power sensitive data spatial feature model is first constructed from four aspects of the data characteristics: data attributes, user attributes, operation attributes and environment attributes. Then, when a visitor needs to use sensitive data, the attribute features of these four dimensions are collected in real time, and sensitive data is identified through spatial feature matching.
Step S02: performing spatial feature extraction on the historical data based on the data spatial feature model, and determining the spatial features of the historical data to form a historical data spatial feature library. In this embodiment, the determined data spatial feature model is used to perform spatial feature extraction on the historical data and determine each spatial feature of the historical data, thereby constructing a historical data spatial feature library $V = \{V_1, V_2, \ldots, V_n\}$.
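The following Python sketch illustrates one way to hold the historical data spatial feature library. Beyond what the text states, it assumes that each historical access record has already been reduced to one value per attribute dimension, that a spatial feature is a 4-tuple of those values, and that the library is a set, so identical features are stored only once. The `SpatialFeature` type, the record keys and the sample records are hypothetical; the samples echo the user-profile versus meeting-contact example given earlier.

```python
from typing import Iterable, NamedTuple


class SpatialFeature(NamedTuple):
    """One point of the four-dimensional data spatial feature model (hypothetical layout)."""
    data_attr: str
    user_attr: str
    operation_attr: str
    environment_attr: str


def extract_feature(record: dict) -> SpatialFeature:
    # Assumption: each record already carries one value per attribute dimension.
    return SpatialFeature(
        record["data"], record["user"], record["operation"], record["environment"]
    )


def build_feature_library(records: Iterable[dict]) -> frozenset:
    # The library V = {V1, ..., Vn}; duplicate features collapse into one element.
    return frozenset(extract_feature(r) for r in records)


history = [
    {"data": "customer.email", "user": "marketing", "operation": "read", "environment": "intranet"},
    {"data": "customer.email", "user": "marketing", "operation": "read", "environment": "intranet"},
    {"data": "meeting.contact", "user": "any", "operation": "read", "environment": "internet"},
]
V = build_feature_library(history)
print(len(V))
```

A set-based library makes later membership tests cheap; whether the real feature model keeps duplicates or weights them is not specified in the text.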
The method for extracting spatial features of data extracts spatial features from historical data by constructing a data spatial feature model, and determines those spatial features to form a historical data spatial feature library. Because sensitive attributes of the data, such as its application scenario and access objects, are fully considered when constructing the multidimensional spatial features, the identification process for sensitive data is complete and the identification rate of sensitive data is ensured.
In one embodiment, the data characteristics include the data attribute, the user attribute, the operation attribute and the environment attribute, and step S01 may specifically include the following steps.
Specifically, the collected data characteristics comprise information on the data attribute, the user attribute, the operation attribute and the environment attribute, and are recorded as a four-component feature whose components represent these four attributes respectively. The data attribute includes the data type, data content characteristics, data security level, data owner, data manager, data access time, data timeliness and the like. The user attribute should include the unique user identity (ID), the organization to which the user belongs, the network location from which the user accesses, the user's temporary privilege pass, and the like. The operation attribute should include the types of operation that may be performed on the data, such as read, add, modify and delete operations. The environment attribute should include the specific access environment of the data in the actual scenario, such as the service through which it may be accessed, the geographic location of access, the time of access, and the like.
Step S011: performing data analysis on the storage database of the historical data to determine the data attribute of the historical data. In the embodiment of the invention, the data attribute is obtained by analyzing the power data (historical data) storage database periodically or in advance. For example, a client's mailbox and address belong to different types of client information, each with its own content characteristic keywords, while information such as the data security level, data owner, data manager, data access time and data timeliness can be obtained from the attributes and descriptions of the database tables. When a particular field of a particular table in a particular database is accessed, its data attribute can be collected in this way.
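As a hedged illustration of reading data attributes from "the attributes and description of a database table", the sketch below uses SQLite. `PRAGMA table_info` is a real SQLite pragma that reports each column's declared type; the `data_catalog` table that records security level and owner, the function name and the sample rows are hypothetical stand-ins for whatever metadata the actual power data storage database keeps.

```python
import sqlite3


def collect_data_attribute(conn: sqlite3.Connection, table: str, column: str) -> tuple:
    """Return (table, column, declared type, security level, owner) for one accessed field.

    Assumes a hypothetical 'data_catalog' table holding the security level and owner
    of each field; the table name is assumed to come from trusted metadata.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    declared = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in declared:
        raise KeyError(f"{table}.{column} does not exist")
    row = conn.execute(
        "SELECT security_level, owner FROM data_catalog WHERE tbl = ? AND col = ?",
        (table, column),
    ).fetchone()
    # Fields missing from the catalog are reported as unclassified.
    level, owner = row if row is not None else ("unclassified", "unknown")
    return (table, column, declared[column], level, owner)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT, address TEXT)")
conn.execute("CREATE TABLE data_catalog (tbl TEXT, col TEXT, security_level TEXT, owner TEXT)")
conn.execute("INSERT INTO data_catalog VALUES ('customer', 'email', 'L3', 'marketing dept')")
print(collect_data_attribute(conn, "customer", "email"))
print(collect_data_attribute(conn, "customer", "address"))
```

Content characteristic keywords and timeliness are omitted here; they would need their own sources in a real deployment.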
Step S012: acquiring a preset relation between user information and access authority, together with the information of the users accessing the historical data, and determining the access authority according to the user information and the preset relation so as to determine the user attribute of the historical data.
In the embodiment of the invention, the user attribute can be obtained from the authority control system of the power information system. When a user needs to access certain power data (service data), the user first logs in to the information system through a wired or wireless network, and the information system authenticates the user's account, password, IP address and the like; at this point, user attributes such as the unique user identity (ID), the organizational unit and the network location of access can be collected. When the user needs to access data beyond his or her authority, the user must obtain a temporary privilege through approval by management departments at all levels, and the user's temporary privilege pass information can then be collected.
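The user attribute collection of step S012 can be sketched as a lookup in the preset relation followed by an optional extension from a temporary privilege pass. This is a minimal sketch under stated assumptions: the role names, the `PRESET_RELATION` table, the `UserAttribute` fields and the idea that a pass carries an explicit set of extra grants are all hypothetical, since the text does not define how the authority control system encodes them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Hypothetical preset relation between user roles and access authority.
PRESET_RELATION = {
    "dispatcher": frozenset({"read"}),
    "marketing": frozenset({"read", "add"}),
    "dba": frozenset({"read", "add", "modify", "delete"}),
}


@dataclass(frozen=True)
class UserAttribute:
    user_id: str
    org_unit: str
    network_location: str
    authority: FrozenSet[str]
    temp_pass: Optional[str] = None


def collect_user_attribute(user_id: str, org_unit: str, ip: str, role: str,
                           temp_pass: Optional[str] = None,
                           temp_grants: Iterable[str] = ()) -> UserAttribute:
    # Base authority comes from the preset relation; unknown roles get none.
    authority = PRESET_RELATION.get(role, frozenset())
    # An approved temporary privilege pass extends the authority.
    if temp_pass is not None:
        authority = authority | frozenset(temp_grants)
    return UserAttribute(user_id, org_unit, ip, authority, temp_pass)


base = collect_user_attribute("u1001", "Marketing Center", "10.8.1.20", "marketing")
elevated = collect_user_attribute("u1001", "Marketing Center", "10.8.1.20", "marketing",
                                  temp_pass="TP-017", temp_grants={"modify"})
print(sorted(base.authority), sorted(elevated.authority))
```

Keeping the attribute frozen makes it hashable, so it can be placed directly inside a spatial feature tuple.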
Step S013: acquiring operation information about operations performed on the historical data to determine the operation attribute of the historical data. In practical applications, the operation attribute can be collected by the power information system or the database server. When a user accesses certain service data, read, add, modify and delete operations can be performed through the power information system, at which point the user's operation attribute can be collected; the power information system also sends the corresponding operation to the back-end database server for execution, where the operation attribute can likewise be collected.
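On the database-server side, one simple way to collect the operation attribute is to classify the leading SQL verb of each statement forwarded by the power information system. The sketch below assumes plain SQL text is available at collection time; the mapping and function name are hypothetical, and statements that do not start with one of the four verbs (for example, those beginning with `WITH`) are reported as `other` rather than parsed.

```python
OPERATION_TYPES = {
    "SELECT": "read",
    "INSERT": "add",
    "UPDATE": "modify",
    "DELETE": "delete",
}


def collect_operation_attribute(sql: str) -> str:
    """Map the leading SQL verb of a statement to one of the operation types
    named in the text (read, add, modify, delete); anything else is 'other'."""
    parts = sql.split(maxsplit=1)
    if not parts:
        return "other"
    return OPERATION_TYPES.get(parts[0].upper(), "other")


for statement in ("SELECT email FROM customer",
                  "update customer SET address = 'x'",
                  "DROP TABLE customer"):
    print(statement, "->", collect_operation_attribute(statement))
```

A production collector would more likely rely on the database's own audit log or a SQL parser; the prefix check only illustrates the mapping.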
Step S014: acquiring gateway information of users accessing the historical data to determine the environment attribute of the historical data. In the embodiment of the invention, the environment attribute can be collected by the user's access network. When a user needs to access certain service data, the user connects to the border gateway of the power information network through a wired or wireless network to log in to and use the related system; at this point, the geographic location, time, IP address, terminal type and the like of the user's data access can be collected, as well as whether the access is made through a private network, the Internet, a fixed network or a mobile network.
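A hedged sketch of condensing gateway information into an environment attribute is shown below, using Python's standard `ipaddress` module. Two loud assumptions: the address ranges that Python reports as private stand in for "access through the private network", and a fixed 08:00 to 18:00 working-hours bucket stands in for "time of access". The function name, parameters and sample values are hypothetical; a real gateway would map its own address plan.

```python
import ipaddress
from datetime import datetime


def collect_environment_attribute(ip: str, accessed_at: datetime,
                                  terminal: str, location: str) -> tuple:
    """Summarize gateway information into a (location, period, network, terminal) tuple."""
    addr = ipaddress.ip_address(ip)
    # Assumption: private address ranges stand in for the power private network.
    network = "private-network" if addr.is_private else "internet"
    # Assumption: a simple working-hours bucket stands in for the time of access.
    period = "working-hours" if 8 <= accessed_at.hour < 18 else "off-hours"
    return (location, period, network, terminal)


office = collect_environment_attribute("10.12.0.7", datetime(2023, 5, 8, 9, 30),
                                       "desktop", "Substation A")
remote = collect_environment_attribute("8.8.8.8", datetime(2023, 5, 8, 22, 15),
                                       "mobile", "unknown")
print(office)
print(remote)
```

Bucketing continuous values (time, address) into a few labels keeps the resulting spatial features comparable across accesses.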
An embodiment of the present invention further provides a data identification method, which, as shown in fig. 3, specifically includes the following steps:
Step S1: acquiring current power data, and determining a current power data spatial feature library. In this embodiment, when the power system receives an access request, it determines the current power data to be accessed by that request, and then determines the spatial feature library $V'$ of the current power data by using the above method for extracting spatial features of data.
Step S2: acquiring a historical data spatial feature library, and determining the sensitive data spatial feature library by using preset sensitive data features and the historical data spatial feature library. The historical data spatial feature library is determined by the above method for extracting spatial features of data.
Step S3: performing the modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine the identification result of the current power data.
The data identification method provided by the invention determines the identification result of the current power data by constructing the current power data spatial feature library and the sensitive data spatial feature library. This solves the low identification accuracy of traditional sensitive data identification methods that do not consider the application scenario of the data. Based on matching of spatial feature vectors, the method achieves accurate identification of sensitive data within massive power data, supports automatic identification of power sensitive data, improves identification efficiency, and raises the level of data security protection.
In an embodiment, step S2 includes the following steps:
Step S21: acquiring a preset rule for sensitive data. The preset rule can be formulated by security personnel and business personnel, who sort out and define the characteristics of power sensitive data according to the relevant national, industry and power-enterprise specifications on sensitive data; the rule can be adjusted according to actual requirements.
Step S22: screening the historical data spatial feature library based on the preset rule to determine the sensitive data spatial feature library. In this embodiment, the characteristics of power sensitive data are sorted out and defined based on the preset rule and the data attributes of the current power data, and the historical data spatial feature library is then screened to complete the construction of an initial sensitive data spatial feature library, which can be recorded as $V_s = \{V_{s1}, V_{s2}, \ldots, V_{sn}\}$, where the sensitive data spatial feature library $V_s$ is a subset of the historical data spatial feature library $V$, that is, $V_s \subseteq V$.
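Screening $V$ with a preset rule can be sketched as filtering the library with a predicate, which guarantees $V_s \subseteq V$ by construction. In this sketch the features are assumed to be (data, user, operation, environment) 4-tuples, and `example_rule` is a hypothetical stand-in for the rule that security and business personnel would actually define.

```python
from typing import Callable, FrozenSet, Tuple

Feature = Tuple[str, str, str, str]  # (data, user, operation, environment)


def screen_sensitive(V: FrozenSet[Feature],
                     rule: Callable[[Feature], bool]) -> FrozenSet[Feature]:
    """Return V_s, the features of V that satisfy the preset sensitive-data rule."""
    return frozenset(f for f in V if rule(f))


def example_rule(feature: Feature) -> bool:
    # Hypothetical rule: customer data is sensitive when accessed by anyone other
    # than customer service, or when accessed from the Internet.
    data, user, _operation, environment = feature
    return data.startswith("customer.") and (
        user != "customer-service" or environment == "internet")


V = frozenset({
    ("customer.email", "marketing", "read", "intranet"),
    ("customer.email", "customer-service", "read", "intranet"),
    ("meeting.contact", "any", "read", "internet"),
})
V_s = screen_sensitive(V, example_rule)
print(sorted(V_s))
```

Note how the same data item (`customer.email`) is sensitive in one access context but not in the other, which is the scenario dependence the method is built around.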
Specifically, step S3 further includes the following steps:
Step S31: performing the modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine a first operation result.
The modulo operation is widely used in both number theory and program design; from distinguishing odd and even numbers and testing for primes to modular exponentiation and computing the greatest common divisor, it appears almost everywhere. Specifically, the modulo operation is performed on the current power data spatial feature library $V'$ and the sensitive data spatial feature library $V_s$ to determine the first operation result, which can be expressed by the following formula:
$$\left| V' - V_s \right| \qquad (1)$$
Step S32: when the first operation result is zero, determining that the current power data is sensitive power data. Specifically, if $\left| V' - V_s \right| = 0$, the currently accessed power data can be determined to be sensitive.
Step S33: when the first operation result is not zero, performing the modulo operation on the current power data spatial feature library and the historical data spatial feature library to determine a second operation result, $\left| V' - V \right|$. The modulo operation is computed in the same way as in formula (1) and is not described again here.
Step S34: when the second operation result is zero, determining that the current power data is conventional power data. If $\left| V' - V_s \right| \neq 0$ but $\left| V' - V \right| = 0$, the currently accessed power data can be determined to be not sensitive, that is, it is conventional power data.
Step S35: when the second operation result is not zero, determining that the current power data is newly generated power data. If $\left| V' - V_s \right| \neq 0$ and $\left| V' - V \right| \neq 0$, the currently accessed power data can be determined to be new power data generated by a new service, and the historical data spatial feature library can then be updated with it. The library may be updated in real time as such data is generated, or updated from storage at fixed intervals; this embodiment does not limit the update mode.
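Steps S32, S34 and S35 together form a three-way decision. Restated compactly (a restatement of the conditions above, not an additional rule):

$$
\text{result}(V') =
\begin{cases}
\text{sensitive}, & \left| V' - V_s \right| = 0 \\
\text{conventional}, & \left| V' - V_s \right| \neq 0,\ \left| V' - V \right| = 0 \\
\text{newly generated}, & \left| V' - V_s \right| \neq 0,\ \left| V' - V \right| \neq 0
\end{cases}
$$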
Step S36: adding the newly generated power data to the historical data spatial feature library. In this embodiment, the spatial features of the newly generated power data are included in the historical data spatial feature library $V$. Security personnel and business personnel then judge whether the data is sensitive according to the relevant national, industry and power-enterprise specifications on sensitive data, and if it is, include it in the power sensitive data spatial feature library $V_s$.
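The decision procedure of steps S31 to S36 can be sketched in Python. The text defines the modulo operation only through formula (1), so this sketch assumes one possible reading: $V'$, $V_s$ and $V$ are sets of feature tuples, and $\left| V' - X \right|$ is the number of features of $V'$ that are absent from $X$ (the size of the set difference). Under a vector-norm reading the comparison would be computed differently. The function names and sample features are hypothetical, and the manual sensitivity review of step S36 is deliberately left outside the code, so new features enter $V$ but not $V_s$.

```python
from typing import AbstractSet, Set, Tuple

Feature = Tuple[str, str, str, str]  # (data, user, operation, environment)


def modulo_result(current: AbstractSet[Feature], library: AbstractSet[Feature]) -> int:
    # Assumed reading of |V' - X|: how many current features X does not contain.
    return len(current - library)


def identify(current: AbstractSet[Feature], V: Set[Feature], V_s: AbstractSet[Feature]) -> str:
    if not current:
        raise ValueError("current feature library V' is empty")
    if modulo_result(current, V_s) == 0:   # step S32: first result is zero
        return "sensitive"
    if modulo_result(current, V) == 0:     # step S34: second result is zero
        return "conventional"
    V.update(current)                      # steps S35 and S36: record new features in V
    return "new"


known = ("customer.email", "marketing", "read", "intranet")
routine = ("meeting.contact", "any", "read", "internet")
fresh = ("meter.reading", "field-app", "add", "mobile")
V = {known, routine}
V_s = {known}
for current in (frozenset({known}), frozenset({routine}), frozenset({fresh})):
    print(identify(current, V, V_s))
```

Because $V_s \subseteq V$, under this set reading any $V'$ that passes the first check would also pass the second, which is why the sensitive check is performed first.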
An embodiment of the present invention further provides a system for extracting spatial features of data, which, as shown in fig. 4, includes:
A model building module 01, configured to obtain the data characteristics of the historical data and build a data spatial feature model according to the data characteristics. For details, refer to the description of step S01 in the above method embodiment, which is not repeated here.
An extraction module 02, configured to perform spatial feature extraction on the historical data based on the data spatial feature model and determine the spatial features of the historical data to form a historical data spatial feature library. For details, refer to the description of step S02 in the above method embodiment, which is not repeated here.
Through the cooperation of its modules, the system for extracting spatial features of data provided by the invention extracts spatial features from historical data by constructing a data spatial feature model, and determines those spatial features to form a historical data spatial feature library. Because sensitive attributes of the data, such as its application scenario and access objects, are fully considered when constructing the multidimensional spatial features, the identification process for sensitive data is complete and the identification rate of sensitive data is ensured.
An embodiment of the present invention further provides a data identification system, which, as shown in fig. 5, includes:
A first processing module 1, configured to obtain current power data and determine a current power data spatial feature library by using the system for extracting spatial features of data according to claim 8. For details, refer to the description of step S1 in the above method embodiment, which is not repeated here.
A second processing module 2, configured to acquire a historical data spatial feature library and determine the sensitive data spatial feature library by using the preset sensitive data features and the historical data spatial feature library. For details, refer to the description of step S2 in the above method embodiment, which is not repeated here.
A third processing module 3, configured to perform the modulo operation on the current power data spatial feature library and the sensitive data spatial feature library to determine the identification result of the current power data. For details, refer to the description of step S3 in the above method embodiment, which is not repeated here.
Specifically, fig. 6 shows the interaction process of the data identification system provided in this embodiment: the data server, the unified permission system, the power service system and the unified access gateway provide the data attributes, user attributes, operation attributes and environment attributes respectively; the corresponding acquisition modules collect these attributes and build the sensitive data spatial feature library; finally, matching calculation of sensitive data spatial features is performed, and once sensitive data has been determined, the sensitive data of the power service system is accurately identified and maintained.
Through the cooperation of its modules, the data identification system provided by the invention determines the identification result of the current power data by constructing the current power data spatial feature library and the sensitive data spatial feature library. This solves the low identification accuracy of traditional sensitive data identification methods that do not consider the application scenario of the data. Based on matching of spatial feature vectors, the system achieves accurate identification of sensitive data within massive power data, supports automatic identification of power sensitive data, improves identification efficiency, and raises the level of data security protection.
An embodiment of the present invention provides a computer device, which, as shown in fig. 7, includes at least one processor 401, such as a CPU (Central Processing Unit), at least one communication interface 403, a memory 404, and at least one communication bus 402. The communication bus 402 is used to implement connection and communication between these components. The communication interface 403 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The memory 404 may be a RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 404 may also be at least one storage device located remotely from the processor 401. The processor 401 may perform the method for extracting spatial features of data or the data identification method: a set of program code is stored in the memory 404, and the processor 401 calls this program code to perform either method.
The communication bus 402 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
The memory 404 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 404 may also include a combination of the above kinds of memory.
The processor 401 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 401 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 404 is also used to store program instructions, and the processor 401 may call the program instructions to implement the method for extracting spatial features of data or the data identification method of the present application.
An embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions that can execute the method for extracting spatial features of data or the data identification method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above kinds of memory.
It should be understood that the above embodiments are merely examples given for clarity of illustration and are not intended to limit the implementations. Other variations or modifications in different forms will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom may be made without departing from the spirit or scope of the invention.