CN117076610A - Identification method and device of data sensitive table, electronic equipment and storage medium

Publication number
CN117076610A
Authority
CN
China
Prior art keywords
data table
information data
sensitive
feature
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311105446.6A
Other languages
Chinese (zh)
Inventor
钟丹晔
鲍鑫伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Baowangda Software Technology Co ltd
Original Assignee
Jiangsu Baowangda Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Baowangda Software Technology Co ltd
Priority to CN202311105446.6A
Publication of CN117076610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The invention discloses a method and a device for identifying a data sensitive table, an electronic device, and a storage medium. The method includes: acquiring an information data table to be identified in real time; performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified; inputting the feature information data table into a pre-trained sensitive data table identification model, and determining a data table identification result corresponding to the information data table to be identified; and, if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a security control measure associated with that result, so as to ensure the data security of the sensitive data table. This solves the problem of sensitive information leaking because sensitive data in a table cannot be identified in time, and improves the accuracy and efficiency of sensitive data identification.

Description

Identification method and device of data sensitive table, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for identifying a data sensitive table, an electronic device, and a storage medium.
Background
In modern society, data is increasingly widely used, and the risks of data leakage and privacy leakage are also increased. Therefore, it is important to protect the privacy and sensitive information of the user.
In the process of implementing the present invention, the inventors found the following drawbacks in the prior art. At present, sensitive data is identified on the basis of the professional knowledge of personnel, which is prone to omissions and errors, mainly in the following respects. First, syntax checking: the correctness of the written syntax usually has to be judged manually, which is imprecise and inefficient. Second, lack of a global view: experts are required to have deep knowledge of sensitive information, yet a person's scope and perspective of analysis are limited, so some critical data or out-of-scope data may be overlooked. Third, time: particularly with the current explosive growth of data, manual identification cannot meet the need to process data quickly.
Disclosure of Invention
The invention provides a method and a device for identifying a data sensitive table, an electronic device, and a storage medium, so as to improve the accuracy and efficiency of sensitive data identification.
According to an aspect of the present invention, there is provided a method for identifying a data sensitive table, including:
acquiring an information data table to be identified in real time;
performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified;
inputting the characteristic information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized;
and if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a security control measure associated with the sensitive data table identification result, so as to ensure the data security of the sensitive data table according to the security control measure.
According to another aspect of the present invention, there is provided an identification apparatus for a data sensitive table, including:
the information data table acquisition module is used for acquiring the information data table to be identified in real time;
the characteristic information data table obtaining module is used for carrying out characteristic extraction on the information data table to be identified according to a preset table characteristic extraction method to obtain a characteristic information data table corresponding to the information data table to be identified;
the data table identification result determining module is used for inputting the feature information data table into a pre-trained sensitive data table identification model and determining a data table identification result corresponding to the information data table to be identified;
and the security control measure acquisition and automatic execution module is used for acquiring and automatically executing the security control measure associated with the sensitive data table identification result if the data table identification result is a sensitive data table identification result, so as to ensure the data security of the sensitive data table according to the security control measure.
According to another aspect of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements a method for identifying a data sensitive table according to any embodiment of the present invention when executing the computer program.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a method for identifying a data sensitive table according to any embodiment of the present invention when executed.
According to this technical solution, an information data table to be identified is acquired in real time; feature extraction is performed on it according to a preset table feature extraction method to obtain the corresponding feature information data table; the feature information data table is input into a pre-trained sensitive data table identification model to determine the data table identification result; and, if that result is a sensitive data table identification result, the security control measure associated with it is acquired and automatically executed so as to ensure the data security of the sensitive data table. This solves the problem of sensitive information leaking because sensitive data in a table cannot be identified in time, improves the accuracy and efficiency of sensitive data identification, and ensures the security of sensitive data.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for identifying a data sensitive table according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an identification device of a data sensitive table according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that those skilled in the art can obtain from the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "target," "current," and the like in the description and claims of the present invention and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a method for identifying a data sensitive table according to an embodiment of the present invention, where the method may be performed by an identifying device of the data sensitive table, and the identifying device of the data sensitive table may be implemented in hardware and/or software.
Accordingly, as shown in fig. 1, the method includes:
S110, acquiring an information data table to be identified in real time.
The information data table to be identified may be any data table that needs to be checked for sensitive data.
Specifically, the information data table to be identified includes a table header, column names, and table content; further table information, such as the table data type and column length, can be determined from the table content.
Further, the information data table to be identified can be a normal information data table or a sensitive information data table.
In addition, the information data table to be identified can come from different fields, for example: medical data, financial data, personal privacy data, and the like.
S120, performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified.
The feature information data table may be an information data table obtained after feature extraction of the information data table to be identified.
Specifically, the table feature extraction method may be determined to include at least one of the following: keyword extraction method, text feature extraction method, numerical feature extraction method, and column selection extraction method.
In this embodiment, feature extraction is required to be performed on the information data table to be identified according to a preset table feature extraction method, that is, feature extraction is performed according to a keyword extraction method, a text feature extraction method, a numerical feature extraction method, a column selection extraction method, and the like, so that the feature information data table can be further obtained. Correspondingly, the identification of the sensitive data can be performed through the sensitive data table identification model according to the characteristic information data table.
Optionally, the feature extraction is performed on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified, including: extracting keywords from the information data table to be identified by a preset keyword extraction method to obtain a keyword characteristic information data table; extracting text features from the keyword feature information data table by a text feature extraction method to obtain a text feature information data table; extracting numerical characteristics from the text characteristic information data table by a numerical characteristic extraction method to obtain a numerical characteristic information data table; and carrying out column information selection and extraction on the numerical characteristic information data table by a column selection and extraction method to obtain a characteristic information data table corresponding to the information data table to be identified.
In this embodiment, the keyword extraction method may be, for example, a bag-of-words model or the TF-IDF (term frequency-inverse document frequency) method, which is applied to the information data table to be identified to obtain the keyword feature information data table; the keyword extraction method is not particularly limited here.
Similarly, the text feature extraction method is not particularly limited; the method adopted here may be, for example, a bag-of-words model or the TF-IDF method.
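As an illustration of the TF-IDF option named above, the following is a minimal sketch in plain Python. It assumes that each table is reduced to a list of tokens (here, its column names); the function names tf_idf and top_keywords and the sample tables are hypothetical and do not come from the application.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Each document is a list of tokens, e.g. the column names of one table
    (an assumption made for this sketch).
    """
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(doc_weights, k=3):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tables, each represented by its column names.
tables = [
    ["name", "id_card", "phone", "address"],
    ["name", "product", "price", "stock"],
    ["name", "date", "temperature", "humidity"],
]
scores = tf_idf(tables)
print(top_keywords(scores[0]))  # ['address', 'id_card', 'phone']
```

A term such as "name" that occurs in every table receives weight zero, so the keywords that remain are the ones that distinguish one table from the others.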
Further, numerical features are extracted from the text feature information data table by the numerical feature extraction method to obtain the numerical feature information data table. Here, the columns consisting of numerical information are extracted and their statistical features are calculated, for example the mean, maximum, minimum, or standard deviation. Overall statistical features can then be derived from the per-column statistics, which amounts to a statistical analysis of the whole information data table; examples are the overall mean and the overall standard deviation.
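The per-column and overall statistics can be sketched as follows. This is only one possible reading: the table layout (a dict from column name to cell values), the function names, and the choice to derive the overall values from the column means are all assumptions made for illustration.

```python
import statistics


def is_number(value):
    """True for int/float cells; bool is excluded although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def numeric_column_stats(table):
    """Compute mean, max, min and standard deviation for each numeric column.

    `table` maps a column name to its list of cell values (assumed layout).
    """
    stats = {}
    for name, values in table.items():
        if not values or not all(is_number(v) for v in values):
            continue  # skip text or mixed columns
        stats[name] = {
            "mean": statistics.fmean(values),
            "max": max(values),
            "min": min(values),
            "std": statistics.pstdev(values),
        }
    return stats


def overall_stats(stats):
    """Derive table-level features from the per-column means (an assumption)."""
    means = [s["mean"] for s in stats.values()]
    if not means:
        return {"overall_mean": 0.0, "overall_std": 0.0}
    return {
        "overall_mean": statistics.fmean(means),
        "overall_std": statistics.pstdev(means),
    }


# Hypothetical table with one text column and two numeric columns.
table = {
    "name": ["a", "b", "c"],
    "balance": [10.0, 20.0, 30.0],
    "age": [20, 30, 40],
}
column_stats = numeric_column_stats(table)
overall = overall_stats(column_stats)
print(column_stats["balance"]["mean"], overall["overall_mean"])
```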
Correspondingly, column information selection and extraction is performed on the numerical feature information data table by the column selection extraction method to obtain the feature information data table corresponding to the information data table to be identified. The relevant information columns are selected from the information data table mainly according to the data type and the target requirement.
Preferably, the method further comprises time feature extraction, and the time feature extraction operation can be performed based on the numerical feature information data table. The temporal features are extracted mainly for columns containing temporal information. For example: year, quarter, month, day of week, etc.
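The time features listed above (year, quarter, month, day of week) can be derived from a date cell with the standard library alone. In this sketch the function name time_features and the ISO date string format of the cell are assumptions.

```python
from datetime import date


def time_features(cell):
    """Split an ISO date string such as '2023-08-30' into calendar features."""
    d = date.fromisoformat(cell)
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


features = time_features("2023-08-30")
print(features)
```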
Optionally, performing column information selection and extraction on the numerical feature information data table by the column selection extraction method to obtain the feature information data table corresponding to the information data table to be identified includes: acquiring table data information corresponding to the column selection extraction method, wherein the table data information includes at least one of: table header, column name, table data type, and column length; and performing column information selection and extraction on the numerical feature information data table according to the table header, the column name, the table data type, and the column length, respectively, to obtain the feature information data table corresponding to the information data table to be identified.
In this embodiment, feature extraction may be performed according to the parameter feature requirements corresponding to the table header, the column name, the table data type, and the column length, so that the feature information data table corresponding to the information data table to be identified is obtained.
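Column selection by name, data type, and length might look like the sketch below. The Column record, the select_columns function, and the matching rule (name hint or type match, combined with a minimum length) are hypothetical; the application does not specify the selection criteria.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical column descriptor built from the table metadata."""
    name: str
    dtype: str   # e.g. "text", "int", "date"
    length: int  # declared or maximum observed cell length


def select_columns(columns, name_hints=(), dtypes=(), min_length=0):
    """Keep columns whose name or data type matches and that are long enough."""
    selected = []
    for col in columns:
        name_match = any(hint in col.name.lower() for hint in name_hints)
        type_match = col.dtype in dtypes
        if (name_match or type_match) and col.length >= min_length:
            selected.append(col.name)
    return selected


columns = [
    Column("id_card_no", "text", 18),
    Column("remark", "text", 200),
    Column("phone", "text", 11),
    Column("qty", "int", 4),
]
picked = select_columns(columns, name_hints=("id_card", "phone"), min_length=11)
print(picked)  # ['id_card_no', 'phone']
```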
Optionally, the table feature extraction method further includes an associated feature extraction method. After column information selection and extraction is performed on the numerical feature information data table according to the table header, the column name, the table data type, and the column length to obtain the feature information data table corresponding to the information data table to be identified, the method further includes: calculating and extracting associated features from the feature information data table by the associated feature extraction method to obtain an associated feature information data table; and adding the associated feature information data table to the feature information data table, and inputting the resulting feature information data table into the sensitive data table identification model.
In this embodiment, the feature information data table is analyzed: indicators such as correlation and covariance are calculated for a plurality of columns according to a preset association relationship, which associates multiple features and yields the associated feature information data table.
Accordingly, the identification and determination processing operation of the sensitive data can be performed through the sensitive data table identification model according to the associated characteristic information data table.
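The correlation and covariance indicators mentioned for the associated features can be computed for a pair of numeric columns as in this sketch. Population covariance and the Pearson coefficient are used here as one reasonable choice; the function names and the sample columns are assumptions.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical pair of columns declared as associated.
income = [1.0, 2.0, 3.0, 4.0]
credit_limit = [2.0, 4.0, 6.0, 8.0]
associated_features = {
    "cov(income, credit_limit)": covariance(income, credit_limit),
    "corr(income, credit_limit)": pearson(income, credit_limit),
}
print(associated_features)
```

The resulting values can then be appended to the feature vector of the table before it is passed to the model.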
S130, inputting the characteristic information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized.
In this embodiment, the recognition processing operation of whether the feature information data table is the sensitive information data table may be performed through the sensitive data table recognition model, and further, a specific data table recognition result may be determined.
Specifically, the data table identification result may include a sensitive data table identification result and a normal data table identification result.
Optionally, the inputting the feature information data table into a pre-trained sensitive data table recognition model, determining a data table recognition result corresponding to the information data table to be recognized, includes: performing feature preprocessing on the feature information data table according to a feature preprocessing method to obtain a feature preprocessing information data table; wherein the feature pretreatment method comprises at least one of the following steps: feature coding, feature marking, feature selection and feature dimension reduction; inputting the characteristic preprocessing information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized.
In this embodiment, since the feature information data table includes a plurality of data information features, feature preprocessing may be performed according to a feature preprocessing method, and specifically, feature encoding, feature labeling, feature selection, and feature dimension reduction may be included.
The detailed process can be as follows: the extracted features are encoded and labeled so that they are converted into a format usable by a machine learning algorithm; the encoding may use, for example, one-hot encoding or label encoding, and is not particularly limited here. In addition, feature selection and dimension reduction are performed according to the importance and relevance of the features, so as to reduce the feature dimension and remove redundant information.
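The two encodings named above can be sketched without any library. The function names and the sample feature (the data type of each column) are assumptions; class indices are assigned in sorted order so that the result is deterministic.

```python
def label_encode(values):
    """Map each distinct category to an integer index (sorted order)."""
    classes = sorted(set(values))
    index = {category: i for i, category in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1 at its class index."""
    codes, classes = label_encode(values)
    vectors = [
        [1 if i == code else 0 for i in range(len(classes))]
        for code in codes
    ]
    return vectors, classes


# Hypothetical categorical feature: the data type of each column.
column_types = ["text", "int", "text", "date"]
codes, classes = label_encode(column_types)
vectors, _ = one_hot_encode(column_types)
print(classes)  # ['date', 'int', 'text']
print(codes)    # [2, 1, 2, 0]
```

Label encoding is compact but implies an ordering between categories, whereas one-hot encoding avoids that at the cost of one dimension per category, which is one reason a later dimension reduction step can be useful.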
After feature pretreatment is carried out on the feature information data table, a feature pretreatment information data table is obtained, then the feature pretreatment information data table is input into a pre-trained sensitive data table identification model, and a data table identification result corresponding to the information data table to be identified is determined.
S140, if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a security control measure associated with the sensitive data table identification result, so as to ensure the data security of the sensitive data table according to the security control measure.
In this embodiment, if the data table identification result is determined to be a sensitive data table identification result, the information data table to be identified contains sensitive information, so the security control measure associated with the sensitive data table identification result needs to be acquired. Specifically, the security control measure may include, but is not limited to, operations such as permission control and application approval.
Further, the acquired security control measures are triggered and executed automatically, so that the data security of the sensitive data table is ensured and data leakage from an information data table that contains sensitive information is prevented.
Optionally, after the feature information data table is input into the pre-trained sensitive data table identification model and the data table identification result corresponding to the information data table to be identified is determined, the method further includes: if the data table identification result is not a sensitive data table identification result, passing the information data table to be identified to the next node for data flow processing.
In this embodiment, if the data table identification result is not a sensitive data table identification result, it is a normal data table identification result: the information data table to be identified contains no sensitive data, no security control measure needs to be executed, and the table can be passed to the next node for data flow processing, that is, data transmission can proceed.
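The branch between step S140 and the normal case can be sketched as a small dispatcher. Everything named here is hypothetical: the result labels, the two measure functions (standing in for permission control and application approval), and the forward callback that represents the next node.

```python
SENSITIVE = "sensitive"
NORMAL = "normal"


def restrict_permissions(table_name):
    """Placeholder for a permission control measure."""
    return f"permissions restricted on {table_name}"


def require_approval(table_name):
    """Placeholder for an application approval measure."""
    return f"approval required for {table_name}"


# Measures associated with each identification result (assumed mapping).
SECURITY_MEASURES = {SENSITIVE: [restrict_permissions, require_approval]}


def handle_result(table_name, result, forward):
    """Run the associated measures for a sensitive table, else forward it."""
    if result == SENSITIVE:
        return [measure(table_name) for measure in SECURITY_MEASURES[SENSITIVE]]
    forward(table_name)  # normal table: hand over to the next node
    return []


forwarded = []
actions = handle_result("patients", SENSITIVE, forwarded.append)
handle_result("weather", NORMAL, forwarded.append)
print(actions, forwarded)
```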
Optionally, before the acquiring the information data table to be identified in real time, the method further includes: acquiring a plurality of historical information data tables, wherein the historical information data tables comprise a historical normal information data table and a historical sensitive information data table; respectively carrying out feature extraction on each historical information data table through the table feature extraction method to obtain a historical feature information data table; and carrying out model training on the initial sensitive data table identification model through the historical characteristic information data table until the model output accuracy reaches an accuracy threshold, and determining that training is completed on the sensitive data table identification model.
In this embodiment, a plurality of historical information data tables are acquired and table feature extraction is performed on them to obtain the historical feature information data tables. The sensitive data table identification model is then trained on the historical feature information data tables.
Specifically, if the model output accuracy reaches the accuracy threshold, training is determined to be complete and the sensitive data table identification model is obtained; if it does not, further historical information data tables are acquired and the model is retrained until the threshold is reached.
Additionally, the initial sensitive data table identification model may be built on a machine learning algorithm such as a decision tree, a random forest, or a neural network.
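The train-until-threshold loop can be illustrated with a deliberately simple stand-in model. The sketch below uses a single-layer perceptron rather than any of the model families named in the application, and the two input features, the sample data, the threshold, and all function names are assumptions.

```python
def predict(weights, bias, features):
    """Return 1 (sensitive) if the weighted sum is positive, else 0 (normal)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, samples):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(predict(weights, bias, x) == y for x, y in samples)
    return hits / len(samples)


def train_until(samples, threshold=0.95, max_epochs=100, learning_rate=0.1):
    """Repeat perceptron updates until accuracy reaches the threshold.

    Returns (weights, bias, accuracy); if the threshold is not reached within
    max_epochs, the caller would need more historical tables and a retrain.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    acc = accuracy(weights, bias, samples)
    for _ in range(max_epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
        acc = accuracy(weights, bias, samples)
        if acc >= threshold:
            break
    return weights, bias, acc


# Hypothetical historical feature vectors:
# [share of columns with sensitive keywords, share of text columns], label.
history = [
    ([1.0, 0.5], 1),
    ([0.8, 0.6], 1),
    ([0.0, 0.5], 0),
    ([0.1, 0.4], 0),
]
weights, bias, acc = train_until(history)
print(acc, predict(weights, bias, [0.9, 0.5]))
```

In practice the accuracy would be measured on held-out tables rather than on the training set; the sketch measures it on the training data only to keep the example short.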
According to this technical solution, an information data table to be identified is acquired in real time; feature extraction is performed on it according to a preset table feature extraction method to obtain the corresponding feature information data table; the feature information data table is input into a pre-trained sensitive data table identification model to determine the data table identification result; and, if that result is a sensitive data table identification result, the security control measure associated with it is acquired and automatically executed so as to ensure the data security of the sensitive data table. This solves the problem of sensitive information leaking because sensitive data in a table cannot be identified in time, improves the accuracy and efficiency of sensitive data identification, and ensures the security of sensitive data.
Example 2
Fig. 2 is a schematic structural diagram of an identification device for a data sensitive table according to a second embodiment of the present invention. The identification device of the data sensitive table provided by the embodiment of the invention can be realized through software and/or hardware, and can be configured in terminal equipment or a server to realize the identification method of the data sensitive table in the embodiment of the invention. As shown in fig. 2, the apparatus includes: an information data table acquisition module 210, a characteristic information data table acquisition module 220, a data table identification result determination module 230, and a security control measure acquisition and automatic execution module 240.
The information data table obtaining module 210 is configured to obtain an information data table to be identified in real time;
the feature information data table obtaining module 220 is configured to perform feature extraction on the information data table to be identified according to a preset table feature extraction method, so as to obtain a feature information data table corresponding to the information data table to be identified;
the data table identification result determining module 230 is configured to input the characteristic information data table into a pre-trained sensitive data table identification model, and determine a data table identification result corresponding to the information data table to be identified;
and the security control measure acquisition and automatic execution module 240 is configured to acquire and automatically execute a security control measure associated with the sensitive data table identification result if the data table identification result is a sensitive data table identification result, so as to ensure the data security of the sensitive data table according to the security control measure.
According to this technical solution, an information data table to be identified is acquired in real time; feature extraction is performed on it according to a preset table feature extraction method to obtain the corresponding feature information data table; the feature information data table is input into a pre-trained sensitive data table identification model to determine the data table identification result; and, if that result is a sensitive data table identification result, the security control measure associated with it is acquired and automatically executed so as to ensure the data security of the sensitive data table. This solves the problem of sensitive information leaking because sensitive data in a table cannot be identified in time, improves the accuracy and efficiency of sensitive data identification, and ensures the security of sensitive data.
Optionally, the apparatus further includes a data flow processing module, which may be specifically configured to: after the feature information data table is input into the pre-trained sensitive data table identification model and the data table identification result corresponding to the information data table to be identified is determined, if the data table identification result is not a sensitive data table identification result, pass the information data table to be identified to the next node for data flow processing.
Optionally, the table feature extraction method includes at least one of the following: keyword extraction method, text feature extraction method, numerical feature extraction method, and column selection extraction method.
Optionally, the feature information data table obtaining module 220 may be specifically configured to: extracting keywords from the information data table to be identified by a preset keyword extraction method to obtain a keyword characteristic information data table; extracting text features from the keyword feature information data table by a text feature extraction method to obtain a text feature information data table; extracting numerical characteristics from the text characteristic information data table by a numerical characteristic extraction method to obtain a numerical characteristic information data table; and carrying out column information selection and extraction on the numerical characteristic information data table by a column selection and extraction method to obtain a characteristic information data table corresponding to the information data table to be identified.
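The chained extraction described above, in which each stage works on the output of the previous one, can be sketched as a fold over a list of stage functions. This is an illustrative sketch only: the stage names, the keyword list and the concrete features (keyword hits, average text length, numeric ratio, column count) are assumptions, since the source does not define what each extraction method computes.

```python
from functools import reduce
from typing import Callable, Dict, List

Table = Dict[str, List[str]]   # column name -> cell values (assumed layout)
State = Dict[str, dict]        # {"table": Table, "features": {...}}

KEYWORDS = ("phone", "id_card", "salary")  # hypothetical keyword list


def _cells(state: State) -> List[str]:
    """Flatten all cell values of the table carried in the state."""
    return [cell for column in state["table"].values() for cell in column]


def _with_feature(state: State, name: str, value: float) -> State:
    """Return a new state with one more feature recorded."""
    return {"table": state["table"], "features": {**state["features"], name: value}}


def keyword_stage(state: State) -> State:
    hits = sum(any(k in name.lower() for k in KEYWORDS) for name in state["table"])
    return _with_feature(state, "keyword_hits", float(hits))


def text_stage(state: State) -> State:
    cells = _cells(state)
    avg = sum(len(c) for c in cells) / len(cells) if cells else 0.0
    return _with_feature(state, "avg_text_len", avg)


def numeric_stage(state: State) -> State:
    cells = _cells(state)
    ratio = sum(c.isdigit() for c in cells) / len(cells) if cells else 0.0
    return _with_feature(state, "numeric_ratio", ratio)


def column_stage(state: State) -> State:
    return _with_feature(state, "n_columns", float(len(state["table"])))


# Order follows the text: keyword -> text -> numerical -> column selection.
STAGES: List[Callable[[State], State]] = [
    keyword_stage, text_stage, numeric_stage, column_stage,
]


def extract(table: Table) -> Dict[str, float]:
    """Run every stage in order, each one consuming the previous stage's output."""
    initial: State = {"table": table, "features": {}}
    return reduce(lambda state, stage: stage(state), STAGES, initial)["features"]
```

Keeping the stages in a list makes it straightforward to append a further stage, such as an associated feature extraction, without touching the others.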
Optionally, the feature information data table obtaining module 220 may be further specifically configured to: acquiring list data information corresponding to the column selection extraction method; wherein the list data information includes at least one of: list header, column name, table data type, and column length; and respectively carrying out column information selection and extraction on the numerical characteristic information data table according to the list head, the column name, the table data type and the column length to obtain a characteristic information data table corresponding to the information data table to be identified.
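As an illustration of column selection driven by the list header, column name, table data type and column length, the sketch below describes each column with a small record and keeps the columns that look relevant. The `ColumnInfo` type, the keyword tuple, the set of textual types and the selection rule are hypothetical; the source names the four kinds of column information but not how they are combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ColumnInfo:
    header: str      # displayed list header of the column
    name: str        # column (field) name
    data_type: str   # declared table data type
    length: int      # declared column length


TEXT_TYPES = {"char", "varchar", "text"}                    # assumed textual types
KEYWORDS: Tuple[str, ...] = ("phone", "id_card", "email")   # hypothetical


def column_features(col: ColumnInfo) -> Dict[str, float]:
    """Turn the four pieces of column information into numeric features."""
    text = f"{col.header} {col.name}".lower()
    return {
        "header_or_name_hit": float(any(k in text for k in KEYWORDS)),
        "is_text_type": float(col.data_type.lower() in TEXT_TYPES),
        "length": float(col.length),
    }


def select_columns(cols: List[ColumnInfo]) -> List[ColumnInfo]:
    """Keep columns whose header or name hits a keyword and whose type is textual."""
    kept = []
    for col in cols:
        feats = column_features(col)
        if feats["header_or_name_hit"] and feats["is_text_type"]:
            kept.append(col)
    return kept
```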
Optionally, the table feature extraction method further includes an associated feature extraction method.
Optionally, the apparatus further includes an associated feature information data table determining module, which may be specifically configured to: after column information selection and extraction have been performed on the numerical feature information data table according to the list header, the column name, the table data type and the column length to obtain the feature information data table corresponding to the information data table to be identified, perform associated feature calculation and extraction on the feature information data table by an associated feature extraction method to obtain an associated feature information data table; and add the associated feature information data table to the feature information data table, and input the processed feature information data table into the sensitive data table identification model.
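The source does not define what an associated feature is. One plausible reading, assumed here purely for illustration, is that columns which are harmless on their own become sensitive in combination (for example a name together with a phone number). The sketch below derives one indicator per pair of column categories present in a table and merges those indicators into the base feature set; the category labels and the `assoc:` prefix are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def associated_features(categories: Set[str]) -> Dict[str, float]:
    """Emit one indicator for every pair of categories that occur together."""
    return {f"{a}+{b}": 1.0 for a, b in combinations(sorted(categories), 2)}


def merge_features(base: Dict[str, float], assoc: Dict[str, float]) -> Dict[str, float]:
    """Add the associated features to the base features under a distinct prefix."""
    merged = dict(base)
    merged.update({f"assoc:{key}": value for key, value in assoc.items()})
    return merged
```

The merged dictionary is what would then be passed on to the identification model in this reading.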
Optionally, the data table identification result determining module 230 may be specifically configured to: perform feature preprocessing on the feature information data table according to a feature preprocessing method to obtain a feature preprocessing information data table, wherein the feature preprocessing method includes at least one of the following: feature coding, feature marking, feature selection and feature dimension reduction; and input the feature preprocessing information data table into the pre-trained sensitive data table identification model, and determine the data table identification result corresponding to the information data table to be identified.
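Two of the preprocessing operations listed above, feature coding and feature selection, can be sketched in plain Python. This is an assumption-laden illustration: one-hot encoding is used as the coding step and removal of constant features as the selection step, neither of which is specified by the source, and the field names in the usage are made up.

```python
from typing import Dict, List

Row = Dict[str, object]


def one_hot(rows: List[Row], field: str) -> List[Dict[str, float]]:
    """Feature coding: replace one categorical field by 0/1 indicator features."""
    values = sorted({str(row[field]) for row in rows})
    encoded = []
    for row in rows:
        out = {key: float(val) for key, val in row.items() if key != field}
        for value in values:
            out[f"{field}={value}"] = 1.0 if str(row[field]) == value else 0.0
        encoded.append(out)
    return encoded


def drop_constant(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Feature selection: drop features whose value is identical in every row."""
    if not rows:
        return []
    keep = [key for key in rows[0] if len({row[key] for row in rows}) > 1]
    return [{key: row[key] for key in keep} for row in rows]
```

Applied in sequence, `drop_constant(one_hot(rows, "dtype"))` yields rows that contain only features which actually vary between tables.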
Optionally, the system further includes a sensitive data table recognition model training module, which may be specifically configured to: before the information data table to be identified is obtained in real time, a plurality of historical information data tables are obtained, wherein the historical information data tables comprise a historical normal information data table and a historical sensitive information data table; respectively carrying out feature extraction on each historical information data table through the table feature extraction method to obtain a historical feature information data table; and carrying out model training on the initial sensitive data table identification model through the historical characteristic information data table until the model output accuracy reaches an accuracy threshold, and determining that training is completed on the sensitive data table identification model.
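The stopping rule described above, training until the model output accuracy reaches an accuracy threshold, can be illustrated with a very small classifier. A perceptron is swapped in here only because it fits in a few lines; the source does not say which model type is used, and the feature vectors, learning rate and epoch limit are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (sensitive) or 0 (normal) for one feature vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def accuracy(weights: Sequence[float], bias: float,
             xs: List[Sequence[float]], ys: List[int]) -> float:
    """Fraction of historical tables classified correctly."""
    return sum(predict(weights, bias, x) == y for x, y in zip(xs, ys)) / len(ys)


def train_until(xs: List[Sequence[float]], ys: List[int], threshold: float = 0.95,
                max_epochs: int = 100, lr: float = 0.1) -> Tuple[List[float], float, float]:
    """Train a perceptron until accuracy reaches the threshold or epochs run out."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, xs, ys) >= threshold:
            break  # accuracy threshold reached: training is considered complete
        for x, y in zip(xs, ys):
            error = y - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias, accuracy(weights, bias, xs, ys)
```

The `max_epochs` guard is an addition of this sketch so that training terminates even if the threshold is never reached; in practice the accuracy would normally be measured on held-out historical tables rather than on the training set.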
The identification device of the data sensitive table provided by the embodiment of the invention can execute the identification method of the data sensitive table provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example III
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement a third embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the identification method of the data sensitive table.
In some embodiments, the method of identifying a data sensitive table may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data sensitive table identification method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method of identifying the data sensitive table in any other suitable way (e.g., by means of firmware).
The method comprises the following steps: acquiring an information data table to be identified in real time; performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified; inputting the characteristic information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized; and if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a safety control measure associated with the sensitive data table identification result so as to ensure the data information safety of the sensitive data table according to the safety control measure.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Example IV
A fourth embodiment of the present invention also provides a computer-readable storage medium containing computer-readable instructions, which when executed by a computer processor, are configured to perform a method of identifying a data-sensitive table, the method comprising: acquiring an information data table to be identified in real time; performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified; inputting the characteristic information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized; and if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a safety control measure associated with the sensitive data table identification result so as to ensure the data information safety of the sensitive data table according to the safety control measure.
Of course, the computer-readable storage medium provided in the embodiments of the present invention is not limited to the above-described method operations, and may also perform the related operations in the data-sensitive table identification method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the identification device of the data sensitive table, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.

Claims (10)

1. A method for identifying a data sensitive table, comprising:
acquiring an information data table to be identified in real time;
performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified;
inputting the characteristic information data table into a pre-trained sensitive data table recognition model, and determining a data table recognition result corresponding to the information data table to be recognized;
and if the data table identification result is a sensitive data table identification result, acquiring and automatically executing a safety control measure associated with the sensitive data table identification result so as to ensure the data information safety of the sensitive data table according to the safety control measure.
2. The method according to claim 1, further comprising, after the inputting the characteristic information data table into a pre-trained sensitive data table recognition model and determining a data table recognition result corresponding to the information data table to be recognized:
and if the data table identification result is not the sensitive data table identification result, carrying out data flow processing on the information data table to be identified by the next node.
3. The method of claim 1, wherein the table feature extraction method comprises at least one of: keyword extraction method, text feature extraction method, numerical feature extraction method and column selection extraction method;
wherein performing feature extraction on the information data table to be identified according to a preset table feature extraction method to obtain a feature information data table corresponding to the information data table to be identified comprises:
extracting keywords from the information data table to be identified by a preset keyword extraction method to obtain a keyword characteristic information data table;
extracting text features from the keyword feature information data table by a text feature extraction method to obtain a text feature information data table;
extracting numerical features from the text feature information data table by a numerical feature extraction method to obtain a numerical feature information data table;
and carrying out column information selection and extraction on the numerical characteristic information data table by a column selection and extraction method to obtain a characteristic information data table corresponding to the information data table to be identified.
4. A method according to claim 3, wherein the step of performing column information selection extraction on the numerical feature information data table by a column selection extraction method to obtain a feature information data table corresponding to the information data table to be identified comprises:
acquiring list data information corresponding to the column selection extraction method; wherein the list data information includes at least one of: list header, column name, table data type, and column length;
and respectively carrying out column information selection and extraction on the numerical characteristic information data table according to the list head, the column name, the table data type and the column length to obtain a characteristic information data table corresponding to the information data table to be identified.
5. The method of claim 4, wherein the table feature extraction method further comprises: an associated feature extraction method;
wherein, after respectively performing column information selection and extraction on the numerical feature information data table according to the list header, the column name, the table data type and the column length to obtain the feature information data table corresponding to the information data table to be identified, the method further comprises:
calculating and extracting the associated features of the feature information data table by an associated feature extraction method to obtain an associated feature information data table;
and adding the associated characteristic information data table into the characteristic information data table, and inputting the processed characteristic information data table into the sensitive data table identification model.
6. The method according to claim 5, wherein the inputting the characteristic information data table into a pre-trained sensitive data table recognition model, determining a data table recognition result corresponding to the information data table to be recognized, includes:
performing feature preprocessing on the feature information data table according to a feature preprocessing method to obtain a feature preprocessing information data table;
wherein the feature preprocessing method comprises at least one of the following: feature coding, feature marking, feature selection and feature dimension reduction;
and inputting the feature preprocessing information data table into the pre-trained sensitive data table recognition model, and determining the data table recognition result corresponding to the information data table to be recognized.
7. The method of claim 6, further comprising, prior to said acquiring in real time the information data table to be identified:
acquiring a plurality of historical information data tables, wherein the historical information data tables comprise a historical normal information data table and a historical sensitive information data table;
respectively carrying out feature extraction on each historical information data table through the table feature extraction method to obtain a historical feature information data table;
and carrying out model training on the initial sensitive data table identification model through the historical characteristic information data table until the model output accuracy reaches an accuracy threshold, and determining that training is completed on the sensitive data table identification model.
8. An apparatus for identifying a data sensitive table, comprising:
the information data table acquisition module is used for acquiring the information data table to be identified in real time;
the characteristic information data table obtaining module is used for carrying out characteristic extraction on the information data table to be identified according to a preset table characteristic extraction method to obtain a characteristic information data table corresponding to the information data table to be identified;
the data table identification result determining module is used for inputting the characteristic information data table into a pre-trained sensitive data table identification model and determining a data table identification result corresponding to the information data table to be identified;
and the safety control measure acquisition and automatic execution module is used for acquiring and automatically executing the safety control measure associated with the sensitive data table identification result if the data table identification result is the sensitive data table identification result so as to ensure the data information safety of the sensitive data table according to the safety control measure.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of identifying a data sensitive table according to any of claims 1-7 when executing the computer program.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of identifying a data sensitive table according to any one of claims 1-7.
CN202311105446.6A 2023-08-29 2023-08-29 Identification method and device of data sensitive table, electronic equipment and storage medium Pending CN117076610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311105446.6A CN117076610A (en) 2023-08-29 2023-08-29 Identification method and device of data sensitive table, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117076610A true CN117076610A (en) 2023-11-17

Family

ID=88705909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311105446.6A Pending CN117076610A (en) 2023-08-29 2023-08-29 Identification method and device of data sensitive table, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117076610A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination